Sep 2022 - Summer School

Summer School and Workshop

Dates: 11th September - 17th September 2022

Lead Institution: KTH

Location: Stockholm

Venue: SCANDIC FORESTA

Agenda


| Time | Sunday 9/11 | Monday 9/12 | Tuesday 9/13 | Wednesday 9/14 | Thursday 9/15 | Friday 9/16 | Saturday 9/17 |
|---|---|---|---|---|---|---|---|
| 8:00 - 9:00 | Arrival | Breakfast; 8:45 Welcome and intro | Breakfast | Breakfast | Breakfast | Breakfast | Breakfast |
| 9:00 - 10:00 | | Keynote 1: Nikos Laoutaris, IMDEA, Spain | Keynote 2: Ricardo Baeza-Yates, Northeastern University, USA | Keynote 3: Ingmar Weber, QCRI, Qatar | Keynote 4: Dean Eckles, MIT, USA | WS Keynote: Thomas B. Schön, Uppsala University, Sweden | Checkout |
| 10:00 - 10:30 | | Break | Break | Break | Break | Break | |
| 10:30 - 12:30 | | Marie Curie Fellow Presentations | Tutorial on "Opinion Formation in Social Networks: Models and Computational Problems" (Part 2) by Stefan Neumann | Tutorial on "Data Democratisation with Deep Learning" (Part 2) by Georgia Koutrika | Tutorial on "Recommender Systems" (Part 2) by Flavian Vasile | "Introduction to Conformal Prediction" by Lars Carlsson, Royal Holloway, University of London | |
| 12:30 - 14:00 | | Lunch | Lunch | Lunch | Lunch | Lunch | |
| 14:00 - 15:00 | | Tutorial on "Opinion Formation in Social Networks: Models and Computational Problems" (Part 1) by Aris Gionis et al. (KTH, Sweden) | Tutorial on "Data Democratisation with Deep Learning" (Part 1) by Georgia Koutrika (Athena Research Center, Greece) | Tutorial on "Recommender Systems" (Part 1) by Flavian Vasile, Criteo | Tutorial on "Assessing Research Impact by Leveraging Open Scholarly Knowledge Graphs" (Part 2) by Dimitris Sacharidis | Tutorial on "AI/Wearables/Predicting Sleep quality" (Part 2) by Joao Palotti et al. | |
| 15:00 - 15:30 | | Break | Break | Break | Break | Break | |
| 15:30 - 17:00 | | Marie Curie Fellow Poster Session | SOCIAL EVENT & Dinner in Stockholm | Tutorial on "Assessing Research Impact by Leveraging Open Scholarly Knowledge Graphs" (Part 1) by Dimitris Sacharidis (ULB, Belgium) | Tutorial on "AI/Wearables/Predicting Sleep quality" (Part 1) by Joao Palotti (QCRI, Qatar) | | |
| 17:00 - 18:00 | | | | | | | |
| 19:30 - 20:30 | | Dinner | Dinner | Dinner | Dinner | Dinner | |


Keynote 1 - Nikolaos Laoutaris

Speaker: Nikolaos Laoutaris
Affiliation: IMDEA Networks Institute
Talk: The arguments and the vision for a [Personal] Data Internetwork (PDI)

Abstract
In this talk I will argue for the creation of PDI, an overlay network over the existing Internet for connecting data owners, data marketplaces, and data consumers in a fair, trustworthy, and democratic manner that copies the paradigm of the Internet. PDI is motivated on the one hand by the thirst of many AI/ML algorithms of B2B and B2C services for training data, and on the other by the need to tackle privacy matters proactively, via explicit control, instead of reactively, by attempting to tame tracking and surveillance. PDI is aligned with a plethora of initiatives and activities across industry, regulation, and research funding, including Gaia-X, International Data Spaces, the European Data Act, Digital Europe and Horizon Europe. I will explain how PDI can answer existential challenges for the new data economy such as how to price data, how to buy data, how to protect ownership rights over data, and how to address the fragmentation vs. monopoly tussle on the data marketplace scene. I will also explain why technologies such as data watermarking, information-centric networks, trusted execution, and federated learning can contribute to the realisation of PDI in practice.
Bio
Nikolaos Laoutaris is a research professor at IMDEA Networks Institute in Madrid, and director of its Data Transparency Group (DTG). Prior to that he was director of data science at Eurecat and chief scientist of the Data Transparency Lab, which he co-founded in 2014 during his 10-year tenure as a researcher at Telefonica. Before that, Nikolaos was a postdoc fellow at Harvard University, a Marie Curie postdoc fellow at Boston University, and a PhD student in computer science at the University of Athens. His main research interests include privacy, transparency, data protection, economics of networks and information, intelligent transportation, distributed systems, protocols, and network measurements. More information at: https://networks.imdea.org/team/research-groups/data-transparency-group/ and http://laoutaris.info/.

 

Keynote 2 - Ricardo Baeza-Yates

Speaker: Ricardo Baeza-Yates
Affiliation: Institute for Experiential AI
Talk: Responsible AI

Abstract
In the first part we cover five current specific problems that motivate the need for responsible AI: (1) discrimination (e.g., facial recognition, justice, sharing economy, language models); (2) phrenology (e.g., biometric-based predictions); (3) unfair digital commerce (e.g., exposure and popularity bias); (4) stupid models (e.g., minimal adversarial AI); and (5) indiscriminate use of computing resources (e.g., large language models). These examples do have a personal bias but set the context for the second part, where we address four challenges: (1) too many principles (e.g., principles vs. techniques); (2) cultural differences; (3) regulation; and (4) our cognitive biases. We finish by discussing what we can do to address these challenges in the near future to be able to develop responsible AI.
Bio
Ricardo Baeza-Yates is Director of Research at the Institute for Experiential AI of Northeastern University. Before that, he was VP of Research at Yahoo Labs, based in Barcelona, Spain, and later in Sunnyvale, California, from 2006 to 2016. He is co-author of the best-selling textbook Modern Information Retrieval, published by Addison-Wesley in 1999 and 2011 (2nd ed.), which won the ASIST 2012 Book of the Year award. From 2002 to 2004 he was elected to the Board of Governors of the IEEE Computer Society and between 2012 and 2016 was elected to the ACM Council. In 2009 he was named ACM Fellow and in 2011 IEEE Fellow, among other awards and distinctions. He obtained a Ph.D. in CS from the University of Waterloo, Canada, in 1989, and his areas of expertise are web search and data mining, information retrieval, bias in AI, data science and algorithms in general.


Keynote 3 - Ingmar Weber

Speaker: Ingmar Weber
Affiliation: Qatar Computing Research Institute (QCRI)
Talk: Big Data for Big Challenges: Data Analysis for Sustainable Development and Humanitarian Crises

Abstract
The world is facing seemingly insurmountable challenges, ranging from the climate crisis, and extreme poverty, to ongoing wars and conflicts. Intergovernmental organizations, such as the United Nations, were created to help more effectively tackle challenges that require coordinated global action. However, such responses require insight to be both targeted and effective. Such insights can be hard to obtain – especially when relevant and trustworthy data is missing. To address existing data gaps, non-traditional data sources, such as social media advertising data and satellite imagery, have been used to (i) monitor displacement and migration, (ii) track digital gender gaps, and (iii) map wealth inequalities. For example, changes in the distribution of cars detected in satellite imagery are indicative of internal displacement in Ukraine. And gender gaps in the number of female vs. male users targetable for advertising are strong indicators of internet access gender gaps. In this talk, I’ll give an overview of our work with international partners on using such data both for operational and policy purposes.
Bio
Dr. Ingmar Weber is currently the Research Director for Social Computing at the Qatar Computing Research Institute (QCRI). Recently, he has been awarded an Alexander von Humboldt Professorship in AI, Germany’s most valuable research award, and he’ll be joining Saarland University in fall 2022. His interdisciplinary research looks at what non-traditional data sources can tell us about the offline world and society at large. Working closely with sociologists and demographers he has pioneered the use of online advertising data for complementing official statistics on international migration, digital gender gaps, and poverty. His work is regularly featured in UN reports, and analyses performed by his team have been used to improve operations by UN agencies and NGOs ranging from Colombia to the Philippines. Prior to joining QCRI, Ingmar was a researcher at Yahoo Research Barcelona. As an undergraduate, he studied mathematics at the University of Cambridge before pursuing a Ph.D. at the Max-Planck Institute for Computer Science.


Keynote 4 - Dean Eckles

Speaker: Dean Eckles
Affiliation: MIT
Talk: Long ties: Formation, social contagion, and economic outcomes



Abstract

Network structure can affect when, where, and how widely new ideas, products, and behaviors are adopted. Classic work in the social sciences has emphasized that "long ties" provide access to novel and advantageous information. In our empirical work, we show how particular life events (migration, education) are associated with forming long ties and how having long ties is associated with beneficial economic outcomes. Counties in the United States with more long ties (and more strong long ties) have higher incomes, lower unemployment, and more economic mobility, even after adjusting for other measures of social connections.

These stylized facts are consistent with some models of contagion. In widely used models of biological contagion, interventions that randomly rewire edges (generally making them "longer") accelerate spread. However, there are other models relevant to social contagion, such as those motivated by myopic best-response in games with strategic complements, in which individuals adopt if and only if the number of adopting neighbors exceeds a threshold. Recent work has argued that highly clustered, rather than random, networks facilitate the spread of these "complex contagions". Here we show that minor modifications to this model, which make it more realistic, reverse this result: we allow very rare below-threshold adoption, i.e., on rare occasions adoption occurs when there is only one adopting neighbor. In a version of "small world" networks, allowing adoptions below threshold to occur with probability of order 1/√n (even only along some "short" cycle edges) is enough to ensure that random rewiring accelerates spread. Hypothetical interventions that randomly rewire existing edges or add random edges (versus adding "short", triad-closing edges) in hundreds of empirical social networks reduce time to spread.

In summary, we provide an empirical and theoretical view of the outsized role of long ties in the spread of valuable information and behaviors, even when those behaviors spread via threshold-based contagions.

This is joint work based on two papers: one on threshold-based contagions with Elchanan Mossel, M. Amin Rahimian, and Subhabrata Sen, and one on formation of long ties and economic outcomes with Eaman Jahani, Samuel Fraiberger, and Michael Bailey.
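The threshold rule described in the abstract can be illustrated with a small simulation. The following is only a sketch under stated assumptions, not the model or code from the papers: the function names, the ring-lattice network, the synchronous update, the "at least `threshold` adopting neighbours" rule, and the free parameter `q` (the probability of adopting with a single adopting neighbour) are all hypothetical choices made for illustration.

```python
import random


def ring_lattice(n, k):
    """Ring of n nodes, each linked to its k nearest neighbours on either side."""
    return {i: {(i + d) % n for d in range(-k, k + 1) if d != 0} for i in range(n)}


def rounds_to_full_adoption(neighbors, seeds, threshold=2, q=0.0, rng=None,
                            max_rounds=10_000):
    """Synchronous threshold contagion (illustrative assumptions throughout).

    A non-adopter adopts when at least `threshold` of its neighbours have
    adopted, or, with probability q, when at least one neighbour has adopted
    (below-threshold adoption). Returns the number of rounds until every node
    has adopted, or None if the spread stalls or max_rounds is reached.
    """
    rng = rng or random.Random(0)
    adopted = set(seeds)
    if len(adopted) == len(neighbors):
        return 0
    for round_no in range(1, max_rounds + 1):
        new = set()
        for node, nbrs in neighbors.items():
            if node in adopted:
                continue
            count = len(nbrs & adopted)
            if count >= threshold or (count >= 1 and rng.random() < q):
                new.add(node)
        if not new and q == 0.0:
            return None  # the purely threshold-based spread has stalled
        adopted |= new
        if len(adopted) == len(neighbors):
            return round_no
    return None


ring = ring_lattice(20, 2)
print(rounds_to_full_adoption(ring, {0, 1}))       # two adjacent seeds: 9 rounds
print(rounds_to_full_adoption(ring, {0}))          # one seed, q = 0: None (stalls)
print(rounds_to_full_adoption(ring, {0}, q=1.0))   # one seed, q = 1: 5 rounds
```

With a single seed and `q = 0` no node ever sees two adopting neighbours, so nothing spreads; any positive `q` removes that blockage. The sketch does not attempt to reproduce the rewiring results discussed in the talk.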

Bio

Dean Eckles is an Associate Professor of Marketing at MIT Sloan. His substantive research examines people’s interactions with and through communication technologies, especially how these technologies mediate, amplify, and direct social influence. This work sometimes requires or benefits from new analytical methods, so Eckles also works on applied statistics, design of field experiments, and causal inference. Prior to joining MIT, he was a scientist at Facebook, where he worked on many product areas and analytical methods, including News Feed, messaging, advertising, tools for randomized experiments, and survey methods. Eckles previously worked in research at Nokia and Yahoo. Eckles received a BA in philosophy, a BS and MS in cognitive science, an MS in statistics, and a PhD in communication, all from Stanford University.


Keynote 5 - Thomas B. Schön

Speaker: Thomas B. Schön
Affiliation: Uppsala University
Talk: Formulating flexible probabilistic models

Abstract
One of the key lessons to take away from contemporary machine learning is that flexible models offer the best predictive performance. This has implications in many situations. In this lecture, I will try to make this concrete by looking at a few constructions that we are working with. I will start with a classification task from ECG interpretation and then continue to the more under-researched area of how to formulate and solve regression problems using deep learning. There are currently several different approaches used for deep regression and there is still room for innovation. I will illustrate this landscape in general and introduce our rather general deep regression method which has a clear probabilistic interpretation. We show good performance on several computer vision regression tasks, system identification problems and 3D object detection using laser data.
Bio
Thomas B. Schön is the Beijer Professor of Artificial Intelligence in the Department of Information Technology at Uppsala University. In 2018, he was elected to The Royal Swedish Academy of Engineering Sciences (IVA) and The Royal Society of Sciences at Uppsala. He received the Tage Erlander prize for natural sciences and technology in 2017 and the Arnberg prize in 2016, both awarded by the Royal Swedish Academy of Sciences (KVA). He was awarded the best PhD thesis award by The European Association for Signal Processing in 2013. He received the best teacher award at the Institute of Technology, Linköping University in 2009.


Aristides Gionis and Stefan Neumann

Speakers: Aristides Gionis and Stefan Neumann
Affiliation: KTH Royal Institute of Technology
Talk: Opinion Formation in Social Networks: Models and Computational Problems


Abstract
Social networks are widely used nowadays to engage in conversations about a variety of topics. Over time, these discussions can have a significant impact on people’s opinions. Works in sociology and other areas have provided mathematical models to simulate such opinion formation processes, and in the past decade it has become popular to consider the computational aspects of these models, motivated by the widespread use of online social networks. The goal is to obtain a better understanding of real-world phenomena, such as increasing polarization and filter bubbles. In this tutorial we aim to provide an overview of the opinion formation literature. We will present the most important opinion models that are studied in this context, and will discuss some of the computational challenges that have arisen recently. We will also reflect on emerging applications and directions for future work.
Bio
Aristides Gionis is a WASP professor at KTH Royal Institute of Technology and adjunct professor at Aalto University. He has been a fellow at the ISI Foundation, Turin, and a visiting professor at the University of Rome. His previous appointment was with Yahoo! Research, Barcelona. He obtained his PhD in 2003 from Stanford University. He is currently serving as an action editor of the Data Mining and Knowledge Discovery journal (DMKD) and an associate editor of the ACM Transactions on the Web (TWEB). He has contributed to different areas of data science, such as algorithmic data analysis, web mining, social-media analysis, data clustering, and privacy-preserving data mining.


Stefan Neumann is a postdoc at KTH Royal Institute of Technology in the group of Aris Gionis. He is broadly interested in social network analysis and graph algorithms. Stefan received his Ph.D. from the University of Vienna under the supervision of Monika Henzinger. During that time, he also visited Eli Upfal at Brown University. His Ph.D. thesis won the Heinz Zemanek Award from the Austrian Computer Society and an Award of Excellence from the Austrian federal government. Stefan received his Master’s degree from Saarland University and the Max Planck Institute for Informatics.


Georgia Koutrika

Speaker: Georgia Koutrika
Affiliation: Athena Research Center
Talk: Data Democratisation with Deep Learning: An Analysis of Text-to-SQL Systems

Abstract
The Web has democratized access to knowledge, and search engines have arguably played a paramount role in this by enabling users to search for information in web pages using keywords or simple natural language questions. However, a vast amount of data used in a wide range of tasks, from business operations and medical and scientific research to activities in our everyday lives, resides in relational databases. These remain inaccessible to most users who are not familiar with a low-level query language such as SQL for formulating their queries. During the past decades, there has been an increasing research focus on text-to-SQL systems that enable users to query data using natural language. The recent advances in deep neural networks, along with the creation of two large datasets made for training text-to-SQL systems, have led to an explosion of recent text-to-SQL efforts, shaping a very exciting and fast-paced research field. In this tutorial, we dive into the text-to-SQL field. First, we will introduce the text-to-SQL problem and explain its challenges. Then, we will discuss available benchmarks and explain their advantages and shortcomings. We will zoom in on the recent advances of deep learning techniques for text-to-SQL translation. Finally, we will discuss open challenges and research opportunities.
Bio
Georgia Koutrika is Research Director at Athena Research Center in Greece. She has more than 15 years of experience in multiple roles at HP Labs, IBM Almaden, and Stanford. She has received a PhD and a diploma in Computer Science from the Department of Informatics and Telecommunications, University of Athens in Greece. Her work focuses on natural language data interfaces, data exploration, recommendations, and data analytics, and has been incorporated in commercial products, described in 14 granted patents and 26 patent applications in the US and worldwide, and published in more than 100 papers in top-tier conferences and journals.


Flavian Vasile

Speaker: Flavian Vasile
Affiliation: Criteo AI Lab
Talk: Technological Advances in Real-World Recommendation
Email: f.vasile@criteo.com
Abstract

This tutorial covers the main recent advances in recommendation systems, namely the recent focus on offline-online metrics alignment for Recommendation and the introduction of deep learning techniques for user and item representation coupled with maximum inner-product search (MIPS) for fast retrieval. The tutorial assumes a good understanding of linear algebra and classical Machine Learning techniques and aims to bring insights to the audience on the real-world problems faced by recommendation systems practitioners. The tutorial will be divided into two parts, each containing a theoretical section and a practical session that illustrates the concepts with simple coding examples.
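As a rough illustration of what maximum inner-product search computes, the sketch below scores every item embedding against a user embedding and keeps the top k. It is an exact brute-force baseline written for this page, not material from the tutorial; the function names and the toy vectors are hypothetical, and production systems typically replace the linear scan with an approximate index to make retrieval fast.

```python
import heapq


def inner_product(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def top_k_items(user_vec, item_vecs, k):
    """Exact MIPS by linear scan: return the k item ids with the largest
    inner product with user_vec, best first. Cost grows linearly with the
    number of items, which is what approximate MIPS indexes try to avoid."""
    return heapq.nlargest(
        k, item_vecs, key=lambda item: inner_product(user_vec, item_vecs[item])
    )


# Toy embeddings (made up for illustration).
items = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
print(top_k_items([2.0, 1.0], items, k=2))  # ['c', 'a']
```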

Bio

Flavian Vasile is part of the Criteo AI Lab, where he works as Principal Scientist. His main focus is on the development of Deep Learning-based Recommendation Systems and on introducing aspects of Causal Inference to Recommendation. His current research interests include Deep Sequential Models for Recommendation and understanding Recommendation as a decision-making system with reward uncertainty. Among his recent research publications, the work on Causal Embeddings for Recommendation received the best paper award at RecSys ’18, and he co-organized the ’18–’20 REVEAL Workshop series in conjunction with ACM RecSys.


Dimitris Sacharidis

Speaker: Dimitris Sacharidis
Affiliation: Athena Research Center
Talk: Assessing Research Impact by Leveraging Open Scholarly Knowledge Graphs

Abstract
Nowadays, the vast amount of published research creates major obstacles for the traditional knowledge discovery required by common research processes and other relevant tasks. At the same time, however, the increased popularity of the Open Science movement makes large amounts of scholarly metadata available through open Scholarly Knowledge Graphs (SKGs), paving the way for reliable research impact assessment processes that can alleviate this issue. The main objective of the tutorial is to inform and educate the audience about the opportunities and challenges that open SKGs create in the field of research impact assessment, presenting the respective state of the art and highlighting common pitfalls.
Bio
Dimitris Sacharidis is a Marie Curie Postdoctoral Fellow at the Information Management Systems Institute (IMSI) and at the Hong Kong University of Science and Technology (HKUST). He received his BSc in Electrical and Computer Engineering from the National Technical University of Athens (NTUA) in 2001, and his MSc in Computer Science from the University of Southern California (USC) in 2004. In 2008 he obtained the PhD degree in Electrical and Computer Engineering from NTUA under the supervision of Prof. T. Sellis. During 2008-2009, he was a Visiting Lecturer at the Department of Computer Science and Technology in the University of Peloponnese. His research interests include data streams, privacy, security, preferences and ranking in data management applications.


Lars Carlsson

Speaker: Lars Carlsson
Affiliation: Royal Holloway, University of London
Talk: Introduction to Conformal Prediction

Abstract

How good is your prediction? In risk-sensitive applications, it is crucial to be able to assess the quality of a prediction; however, traditional classification and regression models don't provide their users with any information regarding prediction trustworthiness. In contrast, conformal classification and regression models associate each of their multi-valued predictions with a measure of statistically valid confidence, and let their users specify a maximal threshold for the model's error rate. The price to be paid is that predictions made with a higher confidence cover a larger area of the possible output space. This tutorial aims to provide its attendees with the knowledge necessary to implement conformal prediction in their daily data science work, be it research or practice oriented, as well as to highlight current research topics on the subject.

Since its development the framework has been combined with many popular techniques, such as Support Vector Machines, k-Nearest Neighbours, Neural Networks, Ridge Regression etc., and has been successfully applied to many challenging real world problems, such as the early detection of ovarian cancer, the classification of leukaemia subtypes, the diagnosis of acute abdominal pain, the assessment of stroke risk, the recognition of hypoxia in electroencephalograms (EEGs), the prediction of plant promoters, the prediction of network traffic demand, the estimation of effort for software projects and the back calculation of non-linear pavement layer moduli. The framework has also been extended to additional problem settings such as semi-supervised learning, anomaly detection, feature selection, outlier detection, change detection in streams and active learning. The aim of this symposium is to serve as a forum for the presentation of new and ongoing work and the exchange of ideas between researchers on any aspect of Conformal Prediction and its applications.
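The trade-off described in the abstract (a user-chosen error rate, with higher confidence yielding wider predictions) can be sketched with split conformal prediction for regression. This is one common variant chosen here for brevity, not necessarily the formulation presented in the tutorial; the function names, the absolute-residual score, and the toy model and calibration data are assumptions made for illustration. Under the usual exchangeability assumption, intervals built this way cover the true value with probability at least 1 - alpha.

```python
import math


def conformal_quantile(scores, alpha):
    """Finite-sample-corrected (1 - alpha) quantile of the calibration scores."""
    n = len(scores)
    rank = math.ceil((n + 1) * (1 - alpha))  # 1-based rank among sorted scores
    if rank > n:
        return math.inf  # too few calibration points for this confidence level
    return sorted(scores)[rank - 1]


def split_conformal_interval(model, calibration, x_new, alpha):
    """Prediction interval for x_new from a fitted model and held-out
    calibration pairs (x, y), using absolute residuals as nonconformity scores."""
    scores = [abs(y - model(x)) for x, y in calibration]
    q = conformal_quantile(scores, alpha)
    y_hat = model(x_new)
    return y_hat - q, y_hat + q


# Toy setup (made up): a "fitted" model and ten calibration points whose
# absolute residuals are 1, 2, ..., 10.
model = lambda x: 2 * x
residuals = [1, -2, 3, -4, 5, -6, 7, -8, 9, -10]
calibration = [(x, 2 * x + r) for x, r in zip(range(10), residuals)]

print(split_conformal_interval(model, calibration, 5, alpha=0.2))  # (1, 19)
print(split_conformal_interval(model, calibration, 5, alpha=0.1))  # (0, 20)
```

Lowering `alpha` from 0.2 to 0.1 widens the interval, which is the "price to be paid" mentioned above; with only ten calibration points, `alpha = 0.05` would already give an unbounded interval.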


Bio

Lars is mainly engaged in two newly started companies: Universal Prediction AB, involved in consulting services around AI and advanced analytics, and RTHS AB, which is developing a product to measure various properties related to blood pulse waves, e.g. blood pressure. Part of the product is heavily reliant on machine-learning methods. Other than this, Lars is also a visiting professor in computer science at Royal Holloway. In the period 2018-2021, Lars held a position as Head of AI within Stena Line and was leading all advanced initiatives within the Stena conglomerate. Before that, Lars spent almost 15 years at AstraZeneca and participated in strategic, scientific and management initiatives. Most of his efforts were spent on developing scientifically sound methods, such as conformal prediction, within the drug discovery phases. Lars holds a PhD in Naval Architecture and Ocean Engineering and Scientific Computing and he is a graduate of the Swedish National Graduate School in Scientific Computing. He also did part of his PhD at Lawrence Livermore National Laboratory. The main research objective was to enable fast solutions of the incompressible Navier-Stokes equations on massively parallel computers. Lars has authored or co-authored more than 60 peer-reviewed articles. He has also contributed to book chapters and done editorial and chair work for COPA.



Joao Palotti


Speaker: Joao Palotti
Affiliation: Qatar Computing Research Institute (QCRI)
Talk: Practical data processing in Python: a use case of sleep staging with wearable devices


Abstract

With the rise of wearables, we now have smart devices that can seamlessly collect an extensive range of physiological measures. The availability of these new data sources brings opportunities in several areas, such as sleep medicine. This talk aims to introduce the students to some of the latest developments in sleep staging classification while learning practical Python data processing tips.

Bio

Joao Palotti obtained his Ph.D. in Information Retrieval at TU Wien. His thesis was part of the European projects Khresmoi and Kconnect, in which he worked on learning-to-rank methods to address the knowledge gap between medical documents and human expertise. He later worked as a post-doc at MIT on digital health, as a visiting professor at CMU Qatar, and as an applied scientist at QCRI. In addition, he develops several open-source projects, such as TrecTools, pySocialWatcher and HypnosPy, aiming to either facilitate the life of fellow researchers or bring research insights into practice. After working as a data science consultant for several organisations, such as the World Bank, Joao joined the ML team at Earkick, a Swiss start-up on mental health.