Conference "Data science for better decisions"

On Tuesday 17 December 2019, Statistics Flanders and CBS, Statistics Netherlands jointly organise this one-day international conference on how data science can contribute to better official statistics for decision making. In a fast-moving digital environment, how can new data sources and techniques help in supporting better decisions? What are the opportunities from the boom in the availability of data to inform policy decisions, and what are the risks in using new tools and techniques to handle big data sources?


Programme

Host: Annelies Beck (VRT)

In the morning a number of keynote speakers will comment on the position of big data, data science and associated techniques (such as machine learning and artificial intelligence) in the broader data landscape, and discuss how digital developments affect policy making and the data sources required to support policy decisions.

09:00 Welcome with coffee
Auditorium, Brussels Department for the Environment, Tour & Taxis
10:00 Introduction
Roeland Beerten, chief statistician Statistics Flanders
Tjark Tjin-A-Tsoi, director-general CBS, Statistics Netherlands

10:20 Diane Coyle, Bennett Professor of Public Policy at the University of Cambridge and recipient of the Indigo Prize for innovation in economics
Statistics for the digital age
The digitalisation of the economy is posing a number of challenges for economic statistics. These range from classification and data collection challenges to the tracking of behaviour and business model changes or calculation of quality-adjusted price indices. Economic measurement needs to adapt to the structural changes if it is to paint a credible picture of the economy. What are the key issues and how should statistical offices respond in order to serve their users?
11:10 Kenneth Cukier, Senior Editor at The Economist and the host of its weekly podcast on technology
Copernicus meets Coca-Cola: What AI and big data mean for national statistics
Since the 1600s statistics have been used to keep track of state affairs. But as their importance and prominence have grown in recent decades, the methods have become polished yet brittle, and fail to admit new techniques that can either do a better job, a different job, or a more timely job. In a stimulating and humorous talk, Kenneth Cukier, a senior editor at The Economist, will explain the importance of AI and big data relative to state statistics, and challenge practitioners to reexamine their mission and craft.
 
Lunch break
Afternoon
In the afternoon four parallel sessions will look more closely at different data science techniques and their application, as well as the conditions that support the efficient use of these techniques. The day will close with a panel discussion.
 
13:00-14:45 Parallel sessions

Session 1. Machine learning - Chair: Bart Buelens

Machine learning algorithms detect patterns in data and use these to predict missing data. Data can be missing because it was not collected or observed, or simply because the prediction is about the future. Machine learning algorithms do not aim to model underlying real-world systems explicitly; rather, they employ computational techniques to achieve optimal predictive accuracy. Consequently, they are often described as black-box systems that lack transparency. This session addresses issues related to decision making when machine learning is involved.

Bart Buelens, Senior Data Scientist, Flemish Institute for Technological Research (VITO), Belgium
Machine learning

A machine that learns by itself is often seen as a form of artificial intelligence. Nowadays, applications of machine learning are widespread: from recommender systems to credit card fraud detection and navigation apps. A bird's eye view of the field of machine learning is given, with an emphasis on applications where algorithmic results are used for decision making. Machine learning results are considered in terms of bias and variance, highlighting the importance of appropriate uncertainty quantification. The talk is illustrated throughout with successful as well as failed examples of machine learning for decision making.

Joep Burger, Team Methodology Heerlen, Statistics Netherlands
The use of machine learning in official statistics: two case studies

Driven by the increasing availability of big and complex data such as images and text, machine learning (ML) is becoming a popular addition to the statistician's toolbox. Two case studies on the use of ML in official statistics will be presented. In the first case study, we try to predict a person's propensity to move from their digital footprint in two decades of register data, comparing logistic regression with a random forest. In the second case study, we explore the possibilities of learning statistical information such as poverty from aerial or satellite images, using a convolutional neural network.

Chang Sun, Ph.D. Candidate at Maastricht University Institute of Data Science, Netherlands
Use a secure environment to analyze personal data from multiple sources in a privacy-preserving manner.

With current developments in data science, such as machine learning and data mining technologies, an increasing amount of data is collected and analyzed separately by a variety of data parties. However, training a machine learning model on a single data source has a major drawback: it may lead to incomplete or incorrect knowledge discovery, which can confuse or mislead society. To tackle this problem, Chang Sun and her colleagues developed a secure infrastructure to analyze personal data from multiple sources in a privacy-preserving manner. As a use case, they applied the infrastructure at CBS and De Maastricht Studie to study how socio-economic factors affect people with diabetes. This infrastructure enables statistical offices to discover more potential social issues and make greater use of data by collaborating with other data sources.


Session 2. Natural Language Processing - Chair: Piet Daas

Natural language processing investigates how large amounts of natural language data can be processed and analyzed by computers. Some examples: data from social media are used to measure the number of messages and the sentiment with regard to certain subjects. The usefulness of this sentiment analysis has, for example, been demonstrated in the context of consumer confidence. Web scraping, where data is extracted from websites, is used in several research areas, including job vacancy statistics.

Piet Daas, senior methodologist and CBS big data specialist, professor by special appointment of Big Data in Official Statistics at Eindhoven University of Technology, Netherlands
Natural Language Processing

Converting text to a form that can be interpreted by a machine has challenged researchers in various disciplines since this field of research began in the 1950s. In recent years more and more applications have become available that many of us use on a daily basis, such as spam filters, search engines, and Siri/Alexa/Google Assistant. In this presentation, the focus is on extracting information from text. First, an overview is given of the ways in which this can be achieved. Subsequently, a number of examples are given to show how text can be (successfully) used in an official statistical context.

Martina Hahn, Head of Methodology and Innovation in Official Statistics, Eurostat
The Web Intelligence Hub – use and analysis of web scraped data for different statistical domains.

In the context of implementing the Trusted Smart Statistics paradigm, Eurostat, together with Cedefop, the European Agency for vocational training, and the ESS, will develop a Web Intelligence Hub (WIH). The WIH aims at providing the ESS with the key building blocks for harvesting information from the web. The Hub will implement and maintain a portfolio of text processing and analytic services at various levels (e.g. text parsing, mining, classification, interpretation). It will build on developments of the web scraping projects of the ESSnets Big Data and on the Cedefop project, which uses online job advertisements to extract information on skills demand in Europe. Activities will initially focus on establishing a modular system for scraping and analysing online job advertisements and will be gradually extended to other information domains, such as information on enterprises or information relevant for ICT statistics.

Paul Keuren, statistical researcher and software engineer at Centraal Bureau voor de Statistiek, CBS
Adjusting text-analysis to source motive

Suppliers of text sources obtain textual data from multiple sources. For this presentation, two separate sources (Chamber of Commerce data and web-scraped data) are considered and compared. The Chamber of Commerce data is investigated further to demonstrate what quick wins this data can deliver.


Session 3. Images and visualisation - Chair: Edwin de Jonge

This session involves two aspects of the use of images: the use of images as a data source, and visualisation of data for a wide audience. The basic data for data science can consist of images such as satellite images or images from Google Street View, which poses new challenges. Furthermore, there is the rapidly growing field of data visualisation, where abstract information is being made available more efficiently than ever before. This session will give an overview of some projects where images are used as data, as well as of data visualisation and dashboard displays that translate abstract data into user-friendly information.

Edwin de Jonge, statistical consultant, methodologist at Statistics Netherlands
Images and visualization

Chris Bonham, Senior Data Scientist at the Data Science Campus, Office for National Statistics, UK
Remote sensing and machine learning to identify vegetation in urban residential gardens

Given their environmental and emotional benefits, identifying and understanding the features of urban green spaces is becoming increasingly important. Current approaches often assume residential gardens are almost exclusively covered by natural vegetation and do not take into account urban areas such as steps, patios and paths. The Data Science Campus and Ordnance Survey (OS) have used remote sensing and machine learning techniques to improve upon the current approach used within the Office for National Statistics to identify the proportion of vegetation in UK residential gardens. A test library of labelled images was created by taking 100 images randomly sampled from Bristol and Cardiff and independently classifying them to provide a ground truth. Application of several algorithms to the labelled data indicated sensitivity to the presence of shadows. Consequently, a neural network classifier was developed specifically to be insensitive to the effects of shadow. Results support the conclusion that a neural network can classify vegetation more accurately and is less susceptible to the effect of shadows than the other algorithms. Additional information can be found at: https://datasciencecampus.ons.gov.uk/projects/green-spaces-in-residential-gardens/

Karim Douïeb, data scientist and data visualisation designer, co-founder of Jetpack.AI
Why official statistics are key to understand social issues?

This talk will illustrate how openly available socio-demographic data about Belgium have been crucial in the visual exploration carried out for two studies. The first one is intended to raise awareness of the immigration situation in Brussels and of the challenges lying ahead. The second one is about a potential health crisis related to the consumption of opioids in Belgium.


Session 4. Preconditions for effective data science deployment - Chair: Johan Van der Valk and Sofie De Broe

The usefulness of data science for supporting decisions doesn't just depend on statistical and technical standards. Several ethical and organisational issues determine important preconditions for delivering good data science. First of all, there are significant debates around the ethics and privacy dimensions of the growing data science field, to be taken into account when techniques are applied to real-life data. Secondly, developing data science methods for official statistics requires active collaboration between NSIs and international institutions such as the UN and Eurostat, given the global nature of many of the new data sources and policy decisions. Thirdly, providers of big data are often private companies. How can a sustainable relationship with them be organised? Finally, partnerships are set up with universities and companies to optimise the use of these promising techniques for the development, production and quality improvement of official statistics. This is also a challenge.

Johan Van der Valk, coordinator cross-border statistics, and Sofie De Broe, Scientific Director of the Center for Big Data Statistics, both CBS, Statistics Netherlands
Preconditions for effective "data science" deployment.

This presentation elaborates on the non-methodological challenges for successful application of big data in official statistics. Applying big data in official statistics requires specific conditions that differ from those for the production of traditional statistics. Important elements are: questioning existing statistics, stimulating co-creation with external and international partners, and allowing the development and implementation of new statistical products. To achieve sustainable results, collaboration with the outside world of other data producers, data providers and data users is essential. This requires a change of culture and attitude and a specific data ecosystem. We will present some examples to illustrate our views.

Jasmine Grimsley, Senior Data Scientist, Data Science Campus, Office for National Statistics, UK
Ethical maintenance of AI systems.

With the increasing adoption of AI in all aspects of our life, it has become critically important to be confident that AI systems perform in an ethical way over time. There are ethical frameworks in place to ensure a prototype is fair, unbiased and effective. This discussion will explore how AI systems have the potential to depart from an ethical ideal over time. This identifies a need for maintenance programs to ensure that, over time, AI tools perform in a safe, reliable, timely, and trustworthy manner. Issues explored will include accurate and unbiased performance evaluation and effective maintenance of systems as their working environment evolves. This may include changing societal values, populations, new and unforeseen kinds of data, and policy and other changes.

Marc Ponsen, data scientist with a PhD in the area of artificial intelligence, and Bob van de Berg, product developer, both CBS, Statistics Netherlands
Big data ontology enrichment for cross-border job placements and labour market statistics (CBS)

CBS, VDAB and UWV work together to create a cross-border ontology for the labour market, based on the existing 'Competent' ontology developed by VDAB. This ontology will be enriched with cross-border skills and occupations derived from millions of Dutch and Flemish vacancy texts. It will be the basis for new statistics on cross-border demand and supply on the Flemish and Dutch labour markets.

15:10-16:15 Panel Discussion

Working with new and uncharted data sources requires different ways of working in statistical offices compared to the traditional way of producing statistics based on surveys or administrative data. The experts in the panel will discuss some of the new topics in this context, including the challenges and opportunities of these new data types and their associated methodologies, the different approaches needed to acquire data from external data producers rather than collecting the data in-house, communicating the strengths and limitations of new data science approaches to non-technical experts, and some of the ethical challenges surrounding a data-driven society.

Moderator: Philippe Van Impe, CEO, DigitYzer

Experts

  • Diego Kuonen, CEO Statoo Consulting, Professor of Data Science University of Geneva, Principal Scientific and Strategic Big Data and Data Science Advisor for the Directorate and the Board of Management of the Swiss Federal Statistical Office (FSO), Co-Author of FSO's Data Innovation Strategy, Switzerland
  • Martina Hahn, Head of Methodology and Innovation in Official Statistics, Eurostat
  • Sofie De Broe, Scientific Director of the Center for Big Data Statistics, CBS, Statistics Netherlands
  • Roeland Beerten, Chief statistician Statistics Flanders, Belgium
16:15 Drinks reception


Practical information

When: Tuesday 17 December 2019, from 10:00 to 16:15
Where: Brussels (Belgium), on the Tour & Taxis site (Havenlaan 86C, 1000 Brussel)
Language: English
Who: Everyone who has an interest in data, statistics and data science and how they can support decision-making.
Price: Registration is free.

Coffee, tea and water are available all day. We provide a simple lunch and conclude with a drinks reception. These are free, but we ask you to register for lunch and the reception.


Route

Free shuttle service: There is a shuttle service between Brussels North railway station and the site. Large buses are used during peak hours and small shuttle buses during off-peak hours. The shuttle service works according to the 'first come, first served' principle. During peak hours there may not be enough room for all those waiting for the bus.
Bus stop at Brussels North. If you leave the railway station via the main entrance, buses (with a "Tour & Taxis" sign) stop to the left of the stairs. This is at Bolivarplein at the side entrance of the Proximus building.
Bus stop on the Tour & Taxis site. There is a bus stop with a bus shelter at the entrance of the site between the Herman Teirlinck building and ‘het pakhuis’.
On foot: 20 minutes from Brussels North railway station
The recommended walking route from the railway station Brussels North runs via the ‘Willebroekkaai’. People who use blind guidance tiles can also follow this route.
Route on Google maps
By car: You can plan your route in Google Maps.

Parking nearby
Parking up-site (500 metres from Tour & Taxis) – Willebroekkaai 35 – 1000 Brussels
WTC-parking (close to the railway station Brussels North) - Simon Bolivarlaan – 1000 Brussel


Registration

The maximum number of registrations has been reached. Registration is closed.

Jointly organised with

CBS, Statistics Netherlands
