Session 1. Machine learning
Machine learning algorithms detect patterns in data and use them to predict missing data. Data can be missing because it was not collected or observed, or simply because the prediction concerns the future. Machine learning algorithms do not aim to model the underlying real-world systems explicitly; rather, they employ computational techniques to maximise predictive accuracy. Consequently, they are often described as black-box systems that lack transparency. This session addresses issues related to decision making when machine learning is involved.
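As a minimal sketch of "learning a pattern and using it to predict missing data", the Python example below fits a straight line to the observed records and uses it to fill one unobserved value and to predict one future value. The data, the function names and the choice of a simple least-squares line are all assumptions made for illustration; the methods discussed in this session are typically far more flexible and far less transparent.

```python
# Hypothetical toy data: (x, y) pairs; y is None where it was not observed.
records = [(1.0, 2.1), (2.0, 3.9), (3.0, None), (4.0, 8.1), (5.0, 9.9)]


def fit_line(pairs):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Learn the pattern from the records that were actually observed.
observed = [(x, y) for x, y in records if y is not None]
slope, intercept = fit_line(observed)


def predict(x):
    """Apply the learned pattern to a new x value."""
    return slope * x + intercept


# Fill the value that was never collected ...
filled = [(x, y if y is not None else predict(x)) for x, y in records]
# ... and predict a value that lies in the future.
future = predict(6.0)

print(filled)
print(future)
```

The same two steps, fitting on what was observed and predicting what was not, apply whether the gap is an uncollected value or a future one.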
Session 2. Natural Language Processing
Natural language processing investigates how large amounts of natural-language data can be processed and analysed by computers. For example, social media data are used to measure the number of messages on certain subjects and the sentiment they express. The usefulness of such sentiment analysis has been demonstrated, for instance, in the context of consumer confidence. Web scraping, in which data are extracted from websites, is used in several research areas, including job vacancy statistics.
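The idea of counting messages per subject and scoring their sentiment can be illustrated with a very small Python sketch. The word lists, the messages and the scoring rule (positive words minus negative words) are invented for this example and are only an assumption about how such an analysis might look in its simplest form; real sentiment analysis relies on much richer language models.

```python
from collections import Counter

# Hypothetical mini-lexicon, invented for illustration only.
POSITIVE = {"good", "great", "confident", "optimistic"}
NEGATIVE = {"bad", "worried", "expensive", "pessimistic"}

# Hypothetical messages, each already tagged with a subject.
messages = [
    ("economy", "Feeling optimistic, the economy looks good"),
    ("economy", "Worried about prices and the economy"),
    ("jobs", "Great news, more jobs in our region"),
]


def sentiment(text):
    """Score a text as (number of positive words) - (number of negative words)."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)


# Number of messages per subject.
counts = Counter(topic for topic, _ in messages)

# Net sentiment per subject.
net = Counter()
for topic, text in messages:
    net[topic] += sentiment(text)

print(dict(counts))
print(dict(net))
```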
Session 3. Images and visualisation
This session covers two aspects of the use of images: images as a data source, and the visualisation of data for a wide audience. The basic data for data science can consist of images, such as satellite images or images from Google Street View, which poses new challenges. Furthermore, there is the rapidly growing field of data visualisation, in which abstract information is being made available more efficiently than ever before. This session will give an overview of some projects in which images are used as data, as well as of data visualisation and dashboard displays that translate abstract data into user-friendly information.
Session 4. Preconditions for effective data science deployment
The usefulness of data science for supporting decisions does not depend on statistical and technical standards alone. Several ethical and organisational issues determine important preconditions for delivering good data science. First, there are significant debates around the ethics and privacy dimensions of the growing data science field, which must be taken into account when techniques are applied to real-life data. Secondly, developing data science methods for official statistics requires active collaboration between national statistical institutes (NSIs) and international institutions such as the UN and Eurostat, given the global nature of many of the new data sources and policy decisions. Thirdly, providers of Big Data are often private companies, which raises the question of how to organise a sustainable relationship with them. Finally, partnerships are set up with universities and companies to optimise the use of these promising techniques for the development, production and quality improvement of official statistics. Making these partnerships work is also a challenge.