

XXXV INTERNATIONAL SEMINAR ON STATISTICS: Quality of Multisource Statistics

Dates: 13-14 March 2024

Type of course: On-site / Online

Place: Basque Government Central Office, Lakua, Euskaldun Berria conference hall

            c/ Donostia-San Sebastián, 1; 01010 Vitoria-Gasteiz

Timetable: 9:00h to 14:00h.

Language: English (translation into Spanish available)

Duration: 10 h.

This seminar will be held on-site, but we offer the opportunity to participate online. Please mark your preference on the registration form.

Seminar description

Many statistical offices, especially in Europe, are moving from single-source statistics to multisource statistics. By combining data sources, statistical offices can produce more detailed and more timely statistics and respond more quickly to events in society. By combining survey data with already available administrative data and big data, they can also reduce data collection and processing costs and lower the burden on respondents. However, multisource statistics raise new problems that must be overcome before the output quality is sufficiently high and before such statistics can be produced efficiently. In this seminar we will introduce and discuss some techniques for dealing with multisource statistics that have been developed in recent years and, in particular, approaches that can be used to assess the quality of the resulting estimates.

Objectives

The seminar has three objectives:

1. Provide an overview of problems and situations that are encountered when trying to produce multisource statistics.

2. Discuss methods that can be used when producing multisource statistics.

3. Explain approaches that can be used to assess the accuracy of estimates obtained by these methods.

The discussed methods and approaches will be applied using R and RStudio. Some applications require additional R packages, such as poLCA, plyr and dplyr.

Attendees may bring their own laptops or other devices to follow the practical exercises.

Target audience

  • Statisticians
  • Information technology professionals
  • Data science professionals
  • Students and researchers from the public and private sectors

Wednesday 13 March from 9:00h to 14:00h

1. Introduction to (measuring quality of) multisource statistics

Multisource statistics vs. single-source statistics. Data configurations in multisource statistics. Kinds of errors that can occur in such statistics.

2. The bootstrap

Description of the bootstrap technique for measuring the accuracy of statistics (in particular, their variance). The bootstrap plays an essential role in most of the approaches discussed during the course.
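
As a rough illustration of the idea (not course material), the Python sketch below estimates the variance of a sample mean by the nonparametric bootstrap: it resamples the observed values with replacement, recomputes the statistic on each resample and takes the variance of the replicates. The data, the function name `bootstrap_variance` and the number of replicates are hypothetical choices, and independent units drawn by simple random sampling are assumed.

```python
import random
import statistics

def bootstrap_variance(data, statistic, n_boot=1000, seed=1):
    """Nonparametric bootstrap estimate of the variance of `statistic`.

    Each replicate draws len(data) values with replacement from the sample
    and recomputes the statistic; the variance of the replicates estimates
    the sampling variance of the statistic.
    """
    rng = random.Random(seed)
    n = len(data)
    replicates = []
    for _ in range(n_boot):
        resample = [data[rng.randrange(n)] for _ in range(n)]
        replicates.append(statistic(resample))
    return statistics.variance(replicates)

# Hypothetical sample values (for illustration only)
sample = [12.1, 9.8, 11.4, 10.2, 13.5, 8.9, 10.7, 12.8, 9.5, 11.0]
var_boot = bootstrap_variance(sample, statistics.mean, n_boot=2000)

# For the mean, the textbook estimate s^2 / n is available for comparison
var_formula = statistics.variance(sample) / len(sample)
print(f"bootstrap: {var_boot:.4f}, formula: {var_formula:.4f}")
```

The mean is used only because its variance also has a simple formula to compare against; the ideal bootstrap variance of the mean equals (n − 1)/n times s²/n, so the two numbers should be close but not identical. The bootstrap is most useful for estimators without such a formula, such as estimates computed after imputation or after combining sources.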

3. Correcting for (item) non-response error

Item non-response. Estimation of the variance due to sampling and imputation. Multiple Imputation (MI) as an alternative approach to measuring the variance due to sampling and imputation.
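
A minimal Python sketch of multiple imputation for a single numeric variable, assuming values are missing completely at random. Each of the m imputations uses the approximate Bayesian bootstrap (one simple way to draw imputations, chosen here for illustration), and the m completed-data results are pooled with Rubin's rules: the total variance T = W + (1 + 1/m)B adds the between-imputation variance B (reflecting imputation) to the average within-imputation variance W (reflecting sampling). The data and function names are hypothetical.

```python
import random
import statistics

def abb_impute(values, rng):
    """Impute missing values (None) once by the approximate Bayesian bootstrap:
    draw a bootstrap sample of the observed values, then draw each missing
    value at random from that bootstrap sample."""
    observed = [v for v in values if v is not None]
    donors = [rng.choice(observed) for _ in observed]
    return [v if v is not None else rng.choice(donors) for v in values]

def rubin_combine(estimates, variances):
    """Pool m completed-data results with Rubin's rules."""
    m = len(estimates)
    q_bar = statistics.mean(estimates)     # pooled point estimate
    w = statistics.mean(variances)         # within-imputation variance
    b = statistics.variance(estimates)     # between-imputation variance
    t = w + (1 + 1 / m) * b                # total variance
    return q_bar, w, b, t

# Hypothetical variable with item non-response (None marks a missing value)
y = [4.2, None, 5.1, 3.8, None, 4.9, 5.5, None, 4.4, 4.7, 5.0, 3.9]
rng = random.Random(42)
m = 20
estimates, variances = [], []
for _ in range(m):
    completed = abb_impute(y, rng)
    estimates.append(statistics.mean(completed))
    # Completed-data variance of the mean, treating the data as a simple
    # random sample (no finite population correction)
    variances.append(statistics.variance(completed) / len(completed))

q_bar, w, b, t = rubin_combine(estimates, variances)
print(f"estimate {q_bar:.3f}, total variance {t:.4f} (within {w:.4f}, between {b:.4f})")
```

Drawing the donors from a bootstrap sample, rather than directly from the observed values, is what lets B also reflect uncertainty about the distribution of the missing values.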

Thursday 14 March from 9:00h to 14:00h

4. Using Latent Class Analysis for correcting for measurement error

Correction of values of (categorical) target variables present in multiple data sources. Latent Class Analysis (LCA) to correct measurement errors. MILC: a combination of LCA with MI to estimate the variance of the corrected data.
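
To illustrate the LCA idea, the Python sketch below fits a two-class latent class model by the EM algorithm to a binary variable observed in three sources, treating the true value as the latent class and assuming local independence (the errors in the sources are independent given the true value). It is a simplified stand-in for the kind of model fitted with the R package poLCA; the simulated data and the function name `fit_lca` are hypothetical, and it does not implement MILC.

```python
import math
import random

def fit_lca(data, n_classes=2, max_iter=1000, tol=1e-8, seed=0):
    """Fit a latent class model with binary indicators by the EM algorithm.

    data: list of equal-length tuples of 0/1 values (one tuple per unit,
    one entry per data source). Assumes local independence: given the
    latent class, the indicators are independent.
    Returns (class proportions, item probabilities P(x_j = 1 | class),
    posterior class probabilities for each unit).
    """
    rng = random.Random(seed)
    k = len(data[0])
    pi = [1.0 / n_classes] * n_classes
    theta = [[rng.uniform(0.2, 0.8) for _ in range(k)] for _ in range(n_classes)]
    prev_ll = -math.inf
    for _ in range(max_iter):
        # E-step: posterior class membership probabilities per unit
        post, ll = [], 0.0
        for row in data:
            joint = []
            for c in range(n_classes):
                p = pi[c]
                for j, x in enumerate(row):
                    p *= theta[c][j] if x == 1 else 1.0 - theta[c][j]
                joint.append(p)
            total = sum(joint)
            ll += math.log(total)
            post.append([p / total for p in joint])
        # M-step: update class proportions and item probabilities
        for c in range(n_classes):
            weight = sum(r[c] for r in post)
            pi[c] = weight / len(data)
            for j in range(k):
                share = sum(r[c] * row[j] for r, row in zip(post, data)) / weight
                # keep probabilities away from 0 and 1 for numerical stability
                theta[c][j] = min(max(share, 1e-6), 1 - 1e-6)
        # stop when the log-likelihood no longer increases noticeably
        if ll - prev_ll < tol:
            break
        prev_ll = ll
    return pi, theta, post

# Hypothetical data: a binary target variable observed in three sources
# that each misclassify it with probability 0.1
rng = random.Random(7)
units = []
for _ in range(1000):
    true_value = 1 if rng.random() < 0.3 else 0
    units.append(tuple(true_value if rng.random() > 0.1 else 1 - true_value
                       for _ in range(3)))

pi, theta, post = fit_lca(units)
print("class proportions:", [round(p, 3) for p in pi])
```

With three binary indicators and two classes the model has 7 free parameters for the 7 degrees of freedom of the 2×2×2 table, so it is just identified. The class labels are arbitrary (label switching): the class whose item probabilities are close to 1 corresponds to "true value = 1". The posteriors in `post` could then be used to impute corrected values.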

5. Correcting for selection error

A pseudo-weighting approach to correct for selection error when using a non-probability sample. Using the bootstrap to estimate the variance of the resulting estimates.
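
A deliberately simplified Python sketch of pseudo-weighting with one categorical auxiliary variable; it is not necessarily the method presented in the seminar. Each unit in the non-probability sample gets the inverse of its estimated inclusion propensity in its cell h, estimated as n_h / N̂_h, where N̂_h is the population count of the cell estimated from a reference probability sample. The variance is then estimated with a naive bootstrap that resamples both samples independently, which ignores any complex design of the reference sample. All data and names are hypothetical.

```python
import random
import statistics
from collections import Counter, defaultdict

def pseudo_weighted_mean(nonprob, reference):
    """Pseudo-weighted mean of y in a non-probability sample.

    nonprob:   list of (cell, y) for units in the non-probability sample.
    reference: list of (cell, design_weight) for units in a probability
               sample of the same population.
    A unit in cell h gets pseudo-weight N_hat_h / n_h, where N_hat_h is the
    estimated population count of cell h from the reference sample and n_h
    the number of non-probability units in h. Cells absent from the
    reference sample get weight 0.
    """
    n_hat = defaultdict(float)
    for cell, d in reference:
        n_hat[cell] += d
    n_cell = Counter(cell for cell, _ in nonprob)
    num = den = 0.0
    for cell, y in nonprob:
        w = n_hat[cell] / n_cell[cell]
        num += w * y
        den += w
    return num / den

def bootstrap_variance(nonprob, reference, n_boot=1000, seed=1):
    """Naive bootstrap: resample both samples independently with replacement."""
    rng = random.Random(seed)
    reps = []
    for _ in range(n_boot):
        b = [rng.choice(nonprob) for _ in nonprob]
        a = [rng.choice(reference) for _ in reference]
        reps.append(pseudo_weighted_mean(b, a))
    return statistics.variance(reps)

# Hypothetical data: "young" units are over-represented in the
# non-probability sample relative to the population
nonprob = ([("young", y) for y in (1, 1, 1, 0, 1, 1, 0, 1)]
           + [("old", y) for y in (0, 0, 1, 0)])
reference = [("young", 100.0)] * 10 + [("old", 300.0)] * 10

naive = statistics.mean(y for _, y in nonprob)
est = pseudo_weighted_mean(nonprob, reference)
var = bootstrap_variance(nonprob, reference)
print(f"naive {naive:.3f}, pseudo-weighted {est:.3f}, bootstrap variance {var:.4f}")
```

Here the unweighted mean of y is 7/12 ≈ 0.583, while the pseudo-weighted mean is (1000 × 0.75 + 3000 × 0.25) / 4000 = 0.375, because the under-represented "old" cell is weighted up. The correction removes selection bias only to the extent that selection depends on the auxiliary variable alone.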

6. Correcting for under-coverage error

The capture-recapture technique to estimate the size of (sub)populations. Variance estimation for the population size estimates. Example: estimating the number of homeless people in the Netherlands.
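
For two sources, the basic capture-recapture estimate can be sketched as follows in Python: with n1 units in the first source, n2 in the second and m linked to both, it computes Chapman's version of the Lincoln-Petersen estimator and its usual variance estimate. This assumes a closed population, independent inclusion in the two sources, equal inclusion probabilities for all units and error-free linkage; the counts are hypothetical, and applications such as the homeless example may require more elaborate (e.g. log-linear) models.

```python
import math

def chapman_estimate(n1, n2, m):
    """Two-source capture-recapture estimate of a population size.

    n1: units found in source 1, n2: units found in source 2,
    m: units found in both sources (after record linkage).
    Returns Chapman's bias-corrected Lincoln-Petersen estimate and the
    usual variance estimate for it.
    """
    if m > min(n1, n2):
        raise ValueError("overlap m cannot exceed either source count")
    n_hat = (n1 + 1) * (n2 + 1) / (m + 1) - 1
    var = ((n1 + 1) * (n2 + 1) * (n1 - m) * (n2 - m)) / ((m + 1) ** 2 * (m + 2))
    return n_hat, var

# Hypothetical counts from two linked registers
n_hat, var = chapman_estimate(n1=500, n2=400, m=200)
se = math.sqrt(var)
# Approximate 95% interval based on a normal approximation
lower, upper = n_hat - 1.96 * se, n_hat + 1.96 * se
print(f"N_hat = {n_hat:.1f}, SE = {se:.1f}, 95% CI ({lower:.0f}, {upper:.0f})")
```

With these counts the uncorrected Lincoln-Petersen estimate is 500 × 400 / 200 = 1000, while Chapman's estimate is about 998.5 with a standard error of about 38.4.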

References: 

  • Agafitei, M., F. Gras, W. Kloek, F. Reis and S. Váju (2015), Measuring Output Quality for Multisource Statistics in Official Statistics: Some Directions. Statistical Journal of the IAOS 31, pp. 203–211. (Introduction/Bootstrap)
  • Bishop, Y.M.M., S.E. Fienberg and P.W. Holland (1975), Discrete Multivariate Analysis. MIT Press. (Under-coverage)
  • Boeschoten, L., D. Oberski and T. de Waal (2017), Estimating Classification Errors under Edit Restrictions in Composite Survey-Register Data using Multiple Imputation Latent Class Modelling (MILC). Journal of Official Statistics 33, pp. 921–962. (Latent Class Analysis)
  • De Waal, T., A. van Delden and S. Scholtus (2020), Multisource Statistics: Basic Situations and Methods. International Statistical Review 88, pp. 203–228. (Introduction)
  • Efron, B. and R.J. Tibshirani (1993), An Introduction to the Bootstrap. Chapman & Hall/CRC, London. (Bootstrap)
  • Elliott, M.R. and R. Valliant (2017), Inference for Nonprobability Samples. Statistical Science 32(2), pp. 249–264. (Selection error)
  • Liu, A.-C., S. Scholtus and T. de Waal (2023), Correcting Selection Bias in Big Data by Pseudo Weighting. Journal of Survey Statistics and Methodology 11, pp. 1181–1203. (Selection error)
  • Van Delden, A., S. Scholtus and J. Burger (2016), Accuracy of Mixed-Source Statistics as Affected by Classification Errors. Journal of Official Statistics 32, pp. 619–642. (Bootstrap for classification error)

Arnout van Delden

Arnout van Delden studied crop protection at Wageningen University. In 2001 he obtained his PhD degree with research based on various simulation studies and statistical analyses. Since 2001 he has worked at Statistics Netherlands, where he is currently a senior methodologist. He has carried out research on the use of administrative data and, since 2016, has also worked on methods for data integration. He works on topics such as linkage of sources with different unit types, statistical matching, quantification of measurement errors, the use of machine learning (such as text mining) in official statistics, and measuring input, processing and output quality for administrative data and multisource statistics. He has contributed to various European projects and is co-editor of a recent (2023) book on business statistics.

Ton de Waal

Ton de Waal studied mathematics at Leiden University and Eindhoven University of Technology. In 1993 he started working at Statistics Netherlands, where he is currently a senior methodologist. He obtained his PhD degree in 2003. Since 2014 Ton has also been professor of Data Integration at Tilburg University. He is co-author of two books on statistical disclosure control and one book on statistical data editing and imputation. His current fields of interest include imputation of missing data, correction for measurement error, selection error and linkage error, combining estimates from probability and nonprobability samples, statistical matching, and measuring the quality of multisource statistics.

Registration period: from 5 February to 4 March

Ordinary fee: €133.60

Reduced fee (*): €43.67

(*) For university students and unemployed graduates (supporting documentation will be required).


Click on the following button to register: Registration

