Name: Advances in data analysis using aggregated data
Start: 2024-12-16T17:15:00Z
Location: University College, London

Advances in data analysis using aggregated data

Dec 16, 2024 · Boris Béranger

Slides

Abstract

The rise of big and complex data has created a need for faster and more efficient statistical modelling techniques. For example, the volume of internet data collected daily is so large that even simple statistical models cannot be fitted on a regular computer, and sometimes the data cannot even be stored. One strategy is to reduce the amount of data by aggregating it into summaries and to perform the analysis on the summaries themselves. For a general aggregation function, we propose a likelihood-based approach to fit statistical models defined at the underlying data level. Theoretical guarantees for the resulting maximum likelihood estimators are established, including consistency results for generic continuous aggregation functions. We then dive into the important, yet (almost) unexplored, topic of summary design. Focusing on the family of (univariate) random bin histogram aggregation functions, we develop methodology to provide some answers to the burning question: how many bins do we need, and where should we place them? Simulation experiments illustrate the insights drawn from the methodology.
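To make the idea of fitting a data-level model from summaries concrete, here is a minimal sketch. It is not the method presented in the talk. It assumes a normal model and a histogram with fixed, pre-chosen bin edges (the talk considers random bin histograms), and it uses a crude pattern search in place of a proper optimiser. The function names `histogram_counts`, `binned_loglik` and `fit_normal_from_histogram` are hypothetical. The likelihood of the counts is multinomial, with cell probabilities given by differences of the model CDF at the bin edges.

```python
import bisect
import math
import random
from statistics import NormalDist


def histogram_counts(data, edges):
    """Aggregate raw data into bin counts.

    `edges` are sorted interior cut points; the first and last bins are
    open-ended, so there are len(edges) + 1 bins in total.
    """
    counts = [0] * (len(edges) + 1)
    for x in data:
        counts[bisect.bisect_right(edges, x)] += 1
    return counts


def binned_loglik(mu, sigma, counts, edges):
    """Multinomial log-likelihood (up to a constant) of the bin counts
    under a Normal(mu, sigma) model for the underlying data."""
    dist = NormalDist(mu, sigma)
    cdf = [0.0] + [dist.cdf(e) for e in edges] + [1.0]
    loglik = 0.0
    for k, n_k in enumerate(counts):
        # Guard against log(0) for bins with negligible model probability.
        p_k = max(cdf[k + 1] - cdf[k], 1e-300)
        loglik += n_k * math.log(p_k)
    return loglik


def fit_normal_from_histogram(counts, edges, tol=1e-6):
    """Maximise the binned log-likelihood over (mu, log sigma) with a
    simple pattern search (an illustrative stand-in for a real optimiser)."""
    mu = 0.5 * (edges[0] + edges[-1])
    log_sigma = math.log((edges[-1] - edges[0]) / 4.0)
    best = binned_loglik(mu, math.exp(log_sigma), counts, edges)
    step = 1.0
    while step > tol:
        improved = False
        for d_mu, d_ls in ((step, 0.0), (-step, 0.0), (0.0, step), (0.0, -step)):
            cand = binned_loglik(mu + d_mu, math.exp(log_sigma + d_ls), counts, edges)
            if cand > best:
                mu, log_sigma, best = mu + d_mu, log_sigma + d_ls, cand
                improved = True
        if not improved:
            step /= 2.0  # no move helped: refine the search scale
    return mu, math.exp(log_sigma)


# Usage: simulate data, keep only the histogram, and fit from the counts.
random.seed(0)
raw = [random.gauss(2.0, 1.5) for _ in range(20000)]
cut_points = [float(b) for b in range(-2, 7)]  # 9 cut points, 10 bins
summary = histogram_counts(raw, cut_points)
mu_hat, sigma_hat = fit_normal_from_histogram(summary, cut_points)
print(mu_hat, sigma_hat)
```

In this sketch the 20,000 raw observations are discarded after aggregation; only the ten counts and nine cut points enter the fit. How many bins to use and where to place them is exactly the summary design question raised in the abstract, and the choice made here is arbitrary.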

Date

Dec 16, 2024

Event

CMStatistics2024

Location

University College, London

Invited talk in the session "Recent advances in symbolic data analysis", organised by Dr Andrej Srakar and Dr Yaser Samadi. Other presenters: Lynne Billard, Yaser Samadi and Abdolnasser Sadeghkhani.

Last updated on Aug 12, 2025