Symbolic data analysis (SDA) is an emerging technique for the analysis of large and complex datasets in which individual-level data are summarised into group-based distributional summaries (symbols) such as random rectangles or histograms. Likelihood-based methods have recently been developed that allow models for the underlying data to be fitted while observing only the distributional summaries. However, while powerful, this approach rapidly becomes computationally intractable for random histograms as the dimension of the underlying data increases. In this talk we first introduce a composite likelihood setting for the analysis of random histograms in K dimensions using lower-dimensional marginal histograms. We apply this approach to bypass the well-known computational issues in the analysis of spatial extremes over a large number of spatial locations, and show large computational savings compared to existing model fitting procedures. Logistic regression models are a popular method for predicting the probabilities of categorical response data. However, inference for these models can become computationally prohibitive for large datasets. The second part of the talk focuses on summarising a collection of predictor variables into histograms in order to perform inference. Based on composite likelihoods, we derive an efficient one-versus-rest approximate composite likelihood model for histogram-valued random variables. We demonstrate that this procedure can achieve classification rates comparable to those of state-of-the-art subsampling algorithms for logistic regression.
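As a rough illustration of fitting a model while observing only a histogram, the sketch below estimates the parameters of a univariate normal distribution from bin counts alone. It treats the counts as multinomial, with bin probabilities given by differences of the model CDF. The normal model, the fixed bin edges, the grid search and all function names are assumptions made for this example and are not taken from the talk. A composite likelihood in the spirit described above would, under similar assumptions, add up log-likelihood terms of this kind over lower-dimensional marginal histograms.

```python
import bisect
import math
import random


def normal_cdf(x, mu, sigma):
    """CDF of N(mu, sigma^2); math.erf handles +/- infinity."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def histogram_counts(data, edges):
    """Bin counts for fixed edges; the first and last bins are open-ended."""
    counts = [0] * (len(edges) + 1)
    for x in data:
        counts[bisect.bisect_right(edges, x)] += 1
    return counts


def histogram_loglik(mu, sigma, edges, counts):
    """Multinomial log-likelihood of the counts, up to an additive constant."""
    cuts = [-math.inf] + list(edges) + [math.inf]
    loglik = 0.0
    for lo, hi, n in zip(cuts[:-1], cuts[1:], counts):
        if n > 0:
            p = normal_cdf(hi, mu, sigma) - normal_cdf(lo, mu, sigma)
            loglik += n * math.log(max(p, 1e-300))  # guard against log(0)
    return loglik


def fit_normal_from_histogram(edges, counts, mu_grid, sigma_grid):
    """Crude grid-search maximiser (illustrative only, not an efficient optimiser)."""
    best = max(
        (histogram_loglik(m, s, edges, counts), m, s)
        for m in mu_grid
        for s in sigma_grid
    )
    return best[1], best[2]


# Simulated individual-level data, used here only to build the histogram.
random.seed(0)
data = [random.gauss(1.0, 2.0) for _ in range(5000)]

edges = [float(e) for e in range(-5, 8)]      # hypothetical fixed bin edges
counts = histogram_counts(data, edges)        # the only input to the fit

mu_grid = [i * 0.05 for i in range(0, 41)]       # 0.00 to 2.00
sigma_grid = [1.0 + i * 0.05 for i in range(0, 41)]  # 1.00 to 3.00
mu_hat, sigma_hat = fit_normal_from_histogram(edges, counts, mu_grid, sigma_grid)
print(mu_hat, sigma_hat)
```

Once the counts are computed, the cost of each likelihood evaluation depends on the number of bins and not on the number of underlying observations, which is the source of the computational appeal in this simplified setting.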
Recording of the seminar: