Robust Functional Data Analysis

Supported by the National Science Foundation, Grant DMS 0604396.


Project Abstract
The analysis of samples of curves is a field of growing importance in Statistics. Samples of curves arise in longitudinal studies, where random processes are observed on groups of individuals. Often some of these curves are atypical compared with the rest of the sample, owing either to individual peculiarities or to measurement errors. The most common techniques for functional data analysis are very sensitive to outlying curves, which may lead to invalid statistical inference. Outlier-resistant multivariate techniques are, in most cases, not directly applicable to functional data, where the number of observations per curve is usually larger than the sample size. The investigator's goal is therefore to develop robust methods for functional data analysis that provide valid statistical inference even in the presence of a significant proportion of outlying curves. In particular, outlier-resistant estimators for the mean and the variance components are proposed. Their properties (such as consistency, asymptotic distribution and breakdown point) are studied both theoretically and empirically, the latter through simulation and the analysis of real datasets. Algorithms and computer software implementing these methods are being developed.
Examples of functional data include human growth curves, gene expression profiles, and daily weather and environmental indicators (such as precipitation, temperature, pressure and pollution level), to mention just a few. In these settings the outlying curves can themselves be informative. Detecting atypical growth curves can provide new insights into the effect of diseases or other unusual circumstances on human growth, and detecting unusual gene expression profiles can help to explain the genetic causes of abnormal biological processes or diseases. These examples illustrate the potential for applying the methods being developed by the investigator in areas beyond Statistics, such as public health and the environmental sciences.

Publications

Computer programs

Spatial median and spherical PCs

The following Matlab functions compute the spatial median and the spherical principal components (see Gervini 2008a).
Spatial median: SpMed; spherical PCs: SpPC; external function used: gsj.
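For readers without Matlab, a minimal Python sketch of the two estimators follows. It is not a translation of SpMed or SpPC, whose exact algorithms and interfaces are not described here. It assumes curves sampled on a common grid (one curve per row) and computes the spatial median with the standard Weiszfeld iteration. The spherical PCs are taken as the leading eigenvectors of the covariance of the curves after projecting them onto the unit sphere centred at that median. The function names, tolerance, iteration cap and simulated data are illustrative assumptions.

```python
import numpy as np


def spatial_median(X, tol=1e-8, max_iter=500):
    """Spatial (L1) median of the rows of X via the Weiszfeld iteration.

    X is an (n, p) array holding one discretized curve per row; a common
    grid is assumed (hypothetical helper, not SpMed itself).
    """
    m = np.median(X, axis=0)  # coordinatewise median as a starting value
    for _ in range(max_iter):
        # Distances from each curve to the current estimate; the floor avoids
        # division by zero if the estimate lands exactly on a data curve.
        d = np.maximum(np.linalg.norm(X - m, axis=1), 1e-12)
        w = 1.0 / d
        m_new = (w[:, None] * X).sum(axis=0) / w.sum()
        if np.linalg.norm(m_new - m) < tol:
            return m_new
        m = m_new
    return m


def spherical_pcs(X, k):
    """Spatial median and the first k spherical principal directions.

    Curves are centred at the spatial median and projected onto the unit
    sphere; the directions are the leading right singular vectors of the
    projected data (eigenvectors of the spatial sign covariance matrix).
    """
    mu = spatial_median(X)
    R = X - mu
    norms = np.maximum(np.linalg.norm(R, axis=1), 1e-12)
    S = R / norms[:, None]
    # A thin SVD of the n x p matrix avoids forming a p x p covariance.
    _, _, Vt = np.linalg.svd(S, full_matrices=False)
    return mu, Vt[:k]


# Usage on simulated data: 40 noisy sine curves, 5 of them shifted upward.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 50)
f = np.sin(2 * np.pi * t)
X = f + 0.1 * rng.standard_normal((40, 50))
X[:5] += 10.0
mu, V = spherical_pcs(X, k=2)
print(np.abs(mu - f).max())              # small: median stays near the sine curve
print(np.abs(X.mean(axis=0) - f).min())  # the sample mean is dragged upward
```

Each curve enters the Weiszfeld update only through its direction from the current estimate, so the five shifted curves move the median very little. By contrast, they shift the sample mean by about 1.25 at every grid point. The sketch returns only directions. The spherical eigenvalues are not the variances of the original curves, so a separate robust scale estimate would be needed for those.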

Semiparametric t models

The following Matlab functions compute semiparametric estimators for the mean and principal components of curves observed on sparse, irregular grids (of course, they also work for regular grids). For details, see Gervini (2008b).
The programs for normal-based (non-robust) models are EMnormal0 (mean-only model) and EMnormal (mean + PC model).
The corresponding programs for t-based (robust) models are EMt0 and EMt.
External function used: bspl (B-spline basis).
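The robustness of t-based models comes from the E-step weights, which shrink towards zero for curves far from the current mean. The Python sketch below illustrates this under a deliberately simplified model, which is an assumption and not the model of Gervini (2008b). It assumes curves on a common grid and x_i = mu + e_i, with e_i multivariate t with fixed degrees of freedom nu and scale sigma^2 I. There are no principal components and no B-spline representation, so unlike EMt0 it does not handle sparse, irregular grids. The function name em_t_mean and the default nu = 4 are hypothetical.

```python
import numpy as np


def em_t_mean(X, nu=4.0, tol=1e-8, max_iter=1000):
    """EM estimates of mu and sigma^2 in the simplified (hypothetical) model
    x_i = mu + e_i, e_i ~ multivariate t with nu d.f. and scale sigma^2 * I.

    X is an (n, p) array of curves on a common grid; nu is held fixed.
    Returns mu, sigma^2 and the E-step weights of the final iteration.
    """
    n, p = X.shape
    mu = np.median(X, axis=0)  # robust starting value
    sigma2 = np.mean((X - mu) ** 2)
    for _ in range(max_iter):
        # E-step: conditional expectation of each curve's latent precision.
        d2 = np.sum((X - mu) ** 2, axis=1) / sigma2
        w = (nu + p) / (nu + d2)
        # M-step: weighted mean, then the pooled scale given that mean.
        mu_new = (w[:, None] * X).sum(axis=0) / w.sum()
        sigma2_new = np.sum(w * np.sum((X - mu_new) ** 2, axis=1)) / (n * p)
        done = (np.max(np.abs(mu_new - mu)) < tol
                and abs(sigma2_new - sigma2) < tol)
        mu, sigma2 = mu_new, sigma2_new
        if done:
            break
    return mu, sigma2, w


# Usage on simulated data: 40 noisy sine curves, 5 of them shifted upward.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 50)
f = np.sin(2 * np.pi * t)
X = f + 0.1 * rng.standard_normal((40, 50))
X[:5] += 10.0
mu, sigma2, w = em_t_mean(X)
print(w[:5])  # weights of the shifted curves, close to zero
print(w[5:])  # weights of the typical curves, close to one
```

As nu grows, every weight tends to one, so the estimate approaches the non-robust sample mean. This mirrors the contrast between the normal-based and t-based programs listed above.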


Note: This is research in progress. More publications and related Matlab programs will be added periodically.

Last updated: 6 Mar 2008, 12:00 hs
