Distributional data analysis

Distributional data analysis is a branch of nonparametric statistics that is related to functional data analysis. It is concerned with random objects that are probability distributions, i.e., the statistical analysis of samples of random distributions in which each element of the sample is a distribution. One of the main challenges in distributional data analysis is that the space of probability distributions is a convex space but not a vector space.

Notation

Let $\nu$ be a probability measure on $D$, where $D \subseteq \mathbb{R}^p$ with $p \geq 1$. The probability measure $\nu$ can be equivalently characterized by the cumulative distribution function $F$ or the probability density function $f$, if it exists. For univariate distributions with $p = 1$, the quantile function $Q = F^{-1}$ can also be used.

Let $\mathcal{F}$ be a space of distributions $\nu$ and let $d$ be a metric on $\mathcal{F}$ so that $(\mathcal{F}, d)$ forms a metric space. There are various metrics available for $d$.[1] For example, suppose $\nu_1, \nu_2 \in \mathcal{F}$, and let $f_1$ and $f_2$ be the density functions of $\nu_1$ and $\nu_2$, respectively. The Fisher–Rao metric is defined as

$$ d_{FR}(\nu_1, \nu_2) = \arccos\left( \int_D \sqrt{f_1(x) f_2(x)}\, dx \right). $$

For univariate distributions, let $Q_1$ and $Q_2$ be the quantile functions of $\nu_1$ and $\nu_2$. Denote the $p$-Wasserstein space as $\mathcal{W}_p$, which is the space of distributions with finite $p$-th moments. Then, for $\nu_1, \nu_2 \in \mathcal{W}_p$, the $p$-Wasserstein metric is defined as

$$ d_{W_p}(\nu_1, \nu_2) = \left( \int_0^1 \left| Q_1(s) - Q_2(s) \right|^p ds \right)^{1/p}. $$
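In the univariate case the $p$-Wasserstein distance is therefore just an $L^p$ distance between quantile functions, so it can be approximated from two samples by comparing empirical quantiles on a grid. Below is a minimal sketch of this computation (not taken from the cited sources); the grid size and the use of `numpy.quantile` are illustrative choices.

```python
import numpy as np

def wasserstein_p(sample1, sample2, p=2, n_grid=1000):
    """Approximate the p-Wasserstein distance between two univariate samples
    by discretizing the quantile-function formula on a grid of (0, 1)."""
    s = (np.arange(n_grid) + 0.5) / n_grid      # interior grid on (0, 1)
    q1 = np.quantile(sample1, s)                # empirical quantile function Q1
    q2 = np.quantile(sample2, s)                # empirical quantile function Q2
    return np.mean(np.abs(q1 - q2) ** p) ** (1.0 / p)

# Example: two Gaussian samples whose means differ by 1
rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, size=2000)
y = rng.normal(1.0, 1.0, size=2000)
print(wasserstein_p(x, y, p=2))                 # approximately 1, the mean shift
```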

Mean and variance

For a probability measure $\nu$, consider a random process $\mathfrak{F}$ such that $\nu \sim \mathfrak{F}$. One way to define the mean and variance of $\nu$ is to introduce the Fréchet mean and the Fréchet variance. With respect to the metric $d$ on $\mathcal{F}$, the Fréchet mean $\nu_\oplus$, also known as the barycenter, and the Fréchet variance $V_\oplus$ are defined as[2]

$$ \nu_\oplus = \operatorname*{arg\,min}_{\omega \in \mathcal{F}} \mathbb{E}\left[ d^2(\omega, \nu) \right], \qquad V_\oplus = \mathbb{E}\left[ d^2(\nu_\oplus, \nu) \right]. $$

A widely used example is the Wasserstein–Fréchet mean, or simply the Wasserstein mean, which is the Fréchet mean with respect to the $2$-Wasserstein metric $d_{W_2}$.[3] For $\nu, \nu_\oplus \in \mathcal{W}_2$, let $Q_\nu$ and $Q_\oplus$ be the quantile functions of $\nu$ and $\nu_\oplus$, respectively. The Wasserstein mean and the Wasserstein variance are defined as

$$ \nu_\oplus = \operatorname*{arg\,min}_{\omega \in \mathcal{W}_2} \mathbb{E}\left[ d_{W_2}^2(\omega, \nu) \right], \qquad V_\oplus = \mathbb{E}\left[ d_{W_2}^2(\nu_\oplus, \nu) \right] = \mathbb{E}\left[ \int_0^1 \left( Q_\nu(s) - Q_\oplus(s) \right)^2 ds \right]. $$
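Because the univariate Wasserstein mean has the averaged quantile function $Q_\oplus(s) = \mathbb{E}[Q_\nu(s)]$, both the Wasserstein mean and the Wasserstein variance can be estimated by averaging empirical quantile functions. A minimal sketch, assuming each random distribution is observed through a sample of draws; function names and the grid are illustrative.

```python
import numpy as np

def wasserstein_mean_and_variance(samples, n_grid=1000):
    """samples: list of 1-d arrays, each a sample drawn from one random distribution.
    Returns the quantile function of the Wasserstein mean on a grid and the
    empirical Wasserstein variance."""
    s = (np.arange(n_grid) + 0.5) / n_grid
    Q = np.stack([np.quantile(x, s) for x in samples])   # rows: empirical quantile functions
    Q_mean = Q.mean(axis=0)                              # quantile function of the Wasserstein mean
    V = np.mean(np.mean((Q - Q_mean) ** 2, axis=1))      # mean squared W2 distance to the mean
    return Q_mean, V

# Example: normal distributions with random locations; the Wasserstein variance is
# roughly the variance of the random location parameter (about 1 here).
rng = np.random.default_rng(1)
dists = [rng.normal(rng.normal(0.0, 1.0), 1.0, size=1000) for _ in range(50)]
Q_mean, V = wasserstein_mean_and_variance(dists)
print(V)
```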

Modes of variation

Modes of variation are useful concepts for depicting the variation of the data around the mean function. Based on the Karhunen–Loève representation, modes of variation show the contribution of each eigenfunction to the variation around the mean.

Functional principal component analysis

Functional principal component analysis (FPCA) can be applied directly to the probability density functions.[4] Consider a distribution process $\nu \sim \mathfrak{F}$ and let $f$ be the density function of $\nu$. Denote the mean density function by $\mu(t) = \mathbb{E}[f(t)]$ and the covariance function by $G(s, t) = \operatorname{Cov}(f(s), f(t))$, with orthonormal eigenfunctions $\phi_k$ and eigenvalues $\lambda_k$, $k = 1, 2, \dots$

By the Karhunen–Loève theorem, $f(t) = \mu(t) + \sum_{k=1}^{\infty} \xi_k \phi_k(t)$, where the principal components are $\xi_k = \int_D (f(t) - \mu(t)) \phi_k(t)\, dt$. The $k$th mode of variation is defined as
$$ g_k(t, \alpha) = \mu(t) + \alpha \sqrt{\lambda_k}\, \phi_k(t), \quad t \in D, \ \alpha \in [-A, A], $$
with some constant $A$, such as 2 or 3.
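These quantities can be estimated from a sample of densities evaluated on a common grid: the mean density, the sample covariance surface, its eigendecomposition, and the resulting modes of variation. A minimal sketch with illustrative names; density estimates of the individual curves are assumed to be available beforehand.

```python
import numpy as np

def density_fpca(f_mat, t, n_modes=2, alpha=2.0):
    """FPCA of densities. f_mat: n x m array, row i is density f_i on the grid t."""
    dt = t[1] - t[0]
    mu = f_mat.mean(axis=0)                    # mean density function
    fc = f_mat - mu                            # centered densities
    G = fc.T @ fc / f_mat.shape[0]             # discretized covariance surface G(s, t)
    evals, evecs = np.linalg.eigh(G * dt)      # eigendecomposition (ascending order)
    evals, evecs = evals[::-1], evecs[:, ::-1]
    phis = evecs / np.sqrt(dt)                 # rescale so eigenfunctions have unit L2 norm
    # k-th mode of variation: mu(t) +/- alpha * sqrt(lambda_k) * phi_k(t)
    modes = [mu + sgn * alpha * np.sqrt(evals[k]) * phis[:, k]
             for k in range(n_modes) for sgn in (-1.0, 1.0)]
    return mu, evals[:n_modes], phis[:, :n_modes], modes
```

Note that such modes of variation need not remain nonnegative or integrate to one, which is one motivation for the transformation approaches described next.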

Transformation FPCA

Assume the probability density functions $f$ exist, and let $\mathcal{F}$ be the space of density functions. Transformation approaches introduce a continuous and invertible transformation $\Psi: \mathcal{F} \to \mathbb{H}$, where $\mathbb{H}$ is a Hilbert space of functions. For instance, the log quantile density transformation and the centered log-ratio transformation are popular choices.[5][6]

For $f \in \mathcal{F}$, let $X = \Psi(f)$, the transformed functional variable. The mean function $\mu_X(t) = \mathbb{E}[X(t)]$ and the covariance function $G_X(s, t) = \operatorname{Cov}(X(s), X(t))$ are defined accordingly, and let $(\lambda_k, \phi_k)$ be the eigenpairs of $G_X$. The Karhunen–Loève decomposition gives $X(t) = \mu_X(t) + \sum_{k=1}^{\infty} \xi_k \phi_k(t)$, where $\xi_k = \int_D (X(t) - \mu_X(t)) \phi_k(t)\, dt$. Then, the $k$th transformation mode of variation is defined as[7]
$$ g_k(t, \alpha) = \Psi^{-1}\left( \mu_X + \alpha \sqrt{\lambda_k}\, \phi_k \right)(t), \quad t \in D, \ \alpha \in [-A, A]. $$
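As a concrete illustration, the centered log-ratio transform of a strictly positive density on a bounded grid, $\Psi(f) = \log f - \tfrac{1}{|D|}\int_D \log f$, has an inverse obtained by exponentiating and renormalizing, so transformation modes of variation can be computed by FPCA in the transformed space and then mapped back. A minimal self-contained sketch under these assumptions; names and defaults are illustrative.

```python
import numpy as np

def clr(f):
    """Centered log-ratio transform of a strictly positive density on a grid."""
    logf = np.log(f)
    return logf - logf.mean()

def clr_inverse(x, dt):
    """Back-transform: exponentiate and renormalize so the result integrates to one."""
    g = np.exp(x)
    return g / (g.sum() * dt)

def transformation_mode(f_mat, t, k=0, alpha=2.0):
    """k-th transformation mode of variation for densities f_mat (n x m) on grid t,
    using the clr transform as Psi."""
    dt = t[1] - t[0]
    X = np.stack([clr(f) for f in f_mat])      # transformed variables X = Psi(f)
    mu_X = X.mean(axis=0)
    Xc = X - mu_X
    G = Xc.T @ Xc / X.shape[0]                 # covariance surface of X
    evals, evecs = np.linalg.eigh(G * dt)      # ascending eigenvalues
    lam = evals[-1 - k]                        # k-th largest eigenvalue
    phi = evecs[:, -1 - k] / np.sqrt(dt)       # corresponding L2-orthonormal eigenfunction
    plus = clr_inverse(mu_X + alpha * np.sqrt(lam) * phi, dt)
    minus = clr_inverse(mu_X - alpha * np.sqrt(lam) * phi, dt)
    return plus, minus
```

Unlike the direct FPCA modes, the back-transformed modes are valid densities by construction.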

Log FPCA and Wasserstein Geodesic PCA

Endowed with a metric such as the Wasserstein metric $d_{W_2}$ or the Fisher–Rao metric $d_{FR}$, one can employ the (pseudo) Riemannian structure of $\mathcal{F}$. Denote the tangent space at the Fréchet mean $\nu_\oplus$ as $T_{\nu_\oplus}$, and define the logarithm and exponential maps $\operatorname{Log}_{\nu_\oplus}: \mathcal{F} \to T_{\nu_\oplus}$ and $\operatorname{Exp}_{\nu_\oplus}: T_{\nu_\oplus} \to \mathcal{F}$. Let $X$ be the projected density on the tangent space, $X = \operatorname{Log}_{\nu_\oplus}(\nu)$.

In Log FPCA, FPCA is performed on $X$ and the result is then projected back to $\mathcal{F}$ using the exponential map.[8] Therefore, with the mean $\mu_X$ and eigenpairs $(\lambda_k, \phi_k)$ of the covariance of $X$, the $k$th Log FPCA mode of variation is defined as
$$ g_k(\cdot, \alpha) = \operatorname{Exp}_{\nu_\oplus}\left( \mu_X + \alpha \sqrt{\lambda_k}\, \phi_k \right), \quad \alpha \in [-A, A]. $$
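For univariate distributions with the $2$-Wasserstein metric, the log map at the Wasserstein mean can be represented through quantile functions, so Log FPCA amounts to FPCA of centered quantile functions followed by mapping the modes back to quantile functions. A minimal sketch under that representation; sorting at the end is a crude surrogate for the geodesicity and convexity constraints discussed in the next paragraph.

```python
import numpy as np

def log_fpca_wasserstein(samples, n_grid=500, k=0, alpha=2.0):
    """Log FPCA for univariate distributions under the 2-Wasserstein metric,
    using the quantile-function representation of the tangent space."""
    s = (np.arange(n_grid) + 0.5) / n_grid
    ds = 1.0 / n_grid
    Q = np.stack([np.quantile(x, s) for x in samples])   # empirical quantile functions
    Q_mean = Q.mean(axis=0)                              # Wasserstein mean (quantile function)
    X = Q - Q_mean                                       # log-mapped data in the tangent space
    G = X.T @ X / X.shape[0]
    evals, evecs = np.linalg.eigh(G * ds)
    lam = evals[-1 - k]
    phi = evecs[:, -1 - k] / np.sqrt(ds)
    mode_plus = Q_mean + alpha * np.sqrt(lam) * phi      # exponential map: add the tangent vector back
    mode_minus = Q_mean - alpha * np.sqrt(lam) * phi
    # The results are valid quantile functions only if nondecreasing; sorting is a crude projection.
    return np.sort(mode_plus), np.sort(mode_minus)
```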

As a special case, consider the $2$-Wasserstein space $\mathcal{W}_2$, a random distribution $\nu \in \mathcal{W}_2$, and a subset $G \subset \mathcal{W}_2$. Let $d_{W_2}(\nu, G) = \inf_{\omega \in G} d_{W_2}(\nu, \omega)$ and $K_G(\nu) = \mathbb{E}\left[ d_{W_2}^2(\nu, G) \right]$. Let $\mathrm{CL}(\mathcal{W}_2)$ be the metric space of nonempty, closed subsets of $\mathcal{W}_2$, endowed with the Hausdorff distance, and define $\mathrm{CG}_{\nu_\oplus, k}(\mathcal{W}_2)$ as the collection of sets $G \in \mathrm{CL}(\mathcal{W}_2)$ that contain $\nu_\oplus$ and are geodesic sets of dimension at most $k$. Let the reference measure $\nu_\oplus$ be the Wasserstein mean. Then, a principal geodesic subspace (PGS) of dimension $k$ with respect to $\nu_\oplus$ is a set $G_k \in \operatorname*{arg\,min}_{G \in \mathrm{CG}_{\nu_\oplus, k}(\mathcal{W}_2)} K_G(\nu)$.[9][10]

Note that the tangent space $T_{\nu_\oplus}$ is a subspace of $L^2_{\nu_\oplus}(D)$, the Hilbert space of $\nu_\oplus$-square-integrable functions. Obtaining the PGS is equivalent to performing PCA in $L^2_{\nu_\oplus}(D)$ under the constraint that the solution lies in a convex and closed subset.[10] Therefore, a simple approximation of Wasserstein Geodesic PCA is Log FPCA, obtained by relaxing the geodesicity constraint, while alternative techniques have also been suggested.[9][10]

Distributional regression

Fréchet regression

Fréchet regression is a generalization of regression with responses taking values in a metric space and Euclidean predictors.[11][12] Using the Wasserstein metric $d_{W_2}$, Fréchet regression models can be applied to distributional objects. For Euclidean predictors $X$ with mean $\mu = \mathbb{E}(X)$ and covariance matrix $\Sigma = \operatorname{Var}(X)$, the global Wasserstein–Fréchet regression model is defined as

$$ m_\oplus(x) = \operatorname*{arg\,min}_{\omega \in \mathcal{W}_2} \mathbb{E}\left[ s_G(X, x)\, d_{W_2}^2(\omega, \nu) \right], \qquad s_G(X, x) = 1 + (X - \mu)^{\top} \Sigma^{-1} (x - \mu), $$
(1)

which generalizes the standard linear regression.
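In the univariate Wasserstein case, the weighted Fréchet mean in (1) can be computed empirically by applying the weights to the quantile functions of the responses and, if necessary, projecting the result back to a nondecreasing function. A minimal sketch of such a fit (an illustration, not the estimator of the cited work); the sorting step is a crude monotone projection.

```python
import numpy as np

def global_frechet_wasserstein(X, Q, x_new):
    """Global Wasserstein-Frechet regression with Euclidean predictors.
    X: n x p predictor matrix, Q: n x m matrix of response quantile functions,
    x_new: length-p vector at which to predict the conditional Wasserstein mean."""
    n = X.shape[0]
    mu = X.mean(axis=0)
    Sigma_inv = np.linalg.inv(np.atleast_2d(np.cov(X, rowvar=False, bias=True)))
    w = 1.0 + (X - mu) @ Sigma_inv @ (x_new - mu)   # empirical weights s_G(X_i, x_new)
    Q_hat = (w / n) @ Q                             # weighted average of quantile functions
    return np.sort(Q_hat)                           # crude projection to a nondecreasing function
```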

For the local Wasserstein–Fréchet regression, consider a scalar predictor $X \in \mathbb{R}$ and introduce a smoothing kernel $K_h(\cdot) = h^{-1} K(\cdot / h)$ with bandwidth $h$. The local Fréchet regression model, which generalizes the local linear regression model, is defined as

$$ l_\oplus(x) = \operatorname*{arg\,min}_{\omega \in \mathcal{W}_2} \mathbb{E}\left[ s_L(X, x, h)\, d_{W_2}^2(\omega, \nu) \right], $$
where $s_L(X, x, h) = K_h(X - x)\left[ \mu_2 - \mu_1 (X - x) \right] / \sigma_0^2$, $\mu_j = \mathbb{E}\left[ K_h(X - x)(X - x)^j \right]$ for $j = 0, 1, 2$, and $\sigma_0^2 = \mu_0 \mu_2 - \mu_1^2$.
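The local weights $s_L$ can be computed directly from their empirical counterparts and, combined with the quantile representation, they yield the local Wasserstein–Fréchet fit in the same way as the global weights above. A brief sketch using a Gaussian kernel as an illustrative choice.

```python
import numpy as np

def local_frechet_weights(X, x, h):
    """Empirical local Frechet regression weights s_L(X_i, x, h) for a scalar predictor."""
    K = np.exp(-0.5 * ((X - x) / h) ** 2) / (h * np.sqrt(2.0 * np.pi))   # Gaussian kernel K_h
    mu0, mu1, mu2 = [np.mean(K * (X - x) ** j) for j in (0, 1, 2)]
    sigma2 = mu0 * mu2 - mu1 ** 2
    return K * (mu2 - mu1 * (X - x)) / sigma2

# The local fit at x is the weighted average of response quantile functions,
# Q_hat = (w / w.sum()) @ Q, followed by a monotone projection as in the global case.
```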

Transformation based approaches

Consider response variables $\nu$ that are probability distributions. With the space of density functions $\mathcal{F}$ and a Hilbert space of functions $\mathbb{H}$, consider continuous and invertible transformations $\Psi: \mathcal{F} \to \mathbb{H}$. Examples of transformations include the log hazard transformation, the log quantile density transformation, and the centered log-ratio transformation. Linear methods such as functional linear models are applied to the transformed variables. The fitted models are interpreted back in the original density space $\mathcal{F}$ using the inverse transformation $\Psi^{-1}$.[12]
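A simple instance of this approach: transform each response density with the centered log-ratio map, fit an ordinary linear model pointwise on the grid, and map the fitted function back with the inverse transform. A minimal self-contained sketch with illustrative names.

```python
import numpy as np

def clr_linear_regression(X, f_mat, t, x_new):
    """Fit a pointwise linear model to clr-transformed response densities and
    predict the response density at a new predictor value x_new.
    X: n x p predictors, f_mat: n x m densities on grid t."""
    dt = t[1] - t[0]
    logf = np.log(f_mat)
    Y = logf - logf.mean(axis=1, keepdims=True)              # clr transform of each density
    Xd = np.column_stack([np.ones(len(X)), X])               # design matrix with intercept
    B, *_ = np.linalg.lstsq(Xd, Y, rcond=None)               # least squares, one fit per grid point
    y_new = np.concatenate(([1.0], np.atleast_1d(x_new))) @ B
    g = np.exp(y_new)                                        # inverse transform:
    return g / (g.sum() * dt)                                # exponentiate and renormalize
```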

Random object approaches

In Wasserstein regression, both predictors $\nu$ and responses $\mu$ can be distributional objects. Let $\nu_\oplus$ and $\mu_\oplus$ be the Wasserstein means of $\nu$ and $\mu$, respectively. The Wasserstein regression model is defined as

$$ \mathbb{E}\left[ \operatorname{Log}_{\mu_\oplus} \mu \mid \nu \right] = \Gamma\left( \operatorname{Log}_{\nu_\oplus} \nu \right), $$
with a linear regression operator
$$ \Gamma(g)(t) = \int \beta(s, t)\, g(s)\, d\nu_\oplus(s), \quad g \in T_{\nu_\oplus}, $$
for a bivariate coefficient kernel $\beta$.
Estimation of the regression operator is based on empirical estimators obtained from samples.[13] Also, the Fisher–Rao metric $d_{FR}$ can be used in a similar fashion.[12][14]
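In the univariate case, predictor and response distributions can both be represented by centered quantile functions (their tangent-space coordinates at the respective Wasserstein means), and a regression operator can then be estimated by regularized least squares between the two sets of discretized tangent vectors. A minimal sketch of this idea, not the estimator of the cited work; the ridge penalty and the grid are illustrative.

```python
import numpy as np

def wasserstein_regression(pred_samples, resp_samples, n_grid=200, lam=1e-2):
    """Regress distributional responses on distributional predictors through their
    centered quantile functions (discretized tangent vectors)."""
    s = (np.arange(n_grid) + 0.5) / n_grid
    Qx = np.stack([np.quantile(x, s) for x in pred_samples])
    Qy = np.stack([np.quantile(y, s) for y in resp_samples])
    Qx_mean, Qy_mean = Qx.mean(axis=0), Qy.mean(axis=0)      # Wasserstein means
    Xc, Yc = Qx - Qx_mean, Qy - Qy_mean                      # log-mapped (centered) data
    # Ridge-regularized estimate of the discretized regression operator
    B = np.linalg.solve(Xc.T @ Xc + lam * np.eye(n_grid), Xc.T @ Yc)

    def predict(new_sample):
        q_new = np.quantile(new_sample, s) - Qx_mean
        return np.sort(Qy_mean + q_new @ B)                  # predicted response quantile function

    return predict
```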

Hypothesis testing

Wasserstein F-test

The Wasserstein $F$-test has been proposed to test for the effects of the predictors in the Fréchet regression framework with the Wasserstein metric.[15] Consider Euclidean predictors $X \in \mathbb{R}^p$ and distributional responses $\nu$. Denote the Wasserstein mean of $\nu$ as $\nu_\oplus$, and the sample Wasserstein mean as $\hat\nu_\oplus$. Consider the global Wasserstein–Fréchet regression model $m_\oplus(x)$ defined in (1), which is the conditional Wasserstein mean given $X = x$. The estimator $\hat m_\oplus(x)$ of $m_\oplus(x)$ is obtained by minimizing the empirical version of the criterion.

Let $F_\nu$, $F_\oplus$, $F_{\oplus, x}$, $Q_\nu$, $Q_\oplus$, $Q_{\oplus, x}$, $f_\nu$, $f_\oplus$, and $f_{\oplus, x}$ denote the cumulative distribution, quantile, and density functions of $\nu$, $\nu_\oplus$, and $m_\oplus(x)$, respectively. For a pair of distributions $(\nu_1, \nu_2)$, define $T_{\nu_1}^{\nu_2} = Q_{\nu_2} \circ F_{\nu_1}$ to be the optimal transport map from $\nu_1$ to $\nu_2$. Also, define $T = T_{\nu_\oplus}^{\nu}$, the optimal transport map from $\nu_\oplus$ to $\nu$. Finally, define the covariance kernel $C(u, v) = \operatorname{Cov}\left( T(u), T(v) \right)$ and, by the Mercer decomposition, $C(u, v) = \sum_{j=1}^{\infty} \tau_j \varphi_j(u) \varphi_j(v)$.

If there are no regression effects, the conditional Wasserstein mean would equal the Wasserstein mean. That is, the hypotheses for the test of no effects are

$$ H_0: m_\oplus(x) = \nu_\oplus \ \text{ for all } x \qquad \text{versus} \qquad H_1: m_\oplus(x) \neq \nu_\oplus \ \text{ for some } x. $$

To test these hypotheses, the proposed global Wasserstein $F$-statistic and its asymptotic null distribution are
$$ F_G = \frac{1}{n} \sum_{i=1}^{n} d_{W_2}^2\left( \hat m_\oplus(X_i), \hat\nu_\oplus \right), \qquad n\, F_G \xrightarrow{d} \sum_{j=1}^{\infty} \tau_j V_j, $$
where the $V_j$ are independent $\chi^2_p$ random variables and the $\tau_j$ are the eigenvalues of the covariance kernel $C$.[15] An extension to hypothesis testing for partial regression effects, as well as alternative testing approximations using Satterthwaite's approximation or a bootstrap approach, have also been proposed.[15]
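For intuition, the null hypothesis of no regression effect can also be assessed with a generic permutation test rather than the asymptotic Wasserstein $F$-test: permuting the predictors breaks any association, so the observed discrepancy between fitted conditional means and the marginal Wasserstein mean can be compared with its permutation distribution. A rough sketch of this alternative, reusing the `global_frechet_wasserstein` function from the regression sketch above.

```python
import numpy as np

def frechet_ss(X, Q):
    """Average squared Wasserstein distance between the fitted conditional means
    and the marginal Wasserstein mean (a discrepancy measure for regression effects)."""
    Q_mean = Q.mean(axis=0)
    fits = np.stack([global_frechet_wasserstein(X, Q, x) for x in X])
    return np.mean(np.mean((fits - Q_mean) ** 2, axis=1))

def permutation_test_no_effect(X, Q, n_perm=500, seed=0):
    """Permutation p-value for the hypothesis of no regression effect."""
    rng = np.random.default_rng(seed)
    observed = frechet_ss(X, Q)
    null_stats = [frechet_ss(X[rng.permutation(len(X))], Q) for _ in range(n_perm)]
    return np.mean(np.array(null_stats) >= observed)
```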

Tests for the intrinsic mean

The Hilbert sphere $\mathcal{S}^\infty$ is defined as $\mathcal{S}^\infty = \{ g \in \mathbb{H} : \| g \| = 1 \}$, where $\mathbb{H}$ is a separable infinite-dimensional Hilbert space with inner product $\langle \cdot, \cdot \rangle$ and norm $\| \cdot \|$. Consider the space of square-root densities $\mathcal{X} = \{ \sqrt{f} : f \in \mathcal{F} \}$. Then, with the Fisher–Rao metric $d_{FR}$ on $\mathcal{X}$, $\mathcal{X}$ is the positive orthant of the Hilbert sphere $\mathcal{S}^\infty$ with $\mathbb{H} = L^2(D)$.

Let a chart $\psi$ be a smooth homeomorphism that maps $\mathcal{X}$ onto an open subset $\psi(\mathcal{X})$ of a separable Hilbert space $\mathbb{E}$, providing coordinates. For example, $\psi$ can be the logarithm map at a fixed base point.[14]

Consider a random element $X \in \mathcal{X}$ equipped with the Fisher–Rao metric, and write its Fréchet mean as $\mu$. Let the empirical estimator of $\mu$ based on $n$ samples $X_1, \dots, X_n$ be $\hat\mu_n$. Then a central limit theorem for $\hat\mu_n$ and $\mu$ holds: $\sqrt{n}\left( \psi(\hat\mu_n) - \psi(\mu) \right) \xrightarrow{d} Z$, where $Z$ is a Gaussian random element in $\mathbb{E}$ with mean 0 and covariance operator $\mathcal{T}$. Let the eigenvalue–eigenfunction pairs of $\mathcal{T}$ and the estimated covariance operator $\hat{\mathcal{T}}$ be $(\lambda_j, \phi_j)$ and $(\hat\lambda_j, \hat\phi_j)$, respectively.

Consider the one-sample hypothesis test

$$ H_0: \mu = \mu_0 \qquad \text{versus} \qquad H_1: \mu \neq \mu_0, $$

with a fixed $\mu_0 \in \mathcal{X}$. Denote $\| \cdot \|_{\mathbb{E}}$ and $\langle \cdot, \cdot \rangle_{\mathbb{E}}$ as the norm and inner product in $\mathbb{E}$. The test statistics and their limiting distributions are
$$ T_1 = n \left\| \psi(\hat\mu_n) - \psi(\mu_0) \right\|_{\mathbb{E}}^2 \xrightarrow{d} \sum_{j=1}^{\infty} \lambda_j \chi^2_{1j}, \qquad T_2 = n \sum_{j=1}^{K} \frac{ \langle \psi(\hat\mu_n) - \psi(\mu_0), \hat\phi_j \rangle_{\mathbb{E}}^2 }{ \hat\lambda_j } \xrightarrow{d} \chi^2_K, $$
where the $\chi^2_{1j}$ are i.i.d. chi-squared random variables with one degree of freedom and $K$ is a truncation level. The actual testing procedure can be carried out by employing the limiting distributions with Monte Carlo simulations, or by bootstrap tests. Extensions to the two-sample test and the paired test have also been proposed.[14]
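Numerically, the Fréchet mean on the Hilbert sphere can be approximated by iteratively averaging in the tangent space through the spherical log and exp maps, and a simple bootstrap test compares the geodesic distance between the estimated mean and a hypothesized mean with its bootstrap distribution. The sketch below uses these generic ingredients, with square-root densities discretized as unit vectors; it is not the specific test statistic of the cited work.

```python
import numpy as np

def sphere_log(p, q):
    """Log map on the unit sphere at p, applied to q (discretized unit vectors)."""
    ip = np.clip(np.dot(p, q), -1.0, 1.0)
    theta = np.arccos(ip)
    v = q - ip * p
    nv = np.linalg.norm(v)
    return np.zeros_like(p) if nv < 1e-12 else v * (theta / nv)

def sphere_exp(p, v):
    """Exp map on the unit sphere at p, applied to a tangent vector v."""
    nv = np.linalg.norm(v)
    return p if nv < 1e-12 else np.cos(nv) * p + np.sin(nv) * v / nv

def frechet_mean_sphere(G, n_iter=50):
    """G: n x m array; each row is a square-root density on a grid, normalized to unit norm."""
    mu = G[0] / np.linalg.norm(G[0])
    for _ in range(n_iter):
        v = np.mean([sphere_log(mu, g) for g in G], axis=0)  # average tangent vector
        mu = sphere_exp(mu, v)
    return mu

def bootstrap_test_mean(G, mu0, n_boot=500, seed=0):
    """Bootstrap p-value for H0: the Frechet mean equals mu0, via the geodesic distance."""
    rng = np.random.default_rng(seed)
    mu_hat = frechet_mean_sphere(G)
    observed = np.arccos(np.clip(np.dot(mu_hat, mu0), -1.0, 1.0))
    stats = []
    for _ in range(n_boot):
        Gb = G[rng.integers(0, len(G), len(G))]              # resample rows with replacement
        mub = frechet_mean_sphere(Gb)
        # Recentered statistic: distance of the bootstrap mean from the sample mean
        stats.append(np.arccos(np.clip(np.dot(mub, mu_hat), -1.0, 1.0)))
    return np.mean(np.array(stats) >= observed)
```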

Distributional time series

Autoregressive (AR) models for distributional time series are constructed by defining stationarity and a notion of difference between distributions based on $d_{W_2}$ and $d_{FR}$.

In the Wasserstein autoregressive model (WAR), consider a stationary density time series $\{ f_t \}$ with Wasserstein mean $f_\oplus$.[16] Denote the difference between $f_t$ and $f_\oplus$ using the logarithm map, $V_t = \operatorname{Log}_{f_\oplus} f_t = T_t - \operatorname{id}$, where $T_t = F_t^{-1} \circ F_\oplus$ is the optimal transport map from $f_\oplus$ to $f_t$, in which $F_\oplus$ and $F_t$ are the cdfs of $f_\oplus$ and $f_t$. An AR(1) model on the tangent space $T_{f_\oplus}$ is defined as $V_t = \beta\, V_{t-1} + \varepsilon_t$ for $t \in \mathbb{Z}$, with the autoregressive parameter $\beta \in \mathbb{R}$ and mean zero random i.i.d. innovations $\varepsilon_t$. Under proper conditions, $\operatorname{Exp}_{f_\oplus} V_t$ and $\operatorname{Exp}_{f_\oplus} \varepsilon_t$ are well-defined distributions with densities $f_t$ and $f_{\varepsilon, t}$. Accordingly, WAR(1), with a natural extension to order $p$, is defined as

$$ \operatorname{Log}_{f_\oplus} f_t = \beta\, \operatorname{Log}_{f_\oplus} f_{t-1} + \varepsilon_t, \qquad \text{equivalently} \quad T_t - \operatorname{id} = \beta \left( T_{t-1} - \operatorname{id} \right) + \varepsilon_t. $$
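In the univariate case the tangent vectors can again be represented by centered quantile functions, so a WAR(1)-type fit reduces to estimating a scalar autoregressive parameter on the tangent-space series and mapping the forecast back. A minimal sketch of this idea with an illustrative least-squares estimator, not the fitting procedure of the cited paper.

```python
import numpy as np

def fit_war1(samples, n_grid=300):
    """samples: list over time of 1-d arrays, each drawn from the distribution at time t.
    Returns an estimate of the autoregressive parameter and a one-step-ahead
    forecast of the quantile function."""
    s = (np.arange(n_grid) + 0.5) / n_grid
    Q = np.stack([np.quantile(x, s) for x in samples])
    Q_mean = Q.mean(axis=0)                     # estimate of the Wasserstein mean
    V = Q - Q_mean                              # log-mapped series (tangent vectors)
    beta = np.sum(V[1:] * V[:-1]) / np.sum(V[:-1] ** 2)   # least-squares AR(1) coefficient
    forecast = np.sort(Q_mean + beta * V[-1])   # map back; sorting keeps monotonicity
    return beta, forecast
```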

On the other hand, the spherical autoregressive model (SAR) considers the Fisher–Rao metric.[17] Following the setting of the section on tests for the intrinsic mean, let $X_t = \sqrt{f_t} \in \mathcal{X}$ with Fréchet mean $\mu$. Let $\theta_t = d_{FR}(X_t, \mu)$, which is the geodesic distance between $X_t$ and $\mu$. Define a rotation operator $R_t$ that rotates $\mu$ to $X_t$. The spherical difference between $X_t$ and $\mu$ is represented as $D_t$. Assume that $\{ X_t \}$ is a stationary sequence with the Fréchet mean $\mu$; then SAR(1) is defined as

$$ D_t = \beta\, D_{t-1} + \varepsilon_t, $$
where $\beta$ is the autoregressive parameter and $\varepsilon_t$ are mean zero random i.i.d. innovations. An alternative model, the difference-based spherical autoregressive (DSAR) model, is defined with the spherical differences taken between consecutive elements $X_{t-1}$ and $X_t$, with natural extensions to order $p$. A similar extension to the Wasserstein space was introduced.[18]

References

  1. ^ Deza, M.M.; Deza, E. (2013). Encyclopedia of distances. Springer.
  2. ^ Fréchet, M. (1948). "Les éléments aléatoires de nature quelconque dans un espace distancié". Annales de l'Institut Henri Poincaré. 10 (4): 215–310.
  3. ^ Agueh, M.; Carlier, G. (2011). "Barycenters in the Wasserstein space" (PDF). SIAM Journal on Mathematical Analysis. 43 (2): 904–924. doi:10.1137/100805741. S2CID 8592977.
  4. ^ Kneip, A.; Utikal, K.J. (2001). "Inference for density families using functional principal component analysis". Journal of the American Statistical Association. 96 (454): 519–532. doi:10.1198/016214501753168235. S2CID 123524014.
  5. ^ Petersen, A.; Müller, H.-G. (2016). "Functional data analysis for density functions by transformation to a Hilbert space". Annals of Statistics. 44 (1): 183–218. arXiv:1601.02869. doi:10.1214/15-AOS1363.
  6. ^ van den Boogaart, K.G.; Egozcue, J.J.; Pawlowsky-Glahn, V. (2014). "Bayes Hilbert spaces". Australian and New Zealand Journal of Statistics. 56 (2): 171–194. doi:10.1111/anzs.12074. S2CID 120612578.
  7. ^ Petersen, A.; Müller, H.-G. (2016). "Functional data analysis for density functions by transformation to a Hilbert space". Annals of Statistics. 44 (1): 183–218. arXiv:1601.02869. doi:10.1214/15-AOS1363.
  8. ^ Fletcher, T.F.; Lu, C.; Pizer, S.M.; Joshi, S. (2004). "Principal geodesic analysis for the study of nonlinear statistics of shape". IEEE Transactions on Medical Imaging. 23 (8): 995–1005. doi:10.1109/TMI.2004.831793. PMID 15338733. S2CID 620015.
  9. ^ a b Bigot, J.; Gouet, R.; Klein, T.; López, A. (2017). "Geodesic PCA in the Wasserstein space by convex PCA" (PDF). Annales de l'institut Henri Poincare (B) Probability and Statistics. 53 (1): 1–26. Bibcode:2017AnIHP..53....1B. doi:10.1214/15-AIHP706. S2CID 49256652.
  10. ^ a b c Cazelles, E.; Seguy, V.; Bigot, J.; Cuturi, M.; Papadakis, N. (2018). "Geodesic PCA versus Log-PCA of histograms in the Wasserstein space". SIAM Journal on Scientific Computing. 40 (2): B429–B456. Bibcode:2018SJSC...40B.429C. doi:10.1137/17M1143459.
  11. ^ Petersen, A.; Müller, H.-G. (2019). "Fréchet regression for random objects with Euclidean predictors". Annals of Statistics. 47 (2): 691–719. arXiv:1608.03012. doi:10.1214/17-AOS1624.
  12. ^ a b c Petersen, A.; Zhang, C.; Kokoszka, P. (2022). "Modeling probability density functions as data objects". Econometrics and Statistics. 21: 159–178. doi:10.1016/j.ecosta.2021.04.004. S2CID 236589040.
  13. ^ Chen, Y.; Lin, Z.; Müller, H.-G. (2023). "Wasserstein regression". Journal of the American Statistical Association. 118 (542): 869–882. doi:10.1080/01621459.2021.1956937. S2CID 219721275.
  14. ^ a b c Dai, X. (2022). "Statistical inference on the Hilbert sphere with application to random densities". Electronic Journal of Statistics. 16 (1): 700–736. arXiv:2101.00527. doi:10.1214/21-EJS1942.
  15. ^ a b c Petersen, A.; Liu, X.; Divani, A.A. (2021). "Wasserstein F-tests and confidence bands for the Fréchet regression of density response curves". Annals of Statistics. 49 (1): 590–611. arXiv:1910.13418. doi:10.1214/20-AOS1971. S2CID 204950494.
  16. ^ Zhang, C.; Kokoszka, P.; Petersen, A. (2022). "Wasserstein autoregressive models for density time series". Journal of Time Series Analysis. 43 (1): 30–52. arXiv:2006.12640. doi:10.1111/jtsa.12590. S2CID 219980621.
  17. ^ Zhu, C.; Müller, H.-G. (2023). "Spherical autoregressive models, with application to distributional and compositional time series". Journal of Econometrics. arXiv:2203.12783. doi:10.1016/j.jeconom.2022.12.008.
  18. ^ Zhu, C.; Müller, H.-G. (2023). "Autoregressive optimal transport models". Journal of the Royal Statistical Society Series B: Statistical Methodology. 85 (3): 1012–1033. doi:10.1093/jrsssb/qkad051. PMC 10376456. PMID 37521164.