November 19, 2006
Multivariate statistical analysis is concerned with observations on several variables which are thought to possess some degree of inter-dependence. Driven by problems in genetics and the social sciences, it first flowered in the earlier half of the last century. Subsequently, random matrix theory (RMT) developed, initially within physics, and more recently widely in mathematics. While some of the central objects of study in RMT are identical to those of multivariate statistics, statistical theory was slow to exploit the connection. However, with vast data collection ever more common, data sets now often have as many variables as, or more variables than, individuals observed. In such contexts, the techniques and results of RMT have much to offer multivariate statistics. The paper reviews some of the progress to date.
Similar papers
August 19, 2019
This article is due to appear in the Handbook of Statistics, Vol. 43, Elsevier/North-Holland, Amsterdam, edited by Arni S. R. Srinivasa Rao and C. R. Rao. In modern-day analytics, there is an ever-growing need to develop statistical models to study high-dimensional data. Several approaches have been developed so far, including dimension reduction, asymptotics-driven methods, and random-projection-based methods. For high dimensional parametric models, estimation and hypothesis tes...
November 5, 2015
The classic likelihood ratio test for the equality of two covariance matrices breaks down because the sample covariance matrices are singular when the data dimension $p$ is larger than the sample size $n$. In this paper, we present a conceptually simple method that uses random projection to project the data onto a one-dimensional random subspace so that the conventional methods can be applied. Both one-sample and two-sample tests for high-dimensional covariance matr...
March 9, 2024
In this paper, we propose a new modified likelihood ratio test (LRT) for simultaneously testing mean vectors and covariance matrices of two-sample populations in high-dimensional settings. By employing tools from Random Matrix Theory (RMT), we derive the limiting null distribution of the modified LRT for generally distributed populations. Furthermore, we compare the proposed test with existing tests using simulation results, demonstrating that the modified LRT exhibits favora...
January 23, 2012
This paper deals with the problem of estimating the covariance matrix of a series of independent multivariate observations, in the case where the dimension of each observation is of the same order as the number of observations. Although such a regime is of interest for many current statistical signal processing and wireless communication issues, traditional methods fail to produce consistent estimators and only recently results relying on large random matrix theory have been ...
February 12, 2019
The advent of modern technology, permitting the measurement of thousands of characteristics simultaneously, has given rise to floods of data characterized by many large or even huge datasets. This new paradigm presents extraordinary challenges to data analysis and the question arises: how can conventional data analysis methods, devised for moderate or small datasets, cope with the complexities of modern data? The case of high dimensional data is particularly revealing of some...
March 30, 2020
We study general singular value shrinkage estimators in high-dimensional regression and classification, when the number of features and the sample size both grow proportionally to infinity. We allow models with general covariance matrices that include a large class of data generating distributions. As far as the implications of our results are concerned, we find exact asymptotic formulas for both the training and test errors in regression models fitted by gradient descent, wh...
It is clear that conventional statistical inference protocols need to be revised to deal correctly with the high-dimensional data that are now common. Most recent studies aimed at achieving this revision rely on powerful approximation techniques that call for rigorous results against which they can be tested. In this context, the simplest case of high-dimensional linear regression has acquired significant new relevance and attention. In this paper we use the statistical phys...
March 26, 2017
Testing independence among a number of (ultra) high-dimensional random samples is a fundamental and challenging problem. By arranging $n$ identically distributed $p$-dimensional random vectors into a $p \times n$ data matrix, we investigate the problem of testing independence among columns under the matrix-variate normal modeling of data. We propose a computationally simple and tuning-free test statistic, characterize its limiting null distribution, analyze the statistical po...
January 12, 2021
Many applications benefit from theory relevant to the identification of variables having large correlations or partial correlations in high dimension. Recently there has been progress in the ultra-high dimensional setting when the sample size $n$ is fixed and the dimension $p$ tends to infinity. Despite these advances, the correlation screening framework suffers from practical, methodological and theoretical deficiencies. For instance, previous correlation screening theory re...
April 30, 2011
For a long time, detection and parameter estimation methods for signal processing have relied on asymptotic statistics in which the number $n$ of observations of a population grows large relative to the population size $N$, i.e. $n/N\to \infty$. Modern technological and societal advances now demand the study of sometimes extremely large populations and simultaneously require fast signal processing due to accelerated system dynamics. This results in not-so-large practical ratio...