Similar papers
October 28, 2016
In this paper we derive the optimal linear shrinkage estimator for the high-dimensional mean vector using random matrix theory. The results are obtained under the assumption that both the dimension $p$ and the sample size $n$ tend to infinity in such a way that $p/n \to c\in(0,\infty)$. Under weak conditions imposed on the underlying data generating mechanism, we find the asymptotic equivalents to the optimal shrinkage intensities and estimate them consistently. The proposed ...
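As a rough illustration of the idea, the sketch below shrinks the sample mean toward a fixed target with a naive plug-in intensity; the function name, the target `mu0`, and the intensity rule are placeholders, not the paper's consistent random-matrix estimator of the optimal intensity.

```python
import numpy as np

def shrunk_mean(X, mu0=None):
    # X: (n, p) data matrix; mu0: shrinkage target (defaults to the zero vector).
    # Returns alpha * xbar + (1 - alpha) * mu0 with a simple plug-in intensity;
    # the paper's random-matrix-theory estimator of alpha is not reproduced here.
    n, p = X.shape
    xbar = X.mean(axis=0)
    if mu0 is None:
        mu0 = np.zeros(p)
    noise = X.var(axis=0, ddof=1).sum() / n          # rough noise level in xbar
    signal = float(np.sum((xbar - mu0) ** 2))
    alpha = max(signal - noise, 0.0) / signal if signal > 0 else 0.0
    return alpha * xbar + (1.0 - alpha) * mu0
```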
September 23, 2024
Testing for change points in sequences of high-dimensional covariance matrices is an important and equally challenging problem in statistical methodology with applications in various fields. Motivated by the observation that even in cases where the ratio between dimension and sample size is as small as $0.05$, tests based on fixed-dimensional asymptotics do not keep their preassigned level, we propose to derive critical values of test statistics using an asymptotic regime whe...
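The scan below gives a schematic picture of detecting a single covariance change point by comparing segment covariances in Frobenius norm; the statistic, the minimum segment length, and the lack of calibrated critical values are all simplifications relative to the paper.

```python
import numpy as np

def cov_change_scan(X, min_seg=30):
    # X: (n, p) time-ordered observations (assumes n > 2 * min_seg).
    # For every admissible split t, record a weighted Frobenius distance
    # between the segment covariances; this only illustrates the scan and
    # is not the paper's statistic or its high-dimensional critical values.
    n, _ = X.shape
    stats = {}
    for t in range(min_seg, n - min_seg):
        S1 = np.cov(X[:t], rowvar=False)
        S2 = np.cov(X[t:], rowvar=False)
        stats[t] = (t * (n - t) / n) * np.linalg.norm(S1 - S2, "fro") ** 2
    best = max(stats, key=stats.get)
    return best, stats[best]
```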
November 24, 2023
In this paper, we introduce a class of improved estimators for the mean parameter matrix of a multivariate normal distribution with an unknown variance-covariance matrix. In particular, the main results of [D. Chételat and M. T. Wells (2012). Improved Multivariate Normal Mean Estimation with Unknown Covariance when $p$ is Greater than $n$. The Annals of Statistics, Vol. 40, No. 6, 3137--3160] are established in full generality and we provide the corrected version of th...
October 19, 2014
In this paper, we are concerned with the independence test for $k$ high-dimensional sub-vectors of a normal vector, with fixed positive integer $k$. A natural high-dimensional extension of the classical sample correlation matrix, namely the block correlation matrix, is introduced for this purpose. We then construct the so-called Schott type statistic as our test statistic, which turns out to be a particular linear spectral statistic of the block correlation matrix. Interestingly, the...
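A minimal version of a Schott-type statistic, summing the squared entries of the off-diagonal blocks of the sample correlation matrix; the centering and scaling needed for the limiting distribution, and the exact block-correlation construction used in the paper, are omitted.

```python
import numpy as np

def schott_type_statistic(X, block_sizes):
    # X: (n, p) sample; block_sizes: sizes of the k sub-vectors (summing to p).
    # Computes sum_{i<j} trace(R_ij R_ij^T) over the off-diagonal blocks of
    # the sample correlation matrix R (a Schott-type statistic, uncentred).
    R = np.corrcoef(X, rowvar=False)
    edges = np.cumsum([0] + list(block_sizes))
    k = len(block_sizes)
    stat = 0.0
    for i in range(k):
        for j in range(i + 1, k):
            Rij = R[edges[i]:edges[i + 1], edges[j]:edges[j + 1]]
            stat += np.sum(Rij ** 2)        # equals trace(Rij Rij^T)
    return stat
```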
May 11, 2015
When can reliable inference be drawn in the "Big Data" context? This paper presents a framework for answering this fundamental question in the context of correlation mining, with implications for general large scale inference. In large scale data applications like genomics, connectomics, and eco-informatics, the dataset is often variable-rich but sample-starved: a regime where the number $n$ of acquired samples (statistical replicates) is far smaller than the number $p$ of obser...
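A bare-bones correlation-screening step of the kind this framework studies: flag all variable pairs whose absolute sample correlation exceeds a user-chosen threshold. Choosing and calibrating that threshold in the sample-starved $n \ll p$ regime is what the paper analyzes, and is not attempted here.

```python
import numpy as np

def correlation_screen(X, rho):
    # X: (n, p) data matrix; rho: user-chosen correlation threshold in (0, 1).
    # Returns the variable pairs whose absolute sample correlation exceeds rho.
    # Under the null, the number of such discoveries grows rapidly as rho
    # decreases when n << p, which is the phenomenon the framework quantifies.
    R = np.corrcoef(X, rowvar=False)
    p = R.shape[0]
    iu = np.triu_indices(p, k=1)
    hits = np.abs(R[iu]) > rho
    return list(zip(iu[0][hits], iu[1][hits]))
```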
February 28, 2020
This paper investigates a statistical procedure for testing the equality of two independent estimated covariance matrices when the number of potentially dependent data vectors is large and proportional to the size of the vectors, that is, the number of variables. Inspired by the spike models used in random matrix theory, we concentrate on the largest eigenvalues of the matrices in order to determine significance. To avoid false rejections we must guard against residual spikes...
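In the spirit of the spike-based approach, the sketch below compares two estimated covariances through the largest eigenvalue of $S_1^{-1/2} S_2 S_1^{-1/2}$; the guard against residual spikes that the paper develops is not included, and the construction assumes $S_1$ is well conditioned.

```python
import numpy as np

def largest_eigenvalue_contrast(X1, X2):
    # X1, X2: (n1, p) and (n2, p) samples.  Returns the largest eigenvalue of
    # S1^{-1/2} S2 S1^{-1/2}; values far above the bulk predicted by random
    # matrix theory suggest unequal covariances.  Assumes S1 is positive
    # definite (e.g. n1 > p); the paper's calibration is not reproduced.
    S1 = np.cov(X1, rowvar=False)
    S2 = np.cov(X2, rowvar=False)
    w, V = np.linalg.eigh(S1)
    S1_inv_sqrt = V @ np.diag(1.0 / np.sqrt(w)) @ V.T
    M = S1_inv_sqrt @ S2 @ S1_inv_sqrt
    return np.linalg.eigvalsh(M).max()
```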
January 18, 2016
To model modern large-scale datasets, we need efficient algorithms to infer a set of $P$ unknown model parameters from $N$ noisy measurements. What are fundamental limits on the accuracy of parameter inference, given finite signal-to-noise ratios, limited measurements, prior information, and computational tractability requirements? How can we combine prior information with measurements to achieve these limits? Classical statistics gives incisive answers to these questions as ...
December 22, 2023
In this paper, we develop invariance-based procedures for testing and inference in high-dimensional regression models. These procedures, also known as randomization tests, provide several important advantages. First, for the global null hypothesis of significance, our test is valid in finite samples. It is also simple to implement and comes with finite-sample guarantees on statistical power. Remarkably, despite its simplicity, this testing idea has escaped the attention of ea...
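A simple instance of such an invariance-based test for the global null of no signal: permute the response, recompute a statistic, and report the randomization p-value, which is exact in finite samples under exchangeability. The invariance group and the statistic used in the paper may differ.

```python
import numpy as np

def randomization_global_null_test(X, y, n_draws=999, rng=None):
    # X: (n, p) design matrix; y: (n,) response.  Under the global null the
    # response carries no signal, so permuting y leaves the distribution of
    # the statistic unchanged and the p-value below is exact in finite samples.
    rng = np.random.default_rng(rng)
    stat = lambda yy: np.max(np.abs(X.T @ (yy - yy.mean())))
    observed = stat(y)
    draws = [stat(rng.permutation(y)) for _ in range(n_draws)]
    p_value = (1 + sum(d >= observed for d in draws)) / (n_draws + 1)
    return observed, p_value
```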
July 13, 2017
In this paper, we propose a novel variable selection approach in the framework of multivariate linear models, taking into account the dependence that may exist between the responses. It consists in first estimating the covariance matrix of the responses and then plugging this estimator into a Lasso criterion, in order to obtain a sparse estimator of the coefficient matrix. The properties of our approach are investigated both from a theoretical and a numerical point of view. More ...
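A hedged sketch of the plug-in idea: estimate the residual covariance of the responses (here crudely, from an ordinary least-squares fit), whiten the responses with its inverse square root, and run a Lasso on the whitened problem. The covariance estimator and penalty calibration studied in the paper are not reproduced, and the helper below is illustrative only.

```python
import numpy as np
from sklearn.linear_model import Lasso

def whitened_multivariate_lasso(X, Y, alpha=0.1):
    # X: (n, p) predictors; Y: (n, q) responses.  Estimates the residual
    # covariance of the responses from an OLS fit, whitens Y with its inverse
    # square root, then fits one Lasso per whitened response column.
    n, q = Y.shape
    B_ols, *_ = np.linalg.lstsq(X, Y, rcond=None)
    resid = Y - X @ B_ols
    Sigma = np.cov(resid, rowvar=False)              # assumes enough residual df
    w, V = np.linalg.eigh(Sigma)
    Sigma_inv_sqrt = V @ np.diag(1.0 / np.sqrt(w)) @ V.T
    Yw = Y @ Sigma_inv_sqrt
    # rows of B index predictors, columns index (whitened) responses
    B = np.column_stack(
        [Lasso(alpha=alpha).fit(X, Yw[:, j]).coef_ for j in range(q)]
    )
    return B
```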
November 8, 2021
Analytical understanding of how low-dimensional latent features reveal themselves in large-dimensional data is still lacking. We study this by defining a linear latent feature model with additive noise constructed from probabilistic matrices, and analytically and numerically computing the statistical distributions of pairwise correlations and eigenvalues of the correlation matrix. This allows us to resolve the latent feature structure across a wide range of data regimes set b...
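A small simulation of a linear latent feature model with additive noise, returning the eigenvalues of the sample correlation matrix and the pairwise correlations; the Gaussian construction of the feature and loading matrices is an assumption standing in for the paper's probabilistic matrices.

```python
import numpy as np

def latent_feature_spectrum(n=1000, p=200, k=3, noise=1.0, rng=None):
    # Simulate X = F W + E with k latent features and additive noise, then
    # return the eigenvalues of the sample correlation matrix together with
    # the pairwise correlations (upper triangle of R).
    rng = np.random.default_rng(rng)
    F = rng.standard_normal((n, k))            # latent features
    W = rng.standard_normal((k, p))            # loadings
    E = noise * rng.standard_normal((n, p))    # additive noise
    X = F @ W + E
    R = np.corrcoef(X, rowvar=False)
    pairwise = R[np.triu_indices(p, k=1)]
    return np.linalg.eigvalsh(R), pairwise
```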