November 19, 2006
Similar papers
October 19, 2014
In this paper, we are concerned with the independence test for $k$ high-dimensional sub-vectors of a normal vector, with fixed positive integer $k$. A natural high-dimensional extension of the classical sample correlation matrix, namely the block correlation matrix, is proposed for this purpose. We then construct the so-called Schott type statistic as our test statistic, which turns out to be a particular linear spectral statistic of the block correlation matrix. Interestingly, the...
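As an illustration of the general idea (not this paper's exact construction), a Schott-type statistic for block independence can be sketched as the sum of squared sample correlations over the off-diagonal blocks of the correlation matrix; the function name and interface below are hypothetical:

```python
import numpy as np

def schott_type_statistic(X, block_sizes):
    """Sum of squared sample correlations across the off-diagonal
    blocks of the correlation matrix of the columns of X.
    Large values suggest dependence between the sub-vectors."""
    R = np.corrcoef(X, rowvar=False)
    edges = np.cumsum([0] + list(block_sizes))
    k = len(block_sizes)
    stat = 0.0
    for i in range(k):
        for j in range(i + 1, k):
            Rij = R[edges[i]:edges[i + 1], edges[j]:edges[j + 1]]
            stat += np.sum(Rij ** 2)
    return stat
```

Under independence each cross-block correlation is of order $n^{-1/2}$, so the statistic stays small; strong dependence between blocks inflates it.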
May 11, 2015
When can reliable inference be drawn in the "Big Data" context? This paper presents a framework for answering this fundamental question in the context of correlation mining, with implications for general large scale inference. In large scale data applications like genomics, connectomics, and eco-informatics the dataset is often variable-rich but sample-starved: a regime where the number $n$ of acquired samples (statistical replicates) is far fewer than the number $p$ of obser...
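To make the correlation-mining setting concrete, here is a minimal sketch of correlation screening in the sample-starved regime ($n \ll p$): flag all variable pairs whose absolute sample correlation exceeds a threshold. The function name and threshold choice are illustrative, not taken from the paper:

```python
import numpy as np

def correlation_screen(X, rho):
    """Return the variable pairs (i, j), i < j, whose absolute
    sample correlation exceeds the threshold rho."""
    R = np.corrcoef(X, rowvar=False)
    i, j = np.triu_indices(R.shape[1], k=1)
    keep = np.abs(R[i, j]) > rho
    return list(zip(i[keep].tolist(), j[keep].tolist()))
```

When $n \ll p$, spurious large correlations proliferate, which is exactly why a principled framework for choosing the threshold is needed.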
February 28, 2020
This paper investigates a statistical procedure for testing the equality of two independent estimated covariance matrices when the number of potentially dependent data vectors is large and proportional to the size of the vectors, that is, the number of variables. Inspired by the spike models used in random matrix theory, we concentrate on the largest eigenvalues of the matrices in order to determine significance. To avoid false rejections we must guard against residual spikes...
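A minimal sketch of the eigenvalue-based comparison (assuming one covariance is invertible; this is an illustrative statistic, not the paper's exact procedure): compute the largest eigenvalue of $S_2^{-1/2} S_1 S_2^{-1/2}$, which equals 1 when the two matrices agree:

```python
import numpy as np

def largest_relative_eigenvalue(S1, S2):
    """Largest eigenvalue of S2^{-1/2} S1 S2^{-1/2}.
    Values far from 1 indicate the covariances differ
    in their leading directions."""
    w, V = np.linalg.eigh(S2)
    S2_inv_half = V @ np.diag(w ** -0.5) @ V.T
    M = S2_inv_half @ S1 @ S2_inv_half
    return np.linalg.eigvalsh(M)[-1]
```

In the high-dimensional regime the null distribution of such extreme eigenvalues follows random-matrix (spiked-model) laws rather than classical asymptotics, which is what calibrating the test against residual spikes addresses.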
March 3, 2016
In this paper, we will introduce the so-called naive tests and give a brief review of recent developments. Naive testing methods are easy to understand and perform robustly, especially when the dimension is large. In this paper, we mainly focus on reviewing naive testing methods for the mean vectors and covariance matrices of high-dimensional populations, and believe this naive testing idea can be widely used in many other testing problems.
January 18, 2016
To model modern large-scale datasets, we need efficient algorithms to infer a set of $P$ unknown model parameters from $N$ noisy measurements. What are fundamental limits on the accuracy of parameter inference, given finite signal-to-noise ratios, limited measurements, prior information, and computational tractability requirements? How can we combine prior information with measurements to achieve these limits? Classical statistics gives incisive answers to these questions as ...
December 22, 2023
In this paper, we develop invariance-based procedures for testing and inference in high-dimensional regression models. These procedures, also known as randomization tests, provide several important advantages. First, for the global null hypothesis of significance, our test is valid in finite samples. It is also simple to implement and comes with finite-sample guarantees on statistical power. Remarkably, despite its simplicity, this testing idea has escaped the attention of ea...
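A generic randomization test for the global null in regression can be sketched as follows (a simplified permutation variant for intuition only; the paper's invariance-based procedures are more general, and the statistic below is an arbitrary illustrative choice):

```python
import numpy as np

def permutation_global_null_test(X, y, n_perm=999, seed=0):
    """Permutation p-value for H0: y is unrelated to all columns of X.
    Illustrative statistic: squared norm of X'y after centering.
    The p-value (count + 1) / (n_perm + 1) is valid in finite samples
    under exchangeability of y."""
    rng = np.random.default_rng(seed)
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    stat = np.sum((Xc.T @ yc) ** 2)
    count = 1
    for _ in range(n_perm):
        yp = rng.permutation(yc)
        if np.sum((Xc.T @ yp) ** 2) >= stat:
            count += 1
    return count / (n_perm + 1)
```

Because the permutation distribution is computed exactly under the null, the test controls the type-I error in finite samples regardless of the dimension of $X$.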
November 8, 2021
Analytical understanding of how low-dimensional latent features reveal themselves in large-dimensional data is still lacking. We study this by defining a linear latent feature model with additive noise constructed from probabilistic matrices, and analytically and numerically computing the statistical distributions of pairwise correlations and eigenvalues of the correlation matrix. This allows us to resolve the latent feature structure across a wide range of data regimes set b...
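The setup can be sketched numerically (a generic linear latent feature model with additive noise; all dimensions and noise levels below are arbitrary choices for illustration): the leading eigenvalues of the correlation matrix separate from the noise bulk, one per latent feature.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, d = 500, 100, 3                  # samples, variables, latent features
L = rng.standard_normal((p, d))        # loading matrix
F = rng.standard_normal((n, d))        # latent feature realizations
X = F @ L.T + 0.5 * rng.standard_normal((n, p))  # data = signal + noise
R = np.corrcoef(X, rowvar=False)
eigs = np.sort(np.linalg.eigvalsh(R))[::-1]
# the d leading eigenvalues stand out from the random-matrix bulk
```

Varying $n/p$ and the noise level moves the system across the data regimes the paper analyzes, from well-separated spikes to spikes lost in the bulk.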
July 13, 2017
In this paper, we propose a novel variable selection approach in the framework of multivariate linear models, taking into account the dependence that may exist between the responses. It consists in first estimating the covariance matrix of the responses and then plugging this estimator into a Lasso criterion, in order to obtain a sparse estimator of the coefficient matrix. The properties of our approach are investigated both from a theoretical and a numerical point of view. More ...
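The plug-in idea can be sketched as follows (a minimal illustration, not the paper's estimator: the responses are whitened with the covariance estimate, and the resulting standard Lasso problem is solved here by plain ISTA; all names are hypothetical):

```python
import numpy as np

def whitened_lasso(X, Y, Sigma_hat, lam, n_iter=1000):
    """Whiten the multivariate responses Y with Sigma_hat^{-1/2},
    then solve the resulting Lasso problem
        min (1/2n)||Yw - X B||_F^2 + lam ||B||_1
    by ISTA (proximal gradient with soft-thresholding)."""
    w, V = np.linalg.eigh(Sigma_hat)
    Yw = Y @ (V @ np.diag(w ** -0.5) @ V.T)   # decorrelated responses
    n, p = X.shape
    t = n / np.linalg.norm(X, 2) ** 2          # step = 1 / Lipschitz const.
    B = np.zeros((p, Y.shape[1]))
    for _ in range(n_iter):
        grad = X.T @ (X @ B - Yw) / n
        B = B - t * grad
        B = np.sign(B) * np.maximum(np.abs(B) - lam * t, 0.0)
    return B
```

With independent responses (`Sigma_hat = I`) this reduces to an ordinary multi-task Lasso; correlated responses are downweighted along their shared directions before selection.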
May 29, 2020
This paper investigates a statistical procedure for testing the equality of two independently estimated covariance matrices when the number of potentially dependent data vectors is large and proportional to the size of the vectors, that is, the number of variables. Inspired by the spike models used in random matrix theory, we concentrate on the largest eigenvalues of the matrices in order to determine significant differences. To avoid false rejections we must guard against re...
October 9, 2012
This paper introduces a new framework to study the asymptotic behavior of the empirical distribution function (e.d.f.) of Gaussian vector components, whose correlation matrix $\Gamma^{(m)}$ is dimension-dependent. Hence, by contrast with the existing literature, the vector is not assumed to be stationary. Rather, we make a "vanishing second order" assumption ensuring that the covariance matrix $\Gamma^{(m)}$ is not too far from the identity matrix, while the behavior of the...