December 4, 2015
We examine the fundamental aspects of statistical mechanics, beginning with a discussion purely about probability, which we analyse from a Bayesian standpoint. We argue that the existence of a unique maximising probability distribution $\{p(j\vert K)\}$ for states labelled by $j$ given data $K$ implies that the corresponding maximal value of the information entropy $\sigma(\{p(j\vert K)\}) = -\sum_j p(j\vert K)\ln p(j\vert K)$ depends explicitly on the data at...
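As a small numerical companion to the entropy functional above (a sketch; the toy distributions are my own), note that with no information beyond normalization the uniform distribution maximizes $\sigma$:

```python
import numpy as np

def information_entropy(p):
    """sigma({p_j}) = -sum_j p_j ln p_j, with the convention 0 ln 0 = 0."""
    p = np.asarray(p, dtype=float)
    nz = p > 0
    return -np.sum(p[nz] * np.log(p[nz]))

uniform = np.ones(4) / 4                  # only normalization is known
skewed = np.array([0.7, 0.1, 0.1, 0.1])   # extra information lowers sigma

print(information_entropy(uniform))  # ln 4 ~ 1.386, the maximum for 4 states
print(information_entropy(skewed))   # ~ 0.940
```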
October 10, 2016
We consider the problem of Gaussian mixture clustering in the high-dimensional limit where the data consists of $m$ points in $n$ dimensions, $n,m \rightarrow \infty$ and $\alpha = m/n$ stays finite. Using exact but non-rigorous methods from statistical physics, we determine the critical value of $\alpha$ and the distance between the clusters at which it becomes information-theoretically possible to reconstruct the membership into clusters better than chance. We also determin...
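As a concrete toy picture of this limit (the scaling conventions below are illustrative, not the paper's, and `two_cluster_data` is a hypothetical name):

```python
import numpy as np

def two_cluster_data(n, alpha, sep, seed=0):
    """m = alpha*n points in n dimensions, two symmetric Gaussian clusters.

    The regime of interest in the abstract is n, m -> infinity with
    alpha = m/n held fixed; `sep` is the distance between the means."""
    rng = np.random.default_rng(seed)
    m = int(alpha * n)
    labels = rng.choice([-1, 1], size=m)   # hidden cluster memberships
    v = rng.standard_normal(n)
    v /= np.linalg.norm(v)                 # unit vector between the centroids
    X = (sep / 2) * labels[:, None] * v[None, :] + rng.standard_normal((m, n))
    return X, labels

X, z = two_cluster_data(n=1000, alpha=2.0, sep=2.0)
```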
November 26, 2005
In an age of increasingly large data sets, investigators in many different disciplines have turned to clustering as a tool for data analysis and exploration. Existing clustering methods, however, typically depend on several nontrivial assumptions about the structure of data. Here we reformulate the clustering problem from an information theoretic perspective which avoids many of these assumptions. In particular, our formulation obviates the need for defining a cluster "protot...
July 10, 2003
The problem of assigning probability distributions which objectively reflect the prior information available about experiments is one of the major stumbling blocks in the use of Bayesian methods of data analysis. In this paper the method of Maximum (relative) Entropy (ME) is used to translate the information contained in the known form of the likelihood into a prior distribution for Bayesian inference. The argument is inspired and guided by intuition gained from the successfu...
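For orientation, the generic maximum relative entropy update that the ME method builds on (standard material; the paper's specific construction of priors from likelihoods may differ) selects the distribution $p$ closest to a reference $q$ in the sense of

$$ S[p\,\Vert\, q] = -\int \mathrm{d}x\; p(x)\,\ln\frac{p(x)}{q(x)}, $$

and maximizing $S$ subject to normalization and a constraint $\int \mathrm{d}x\, p(x)\,f(x) = F$ gives

$$ p(x) = \frac{q(x)\,e^{-\lambda f(x)}}{Z(\lambda)}, \qquad Z(\lambda) = \int \mathrm{d}x\; q(x)\,e^{-\lambda f(x)}, $$

with the multiplier $\lambda$ fixed by the constraint.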
July 8, 2016
In cluster analysis interest lies in probabilistically capturing partitions of individuals, items or observations into groups, such that those belonging to the same group share similar attributes or relational profiles. Bayesian posterior samples for the latent allocation variables can be effectively obtained in a wide range of clustering models, including finite mixtures, infinite mixtures, hidden Markov models and block models for networks. However, due to the categorical n...
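Because the allocation labels are categorical, posterior samples suffer label switching; one standard relabeling-invariant summary (my illustration, not necessarily the method this paper develops) is the posterior similarity matrix:

```python
import numpy as np

def posterior_similarity(alloc_samples):
    """alloc_samples[s, i]: cluster label of item i in posterior sample s.

    Returns the n x n matrix of posterior co-clustering probabilities,
    which is invariant to relabeling of the clusters."""
    S, n = alloc_samples.shape
    psm = np.zeros((n, n))
    for z in alloc_samples:
        psm += (z[:, None] == z[None, :])
    return psm / S
```

A point-estimate partition can then be read off, e.g., by hierarchically clustering the dissimilarities $1 - \mathrm{PSM}$.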
December 11, 2013
We develop a partitioning algorithm for cluster analysis that is optimal in a natural sense, based on the densities of the observations under the different hypotheses. These densities may be characterized, for instance, by the multivariate so-called "quasi-Gaussian" distribution. We also describe possible applications in technical diagnostics, demography, and philology.
May 22, 2018
Unsupervised learning is widely recognized as one of the most important challenges facing machine learning nowadays. However, in spite of hundreds of papers on the topic being published every year, current theoretical understanding and practical implementations of such tasks, in particular of clustering, are very rudimentary. This note focuses on clustering. I claim that the most significant challenge for clustering is model selection. In contrast with other common computa...
October 15, 2013
We propose a novel method for clustering data which is grounded in information-theoretic principles and requires no parametric assumptions. Previous attempts to use information theory to define clusters in an assumption-free way are based on maximizing mutual information between data and cluster labels. We demonstrate that this intuition suffers from a fundamental conceptual flaw that causes clustering performance to deteriorate as the amount of data increases. Instead, we re...
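For reference, the objective these previous approaches maximize is the standard mutual information between data $X$ and cluster labels $C$ (written here for discrete variables):

$$ I(X;C) = H(C) - H(C \mid X) = \sum_{x,c} p(x,c)\,\ln\frac{p(x,c)}{p(x)\,p(c)}. $$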
September 6, 2002
Using a variational technique, we generalize the statistical physics approach of learning from random examples to make it applicable to real data. We demonstrate the validity and relevance of our method by computing approximate estimators for generalization errors that are based on training data alone.
January 10, 2020
Despite its well-known shortcomings, $k$-means remains one of the most widely used approaches to data clustering. Current research continues to tackle its flaws while attempting to preserve its simplicity. Recently, the \textit{power $k$-means} algorithm was proposed to avoid trapping in local minima by annealing through a family of smoother surfaces. However, the approach lacks theoretical justification and fails in high dimensions when many features are irrelevant. This pap...
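As a rough sketch of the annealing idea (the fixed-point update below is my own derivation from the power-mean objective, and `power_kmeans` is a hypothetical name, not the paper's implementation):

```python
import numpy as np

def power_kmeans(X, k, s0=-1.0, anneal=1.1, iters=30, seed=0):
    """Anneal a power-mean relaxation of the k-means objective.

    min_j ||x_i - theta_j||^2 is replaced by the power mean
    M_s(d_i1, ..., d_ik) = (mean_j d_ij^s)^(1/s); as s -> -infinity
    the smoothed surface approaches the k-means objective."""
    rng = np.random.default_rng(seed)
    theta = X[rng.choice(len(X), size=k, replace=False)].copy()
    s = s0
    for _ in range(iters):
        d = ((X[:, None, :] - theta[None, :, :]) ** 2).sum(-1) + 1e-8  # (n, k)
        inner = (d ** s).mean(axis=1, keepdims=True)                   # (n, 1)
        # weights from setting the gradient of sum_i M_s to zero
        w = inner ** (1.0 / s - 1.0) * d ** (s - 1.0)
        theta = (w[..., None] * X[:, None, :]).sum(0) / w.sum(0)[:, None]
        s *= anneal  # drive s toward -infinity (s0 < 0, anneal > 1)
    return theta

centers = power_kmeans(np.random.default_rng(1).standard_normal((200, 2)), k=3)
```

As $s \to -\infty$ the weights concentrate on each point's nearest center, so the updates reduce to Lloyd's algorithm on the original (non-smoothed) objective.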