July 20, 2023
Flexible modeling of how an entire distribution changes with covariates is an important yet challenging generalization of mean-based regression that has seen growing interest over the past decades in both the statistics and machine learning literature. This review outlines selected state-of-the-art statistical approaches to distributional regression, complemented with alternatives from machine learning. Topics covered include the similarities and differences between these approaches, extensions, properties and limitations, estimation procedures, and the availability of software. In view of the increasing complexity and availability of large-scale data, this review also discusses the scalability of traditional estimation methods, current trends, and open challenges. Illustrations are provided using data on childhood malnutrition in Nigeria and Australian electricity prices.
Similar papers
July 18, 2022
We develop a model-based boosting approach for multivariate distributional regression within the framework of generalized additive models for location, scale, and shape. Our approach enables the simultaneous modeling of all distribution parameters of an arbitrary parametric distribution of a multivariate response conditional on explanatory variables, while being applicable to potentially high-dimensional data. Moreover, the boosting algorithm incorporates data-driven variable...
July 22, 2024
We propose a novel regression adjustment method designed for estimating distributional treatment effect parameters in randomized experiments. Randomized experiments have been extensively used to estimate treatment effects in various scientific fields. However, to gain deeper insights, it is essential to estimate distributional treatment effects rather than relying solely on average effects. Our approach incorporates pre-treatment covariates into a distributional regression fr...
June 12, 2018
There is growing evidence that converting targets to soft targets in supervised learning can provide considerable gains in performance. Much of this work has considered classification, converting hard zero-one values to soft labels, for example by adding label noise, incorporating label ambiguity, or using distillation. In parallel, there is some evidence from a regression setting in reinforcement learning that learning distributions can improve performance. In this work, we inve...
December 23, 2017
Linear regression is a fundamental and popular statistical method. There are various kinds of linear regression, such as mean regression and quantile regression. In this paper, we propose a new one called distribution regression, which allows a broad spectrum of error distributions in linear regression. Our method uses a nonparametric technique to estimate regression parameters. Our studies indicate that our method provides a better alternative than mean regression and qua...
October 4, 2023
We propose a novel machine learning approach to probabilistic forecasting of hourly day-ahead electricity prices. In contrast to recent advances in data-rich probabilistic forecasting that approximate the distributions with some features such as moments, our method is non-parametric and selects the best distribution from all possible empirical distributions learned from the data. The model we propose is a multiple output neural network with a monotonicity adjusting penalty. S...
November 13, 2023
Structured additive distributional regression models offer a versatile framework for estimating complete conditional distributions by relating all parameters of a parametric distribution to covariates. Although these models efficiently leverage information in vast and intricate data sets, they often result in highly-parameterized models with many unknowns. Standard estimation methods, like Bayesian approaches based on Markov chain Monte Carlo methods, face challenges in estim...
September 4, 2021
This paper introduces readers to the new concept and methodology of confidence distributions and to modern-day distributional inference in statistics. This discussion should be of interest to readers who want to study statistical inference methodology in depth and to use distribution estimators in practice. We also include in the discussion the topic of generalized fiducial inference, a special type of modern distributional inference, and relate it to the c...
May 29, 2020
Random Forest (Breiman, 2001) is a successful and widely used regression and classification algorithm. Part of its appeal and reason for its versatility is its (implicit) construction of a kernel-type weighting function on training data, which can also be used for targets other than the original mean estimation. We propose a novel forest construction for multivariate responses based on their joint conditional distribution, independent of the estimation target and the data mod...
April 9, 2018
To obtain a probabilistic model for a dependent variable based on some set of explanatory variables, a distributional approach is often adopted where the parameters of the distribution are linked to regressors. In many classical models this only captures the location of the distribution but over the last decade there has been increasing interest in distributional regression approaches modeling all parameters including location, scale, and shape. Notably, so-called non-homogen...
August 24, 2023
The problem of modeling the relationship between univariate distributions and one or more explanatory variables has found increasing interest. Traditional functional data methods cannot be applied directly to distributional data because of their inherent constraints. Modeling distributions as elements of the Wasserstein space, a geodesic metric space equipped with the Wasserstein metric that is related to optimal transport, is attractive for statistical applications. Existing...