Flexible domain prediction using mixed effects random forests

Image credit: Krennmair & Schmid

Abstract

This paper promotes the use of random forests as versatile tools for estimating spatially disaggregated indicators in the presence of small area-specific sample sizes. Small area estimators are predominantly conceptualized within the regression setting and rely on linear mixed models to account for the hierarchical structure of the survey data. In contrast, machine learning methods offer non-linear and non-parametric alternatives, combining excellent predictive performance and a reduced risk of model misspecification. Mixed effects random forests combine the advantages of regression forests with the ability to model hierarchical dependencies. This paper provides a coherent framework based on mixed effects random forests for estimating small area averages and proposes a non-parametric bootstrap estimator for assessing the uncertainty of the estimates. We illustrate the advantages of our proposed methodology using Mexican income data from the state of Nuevo León. Finally, the methodology is evaluated in model-based and design-based simulations that compare it to traditional regression-based approaches for estimating small area averages.

Publication
In Journal of the Royal Statistical Society Series C (Applied Statistics)
Dr. Patrick Krennmair
Research Associate in Applied Statistics

I am working as a research associate at the Chair of Applied Statistics at Freie Universität Berlin and as a consultant for the statistical consulting unit fu:stat.
