Tree-Based Machine Learning in Small Area Estimation

Dr. Patrick Krennmair, Nora Würz, Timo Schmid

July 2022

Abstract

Reliable estimators of the spatial distribution of socio-economic indicators are essential for evidence-based policy-making. As the accuracy of direct estimates from survey data decreases at spatially finer target levels, small area estimation approaches are promising. In this article, we outline new approaches that combine small area methodology with machine learning methods. The presented semi-parametric approach avoids the assumptions of linear mixed models on which classical small area models rely and instead builds on random forests. These tree-based machine learning predictors offer robustness against outliers and implicit model selection. As in classical small area models, we account for hierarchically dependent data. We present point estimators that are applicable with access to full as well as aggregated auxiliary data, and we outline their uncertainty measures. We compare the methods in a reproducible, illustrative example using open-source income data from Austria.

Type

Journal article

Publication

In The Survey Statistician: Emerging Methods

Dr. Patrick Krennmair

Research Associate in Applied Statistics
