Research note: Imputing large group averages for missing data, using rural-urban continuum codes for density driven industry sectors


Understanding the effects and consequences of missing data imputation is vital to obtaining meaningful and reliable statistics and coefficients when examining any quantitatively based phenomenon. Over time, a series of sophisticated methods has been developed to handle missing data imputation; however, these methods may not always be appropriate or attainable. In such cases, more traditional approaches to missing data imputation must be employed, driven by the research project, the theoretical framework, and the data. In this research note we offer a brief account of one such instance, implementing a large-group mean imputation approach to handling missing data. The analysis is drawn from a much larger project and shows the effect of proper group selection on mean imputation, using a cross-validation approach based on the imputed data's relation to known values. Ultimately, the results show that the use of Rural-Urban Continuum codes is superior to the group means currently used in the U.S., thus introducing a new, and more efficient, approach to handling missing data with group-mean imputation. © 2009 Springer Science & Business Media BV.
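
The general idea of group-mean imputation checked against known values can be illustrated with a minimal Python sketch. This is not the authors' procedure: the function names, the toy values, the two-category "rural"/"metro" grouping standing in for Rural-Urban Continuum codes, and the state-style comparison grouping are all hypothetical assumptions made for illustration. The sketch masks values that are actually known, imputes them with the mean of the remaining observations in the same group, and compares the error under two candidate groupings.

```python
import math
from collections import defaultdict
from statistics import mean


def group_mean_impute(values, groups):
    """Replace None entries with the mean of the observed values in the same group."""
    observed = defaultdict(list)
    for value, group in zip(values, groups):
        if value is not None:
            observed[group].append(value)
    group_means = {g: mean(vs) for g, vs in observed.items()}
    # A group with no observed values cannot be imputed and stays None.
    return [
        value if value is not None else group_means.get(group)
        for value, group in zip(values, groups)
    ]


def holdout_rmse(values, groups, holdout_idx):
    """Mask known values, impute them, and return the RMSE against the true values."""
    masked = list(values)
    for i in holdout_idx:
        masked[i] = None
    imputed = group_mean_impute(masked, groups)
    squared_errors = [
        (imputed[i] - values[i]) ** 2
        for i in holdout_idx
        if imputed[i] is not None
    ]
    return math.sqrt(mean(squared_errors))


# Hypothetical toy data: a density-driven outcome for six counties.
employment = [10.0, 12.0, 11.0, 50.0, 52.0, 54.0]
# Assumed density-based grouping (a stand-in for Rural-Urban Continuum codes).
rucc_group = ["rural", "rural", "rural", "metro", "metro", "metro"]
# Assumed alternative grouping that is unrelated to density.
state_group = ["A", "B", "A", "B", "A", "B"]

holdout = [0, 3]  # indices of known values that are masked for validation
rmse_rucc = holdout_rmse(employment, rucc_group, holdout)
rmse_state = holdout_rmse(employment, state_group, holdout)
print(f"RMSE, density-based groups: {rmse_rucc:.2f}")
print(f"RMSE, alternative groups:   {rmse_state:.2f}")
```

In this constructed example the density-based grouping yields the smaller error simply because the toy outcome was built to vary with density; it says nothing about the magnitude of the effect reported in the note.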

Publication Title

Journal of Population Research