An experiment on selecting most informative variables in socio-economic data

L. Jenkins

Abstract


In many studies where data are collected on several variables, there is a motivation to find if fewer variables would provide almost as much information. Variance of a variable about its mean is the common statistical measure of information content, and that is used here. We are interested whether the variability in one variable is sufficiently correlated with that in one or more of the other variables that the first variable is redundant. We wish to find one or more ‘principal variables’ that sufficiently reflect the information content in all the original variables. The paper explains the method of principal variables and reports experiments using the technique to see if just a few variables are sufficient to reflect the information in 11 socioeconomic variables on 130 countries from a World Bank (WB) database. While the method of principal variables is highly successful in a statistical sense, the WB data varies greatly from year to year, demonstrating that fewer variables wo
uld be inadequate for this data.

Full Text:

PDF


DOI: https://doi.org/10.5784/19-0-181

Refbacks

  • There are currently no refbacks.





ISSN 2224-0004 (online); ISSN 0259-191X (print)

Powered by OJS and hosted by Stellenbosch University Library and Information Service since 2011.


Disclaimer:

This journal is hosted by the SU LIS on request of the journal owner/editor. The SU LIS takes no responsibility for the content published within this journal, and disclaim all liability arising out of the use of or inability to use the information contained herein. We assume no responsibility, and shall not be liable for any breaches of agreement with other publishers/hosts.

SUNJournals Help