A Latent Class Model for Multivariate Binary Data Subject to Missingness

Samah Zakaria, Mai Sherif Hafez, Ahmed M. Gad


When researchers wish to measure social phenomena that cannot be captured by a single variable, the appropriate statistical tool is a latent variable model, in which a number of manifest variables are used to define the latent phenomenon. The manifest variables may be incomplete because of different forms of non-response, which may or may not be random. In such cases, especially when the missingness is nonignorable, a missingness mechanism must be included in the model to obtain valid parameter estimates. In social surveys, categorical items are arguably the most common type of variable. We therefore propose a latent class model with two categorical latent variables: one represents the latent phenomenon of interest, and the other represents a respondent’s propensity to respond to survey items. All manifest items are assumed to be categorical. The proposed model incorporates a missingness mechanism that accounts for non-random forms of missingness by allowing the latent response propensity class to depend on the latent phenomenon under consideration, given a set of covariates. The model is estimated using the Expectation-Maximization (EM) algorithm and is applied to data from the 2014 Egyptian Demographic and Health Survey (EDHS14). Missing data are artificially created in order to study the results under three types of missingness: missing completely at random (MCAR), missing at random (MAR), and missing not at random (MNAR).
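The abstract describes the joint model and its EM estimation only in words. The sketch below is a minimal illustration under stated assumptions, not the authors' specification: it drops the covariates, treats all items as binary, assumes items are conditionally independent given the substantive class X and response indicators are conditionally independent given the response propensity class Z, and uses hypothetical function names (`fit_lca_with_response_propensity`, `simulate_mnar`) and toy parameter values. It shows how the E-step weights each respondent over the K×M combinations of X and Z, and how letting P(Z | X) vary with X makes the missingness nonignorable.

```python
import numpy as np


def fit_lca_with_response_propensity(Y, K=2, M=2, n_iter=500, tol=1e-8, seed=0):
    """EM for a simplified joint latent class model (hypothetical sketch, no covariates).

    Y : (n, J) array of 0/1 item responses with np.nan marking missing items.
    X in {0..K-1}: substantive latent class,   P(X = k)         = pi[k]
    Z in {0..M-1}: response propensity class,  P(Z = m | X = k) = tau[k, m]
    Items:      P(Y_j = 1 | X = k) = p[j, k]
    Responses:  P(R_j = 1 | Z = m) = q[j, m]   (R_j = 1 means item j was answered)
    """
    rng = np.random.default_rng(seed)
    R = ~np.isnan(Y)                    # response indicators (True = observed)
    Rf = R.astype(float)
    Y0 = np.where(R, Y, 0.0)            # placeholder for missing cells; masked out by Rf
    n, J = Y.shape
    pi = np.full(K, 1.0 / K)
    tau = np.full((K, M), 1.0 / M)
    p = rng.uniform(0.25, 0.75, size=(J, K))   # random start breaks label symmetry
    q = rng.uniform(0.50, 0.95, size=(J, M))
    prev_ll, ll = -np.inf, -np.inf
    for _ in range(n_iter):
        # E-step. A missing item drops out of the item part because summing over
        # its unobserved value gives 1 (conditional independence given X).
        log_fy = (Rf[:, :, None] * (Y0[:, :, None] * np.log(p)
                  + (1 - Y0[:, :, None]) * np.log(1 - p))).sum(axis=1)    # (n, K)
        log_fr = (Rf[:, :, None] * np.log(q)
                  + (1 - Rf)[:, :, None] * np.log(1 - q)).sum(axis=1)     # (n, M)
        log_joint = (np.log(pi)[None, :, None] + np.log(tau)[None, :, :]
                     + log_fy[:, :, None] + log_fr[:, None, :])           # (n, K, M)
        mx = log_joint.max(axis=(1, 2), keepdims=True)
        lik = np.exp(log_joint - mx)
        tot = lik.sum(axis=(1, 2), keepdims=True)
        w = lik / tot                                # posterior P(X=k, Z=m | data_i)
        ll = float((np.log(tot) + mx).sum())         # observed-data log-likelihood
        if ll - prev_ll < tol:
            break
        prev_ll = ll
        # M-step: closed-form weighted proportions.
        wx, wz = w.sum(axis=2), w.sum(axis=1)        # (n, K), (n, M)
        pi = wx.mean(axis=0)
        tau = w.sum(axis=0) / wx.sum(axis=0)[:, None]
        p = np.clip((Rf * Y0).T @ wx / (Rf.T @ wx), 1e-6, 1 - 1e-6)
        q = np.clip(Rf.T @ wz / wz.sum(axis=0)[None, :], 1e-6, 1 - 1e-6)
    return {"pi": pi, "tau": tau, "p": p, "q": q, "loglik": ll}


def simulate_mnar(n=2000, seed=1):
    """Toy data from the same simplified model (hypothetical parameter values).
    Missingness is nonignorable: R depends on Z, and Z depends on X."""
    rng = np.random.default_rng(seed)
    tau = np.array([[0.8, 0.2], [0.3, 0.7]])
    p = np.tile([0.9, 0.2], (5, 1))      # 5 items; class 0 endorses more often
    q = np.tile([0.95, 0.6], (5, 1))     # Z = 1 respondents skip more items
    X = (rng.uniform(size=n) < 0.4).astype(int)         # P(X = 1) = 0.4
    Z = (rng.uniform(size=n) < tau[X, 1]).astype(int)
    Y = (rng.uniform(size=(n, 5)) < p[:, X].T).astype(float)
    Y[rng.uniform(size=(n, 5)) >= q[:, Z].T] = np.nan    # drop unanswered items
    return Y
```

For example, `fit_lca_with_response_propensity(simulate_mnar(), K=2, M=2)` returns a dictionary of estimates. As with any latent class model, class labels are arbitrary and EM can stop at a local maximum, so in practice several random starts (different `seed` values) would be compared by `loglik`.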


Keywords: binary variables; latent class model; item non-response; non-random missingness; response propensity.


DOI: http://dx.doi.org/10.18517/ijaseit.11.5.14910


Published by INSIGHT - Indonesian Society for Knowledge and Human Development