Impact of Feature Selection and Data Augmentation for Pregnancy Risk Detection in Indonesia

Setio Basuki, Muhammad Irfan, Yufis Azhar


This paper aims to develop an automatic system for pregnancy risk detection in Indonesia. Because this is a sensitive field, the system requires a sophisticated approach to reach the required performance. Existing works were developed using small datasets and a limited set of classification features. Moreover, they treat all features equally, which makes it hard to interpret which features contribute most to the detection results. To address these issues, we propose combining a richer feature set, data augmentation methods, and feature selection techniques. We use all 118 pregnancy indicators and 400 instances collected from Puskesmas (community health centers) as the original dataset. Next, this dataset is used to build two data augmentation methods, i.e., Gaussian Mixture Model (GMM) and Conditional Tabular GAN (CTGAN). Each data augmentation method generates 2,000 new synthetic instances. Following this, five machine learning methods combined with three feature selection approaches, i.e., Recursive Feature Elimination (RFE), Random Forest, and Chi-Square, are applied to all datasets. Through experiments, we observed that feature selection techniques play an essential role in improving classification accuracy. While the GMM-based augmentation improved performance, the CTGAN-based synthetic dataset yielded low performance. The best accuracy across all experiment settings reached 95%. This accuracy was achieved by Random Forest combined with RFE on the GMM-based dataset using only five features. Another notable result is that both XGBoost and Decision Tree reached the same 95% accuracy on the GMM-based dataset with only nine features. The overall results show that appropriate data augmentation and feature selection matter for achieving better performance in this research.
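As a rough illustration of the Chi-Square feature selection named in the abstract, the following minimal Python sketch ranks categorical features by the Pearson chi-square statistic against the class label and keeps the top k. It is not the authors' implementation: the helper names (chi_square_score, select_top_k), the feature names, and the toy data are hypothetical and chosen only for illustration.

```python
from collections import Counter


def chi_square_score(feature, labels):
    """Pearson chi-square statistic between one categorical feature and the labels."""
    n = len(labels)
    observed = Counter(zip(feature, labels))   # counts per (feature value, class) cell
    feature_totals = Counter(feature)          # row totals of the contingency table
    label_totals = Counter(labels)             # column totals of the contingency table
    score = 0.0
    for f_value, f_count in feature_totals.items():
        for c_value, c_count in label_totals.items():
            expected = f_count * c_count / n   # expected count under independence
            score += (observed[(f_value, c_value)] - expected) ** 2 / expected
    return score


def select_top_k(columns, labels, k):
    """Return the names of the k features with the highest chi-square score."""
    scores = {name: chi_square_score(values, labels) for name, values in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Hypothetical toy data: 8 instances, binary risk label, 3 binary indicators.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
columns = {
    "high_blood_pressure": [1, 1, 1, 1, 0, 0, 0, 0],  # fully aligned with the label
    "anemia":              [1, 1, 1, 0, 1, 0, 0, 0],  # partially aligned
    "noise":               [1, 0, 1, 0, 1, 0, 1, 0],  # independent of the label
}

top_features = select_top_k(columns, labels, k=2)
print(top_features)
```

In this toy setting the indicator that is independent of the label receives a score of zero and is dropped, which mirrors the idea of keeping only a handful of informative indicators out of the full set.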


CTGAN; data augmentation; feature selection; pregnancy risk detection.








Published by INSIGHT - Indonesian Society for Knowledge and Human Development