Bayesian Model Averaging (BMA) Based on Logistic Regression for Gene Selection and Classification of Animal Tumor Disease on Microarray Data

Heri Kuswanto, Ika Nur Laily Fitriana


Tumors are among the deadly diseases frequently found in animals, yet identifying whether an animal has a tumor remains a major challenge. Tumor classification can be based on gene expression data, which comprise hundreds of genes measured on only a small number of samples. Data with this structure are called microarray data and are high-dimensional. Choosing a single model is problematic for high-dimensional data because it ignores model uncertainty. This research proposes Bayesian Model Averaging (BMA) to account for model uncertainty by averaging the posterior distributions of the best models, weighted by their posterior model probabilities. Because selecting the genes relevant to diagnosing animal tumors is essential, variable selection is required; the predictor variables are selected using the iterative BMA algorithm. The results show that, of 335 gene expressions, 12 genes were selected as relevant for classifying animals as tumor or normal. Moreover, of the 2^335 possible models, the 12 best were selected. The accuracy of the BMA results was assessed using the Brier score, whose value indicates that the BMA model classifies animals as having a tumor or not sufficiently well. The study shows that BMA with logistic regression has very good predictive performance; hence, the method can be applied to classify other diseases.
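The averaging step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes equal prior probabilities for all models, approximates posterior model probabilities from BIC values, and uses the mean-squared form of the Brier score. The function names and all numbers below are hypothetical and chosen only for illustration.

```python
import math


def posterior_model_probs(bics):
    """Approximate posterior model probabilities from BIC values.

    Assumption: equal prior probability for every model, so that
    P(M_k | data) is proportional to exp(-BIC_k / 2).
    """
    best = min(bics)  # subtract the minimum for numerical stability
    raw = [math.exp(-0.5 * (b - best)) for b in bics]
    total = sum(raw)
    return [r / total for r in raw]


def bma_predict(model_preds, weights):
    """Average per-model tumor probabilities, weighted by model probability.

    model_preds[k][i] is the predicted P(tumor) for sample i under model k.
    """
    n_samples = len(model_preds[0])
    return [
        sum(w * preds[i] for w, preds in zip(weights, model_preds))
        for i in range(n_samples)
    ]


def brier_score(pred, labels):
    """Mean squared difference between predicted probability and 0/1 label."""
    return sum((p - y) ** 2 for p, y in zip(pred, labels)) / len(labels)


# Hypothetical toy inputs: three candidate logistic models, three animals.
bics = [100.0, 102.0, 110.0]
model_preds = [
    [0.9, 0.2, 0.7],
    [0.8, 0.1, 0.6],
    [0.5, 0.5, 0.5],
]
labels = [1, 0, 1]  # 1 = tumor, 0 = normal

weights = posterior_model_probs(bics)
averaged = bma_predict(model_preds, weights)
score = brier_score(averaged, labels)
print(weights, averaged, score)
```

A lower Brier score indicates better-calibrated predictions; under this mean-squared form, 0 is a perfect score and 0.25 is what a constant prediction of 0.5 would obtain.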


Animal tumor; BMA; gene expression; microarray.








Published by INSIGHT - Indonesian Society for Knowledge and Human Development