Feature selection from colon cancer dataset for cancer classification using Artificial Neural Network

Md. Akizur Rahman, Ravie Chandren Muniyandi


In the fast-growing field of medicine, with its dynamic research demands, studies that deliver significant improvements to healthcare are imperative, especially in cancer research. This study contributes to such findings by making feature selection one of its major components. Feature selection has become a vital step in applying data mining algorithms effectively to real-world classification problems, and it has long been a focus of research interest. Although much research has been conducted in the field, studies reporting near-perfect accuracy remain limited; hence, more scientifically driven results are needed. Building on prior research on feature selection, the method used here was the product of careful selection and planning. Specifically, this study used feature selection to improve classification accuracy on a cancer dataset, proposing an Artificial Neural Network (ANN) for cancer classification with feature selection on the colon cancer dataset. Feature selection was performed with the best first search method in the Weka tool. The proposed approach achieved 98.4% accuracy for cancer classification after feature selection. The results show that feature selection improved classification accuracy in the experiment conducted on the colon cancer dataset, and they are comparable with those of other studies on colon cancer. This represents a further significant improvement and is promising for future applications.
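The abstract names best first search in Weka but does not state which subset evaluator it was paired with. As a hedged illustration only, the sketch below assumes a correlation-based (CFS-style) merit, an evaluator often paired with BestFirst in Weka, and uses plain Pearson correlation for simplicity. The function names (`pearson`, `cfs_merit`, `best_first_search`), the `max_stale` stopping rule and the toy data are hypothetical; this is not the authors' implementation.

```python
import math
from heapq import heappop, heappush


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def cfs_merit(subset, columns, labels):
    """CFS-style merit: rewards feature-class correlation, penalises feature-feature redundancy."""
    k = len(subset)
    if k == 0:
        return 0.0
    r_cf = sum(abs(pearson(columns[f], labels)) for f in subset) / k
    pairs = [(a, b) for i, a in enumerate(subset) for b in subset[i + 1:]]
    r_ff = (sum(abs(pearson(columns[a], columns[b])) for a, b in pairs) / len(pairs)) if pairs else 0.0
    return k * r_cf / math.sqrt(k + k * (k - 1) * r_ff)


def best_first_search(columns, labels, max_stale=5):
    """Forward best-first search over feature subsets; stops after max_stale non-improving nodes."""
    start = ()
    open_heap = [(-cfs_merit(start, columns, labels), start)]  # negated merit gives a max-heap
    visited = {start}
    best_subset, best_merit = start, float("-inf")
    stale = 0
    while open_heap and stale < max_stale:
        neg_merit, subset = heappop(open_heap)
        if -neg_merit > best_merit:
            best_subset, best_merit, stale = subset, -neg_merit, 0
        else:
            stale += 1
        # Expand the node by adding each feature not yet in the subset (forward direction).
        for f in range(len(columns)):
            if f in subset:
                continue
            child = tuple(sorted(subset + (f,)))
            if child in visited:
                continue
            visited.add(child)
            heappush(open_heap, (-cfs_merit(child, columns, labels), child))
    return list(best_subset), best_merit


if __name__ == "__main__":
    y = [0, 0, 1, 1, 0, 1]
    cols = [
        [0, 0, 1, 1, 0, 1],  # perfectly correlated with the label
        [1, 0, 1, 0, 1, 0],  # weakly (negatively) correlated
        [3, 1, 4, 1, 5, 9],  # weakly correlated noise
    ]
    print(best_first_search(cols, y))  # prints ([0], 1.0)
```

Best-first search keeps a priority queue of candidate subsets and always expands the most promising one, so it can backtrack from a non-improving branch instead of stopping as greedy forward selection would. Pearson correlation is a simplification here; reproducing the study's subset on the colon cancer data would require Weka itself or a vetted library, followed by training the ANN on the selected features.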


Artificial Neural Network; Classification; Feature Selection; Colon Cancer.

DOI: http://dx.doi.org/10.18517/ijaseit.8.4-2.6790



Published by INSIGHT - Indonesian Society for Knowledge and Human Development