Feature Selection and Classification Pairwise Combinations for High-dimensional Tumour Biomedical Datasets

Publication date: 11.04.2016

Schedae Informaticae, 2015, Volume 24, pp. 53–62

https://doi.org/10.4467/20838476SI.15.005.3027

Authors

Agnieszka Wosiak
Institute of Information Technology, Lodz University of Technology
Agata Dziomdziora
Institute of Information Technology, Lodz University of Technology

Abstract

This paper concerns the classification of high-dimensional, small-sample biomedical data and feature selection aimed at reducing the dimensionality of microarray data. It compares pairwise combinations of six classification strategies (decision trees, logistic model trees, Bayes networks, Naïve Bayes, k-nearest neighbours and the sequential minimal optimization (SMO) algorithm for training support vector machines) with seven attribute selection methods: Correlation-based Feature Selection, chi-squared, information gain, gain ratio, symmetrical uncertainty, ReliefF and SVM-RFE (Support Vector Machine-Recursive Feature Elimination). In this study, the SVM-RFE feature selection technique combined with the SMO classifier demonstrated the potential to classify both binary and multiclass high-dimensional sets of tumour specimens accurately and efficiently.
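
The pairwise design described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: only the method names come from the abstract. The functions mean_gap_selector, one_nn and loo_accuracy, the toy data, and the choice of leave-one-out evaluation are hypothetical stand-ins chosen to keep the example self-contained. The selector is re-run inside every fold, a common precaution with small samples so that the held-out specimen never influences which features are kept.

```python
from itertools import product

# Method names as listed in the abstract.
CLASSIFIERS = ["decision tree", "logistic model tree", "Bayes network",
               "Naive Bayes", "k-nearest neighbours", "SMO (SVM)"]
SELECTORS = ["CFS", "chi-squared", "information gain", "gain ratio",
             "symmetrical uncertainty", "ReliefF", "SVM-RFE"]

# Every selector paired with every classifier: 7 x 6 combinations.
PAIRS = list(product(SELECTORS, CLASSIFIERS))


def mean_gap_selector(X, y, k):
    """Hypothetical stand-in filter (binary labels 0/1 only):
    keep the k features with the largest gap between class means."""
    def gap(j):
        a = [row[j] for row, label in zip(X, y) if label == 0]
        b = [row[j] for row, label in zip(X, y) if label == 1]
        return abs(sum(a) / len(a) - sum(b) / len(b))
    return sorted(range(len(X[0])), key=gap, reverse=True)[:k]


def one_nn(train_X, train_y, x):
    """Hypothetical stand-in classifier: 1-nearest neighbour
    using squared Euclidean distance."""
    def dist(row):
        return sum((a - b) ** 2 for a, b in zip(row, x))
    best = min(range(len(train_X)), key=lambda i: dist(train_X[i]))
    return train_y[best]


def loo_accuracy(X, y, select, classify, k):
    """Leave-one-out accuracy of one (selector, classifier) pair.
    Features are re-selected on the training part of every fold."""
    hits = 0
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        keep = select(train_X, train_y, k)
        reduced_train = [[row[j] for j in keep] for row in train_X]
        reduced_test = [X[i][j] for j in keep]
        hits += classify(reduced_train, train_y, reduced_test) == y[i]
    return hits / len(X)


# Toy data (invented for illustration): feature 0 separates the two
# classes, features 1 and 2 are noise.
TOY_X = [[0.0, 0.3, 0.7], [1.0, 0.9, 0.1], [0.0, 0.5, 0.4],
         [10.0, 0.2, 0.8], [9.0, 0.8, 0.2], [10.0, 0.6, 0.5]]
TOY_Y = [0, 0, 0, 1, 1, 1]

toy_accuracy = loo_accuracy(TOY_X, TOY_Y, mean_gap_selector, one_nn, k=1)
print(len(PAIRS), "pairs; toy accuracy:", toy_accuracy)
```

In a full comparison, each entry of PAIRS would be mapped to a real selector and classifier implementation and passed through the same evaluation loop, so that all combinations are scored under identical conditions.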

Information

Article type: Original article

Article status: Open

Licence: None

Percentage share of authors:

Agnieszka Wosiak (Author) - 50%
Agata Dziomdziora (Author) - 50%

Publication languages:

English