ejournals

Information

Affiliation

Jagiellonian University in Kraków, Gołębia 24, 31-007 Kraków, Poland

Articles

Author

Mixture of Metrics Optimization for Machine Learning Problems

Marek Śmieja

Schedae Informaticae, Volume 24, 2015, pp. 83 - 92

https://doi.org/10.4467/20838476SI.15.008.3030

The selection of a data representation and metric for a given data set is one of the most crucial problems in machine learning, since it affects the results of classification and clustering methods. In this paper we investigate how to combine various data representations and metrics into a single function which reflects the relationships between data set elements better than a single representation-metric pair. Our approach relies on optimizing a linear combination of selected distance measures using least-squares approximation. The application of our method to the classification and clustering of chemical compounds seems to increase the accuracy of these methods.
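
As an illustration of the idea summarized in the abstract, the following is a minimal sketch of fitting a linear combination of distance matrices by least squares. It is not the paper's implementation: the function names `fit_metric_weights` and `combine`, the toy data, the choice of Euclidean and Manhattan distances, and the target dissimilarity (0 for pairs with the same label, 1 otherwise) are all assumptions made for this example.

```python
import numpy as np

def fit_metric_weights(distance_matrices, target):
    """Least-squares weights w so that sum_k w[k] * D_k approximates `target`.

    Hypothetical helper: the abstract does not specify the target or any
    constraints on the weights, so none are imposed here.
    """
    # Use only the upper triangle: the matrices are symmetric with zero diagonal.
    iu = np.triu_indices(target.shape[0], k=1)
    A = np.column_stack([D[iu] for D in distance_matrices])  # one column per metric
    b = target[iu]
    w, *_ = np.linalg.lstsq(A, b, rcond=None)
    return w

def combine(distance_matrices, w):
    """Weighted sum of the distance matrices."""
    return sum(wk * D for wk, D in zip(w, distance_matrices))

# Toy data (assumed): six points in 3-D with two class labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(6, 3))
labels = np.array([0, 0, 0, 1, 1, 1])

diff = X[:, None, :] - X[None, :, :]
D_euclid = np.linalg.norm(diff, axis=-1)   # Euclidean distances
D_manhattan = np.abs(diff).sum(axis=-1)    # Manhattan distances

# Assumed target: 0 for same-label pairs, 1 for different-label pairs.
target = (labels[:, None] != labels[None, :]).astype(float)

weights = fit_metric_weights([D_euclid, D_manhattan], target)
D_mix = combine([D_euclid, D_manhattan], weights)
```

The resulting `D_mix` could then be passed to any distance-based classifier or clustering method that accepts a precomputed dissimilarity matrix. Unconstrained least squares can return negative weights, so the result is not guaranteed to be a metric.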

Author

Regression SVM for Incomplete Data

Marek Śmieja

Schedae Informaticae, Volume 26, 2017, pp. 23 - 35

https://doi.org/10.4467/20838476SI.17.001.6807

The use of machine learning methods on incomplete data is an important task in many scientific fields, such as medicine, biology, and face recognition. Typically, missing values are substituted with artificial values estimated from the known samples, and classical machine learning algorithms are then applied. Although this methodology is very common, it produces less informative data, because artificially generated values are treated in the same way as the known ones. In this paper, we consider a probabilistic representation of missing data, where each vector is identified with a Gaussian probability density function that models the uncertainty of the absent attributes. This representation allows us to construct an analogue of the RBF kernel for incomplete data. We show that such a kernel can be successfully used in regression SVM. Experimental results confirm that our approach captures relevant information that is not captured by traditional imputation methods.
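
To make the idea of an RBF-like kernel on Gaussian-represented data concrete, here is a minimal sketch of one possible construction: the expected value of the ordinary RBF kernel when each incomplete vector is replaced by an independent Gaussian with diagonal covariance. This is an assumption for illustration and not necessarily the kernel defined in the paper. The names `to_gaussian` and `expected_rbf`, and the use of per-column means and variances for the missing attributes, are hypothetical.

```python
import numpy as np

def to_gaussian(x, col_mean, col_var):
    """Represent a vector with NaNs as a diagonal Gaussian (mean, variance).

    Assumption: a missing attribute gets the column mean and column variance;
    an observed attribute keeps its value with zero variance.
    """
    x = np.asarray(x, dtype=float)
    missing = np.isnan(x)
    mean = np.where(missing, col_mean, x)
    var = np.where(missing, col_var, 0.0)
    return mean, var

def expected_rbf(mean_a, var_a, mean_b, var_b, gamma=1.0):
    """E[exp(-gamma * ||x - y||^2)] for independent diagonal Gaussians x and y.

    Per dimension, x - y ~ N(d, s) with d = mean_a - mean_b and
    s = var_a + var_b, which gives
    (1 + 2*gamma*s)^(-1/2) * exp(-gamma * d^2 / (1 + 2*gamma*s)).
    """
    d = np.asarray(mean_a, dtype=float) - np.asarray(mean_b, dtype=float)
    s = np.asarray(var_a, dtype=float) + np.asarray(var_b, dtype=float)
    scale = 1.0 + 2.0 * gamma * s
    return float(np.prod(scale ** -0.5 * np.exp(-gamma * d ** 2 / scale)))

# Toy usage (assumed values): the second attribute of `a` is missing.
col_mean = np.array([0.0, 1.0])
col_var = np.array([1.0, 4.0])
a = to_gaussian([0.5, np.nan], col_mean, col_var)
b = to_gaussian([0.0, 2.0], col_mean, col_var)
k_ab = expected_rbf(*a, *b, gamma=0.5)
```

When all variances are zero the function reduces to the standard RBF kernel. A Gram matrix built from such values could, for example, be supplied to an SVM regressor that accepts precomputed kernels, such as scikit-learn's `SVR(kernel="precomputed")`.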
