Jacek Tabor
Schedae Informaticae, Volume 24, 2015, pp. 31-40
https://doi.org/10.4467/20838476SI.15.003.3025
This work presents a step-by-step tutorial on using cross-entropy clustering for iris segmentation. We present the detailed construction of a Gaussian model suited to iris images, which is the novelty of the proposed approach. The results are promising: both the pupil and the iris are extracted properly, and all the information necessary for human identification and verification can be extracted from the detected parts of the iris.
Jacek Tabor
Schedae Informaticae, Volume 24, 2015, pp. 21-29
https://doi.org/10.4467/20838476SI.15.002.3024
This paper presents a novel global thresholding algorithm for the binarization of documents and gray-scale images using Cross-Entropy Clustering. In the first step, a gray-level histogram is constructed and Gaussian densities are fitted to it. The thresholds are then determined as the cross-points of the Gaussian densities. This approach automatically detects the number of components (only an upper limit on the number of Gaussian densities is required).
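To make the cross-point step concrete, the sketch below computes where two weighted one-dimensional Gaussian densities intersect and picks the intersection lying between the two means as a threshold. It assumes the component weights, means, and standard deviations are already fitted; the Cross-Entropy Clustering fit itself is not shown, and the function names are hypothetical rather than taken from the paper.

```python
import math


def gaussian_cross_points(w1, m1, s1, w2, m2, s2):
    """Return the sorted x values where w1*N(x; m1, s1) == w2*N(x; m2, s2).

    Equating the log-densities gives a quadratic a*x^2 + b*x + c = 0.
    """
    a = 1.0 / (2 * s2**2) - 1.0 / (2 * s1**2)
    b = m1 / s1**2 - m2 / s2**2
    c = (m2**2 / (2 * s2**2) - m1**2 / (2 * s1**2)
         + math.log((w1 * s2) / (w2 * s1)))
    if abs(a) < 1e-12:
        # Equal variances: the equation is linear in x.
        if abs(b) < 1e-12:
            return []  # identical means as well: no isolated cross-point
        return [-c / b]
    disc = b * b - 4 * a * c
    if disc < 0:
        return []  # one component dominates everywhere
    root = math.sqrt(disc)
    return sorted([(-b - root) / (2 * a), (-b + root) / (2 * a)])


def threshold_between(w1, m1, s1, w2, m2, s2):
    """Pick the cross-point between the two means, or None if there is none."""
    lo, hi = min(m1, m2), max(m1, m2)
    inside = [x for x in gaussian_cross_points(w1, m1, s1, w2, m2, s2)
              if lo <= x <= hi]
    return inside[0] if inside else None
```

For two equally weighted components with equal spread, the threshold falls at the midpoint of the means, which is a quick sanity check for the formula.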
Jacek Tabor
Schedae Informaticae, Volume 28, 2019, pp. 25-47
https://doi.org/10.4467/20838476SI.19.002.14379
Independent Component Analysis (ICA) is a method for finding a linear transformation that minimizes the statistical dependence between the resulting components. The most popular ICA methods, such as FastICA and JADE, maximize kurtosis as a measure of independence (non-Gaussianity). However, their assumptions about the fourth-order moment (kurtosis) may not always be satisfied in practice. One possible solution is to use the third-order moment (skewness) instead of kurtosis, as is done in ICA_SG and EcoICA. In this paper we present a competitive approach to ICA based on the Split Generalized Gaussian distribution (SGGD), which is well adapted to heavy-tailed as well as asymmetric data. Consequently, we obtain a method that works better than the classical approaches in both cases: heavy tails and non-symmetric data.
Jacek Tabor
Schedae Informaticae, Volume 27, 2018, pp. 129-141
https://doi.org/10.4467/20838476SI.18.010.10415
In this paper we present a method for stitching aligned images that is given by a closed analytic formula.
It is obtained by choosing a statistically optimal global color change of one part of the image. Owing to its numerical efficiency, this approach is especially well suited to merging large numbers of satellite images into a single map.
Moreover, we present a solution to a more general problem: how to find an optimal shift v from V of the data Y, so that the dataset X, Y+v is maximally statistically consistent. We show that the solution is given in a closed analytic form.
Jacek Tabor
Schedae Informaticae, Volume 27, 2018, pp. 47-57
https://doi.org/10.4467/20838476SI.18.004.10409
In this paper, we propose LOSSGRAD (locally optimal step-size in gradient descent), a simple, fast, and easy-to-implement algorithm that automatically modifies the step-size in gradient descent during neural network training. Given a function $f$, a point $x$, and the gradient $\nabla_x f$ of $f$, we aim to find the step-size $h$ which is (locally) optimal, i.e. satisfies:
$$h = \arg\min_{t \ge 0} f(x - t \nabla_x f).$$
Using a quadratic approximation, we show that the algorithm satisfies the above condition. We experimentally show that our method is insensitive to the choice of initial learning rate while achieving results comparable to other methods.
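The idea of choosing a step-size from a quadratic approximation can be illustrated with a generic one-dimensional fit along the negative gradient. The sketch below is not the LOSSGRAD procedure as published. It assumes a single trial step `h`, fits a parabola to the function value and slope at the current point plus the function value at the trial point, and returns the minimizer of that parabola. The function name and the fallback rule for non-convex directions are hypothetical.

```python
def quadratic_step(f, x, grad, h):
    """Estimate the step t >= 0 minimizing phi(t) = f(x - t*grad).

    Fits q(t) = phi(0) + phi'(0)*t + c*t^2 using phi(0), phi'(0) = -|grad|^2
    and phi(h), then returns the minimizer of q.
    """
    gg = sum(g * g for g in grad)  # |grad|^2, so phi'(0) = -gg
    if gg == 0.0:
        return 0.0  # stationary point: no step needed
    f0 = f(x)
    fh = f([xi - h * gi for xi, gi in zip(x, grad)])
    c = (fh - f0 + gg * h) / (h * h)  # curvature coefficient of the parabola
    if c <= 0.0:
        # Assumed fallback: no convex fit along this direction, so grow the step.
        return 2.0 * h
    return gg / (2.0 * c)
```

When `f` is itself quadratic, the fit is exact and the returned value is the true optimal step along the gradient direction, whatever trial step is used.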
Jacek Tabor
Schedae Informaticae, Volume 26, 2017, pp. 23-35
https://doi.org/10.4467/20838476SI.17.001.6807
The use of machine learning methods in the case of incomplete data is an important task in many scientific fields, such as medicine, biology, or face recognition. Typically, missing values are substituted with artificial values estimated from the known samples, and classical machine learning algorithms are then applied. Although this methodology is very common, it produces less informative data, because artificially generated values are treated in the same way as the known ones. In this paper, we consider a probabilistic representation of missing data, where each vector is identified with a Gaussian probability density function that models the uncertainty of the absent attributes. This representation allows us to construct an analogue of the RBF kernel for incomplete data. We show that such a kernel can be successfully used in regression SVM. Experimental results confirm that our approach captures relevant information that is not captured by traditional imputation methods.
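One natural way to build an RBF-like kernel on Gaussian-represented points is to take the expected value of the ordinary RBF kernel over the two densities. The sketch below does this for diagonal covariances, where the expectation has a closed form. This is an illustrative assumption and may differ from the construction used in the paper. Observed attributes are given variance zero, and the mean and variance of a missing attribute are assumed to come from some estimate made elsewhere.

```python
import math


def expected_rbf(mean_x, var_x, mean_y, var_y, gamma):
    """Expected value of exp(-gamma * |X - Y|^2) for independent Gaussians.

    X ~ N(mean_x, diag(var_x)), Y ~ N(mean_y, diag(var_y)).
    Per coordinate, with d = mean difference and v = summed variance:
        E[exp(-gamma * z^2)] = exp(-gamma * d^2 / (1 + 2*gamma*v))
                               / sqrt(1 + 2*gamma*v)
    """
    k = 1.0
    for mx, vx, my, vy in zip(mean_x, var_x, mean_y, var_y):
        s = 1.0 + 2.0 * gamma * (vx + vy)
        k *= math.exp(-gamma * (mx - my) ** 2 / s) / math.sqrt(s)
    return k
```

With all variances set to zero the expression reduces to the standard RBF kernel, so fully observed vectors are treated exactly as in the classical setting, while uncertain attributes contribute a damped similarity.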
Jacek Tabor
Schedae Informaticae, Volume 27, 2018, pp. 69-79
https://doi.org/10.4467/20838476SI.18.006.10411
In this paper we discuss a class of autoencoder-based generative models that use a one-dimensional sliced approach. The idea is to reduce the discrimination between samples to the one-dimensional case.
Our experiments show that the methods can be divided into two groups. The first consists of methods that modify standard normality tests, while the second is based on classical distances between samples.
It turns out that both groups yield correct generative models, but the second one gives a slightly faster decrease of the Fréchet Inception Distance (FID).
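As an illustration of the second group, the sketch below estimates a sliced distance between two samples: each sample is projected onto random unit directions, and the resulting one-dimensional samples are compared by sorting. The choice of the squared 2-Wasserstein distance in one dimension is an assumption made for the example, not a claim about which distances the paper evaluates, and the function name and parameters are hypothetical.

```python
import math
import random


def sliced_wasserstein2(xs, ys, n_proj=50, seed=0):
    """Monte Carlo estimate of the squared sliced 2-Wasserstein distance.

    xs and ys are equal-size lists of points (lists of floats).
    """
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes samples of equal size")
    rng = random.Random(seed)
    dim = len(xs[0])
    total = 0.0
    for _ in range(n_proj):
        # Draw a random direction on the unit sphere.
        d = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        norm = math.sqrt(sum(c * c for c in d))
        d = [c / norm for c in d]
        # Project both samples onto the direction and sort them.
        px = sorted(sum(a * b for a, b in zip(p, d)) for p in xs)
        py = sorted(sum(a * b for a, b in zip(p, d)) for p in ys)
        # In 1-D, optimal transport between equal-size samples pairs sorted values.
        total += sum((a - b) ** 2 for a, b in zip(px, py)) / len(px)
    return total / n_proj
```

In an autoencoder setting, one sample would typically be a batch of latent codes and the other a batch drawn from the prior, with the distance added to the reconstruction loss. That usage is likewise an assumption here.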
Jacek Tabor
Schedae Informaticae, Volume 24, 2015, pp. 133-142
https://doi.org/10.4467/20838476SI.15.013.3035
We present a new subspace clustering method called SuMC (Subspace Memory Clustering), which makes it possible to efficiently divide a dataset $D \subset \mathbb{R}^N$ into $k \in \mathbb{N}$ pairwise disjoint clusters of possibly different dimensions. Since our approach is based on memory compression, we do not need to explicitly specify the dimensions of the groups: in fact, we only need to specify the mean number of scalars used to describe a data point. In the case of one cluster, our method reduces to the classical Karhunen-Loève (PCA) transform. We test our method on typical data from the UCI repository and on data coming from real-life experiments.