FAQ
logo of Jagiellonian University in Krakow

Volume 24

2015 Next

Publication date: 11.04.2016

Licence: None

Editorial team

Editor-in-Chief Stanisław Migórski

Deputy Editor-in-Chief Andrzej Bielecki

Secretary Krzysztof Misztal

Issue content

Stanisław Jastrzębski, Wojciech Marian Czarnecki

Schedae Informaticae, Volume 24, 2015, pp. 9 - 19

https://doi.org/10.4467/20838476SI.15.001.3023

Support Vector Machines (SVM) with RBF kernel is one of the most successful models in machine learning based compounds biological activity prediction. Unfortunately, existing datasets are highly skewed and hard to analyze. During our research we try to answer the question how deep is activity concept modeled by SVM. We perform analysis using a model which embeds compounds’ representations in a low-dimensional real space using near neighbour search with Jaccard similarity. As a result we show that concepts learned by SVM is not much more complex than slightly richer nearest neighbours search. As an additional result, we propose a classification technique, based on Locally Sensitive ashing approximating the Jaccard similarity through minhashing technique, which performs well on 80 tested datasets (consisting of 10 proteins with 8 different representations) while in the same time allows fast classification and efficient online training.

Read more Next

Mateusz Malik, Przemysław Spurek, Jacek Tabor

Schedae Informaticae, Volume 24, 2015, pp. 21 - 29

https://doi.org/10.4467/20838476SI.15.002.3024

This paper presents a novel global thresholding algorithm for the binarization of documents and gray-scale images using Cross-Entropy Clustering. In the first step, a gray-level histogram is constructed, and the Gaussian densities are fitted. The thresholds are then determined as the cross-points of the Gaussian densities. This approach automatically detects the number of components (the upper limit of Gaussian densities is required).

Read more Next

Krzysztof Misztal, Przemysław Spurek, Emil Saeed, Khalid Saeed, Jacek Tabor

Schedae Informaticae, Volume 24, 2015, pp. 31 - 40

https://doi.org/10.4467/20838476SI.15.003.3025

This work presents the step by step tutorial for how to use cross entropy clustering for the iris segmentation. We present the detailed construction of a suitable Gaussian model which best fits for in the case of iris images, and this is the novelty of the proposal approach. The obtained results are promising, both pupil and iris are extracted properly and all the information necessary for human identification and verification can be extracted from the found parts of the iris.

Read more Next

Andrzej Rusiecki, Mirosław Kordos

Schedae Informaticae, Volume 24, 2015, pp. 41 - 51

https://doi.org/10.4467/20838476SI.15.004.3026

Deep learning is a field of research attracting nowadays much attention, mainly because deep architectures help in obtaining outstanding results on many vision, speech and natural language processing – related tasks. To make deep learning effective, very often an unsupervised pretraining phase is applied. In this article, we present experimental study evaluating usefulness of such approach, testing on several benchmarks and different percentages of labeled data, how Contrastive Divergence (CD), one of the most popular pretraining methods, influences network generalization.

Read more Next

Agnieszka Wosiak, Agata Dziomdziora

Schedae Informaticae, Volume 24, 2015, pp. 53 - 62

https://doi.org/10.4467/20838476SI.15.005.3027

This paper concerns classification of high-dimensional yet small sample size biomedical data and feature selection aimed at reducing dimensionality of the microarray data. The research presents a comparison of pairwise combinations of six classification strategies, including decision trees, logistic model trees, Bayes network, Na¨ıve Bayes, k-nearest neighbours and sequential minimal optimization algorithm for training support vector machines, as well as seven attribute selection methods: Correlation-based Feature Selection, chi-squared, information gain, gain ratio, symmetrical uncertainty, ReliefF and SVM-RFE (Support Vector Machine-Recursive Feature Elimination). In this paper, SVMRFE feature selection technique combined with SMO classifier has demonstrated its potential ability to accurately and efficiently classify both binary and multiclass high-dimensional sets of tumour specimens.

Read more Next

Tomasz Andrysiak, Łukasz Saganowski

Schedae Informaticae, Volume 24, 2015, pp. 63 - 71

https://doi.org/10.4467/20838476SI.15.006.3028

In this article we present the use of sparse representation of a signal and incoherent dictionary learning method for the purpose of network traffic analysis. In learning process we use 1D INK-SVD algorithm to detect proper dictionary structure. Anomaly detection is realized by parameter estimation of the analyzed signal and its comparative analysis to network traffic profiles. Efficiency of our method is examined with the use of extended set of test traces from real network traffic. Received experimental results confirm effectiveness of the presented method.

Read more Next

Filipo S. Perotto

Schedae Informaticae, Volume 24, 2015, pp. 73 - 82

https://doi.org/10.4467/20838476SI.15.007.3029

Balancing exploratory and exploitative behavior is an essential dilemma faced by adaptive agents. The challenge of finding a good trade-off between exploration (learn new things) and exploitation (act optimally based on what is already known) has been largely studied for decision-making problems where the agent must learn a policy of actions. In this paper we propose the engaged climber method, designed for solving the exploration-exploitation dilemma. The solution consists in explicitly creating two different policies (for exploring or for exploiting), and to determine the good moments to shift from the one to the other by the use of notions like engagement and curiosity.

Read more Next

Magdalena Wiercioch, Marek Śmieja

Schedae Informaticae, Volume 24, 2015, pp. 83 - 92

https://doi.org/10.4467/20838476SI.15.008.3030

The selection of data representation and metric for a given data set is one of the most crucial problems in machine learning since it affects the results of classification and clustering methods. In this paper we investigate how to combine a various data representations and metrics into a single function which better reflects the relationships between data set elements than a single representation-metric pair. Our approach relies on optimizing a linear combination of selected distance measures with use of least square approximation. The application of our method for classification and clustering of chemical compounds seems to increase the accuracy of these methods.

Read more Next

Anna Szczepanek

Schedae Informaticae, Volume 24, 2015, pp. 93 - 101

https://doi.org/10.4467/20838476SI.15.009.3031

This paper presents a formalised description of the models of influence propagation in social networks introduced in the classic paper of Kempe et al. The formal framework that we propose clarifies the structure of the most popular propagation models and helps rigorously re-establish the essential results concerning the problem of influence maximisation. We also introduce new models of propagation and show how they fit into the general picture. In particular, we focus on models that capture either positive or negative effects of resisting influence on individual’s future resistance.

Read more Next

Andrzej Szwabe, Pawel Misiorek, Michal Ciesielczyk

Schedae Informaticae, Volume 24, 2015, pp. 103 - 112

https://doi.org/10.4467/20838476SI.15.010.3032

We propose a novel model of multilinear filtering based on a hierarchical structure of covariance matrices – each matrix being extracted from the input tensor in accordance to a specific set-theoretic model of data generalization, such as derivation of expectation values. The experimental analysis results presented in this paper confirm that the investigated approaches to tensor-based data representation and processing outperform the standard collaborative filtering approach in the ‘cold-start’ personalized recommendation scenario (of very sparse input data). Furthermore, it has been shown that the proposed method is superior to standard tensor-based frameworks such as N-way Random Indexing (NRI) and Higher-Order Singular Value Decomposition (HOSVD) in terms of both the AUROC measure and computation time.

Read more Next

Arkadiusz Tomczyk

Schedae Informaticae, Volume 24, 2015, pp. 113 - 121

https://doi.org/10.4467/20838476SI.15.011.3033

In the paper a contour ensemble image segmentation concept is presented. It bases on the previously observed relationship between contours and classifiers. Because of the specificity of the active contour segmentation the method requires a special procedure to obtain ensemble members with desired properties. In this work it is achieved by early stopping of randomized optimization algorithm. The results of the method are illustrated with a practical problem of heart ventricle segmentation by means of active potential contours. Automatically found contours may be of use in a process of pulmonary embolism diagnosis.

Read more Next

Wojciech Marian Czarnecki

Schedae Informaticae, Volume 24, 2015, pp. 123 - 132

https://doi.org/10.4467/20838476SI.15.012.3034

Multithreshold Entropy Linear Classifier (MELC) is a recent classifier idea which employs information theoretic concept in order to create a multithreshold maximum margin model. In this paper we analyze its consistency over multithreshold linear models and show that its objective function upper bounds the amount of misclassified points in a similar manner like hinge loss does in support vector machines. For further confirmation we also conduct some numerical experiments on five datasets.

Read more Next

Łukasz Struski, Jacek Tabor, Przemysław Spurek

Schedae Informaticae, Volume 24, 2015, pp. 133 - 142

https://doi.org/10.4467/20838476SI.15.013.3035

We present a new subspace clustering method called SuMC (Subspace Memory Clustering), which allows to efficiently divide a dataset D  RN into k 2 N pairwise disjoint clusters of possibly different dimensions. Since our approach is based on the memory compression, we do not need to explicitly specify dimensions of groups: in fact we only need to specify the mean number of scalars which is used to describe a data-point. In the case of one cluster our method reduces to a classical Karhunen-Loeve (PCA) transform. We test our method on some typical data from UCI repository and on data coming from real-life experiments.

Read more Next

Aleksander Czechowski, Piotr Zgliczyński

Schedae Informaticae, Volume 24, 2015, pp. 143 - 158

https://doi.org/10.4467/20838476SI.15.014.3486

We consider the Boussinesq PDE perturbed by a time-dependent forcing. Even though there is no smoothing effect for arbitrary smooth initial data, we are able to apply the method of self-consistent bounds to deduce the existence of smooth classical periodic solutions in the vicinity of 0. The proof is non-perturbative and relies on construction of periodic isolating segments in the Galerkin projections.

Read more Next

Oskar Wyszyński

Schedae Informaticae, Volume 24, 2015, pp. 159 - 177

https://doi.org/10.4467/20838476SI.15.015.3997

In this paper, a novel probabilistic tracking method is proposed. It combines two competing models: (i) a discriminative one for background classification; and (ii) a generative one as a track model. The model competition, along with a combinatorial data association, shows good signal and background noise separation. Furthermore, a stochastic and derivative-free method is used for parameter optimization by means of the Covariance Matrix Adaptation Evolutionary Strategy (CMA-ES). Finally, the applicability and performance of the particle trajectories reconstruction are shown. The algorithm is developed for NA61/SHINE data reconstruction purpose and therefore the method was tested on simulation data of the NA61/SHINE experiment.

Read more Next

Tomasz Wojtowicz

Schedae Informaticae, Volume 24, 2015, pp. 179 - 195

https://doi.org/10.4467/20838476SI.16.016.4357

Transistor level simulation of the CPU, while very accurate, brings also the performance challenge. MOS6502 CPU simulation algorithm is analysed with several optimisation techniques proposed. Application of these techniques improved the transistor level simulation speed by a factor of 3–4, bringing it to the levels on par with fastest RTL-level simulations so far.

Read more Next

Tomasz Wojtowicz

Schedae Informaticae, Volume 24, 2015, pp. 197 - 210

https://doi.org/10.4467/20838476SI.16.017.4358

Deep understanding of microprocessor architecture, its internal structure and mechanics of its work is essential for engineers in the fields like computer science, integrated circuit design or embedded systems (including microcontrollers). Usually the CPU architecture is presented at the level of ISA, functional decomposition of the chip and data flows. In this paper we propose more tangible, interactive and effective approach to present the CPU microarchitecture. Based on the recent advancements in simulation of MOS6502, one of the most successful microprocessor of all times, that started the personal computing revolution, we present the CPU visualisation framework. The framework supports showing CPU internals at various levels (from single transistor, through logic gates, ending with registers, operation decoders and ALU). It allows for execution of real code and detailed analysis of fetch–decode–execute cycle, measurement of cycles per operation or measurement of the CPU activity factor. The analysis means provided by this framework will also enable us to propose the transistor level simulation speed improvements to the model in the future.

Read more Next

Tomasz Wójtowicz

Schedae Informaticae, Volume 24, 2015, pp. 211 - 220

https://doi.org/10.4467/20838476SI.16.018.4359

This paper presents an optimal scheduling solution for a case of agents sharing a resource. The amount of resource can not satisfy all agents at once and in case of runout there is a penalty. Each agent randomly toggle its state between requiring and not requiring the resource. Using the knowledge of previous state and probability of change, the scheduling algorithm is able to calculate optimal number of concuring agents for one turn, that minimizes possibility of collision yet provides as much throughput as possible. Several different scheduling strategies are tested. The optimal solution adapts automatically to the value of probability of change. Further experiments show that optimality is retained if only the average probability of a set of agents is known. A case of practical application is provided.

Read more Next

Alexander Meduna, Jiřĭ Kučera

Schedae Informaticae, Volume 24, 2015, pp. 221 - 237

https://doi.org/10.4467/20838476SI.16.019.4360

In this paper, we introduce a new kind of automata systems, called state-synchronized automata systems of degree n. In general, they consists of n pushdown automata, referred to as their components. These systems can perform a computation step provided that the concatenation of the current states of all their components belongs to a prescribed control language. As its main result, the paper demonstrates that these systems characterize the family of recursively enumerable languages. In fact, this characterization is demostrated in both deterministic and nondeterministic versions of these systems. Restricting their components, these systems provides less computational power.

Read more Next

Sławomir Bakalarski, Jakub Zygadło

Schedae Informaticae, Volume 24, 2015, pp. 239 - 251

https://doi.org/10.4467/20838476SI.16.020.4361

A subset S of vertices of a graph G = (V,E) is called a k-path vertex cover if every path on k vertices in G contains at least one vertex from S. Denote by Ψk (G) the minimum cardinality of a k-path vertex cover in G and form a sequence Ψ (G) = (Ψ1(G), Ψ2 (G), . . . , Ψ|V|(G)), called the path sequence of G. In this paper we prove necessary and sufficient conditions for two integers to appear on fixed positions in Ψ(G). A complete list of all possible path sequences (with multiplicities) for small connected graphs is also given.

Read more Next

Szymon Szomiński, Anna Plichta

Schedae Informaticae, Volume 24, 2015, pp. 253 - 260

https://doi.org/10.4467/20838476SI.16.021.4362

The paper concerns the construction scheme of Direct Digital Synthesis (DDS) generator based on widely developed Field Programmable Gate Arrays (FPGA) technology. Firstly, the division of all generators is presented regarding the fact whether they generate sinusoidal or non-sinusoidal signals and whether the they have positive or negative feedback. Next chapter concerns how to generate frequency directly and indirectly synthesis and how to integrate these types of sytheses into the hybrid one. According to the blueprint, DDS generator if fully digital, which facilitates the control enabling to change almost at will the oscillation while the system works. Next chapters contain the conclusions and some proposals regarding practical application. DDS synthesizer has been made in accordance to the assumptions of the project, which was confirmed in practice. Control program enables us to fuse some measuring systems together into one, which facilitates simultaneous gathering of the data from several devices. Hence, the overall efficiency rises and the costs of carried out experiments are reduced. In the case of foreigners, it will be supplied by editors.

Read more Next

Słowa kluczowe: Support Vector Machines, Locally Sensitive Hashing, Jaccard similarity, Cross-Entropy Clustering, thresholding, binarization, Otsu, CEC, cross entropy clustering, iris recognition, biometrics, neural networks, deep learning, restricted Boltzmann Machine, contrastive divergence, dictionary learning, sparse representation, anomaly detection, Dictionary learning, sparse representation, anomaly detection, learning and adaptation, Markovian Decision Process, Exploration- Exploitation Dilemma, metric learning, classification, clustering, chemical compound activity, fingerprint, social networks, influence propagation, influence maximisation, viral marketing, tensor-based data modeling, multilinear PCA, random indexing, dimensionality reduction, multilinear data filtering, higher-order SVD, active contours, classifier ensembles, information fusion, potential contours, multithreshold classifier, entropy, consistency, classification theory, subspace clustering, projection clustering, PCA, Boussinesq equation, ill-posed PDEs, periodic solutions, isolating segments, tracking, event, reconstruction, particle, high, energy, physics, HEP, NA61, SHINE, CERN, TPC, magnetic, field, CMA, evolutionary, strategy, bayes, generative, discriminative, CPU, microarchitecture, simulation, registers, pipeline, activity, 6502, 6502, CPU, microarchitecture, simulation, registers, pipeline, activity, hard real-time systems, scheduling, shared resource, optimization, round-robin, state-synchronized automata systems, automata systems, pushdown automata, determinism, recursively enumerable languages, k-path vertex cover, path sequence, list for small graphs, frequency synthesis, digital oscillations, field programmable gate arrays systems, FPGA