Stanisław Jastrzębski
Schedae Informaticae, Volume 24, 2015, pp. 9 - 19
https://doi.org/10.4467/20838476SI.15.001.3023
Support Vector Machines (SVM) with the RBF kernel are among the most successful models in machine-learning-based prediction of compounds' biological activity. Unfortunately, existing datasets are highly skewed and hard to analyze. In our research we try to answer the question of how deep the activity concept modeled by SVM really is. We perform the analysis using a model which embeds compounds' representations in a low-dimensional real space using nearest neighbour search with Jaccard similarity. As a result, we show that the concepts learned by SVM are not much more complex than a slightly richer nearest neighbour search. As an additional result, we propose a classification technique based on Locality-Sensitive Hashing, approximating the Jaccard similarity through the minhashing technique, which performs well on 80 tested datasets (consisting of 10 proteins with 8 different representations) while at the same time allowing fast classification and efficient online training.
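A minimal sketch of the minhashing idea underlying such a classifier, assuming compounds are represented as sets of "on" fingerprint bit indices; the hash family, signature length, and all names here are illustrative, not the paper's reference implementation:

```python
import numpy as np

def minhash_signature(features, num_hashes=128, seed=0):
    """MinHash signature for a set of (modest-sized) integer feature ids.

    The fraction of matching positions between two signatures is an
    unbiased estimate of the Jaccard similarity of the underlying sets.
    Using the same seed keeps the hash family fixed across calls,
    which is required for signatures to be comparable.
    """
    rng = np.random.RandomState(seed)
    prime = 2_147_483_647  # large prime for the universal hash family
    a = rng.randint(1, prime, size=num_hashes, dtype=np.int64)
    b = rng.randint(0, prime, size=num_hashes, dtype=np.int64)
    feats = np.asarray(list(features), dtype=np.int64)
    # h_i(x) = (a_i * x + b_i) mod prime; take the minimum over the set
    hashes = (a[:, None] * feats[None, :] + b[:, None]) % prime
    return hashes.min(axis=1)

def estimated_jaccard(sig_a, sig_b):
    return float(np.mean(sig_a == sig_b))

# Example: two fingerprints sharing 4 of 6 distinct bits
fp1 = {1, 5, 9, 42, 77}
fp2 = {1, 5, 9, 42, 100}
sig1 = minhash_signature(fp1)
sig2 = minhash_signature(fp2)
print(estimated_jaccard(sig1, sig2))  # close to |A ∩ B| / |A ∪ B| = 4/6
```

Because two signatures agree in roughly a Jaccard-similarity fraction of their positions, a nearest-neighbour classifier over fixed-length signatures approximates one over the full fingerprints at a fraction of the cost, and new examples can be hashed and indexed on the fly, which is what makes online training cheap.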
Stanisław Jastrzębski
Schedae Informaticae, Volume 27, 2018, pp. 143 - 153
https://doi.org/10.4467/20838476SI.18.011.10416
Natural language inference (NLI) is a central problem in natural language processing (NLP): predicting the logical relationship between a pair of sentences. Lexical knowledge, which represents relations between words, is often important for solving NLI problems. This knowledge can be accessed by using an external knowledge base (KB), but this is only possible when such a resource is available. Instead of using a KB, we propose a simple architectural change for attention-based models. We show that by adding a skip connection from the input to the attention layer we can better utilize the lexical knowledge already present in the pretrained word embeddings. Finally, we demonstrate that our strategy allows us to use an external source of knowledge in a straightforward manner, by incorporating a second word embedding space in the model.
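A minimal sketch of the proposed skip connection, assuming a standard dot-product soft-alignment layer over unbatched tensors; the function name and shapes are hypothetical, but the key step is that the alignment scores are computed on the raw pretrained embeddings concatenated with the encoder states:

```python
import torch

def attention_with_skip(embed_a, enc_a, embed_b, enc_b):
    """Soft alignment between two sentences where similarity is computed
    on [encoder output; raw word embedding] (the skip connection), so
    lexical relations stored in the pretrained embeddings directly
    influence the attention weights.

    Assumed shapes: embed_* is (len, d_emb), enc_* is (len, d_enc).
    """
    a = torch.cat([enc_a, embed_a], dim=-1)    # (len_a, d_enc + d_emb)
    b = torch.cat([enc_b, embed_b], dim=-1)    # (len_b, d_enc + d_emb)
    scores = a @ b.t()                         # (len_a, len_b) alignment scores
    attn_ab = torch.softmax(scores, dim=1)     # each word in A attends over B
    attn_ba = torch.softmax(scores.t(), dim=1) # each word in B attends over A
    # Context vectors are still built from the encoder states, as usual
    ctx_a = attn_ab @ enc_b
    ctx_b = attn_ba @ enc_a
    return ctx_a, ctx_b
```

A second embedding space, e.g. one trained on a knowledge base, could then be plugged in by concatenating it alongside the first inside `a` and `b`, without otherwise changing the model.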
Stanisław Jastrzębski
Schedae Informaticae, Volume 25, 2016, pp. 37 - 47
https://doi.org/10.4467/20838476SI.16.003.6184
There is a strong research effort towards developing models that can achieve state-of-the-art results without sacrificing interpretability and simplicity. One such model is the recently proposed Recursive Random Support Vector Machine (R2SVM), which is composed of stacked linear models. R2SVM was reported to learn deep representations outperforming many strong classifiers, such as Deep Convolutional Neural Networks. In this paper we analyze it from both a theoretical and an empirical perspective and show its important limitations. An analysis of a similar model, the Deep Representation Extreme Learning Machine (DrELM), is also included. It is concluded that these models, in their current form, achieve lower accuracy scores than a Support Vector Machine with the Radial Basis Function kernel.
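For context, a rough sketch of the R2SVM recursion analyzed here: each layer is a linear SVM whose class scores are randomly projected and used to shift the original input, following x_{l+1} = sigmoid(x_1 + beta * sum_i W_i o_i). Hyperparameters and helper names are illustrative, not the authors' reference implementation:

```python
import numpy as np
from sklearn.svm import LinearSVC

def train_r2svm(X, y, n_layers=5, beta=0.1, seed=0):
    """Stack linear SVMs; each layer sees the original input shifted by
    random projections of all previous layers' class scores, squashed
    by a sigmoid. Returns the per-layer models and projections."""
    rng = np.random.RandomState(seed)
    models, projections = [], []
    shift = np.zeros_like(X, dtype=float)
    X_l = X.astype(float)
    for _ in range(n_layers):
        clf = LinearSVC().fit(X_l, y)
        scores = clf.decision_function(X_l)  # (n_samples, n_classes)
        if scores.ndim == 1:                 # binary case: single score column
            scores = scores[:, None]
        W = rng.randn(scores.shape[1], X.shape[1])  # random projection
        models.append(clf)
        projections.append(W)
        shift = shift + scores @ W
        # Recursion: shift the *original* input, then squash
        X_l = 1.0 / (1.0 + np.exp(-(X + beta * shift)))
    return models, projections
```

Prediction replays the same recursion with the stored models and projections and reads the label off the last layer, which makes clear why the paper treats the construction as a perturbed linear model rather than a genuinely deep one.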