TY - JOUR
TI - Analysis of Compounds Activity Concept Learned by SVM Using Robust Jaccard Based Low-dimensional Embedding
AU - Jastrzębski, Stanisław
AU - Czarnecki, Wojciech Marian
AB - Support Vector Machines (SVM) with an RBF kernel are among the most successful models in machine-learning-based prediction of compounds’ biological activity. Unfortunately, existing datasets are highly skewed and hard to analyze. In this work we ask how deep the activity concept modeled by SVM is. We perform the analysis using a model that embeds compounds’ representations in a low-dimensional real space using near neighbour search with Jaccard similarity. As a result, we show that the concept learned by SVM is not much more complex than a slightly richer nearest neighbours search. As an additional result, we propose a classification technique based on Locally Sensitive Hashing, approximating the Jaccard similarity through the minhashing technique, which performs well on 80 tested datasets (consisting of 10 proteins with 8 different representations) while at the same time allowing fast classification and efficient online training.
VL - 2015
IS - Volume 24
PY - 2016
SN - 1732-3916
C1 - 2083-8476
SP - 9
EP - 19
DO - 10.4467/20838476SI.15.001.3023
UR - https://ejournals.eu/en/journal/schedae-informaticae/article/analysis-of-compounds-activity-concept-learned-by-svm-using-robust-jaccard-based-low-dimensional-embedding
KW - Support Vector Machines
KW - Locally Sensitive Hashing
KW - Jaccard similarity