Artur Niewiarowski
Technical Transactions, Fundamental Sciences Issue 3 NP (17) 2014, 2014, pp. 109-112
https://doi.org/10.4467/2353737XCT.14.319.3407
This paper presents a method for parallelizing the Levenshtein distance algorithm applied to very large strings. The approach was implemented in .NET Framework 4.0, with threads implemented using the System.Threading.Tasks namespace library. The algorithms developed in this study were tested on a high-performance machine running Xamarin Mono (on Linux RedHat/Fedora). The computational results demonstrate a high level of efficiency of the proposed parallelization procedure.
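The abstract does not state how the work is divided among threads, so the following Python sketch only illustrates one common way the Levenshtein recurrence can be exposed to parallel execution, and it should not be read as the paper's method. It assumes an anti-diagonal (wavefront) traversal: every cell with the same i + j depends only on the two previous anti-diagonals, so the cells of one anti-diagonal are independent of each other. The function name levenshtein_wavefront is hypothetical, and the inner loop runs sequentially here.

```python
def levenshtein_wavefront(a: str, b: str) -> int:
    """Levenshtein distance, filled in anti-diagonal order (illustrative sketch)."""
    n, m = len(a), len(b)
    # dist[i][j] holds the edit distance between a[:i] and b[:j].
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dist[i][0] = i  # delete i characters
    for j in range(m + 1):
        dist[0][j] = j  # insert j characters

    # Visit cells by anti-diagonal d = i + j.
    for d in range(2, n + m + 1):
        i_lo = max(1, d - m)
        i_hi = min(n, d - 1)
        # Assumption: the iterations of this inner loop are independent,
        # so this is the part a task-based runtime could split across threads.
        for i in range(i_lo, i_hi + 1):
            j = d - i
            cost = 0 if a[i - 1] == b[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,         # deletion
                dist[i][j - 1] + 1,         # insertion
                dist[i - 1][j - 1] + cost,  # substitution or match
            )
    return dist[n][m]
```

For example, levenshtein_wavefront("kitten", "sitting") returns 3. For very large strings, a practical version would keep only the last three anti-diagonals in memory rather than the full table.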
Artur Niewiarowski
Technical Transactions, Fundamental Sciences Issue 1-NP 2016, 2016, pp. 159-173
https://doi.org/10.4467/2353737XCT.16.149.5760
This paper proposes a method of comparing short texts that uses the Levenshtein distance algorithm and a thesaurus to analyse the terms contained in the texts, instead of the popular methods that rely on a glossary of grammatical variations. The tested texts contain a variety of nouns and verbs together with grammatical or orthographical mistakes. The similarity of such texts is estimated using the proposed algorithm. The described technique is compared with cosine distance, Dice distance, and Jaccard distance, each built on the term frequency method. The proposed method is competitive with well-known stemming and lemmatization algorithms.
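The baseline measures named in the abstract can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the paper's experimental setup: terms are obtained by lowercasing and splitting on whitespace, cosine similarity is computed on term-frequency vectors, and Dice and Jaccard are computed on the sets of distinct terms. All function names are hypothetical. The corresponding distance is taken as 1 minus the similarity.

```python
import math
from collections import Counter


def term_frequencies(text: str) -> Counter:
    """Count terms after lowercasing and whitespace splitting (an assumption)."""
    return Counter(text.lower().split())


def cosine_similarity(x: str, y: str) -> float:
    """Cosine of the angle between the two term-frequency vectors."""
    tx, ty = term_frequencies(x), term_frequencies(y)
    dot = sum(tx[t] * ty[t] for t in tx.keys() & ty.keys())
    norm = math.sqrt(sum(v * v for v in tx.values())) * math.sqrt(
        sum(v * v for v in ty.values())
    )
    return dot / norm if norm else 0.0


def dice_similarity(x: str, y: str) -> float:
    """Dice coefficient on the sets of distinct terms."""
    sx, sy = set(term_frequencies(x)), set(term_frequencies(y))
    total = len(sx) + len(sy)
    return 2 * len(sx & sy) / total if total else 0.0


def jaccard_similarity(x: str, y: str) -> float:
    """Jaccard index on the sets of distinct terms."""
    sx, sy = set(term_frequencies(x)), set(term_frequencies(y))
    union = sx | sy
    return len(sx & sy) / len(union) if union else 0.0
```

Because these measures match terms exactly, a misspelling such as "catt" for "cat" is counted as an entirely different term. That limitation is the kind of case the abstract addresses by comparing terms with the Levenshtein distance.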