Short text similarity algorithm based on the edit distance and thesaurus
Publication date: 14.12.2016
Czasopismo Techniczne, 2016, Nauki Podstawowe Zeszyt 1-NP 2016, pp. 159-173
https://doi.org/10.4467/2353737XCT.16.149.5760
This paper proposes a method for comparing short texts that uses the Levenshtein distance algorithm together with a thesaurus to analyse the terms contained in the texts, instead of the popular approach of relying on a glossary of grammatical variants. The tested texts contain a variety of nouns and verbs, together with grammatical or orthographic mistakes. The proposed algorithm estimates the similarity of such texts. The technique is compared with cosine distance, Dice distance, and Jaccard distance, each computed on term-frequency representations. The proposed approach is competitive with well-known stemming and lemmatization algorithms.
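The abstract does not give the algorithm itself, so the following is only a minimal sketch of how an edit distance and a thesaurus might be combined, not the authors' method. The toy `THESAURUS`, the `threshold` of 0.8, the helper names (`canonical`, `term_similarity`, `text_similarity`), and the symmetric best-match averaging are all assumptions made for illustration.

```python
from typing import Dict, List


def levenshtein(a: str, b: str) -> int:
    """Dynamic-programming edit distance (insert, delete, substitute)."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # delete ch_a
                               current[j - 1] + 1,       # insert ch_b
                               previous[j - 1] + cost))  # substitute
        previous = current
    return previous[-1]


def term_similarity(a: str, b: str) -> float:
    """Edit distance normalised to [0, 1]; 1.0 means identical terms."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


# Hypothetical toy thesaurus: each known term maps to a canonical synonym.
THESAURUS: Dict[str, str] = {
    "automobile": "car",
    "vehicle": "car",
    "purchase": "buy",
}


def canonical(term: str, threshold: float = 0.8) -> str:
    """Map a possibly misspelled term to a canonical synonym if one is close enough."""
    term = term.lower()
    best_key, best_score = None, 0.0
    for key in list(THESAURUS) + list(THESAURUS.values()):
        score = term_similarity(term, key)
        if score > best_score:
            best_key, best_score = key, score
    if best_key is not None and best_score >= threshold:
        return THESAURUS.get(best_key, best_key)
    return term  # unknown term: keep it unchanged


def text_similarity(text_a: str, text_b: str) -> float:
    """Symmetric average of each term's best match in the other text (assumed scoring)."""
    terms_a: List[str] = [canonical(t) for t in text_a.split()]
    terms_b: List[str] = [canonical(t) for t in text_b.split()]
    if not terms_a or not terms_b:
        return 0.0

    def directed(src: List[str], dst: List[str]) -> float:
        return sum(max(term_similarity(s, d) for d in dst) for s in src) / len(src)

    return (directed(terms_a, terms_b) + directed(terms_b, terms_a)) / 2


if __name__ == "__main__":
    print(text_similarity("I purchase an automobil", "I buy a car"))
```

In this sketch the misspelled "automobil" is first matched to the thesaurus entry "automobile" by edit distance and then replaced by its canonical synonym "car", so no stemmer or lemmatizer is involved. A real thesaurus and a tuned threshold would be needed before comparing against term-frequency baselines.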
Article type: Original research article
Institute of Computer Science, Faculty of Physics, Mathematics and Computer Science of Cracow University of Technology
Article status: Open
License: None
Authors' percentage share:
Article corrections: -
Publication languages: English
Number of views: 1584
Number of downloads: 1257