TY - JOUR
TI - Short text similarity algorithm based on the edit distance and thesaurus
AU - Niewiarowski, Artur
AB - This paper proposes a method for comparing short texts that uses the Levenshtein distance algorithm and a thesaurus to analyse the terms contained in the texts, instead of the popular methods that rely on a glossary of grammatical variations. The tested texts contain a variety of nouns and verbs together with grammatical and orthographical mistakes. The similarity of such texts is estimated with the proposed algorithm. The described technique is compared with the cosine, Dice and Jaccard distances computed on term-frequency representations. The proposed method is competitive with well-known stemming and lemmatization algorithms.
VL - 2016
IS - Fundamental Sciences Issue 1-NP 2016
PY - 2016
SN - 0011-4561
C1 - 2353-737X
SP - 159
EP - 173
DO - 10.4467/2353737XCT.16.149.5760
UR - https://ejournals.eu/en/journal/czasopismo-techniczne/article/short-text-similarity-algorithm-based-on-the-edit-distance-and-thesaurus
KW - Levenshtein distance algorithm
KW - the edit distance
KW - thesaurus
KW - the measure of texts similarity
KW - plagiarism detection
KW - text mining
KW - Natural Language Processing
KW - Natural Language Understanding
KW - stemming
KW - lemmatization