Experiments with language combinatorics in text classification: lessons learned and future
implications
Publication date: 22.11.2017
Czasopismo Techniczne, 2017, Volume 11 Year 2017 (114), pp. 183-197
https://doi.org/10.4467/2353737XCT.17.199.7428
This paper presents a meta-analysis of research conducted with language combinatorics (LC), a novel method of language model generation and feature extraction based on combinatorial manipulation of sentence elements (e.g., words). In recent years, LC has been applied to a number of text classification tasks, such as affect analysis, cyberbullying detection, and extraction of references to future events. We summarize two of the most extensive experiments and discuss general implications for future applications of the combinatorial language model.
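The abstract does not spell out the combinatorial procedure, so the following is only a minimal sketch of one plausible reading: treating a sentence as a sequence of elements and enumerating every ordered combination of them, with a `*` wildcard marking positions where elements were skipped. The function name `ordered_patterns`, the `max_len` parameter, and the wildcard convention are illustrative assumptions, not the authors' implementation.

```python
from itertools import combinations


def ordered_patterns(tokens, max_len=None):
    """Yield ordered combinations of tokens as pattern strings.

    Assumption (not stated in the abstract): elements keep their original
    order, and a '*' wildcard is inserted wherever elements were skipped.
    """
    n = len(tokens)
    max_len = n if max_len is None else min(max_len, n)
    for k in range(1, max_len + 1):
        # combinations() over positions preserves the sentence order
        for idx in combinations(range(n), k):
            parts = [tokens[idx[0]]]
            for prev, cur in zip(idx, idx[1:]):
                if cur - prev > 1:
                    parts.append("*")  # gap between non-adjacent elements
                parts.append(tokens[cur])
            yield " ".join(parts)


# Hypothetical example sentence, split on whitespace
patterns = list(ordered_patterns("what a nice day".split()))
```

A sentence of n distinct elements yields 2^n - 1 such patterns, which is why a length cap such as the hypothetical `max_len` is a natural design choice for longer sentences.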
Article type: Original research article
Department of Computer Science Kitami Institute of Technology, Japan
Article status: Open
License: None
Article corrections: -
Publication languages: English
Views: 1537
Downloads: 1036