Document content mining for authors’ identification task
Publication date: 2013
Technical Transactions, 2013, Automatic Control Issue 1-AC (2) 2013, pp. 3-15
https://doi.org/10.4467/2353737XCT.14.001.1989
Polish title: Eksploracja treści dokumentów w problemie identyfikacji autorów
This article addresses the problem of identifying an author by analyzing the content of documents. The approach relies on selecting appropriate features related to an author's specific use of grammatical structures, punctuation, and vocabulary, and then applying a chosen classification algorithm. The article first presents the text characteristics that can be used for this task, then reports the results of computational experiments covering feature selection and the evaluation of classification performance in the authors' identification problem. It closes with conclusions and proposals for further work in this research area.
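The two-stage idea described in the abstract (stylistic features first, a classifier second) could be sketched roughly as follows. This is only an illustration under stated assumptions, not the method used in the article: the short function-word list, the punctuation set, the type-token ratio as a vocabulary measure, and the nearest-centroid classifier are all hypothetical choices made for brevity. Features based on grammatical structures are left out because they would require a syntactic parser.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny feature sets (assumptions for illustration only).
FUNCTION_WORDS = ["the", "of", "and", "to", "a", "in", "that", "it", "is", "was"]
PUNCTUATION = [",", ".", ";", ":", "!", "?"]


def extract_features(text):
    """Map a text to a vector of simple stylistic features."""
    words = re.findall(r"[a-z']+", text.lower())
    n_words = max(len(words), 1)   # avoid division by zero on empty input
    n_chars = max(len(text), 1)
    word_counts = Counter(words)
    char_counts = Counter(text)
    # Relative frequency of each function word.
    features = [word_counts[w] / n_words for w in FUNCTION_WORDS]
    # Relative frequency of each punctuation mark.
    features += [char_counts[p] / n_chars for p in PUNCTUATION]
    # Type-token ratio as a crude vocabulary-richness measure.
    features.append(len(word_counts) / n_words)
    return features


def centroid(vectors):
    """Component-wise mean of equally long feature vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def train(samples):
    """samples maps an author label to a list of texts; returns label -> centroid."""
    return {
        author: centroid([extract_features(t) for t in texts])
        for author, texts in samples.items()
    }


def predict(model, text):
    """Return the author whose centroid is closest in Euclidean distance."""
    x = extract_features(text)
    return min(model, key=lambda author: math.dist(x, model[author]))


if __name__ == "__main__":
    # Toy training data with made-up author labels.
    model = train({
        "A": ["The cat sat on the mat, and it was happy."],
        "B": ["Why? Who knows! Nobody; really nobody: ask again?"],
    })
    print(predict(model, "It was the end of the day, and that was that."))
```

A nearest-centroid rule is used here only because it fits in a few lines; any standard classifier could take its place once the feature vectors are built.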
Article type: Original article
Department of Automatic Control and Information Technology, Faculty of Electrical and Computer Engineering, Cracow University of Technology; Systems Research Institute, Polish Academy of Sciences
Department of Automatic Control and Information Technology, Faculty of Electrical and Computer Engineering, Cracow University of Technology
Department of Automatic Control and Information Technology, Faculty of Electrical and Computer Engineering, Cracow University of Technology
Published at: 2013
Article status: Open
Licence: None
Publication languages: English