%0 Journal Article
%T Detection algorithm for content on Internet web portals
%A Ulman, Krzysztof
%A Rzecki, Krzysztof
%J Technical Transactions
%V 2012
%R 10.4467/2353737XCT.14.090.1867
%N Fundamental Sciences Issue 1-NP (18) 2012
%P 1-1
%K web pages contents recognition, data mining, web scraping, data collection, web pages structure analysis, HTML
%@ 0011-4561
%D 2012
%U https://ejournals.eu/en/journal/czasopismo-techniczne/article/detection-algorithm-for-content-on-internet-web-portals
%X The paper presents the steps taken in designing and implementing an algorithm for the automatic recognition of web page content, based on analysis of the HTML structure. Here, the content of a web page means the article text together with its headline, excluding any other text such as menus, advertisements, user comments, and image captions.