TY - JOUR
TI - Detection algorithm for content on Internet web portals
AU - Ulman, Krzysztof
AU - Rzecki, Krzysztof
AB - The paper presents the steps taken in designing and implementing an algorithm for automatic recognition of web page content, based on HTML structure analysis. The content of a web page is the article text with its headline, excluding any other text such as menus, advertisements, user comments, image captions, etc.
VL - 2012
IS - Nauki Podstawowe Zeszyt 1-NP (18) 2012
PY - 2012
SN - 0011-4561
C1 - 2353-737X
SP - 1
EP - 1
DO - 10.4467/2353737XCT.14.090.1867
UR - https://ejournals.eu/czasopismo/czasopismo-techniczne/artykul/detection-algorithm-for-content-on-internet-web-portals
KW - web pages contents recognition
KW - data mining
KW - web scraping
KW - data collection
KW - web pages structure analysis
KW - HTML