Was this the real Web? Quantitative overview of the Polish ccTLD Internet Archive data (1996–2001)

Publication date: 23.12.2021

Archeion, 2021, 122, pp. 44–68

https://doi.org/10.4467/26581264ARC.21.015.14495

Author

Marcin Wilkowski
University of Warsaw, ul. Krakowskie Przedmieście 30, 00-927 Warsaw, Poland
ORCID: https://orcid.org/0000-0003-2924-268X

Abstract

This article is an attempt to build a quantitative panorama of the Polish country code top-level domain (ccTLD) in the years 1996–2001 on the basis of data generously provided by the Internet Archive. The purpose of analyzing over 72 million captures is to show that these resources have limited potential in reconstructing the early Polish Web. The availability of historical Web resources and tools for their easy exploration in no way determines their potential value and usefulness in research, even if we do not have access to alternative sources.

Information

Article type: Original research article

Polish title: Czy to był prawdziwy Web? Ilościowy przegląd polskiej domeny krajowej w zbiorach Internet Archive (1996–2001)

Article status: Open access

License: CC BY-NC-ND

Author contribution: Marcin Wilkowski (Author), 100%

Article corrections: none

Publication language: English