
Archeion

Web archives as research infrastructure for digital societies: the case study of Arquivo.pt


Publication date: 14.11.2022

Archeion, 2022, 123, pp. 46–85

https://doi.org/10.4467/26581264ARC.22.012.16665

Authors

Daniel Gomes
Fundação para a Ciência e a Tecnologia: Arquivo.pt, Portugal
ORCID: https://orcid.org/0000-0002-5447-4581

Abstract

Humankind is the dominant species on Earth. Our dominance stems from a unique ability to organise at large scale to achieve common goals. In a digital society, all organisation requires the communication of information, and today most of it is published exclusively online. The problem is that online information disappears very quickly, within just a few months. Humanity's dependence on online information is very high and ongoing, yet the consequences of losing historical perspective on online data have not yet been studied. Web archives are digital preservation systems that collect, preserve and provide access to historical web data. They are used by researchers. However, to serve a digital society, web archives should also be used by a broader range of users. Arquivo.pt is a public web archive, launched in 2007, that supports research and provides access to historical web data preserved since the 1990s. The article presents the Arquivo.pt portal as a case study of research infrastructure developed to serve a wide range of users at national and international level. It presents the most important lessons that may help other web archiving initiatives to emerge and develop faster. It also describes existing tools and approaches for studying historical web collections. Finally, it presents the challenges involved in creating web archives and proposes actions in this area.

Bibliography

Ainsworth, S.G., Alsum, A., SalahEldeen, H., Weigle, M.C. and Nelson, M.L., 2011, June. How much of the web is archived? In Proceedings of the 11th annual international ACM/IEEE joint conference on Digital libraries (pp. 133–136).

Ainsworth, S.G., Nelson, M.L. and Van de Sompel, H., 2015. Evaluating the Temporal Coherence of Archived Pages.

Alam, S., Weigle, M., Nelson, M., Melo, F., Bicho, D. and Gomes, D., 2019, June. MementoMap framework for flexible and adaptive web archive profiling. In 2019 ACM/IEEE Joint Conference on Digital Libraries (JCDL) (pp. 172–181). IEEE.

AlSum, A., Weigle, M.C., Nelson, M.L. and Van de Sompel, H., 2014. Profiling web archive coverage for top-level domain and content language. International Journal on Digital Libraries, 14(3), pp. 149–166.

Ben-David, A., 2019. 2014 not found: a cross-platform approach to retrospective web archiving. Internet Histories, 3(3–4), pp. 316–342.

Ben-David, A., 2019. National web histories at the fringe of the Web: Palestine, Kosovo, and the quest for online self-determination. In The Historical Web and Digital Humanities (pp. 89–109). Routledge.

Ben-David, A. and Amram, A., 2018. The Internet Archive and the socio-technical construction of historical facts. Internet Histories, 2(1–2), pp. 179–201.

Bicho, D. and Gomes, D., 2016. Preserving Websites Of Research & Development Projects. In iPRES.

Brügger, N., 2005. Archiving Websites: General Considerations and Strategies.

Brügger, N., 2018. The archived web: doing history in the digital age. MIT Press.

Brügger, N. and Laursen, D. eds., 2019. The historical web and digital humanities: the case of national web domains. Routledge.

Brügger, N. and Milligan, I. eds., 2018. The SAGE handbook of web history. Sage.

Brügger, N. ed., 2010. Web history (Vol. 56). Peter Lang.

Brügger, N., Goggin, G., Milligan, I. and Schafer, V., 2017. Introduction: Internet histories. Internet Histories, 1(1–2), pp. 1–7.

Brügger, N., Locatelli, E., Weber, M. and Nanni, F., 2017. Web 25: histories from the first 25 years of the World Wide Web.

Classificação automática de artigos estigmatizantes de doenças mentais em jornais de notícias portugueses online, https://alina-yanchuk02.github.io/estigma/, accessed: 31 October 2022.

Costa, M., 2014. Information Search in Web Archives (Doctoral dissertation, Universidade de Lisboa (Portugal)).

Costa, M., Gomes, D. and Silva, M.J., 2017. The evolution of web archiving. International Journal on Digital Libraries, 18(3), pp. 191–205.

Cruz, D. and Gomes, D., 2013, September. Adapting search user interfaces to web archives. In Proc. of the 10th International Conference on Preservation of Digital Objects (Vol. 17).

Dados.gov.pt – Portal de dados abertos da Administração Pública, Arquivo.pt – pesquise páginas do passado, https://arquivo.pt/dadosabertos, accessed 31 October 2022.

Directive (EU) 2019/1024 of the European Parliament and of the Council of 20 June 2019 on open data and the re-use of public sector information, http://data.europa.eu/eli/dir/2019/1024/oj, accessed 31 October 2022.

Gomes, D. and Costa, M., 2014. The importance of web archives for humanities. International Journal of Humanities and Arts Computing, 8(1), pp. 106–123.

Gomes, D. and Silva, M.J., 2006, July. Modelling information persistence on the web. In Proceedings of the 6th international conference on Web engineering (pp. 193–200).

Gomes, D. and Silva, M.J., 2008. The Viúva Negra crawler: an experience report. Software: Practice and Experience, 38(2), pp. 161–188.

Gomes, D., Costa, M., Cruz, D., Miranda, J. and Fontes, S., 2013, May. Creating a billion-scale searchable web archive. In Proceedings of the 22nd International Conference on World Wide Web (pp. 1059–1066).

Gomes, D., Demidova, E., Winters, J. and Risse, T., 2021. Past Web. Springer International Publishing.

Gomes, D.C., 2006. Web Modelling for Web Warehouse Design (Doctoral dissertation, Universidade de Lisboa (Portugal)).

Graham, S., Milligan, I., Weingart, S.B. and Martin, K., 2016. Exploring big historical data: the historian’s macroscope.

Harari, Y.N., 2014. Sapiens: A brief history of humankind. Random House.

Hockx-Yu, H., Laursen, D. and Gomes, D., 2019. The curious case of archiving .eu. In The Historical Web and Digital Humanities (pp. 64–72). Routledge.

International Internet Preservation Consortium, SolrWayback 4.0 release! What’s it all about? Part 2, https://netpreserveblog.wordpress.com/2021/03/04/solrwayback-4-0-release-whats-it-all-about-part-2/, accessed 31 October 2022.

Internet Archive, Wayback Machine Save Page Now, https://web.archive.org/save/, accessed 31 October 2022.

ISO 28500:2017 Information and documentation — WARC file format.

Jones, S.M., Van de Sompel, H., Shankar, H., Klein, M., Tobin, R. and Grover, C., 2016. Scholarly context adrift: three out of four URI references lead to changed content. PloS one, 11(12).

Kahle, B., 1997. Preserving the internet. Scientific American, 276(3), pp. 82–83.

Klein, M. and Nelson, M.L., 2014. Moved but not gone: an evaluation of real-time methods for discovering replacement web pages. International Journal on Digital Libraries, 14(1), pp. 17–38.

Klein, M., Balakireva, L. and Van de Sompel, H., 2018, May. Focused crawl of web archives to build event collections. In Proceedings of the 10th ACM Conference on Web Science (pp. 333–342).

Masanès, J., 2006. Web archiving: issues and methods. In Web archiving (pp. 1–53). Springer, Berlin, Heidelberg.

Masanès, J., Major, D. and Gomes, D., 2021. The Past Web: A Look into the Future. In The Past Web (pp. 285–291). Springer.

Milligan, I., 2019. History in the age of abundance?: how the web is transforming historical research. McGill-Queen’s University Press.

Milligan, I., 2022. The Transformation of Historical Research in the Digital Age. Elements in Historical Theory and Practice.

Ministério da Educação e Ciência, Decreto-Lei n.º 55/2013, Diário da República, n.º 75/2013, Série I de 2013-04-17, páginas 2257–2261.

Miranda, J. and Gomes, D., 2009, November. Trends in Web characteristics. In 2009 Latin American Web Congress (pp. 146–153). IEEE.

Mourão, A. and Gomes, D., 2021. The Anatomy of a Web Archive Image Search Engine - Technical Report, https://sobre.arquivo.pt/wp-content/uploads/The_Anatomy_of_a_Web_Archive_Image_Search_Engine_tech_report-1.pdf, accessed 31 October 2022.

Quitney Anderson, J., 2009. Tim Berners-Lee launches “WWW Foundation” at IGF 2009, https://arstechnica.com/tech-policy/2009/11/tim-berners-lee-launches-www-foundation-at-igf-2009/, accessed 31 October 2022.

Ruest, N., Lin, J., Milligan, I. and Fritz, S., 2020, August. The archives unleashed project: Technology, process, and community to improve scholarly access to web archives. In Proceedings of the ACM/IEEE Joint Conference on Digital Libraries in 2020 (pp. 157–166), https://archivesunleashed.org/, accessed 31 October 2022.

SalahEldeen, H.M. and Nelson, M.L., 2013, May. Carbon dating the web: estimating the age of web resources. In Proceedings of the 22nd International Conference on World Wide Web (pp. 1075–1082).

Schafer, V. and Winters, J., 2021. The values of web archives. International Journal of Digital Humanities, 2(1), pp. 129–144.

Schroeder, R. and Brügger, N., 2017. The Web as History: Using Web Archives to Understand the Past and the Present (p. 296). UCL Press.

Sherratt, T. and Jackson, A., 2020. GLAM-Workbench/web-archives, https://glam-workbench.net/web-archives/, accessed 31 October 2022.

Spaniol, M., Mazeika, A., Denev, D. and Weikum, G., 2009, September. Catch me if you can: Visual analysis of coherence defects in web archiving. In 9th International Web Archiving Workshop (IWAW 2009), Corfu, Greece (pp. 27–37).

Upwork, How Much Does It Cost To Build a Website? (2022 Data), https://www.upwork.com/resources/how-much-does-it-cost-to-build-website, accessed 31 October 2022.

Van de Sompel, H., Nelson, M. and Sanderson, R., 2013. RFC 7089 - HTTP framework for time-based access to resource states - Memento. Internet Engineering Task Force (IETF), RFC.

Van de Sompel, H., Nelson, M.L., Sanderson, R., Balakireva, L.L., Ainsworth, S. and Shankar, H., 2009. Memento: Time travel for the web. arXiv preprint arXiv:0911.1112.

Winters, J., 2015. "Big UK Domain Data for the Arts and Humanities", Presentation, 2015 International Internet Preservation Coalition General Assembly, April 27-May 1, 2015. Silicon Valley, California, https://buddah.projects.history.ac.uk/, accessed 31 October 2022.

 

Internet sources:

Arquivo do Parlamento, https://arquivo-parlamento.pt/, accessed 31 October 2022.

Arquivo.pt, A first attempt to archive the .EU domain, https://sobre.arquivo.pt/en/a-first-attempt-to-archive-the-eu-domain/, accessed 31 October 2022.

Arquivo.pt, Arquivo.pt Application Programming Interfaces (APIs), https://arquivo.pt/api, accessed 31 October 2022.

Arquivo.pt, Arquivo.pt Awards, https://arquivo.pt/awards, accessed 31 October 2022.

Arquivo.pt, Arquivo.pt Memorial: preserves information of historical websites, https://arquivo.pt/memorialen, accessed 31 October 2022.

Arquivo.pt, Cross-lingual collection about the 2019 European Elections is available, https://sobre.arquivo.pt/en/cross-lingual-collection-about-the-2019-european-elections-is-available/, accessed 31 October 2022.

Arquivo.pt, Exhibitions, https://arquivo.pt/exhibitions/, accessed 31 October 2022.

Arquivo.pt, H2020 projects preserved by Arquivo.pt, https://sobre.arquivo.pt/en/h2020-projects-preserved-by-arquivo-pt/, accessed 31 October 2022.

Arquivo.pt, Open dataset about cryptocurrency, https://sobre.arquivo.pt/en/open-dataset-about-cryptocurrency/, accessed 31 October 2022.

Arquivo.pt, Publications, https://arquivo.pt/publications, accessed 31 October 2022.

Arquivo.pt, Put an end to “page not found” on your website, https://arquivo.pt/arquivo404en, accessed 31 October 2022.

Arquivo.pt, Recommendations for authors to enable web archiving, https://arquivo.pt/recommendations, accessed 31 October 2022.

Arquivo.pt, SavePageNow, https://arquivo.pt/savepagenow, accessed 31 October 2022.

Arquivo.pt, Search the Geocities history!, https://sobre.arquivo.pt/en/historical-collection-geocities-available-at-arquivo-pt/, accessed 31 October 2022.

Arquivo.pt, Suggest websites to be preserved – Collaborate, https://arquivo.pt/suggest, accessed 31 October 2022.

Arquivo.pt, Training courses, https://arquivo.pt/training, accessed 31 October 2022.

GitHub, Arquivo.pt, https://github.com/arquivo/, accessed 31 October 2022.

Memento Time Travel, http://timetravel.mementoweb.org/, accessed 31 October 2022.

Memória de festivais e eventos de arte, https://arteparasempre.wordpress.com/, accessed 31 October 2022.

MeuParlamento.pt, http://www.meuparlamento.pt/, accessed 31 October 2022.

Pywb, Configuring the Web Archive pywb 2.0 documentation, https://pywb.readthedocs.io/en/latest/manual/configuring.html#recording-mode, accessed 31 October 2022.

Webrecorder: Web archiving for all!, https://webrecorder.net/, accessed 31 October 2022.

Wikiquote, George Santayana, https://en.wikiquote.org/wiki/George_Santayana, accessed 31 October 2022.

Information

Information: Archeion, 2022, pp. 46–85

Article type: Original research article

Titles:

English:

Web archives as research infrastructure for digital societies: the case study of Arquivo.pt

Authors

Daniel Gomes
Fundação para a Ciência e a Tecnologia: Arquivo.pt, Portugal
ORCID: https://orcid.org/0000-0002-5447-4581

Published: 14.11.2022

Article status: Open

License: CC BY-NC-ND


Author contribution percentages:

Daniel Gomes (Author) - 100%

Article corrections:

-

Publication languages:

English

Views: 1150

Downloads: 465

Suggested citation: Chicago

Gomes, Daniel. "Web archives as research infrastructure for digital societies: the case study of Arquivo.pt." Archeion. Nov 14, 2022. https://ejournals.eu/czasopismo/archeion/artykul/web-archives-as-research-infrastructure-for-digital-societies-the-case-study-of-arquivo-pt