Artificial Intelligence and Machine Learning at the Intersection of Privacy and Archives

Publication date: 03.12.2024

Archeion, 2024, 125, pp. 55-78

https://doi.org/10.4467/26581264ARC.24.006.20201

Authors

Iori Khuhro
University of British Columbia, Canada
https://orcid.org/0009-0002-6403-4149

Erin Gilmore
San José State University, United States of America
https://orcid.org/0009-0008-0249-1954

Jim Suderman
InterPARES Trust AI Project, Canada

Darra L. Hofman
San José State University, United States of America
https://orcid.org/0000-0002-1772-6268


Abstract

As records are increasingly born digital – and thus, at least ostensibly, potentially much more accessible – archivists find themselves struggling to enable general access while providing appropriate privacy protections for the torrent of records being transferred to their care. In this article, the authors report the results of an integrative literature review study, examining the intersection of AI, archives, and privacy in terms of how archives are currently coping with these challenges and what role(s) AI might play in addressing privacy in archival records. The study revealed three major themes: 1) the challenges of – and possibilities beyond – defining “privacy” and “AI”; 2) the need for context-sensitive ways to manage privacy and access decisions; and 3) the lack of adequate “success measures” for ensuring the actual fitness for purpose of privacy AI solutions in the archival context.

Acknowledgements

The authors are grateful that this work was supported by International Research on Permanent Authentic Records in Electronic Systems (InterPARES) Trust AI, an international research partnership led by Drs. Luciana Duranti and Muhammad Abdul-Mageed, University of British Columbia. InterPARES Trust AI is supported in part by funding from the Social Sciences and Humanities Research Council of Canada (SSHRC). The authors would like to thank Kisun Kim (Okanagan College) and Carlos Quevedo, previous InterPARES Trust AI Graduate Research Assistants, for their contribution to this work.

References


Ardia D., Klinefelter A., Privacy and Court Records: An Empirical Study, “Berkeley Technology Law Journal” 2015, vol. 30, no. 3, pp. 1807–1898.

Baron J.R., Payne N., Dark archives and E-democracy: strategies for overcoming access barriers to the public record archives of the future [in:] Conference for E-Democracy and Open Government (CeDEM), eds. P. Parycek, N. Edelmann, Krems 2017, pp. 3–11.

Bingo S., Of Provenance and Privacy: Using Contextual Integrity to Define Third-Party Privacy, “The American Archivist” 2011, 74(2), pp. 506–521, https://doi.org/10.17723/aarc.74.2.55132839256116n4 [access: 5.11.2024].


Booms H., Überlieferungsbildung: keeping archives as a social and political activity, “Archivaria” 1991, vol. 33, pp. 25–33.

Desai M.A., Pasquetto I.V., Jacobs A.Z., Card D., An Archival Perspective on Pretraining Data, “Patterns” 2024, vol. 5, no. 4, pp. 1–11.

Fairfield J.A., “You Keep Using That Word”: Why Privacy Doesn’t Mean What Lawyers Think, “Osgoode Hall Law Journal” 2002, vol. 59, pp. 249–290.

Garat D., Wonsever D., Automatic Curation of Court Documents: Anonymizing Personal Data, “Information” 2022, vol. 13, no. 27, pp. 1–16, https://doi.org/10.3390/info13010027 [access: 5.11.2024].


Gichoya J.W., Thomas K., Celi L.A., Safdar N., Banerjee I., Banja J.D., Seyyed-Kalantari L., Trivedi H., Purkayastha S., AI pitfalls and what not to do: mitigating bias in AI, “The British Journal of Radiology” 2023, vol. 96, no. 1150, pp. 1–8, https://doi.org/10.1259/bjr.20230023 [access: 5.11.2024].


Glaser I., Schamberger T., Matthes F., Anonymization of German legal court rulings [in:] Proceedings of the Eighteenth International Conference on Artificial Intelligence and Law, New York 2021, pp. 205–209, https://doi.org/10.1145/3462757.3466087 [access: 5.11.2024].


Goldman B., Pyatt T.D., Security without obscurity: Managing personally identifiable information in born-digital archives, “Library & Archival Security” 2013, vol. 26, no. 1–2, pp. 37–55, https://doi.org/10.1080/01960075.2014.913966 [access: 5.11.2024].


Harris V., The archival sliver: power, memory, and archives in South Africa, “Archival Science” 2002, vol. 2, pp. 63–86.

Hartzog W., Privacy’s Blueprint: The Battle to Control the Design of New Technologies, Cambridge, Massachusetts 2018.

Heurix J., Zimmermann P., Neubauer T., Fenz S., A taxonomy for privacy enhancing technologies, “Computers & Security” 2015, vol. 53, pp. 1–17.

Hutchinson T., Protecting Privacy in the Archives: Preliminary Explorations of Topic Modeling for Born-Digital Collections [in:] Proceedings of the 2017 IEEE International Conference on Big Data, 25–30 June 2017, Honolulu, Hawaii, eds. G. Karypis, J. Zhang, Los Alamitos 2017, pp. 2251–2255, https://harvest.usask.ca/items/e237ebe9-5627-44ac-8b2f-a61fc2e4acc3 [access: 5.11.2024].

Hutchinson T., Protecting Privacy in the Archives: Supervised Machine Learning and Born-Digital Records [in:] Proceedings 2018 IEEE International Conference on Big Data, 10–13 December 2018, Seattle, eds. N. Abe, H. Liu, C. Pu, X. Hu, N. Ahmed, M. Qiao, Y. Song, D. Kossmann, B. Liu, K. Lee, J. Tang, J. He, J. Saltz, Piscataway 2018, pp. 2696–2701, https://doi.org/10.1109/BigData.2018.8621929 [access: 5.11.2024].


Jenkinson H., A Manual of Archive Administration: Including the Problems of War Archives and Archive Making, London 1922.

Koops B.J., Newell B.C., Timan T., Chokrevski T., A Typology of Privacy, “University of Pennsylvania Journal of International Law” 2017, vol. 38, no. 2, pp. 483–578.

LeClere E., Breaking Rules for Good? How Archivists Manage Privacy in Large-Scale Digitisation Projects, “Archives and Manuscripts” 2018, vol. 46, no. 3, pp. 289–308, https://doi.org/10.1080/01576895.2018.1547653 [access: 5.11.2024].


Lee C.A., Woods K., Automated redaction of private and personal data in collections [in:] Proceedings of Memory of the World in the Digital Age: Digitization and Preservation International Conference, eds. L. Duranti, E. Shaffer, Vancouver 2012, pp. 298–313, https://ils.unc.edu/callee/p298-lee.pdf [access: 5.11.2024].

Lemieux V.L., Werner J., Protecting Privacy in Digital Records: The Potential of Privacy-Enhancing Technologies, “Journal on Computing and Cultural Heritage” 2024, vol. 16, no. 4, article 83, pp. 1–18, https://doi.org/10.1145/3633477 [access: 5.11.2024].


Liu B., Ding M., Shaham S., Rahayu W., Farokhi F., Lin Z., When Machine Learning Meets Privacy: A Survey and Outlook, “ACM Computing Surveys” 2021, vol. 54, no. 2, article 31, pp. 1–36, https://doi.org/10.1145/3436755 [access: 5.11.2024].


Mordell D., Critical Questions for Archives as (Big) Data, “Archivaria” 2019, vol. 87, pp. 140–161.

Nissenbaum H.F., Privacy in context: Technology, policy, and the integrity of social life, Stanford 2009.

Ohm P., Broken Promises of Privacy: Responding to the Surprising Failure of Anonymization, “UCLA Law Review” 2010, vol. 57, pp. 1701–1777.

Oksanen A., Tamper M., Tuominen J., Hietanen A., Hyvönen E., ANOPPI: A pseudonymization service for Finnish court documents [in:] Legal Knowledge and Information Systems, eds. M. Araszkiewicz, V. Rodríguez-Doncel, Amsterdam 2019, pp. 251–254, https://helda.helsinki.fi/server/api/core/bitstreams/622773b4-8c6e-4558-8571-da432fe7ea8f/content [access: 5.11.2024].

Pasquale F., New laws of robotics: defending human expertise in the age of AI, Cambridge, Massachusetts 2020.

Rolan G., Humphries G., Jeffrey L., Samaras E., Antsoupova T., Stuart K., More human than human? Artificial intelligence in the archive, “Archives and Manuscripts” 2019, vol. 47, no. 2, pp. 179–203, https://doi.org/10.1080/01576895.2018.1502088 [access: 5.11.2024].


Sillitoe P., Privacy in a public place: Managing public access to personal information controlled by archives services, “Journal of the Society of Archivists” 1998, vol. 19, no. 1, pp. 5–15.

Silva P., Goncalves C., Godinho C., Antunes N., Curado M., Using NLP and Machine Learning to Detect Data Privacy Violations [in:] IEEE Conference on Computer Communications Workshops (INFOCOM WKSHPS), Toronto 2020, pp. 972–977, https://doi.org/10.1109/INFOCOMWKSHPS50562.2020.9162683 [access: 5.11.2024].


Snyder H., Literature review as a research methodology: An overview and guidelines, “Journal of Business Research” 2019, vol. 104, pp. 333–339, https://doi.org/10.1016/j.jbusres.2019.07.039 [access: 5.11.2024].


Solove D.J., A Taxonomy of Privacy, “University of Pennsylvania Law Review” 2006, vol. 154, no. 3, pp. 477–564, https://doi.org/10.2307/40041279 [access: 5.11.2024].


Solove D.J., Access and Aggregation: Public Records, Privacy, and the Constitution, “Minnesota Law Review” 2002, vol. 86, no. 6, pp. 1137–1209.

Tamper M., Oksanen A., Tuominen J.A., Hyvönen E.A., Hietanen A., Anonymization Service for Finnish Case Law: Opening Data without Sacrificing Data Protection and Privacy of Citizens, 2018, https://research.aalto.fi/en/publications/anonymization-service-for-finnish-case-law-opening-data-without-s [access: 5.11.2024].

Todd M., Power, Identity, Integrity, Authenticity, and the Archives: A Comparative Study of the Application of Archival Methodologies to Contemporary Privacy, “Archivaria” 2006, vol. 61, pp. 181–214.

Tzouganatou A., Openness and privacy in born-digital archives: reflecting the role of AI development, “AI & Society” 2022, vol. 37, pp. 991–999, https://doi.org/10.1007/s00146-021-01361-3 [access: 5.11.2024].


Westin A.F., Privacy and Freedom, New York 1967.

Yun H., Lee G., Kim D.J., A Chronological Review of Empirical Research on Personal Information Privacy Concerns: An Analysis of Contexts and Research Constructs, “Information & Management” 2019, vol. 56, no. 4, pp. 570–601, https://doi.org/10.1016/j.im.2018.10.001 [access: 5.11.2024].

Netography

Heaven W.D., What is AI?, “MIT Technology Review”, 10 July 2024, https://www.technologyreview.com/2024/07/10/1094475/what-is-artificial-intelligence-ai-definitive-guide/ [access: 5.11.2024].

Kundu R., F1 Score in Machine Learning: Intro & Calculation, 16 December 2022, https://www.v7labs.com/blog/f1-score-guide [access: 5.11.2024].

Ovide S., Why Google’s AI might recommend you mix glue into your pizza, “The Washington Post”, 24 May 2024, https://www.washingtonpost.com/technology/2024/05/24/google-ai-overviews-wrong/ [access: 5.11.2024].

UNCTAD. Data Protection and Privacy Legislation Worldwide, https://unctad.org/page/data-protection-and-privacy-legislation-worldwide [access: 5.11.2024].

Ware W.H., Records, computers and the rights of citizens, [“Report of the Secretary’s Advisory Committee on Automated Personal Data Systems”, Washington 1973], https://aspe.hhs.gov/reports/records-computers-rights-citizens [access: 5.11.2024].

WIPO. Genetic Resources, Traditional Knowledge and Traditional Cultural Expressions, https://www.wipo.int/tk/en/ [access: 5.11.2024].

Information


Article type: Original article

Titles:

English: Artificial Intelligence and Machine Learning at the Intersection of Privacy and Archives
Polish: Sztuczna inteligencja i uczenie maszynowe na styku prywatności i archiwów



Article status: Open

Licence: CC BY-NC-ND


Percentage share of authors:

Iori Khuhro (Author) - 25%
Erin Gilmore (Author) - 25%
Jim Suderman (Author) - 25%
Darra L. Hofman (Author) - 25%

Article corrections:

-

Publication languages:

English
