Revisiting how we access historical archives: Auditing Gender Stereotypes and the Division of Labour in the Analysis of Historical Photography Collections
F. Net Barnes, A. Molina Rodríguez, S. Llàcer Caro, L. Gómez Bigordà
Access to non-textual documents, such as historical photographs, still represents a major challenge for the accessibility of historical archives and digital libraries. The emergence of multimodal encoders has accelerated access to such material over the years. However, because of the scale of their training, these models carry biases that, when deployed in heritage institutions, may distort historical narratives involving minorities and underrepresented groups. In this paper, we investigate how gender bias in multimodal encoders shifts when they are exposed to historical documentation. We conduct a large-scale empirical audit of gender bias in CLIP using century-spanning datasets of archival photographs, ranging from controlled yearbook images to unconstrained real-world collections. Leveraging a semantic taxonomy of stereotype-related concepts and an entropy-based bias metric, we quantify how the gender associations encoded by CLIP vary across decades and semantic domains.