Expliciting Contexts: Semantic Knowledge Extraction from Traditional Archival Descriptions

Bibliographic Details
Main Authors: Lucia Giagnolini, Andrea Schimmenti, Paolo Bonora, Francesca Tomasi
Format: Article
Language: English
Published: University of Bologna 2025-07-01
Series: Umanistica Digitale
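Subjects: linked open data; archives; information retrieval; knowledge extraction; knowledge representation; supervised annotation; archival contexts; aiucd2024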
Online Access: https://umanisticadigitale.unibo.it/article/view/21229
_version_ 1839626482822938624
author Lucia Giagnolini
Andrea Schimmenti
Paolo Bonora
Francesca Tomasi
collection DOAJ
description Archival finding aids often fail to express the full informational potential of their data because descriptions of documentary collections contain numerous unstructured fields. The prevalence of extensive literal sections, or full-text fields, limits both the possibility of semantic queries and the ability to uncover the latent contexts embedded in such unstructured text. This study proposes a methodology for automatic Knowledge Extraction (KE) from archival descriptions, aiming to enhance their structuring and semantic interoperability. Through a case study based on the Italian National Archival System (SAN) and leveraging ready-to-use tools such as TINT, FRED, and GPT-4o, we conducted a preliminary evaluation of various morphosyntactic, lexical, and semantic analysis techniques. The most promising results highlighted the potential of Large Language Models (LLMs), leading to the development of a KE pipeline based on the open-source model Llama 3.3. The findings demonstrate a high capacity for extracting biographical events and relationships, achieving a good balance between precision and recall, and thus support the validity of the approach. However, a more robust software architecture is needed, as LLM-based pipelines must become truly scalable to enable effective integration into archival systems.
format Article
id doaj-art-91f4d34cb739459d9b6485d0f3c00a85
institution Matheson Library
issn 2532-8816
language English
publishDate 2025-07-01
publisher University of Bologna
record_format Article
series Umanistica Digitale
spelling doaj-art-91f4d34cb739459d9b6485d0f3c00a85
2025-07-17T10:32:19Z
eng
University of Bologna
Umanistica Digitale
2532-8816
2025-07-01
20
115
144
10.6092/issn.2532-8816/21229
19606
Expliciting Contexts: Semantic Knowledge Extraction from Traditional Archival Descriptions
Lucia Giagnolini, https://orcid.org/0000-0002-4876-2691, Università di Bologna
Andrea Schimmenti, https://orcid.org/0000-0001-7865-7537, Università di Bologna
Paolo Bonora, https://orcid.org/0000-0001-8337-3379, Università di Bologna
Francesca Tomasi, https://orcid.org/0000-0002-6631-8607, Università di Bologna
https://umanisticadigitale.unibo.it/article/view/21229
title Expliciting Contexts: Semantic Knowledge Extraction from Traditional Archival Descriptions
topic linked open data
archives
information retrieval
knowledge extraction
knowledge representation
supervised annotation
archival contexts
aiucd2024
url https://umanisticadigitale.unibo.it/article/view/21229