Expliciting Contexts: Semantic Knowledge Extraction from Traditional Archival Descriptions
Archival finding aids are often only partially capable of fully expressing the informational potential of data due to the presence of numerous unstructured fields in the descriptions of documentary collections. The prevalence of extensive literal sections, or full-text fields, limits both the possib...
Main Authors: | Lucia Giagnolini, Andrea Schimmenti, Paolo Bonora, Francesca Tomasi |
---|---|
Format: | Article |
Language: | English |
Published: | University of Bologna, 2025-07-01 |
Series: | Umanistica Digitale |
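Subjects: | linked open data; archives; information retrieval; knowledge extraction; knowledge representation; supervised annotation; archival contexts; aiucd2024 |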
Online Access: | https://umanisticadigitale.unibo.it/article/view/21229 |
_version_ | 1839626482822938624 |
---|---|
author | Lucia Giagnolini Andrea Schimmenti Paolo Bonora Francesca Tomasi |
author_facet | Lucia Giagnolini Andrea Schimmenti Paolo Bonora Francesca Tomasi |
author_sort | Lucia Giagnolini |
collection | DOAJ |
description | Archival finding aids are often only partially capable of fully expressing the informational potential of data due to the presence of numerous unstructured fields in the descriptions of documentary collections. The prevalence of extensive literal sections, or full-text fields, limits both the possibility of semantic queries and the ability to uncover the latent contexts embedded in such unstructured text. This study proposes a methodology for the automatic extraction of knowledge (Knowledge Extraction, KE) from archival descriptions, aiming to enhance their structuring and semantic interoperability. Through a case study based on the Italian National Archival System (SAN) and leveraging ready-to-use tools such as TINT, FRED, and GPT-4o, we conducted a preliminary evaluation of various morphosyntactic, lexical, and semantic analysis techniques. The most promising results highlighted the potential of Large Language Models (LLMs), leading to the development of a KE pipeline based on the open-source model Llama 3.3. The findings demonstrate a high capacity for extracting biographical events and relationships, achieving a good balance between precision and recall, thus confirming the validity of the approach. However, the need for a more robust software architecture emerges, as LLM-based pipelines must become truly scalable to enable effective integration into archival systems. |
format | Article |
id | doaj-art-91f4d34cb739459d9b6485d0f3c00a85 |
institution | Matheson Library |
issn | 2532-8816 |
language | English |
publishDate | 2025-07-01 |
publisher | University of Bologna |
record_format | Article |
series | Umanistica Digitale |
spelling | doaj-art-91f4d34cb739459d9b6485d0f3c00a85 2025-07-17T10:32:19Z eng University of Bologna Umanistica Digitale 2532-8816 2025-07-01 20115144 10.6092/issn.2532-8816/21229 19606 Expliciting Contexts: Semantic Knowledge Extraction from Traditional Archival Descriptions Lucia Giagnolini 0 https://orcid.org/0000-0002-4876-2691 Andrea Schimmenti 1 https://orcid.org/0000-0001-7865-7537 Paolo Bonora 2 https://orcid.org/0000-0001-8337-3379 Francesca Tomasi 3 https://orcid.org/0000-0002-6631-8607 Università di Bologna Università di Bologna Università di Bologna Università di Bologna Archival finding aids are often only partially capable of fully expressing the informational potential of data due to the presence of numerous unstructured fields in the descriptions of documentary collections. The prevalence of extensive literal sections, or full-text fields, limits both the possibility of semantic queries and the ability to uncover the latent contexts embedded in such unstructured text. This study proposes a methodology for the automatic extraction of knowledge (Knowledge Extraction, KE) from archival descriptions, aiming to enhance their structuring and semantic interoperability. Through a case study based on the Italian National Archival System (SAN) and leveraging ready-to-use tools such as TINT, FRED, and GPT-4o, we conducted a preliminary evaluation of various morphosyntactic, lexical, and semantic analysis techniques. The most promising results highlighted the potential of Large Language Models (LLMs), leading to the development of a KE pipeline based on the open-source model Llama 3.3. The findings demonstrate a high capacity for extracting biographical events and relationships, achieving a good balance between precision and recall, thus confirming the validity of the approach. However, the need for a more robust software architecture emerges, as LLM-based pipelines must become truly scalable to enable effective integration into archival systems. https://umanisticadigitale.unibo.it/article/view/21229 linked open data archives information retrieval knowledge extraction knowledge representation supervised annotation archival contexts aiucd2024 |
spellingShingle | Lucia Giagnolini Andrea Schimmenti Paolo Bonora Francesca Tomasi Expliciting Contexts: Semantic Knowledge Extraction from Traditional Archival Descriptions Umanistica Digitale linked open data archives information retrieval knowledge extraction knowledge representation supervised annotation archival contexts aiucd2024 |
title | Expliciting Contexts: Semantic Knowledge Extraction from Traditional Archival Descriptions |
title_full | Expliciting Contexts: Semantic Knowledge Extraction from Traditional Archival Descriptions |
title_fullStr | Expliciting Contexts: Semantic Knowledge Extraction from Traditional Archival Descriptions |
title_full_unstemmed | Expliciting Contexts: Semantic Knowledge Extraction from Traditional Archival Descriptions |
title_short | Expliciting Contexts: Semantic Knowledge Extraction from Traditional Archival Descriptions |
title_sort | expliciting contexts semantic knowledge extraction from traditional archival descriptions |
topic | linked open data archives information retrieval knowledge extraction knowledge representation supervised annotation archival contexts aiucd2024 |
url | https://umanisticadigitale.unibo.it/article/view/21229 |