Extracting named entities from Russian-language documents with different expressiveness of structure

This work addresses named entity recognition in Russian-language texts using a CRF model. Two data sets were considered: refinancing documents with a well-defined structure and semi-structured texts of court records. The model was tested with various sets of text features and CRF parameters (optimization algorithms). Averaged over all entities, the best F-measure was 0.99 for structured documents and 0.86 for semi-structured ones.
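
The abstract refers to a CRF model evaluated with different sets of text features. As a rough illustration of what such per-token features can look like, here is a minimal sketch in plain Python. The feature names, the `token_features` and `sentence_features` helpers, and the sample sentence are all hypothetical; the record does not list the features the authors actually used.

```python
from typing import Dict, List


def token_features(tokens: List[str], i: int) -> Dict[str, object]:
    """Build a feature dict for the token at position i.

    Hypothetical feature set for illustration only; it is not the
    feature set described in the article.
    """
    tok = tokens[i]
    feats: Dict[str, object] = {
        "bias": 1.0,
        "lower": tok.lower(),          # case-normalized word form
        "suffix3": tok[-3:].lower(),   # crude proxy for Russian inflection
        "is_title": tok.istitle(),     # capitalized words often start names
        "is_upper": tok.isupper(),
        "is_digit": tok.isdigit(),
    }
    # Context features: neighboring tokens, or sentence-boundary markers.
    if i > 0:
        feats["prev_lower"] = tokens[i - 1].lower()
    else:
        feats["BOS"] = True
    if i < len(tokens) - 1:
        feats["next_lower"] = tokens[i + 1].lower()
    else:
        feats["EOS"] = True
    return feats


def sentence_features(tokens: List[str]) -> List[Dict[str, object]]:
    """Feature dicts for every token of one sentence."""
    return [token_features(tokens, i) for i in range(len(tokens))]


# Made-up sample sentence ("The plaintiff Ivanov applied to the court").
sample = ["Истец", "Иванов", "обратился", "в", "суд"]
features = sentence_features(sample)
```

Lists of dicts like `features` are the usual input shape for CRF toolkits, which then fit label sequences with a chosen optimization algorithm. Which toolkit and which algorithms the authors compared is not stated in this record.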

Bibliographic Details
Main Authors: Maria D. Averina, Olga A. Levanova
Format: Article
Language: English
Published: Yaroslavl State University 2023-12-01
Series: Моделирование и анализ информационных систем
Subjects: named entity extraction, CRF
Online Access: https://www.mais-journal.ru/jour/article/view/1827
_version_ 1839573191449640960
author Maria D. Averina
Olga A. Levanova
collection DOAJ
description This work addresses named entity recognition in Russian-language texts using a CRF model. Two data sets were considered: refinancing documents with a well-defined structure and semi-structured texts of court records. The model was tested with various sets of text features and CRF parameters (optimization algorithms). Averaged over all entities, the best F-measure was 0.99 for structured documents and 0.86 for semi-structured ones.
format Article
id doaj-art-fbbfca93b18744e09e6fa483a803860f
institution Matheson Library
issn 1818-1015
2313-5417
language English
publishDate 2023-12-01
publisher Yaroslavl State University
record_format Article
series Моделирование и анализ информационных систем
spelling doaj-art-fbbfca93b18744e09e6fa483a803860f
2025-08-04T14:06:43Z
eng
Yaroslavl State University
Моделирование и анализ информационных систем
1818-1015
2313-5417
2023-12-01
30
4
382
393
10.18255/1818-1015-2023-4-382-393
1397
Extracting named entities from Russian-language documents with different expressiveness of structure
Maria D. Averina 0
Olga A. Levanova 1
P.G. Demidov Yaroslavl State University
P.G. Demidov Yaroslavl State University
This work addresses named entity recognition in Russian-language texts using a CRF model. Two data sets were considered: refinancing documents with a well-defined structure and semi-structured texts of court records. The model was tested with various sets of text features and CRF parameters (optimization algorithms). Averaged over all entities, the best F-measure was 0.99 for structured documents and 0.86 for semi-structured ones.
https://www.mais-journal.ru/jour/article/view/1827
named entity extraction
crf
title Extracting named entities from Russian-language documents with different expressiveness of structure
topic named entity extraction
crf
url https://www.mais-journal.ru/jour/article/view/1827