Extracting named entities from Russian-language documents with different expressiveness of structure
This work addresses named entity recognition in Russian-language texts using a CRF model. Two data sets were considered: well-structured refinancing documents and semi-structured court records. The model was tested under various sets of t...
Main Authors: | Maria D. Averina, Olga A. Levanova |
---|---|
Format: | Article |
Language: | English |
Published: | Yaroslavl State University, 2023-12-01 |
Series: | Моделирование и анализ информационных систем |
Subjects: | named entity extraction, CRF |
Online Access: | https://www.mais-journal.ru/jour/article/view/1827 |
_version_ | 1839573191449640960 |
---|---|
author | Maria D. Averina; Olga A. Levanova |
author_facet | Maria D. Averina; Olga A. Levanova |
author_sort | Maria D. Averina |
collection | DOAJ |
description | This work addresses named entity recognition in Russian-language texts using a CRF model. Two data sets were considered: well-structured refinancing documents and semi-structured court records. The model was tested with various sets of text features and CRF parameters (optimization algorithms). Averaged over all entities, the best F-measure was 0.99 for structured documents and 0.86 for semi-structured ones. |
format | Article |
id | doaj-art-fbbfca93b18744e09e6fa483a803860f |
institution | Matheson Library |
issn | 1818-1015 2313-5417 |
language | English |
publishDate | 2023-12-01 |
publisher | Yaroslavl State University |
record_format | Article |
series | Моделирование и анализ информационных систем |
spelling | doaj-art-fbbfca93b18744e09e6fa483a803860f; 2025-08-04T14:06:43Z; eng; Yaroslavl State University; Моделирование и анализ информационных систем; ISSN 1818-1015, 2313-5417; 2023-12-01; vol. 30, no. 4, pp. 382-393; DOI 10.18255/1818-1015-2023-4-382-393; 1397; Extracting named entities from Russian-language documents with different expressiveness of structure; Maria D. Averina (P.G. Demidov Yaroslavl State University); Olga A. Levanova (P.G. Demidov Yaroslavl State University); This work addresses named entity recognition in Russian-language texts using a CRF model. Two data sets were considered: well-structured refinancing documents and semi-structured court records. The model was tested with various sets of text features and CRF parameters (optimization algorithms). Averaged over all entities, the best F-measure was 0.99 for structured documents and 0.86 for semi-structured ones.; https://www.mais-journal.ru/jour/article/view/1827; named entity extraction; crf |
spellingShingle | Maria D. Averina; Olga A. Levanova; Extracting named entities from Russian-language documents with different expressiveness of structure; Моделирование и анализ информационных систем; named entity extraction; crf |
title | Extracting named entities from Russian-language documents with different expressiveness of structure |
title_sort | extracting named entities from russian language documents with different expressiveness of structure |
topic | named entity extraction; crf |
url | https://www.mais-journal.ru/jour/article/view/1827 |
work_keys_str_mv | AT mariadaverina extractingnamedentitiesfromrussianlanguagedocumentswithdifferentexpressivenessofstructure AT olgaalevanova extractingnamedentitiesfromrussianlanguagedocumentswithdifferentexpressivenessofstructure |
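
The description above says the CRF model was tested with various sets of text features, but this record does not list them. As a rough illustration only, the sketch below builds per-token feature dictionaries of the kind commonly fed to a CRF tagger. The feature names, the one-token context window, and the sample sentence are all assumptions made for this example, not the authors' feature set.

```python
def token_features(tokens, i):
    """Return a feature dict for tokens[i] with a one-token context window.

    The feature names here are hypothetical; the article's actual
    features are not given in this catalog record.
    """
    word = tokens[i]
    feats = {
        "bias": 1.0,
        "lower": word.lower(),
        "suffix3": word[-3:].lower(),   # crude proxy for Russian inflection
        "is_title": word.istitle(),     # capitalized, e.g. a surname
        "is_upper": word.isupper(),
        "is_digit": word.isdigit(),
    }
    if i > 0:
        prev = tokens[i - 1]
        feats["prev_lower"] = prev.lower()
        feats["prev_is_title"] = prev.istitle()
    else:
        feats["BOS"] = True             # beginning of sequence
    if i < len(tokens) - 1:
        nxt = tokens[i + 1]
        feats["next_lower"] = nxt.lower()
        feats["next_is_title"] = nxt.istitle()
    else:
        feats["EOS"] = True             # end of sequence
    return feats


def sentence_features(tokens):
    """Build one feature dict per token in the sentence."""
    return [token_features(tokens, i) for i in range(len(tokens))]


# Made-up sample sentence ("The plaintiff Ivanov went to court").
tokens = ["Истец", "Иванов", "обратился", "в", "суд"]
features = sentence_features(tokens)
```

A list of such dictionaries per sentence is the input shape accepted by CRF toolkits such as sklearn-crfsuite, where the training (optimization) algorithm is a separate model parameter. Whether the authors used that toolkit is not stated in this record.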