Topic modeling in the stream of short messages in Russian
Main Author: | |
---|---|
Format: | Article |
Language: | Russian |
Published: | MIREA - Russian Technological University, 2025-02-01 |
Series: | Российский технологический журнал (Russian Technological Journal) |
Subjects: | |
Online Access: | https://www.rtj-mirea.ru/jour/article/view/1071 |
Summary: | Objectives. This work is devoted to topic modeling of short messages received through social networks or otherwise arriving as a series of short messages. This need arises in public relations systems of state and municipal bodies, in public opinion polling centers, and in customer service systems and marketing departments. The aim of the work is to develop and experimentally test a set of algorithms for a topic model that automatically determines the main topics of information exchange and the typical messages illustrating these topics. Methods. The work uses methods for variable (time-varying) statistical distributions applied to collocation statistics, together with approaches typical for topic modeling of short texts, here applied to a stream of successive messages. In this way, online machine learning and topic modeling are considered jointly. Results. The work constructs a topic model in which the discovered clusters, presented with their typical representative messages and current weights, can support decision-making on the subjects of the most important messages. The proposed method was experimentally tested on a corpus of real messages. The resulting topic models are consistent with the results obtained manually: the selected messages show that the topics assigned the highest weight are also judged the most important by human experts. Conclusions. The proposed topic modeling algorithm automatically identifies the most important topics in current communication. It also shows posts that serve as indicators of these topics, thereby significantly simplifying the solution of the problem. |
ISSN: | 2782-3210, 2500-316X |
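The summary describes the method only at a high level: collocation statistics over a time-varying distribution, updated online as messages arrive, with weighted topics and typical messages. The following is a minimal illustrative sketch of that general idea under loudly stated assumptions, not the authors' algorithm. The choice of unordered word pairs within one message as "collocations", the exponential forgetting factor `decay`, the length-based stop-word filter, the class `StreamTopicTracker`, and the rule that a topic's typical message is the one whose collocations currently carry the most total weight are all hypothetical.

```python
import re
from collections import defaultdict, deque
from itertools import combinations

# Word tokens; for str patterns in Python 3, \w also matches Cyrillic letters.
TOKEN_RE = re.compile(r"\w+")


def tokenize(text, min_len=3):
    """Lowercase word tokens; tokens shorter than min_len are dropped (crude stop-word filter)."""
    return [t for t in TOKEN_RE.findall(text.lower()) if len(t) >= min_len]


def collocations(text):
    """Unordered word pairs co-occurring in one message (hypothetical stand-in for collocations)."""
    return list(combinations(sorted(set(tokenize(text))), 2))


class StreamTopicTracker:
    """Hypothetical sketch: exponentially decayed collocation statistics over a message stream."""

    def __init__(self, decay=0.9, max_examples=50):
        self.decay = decay  # per-message forgetting factor (assumed, not from the article)
        self.weight = defaultdict(float)  # collocation -> decayed frequency
        # collocation -> most recent messages containing it (bounded memory)
        self.examples = defaultdict(lambda: deque(maxlen=max_examples))

    def update(self, message):
        """Process one incoming message online."""
        # Age existing statistics so that recent messages dominate the distribution.
        for pair in self.weight:
            self.weight[pair] *= self.decay
        for pair in collocations(message):
            self.weight[pair] += 1.0
            self.examples[pair].append(message)

    def score(self, message):
        """Current importance of a message: sum of the current weights of its collocations."""
        return sum(self.weight.get(pair, 0.0) for pair in collocations(message))

    def top_topics(self, k=3):
        """Return (collocation, weight, typical message) for the k heaviest collocations."""
        ranked = sorted(self.weight.items(), key=lambda kv: kv[1], reverse=True)[:k]
        return [(pair, w, max(self.examples[pair], key=self.score)) for pair, w in ranked]
```

For example, after feeding the stream "нет горячей воды в доме", "снова нет горячей воды", "концерт в парке", the heaviest collocations come from the repeated complaint about hot water, and its typical message is "снова нет горячей воды". Decaying all weights on every message keeps the sketch simple; a production version would more likely apply the decay lazily (for example, by scaling increments instead of rescaling the whole table) and would group related collocations into clusters rather than treating each pair as its own topic.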