A method for identifying relevant topics of pilot simulator training based on clustering of flight safety reports

Bibliographic Details
Main Authors: Z. R. Zabbarov, A. K. Volkov
Format: Article
Language: Russian
Published: Moscow State Technical University of Civil Aviation 2024-08-01
Series: Научный вестник МГТУ ГА
Online Access: https://avia.mstuca.ru/jour/article/view/2400
Description
Summary: Natural language processing (NLP) technologies make it possible, among other applications, to study patterns and trends in large collections of text effectively. Safety data recorded as text, such as accident investigation reports, is a promising source of new information that can be used both in flight safety management and in simulator training. This paper applies NLP technologies to a corpus of flight safety reports from PJSC Aeroflot – Russian Airlines. The aim of the work is to develop a method for identifying relevant topics for pilot simulator training. The paper reviews existing international work on text mining in civil aviation and finds that NLP technologies are actively used abroad to study flight safety reports. It then presents the scheme of a method for identifying relevant simulator training topics based on clustering of flight safety reports, and describes the text preprocessing procedures and the construction of the vector space. The scientific novelty of the approach is that, unlike previous works, it uses a full vector representation of each flight safety report, built by combining matrices of thematic and semantic vectors. The proposed method was tested on a corpus of 1080 reports. The clustering algorithm identified 36 clusters, which were then visualized using t-distributed stochastic neighbor embedding (t-SNE). The practical significance of the results is that a clustering-based approach allows a more in-depth analysis of flight safety reports, which can simplify and speed up the work of both safety management specialists and flight simulator instructors.
ISSN: 2079-0619, 2542-0119
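
The summary says the full vector representation is built by combining matrices of thematic and semantic vectors and that the reports are then clustered. It does not say how the two matrices are produced, how they are combined, or which clustering algorithm is used. The sketch below is therefore only an illustration under stated assumptions: each matrix is normalized row by row, the two are joined by horizontal concatenation, and a plain k-means groups the result. The function names and the four toy reports are hypothetical and are not taken from the paper.

```python
import math
import random


def l2_normalize(rows):
    """Scale each row to unit Euclidean length (all-zero rows are kept as-is)."""
    out = []
    for row in rows:
        norm = math.sqrt(sum(x * x for x in row))
        out.append([x / norm for x in row] if norm > 0 else list(row))
    return out


def combine(topic_vectors, semantic_vectors):
    """Assumed combination step: concatenate the two vectors of each report."""
    if len(topic_vectors) != len(semantic_vectors):
        raise ValueError("both matrices need one row per report")
    topics = l2_normalize(topic_vectors)
    semantics = l2_normalize(semantic_vectors)
    return [t + s for t, s in zip(topics, semantics)]


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=50, seed=0):
    """Plain k-means (an assumption; the summary does not name the algorithm)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(iterations):
        # Assign every point to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centroids will not move
        labels = new_labels
        # Move each centroid to the mean of the points assigned to it.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Toy data: 4 hypothetical reports, 2 topic weights and 3 semantic features each.
topic = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]]
semantic = [[1.0, 0.1, 0.0], [0.9, 0.2, 0.1], [0.0, 0.1, 1.0], [0.1, 0.0, 0.9]]

full = combine(topic, semantic)   # one 5-dimensional vector per report
labels = kmeans(full, k=2)        # cluster index per report
print(labels)
```

On the corpus described in the summary, the same shape of pipeline would yield one combined vector per report for 1080 reports, and the resulting cluster labels could then be passed to a t-SNE implementation for a two-dimensional plot.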