Data analysis methods in astronomic objects classification (Sloan Digital Sky Survey DR14)
In the paper Sloan Digital Sky Survey DR14 dataset was investigated. It contains statistical information about many astronomical objects. The information was obtained within the framework of the Sloan Digital Sky Survey project. There are telescopes at the Earth surface, at the Earth orbit and in th...
Saved in:
Main Authors: | , |
---|---|
Format: | Article |
Language: | Russian |
Published: |
MIREA - Russian Technological University
2021-06-01
|
Series: | Российский технологический журнал |
Subjects: | |
Online Access: | https://www.rtj-mirea.ru/jour/article/view/329 |
Tags: |
Add Tag
No Tags, Be the first to tag this record!
|
Summary: | In the paper Sloan Digital Sky Survey DR14 dataset was investigated. It contains statistical information about many astronomical objects. The information was obtained within the framework of the Sloan Digital Sky Survey project. There are telescopes at the Earth surface, at the Earth orbit and in the Lagrange points of some systems (Earth–Moon, Sun–Earth). The telescopes gain information in different frequency ranges. The large quantity of statistical information leads to the demand for analytical algorithms and systems capable of making classification. Such information is marked up well enough to build machine learning classification systems. The paper presents the results of a number of classifiers. The handled data contains measures of three types of astronomical objects of the Sloan Digital Sky Survey DR14 dataset (star, quasar, galaxy). The CART decision tree, logistic regression, naïve Bayes classifiers and ensembles of classifiers (random forest, gradient boosting) were implemented. Conclusions about special features of each machine learning classifier trained to solve this task are made at the end of the paper. In some cases, classifiers’ structure can be explained physically. The accuracy of the classifiers built in this research is more than 90% (metrics F1, precision and recall are implemented, because the classes are unbalanced). Taking these values into account classification task is supposed to be successfully solved. At the same time, the structure of classifiers and importance of features can be used as a physical explanation of the solution. |
---|---|
ISSN: | 2782-3210 2500-316X |