Real Time Intrusion Detection System Based on Web Log File Analysis

Bibliographic Details
Main Authors: Rawand Raouf Abdalla, Alaa Khalil Jumaa, Ahmad Freidoon Fadhil
Format: Article
Language: English
Published: Sulaimani Polytechnic University, 2025-02-01
Series: Kurdistan Journal of Applied Research
Online Access: https://kjar.spu.edu.iq/index.php/kjar/article/view/977
Description
Summary: Web log files contain a wealth of useful information about a website, including the history of all users’ activities while accessing it. Some log files also contain records of various intrusion types, that is, unauthorized or malicious activities recorded during website access. Log file analysis for Intrusion Detection Systems (IDS) examines system and network logs to identify suspicious activities and possible security risks. Many existing IDSs suffer from false positives and false negatives: they either fail to identify real threats or overwhelm administrators with unnecessary alarms. Real-time cyberattacks are common, and any delay in detection can lead to serious consequences such as data breaches and system outages. In this paper, we developed a real-time IDS based on web log analysis that predicts whether a user’s request is an attack, normal, or suspicious. It does so by using the contents of Apache access log data, considering Hypertext Transfer Protocol (HTTP) request features obtained by analyzing users’ requests. Various data preprocessing techniques are applied and key features are extracted, enhancing the system’s ability to detect intrusions effectively. The model was constructed using four machine learning algorithms: gradient-boosted trees, decision tree, random forest, and support vector machine. According to the results, the proposed model with the random forest algorithm was the most accurate of the four, attaining 99.66% precision, 99.66% recall, and 99.83% accuracy.
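The summary does not list the exact HTTP request features the authors used. As a minimal sketch, assuming Apache's standard "combined" log format, the Python below parses one access-log line and derives a few illustrative request features. The `RequestFeatures` fields, the `extract_features` function, and the `SUSPICIOUS_TOKENS` list are hypothetical choices for illustration and are not the paper's method.

```python
import re
from dataclasses import dataclass
from typing import Optional
from urllib.parse import parse_qsl, unquote, urlsplit

# Apache "combined" format: %h %l %u %t "%r" %>s %b "%{Referer}i" "%{User-agent}i"
COMBINED_RE = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<target>\S+) (?P<protocol>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\S+) "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

# Hypothetical attack signatures, for illustration only (not the authors' feature set).
SUSPICIOUS_TOKENS = ("../", "<script", "union select", "' or ", "/etc/passwd")


@dataclass
class RequestFeatures:
    method: str
    status: int
    path_length: int             # length of the raw (still URL-encoded) request target
    num_params: int              # number of query-string parameters
    suspicious_token_count: int  # signature hits in the decoded, lowercased target
    response_size: int           # bytes sent; "-" in the log is treated as 0


def extract_features(line: str) -> Optional[RequestFeatures]:
    """Parse one combined-format log line; return None if it does not match."""
    m = COMBINED_RE.match(line)
    if m is None:
        return None
    target = m.group("target")
    decoded = unquote(target).lower()
    query = urlsplit(target).query
    size = m.group("size")
    return RequestFeatures(
        method=m.group("method"),
        status=int(m.group("status")),
        path_length=len(target),
        num_params=len(parse_qsl(query, keep_blank_values=True)),
        suspicious_token_count=sum(decoded.count(tok) for tok in SUSPICIOUS_TOKENS),
        response_size=0 if size == "-" else int(size),
    )


if __name__ == "__main__":
    sample = ('192.168.1.10 - - [10/Oct/2024:13:55:36 +0000] '
              '"GET /index.php?id=1%27%20OR%20%271%27=%271 HTTP/1.1" 200 512 "-" "Mozilla/5.0"')
    print(extract_features(sample))
```

In a full pipeline, rows of such features would be encoded (for example, mapping `method` to a category code) and passed to a classifier such as a random forest. The attack, normal, and suspicious labels would come from the authors' training data, which this sketch does not reproduce.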
ISSN: 2411-7684, 2411-7706