Natural Language Processing-based Model for Log Anomaly Detection

Abstract

Logs are widely used in the IT industry, and log anomaly detection is essential for identifying the running status of systems. Conventional methods for this problem require sophisticated hand-crafted rules and intensive manual labor. In this paper, we propose a new model based on natural language processing techniques. To improve feature extraction and the quality of log template vectors, the model employs Part-of-Speech (PoS) tagging and Named Entity Recognition (NER), which reduces the reliance on hand-crafted rules and allows the template vector to be modified by a weight vector derived from NER. The PoS property of each token in the template is analyzed first, which also reduces manual labor and supports better weight allocation. The resulting token weights are then used to modify the template vector. Final detection on the modified template vectors is performed by deep neural networks (DNNs). The model is evaluated on three datasets and compared with two state-of-the-art models. The evaluation results show that our model achieves better log anomaly detection.
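
The idea of modifying a template vector with per-token weights can be illustrated with a minimal sketch. The abstract does not state how the weights are combined with the token vectors, so the weighted average below is an assumption, and the function name, the toy embeddings, and the weight values are all hypothetical. In the model described above, the weights would be derived from the PoS and NER analysis rather than set by hand.

```python
from typing import Dict, List


def weighted_template_vector(
    tokens: List[str],
    embeddings: Dict[str, List[float]],
    weights: List[float],
) -> List[float]:
    """Combine token vectors into one template vector by weighted average.

    Assumption: a weighted average is used for illustration only; the
    source does not specify the combination rule.
    """
    dim = len(next(iter(embeddings.values())))
    total = [0.0] * dim
    weight_sum = 0.0
    for token, weight in zip(tokens, weights):
        vector = embeddings.get(token)
        # Skip tokens without an embedding or with a non-positive weight.
        if vector is None or weight <= 0:
            continue
        for i, value in enumerate(vector):
            total[i] += weight * value
        weight_sum += weight
    if weight_sum == 0:
        return total  # No usable tokens: return the zero vector.
    return [value / weight_sum for value in total]


# Toy two-dimensional embeddings (hypothetical values).
toy_embeddings = {"connection": [1.0, 0.0], "failed": [0.0, 1.0]}
# Hypothetical weights: the second token is treated as more informative.
template = weighted_template_vector(
    ["connection", "failed"], toy_embeddings, [1.0, 3.0]
)
```

With these toy inputs, the token with the larger weight dominates the resulting template vector, which is the effect a weight vector is meant to have before the vectors are passed to a detector.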

Publication
2022 IEEE 2nd International Conference on Software Engineering and Artificial Intelligence (SEAI)