PoSBert: Log Classification via Modified Bert Based on Part-of-Speech Weight

Abstract

Logs record the events that occur on clusters and are essential to system maintenance. Log classification helps engineers monitor the running status of a system, predict log anomalies, and take corresponding measures. To account for the varying significance of each token in a log template and to build a comprehensive representation of the template, we propose PoSBert, a natural language processing-based model that modifies Bert using part-of-speech (PoS) analysis of the template's tokens. PoS analysis is carried out first to obtain a PoS-based weight embedding, which allocates appropriate weights to tokens and refines the representation of log templates. This PoS-based weight embedding replaces the segment embedding of vanilla Bert and is of more practical significance than the segment embedding in the log classification domain. Together, the PoS analysis and the PoS-based weight embedding give Bert more informative inputs, which helps the model produce better results. Four public datasets and one dataset collected from a practical warning platform are used for evaluation. The results show that our model achieves better classification results in the log classification domain.
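
As a rough illustration of the idea of replacing the segment embedding with a PoS-based embedding, the input representation might be sketched as below. This is a minimal sketch under stated assumptions, not the authors' implementation: the abstract does not specify the tagger, the tag set, or how the weights are derived, so the toy lexicon `TOY_POS`, the tag list `POS_TAGS`, the helper names, and the randomly initialised tables (standing in for learned parameters) are all hypothetical.

```python
import random

DIM = 8  # toy hidden size, chosen only to keep the example small

# Hypothetical toy lexicon standing in for a real PoS tagger.
TOY_POS = {"connection": "NOUN", "to": "ADP", "server": "NOUN", "failed": "VERB"}
POS_TAGS = ["NOUN", "VERB", "ADP", "OTHER"]  # assumed tag set


def make_table(rows, dim, rng):
    """Random embedding table (rows x dim), standing in for learned weights."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(rows)]


def pos_ids(tokens):
    """Map each token to its PoS tag index; unknown tokens fall back to OTHER."""
    return [POS_TAGS.index(TOY_POS.get(t, "OTHER")) for t in tokens]


def embed(tokens, vocab, tok_table, pos_table, posn_table):
    """Input embedding = token + position + PoS embedding.

    The PoS embedding takes the slot that the segment embedding
    occupies in vanilla Bert (an assumption based on the abstract).
    """
    rows = []
    for i, (tok, pid) in enumerate(zip(tokens, pos_ids(tokens))):
        parts = zip(tok_table[vocab[tok]], posn_table[i], pos_table[pid])
        rows.append([a + b + c for a, b, c in parts])
    return rows


rng = random.Random(0)
template = ["connection", "to", "server", "<*>", "failed"]  # made-up log template
vocab = {tok: i for i, tok in enumerate(template)}
tok_table = make_table(len(vocab), DIM, rng)
posn_table = make_table(len(template), DIM, rng)
pos_table = make_table(len(POS_TAGS), DIM, rng)

inputs = embed(template, vocab, tok_table, pos_table, posn_table)
print(pos_ids(template))
print(len(inputs), len(inputs[0]))
```

In this sketch, tokens that share a PoS tag (here "connection" and "server") receive the same PoS vector, so the downstream encoder could learn to weight, for example, nouns and verbs differently from placeholders such as `<*>`.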

Publication
2022 5th International Conference on Pattern Recognition and Artificial Intelligence (PRAI)