- Article name
- Detection of confidential information and automatic classification of documents and text messages by confidentiality level (Review)
- Authors
- Sulavko A. E., sulavich@mail.ru, Omsk State Technical University, Omsk, Russia
Panfilova I. E., panfilova_2015@bk.ru, Samara State Technical University, Samara, Russia
Warkentin Yu. A., varkentinyuri@gmail.com, Omsk State Technical University, Omsk, Russia
- Keywords
- large language models / feature extraction / information security / multilayer neural networks / transformers / privacy leak prevention systems
- Year
- 2024, Issue 4, Pages 18-27
- Code EDN
- YTMIVD
- Code DOI
- 10.52190/2073-2600_2024_4_18
- Abstract
- This study provides a systematic review of the scientific literature on detecting and classifying confidential information in a text stream. It considers three problems in particular: classifying text documents, classifying short messages, and simply detecting whether a text document or message contains confidential information. Confidential information is understood as any information that cannot be classified as publicly available under the law or under the requirements of individuals or organizations. The review shows that, in practice, automatic machine learning methods are needed to customize language models to the specifics of the sensitive information at each enterprise. Extracting features from text is an important step in building any text classification system. We tested 30 pre-trained libraries on the task of binary classification of text messages in Russian.
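The two steps named in the abstract, feature extraction followed by binary classification of messages, can be illustrated with a minimal sketch. It is not the authors' method: the abstract refers to pre-trained libraries, whereas this example swaps in plain bag-of-words counts and a multinomial naive Bayes classifier so that it runs with the Python standard library alone. The function `extract_features`, the class `NaiveBayesBinary`, and the toy messages are all hypothetical and chosen only for illustration.

```python
import math
import re
from collections import Counter


def extract_features(text):
    """Bag-of-words features: lowercase word tokens mapped to their counts."""
    return Counter(re.findall(r"\w+", text.lower()))


class NaiveBayesBinary:
    """Multinomial naive Bayes with add-one smoothing for labels 0 and 1."""

    def fit(self, texts, labels):
        self.word_counts = {0: Counter(), 1: Counter()}
        self.doc_counts = Counter(labels)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(extract_features(text))
        self.vocab = set(self.word_counts[0]) | set(self.word_counts[1])
        return self

    def predict(self, text):
        features = extract_features(text)
        total_docs = sum(self.doc_counts.values())
        scores = {}
        for label in (0, 1):
            total_words = sum(self.word_counts[label].values())
            # Start from the log prior of the class.
            score = math.log(self.doc_counts[label] / total_docs)
            for word, count in features.items():
                if word not in self.vocab:
                    continue  # ignore words never seen in training
                p = (self.word_counts[label][word] + 1) / (
                    total_words + len(self.vocab)
                )
                score += count * math.log(p)
            scores[label] = score
        return max(scores, key=scores.get)


# Toy, made-up training messages (1 = confidential, 0 = public).
train_texts = [
    "salary report for internal use only",
    "customer passport numbers attached",
    "lunch menu for friday",
    "office closed on public holiday",
]
train_labels = [1, 1, 0, 0]

model = NaiveBayesBinary().fit(train_texts, train_labels)
prediction = model.predict("attached passport numbers")
print(prediction)
```

Because `\w` matches Unicode word characters in Python 3, the same tokenizer also splits Russian text into words. A practical system would replace `extract_features` with a stronger feature extractor and train on data specific to the enterprise.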