
Natural Text Anonymization Using Universal Transformer with a Self-attention

Article in conference proceedings

The paper focuses on the anonymization of natural language text in Russian. The problem of anonymization is topical because of the need for studies that assess the effectiveness and robustness of authorship attribution methods against intentional distortion of text by various anonymization techniques. The paper presents a technique for anonymizing Russian text based on a fast correlation filter, dictionary synonymization, and a universal transformer model with a self-attention mechanism. An automated system developed on this basis is tested on an experimental corpus of Russian texts, and the resulting texts are analyzed by an authorship identification system. The attribution accuracy of the specialized software system on the anonymized texts was reduced to the level of random guessing, which allows the proposed methodology to be considered effective.
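As an illustration of the dictionary synonymization step mentioned in the abstract, a minimal sketch in Python (the toy synonym dictionary and function names here are hypothetical and are not taken from the paper, which works with Russian text and a much richer dictionary):

```python
import re

# Hypothetical toy synonym dictionary; the paper's actual dictionary
# and synonym-selection criteria are not reproduced here.
SYNONYMS = {
    "big": "large",
    "quick": "fast",
    "smart": "intelligent",
}

def synonymize(text: str) -> str:
    """Replace each word found in the dictionary with its synonym,
    leaving all other text untouched (a crude anonymization step)."""
    def repl(match: re.Match) -> str:
        word = match.group(0)
        return SYNONYMS.get(word.lower(), word)
    return re.sub(r"[A-Za-z]+", repl, text)

print(synonymize("A quick test of a big idea"))  # → "A fast test of a large idea"
```

In the paper's pipeline, a step of this kind would be combined with feature selection (the fast correlation filter) and a transformer-based rewriting model rather than used alone.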

Bibliographic record: Natural Text Anonymization Using Universal Transformer with a Self-attention [Electronic resource] / A. Romanov [et al.] // CEUR Workshop Proceedings. Proceedings of the III International Conference on Language Engineering and Applied Linguistics (PRLEAL-2019) (Saint Petersburg, Russia, November 27, 2019). – [Frankfurt am Main?] : CEUR-WS, 2020. – Vol. 2552. – P. 22-37.

Keywords:

ANONYMIZATION, AUTHORSHIP, DEEP LEARNING

Conference:

  • III International Conference on Language Engineering and Applied Linguistics (PRLEAL-2019)
  • Russia, Leningrad Oblast, Saint Petersburg, November 27, 2019
  • International

Publisher:

CEUR-WS

Germany, Frankfurt am Main

Year of publication:  2020
Pages:  22-37
Language:  English
Indexed in Scopus and RSCI (РИНЦ)