Text Segmentation Via Processes that Count the Number of Different Words Forward and Backward (Scientific publication)

Journal: Journal of Quantitative Linguistics
ISSN: 1744-5035
Publication details: Year: 2024, Volume: 31, Issue: 1, Pages: 1-18, Number of pages: 18, DOI: 10.1080/09296174.2023.2275342
Keywords: change-point detection
Authors: Abebe Berhane 1,2, Chebunin Mikhail 1,3, Kovalevskii Artyom 1,4,5
Affiliations
1 Novosibirsk State University
2 Mainefhi College of Science
3 Karlsruhe Institute of Technology, Institute of Stochastics
4 Sobolev Institute of Mathematics
5 Novosibirsk State Technical University

Funding information (1)

1 Sobolev Institute of Mathematics SB RAS, FWNF-2022-0010

Abstract: The paper develops a new statistical approach to automatically partitioning a text into parts written by different authors. The approach is based on the analysis of processes that count the number of different words forward and backward. The theoretical study of these processes relies on an elementary probability model with a change point. We prove consistency of our statistical estimate of the concatenation point in the case where the concatenated texts have different Zipf exponents. The method is tested on the Brown corpus and on newspaper texts in different languages. The tests show that the concatenation point is estimated well. The method can be used in parallel with other text segmentation methods.
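Illustration (not taken from the paper): the counting processes mentioned in the abstract can be sketched as follows. This is a minimal sketch that assumes whitespace tokenization and one particular indexing convention; the function name `distinct_word_counts` is hypothetical, and the paper's exact definitions and its change-point estimator may differ and are not reproduced here.

```python
def distinct_word_counts(words):
    """Return (forward, backward) lists of distinct-word counts.

    forward[i]  = number of different words among words[0..i]
    backward[i] = number of different words among words[i..n-1]
    The indexing convention is an assumption made for this sketch.
    """
    n = len(words)

    # Forward pass: scan left to right, tracking the words seen so far.
    forward = []
    seen = set()
    for w in words:
        seen.add(w)
        forward.append(len(seen))

    # Backward pass: scan right to left with a fresh set.
    backward = [0] * n
    seen = set()
    for i in range(n - 1, -1, -1):
        seen.add(words[i])
        backward[i] = len(seen)

    return forward, backward


# Toy input; a real study would use a tokenized corpus instead.
tokens = "the cat sat on the mat the dog sat".split()
forward, backward = distinct_word_counts(tokens)
print(forward)   # [1, 2, 3, 4, 4, 5, 5, 6, 6]
print(backward)  # [6, 6, 5, 5, 4, 4, 3, 2, 1]
```

Both processes end at the same total vocabulary size (here 6), but they grow at different rates along the text, which is the kind of asymmetry a change-point analysis could examine.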
Bibliographic reference: Abebe B., Chebunin M., Kovalevskii A.
Text Segmentation Via Processes that Count the Number of Different Words Forward and Backward
Journal of Quantitative Linguistics. 2024. V.31. N1. P.1-18. DOI: 10.1080/09296174.2023.2275342
Dates:
Published online: 12 Nov 2023
Published in print: 15 Jan 2024
Database identifiers:
Web of Science: WOS:001100158000001
Scopus: 2-s2.0-85176726465
OpenAlex: W4388608298
Citations in databases:
Database: Citations
OpenAlex: 2
Web of Science: 2
Scopus: 3