
Statistical tests for text homogeneity: using forward and backward processes of numbers of different words (Informational report)

Journal: Glottometrics
ISSN: 1617-8351, E-ISSN: 2625-8226
Publication details: Year: 2022, Volume: 53, Pages: 42-58, Page count: 17, DOI: 10.53482/2022_53_401
Keywords: Zipf’s law, weak convergence, Gaussian process, statistical test, text homogeneity, urn model.
Authors: Abebe Berhane 1,2, Chebunin Mikhail 3,2, Kovalevskii Artyom 4,2, Zakrevskaya Natalia 4
Affiliations
1 Mainefhi College of Science, Mainefhi, Eritrea
2 Novosibirsk State University, Novosibirsk, Russia
3 Karlsruhe Institute of Technology, Institute of Stochastics, Karlsruhe, Germany
4 Novosibirsk State Technical University, Novosibirsk, Russia

Abstract: This article studies the growth in the number of different words in a text as it is read in the forward and backward directions. We construct a statistical test based on a statistic derived from the difference between these two processes and use it to check text homogeneity. Under the elementary model, the words of a text are drawn independently of one another from a dictionary according to the Zipf–Mandelbrot law. P-values of the test are calculated under this elementary probabilistic model using the asymptotic normality of the corresponding statistics. Finally, the test is applied to analyze the homogeneity of sequences of sonnets.
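The two processes named in the abstract can be illustrated with a minimal Python sketch. It only counts different words while reading a token list forward and backward and takes the pointwise difference. The function names, the whitespace tokenization, and the sample sentence are assumptions made for illustration. The normalization and limiting distribution that the paper uses to obtain p-values are not given in the abstract and are not reproduced here.

```python
def distinct_word_counts(words):
    """Return a list whose k-th entry is the number of different
    words among the first k + 1 tokens of `words`."""
    seen = set()
    counts = []
    for w in words:
        seen.add(w)
        counts.append(len(seen))
    return counts


def forward_backward_difference(words):
    """Pointwise difference between the forward and backward counts.

    This is a raw, unnormalized difference (hypothetical helper);
    it is not the test statistic from the paper.
    """
    forward = distinct_word_counts(words)
    backward = distinct_word_counts(words[::-1])
    return [f - b for f, b in zip(forward, backward)]


# Illustrative input: naive whitespace tokenization of a made-up sentence.
tokens = "a rose is a rose and a thorn is a thorn".split()
forward = distinct_word_counts(tokens)
backward = distinct_word_counts(tokens[::-1])
diff = forward_backward_difference(tokens)
print(forward)
print(backward)
print(diff)
```

Both processes end at the same value, the total number of different words in the text, so the difference always returns to zero at the last position. What happens in between is what a homogeneity check of this kind would look at.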
Bibliographic reference: Abebe B., Chebunin M., Kovalevskii A., Zakrevskaya N.
Statistical tests for text homogeneity: using forward and backward processes of numbers of different words
Glottometrics. 2022. V.53. P.42-58. DOI: 10.53482/2022_53_401 WOS Scopus RSCI OpenAlex
Dates:
Published in print: 1 Jan 2024
Database identifiers:
Web of Science: WOS:000975069100003
Scopus: 2-s2.0-85146477404
RSCI: 59204255
OpenAlex: W4316923420
Citations in databases:
Database Citations
Scopus 4
Web of Science 3
OpenAlex 3