Sciact

Statistical tests for text homogeneity: using forward and backward processes of numbers of different words
Information message

Journal: Glottometrics
ISSN: 1617-8351, E-ISSN: 2625-8226
Output data: Year: 2022, Volume: 53, Pages: 42-58, Pages count: 17, DOI: 10.53482/2022_53_401
Tags: Zipf’s law, weak convergence, Gaussian process, statistical test, text homogeneity, urn model.
Authors: Abebe Berhane (1,2), Chebunin Mikhail (3,2), Kovalevskii Artyom (4,2), Zakrevskaya Natalia (4)
Affiliations
1 Mainefhi College of Science, Mainefhi, Eritrea
2 Novosibirsk State University, Novosibirsk, Russia
3 Karlsruhe Institute of Technology, Institute of Stochastics, Karlsruhe, Germany
4 Novosibirsk State Technical University, Novosibirsk, Russia

Abstract: This article studies how the number of different words in a text grows as the text is read in the forward and in the backward direction. From statistics based on the difference between these two processes, we construct a statistical test for text homogeneity. Under the elementary probabilistic model, the words of a text are drawn from a dictionary independently of each other according to the Zipf–Mandelbrot law. P-values of the test are calculated under this model using the asymptotic normality of the corresponding statistics. Finally, the test is applied to analyze the homogeneity of sequences of sonnets.
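The two counting processes named in the abstract can be illustrated with a minimal sketch. It only counts different words while reading a token list forward and backward and takes the pointwise difference. The function names and the toy token list are assumptions made for illustration, and the sketch does not reproduce the normalization or the p-value calculation of the article's test.

```python
def distinct_counts(words):
    """counts[k] = number of different words among the first k + 1 tokens."""
    seen = set()
    counts = []
    for w in words:
        seen.add(w)
        counts.append(len(seen))
    return counts


def forward_backward_difference(words):
    """Pointwise difference between the forward and backward counting processes."""
    forward = distinct_counts(words)
    backward = distinct_counts(words[::-1])  # same text, read from the end
    return [f - b for f, b in zip(forward, backward)]


# Hypothetical toy text: new words appear early, repeats cluster at the end.
tokens = "a b a c b d d d".split()
forward = distinct_counts(tokens)            # [1, 2, 2, 3, 3, 4, 4, 4]
backward = distinct_counts(tokens[::-1])     # [1, 1, 1, 2, 3, 4, 4, 4]
diff = forward_backward_difference(tokens)   # [0, 1, 1, 1, 0, 0, 0, 0]
```

Both processes end at the same value (the total number of different words), so the difference is always zero at the last position. Intuitively, for a homogeneous text the two processes should grow in a similar way, so a difference that stays far from zero hints at inhomogeneity; deciding how far is too far is what the article's test addresses.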
Cite: Abebe B., Chebunin M., Kovalevskii A., Zakrevskaya N.
Statistical tests for text homogeneity: using forward and backward processes of numbers of different words
Glottometrics. 2022. V.53. P.42-58. DOI: 10.53482/2022_53_401
Indexed in: WOS, Scopus, RSCI (РИНЦ), OpenAlex
Dates:
Published print: Jan 1, 2024
Identifiers:
Web of science: WOS:000975069100003
Scopus: 2-s2.0-85146477404
Elibrary: 59204255
OpenAlex: W4316923420
Citing:
DB              Citing
Scopus          4
Web of science  3
OpenAlex        3