Statistical tests for text homogeneity: using forward and backward processes of numbers of different words
Information message
Journal | Glottometrics, ISSN: 1617-8351, E-ISSN: 2625-8226 |
Output data | Year: 2022, Volume: 53, Pages: 42-58, Pages count: 17, DOI: 10.53482/2022_53_401 |
Tags | Zipf’s law, weak convergence, Gaussian process, statistical test, text homogeneity, urn model |
Authors | Abebe B., Chebunin M., Kovalevskii A., Zakrevskaya N. |
Abstract:
This article studies how the number of different words in a text grows as the text is read in the forward and in the backward direction. We construct a statistical test for text homogeneity from statistics based on the difference between these two processes. The underlying elementary model assumes that the words of a text are drawn from a dictionary independently of each other according to the Zipf–Mandelbrot law. P-values of the test are calculated under this model using the asymptotic normality of the corresponding statistics. Finally, the test is applied to analyze the homogeneity of sequences of sonnets.
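The two counting processes named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names are hypothetical, the text is assumed to be already split into word tokens, and the normalization and p-value computation of the paper's test are not reproduced here. The sketch only computes the number of different words after each position when reading forward and backward, and the pointwise difference of the two.

```python
from typing import List, Sequence


def distinct_word_counts(words: Sequence[str]) -> List[int]:
    """Return [R_1, ..., R_n], where R_k is the number of different
    words among the first k words of the sequence."""
    seen = set()
    counts = []
    for word in words:
        seen.add(word)
        counts.append(len(seen))
    return counts


def forward_backward_difference(words: Sequence[str]) -> List[int]:
    """Pointwise difference between the forward and backward processes.

    Hypothetical helper: the backward process is taken to be the same
    count applied to the reversed word sequence.
    """
    forward = distinct_word_counts(words)
    backward = distinct_word_counts(list(reversed(words)))
    return [f - b for f, b in zip(forward, backward)]


# Toy example: new words are concentrated at the start of the text.
tokens = "a b a c b d d d".split()
print(distinct_word_counts(tokens))         # forward process
print(forward_backward_difference(tokens))  # difference process
```

Both processes end at the same value, the total number of different words in the text, so the difference is always zero at the last position. Intuitively, a difference that stays far from zero along the way indicates that new words are not spread evenly through the text, which is the kind of inhomogeneity the test is meant to detect.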
Cite:
Abebe B., Chebunin M., Kovalevskii A., Zakrevskaya N. Statistical tests for text homogeneity: using forward and backward processes of numbers of different words. Glottometrics. 2022. V.53. P.42-58. DOI: 10.53482/2022_53_401
Indexed in: WOS, Scopus, RSCI (РИНЦ), OpenAlex
Dates:
Published print: | Jan 1, 2024 |
Identifiers:
Web of science: | WOS:000975069100003 |
Scopus: | 2-s2.0-85146477404 |
Elibrary: | 59204255 |
OpenAlex: | W4316923420 |