Sciact

Hapax legomena via stochastic processes (Full article)

Journal: Glottometrics
ISSN: 1617-8351, E-ISSN: 2625-8226
Output data: Year: 2024, Volume: 56, Pages: 22-39, Pages count: 18, DOI: 10.53482/2024_56_415
Tags: Zipf’s law, statistical test, mathematical expectation, limit theorems
Authors: Fayzullayev Shahzod 1,2, Kovalevskii Artyom 1,3,4
Affiliations
1 Novosibirsk State University, Novosibirsk, Russia
2 Urgench State University, Urgench, Uzbekistan
3 Sobolev Institute of Mathematics, Novosibirsk, Russia
4 Novosibirsk State Technical University, Novosibirsk, Russia

Funding (1)

1 Sobolev Institute of Mathematics FWNF-2022-0010

Abstract: We study the number of words that have occurred exactly once since the beginning of a text, and we model it as a stochastic process indexed by the length of the text. The elementary probability model, going back to Bahadur and Karlin, predicts that the number of words occurring exactly once grows according to a power law, as does the number of different words. The final value of this process is the number of hapaxes of the text. We construct two statistical tests of Karlin’s model under the assumption that the word probabilities in the model satisfy the generalized Zipf’s law. These tests show that some texts fit the model well, but many texts deviate significantly from it: their number of hapaxes is too small relative to the number of different words.
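The process described in the abstract can be illustrated with a minimal sketch: for each prefix of a text, count the different words seen so far and the words seen exactly once. The function name `hapax_process`, the whitespace tokenization, and the toy sentence are assumptions made for illustration only. The sketch computes the two counting processes; it does not implement the statistical tests constructed in the article.

```python
from collections import Counter


def hapax_process(tokens):
    """Return two lists indexed by prefix length: the number of
    different words and the number of words seen exactly once."""
    counts = Counter()
    distinct = 0
    hapaxes = 0
    distinct_path = []
    hapax_path = []
    for tok in tokens:
        counts[tok] += 1
        if counts[tok] == 1:
            # First occurrence: a new word, currently a hapax.
            distinct += 1
            hapaxes += 1
        elif counts[tok] == 2:
            # Second occurrence: the word stops being a hapax.
            hapaxes -= 1
        distinct_path.append(distinct)
        hapax_path.append(hapaxes)
    return distinct_path, hapax_path


# Toy example (hypothetical text, naive whitespace tokenization).
tokens = "the cat saw the dog and the dog saw a bird".split()
distinct_path, hapax_path = hapax_process(tokens)
print(distinct_path[-1], hapax_path[-1])
```

The last element of `hapax_path` is the number of hapaxes of the whole text. Under the power-law behavior stated in the abstract, both paths would be expected to look roughly linear on a log-log plot against prefix length for a long text.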
Cite: Fayzullayev S., Kovalevskii A.
Hapax legomena via stochastic processes
Glottometrics. 2024. V.56. P.22-39. DOI: 10.53482/2024_56_415
Dates:
Published print: Jul 22, 2024
Published online: Jul 22, 2024
Identifiers:
Web of Science: WOS:001274055400002
Scopus: 2-s2.0-85200776162
Elibrary: 68935439
OpenAlex: W4400898031
Citing: No citations yet