Approximate approach for frequent itemsets mining on massive distributed data beyond computing capacity
Scientific publication
| Journal | Expert Systems with Applications, ISSN: 0957-4174, E-ISSN: 1873-6793 |
| Publication details | Year: 2026, Volume: 318, Article number: 132043, Pages: 15, DOI: 10.1016/j.eswa.2026.132043 |
| Keywords | Parallel and distributed algorithms; Frequent itemsets mining; Spark; Big data analytics; Distributed data files; Sampling techniques |
| Authors | |
| Organizations | |
Abstract:
Frequent itemsets mining (FIM) is a fundamental task in data mining; however, traditional methods struggle with massive distributed data that exceeds available memory and computing resources. Mining frequent itemsets (FIs) from a massive static distributed data file (MSDDF) on a cluster with limited memory is therefore a challenging problem. In this paper, we propose Approximate Frequent Itemsets Mining (ApproxFIM), a novel two-stage solution that combines a new sampling method and an approximation approach to reduce computational cost under strict resource constraints. In the first stage, a bounded number of data blocks are randomly selected from the MSDDF and converted into representative random sample partitions. Theoretical guarantees are derived to bound the number of selected data blocks, to ensure the quality of the random sample, and to prove that each constructed sample remains representative of the entire dataset. In the second stage, frequent itemsets are mined independently and in parallel from the sampled partitions using FP-Growth, and the resulting patterns are aggregated into a final approximate FI set. ApproxFIM is implemented in Apache Spark using the Local Operations with Global Operations (LOGO) computing paradigm and evaluated on both real-world and synthetic datasets. Experimental results demonstrate that ApproxFIM scales effectively, significantly reduces memory and execution time requirements, and produces accurate approximations, making it well-suited for practical massive static distributed data mining on small clusters with limited resources.
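The two-stage idea described in the abstract (sample a bounded number of blocks, mine each block independently, aggregate the local results) can be illustrated with a minimal single-machine sketch. This is not the authors' implementation: it uses plain Python instead of Spark, brute-force counting instead of FP-Growth, and a simple voting rule for aggregation. The function names and the `min_vote`, `max_len`, and `seed` parameters are hypothetical and chosen only for illustration; the abstract does not specify how the paper aggregates local patterns.

```python
import random
from collections import Counter
from itertools import combinations


def mine_block(block, min_support, max_len=3):
    """Return {itemset: relative support} for one block of transactions.

    Brute-force counting is used here as a stand-in for FP-Growth;
    it is only practical for small illustrative inputs.
    """
    counts = Counter()
    for transaction in block:
        items = sorted(set(transaction))
        for size in range(1, min(max_len, len(items)) + 1):
            for combo in combinations(items, size):
                counts[frozenset(combo)] += 1
    n = len(block)
    return {s: c / n for s, c in counts.items() if c / n >= min_support}


def approx_frequent_itemsets(blocks, num_sampled, min_support,
                             min_vote=0.5, seed=0):
    """Stage 1: sample blocks at random. Stage 2: mine each, then aggregate."""
    rng = random.Random(seed)
    sampled = rng.sample(blocks, num_sampled)

    # Each sampled block is mined independently (in parallel on a real cluster).
    local_results = [mine_block(b, min_support) for b in sampled]

    votes = Counter()
    support_sum = Counter()
    for result in local_results:
        for itemset, support in result.items():
            votes[itemset] += 1
            support_sum[itemset] += support

    # Assumed aggregation rule: keep itemsets that are locally frequent in at
    # least `min_vote` of the sampled blocks; report their mean local support.
    return {
        s: support_sum[s] / votes[s]
        for s in votes
        if votes[s] / num_sampled >= min_vote
    }
```

Calling `approx_frequent_itemsets(blocks, num_sampled=2, min_support=0.5)` on a list of transaction blocks returns a dictionary mapping each retained itemset (a `frozenset`) to its estimated support. Only the sampled blocks are ever scanned, which is the source of the memory and time savings the abstract reports.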
Bibliographic reference:
Ngueilbaye A., Sibagatullin R., Cai Y., Mahmud M.S., Sun X., Nechesov A., Goncharov S.S., Huang J.Z.
Approximate approach for frequent itemsets mining on massive distributed data beyond computing capacity
Expert Systems with Applications. 2026. V.318. 132043 :1-15. DOI: 10.1016/j.eswa.2026.132043 WOS Scopus OpenAlex
Dates:
| Received: | 13 Oct 2025 |
| Accepted for publication: | 10 Mar 2026 |
| Published online: | 14 Mar 2026 |
| Published in print: | 1 Jul 2026 |
Database identifiers:
| Web of Science: | WOS:001721167100001 |
| Scopus: | 2-s2.0-105034621405 |
| OpenAlex: | W7135405357 |