Computing Reviews
Today's Issue Hot Topics Search Browse Recommended My Account Log In
Review Help
Search
Methods for finding frequent items in data streams
Cormode G., Hadjieleftheriou M. The VLDB Journal: The International Journal on Very Large Data Bases19 (1):3-20,2010.Type:Article
Date Reviewed: Aug 10 2010

A large class of real-world applications, with a tight relation to large-scale industrial systems, requires the processing of data that arrives in the form of streams. Data stream mining is a challenging research problem due to the high volume of the data and the enormous rate in which it is produced by data generation processes, as well as the fact that data processing has to be done on-the-fly (as the data occurs) to allow for a timely analysis of the observed trends. Several research directions of this problem have been studied over the years.

Cormode and Hadjieleftheriou provide an interesting survey on the research methods that have been proposed since the 1980s for the mining of frequent items in data streams. Simply stated, given a sequence of items, the problem of frequent item mining is to identify those items that occur most frequently in the sequence. Several variations of this problem have been studied, assuming the existence of importance (positive or negative) weights on the items. In this work, the authors partitioned the existing methodologies for finding frequent items into three broad classes: counter-based algorithms, quantile algorithms, and sketch algorithms. For each class, they produced baseline implementations of the most important algorithms and performed a thorough experimental evaluation to test the performance of the algorithms on different datasets.

The paper is well written, and leads to several important observations about the quality of the different algorithms that have been proposed for mining frequent items. It is a must-read for anyone interested in data stream mining.

Reviewer:  Aris Gkoulalas-Divanis Review #: CR138236 (1012-1274)
Bookmark and Share
 
Sorting And Searching (F.2.2 ... )
 
 
Concept Learning (I.2.6 ... )
 
 
Data Mining (H.2.8 ... )
 
 
Database Administration (H.2.7 )
 
Would you recommend this review?
yes
no
Other reviews under "Sorting And Searching": Date
Binary search trees of almost optimal height
Anderson A., Icking C., Klein R., Ottmann T. (ed) Acta Informatica 28(2): 165-178, 1990. Type: Article
May 1 1992
More nearly optimal algorithms for unbounded searching, part II
Reingold E., Shen X. SIAM Journal on Computing 20(1): 184-208, 1991. Type: Article
Apr 1 1992
A simple linear-time algorithm for in situ merging
Mannila H., Ukkonen E. Information Processing Letters 18(4): 203-208, 1984. Type: Article
Feb 1 1985
more...

E-Mail This Printer-Friendly
Send Your Comments
Contact Us
Reproduction in whole or in part without permission is prohibited.   Copyright 1999-2024 ThinkLoud®
Terms of Use
| Privacy Policy