Computing Reviews
Today's Issue Hot Topics Search Browse Recommended My Account Log In
Review Help
Search
Recursive hashing functions for n-grams
Cohen J. ACM Transactions on Information Systems15 (3):291-320,1997.Type:Article
Date Reviewed: Nov 1 1997

An n-gram is a sequence of n consecutive symbols taken from a stream of symbols, such as a document. In several contexts, such as indexing, retrieval, and comparison, it is necessary to hash-code every such n-gram. A good hash function should map the n-grams fairly evenly onto the address space. Such functions are computation-intensive.

The computation is eased if the hash value of one n-gram can be used to find the hash value of the next; after all, two consecutive n-grams overlap in n - 1 characters. Cohen collects such recursive hashing functions, generalizes the framework, improves some of the methods, proposes new ones, and compares them with respect to an even spread and the computation time. This includes extensive performance evaluations with texts of half a million characters, n between 3 and 10, and address space sizes of 8K, 32K, and 128K.

The paper is clearly written and provides 54 references to hashing and list applications.

Reviewer:  F. Gebhardt Review #: CR121106 (9711-0931)
Bookmark and Share
 
Indexing Methods (H.3.1 ... )
 
 
Hash-Table Representations (E.2 ... )
 
 
Probabilistic Algorithms (Including Monte Carlo) (G.3 ... )
 
 
Recurrences And Difference Equations (G.2.1 ... )
 
 
Information Search And Retrieval (H.3.3 )
 
 
Information Storage (H.3.2 )
 
Would you recommend this review?
yes
no
Other reviews under "Indexing Methods": Date
Computation of term/document discrimination values by use of the cover coefficient
Can F. (ed), Ozkarahan E. Journal of the American Society for Information Science 38(3): 171-183, 1987. Type: Article
Mar 1 1988
Automatic indexing of full texts
Jonák Z. Information Processing and Management: an International Journal 20(5-6): 619-627, 1984. Type: Article
Jul 1 1985
Evaluation of access methods to text documents in office systems
Rabitti F., Zizka J.  Research and development in information retrieval (, King’s College, Cambridge,401984. Type: Proceedings
Sep 1 1985
more...

E-Mail This Printer-Friendly
Send Your Comments
Contact Us
Reproduction in whole or in part without permission is prohibited.   Copyright 1999-2024 ThinkLoud®
Terms of Use
| Privacy Policy