Computing Reviews
Today's Issue Hot Topics Search Browse Recommended My Account Log In
Review Help
Search
An analysis of the longest match and the greedy heuristics in text encoding
Katajainen J., Raita T. Journal of the ACM39 (2):281-294,1992.Type:Article
Date Reviewed: Mar 1 1993

The authors devote this research paper to the analysis of two text encoding techniques, defined by the longest match (LM) and greedy (G) heuristics. Those techniques are considered over appropriate dictionaries; the dictionary is viewed as a set of pairs (string, code of string), with the strings over a suitable alphabet.

Given a string over an alphabet, the problem of optimal compression may be formulated as a suitable storage minimization problem for the string. Formally, the technique used in the paper is based on a network representation of the dictionary; thus, a space-optimal problem for the encoding may be treated as a problem of finding a shortest path between a pair of vertices in a network (see Schuegraf and Heaps [1] and Storer [2]).

The first heuristic, the LM algorithm, is a direct choice of the longest dictionary string that matches the original text, processing the string from left to right. In particular, the authors prove that in the case of a code-uniform suffix dictionary (that is, the length of the codes is constant and the dictionary is closed under the taking of suffixes for all dictionary strings), the LM algorithm is optimal for any string. This result is a corollary of the authors’ estimation of the ratio C/O, where C is the length of compressed string and O stands for optimal storage length.

The G algorithm [3] is based on the best local compression criterion, defined as the difference between the number of bits in the input string and the corresponding code. The LM and G algorithms are, in general, incompatible modulo compression ratio. For any code-uniform dictionary, the methods are equipotent. For general dictionaries, the authors show that the algorithms behave similarly. All the precise results deal with upper boundaries for the compression ratio (both algorithms are considered). The authors pointed out that one problem is still open--what is a precise upper bound for the LM methods for suffix dictionaries?

It would be interesting to apply this technique to adaptive compression methods (with nonuniform dynamic dictionaries) and, more generally, to prefix dictionaries. A dictionary adjusted for a given set of database statistics would be  another  interesting application, since the actual compression for databases is applied in practice to dictionary features only.

The paper is well written and easy to read. It provides necessary details and is well organized. It will be useful for those who are interested in text compression, including adjusting heuristics for databases.

Reviewer:  S. Berger Review #: CR116281
1) Schuegraf, E. J. and Heaps, H. S. A comparison of algorithms for database compression by use of fragments as language elements. Inf. Stor. Ret. 10 (1974), 309–319.
2) Storer, J. A. Data compression, methods and theory. Computer Science Press, Baltimore, MD, 1988.
3) Gonzalez-Smith, M. E. and Storer, J. A. Parallel algorithms for data compression. J. ACM 32, 2 (April 1985), 344–373.
Bookmark and Share
 
Data Compaction And Compression (E.4 ... )
 
 
Path And Circuit Problems (G.2.2 ... )
 
Would you recommend this review?
yes
no
Other reviews under "Data Compaction And Compression": Date
Data compression (3rd ed.)
Held G., John Wiley & Sons, Inc., New York, NY, 1991. Type: Book (9780471929413)
Apr 1 1992
Data compression using an intelligent generator: the storage of chess games as an example
Althöfer I. Artificial Intelligence 52(1): 109-113, 1991. Type: Article
Jan 1 1993
Practical dictionary management for hardware data compression
Bunton S., Borriello G. Communications of the ACM 35(1): 95-104, 1992. Type: Article
Sep 1 1993
more...

E-Mail This Printer-Friendly
Send Your Comments
Contact Us
Reproduction in whole or in part without permission is prohibited.   Copyright 1999-2024 ThinkLoud®
Terms of Use
| Privacy Policy