This paper, drawn from the 1977 PhD thesis of the first author, surveys a variety of LR-parser table compression techniques. Results of applying these techniques to several grammars are presented. The paper is suitable for practitioners, as well as for an advanced course in compiler construction. I am not aware of any other easily accessible reference to the compression methods covered.
The authors describe the Graph Coloring Scheme (GCS), the Line Elimination Scheme (LES), the Row Displacement Scheme (RDS), the Significant Distance Scheme (SDS), the Row Column Scheme (RCS) and the Suppressed Zero Scheme (SZS). The latter two techniques result in data structures that require searching lists during parsing; the other four produce indexed structures.
Experiments were conducted with a variety of compression methods on grammars for ADA, AL, ALGOLW, ALGOL60, BALG, EULER, LEX, LIS, MINILEX, PASCAL, and XPL. The methods compared are GCS, LES, RDS, SDS, LES-GCS combined, LES-RDS combined, RCS and SZS, with two variants of GCS and LES-GCS differing by the coloring heuristic used. Table sizes, significant entry counts, and storage requirements are given for each method on each grammar. A summary table is given that compares the storage requirements. This table must be criticized for selecting the best results for each of the graph coloring heuristics on each grammar rather than consistently selecting values obtained via one method. In general, GCS appears to be the best of the methods described in terms of storage compaction. The authors recommend running the LES-GCS combination during development as it ran 10 times faster than GCS alone. The generation of the ADA tables on a SIEMENS 7.760 required 500 CPU seconds for GCS and 55 seconds for LES-GCS. The tables for LES-GCS were up to 10 percent larger than those obtained with GCS.
The chief weakness of this paper is the lack of sufficient illustrations. It would be an ideal survey paper for instruction were examples included for each technique. The issue of “portable compilers” in the title is not addressed well. The authors state that previous methods were not portable in that the table access could not be programmed in a high-level language, but they do not justify their position. On p. 551, I believe the authors meant to say “amax+1” rather than “smax” for all three occurrences. The PT vector in Figure 3 should have 0 instead of 2 for the first four entries. Figure 7 shows GCS producing 1401 and 3135 byte tables for EULER and PASCAL, respectively. According to Tables IV and V, these values should be 1614 and 3183.