Computing Reviews
Logical Structure Analysis and Generation for Structured Documents: A Syntactic Approach
Lee K., Choy Y., Cho S. IEEE Transactions on Knowledge and Data Engineering 15(5): 1277-1294, 2003. Type: Article
Date Reviewed: Sep 13 2004

Lee, Choy, and Cho describe a system for recovering document structure from document presentation. Given an optically scanned journal paper or a similar document, their system builds a structured Standard Generalized Markup Language/Extensible Markup Language (SGML/XML) form of the text.

The structure is derived by first building a “functional structure tree” from the scanned representation, then repeatedly traversing this tree in order to refine and merge components until a “logical tree structure” is obtained. This final tree is then converted to an XML tree representation. Only one document type definition (DTD) is used for the final representation.
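The review does not reproduce the paper's grammar rules, so the following Python sketch is only a toy illustration of the general pipeline shape described above: start from a tree of classified layout components, apply rewrite passes until nothing changes, then serialize the resulting logical tree as XML. The node labels, the `cont` flag (marking a fragment that continues the previous block across a column or page break), and the two rewrite rules are hypothetical assumptions for illustration, not the authors' syntactic method.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET
from dataclasses import dataclass, field


@dataclass
class Node:
    label: str                 # hypothetical component label, e.g. "heading", "para"
    text: str = ""
    cont: bool = False         # hypothetical: continues the previous sibling's block
    children: list[Node] = field(default_factory=list)


def merge_adjacent(node: Node) -> bool:
    """Rule 1: merge a fragment into its preceding sibling when it has the
    same label and is marked as a continuation. Returns True on any change."""
    changed = False
    merged: list[Node] = []
    for child in node.children:
        changed |= merge_adjacent(child)
        if merged and child.cont and merged[-1].label == child.label:
            merged[-1].text += " " + child.text
            changed = True
        else:
            merged.append(child)
    node.children = merged
    return changed


def group_sections(node: Node) -> bool:
    """Rule 2: wrap each top-level 'heading' and the siblings that follow it
    in a 'section' node. Returns True if any heading was grouped."""
    if not any(c.label == "heading" for c in node.children):
        return False
    out: list[Node] = []
    current = None
    for child in node.children:
        if child.label == "heading":
            current = Node("section", children=[Node("title", child.text)])
            out.append(current)
        elif current is not None:
            current.children.append(child)
        else:
            out.append(child)
    node.children = out
    return True


def refine(root: Node) -> Node:
    """Apply both passes repeatedly until a fixpoint (the 'logical' tree)."""
    # '|' rather than 'or' so that both passes run on every iteration.
    while merge_adjacent(root) | group_sections(root):
        pass
    return root


def to_xml(node: Node) -> ET.Element:
    """Convert the refined tree into an ElementTree element."""
    elem = ET.Element(node.label)
    if node.text:
        elem.text = node.text
    for child in node.children:
        elem.append(to_xml(child))
    return elem


if __name__ == "__main__":
    doc = Node("article", children=[
        Node("heading", "Introduction"),
        Node("para", "Scanned documents lose"),
        Node("para", "their markup.", cont=True),
        Node("heading", "Method"),
        Node("para", "We parse the layout."),
    ])
    print(ET.tostring(to_xml(refine(doc)), encoding="unicode"))
    # <article><section><title>Introduction</title><para>Scanned documents lose their markup.</para></section><section><title>Method</title><para>We parse the layout.</para></section></article>
```

Iterating to a fixpoint mirrors the review's description of repeatedly traversing the tree until no further refinement applies; a real system would use many more rules and, as the review notes, a single target DTD.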

The paper reports early experiments in which some 26 technical journal papers were scanned and processed. An accuracy rate of nearly 99 percent is claimed, which compares favorably with similar reported work.

A restriction of the process is that it targets a limited range of material, and document components such as tables and figures are ignored. Notably, the remaining one percent of errors arises largely from misclassified document components, such as figure captions and equations. The authors point to future work that will cover a wider range of material, and it will be interesting to see how the method extends to more complex and varied document structures.

I did experience one frustration in reading the paper: nearly all of the figures appear several pages after they are referenced in the text.

Reviewer: John Hurst. Review #: CR130117 (0502-0284)
 
Document Management (I.7.1 ... )
Data Models (H.2.1 ... )
XML (I.7.2 ... )
Logical Design (H.2.1 )
 
Other reviews under "Document Management" (with review dates):
XRel: a path-based approach to storage and retrieval of XML documents using relational databases
Yoshikawa M., Amagasa T., Shimura T., Uemura S. ACM Transactions on Internet Technology 1(1): 110-141, 2001. Type: Article
Mar 1 2002
FileNet: a consultant’s guide to enterprise content management
Groff T., Jones T., Butterworth-Heinemann, Newton, MA, 2004. Type: Book (9780750678162)
Dec 22 2004

Reproduction in whole or in part without permission is prohibited.   Copyright 1999-2024 ThinkLoud®