Computing Reviews, the leading online review service for computing literature.


Logical Structure Analysis and Generation for Structured Documents: A Syntactic Approach
Lee K., Choy Y., Cho S. IEEE Transactions on Knowledge and Data Engineering 15(5): 1277-1294, 2003. Type: Article

Date Reviewed: Sep 13 2004

Lee, Choy, and Cho describe a system for recovering the logical structure of a document from its presentation. Given an optically scanned representation of a journal paper or a similar document, their system builds a structured form of the text in Standard Generalized Markup Language/Extensible Markup Language (SGML/XML). The system first builds a “functional structure tree” from the scanned representation, and then repeatedly traverses this tree, refining and merging components until a “logical tree structure” is obtained. This final tree is then converted to an XML tree representation. A single document type definition (DTD) is used for the final representation. The paper reports on early experiments with the system, involving some 26 scanned technical journal papers. An accuracy rate of nearly 99 percent is claimed, which compares favorably with similar work reported elsewhere. One restriction of the process is that it targets a limited range of material, and it ignores document components such as tables and figures. Notably, the remaining one percent of errors arises largely from misclassified components, such as figure captions and equations. The authors point to future work involving a larger range of material, and it will be interesting to see how the method extends to more complex and varied document structures. I did experience one frustration in reading the paper: nearly all of the figures appear several pages after they are referenced in the text.
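The refine-and-merge loop described above can be illustrated with a small sketch. This is not the authors' method (their approach is syntactic, and its rules are not reproduced here); it is a minimal Python illustration assuming a hypothetical functional tree whose leaves are labeled `title`, `heading`, and `line`. Two hypothetical rules (merge adjacent lines into a `para`, and group a heading with its following siblings into a `section`) are applied in repeated traversals until the tree stops changing, and the result is serialized to XML with the standard library's `xml.etree.ElementTree`.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET
from dataclasses import dataclass, field


@dataclass
class Node:
    """A component in a hypothetical functional/logical structure tree."""
    label: str                      # e.g. "title", "heading", "line" (assumed labels)
    text: str = ""
    children: list[Node] = field(default_factory=list)


def merge_lines(children):
    """Hypothetical rule: merge runs of adjacent 'line' siblings into one 'para'."""
    out = []
    for child in children:
        if child.label == "line":
            if out and out[-1].label == "para":
                out[-1].text += " " + child.text
            else:
                out.append(Node("para", child.text))
        else:
            out.append(child)
    return out


def group_sections(children):
    """Hypothetical rule: a 'heading' opens a 'section' that absorbs later siblings."""
    out, current = [], None
    for child in children:
        if child.label == "heading":
            current = Node("section", children=[child])
            out.append(current)
        elif current is not None:
            current.children.append(child)
        else:
            out.append(child)
    return out


def refine(node):
    """One traversal: apply the rules to this node's children, then recurse."""
    children = merge_lines(node.children)
    if node.label == "document":    # sections are only formed at the top level
        children = group_sections(children)
    for child in children:
        refine(child)
    node.children = children


def to_xml(node):
    """Convert a tree of Nodes into an ElementTree element."""
    elem = ET.Element(node.label)
    if node.text:
        elem.text = node.text
    for child in node.children:
        elem.append(to_xml(child))
    return elem


def build_logical_tree(root, max_passes=100):
    """Traverse repeatedly until the serialized tree no longer changes."""
    for _ in range(max_passes):
        before = ET.tostring(to_xml(root))
        refine(root)
        if ET.tostring(to_xml(root)) == before:
            return root
    raise RuntimeError("refinement did not reach a fixed point")


doc = Node("document", children=[
    Node("title", "A Sample Paper"),
    Node("line", "Scanned text is"),
    Node("line", "split into lines."),
    Node("heading", "1 Introduction"),
    Node("line", "Lines are merged"),
    Node("line", "into paragraphs."),
])
tree = build_logical_tree(doc)
result = ET.tostring(to_xml(tree), encoding="unicode")
print(result)
```

Iterating until the serialized tree stops changing is one simple way to detect that refinement is complete; the labels and rules are placeholders, and a realistic system would need further rules for components such as captions and equations.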

Reviewer: John Hurst	Review #: CR130117 (0502-0284)

Document Management (I.7.1 ... )

Data Models (H.2.1 ... )

XML (I.7.2 ... )

Logical Design (H.2.1 )


Other reviews under "Document Management":	Date

XRel: a path-based approach to storage and retrieval of XML documents using relational databases Yoshikawa M., Amagasa T., Shimura T., Uemura S. ACM Transactions on Internet Technology 1(1): 110-141, 2001. Type: Article	Mar 1 2002

FileNet: a consultant’s guide to enterprise content management Groff T., Jones T., Butterworth-Heinemann, Newton, MA, 2004. Type: Book (9780750678162)	Dec 22 2004

Reproduction in whole or in part without permission is prohibited. Copyright 1999-2024 ThinkLoud®