Linguistic modeling is the structural or syntactic modeling of markup language elements that constitute classes of patterns with specified relations, which a system can recognize automatically. A main purpose of a metalanguage such as Extensible Markup Language (XML) is to describe data, from the highly structured to the unstructured, so that it can be stored in databases, spreadsheets, or ordinary documents. Although XML received considerable attention in the late 1990s, the enthusiasm has since died down a little. Present usage focuses more on developing markup languages and applications for processing and modeling information in a general sense. There are many markup languages and implementations, from those created for business processes and mathematical modeling, through technical document writing and dictionary building, to encoding data between systems and formatting e-books.
This 13-chapter book is from the Springer “Text, Speech and Language Technology” series. It summarizes a research project carried out by the German Research Foundation (DFG) and Bielefeld University’s Center for Interdisciplinary Research (ZiF). Each chapter is written by different authors and follows the format of a research paper (abstract, keywords, introduction, sections, conclusions, and references), so the volume reads more like conference proceedings than a book. The chapters themselves, however, are somewhat theoretical or descriptive in nature. The authors state in the preface that the issues described in this book “could also benefit from the presentations and discussion at the conference ‘Modeling Linguistic Information Resources,’” also held by ZiF.
Each chapter covers a different aspect of document and database structure, including structure levels, annotations, and XML language types. The first chapter is an introduction to markup languages. It provides general background on grouping information in XML documents and proposes using logical predicates. Chapter 2 presents the visualization of documents, achieved by transforming encodings into graphical descriptions. The next chapter considers a Web ontology language for domain linguistics and Web semantics, supported by examples. Chapter 4 is quite interesting: it outlines international aspects of XML and its standardization, and shows readers how to create a multilingual corpus. Chapters 5 and 6 provide a more detailed description of structure levels. From there, the book addresses more general issues, such as different types of annotations for information extraction from a corpus in chapter 7, Web genres in chapter 8, and language resource models in chapter 9. Chapter 10 presents a framework for hypertext transformation rules, and chapter 11 discusses techniques for converting different text types into hypertext networks. Chapter 12 illustrates the graphical modeling of hypertext structures. The last chapter covers linguistic treebanks defined by a logic of document units.
This tutorial is appropriate for undergraduate students who have a general knowledge of XML formalism.