Wikis seem to bring out a love-it-or-leave-it response in many people. As a maintainer of a wiki, I find that a lot of potential users will just not go near it. Similarly, in an educational context, when I set up wiki-based assignments for students, I’m surprised to find that some students will not use them, even when there are easy marks attached.
This paper offers some insight into these attitudes. It explores how wiki edits can be automatically categorized into groups, such as “minor typo corrections” and “substantial structural changes,” and in doing so it goes significantly beyond the Unix diff-style version comparisons that are available on most wikis.
To realize these categorizations, the algorithms developed in this paper address several different levels. First, there is the lexical level. In wikis, this is complicated by the need to handle the embedded markup and to make sure that this markup does not interfere with text differencing, the next stage of analysis. Differencing is done at both the token and the sentence levels, to ensure that higher-level similarities, as well as lower-level differences, in the texts are extracted. Higher-level structural changes are examined by an action categorizer, which extracts changes due to block moves of text, the rearrangement of sentences, and so forth. Finally, all of these differences are categorized by a history summarizer, which attempts to classify changes as layout changes, paragraph reorganizations, grammatical corrections, and, at the lowest level, spelling corrections.
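To make the idea of this pipeline concrete, here is a minimal sketch in Python. It is not the authors' algorithm: it swaps in the standard library's difflib for their differencing stage, covers only a token-level diff, and uses a deliberately tiny markup pattern. The names MARKUP, strip_markup, tokenize, and classify_edit, the three output labels, and the 0.7 similarity threshold are all assumptions made for illustration.

```python
import difflib
import re

# Hypothetical, simplified markup pattern: runs of quotes used for
# bold/italic, link brackets, and heading markers. Real wiki markup is richer.
MARKUP = re.compile(r"'{2,}|\[\[|\]\]|={2,}")


def strip_markup(text):
    """Remove a few common wiki markup tokens so they do not pollute the diff."""
    return MARKUP.sub("", text)


def tokenize(text):
    """Split markup-free text into word tokens and punctuation tokens."""
    return re.findall(r"\w+|[^\w\s]", strip_markup(text))


def classify_edit(old, new):
    """Return a coarse label for the change from old to new (illustrative only)."""
    if old == new:
        return "no change"

    old_tokens, new_tokens = tokenize(old), tokenize(new)
    if old_tokens == new_tokens:
        # Only markup or whitespace differs between the two revisions.
        return "layout change"

    matcher = difflib.SequenceMatcher(None, old_tokens, new_tokens, autojunk=False)
    changes = [op for op in matcher.get_opcodes() if op[0] != "equal"]

    for tag, i1, i2, j1, j2 in changes:
        # Insertions, deletions, and uneven replacements alter the content.
        if tag != "replace" or (i2 - i1) != (j2 - j1):
            return "content change"
        # A replaced token must stay close to its predecessor to count as
        # a spelling fix; 0.7 is an assumed threshold, not a tuned value.
        for a, b in zip(old_tokens[i1:i2], new_tokens[j1:j2]):
            if difflib.SequenceMatcher(None, a, b).ratio() < 0.7:
                return "content change"

    return "spelling correction"
```

With this sketch, wrapping a word in bold markup is labeled a layout change, and fixing the transposed letters in "quikc" is labeled a spelling correction, while adding or swapping whole words is labeled a content change. Detecting block moves and sentence rearrangements, as the paper's action categorizer does, would require a sentence-level pass that this sketch omits.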
The output of the analyzer gives a diff-like view of the two documents, but with higher-level structural changes and low-level sentence- and word-level alterations identified for what they are. Close agreement in the categorization of changes was observed when the outputs were compared with human evaluations of the document changes.
In the long term, the authors’ work should allow wiki users to gain an appreciation of how changes affect a document, as well as the likely significance of such changes to the reader.
If you use wikis at all for serious information-sharing purposes, you should read this paper. The authors offer their code as open source, so we can hope that wiki developers will eventually take up the approaches outlined in this paper.