Finding similarity measurement techniques for Web data records containing user-generated content or narrative text is an active area of research. This paper describes one such attempt, even though the title of the paper does not suggest that explicitly. The paper points out some of the limitations of the mining data records (MDR) approach by Liu et al. [1] and suggests improvements. The authors support their argument with a case study.
While the authors have tried to provide as much information as possible, readers will need reasonable domain knowledge to follow the reported work. For example, the authors discuss “post” and “page” at great length without defining either term; similarly, the acronym DOM is used with no explanation. Interestingly, the reported approach does not seem to use the user-generated narrative content itself, but rather the metadata associated with the posts. Perhaps space constraints contributed to the missing information.