Computing Reviews

An evaluation method of words tendency depending on time-series variation and its improvements
Atlam E., Okada M., Shishibori M., Aoe J. Information Processing and Management: an International Journal 38(2): 157-171, 2002. Type: Article
Date Reviewed: 06/03/03

Text searching techniques usually calculate the degree of similarity between the user’s input text and the retrieved texts. Ohkubo et al. [1] proposed a method to estimate search keyword popularity in a given period of time, and demonstrated that word groups connected with search words change according to the time when the search is performed.

Building on this approach, the authors propose a new method that automatically assigns each word a stability class indicating how its popularity varies over time, based on changes in the word’s frequency in previously searched texts.

The method proceeds in five steps: define attributes that represent the type of word frequency change; produce training examples based on these attributes; manually classify the training examples into three stability classes; use the C4.5 learning algorithm [2] to build a decision tree; and use the tree to determine the stability class of test data.
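The steps above can be illustrated with a minimal sketch. It is not the authors' implementation: the two attributes (trend slope and relative spread), the class names, and the frequency series are all hypothetical, and the tree builder is a simplified gain-ratio learner in the spirit of C4.5, without pruning or missing-value handling.

```python
import math
from collections import Counter

def attributes(freqs):
    """Hypothetical attributes for one word: least-squares slope and relative spread."""
    n = len(freqs)
    mean_t = (n - 1) / 2
    mean_f = sum(freqs) / n
    cov = sum((t - mean_t) * (f - mean_f) for t, f in enumerate(freqs))
    var_t = sum((t - mean_t) ** 2 for t in range(n))
    spread = (max(freqs) - min(freqs)) / (mean_f or 1.0)
    return (cov / var_t, spread)

def entropy(labels):
    total = len(labels)
    return -sum(c / total * math.log2(c / total) for c in Counter(labels).values())

def best_split(rows, labels):
    """Return (gain ratio, attribute index, threshold) of the best binary split, or None."""
    base = entropy(labels)
    best = None
    for a in range(len(rows[0])):
        values = sorted(set(r[a] for r in rows))
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2
            left = [l for r, l in zip(rows, labels) if r[a] <= thr]
            right = [l for r, l in zip(rows, labels) if r[a] > thr]
            p = len(left) / len(labels)
            gain = base - p * entropy(left) - (1 - p) * entropy(right)
            split_info = -(p * math.log2(p) + (1 - p) * math.log2(1 - p))
            ratio = gain / split_info
            if gain > 1e-12 and (best is None or ratio > best[0]):
                best = (ratio, a, thr)
    return best

def build_tree(rows, labels):
    """A leaf is a class name; an internal node is (attribute, threshold, left, right)."""
    split = None if len(set(labels)) == 1 else best_split(rows, labels)
    if split is None:
        return Counter(labels).most_common(1)[0][0]
    _, a, thr = split
    left = [(r, l) for r, l in zip(rows, labels) if r[a] <= thr]
    right = [(r, l) for r, l in zip(rows, labels) if r[a] > thr]
    return (a, thr, build_tree(*zip(*left)), build_tree(*zip(*right)))

def classify(tree, row):
    while isinstance(tree, tuple):
        a, thr, left, right = tree
        tree = left if row[a] <= thr else right
    return tree

# Made-up frequency series per period, labelled by hand (illustrative class names).
training = [
    ([2, 4, 7, 9, 12], "increasing"),
    ([1, 3, 4, 8, 10], "increasing"),
    ([5, 6, 5, 6, 5], "stable"),
    ([8, 7, 8, 8, 7], "stable"),
    ([12, 9, 6, 4, 1], "decreasing"),
    ([10, 9, 5, 3, 2], "decreasing"),
]
rows = [attributes(f) for f, _ in training]
labels = [c for _, c in training]
tree = build_tree(rows, labels)

# Classify an unseen word from its frequency series.
prediction = classify(tree, attributes([3, 5, 8, 11, 15]))
print(prediction)
```

In this toy setup the manual labelling step is represented by the class names written next to each series; in practice that step, and the choice of attributes, would follow the definitions given in the paper.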

The authors present experimental results based on CNN articles on professional baseball, which show that the proposed approach classifies test data correctly with an accuracy between 0.768 and 0.847.

The paper is clearly written, with relevant figures and tables. However, the words used as an example in figure 3 and table 1 differ, making the explanation confusing. The paper also contains an error in figure 4: the graph curves are not visible.


[1] Ohkubo, M. Extracting information demand by analyzing a WWW search login. Transactions of Information Processing Society of Japan 39, 7 (1998), 2250–2258.

[2] Quinlan, J.R. C4.5: programs for machine learning. Morgan Kaufmann, Los Altos, CA, 1993.

Reviewer: A. Florea
Review #: CR127713 (0309-0954)
