Computing Reviews
Hybrid neural network and case based reasoning system for Web user behavior clustering and classification
Zehraoui F., Kanawati R., Salotti S. International Journal of Hybrid Intelligent Systems 7(3): 171-186, 2010. Type: Article
Date Reviewed: Mar 21 2011

The question of how to classify visitors to e-commerce systems into buyers and nonbuyers is of great importance in today’s world of pervasive e-commerce. The typical behavior of a visitor to an e-commerce Web site can be modeled as a sequence of page accesses and associated actions. A case-based reasoning (CBR) system accumulates past cases and predicts the behavior of new visits based on the patterns in those cases. A common drawback of CBR is that it does not handle temporal sequences very well. On the other hand, an adaptive neural network system can form clusters of cases through learning and then classify new cases using a threshold of closeness to these clusters. In their paper, the authors describe a hybrid system that combines a neural network and CBR in order to classify visitors to e-commerce systems.

The paper starts with a focused and detailed review of CBR and of hybrid systems that combine CBR with a neural network, along with how such systems might apply to sequence processing. The three major tasks of CBR are case representation, case retrieval, and case reuse. The tasks of a hybrid system would be case indexing, case selection, and case adaptation.

The authors then describe CASEP2, their proposed hybrid system. The system operates in one of two modes. In the offline construction mode, the system takes an existing collection of e-commerce visitor cases as input and builds the indexing of cases and prototypes using adaptive neural networks. In the online use mode, the current sequence of user behavior is taken as the input. The adaptive neural network (ANN) part of the system examines the input and generates a classification of whether the visitor falls into the buyer class or the nonbuyer class. If the confidence associated with the ANN solution is greater than the threshold, the solution is returned as the final result. If the confidence level is below the threshold, the case base is searched, using the indexing system, for a close match. If one is found, its result is returned. If none is found, the case is added to the atypical part of the case base. The atypical part of the case base (essentially the cases that the system has not seen before) will be used in ANN training during the next round of the offline construction mode.
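The online use mode described above can be sketched in a few lines of Python. This is only an illustration of the decision flow as summarized in this review, not the authors' implementation: the names classify_online, ann_predict, retrieve_case, and atypical_cases are hypothetical, and so is the default threshold of 0.8. The review does not say what happens when the confidence exactly equals the threshold; the sketch assumes such a case falls through to the case base.

```python
from typing import Callable, List, Optional, Sequence, Tuple

State = Sequence[float]   # one feature vector (hypothetical numeric encoding)
Label = str               # "buyer" or "nonbuyer"


def classify_online(
    sequence: List[State],
    ann_predict: Callable[[List[State]], Tuple[Label, float]],
    retrieve_case: Callable[[List[State]], Optional[Label]],
    atypical_cases: List[List[State]],
    threshold: float = 0.8,  # hypothetical value; the review gives none
) -> Optional[Label]:
    """Return a label, or None if the sequence was stored as atypical."""
    # Step 1: ask the neural network part for a label and a confidence.
    label, confidence = ann_predict(sequence)
    if confidence > threshold:
        return label

    # Step 2: low confidence, so look for a close match in the case base.
    matched_label = retrieve_case(sequence)
    if matched_label is not None:
        return matched_label

    # Step 3: nothing close enough; keep the case for the next offline
    # construction round, where it can be used for training.
    atypical_cases.append(sequence)
    return None
```

Passing the network and the case retrieval as callables keeps the sketch independent of how either component is actually built, which the review does not specify.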

The authors define a sequence as a collection of states ordered by the time at which each state occurs. A state is an n-dimensional feature vector, where n is a chosen constant. Among the n features are the Web page that the user visits, the time of the visit, and the user’s Internet protocol (IP) address. In the proposed system, the length of the sequence is also a system-defined constant, p. Thus, each user has a state sequence of length p in which each state is an n-dimensional vector, which results in an n-by-p matrix. The input sequence x can thus be modeled using its associated dynamic covariance matrix (COVx). The distance between an input sequence and a sample (or neuron weights) can be measured by the distance between the two matrices (the authors use the Frobenius matrix distance).
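To make the matrix comparison concrete, here is a minimal sketch in plain Python. It is a stand-in under stated assumptions, not the paper's method: it computes an ordinary population covariance matrix over the p states rather than the authors' dynamic covariance matrix, whose exact definition this review does not give, and it assumes every feature has already been encoded as a number. The function names covariance_matrix and frobenius_distance are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def covariance_matrix(states: Matrix) -> Matrix:
    """n-by-n population covariance of p states, each an n-dimensional vector."""
    p = len(states)
    n = len(states[0])
    means = [sum(state[j] for state in states) / p for j in range(n)]
    cov = [[0.0] * n for _ in range(n)]
    for state in states:
        deviation = [state[j] - means[j] for j in range(n)]
        for i in range(n):
            for j in range(n):
                cov[i][j] += deviation[i] * deviation[j]
    return [[value / p for value in row] for row in cov]


def frobenius_distance(a: Matrix, b: Matrix) -> float:
    """Square root of the sum of squared element-wise differences."""
    return math.sqrt(
        sum((x - y) ** 2 for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b))
    )


# Two short sequences with p = 2 states and n = 2 numeric features each.
cov_x = covariance_matrix([[1.0, 2.0], [3.0, 6.0]])
cov_w = covariance_matrix([[0.0, 0.0], [0.0, 0.0]])
distance = frobenius_distance(cov_x, cov_w)
```

One consequence of comparing covariance matrices is that two sequences are compared through n-by-n summaries of equal size, whatever the raw sequences look like.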

The authors present test results using real data from an e-commerce Web site. They used 3,000 sequences in the construction mode and another 10,000 sequences in the use mode. Three performance measures were used in the test: recall, precision, and classification rate. The numerical values of the test results are very impressive, generally in the range of 70 to 95 percent, with the classification of nonbuyers yielding better results than that of buyers.
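For readers less familiar with these measures, the sketch below shows how they are commonly computed per class. It assumes that "classification rate" means the overall fraction of correctly classified sequences, which the review does not state explicitly. The function names and the toy labels are hypothetical and are not data from the paper.

```python
from typing import List, Tuple


def precision_recall(actual: List[str], predicted: List[str], cls: str) -> Tuple[float, float]:
    """Precision and recall for one class, e.g. "buyer" or "nonbuyer"."""
    pairs = list(zip(actual, predicted))
    true_pos = sum(1 for a, p in pairs if a == cls and p == cls)
    false_pos = sum(1 for a, p in pairs if a != cls and p == cls)
    false_neg = sum(1 for a, p in pairs if a == cls and p != cls)
    precision = true_pos / (true_pos + false_pos) if true_pos + false_pos else 0.0
    recall = true_pos / (true_pos + false_neg) if true_pos + false_neg else 0.0
    return precision, recall


def classification_rate(actual: List[str], predicted: List[str]) -> float:
    """Fraction of sequences whose predicted class matches the actual class."""
    correct = sum(1 for a, p in zip(actual, predicted) if a == p)
    return correct / len(actual)


# Toy labels for illustration only.
actual = ["buyer", "buyer", "nonbuyer", "nonbuyer", "nonbuyer"]
predicted = ["buyer", "nonbuyer", "nonbuyer", "nonbuyer", "buyer"]
buyer_precision, buyer_recall = precision_recall(actual, predicted, "buyer")
rate = classification_rate(actual, predicted)
```

Reporting precision and recall separately for each class is what makes it possible to see a gap such as the one between nonbuyers and buyers noted above.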

Certain areas of the paper could be improved. For example, the authors could have discussed in more detail how to determine the n features in a state. Also, the authors did not discuss how well the system performs in real time in terms of response time. Otherwise, this is a great paper for general readers interested in analyzing user log data, especially in the area of e-commerce.

Reviewer:  Xiannong Meng Review #: CR138920 (1109-0935)
Hybrid Systems (C.1.m ... )
Clustering (H.3.3 ... )
Neural Nets (C.1.3 ... )
User Issues (H.3.7 ... )
Electronic Commerce (K.4.4 )
Information Search And Retrieval (H.3.3 )
 