The authors present an analysis of five days of workload data from a large Web-based shopping system, with the aim of investigating the issues that affect the performance and scalability of such a system. The system under study has a multi-tier architecture typical of e-commerce sites: Web servers, application servers, database servers, and an assortment of load-balancing and firewall appliances. This architecture is described in section 3 of the paper. The authors then discuss the sources of their measurement data at each tier.
Section 5 presents the results of the HTTP-level workload characterization, including the distribution of requests by resource type, site usage during the measurement periods, resource referencing patterns, and client request behavior. In section 6, the authors classify requests according to their performance impact on the system and discuss how each class affects system scalability. They identify three request classes with different resource demands: cacheable, non-cacheable, and search. The section also examines how sensitive system scalability is to the request class mix and to the request cache hit rate. Whereas section 6 analyzes individual requests, section 7 presents a session-level characterization of the system. It discusses issues specific to the two kinds of sources that use the system, namely human users and robots, and compares session-level characteristics for two different time periods. Clustering techniques are used to group individual sessions (both user and robot sessions) by their performance impact, to support the evaluation of system scalability. The last section covers performance and scalability issues.
Although the discussion of capacity planning and scalability could have been more extensive, the paper provides a solid workload characterization of a real Web-based shopping system.