Computing Reviews
Shasta: interactive reporting at scale
Manoharan G., Ellner S., Schnaitter K., Chegu S., Estrella-Balderrama A., Gudmundson S., Gupta A., Handy B., Samwel B., Whipkey C., Aharkava L., Apte H., Gangahar N., Xu J., Venkataraman S., Agrawal D., Ullman J. SIGMOD 2016 (Proceedings of the 2016 International Conference on Management of Data, San Francisco, CA, Jun 26-Jul 1, 2016) 1393-1404, 2016. Type: Proceedings
Date Reviewed: Nov 30 2016

The evolution of technology is like a slow dance in which most steps are in place, but a few move forward. Driven by demands for greater scale and efficiency, applications push the limits of technology and contribute to its advance. This paper presents an example of such an application.

Shasta is a system for interactive reporting of critical business data at Google. Using diverse, large-scale, distributed data, it was developed to satisfy requirements for:

  • complex computations to translate large, complex queries into operations on the underlying data store schemas,
  • low-latency queries that capture recent data store updates, and
  • efficient system management of query views.

To satisfy these requirements, Shasta combines new language and system techniques in a four-level architecture stack:

(1) Relational view language (RVL) compiler to translate parameterized user query views to SQL and to automatically aggregate query results;

(2) F1 engine (F1 is Google's relational database management system, or RDBMS), which generates an execution plan for the SQL produced by the RVL compiler;

(3) F1 servers and user-defined function (UDF) servers to execute the plan on a central server or distributed servers; and

(4) Distributed, diverse data stores that balance read versus write optimization using a novel caching scheme.
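To make the first level of this stack more concrete, the following is a minimal Python sketch of compiling a parameterized view into SQL with automatic aggregation. It is not RVL syntax and does not use any Shasta or F1 API: the ViewTemplate class, the compile_view function, the ad_stats table, and its column names are all hypothetical. The sketch assumes that each metric declares its own aggregate function, so that selecting a subset of dimensions implicitly groups by them and aggregates the metrics; the date range plays the role of the view parameters and is passed as generic named bind parameters (@start, @end).

```python
from dataclasses import dataclass


@dataclass
class ViewTemplate:
    """Hypothetical parameterized view (an illustration, not RVL syntax)."""
    table: str
    dimensions: list[str]    # columns a user may group by
    metrics: dict[str, str]  # metric column -> declared aggregate, e.g. "SUM"
    date_column: str         # column restricted by the view's date-range parameters


def compile_view(view: ViewTemplate, selected: list[str],
                 start: str, end: str) -> tuple[str, dict[str, str]]:
    """Translate a view request into SQL text plus bind parameters.

    Selected dimensions become GROUP BY keys and selected metrics are wrapped
    in their declared aggregate, so the caller never writes GROUP BY by hand.
    Dimensions are emitted before metrics regardless of selection order.
    """
    if not selected:
        raise ValueError("select at least one column")
    unknown = [c for c in selected
               if c not in view.dimensions and c not in view.metrics]
    if unknown:
        raise ValueError(f"columns not in view: {unknown}")

    dims = [c for c in selected if c in view.dimensions]
    aggs = [f"{view.metrics[c]}({c}) AS {c}" for c in selected if c in view.metrics]

    sql = (f"SELECT {', '.join(dims + aggs)} FROM {view.table} "
           f"WHERE {view.date_column} BETWEEN @start AND @end")
    if dims:
        sql += f" GROUP BY {', '.join(dims)}"
    # Parameter values stay out of the SQL text and are returned separately.
    return sql, {"start": start, "end": end}


if __name__ == "__main__":
    stats = ViewTemplate(
        table="ad_stats",
        dimensions=["campaign_id", "country", "report_date"],
        metrics={"clicks": "SUM", "cost": "SUM"},
        date_column="report_date",
    )
    query, params = compile_view(stats, ["country", "clicks"],
                                 "2016-01-01", "2016-01-31")
    print(query)
    print(params)
```

Run as a script, the sketch prints a query that sums clicks per country over the requested date range, together with its bind-parameter dictionary. Keeping parameter values as bind variables rather than splicing them into the SQL text is a conventional choice made for this sketch, not necessarily one made by Shasta.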

Shasta provides several benefits over the legacy C++ system it replaced. Views expressed in RVL are more understandable to business users and, through view templates, easier to query than the underlying schemas. Furthermore, because view definitions are encapsulated in RVL and separated from query processing, software engineering management of Shasta is significantly improved over that of the legacy system. By providing more support for query planning and for distributed execution of query plans, Shasta improves performance by a factor of two to seven for medium and large queries. With respect to scalability, query latency grows sublinearly as input data increases, owing to distributed query processing and to the data characteristics of Shasta applications (for these applications, query complexity is largely constant across input sizes, and query input size “tends to be determined by view parameters”).

The audience for this paper includes those interested in the application of integrated language and system technologies to improve the usability, performance, and scalability of data-rich Internet-distributed interactive applications. Shasta is an example of an application that pushes the limits of technology and contributes to its evolutionary dance forward.

Reviewer: J. M. Perry
Review #: CR144953 (1702-0151)
  Editor Recommended
Featured Reviewer
Other reviews under "Query Processing" (with review dates):

  • An optimal algorithm for processing distributed star queries. Chen A., Li V. IEEE Transactions on Software Engineering SE-11(10): 1097-1107, 1985. Type: Article. Reviewed Apr 1 1986.
  • Principles of transaction-oriented database recovery. Haerder T., Reuter A. ACM Computing Surveys 15(4): 287-317, 1983. Type: Article. Reviewed Mar 1 1985.
  • Investigating service-oriented system performance: a systematic study. Woodall P., Brereton P., Budgen D. Software--Practice & Experience 37(2): 177-191, 2007. Type: Article. Reviewed Jan 16 2008.

Reproduction in whole or in part without permission is prohibited.   Copyright 2004™