This paper reports on a novel technique for scheduling database operations, particularly in an MIMD database machine environment. All database system implementations use a scheduler to manage concurrent execution of transactions. MIMD database machine implementations exploit parallelism even within a single transaction. In such machines, the use of dataflow techniques to schedule operations within a transaction is beneficial. The authors of this paper propose extending dataflow scheduling across transaction boundaries, thus eliminating the transaction scheduler. They claim that such an approach has the potential to obtain better performance than the traditional two-phase locking (2PL) scheme.
The authors introduce two separate dataflow models: transaction flow graphs (TFG), which model intratransaction concurrency and dependency, and database flow graphs (DBFG), which model intertransaction concurrency and dependency. A TFG for a transaction is formulated from the query DAG corresponding to that transaction, and TFGs of concurrent transactions are merged to create a DBFG.
The proposed approach involves three steps: producing a TFG from a query DAG, merging a TFG with an existing DBFG, and finally, updating the DBFG when a transaction terminates. The rules for merging a TFG with an existing DBFG guarantee that any dataflow schedule of operations ensures consistency. The paper provides proofs about the serializability of DBFGs and their deadlock-free conditions.
In addition to describing the basic technique, the paper discusses the implementation aspects of DBFG schedulers and provides simulation results comparing the performance of DBFG scheduling to that of 2PL. The authors use two performance measures: average response time per transaction and the average concurrency overhead per transaction. The simulation results show the same response time. As the number of transactions increases, however, the concurrency control overhead for 2PL becomes almost twice that of DBFG scheduling. In fact, the overhead for DBFG scheduling does not depend on the number of transactions at all.
The overall organization of the paper is logical and easy to follow. The authors have done a good job of explaining all the relevant concepts. Although the proposed technique is an important step forward, I hesitate to recommend it to database practitioners because the results indicate no clear performance advantage over 2PL scheduling. As the authors observe, more work is needed to understand the impact of indexes on the DBFG scheduling. Because indexing plays a major role in database performance, the results presented here will remain inconclusive until such a study is done.