A distributed system consists of many independent nodes that interact with one another through a standardized set of protocols. Because such systems often contain thousands or even millions of nodes, it is very hard, if not impossible, to detect whether any one node is malfunctioning. A node can malfunction for several reasons, such as errors in the protocol design, bugs in the code, or compromise by a malicious user.
This paper proposes a framework called Gatling that automatically finds “performance attacks caused by insider attackers in large-scale message-passing distributed systems.” In a performance attack, malicious nodes send or create messages “with the goal of degrading system performance.” An exhaustive search over everything a malicious node might do is understandably infeasible. Instead, Gatling identifies a representative set of basic malicious message-delivery and lying actions and uses a greedy search algorithm to find effective attacks composed of a subset of these actions. The authors tested the system on nine distributed systems, and the results are promising.
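To make the idea of a greedy search over basic malicious actions concrete, here is a minimal sketch. It is not the paper's algorithm, only a generic greedy subset selection written for illustration. The action names, the `measure` function, and the penalty numbers are all hypothetical. In a real setting, `measure` would run the system under test with the chosen actions enabled and report a performance metric.

```python
from typing import Callable, FrozenSet, Iterable, Tuple


def greedy_attack_search(
    actions: Iterable[str],
    measure: Callable[[FrozenSet[str]], int],
    baseline: int,
    min_gain: int = 0,
) -> Tuple[FrozenSet[str], int]:
    """Greedily grow a set of malicious actions that degrades performance.

    Each round adds the single action that lowers the measured performance
    the most, and stops once no action improves the attack by more than
    `min_gain`. (Hypothetical helper, not Gatling's actual interface.)
    """
    chosen: FrozenSet[str] = frozenset()
    best = baseline
    remaining = set(actions)
    while remaining:
        # Score every candidate by the performance seen when it is added.
        scored = [(measure(chosen | {a}), a) for a in sorted(remaining)]
        perf, action = min(scored)  # lower performance = stronger attack
        if best - perf <= min_gain:
            break  # no remaining action makes the attack meaningfully worse
        chosen = chosen | {action}
        best = perf
        remaining.remove(action)
    return chosen, best


# Toy stand-in for running the system: made-up throughput penalties.
PENALTY = {"drop": 30, "delay": 10, "duplicate": 0, "lie_max": 25}


def toy_measure(active: FrozenSet[str]) -> int:
    """Pretend throughput: 100 minus the penalty of each active action."""
    return 100 - sum(PENALTY[a] for a in active)


chosen, best = greedy_attack_search(PENALTY, toy_measure, baseline=100)
print(sorted(chosen), best)  # ['delay', 'drop', 'lie_max'] 35
```

In this toy model the search evaluates at most one candidate per remaining action in each round, rather than all possible subsets, which is the kind of saving a greedy approach is meant to buy. The price is that it can miss combinations whose effect only appears when several actions are applied together.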
However, distributed systems, though not black boxes, are hard to test and visualize. The algorithms proposed in this paper are also heuristics: they may have worked on the test cases, but there is no guarantee that they will work on your own systems. Use Gatling with this in mind and at your own risk. Finally, at 34 pages the paper is a bit long and would benefit from some trimming.