GridFTP is widely used for bulk data transfers over high-bandwidth wide area networks (WANs). Recent versions offer an extended-block mode, in which data is transmitted in blocks that may arrive at their destination out of order; another option lets users specify how many parallel connections to use with this mode. The challenge, of course, is to pick values for these options that maximize throughput. Yang et al. try to address this by analyzing the performance of both WAN and local area network (LAN) data transfers, in stream mode and in extended-block mode, with varying degrees of parallelism.
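Out-of-order arrival does no harm because every block records where its payload belongs. The Python below is a minimal sketch of that idea. It assumes an extended-block header made of an 8-bit descriptor, a 64-bit byte count, and a 64-bit offset, which is my reading of the GridFTP protocol extensions; check the specification before relying on the layout. The helper names (`make_block`, `split_into_blocks`, `reassemble`) are hypothetical, and the sketch ignores descriptor flags such as end-of-data.

```python
import random
import struct

# Assumed header layout: 1-byte descriptor, 8-byte byte count, 8-byte offset,
# big-endian with no padding (17 bytes in total).
HEADER = struct.Struct(">BQQ")


def make_block(data: bytes, offset: int, descriptor: int = 0) -> bytes:
    """Hypothetical sender helper: prefix a payload slice with its header."""
    return HEADER.pack(descriptor, len(data), offset) + data


def split_into_blocks(data: bytes, block_size: int) -> list[bytes]:
    """Cut a file into fixed-size blocks, each tagged with its byte offset."""
    return [make_block(data[i:i + block_size], i)
            for i in range(0, len(data), block_size)]


def reassemble(blocks: list[bytes], total_size: int) -> bytes:
    """Write each payload at its stated offset, whatever the arrival order."""
    buffer = bytearray(total_size)
    for block in blocks:
        _descriptor, count, offset = HEADER.unpack_from(block)
        payload = block[HEADER.size:HEADER.size + count]
        if len(payload) != count:
            raise ValueError("truncated block")
        buffer[offset:offset + count] = payload
    return bytes(buffer)


if __name__ == "__main__":
    original = bytes(range(256)) * 40          # a 10,240-byte test "file"
    blocks = split_into_blocks(original, 1000)

    # Simulate four parallel connections: deal the blocks out round-robin,
    # then let them arrive in a random interleaving.
    streams = [blocks[i::4] for i in range(4)]
    arrivals = [b for s in streams for b in s]
    random.shuffle(arrivals)

    assert reassemble(arrivals, len(original)) == original
    print("reassembled", len(original), "bytes from", len(arrivals), "blocks")
```

Because placement depends only on the offset, the receiver never needs to know which connection carried a block. That independence is what makes adding parallel connections straightforward in this mode.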
A couple of paragraphs detail the computation of a Hurst (self-similarity) parameter, and tables give values for this parameter and for average throughput under various scenarios.
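The estimator itself is not spelled out here. The Python below is therefore only a sketch of one standard technique, the aggregated-variance method, and not necessarily what the authors used. For a self-similar series, the variance of block means of size m scales as m^(2H-2), so H follows from the slope of log variance against log m. In this setting the input series would be, for example, throughput samples taken at fixed intervals. The function name and the default block sizes are illustrative assumptions. An H near 0.5 indicates no long-range dependence; values approaching 1 indicate strong long-range dependence.

```python
import math
import random
import statistics


def hurst_aggregated_variance(series, block_sizes=None):
    """Estimate the Hurst parameter H by the aggregated-variance method.

    For each block size m, the series is cut into non-overlapping blocks of
    length m and each block is replaced by its mean. For a self-similar
    process the variance of those means scales as m ** (2H - 2), so the
    least-squares slope of log(variance) against log(m) equals 2H - 2.
    """
    n = len(series)
    if n < 20:
        raise ValueError("need at least 20 samples")
    if block_sizes is None:
        # Powers of two, keeping at least 10 blocks at the largest size.
        block_sizes = [2 ** k for k in range(int(math.log2(n // 10)) + 1)]

    log_m, log_var = [], []
    for m in block_sizes:
        means = [statistics.fmean(series[i * m:(i + 1) * m])
                 for i in range(n // m)]
        log_m.append(math.log(m))
        log_var.append(math.log(statistics.variance(means)))

    # Ordinary least-squares slope of log(variance) versus log(m).
    mx, my = statistics.fmean(log_m), statistics.fmean(log_var)
    slope = (sum((x - mx) * (y - my) for x, y in zip(log_m, log_var))
             / sum((x - mx) ** 2 for x in log_m))
    return 1 + slope / 2


if __name__ == "__main__":
    rng = random.Random(42)
    # Independent Gaussian samples have no long-range dependence, so the
    # estimate should land close to 0.5.
    noise = [rng.gauss(0.0, 1.0) for _ in range(8192)]
    print(f"independent noise: H = {hurst_aggregated_variance(noise):.2f}")
```

Other estimators, such as rescaled-range (R/S) analysis or wavelet-based methods, can give somewhat different values on the same data. This is one reason it helps to know which method produced the numbers in the paper's tables.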
For WAN transfers, the conventional file transfer protocol (FTP) gives throughput similar to that of a single GridFTP connection. Using four parallel connections improves throughput by 20 percent, and using 16 parallel connections more than doubles it. This came as a surprise to me, as other literature suggests that using more than four parallel connections may not significantly increase throughput.
In a LAN environment, GridFTP transfers of all types are about 18 percent faster than conventional transfers; my guess is that there was little other traffic on the network at the time of testing. In my own WAN environment, I have found that the option in recent versions of GridFTP to use UDT (UDP-based data transfer, built on the user datagram protocol) can significantly increase throughput; I wonder why the authors did not explore this capability.
In any event, the results in this paper make interesting reading, and those whose work involves bulk data transfers will find it valuable.