huangche edited this page Sep 26, 2015 · 2 revisions

##Abstract

In computational science, large amounts of scientific data must be transferred from one site to another as fast as possible. High-speed data transfer between sites is especially important in Grid computing, where GridFTP has been widely used for bulk data transfer over wide area networks. GridFTP achieves higher performance by supporting parallel TCP streams. Even on a single path, parallel TCP streams improve throughput by reducing the impact of TCP slow start and of packet loss on lossy networks. This research proposes a traffic engineering technique that increases data transfer performance by using multiple paths simultaneously for the parallel TCP streams. For this purpose, we use Software-Defined Networking (SDN) technology and an implementation of it, OpenFlow.

##Approach

GridFTP supports parallel data transfer by using multiple TCP streams at the application level to achieve high-speed transfer between sites, and it has been widely used in Grid computing. Figure 1 shows the parallel transfer of the conventional GridFTP. The conventional GridFTP uses only the single shortest path even when multiple paths are available, because its TCP streams are routed according to the default IP routing protocol. In contrast, as shown in Figure 2, our multipath GridFTP distributes the parallel TCP streams of GridFTP across multiple network paths. To control this distribution, we have designed an OpenFlow controller for multipath GridFTP. In our system, a client asks the controller to assign multiple available paths before the actual data transfer begins; the controller then distributes the TCP streams from the client across the different paths.

Figure 1. Parallel transfer of the conventional GridFTP

Figure 2. Parallel transfer of our proposed multipath GridFTP
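The distribution step described above can be illustrated with a minimal sketch. This is not the controller's actual code: the page does not state the assignment policy, so round-robin is assumed here, and the function name `assign_streams_to_paths`, the use of source TCP ports to identify streams, and the example port numbers are all hypothetical.

```python
def assign_streams_to_paths(stream_ports, paths):
    """Map each stream identifier to a path in round-robin order.

    Assumption: round-robin is an illustrative policy only; the real
    controller may choose paths differently.
    """
    if not paths:
        raise ValueError("at least one path is required")
    return {
        port: paths[i % len(paths)]
        for i, port in enumerate(stream_ports)
    }


# Hypothetical example: 8 parallel streams identified by source port.
streams = [50000 + i for i in range(8)]
paths = ["path1", "path2", "path3", "path4"]

for port, path in assign_streams_to_paths(streams, paths).items():
    print(port, "->", path)
```

In an OpenFlow deployment, each entry of such a mapping would correspond to flow rules that forward the matching TCP stream along the chosen path; installing those rules is outside the scope of this sketch.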

##Current Results

We deployed the multipath GridFTP system over the entire PRAGMA-ENT to evaluate its practicality. Figure 3 shows an overview of this real, global-scale experimental environment. We evaluated the multipath GridFTP system by measuring the actual transfer time and the bandwidth used by each TCP stream in the network. For the conventional GridFTP measurement, we used the single shortest path, path1. For our proposed multipath GridFTP, we used only four paths: path1, path2, path3, and path4.

Figure 4 shows the average data transfer speed as the number of parallel TCP streams increases. With 4 and 8 parallel TCP streams, the average speed of our proposed system was no better than that of the conventional method. This is because our proposed system used path1, path2, path3, and path4 simultaneously, so only one or two TCP streams were assigned to each path. Those TCP streams therefore could not overcome the performance degradation caused by TCP’s slow start mechanism. With the conventional method, by contrast, all four or eight TCP streams were assigned to the shortest path, path1, and achieved better performance than our method. However, when we used more than 12 streams, our proposed system outperformed the conventional method, by approximately 20%.

Figure 3. Overview of the real global-scale experimental environment

Figure 4. Average data transfer speed of proposed system and conventional method while increasing the number of parallel TCP streams
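The per-path stream counts behind these results can be worked out with a short sketch. It assumes the streams are split as evenly as possible across the paths, which matches the "one or two streams per path" described for 4 and 8 streams but is not confirmed as the exact policy; the helper name `streams_per_path` is hypothetical.

```python
def streams_per_path(num_streams, num_paths):
    """Return how many streams land on each path under an even split.

    Assumption: streams are divided as evenly as possible; any
    remainder goes to the first paths.
    """
    if num_paths < 1:
        raise ValueError("num_paths must be at least 1")
    base, extra = divmod(num_streams, num_paths)
    return [base + 1 if i < extra else base for i in range(num_paths)]


for n in (4, 8, 12, 16):
    conventional = streams_per_path(n, 1)  # all streams on path1
    multipath = streams_per_path(n, 4)     # spread over path1..path4
    print(f"{n} streams: conventional={conventional} multipath={multipath}")
```

With 4 or 8 streams each of the four paths carries only one or two streams, while the conventional method concentrates all of them on path1; at higher stream counts each path carries enough streams for the multipath setup to benefit.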