Neural network training requires a large amount of computation, so GPUs are often used to accelerate it. Although GPUs improve performance, they are underutilized during training. This paper proposes out-of-order (ooo) backprop, an effective scheduling technique for neural network training. By exploiting the dependencies of gradient computations, ooo backprop allows their execution to be reordered to make the most of GPU resources. We show that GPU utilization in both single- and multi-GPU training can be improved by applying ooo backprop and prioritizing critical operations. We propose three scheduling algorithms based on ooo backprop. For single-GPU training, we schedule with multi-stream ooo computation to mask the kernel launch overhead. In data-parallel training, we reorder the gradient computations to maximize the overlap of computation and parameter communication; in pipeline-parallel training, we prioritize critical gradient computations to reduce pipeline stalls. We evaluate our optimizations with twelve neural networks and five public datasets. Compared to the respective state-of-the-art training systems, our algorithms improve training throughput by 1.03--1.58× for single-GPU training, by 1.10--1.27× for data-parallel training, and by 1.41--1.99× for pipeline-parallel training.
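The dependency argument in the abstract can be illustrated with a minimal sketch. It assumes a toy chain of scalar linear layers, and the function names (`forward`, `backward_conventional`, `backward_ooo`) are hypothetical; this is not the paper's implementation or scheduler. In the backward pass, each layer produces an output gradient, which the previous layer needs, and a weight gradient, which no other gradient computation needs. The weight gradients can therefore be deferred and executed in a different order without changing the result.

```python
# Toy model (an assumption for illustration): a[i+1] = w[i] * a[i], loss = a[n].

def forward(weights, x):
    """Return the activations a[0..n], where a[0] is the input."""
    acts = [x]
    for w in weights:
        acts.append(w * acts[-1])
    return acts

def backward_conventional(weights, acts):
    """Interleave weight and output gradients, last layer first."""
    n = len(weights)
    grad_out = 1.0                        # dLoss/da[n]
    dW, order = [None] * n, []
    for i in reversed(range(n)):
        dW[i] = grad_out * acts[i]        # weight gradient of layer i
        order.append(f"dW{i}")
        grad_out = grad_out * weights[i]  # output gradient for layer i-1
        order.append(f"dX{i}")
    return dW, order

def backward_ooo(weights, acts):
    """Run the dependent chain of output gradients first, then the
    deferred weight gradients (here first layer first, an arbitrary choice)."""
    n = len(weights)
    grads = [None] * (n + 1)              # grads[i] = dLoss/da[i]
    grads[n] = 1.0
    dW, order = [None] * n, []
    for i in reversed(range(n)):          # each step depends on the previous one
        grads[i] = grads[i + 1] * weights[i]
        order.append(f"dX{i}")
    for i in range(n):                    # independent of one another
        dW[i] = grads[i + 1] * acts[i]
        order.append(f"dW{i}")
    return dW, order

weights = [0.5, -2.0, 3.0]
acts = forward(weights, 1.5)
dW_conv, order_conv = backward_conventional(weights, acts)
dW_ooo, order_ooo = backward_ooo(weights, acts)
print(order_conv)
print(order_ooo)
print(dW_conv == dW_ooo)
```

Both schedules compute the same weight gradients; only the execution order differs. A real scheduler would choose the deferred order according to resource availability, for example to overlap gradient computation with kernel launches or parameter communication.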
Link to the paper: https://dl.acm.org/doi/abs/10.1145/3492321.3519563
Please email for a
Jiwon Seo is an assistant professor in the Department of Computer Science at Hanyang University, Korea.
He received his PhD in electrical engineering from Stanford in 2015. His research interests include graph processing systems and machine learning systems.