The growing modeling capacity of large DNNs (e.g., Transformer and GPT-3) has led to unprecedented successes in various AI areas, including vision and natural language understanding. The high modeling power of a large DNN stems mainly from its increasing complexity (more neuron layers and more neuron operators in each layer) and dynamicity (frequently activating and deactivating neuron operators in each layer during training, as in Neural Architecture Search, or NAS). Such complexity and dynamicity can easily make a large DNN exceed the computing and memory capacities of a modern GPU, so training a large DNN often requires splitting it across many GPUs along multiple dimensions, including data parallelism, tensor parallelism, and pipeline parallelism. Dr. Cui’s talk will present his two recent papers, [vPipe TPDS 2021] and [NASPipe ASPLOS 2022], which address major limitations in existing multi-dimensional parallel training systems, including GPipe, PipeDream, and Megatron. vPipe addresses the severe load imbalance and low GPU computing utilization of these systems (e.g., merely 20% in some of the latest systems); NASPipe introduces Supernet parallelism, a new parallel training dimension for highly dynamic large DNNs designed in the Supernet and NAS manner (e.g., Evolved Transformer).
Please email for a Zoom link.