Tuesday, September 15 • 1:55pm - 2:25pm
HAL: Computer System for Scalable Deep Learning - Volodymyr Kindratenko, University of Illinois at Urbana-Champaign

This presentation will describe the design, deployment, and operation of a computer system built to run deep learning frameworks efficiently. The system consists of 16 IBM POWER9 servers with 4 NVIDIA V100 GPUs each (64 GPUs in total), interconnected with a Mellanox EDR InfiniBand fabric, and a DDN all-flash storage array. The system is tailored to efficiently run the IBM Watson Machine Learning enterprise software stack, which combines popular open-source deep learning frameworks. We built a custom management software stack to enable efficient use of the system by a diverse community of users, and we provide guides and recipes for running deep learning workloads at scale using all available GPUs. We demonstrate the scaling of PyTorch- and TensorFlow-based deep neural networks that produce state-of-the-art performance results.

Speakers
Volodymyr Kindratenko

Senior Research Scientist, University of Illinois at Urbana-Champaign
Dr. Volodymyr Kindratenko is a Senior Research Scientist at the National Center for Supercomputing Applications, an Adjunct Associate Professor in the Department of Electrical and Computer Engineering, and a Research Associate Professor in the Department of Computer Science at the...


Tuesday September 15, 2020 1:55pm - 2:25pm CDT
Track 4