Name: HAL: Computer System for Scalable Deep Learning - Volodymyr Kindratenko, University of Illinois at Urbana-Champaign
Start: 2020-09-15T13:55:00-0500
End: 2020-09-15T14:25:00-0500

This presentation will describe the design, deployment, and operation of a computer system built to run deep learning frameworks efficiently. The system consists of 16 IBM POWER9 servers with 4 NVIDIA V100 GPUs each, interconnected with a Mellanox EDR InfiniBand fabric, and a DDN all-flash storage array. It is tailored to the IBM Watson Machine Learning enterprise software stack, which combines popular open-source deep learning frameworks. We built a custom management software stack to enable efficient use of the system by a diverse community of users, and we provide guides and recipes for running deep learning workloads at scale on all available GPUs. We demonstrate scaling of PyTorch- and TensorFlow-based deep neural networks to produce state-of-the-art performance results.

Speakers

Volodymyr Kindratenko

Senior Research Scientist, University of Illinois at Urbana-Champaign

Dr. Volodymyr Kindratenko is a Senior Research Scientist at the National Center for Supercomputing Applications, an Adjunct Associate Professor in the Department of Electrical and Computer Engineering, and a Research Associate Professor in the Department of Computer Science at the...

Tuesday September 15, 2020 1:55pm - 2:25pm CDT
Track 4

Use Case AI

OpenPOWER Summit North America 2020
