Microsoft's DeepSpeed update
Microsoft has updated DeepSpeed, its open-source library for efficiently training massive ML models (see DT #34, #40), with four big improvements: 3D parallelism for training trillion-parameter models; ZeRO-Offload for training up to 10x bigger models on a single GPU; Sparse Attention kernels for handling 10x longer input sequences in Transformers; and 1-bit Adam for reducing communication overhead in multi-GPU training. My work focuses on tiny models rather than huge ones, so I haven’t had a chance to try DeepSpeed myself, but if any of you have, I’d love to hear about your experience!
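
For a feel of what using it looks like: I haven’t run this myself, but going by DeepSpeed’s docs, enabling something like ZeRO-Offload is mostly a matter of passing a config dict to `deepspeed.initialize`. Treat the sketch below as just that, a sketch — the toy model and the specific config keys (like `offload_optimizer`) are my assumptions and may differ between DeepSpeed versions:

```python
# Rough, untested sketch of wrapping a PyTorch model with DeepSpeed so that
# optimizer state is offloaded to CPU memory (ZeRO-Offload). Config keys are
# based on DeepSpeed's documentation and may vary by version.
import torch
import deepspeed

# Stand-in for a real model; in practice this would be something much larger.
model = torch.nn.Linear(1024, 1024)

ds_config = {
    "train_batch_size": 8,
    "fp16": {"enabled": True},
    "optimizer": {"type": "Adam", "params": {"lr": 1e-4}},
    "zero_optimization": {
        "stage": 2,
        # ZeRO-Offload: keep optimizer state in CPU RAM instead of on the GPU.
        "offload_optimizer": {"device": "cpu"},
    },
}

# deepspeed.initialize returns an engine that handles the forward/backward/step
# loop, with partitioning and offloading applied behind the scenes.
model_engine, optimizer, _, _ = deepspeed.initialize(
    model=model,
    model_parameters=model.parameters(),
    config=ds_config,
)
```

The nice part, at least on paper, is that the training loop itself barely changes: you call `model_engine(batch)`, `model_engine.backward(loss)` and `model_engine.step()`, and the memory savings come from the config rather than from rewriting your model code.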