Microsoft DeepSpeed updates
Microsoft released an updated DeepSpeed along with the second version of its Zero Redundancy Optimizer (ZeRO-2; see DT #34). These improvements enable training models that are an order of magnitude larger and faster than previously possible: up to 170 billion parameters, at up to 10x previous state-of-the-art speeds. The library is open-source on GitHub at microsoft/DeepSpeed.
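For a sense of what this looks like in practice, ZeRO is switched on through DeepSpeed's config rather than model code changes. Here's a minimal sketch, not the official example: the toy model, batch size, and hyperparameters are illustrative placeholders, and the exact `initialize` keywords can vary by DeepSpeed version.

```python
# Minimal sketch of enabling ZeRO stage 2 in DeepSpeed.
# The model and all hyperparameters below are illustrative assumptions.
import torch
import deepspeed

model = torch.nn.Linear(1024, 1024)  # toy stand-in for a large model

ds_config = {
    "train_batch_size": 8,
    "fp16": {"enabled": True},
    # Stage 2 partitions optimizer states and gradients across
    # data-parallel workers, reducing per-GPU memory use.
    "zero_optimization": {"stage": 2},
    "optimizer": {"type": "Adam", "params": {"lr": 1e-4}},
}

# deepspeed.initialize wraps the model in an engine that handles ZeRO,
# mixed precision, and distributed data parallelism.
model_engine, optimizer, _, _ = deepspeed.initialize(
    model=model,
    model_parameters=model.parameters(),
    config=ds_config,
)
```

Scripts like this are normally started with DeepSpeed's own launcher (e.g. `deepspeed train.py`), which sets up the distributed environment across GPUs.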