Is your feature request related to a problem? Please describe.
DeepSpeed/ZeRO provides meaningful acceleration and memory reduction for large-scale training. However, timm’s training script currently does not include an official or built-in way to enable DeepSpeed/ZeRO. I noticed there was an earlier attempt in issue #490, but it appears to have been discontinued. I have implemented DeepSpeed/ZeRO support (with a few compromises) in my own training code built on top of the timm training script. If you think this would be a useful addition, I’d be happy to prepare a PR integrating it into the official timm codebase.
Describe the solution you'd like
The current implementation integrates DeepSpeed as an optional dependency and follows timm’s existing training structure. Specifically, it:
- Introduces a set of new CLI flags that match the current argument parsing style while enabling DeepSpeed and passing configuration options, such as:
  - `--deepspeed`
  - `--ds-zero-stage {0,1,2,3}`
  - `--ds-offload-optimizer {none,cpu,nvme}`
  - `--ds-offload-param {none,cpu,nvme}`
  - …
- Uses a small helper function, `build_ds_config`, to construct a DeepSpeed config dict/JSON directly from existing timm arguments (batch size, gradient accumulation, AMP dtype, clipping, etc.).
- Wraps the model and parameters with `deepspeed.initialize` only when `--deepspeed` is enabled, keeping the non-DeepSpeed training path completely unchanged.

This ensures full backward compatibility with existing scripts while providing users an opt-in path for ZeRO acceleration. Rough sketches of the argument parsing, the config builder, and the opt-in wrapping follow.
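As an illustration, here is a minimal argparse sketch for the proposed flags, following the argument-group style already used in `train.py` (defaults and help strings below are placeholders, not final choices):

```python
# Sketch only: proposed DeepSpeed arguments added to timm's existing train.py parser.
group = parser.add_argument_group('DeepSpeed parameters')
group.add_argument('--deepspeed', action='store_true', default=False,
                   help='Enable the DeepSpeed/ZeRO training path (requires the deepspeed package).')
group.add_argument('--ds-zero-stage', type=int, default=2, choices=[0, 1, 2, 3],
                   help='ZeRO optimization stage.')
group.add_argument('--ds-offload-optimizer', type=str, default='none', choices=['none', 'cpu', 'nvme'],
                   help='Device for ZeRO optimizer state offload.')
group.add_argument('--ds-offload-param', type=str, default='none', choices=['none', 'cpu', 'nvme'],
                   help='Device for parameter offload (ZeRO stage 3 only).')
```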
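A possible shape for the `build_ds_config` helper; the attribute names on `args` (`batch_size`, `grad_accum_steps`, `clip_grad`, `amp`, `amp_dtype`) follow the current `train.py` arguments, and the exact mapping shown is an assumption about how the translation could look:

```python
def build_ds_config(args):
    """Sketch: build a DeepSpeed config dict from existing timm CLI arguments."""
    ds_config = {
        # Per-GPU micro batch size and accumulation mirror --batch-size / --grad-accum-steps.
        'train_micro_batch_size_per_gpu': args.batch_size,
        'gradient_accumulation_steps': getattr(args, 'grad_accum_steps', 1),
        # Map timm's --clip-grad onto DeepSpeed's gradient clipping (0.0 disables it).
        'gradient_clipping': args.clip_grad if args.clip_grad is not None else 0.0,
        # AMP dtype: enable the bf16 or fp16 block to match --amp / --amp-dtype.
        'bf16': {'enabled': bool(args.amp and args.amp_dtype == 'bfloat16')},
        'fp16': {'enabled': bool(args.amp and args.amp_dtype == 'float16')},
        'zero_optimization': {
            'stage': args.ds_zero_stage,
        },
    }
    if args.ds_offload_optimizer != 'none':
        ds_config['zero_optimization']['offload_optimizer'] = {'device': args.ds_offload_optimizer}
    if args.ds_offload_param != 'none':
        ds_config['zero_optimization']['offload_param'] = {'device': args.ds_offload_param}
    return ds_config
```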
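Finally, a sketch of the conditional wrapping and the corresponding training-step changes. Names like `model`, `optimizer`, `loss_fn`, `input`, and `target` stand in for the objects in `train.py`; in this sketch timm's own LR scheduler keeps stepping outside the DeepSpeed engine:

```python
import deepspeed  # optional dependency; in the real code the import is guarded by --deepspeed

if args.deepspeed:
    ds_config = build_ds_config(args)
    # The returned engine takes over gradient accumulation, clipping, and optimizer stepping.
    model_engine, optimizer, _, _ = deepspeed.initialize(
        model=model,
        optimizer=optimizer,
        config=ds_config,
    )

# Training step: the DeepSpeed branch replaces the GradScaler / optimizer logic,
# while the else branch is timm's existing path, untouched.
if args.deepspeed:
    output = model_engine(input)
    loss = loss_fn(output, target)
    model_engine.backward(loss)   # instead of loss.backward() / scaler.scale(loss).backward()
    model_engine.step()           # instead of optimizer.step() + optimizer.zero_grad()
else:
    ...  # existing timm AMP / optimizer path
```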
Additional context
The current handling of model EMA, logging, and checkpointing in the DeepSpeed path is not yet elegant. I can refine these components and better align them with timm's existing utilities if this feature is accepted. In practice, the acceleration from DeepSpeed/ZeRO has been quite decent in my own training runs.
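For context on the checkpointing point: with ZeRO stages 2/3 the optimizer (and possibly parameter) state is sharded, so saving would likely need to go through DeepSpeed's own checkpoint API rather than timm's `CheckpointSaver`. A rough sketch of what that bridge could look like (directory layout, tag format, and `client_state` keys are placeholders; how EMA weights fit in is still open):

```python
# Saving: DeepSpeed writes the sharded model/optimizer state itself; extra metadata
# (epoch, best metric, possibly EMA state) can be carried in client_state.
model_engine.save_checkpoint(
    save_dir=output_dir,
    tag=f'epoch_{epoch}',
    client_state={'epoch': epoch, 'metric': best_metric},
)

# Resuming: load_checkpoint restores the engine and returns the saved client_state.
load_path, client_state = model_engine.load_checkpoint(output_dir, tag=f'epoch_{epoch}')
start_epoch = client_state['epoch'] + 1
```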