
[FEATURE] Add DeepSpeed / ZeRO support in training script #2623

@Tianyi-Franklin-Wang

Description


Is your feature request related to a problem? Please describe.
DeepSpeed's ZeRO partitions optimizer states, gradients, and (at stage 3) parameters across data-parallel ranks. This substantially reduces per-GPU memory and can speed up large-scale training. However, timm’s training script has no official or built-in way to enable DeepSpeed/ZeRO. There was an earlier attempt in issue #490, but it appears to have been abandoned. I have implemented DeepSpeed/ZeRO support (with a few compromises) in my own training code, which is built on top of the timm training script. If you think this would be a useful addition, I’d be happy to prepare a PR integrating it into the official timm codebase.

Describe the solution you'd like
My current implementation adds DeepSpeed as an optional dependency and follows timm’s existing training structure. Specifically, it:

  • Adds new CLI flags, in the style of the existing argument parser, that enable DeepSpeed and pass its configuration options, such as:
    • --deepspeed
    • --ds-zero-stage {0,1,2,3}
    • --ds-offload-optimizer {none,cpu,nvme}
    • --ds-offload-param {none,cpu,nvme}
  • Uses a small helper function, build_ds_config, to construct a DeepSpeed config dict/JSON directly from existing timm arguments (batch size, gradient accumulation, AMP dtype, clipping, etc.).
  • Wraps the model and parameters with deepspeed.initialize only when --deepspeed is enabled, keeping the non-DeepSpeed training path completely unchanged. This ensures full backward compatibility with existing scripts while providing users an opt-in path for ZeRO acceleration.
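
The list above could be sketched roughly as follows. This is a hypothetical illustration, not the implementation described in this issue. It assumes timm's existing `args.batch_size`, `args.grad_accum_steps`, `args.amp`, `args.amp_dtype`, `args.clip_grad` and `args.clip_mode` attributes. The `--ds-nvme-path` flag, the `_offload_cfg` helper and `maybe_init_deepspeed` are invented here for illustration. The config keys (`train_micro_batch_size_per_gpu`, `gradient_accumulation_steps`, `zero_optimization`, `fp16`/`bf16`, `gradient_clipping`) are standard DeepSpeed config fields.

```python
import argparse


def add_deepspeed_args(parser: argparse.ArgumentParser) -> argparse.ArgumentParser:
    """Register the opt-in DeepSpeed flags proposed in this issue (sketch)."""
    group = parser.add_argument_group('DeepSpeed parameters')
    group.add_argument('--deepspeed', action='store_true', default=False,
                       help='wrap model/optimizer with deepspeed.initialize')
    group.add_argument('--ds-zero-stage', type=int, default=0, choices=(0, 1, 2, 3),
                       help='ZeRO optimization stage (default: 0)')
    group.add_argument('--ds-offload-optimizer', default='none', choices=('none', 'cpu', 'nvme'),
                       help='offload optimizer state (needs ZeRO stage >= 1)')
    group.add_argument('--ds-offload-param', default='none', choices=('none', 'cpu', 'nvme'),
                       help='offload parameters (needs ZeRO stage 3)')
    # Hypothetical flag, not part of the issue: NVMe offload needs a local directory.
    group.add_argument('--ds-nvme-path', default=None, type=str,
                       help='NVMe directory used when an offload target is nvme')
    return parser


def _offload_cfg(device: str, nvme_path):
    """Build one offload sub-section of the zero_optimization config (hypothetical helper)."""
    cfg = {'device': device, 'pin_memory': True}
    if device == 'nvme':
        if not nvme_path:
            raise ValueError('nvme offload requires --ds-nvme-path')
        cfg['nvme_path'] = nvme_path
    return cfg


def build_ds_config(args, world_size: int = 1) -> dict:
    """Translate timm-style training args into a DeepSpeed config dict (sketch)."""
    accum = max(1, getattr(args, 'grad_accum_steps', 1))
    cfg = {
        'train_micro_batch_size_per_gpu': args.batch_size,
        'gradient_accumulation_steps': accum,
        # DeepSpeed checks train_batch_size == micro batch * accum steps * world size.
        'train_batch_size': args.batch_size * accum * world_size,
        'zero_optimization': {'stage': args.ds_zero_stage},
    }
    # On this path the engine owns mixed precision, replacing timm's autocast + NativeScaler.
    if getattr(args, 'amp', False):
        if getattr(args, 'amp_dtype', 'float16') == 'bfloat16':
            cfg['bf16'] = {'enabled': True}
        else:
            cfg['fp16'] = {'enabled': True}  # loss_scale defaults to dynamic scaling
    # DeepSpeed's gradient_clipping clips by global norm, so only timm's 'norm' mode maps over.
    if getattr(args, 'clip_grad', None) is not None:
        if getattr(args, 'clip_mode', 'norm') != 'norm':
            raise ValueError('only --clip-mode norm maps onto DeepSpeed gradient_clipping')
        cfg['gradient_clipping'] = float(args.clip_grad)
    zero = cfg['zero_optimization']
    nvme_path = getattr(args, 'ds_nvme_path', None)
    if args.ds_offload_optimizer != 'none':
        if args.ds_zero_stage < 1:
            raise ValueError('optimizer offload needs --ds-zero-stage >= 1')
        zero['offload_optimizer'] = _offload_cfg(args.ds_offload_optimizer, nvme_path)
    if args.ds_offload_param != 'none':
        if args.ds_zero_stage != 3:
            raise ValueError('parameter offload needs --ds-zero-stage 3')
        zero['offload_param'] = _offload_cfg(args.ds_offload_param, nvme_path)
    return cfg


def maybe_init_deepspeed(args, model, optimizer, ds_config: dict):
    """Return (model_or_engine, optimizer, engine); engine is None when --deepspeed is off."""
    if not getattr(args, 'deepspeed', False):
        return model, optimizer, None  # existing timm path is left untouched
    import deepspeed  # optional dependency, imported only when requested
    engine, ds_optimizer, _, _ = deepspeed.initialize(
        model=model, optimizer=optimizer, config=ds_config)
    return engine, ds_optimizer, engine
```

On the DeepSpeed path, the training step would call `engine.backward(loss)` and `engine.step()` instead of timm's loss scaler and `optimizer.step()`, because the engine handles precision, accumulation boundaries and clipping itself. One caveat worth checking against the DeepSpeed docs: with CPU optimizer offload, DeepSpeed may object to a client-provided optimizer such as one from `create_optimizer_v2` unless its own CPU Adam is used.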

Additional context
The handling of model EMA, logging, and checkpointing in the DeepSpeed path is still rough. If this feature is accepted, I can refine these components and align them more closely with timm’s existing utilities. In my own training runs, DeepSpeed/ZeRO has given a worthwhile speedup.
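
One way the checkpointing part might be aligned is sketched below. It relies on a few assumptions: `save_training_state` is a hypothetical helper (not a timm utility), and `engine` is the object returned by `deepspeed.initialize`. The main difference from the existing path is that DeepSpeed's `engine.save_checkpoint` writes sharded state and must be called on every rank, whereas timm's `train.py` only saves from the primary process. Model EMA is not covered here. Under ZeRO-3 each rank holds only a parameter shard, so an EMA update would likely need to gather parameters first, for example with `deepspeed.zero.GatheredParameters`.

```python
import os


def save_training_state(output_dir, epoch, model, optimizer=None, engine=None, is_primary=True):
    """Save a checkpoint on either path; return the written path, or None if this rank skipped."""
    state = {'epoch': epoch}
    if engine is not None:
        # DeepSpeed checkpoints are sharded per rank, so every rank must make this call.
        tag = f'epoch-{epoch}'
        engine.save_checkpoint(output_dir, tag=tag, client_state=state)
        return os.path.join(output_dir, tag)
    if not is_primary:
        return None  # plain path: only the primary rank writes, as in timm's train.py
    import torch  # only needed on the plain PyTorch path
    state['state_dict'] = model.state_dict()
    if optimizer is not None:
        state['optimizer'] = optimizer.state_dict()
    path = os.path.join(output_dir, f'checkpoint-{epoch}.pth.tar')
    torch.save(state, path)
    return path
```

To resume on the DeepSpeed path, `engine.load_checkpoint(output_dir, tag)` returns the load path and the saved `client_state`, so the epoch counter can be restored from it.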

Metadata


Assignees

No one assigned

    Labels

    enhancement (New feature or request)

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
