Tags: deepspeedai/DeepSpeed

v0.18.2

README refresh (#7668)

Long overdue

---------

Signed-off-by: Olatunji Ruwase <tunji.ruwase@snowflake.com>
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>

v0.18.1

Update version.txt in advance of next release

v0.18.0

Update email address (#7624)

Update contact address

Signed-off-by: Olatunji Ruwase <tunji.ruwase@snowflake.com>

v0.17.6

[bugfix] fix partition context unpatch (#7566)

## Fix asymmetric patching/unpatching in `InsertPostInitMethodToModuleSubClasses`

### Problem Description

The `InsertPostInitMethodToModuleSubClasses` context manager patches
`__init__` methods of model classes during entry and unpatches them
during exit.

However, asymmetric condition checks between patching and unpatching can
introduce subtle inheritance bugs.

### Root Cause Analysis

The issue occurs with classes that have multiple inheritance where:
1. **Child class A** does not override `__init__`
2. **Parent class B** does not inherit from `nn.Module`
3. **Parent class C** inherits from `nn.Module`

**Current asymmetric logic:**
```python
# Patching (entry): Only patch classes with explicit __init__
def _enable_class(cls):
    if '__init__' in cls.__dict__:  # ✅ Strict check
        cls._old_init = cls.__init__
        cls.__init__ = partition_after(cls.__init__)

# Unpatching (exit): Restore any class with _old_init
def _disable_class(cls):
    if hasattr(cls, '_old_init'):  # ❌ Permissive check
        cls.__init__ = cls._old_init
```

**Execution flow:**
1. **During entry**: Child A is skipped (no explicit `__init__`), Parent
C is patched
2. **During exit**: Child A inherits `_old_init` from Parent C and gets
incorrectly "restored"

**Result**: Child A's `__init__` points to Parent C's original
`__init__`, bypassing Parent B and breaking the inheritance chain.

### Reproduction Case

This pattern is common in Hugging Face models:
```python
class Qwen3ForSequenceClassification(GenericForSequenceClassification, Qwen3PreTrainedModel):
    pass  # No explicit __init__

# GenericForSequenceClassification - not a nn.Module subclass
# Qwen3PreTrainedModel - inherits from nn.Module
```
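A dependency-free toy with the same shape (hypothetical `ParentB` / `ParentC` / `ChildA` names standing in for the real classes) shows why the permissive check fires for the child class even though it was never patched:

```python
class ParentC:                    # stands in for the nn.Module-derived parent
    def __init__(self):
        self.from_c = True

class ParentB:                    # not an nn.Module subclass, never patched
    def __init__(self):
        self.from_b = True
        super().__init__()

class ChildA(ParentB, ParentC):   # no explicit __init__
    pass

# Entry: only ParentC defines __init__ in its own __dict__, so only it is patched.
ParentC._old_init = ParentC.__init__

# Exit: ChildA *inherits* _old_init through ParentC, so the permissive
# hasattr check matches and ChildA.__init__ would be overwritten with
# ParentC's original __init__, bypassing ParentB in the MRO.
assert hasattr(ChildA, '_old_init')         # True, although ChildA was never patched
assert '__init__' not in ChildA.__dict__    # the strict check would skip ChildA
```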

### Solution

Apply symmetric condition checking in both patch and unpatch operations:

```python
def _disable_class(cls):
    # Match the patching condition: only restore classes we explicitly patched
    if '__init__' in cls.__dict__ and hasattr(cls, '_old_init'):
        cls.__init__ = cls._old_init
        delattr(cls, '_old_init')  # Optional cleanup
```

This ensures that only classes that were explicitly patched during entry
get restored during exit.

### Testing

The fix has been validated against the Qwen3ForSequenceClassification
reproduction case and resolves the inheritance chain corruption.

### Related Issues
- External issue: modelscope/ms-swift#5820

Co-authored-by: Masahiro Tanaka <81312776+tohtana@users.noreply.github.com>

v0.17.5

Add index to HPU devices (#7497)

[PR #7266](#7266) requires devices to carry explicit device indices (e.g. 'hpu:0', 'cuda:0').

This PR brings HPU devices in line with that requirement.
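The normalization this implies is small; a minimal sketch (illustrative helper, not the actual accelerator code):

```python
def with_device_index(device: str, index: int = 0) -> str:
    """Append an explicit index when the device string lacks one, e.g. 'hpu' -> 'hpu:0'."""
    return device if ":" in device else f"{device}:{index}"

assert with_device_index("hpu") == "hpu:0"
assert with_device_index("cuda:1") == "cuda:1"   # already indexed, unchanged
```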

Signed-off-by: Max Kovalenko <mkovalenko@habana.ai>

v0.17.4

`TiledFusedLogitsLoss` bug fix (#7459)

Bug fix: a tuple was mixed up with a list.

v0.17.3

Fix: Adapt Llama injection policy for newer transformers versions (#7443)

This PR fixes an `AttributeError` that occurs during
`deepspeed.init_inference` when using kernel injection
(`replace_with_kernel_inject=True`) with Llama models from recent
versions of `transformers`.

**The Bug:**

In newer `transformers` versions (e.g., `4.53.3`), configurations like
`num_heads` and `rope_theta` were moved from direct attributes of the
`LlamaAttention` module into a nested `config` object.

The current DeepSpeed injection policy tries to access these attributes
from their old, direct location, causing the initialization to fail with
an `AttributeError: 'LlamaAttention' object has no attribute
'num_heads'`.

**The Solution:**

This change updates the Llama injection logic to be more robust:
1. It first tries to read attributes like `num_heads` from the new
`config` object location.
2. If that fails, it falls back to the legacy direct attribute path.
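A rough sketch of that fallback pattern (hypothetical helper, not the exact code in the injection policy):

```python
def read_attn_attr(attention_module, name, config_name=None):
    """Prefer the nested config attribute, fall back to the legacy direct attribute."""
    config = getattr(attention_module, "config", None)
    key = config_name or name
    if config is not None and hasattr(config, key):
        return getattr(config, key)          # newer transformers layout
    return getattr(attention_module, name)   # legacy direct attribute

# e.g. num_heads now lives on the config as num_attention_heads:
# num_heads = read_attn_attr(attn, "num_heads", "num_attention_heads")
```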

---------

Signed-off-by: huanyuqu <yc37960@um.edu.mo>

v0.17.2

fix: engine initializes optimizer attributes at the beginning (#7410)

`destroy` accesses `self.optimizer`, but `destroy` can be triggered by an error raised in
`__init__`, even before the optimizer and scheduler have been configured. So we need to
initialize `self.optimizer` at the top of `__init__` to avoid triggering a second exception.

e.g.:
```logs
  File "deepspeed/runtime/engine.py", line 453, in _configure_tensor_parallel_states
    assert self.zero_optimization_stage(
AssertionError: Currently, the compatibility between 'autotp' and 'zero_stage = 3' has not been validated
Exception ignored in: <function DeepSpeedEngine.__del__ at 0x1516c0610820>
Traceback (most recent call last):
  File "deepspeed/runtime/engine.py", line 509, in __del__
    self.destroy()
  File "deepspeed/runtime/engine.py", line 512, in destroy
    if self.optimizer is not None and hasattr(self.optimizer, 'destroy'):
  File "deepspeed/runtime/engine.py", line 621, in __getattr__
    raise AttributeError(f"'{type(self).__name__}' object has no attribute '{name}'")
AttributeError: 'DeepSpeedEngine' object has no attribute 'optimizer'
```
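The fix boils down to declaring the attribute before anything in `__init__` can raise. A minimal sketch of the pattern (toy `Engine` class, not the real `DeepSpeedEngine`):

```python
class Engine:
    def __init__(self, config):
        self.optimizer = None        # set eagerly, before anything can fail
        self.lr_scheduler = None
        self._validate(config)       # may raise; attributes already exist

    def __del__(self):
        self.destroy()

    def destroy(self):
        # Safe even if __init__ aborted early: the attribute always exists.
        if self.optimizer is not None and hasattr(self.optimizer, 'destroy'):
            self.optimizer.destroy()

    def _validate(self, config):
        raise AssertionError("example failure during __init__")
```

With `self.optimizer` assigned first, the error from `__init__` propagates cleanly instead of being followed by an `AttributeError` out of `__del__`.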

Signed-off-by: Hollow Man <hollowman@opensuse.org>

v0.17.1

Move pytest pinning from individual tests to requirements-dev.txt until fixed. (#7327)

pytest 8.4.0 seems to break a number of our tests. Rather than pinning it in each test
individually, we should just put the pin in the requirements file until we resolve the issue.
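For reference, such a pin in `requirements-dev.txt` looks roughly like this (illustrative; the exact bound used in the PR may differ):

```
pytest<8.4.0
```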

---------

Co-authored-by: Olatunji Ruwase <tjruwase@gmail.com>

v0.17.0

Bump to v0.17.0 (#7324)

Co-authored-by: Logan Adams <loadams@microsoft.com>