Add Apple Silicon (MPS) support #264

provos · 2025-11-30T02:54:02Z

Adds MPS device support for both image and video predictors on Apple Silicon.

Changes:

Add get_default_device() utility that detects MPS availability
Fix device mismatches (coords cache, freqs_cis cache)
Add MPS workaround for complex tensor repeat() in RoPE
Make torch._assert_async conditional on CUDA
Fix MPS memory leak in video predictor via synchronization points
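A minimal sketch of what a `get_default_device()` helper of this kind might look like. The CUDA > MPS > CPU preference order is an assumption (the description only says the utility detects MPS availability), and `pick_device_name` is a hypothetical helper introduced here so the selection logic can be exercised on a machine without a GPU.

```python
def pick_device_name(cuda_available: bool, mps_available: bool) -> str:
    """Return the preferred device name; the CUDA > MPS > CPU order is an assumption."""
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"


def get_default_device():
    """Sketch of a default-device helper; not necessarily the PR's implementation."""
    import torch  # imported lazily so pick_device_name stays usable without torch

    # torch.backends.mps only exists on PyTorch builds that know about MPS.
    mps_backend = getattr(torch.backends, "mps", None)
    mps_available = mps_backend is not None and mps_backend.is_available()
    return torch.device(pick_device_name(torch.cuda.is_available(), mps_available))
```

Callers would then create models and tensors on `get_default_device()` rather than on a hard-coded `"cuda"` device.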

Performance of the Video predictor:

~3x faster than CPU
Runs with ~38 GB peak memory, which is due to the way that MPS caches graphs. Before the synchronization points were added, running the video predictor would consume all available memory.
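A synchronization point of the kind described here could be sketched as follows. `flush_device` is a hypothetical name, and where the PR actually places these calls in the video predictor is not shown in this description; `torch.mps.synchronize()` and `torch.mps.empty_cache()` are the PyTorch calls assumed to do the work.

```python
def flush_device(device_type: str) -> bool:
    """Wait for queued MPS work and release cached memory.

    Returns True if a flush ran, False for device types where nothing is done.
    """
    if device_type == "mps":
        import torch  # imported lazily; only needed on the MPS path

        torch.mps.synchronize()  # block until pending MPS operations finish
        torch.mps.empty_cache()  # hand cached allocations back to the system
        return True
    return False
```

One plausible use is to call `flush_device(device.type)` after each frame, or every few frames, in a long-running propagation loop, trading a little throughput for bounded memory growth.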

provos added 9 commits November 28, 2025 14:18

chore: change to decord2 as plugin replacement for decord

3c30532

decord2 has prebuilt wheels for Apple Silicon. Bump numpy from 1.26 to 1.26.4 to meet the dependency requirements of decord2.

fix: update numpy dependency to allow for newer versions

9b4b199

feat: add device management for CUDA and CPU support across models

351ce7b

Allows systems without CUDA to fall back to CPU.

feat: implement tensor_to_device utility for efficient tensor transfers

51bc2f6

The pin_memory() optimization is only available for CUDA backends.
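A rough sketch of a transfer helper that applies `pin_memory()` only when the target is a CUDA device. The signature and the `non_blocking` default are assumptions made for illustration; the commit message only states that pinning is restricted to CUDA backends.

```python
import torch


def tensor_to_device(tensor: torch.Tensor, device, non_blocking: bool = True) -> torch.Tensor:
    """Move a tensor to `device`, pinning host memory only for CUDA targets."""
    device = torch.device(device)
    if device.type == "cuda" and tensor.device.type == "cpu":
        # Pinned (page-locked) host memory lets the host-to-GPU copy run asynchronously.
        return tensor.pin_memory().to(device, non_blocking=non_blocking)
    # For MPS and CPU targets, skip pinning and do a plain transfer.
    return tensor.to(device)
```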

fix: handle CPU fallback for mask interpolation in PostProcessImage and PostProcessAPIVideo

52a29b6

fix: handle empty batch case

8893f05

CUDA handles this internally but we need to handle it directly for CPU

feat: implement get_default_device utility to pick between cuda, mps or cpu

5b1b686

introduce workarounds for torch operations not available on mps like repeats of complex tensors
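One way to work around a missing complex `repeat()` is to repeat the real view of the tensor and convert back. This is a sketch of that general technique, not necessarily the workaround the commit uses; `repeat_complex` is a hypothetical name.

```python
import torch


def repeat_complex(x: torch.Tensor, *sizes: int) -> torch.Tensor:
    """Equivalent of x.repeat(*sizes) that avoids calling repeat() on a complex tensor."""
    if not x.is_complex():
        return x.repeat(*sizes)
    # View the complex tensor as real with a trailing dimension of size 2 (real, imag).
    as_real = torch.view_as_real(x)
    # Repeat the leading dimensions as requested and leave the (real, imag) pair intact.
    repeated = as_real.repeat(*sizes, 1)
    # repeat() returns a contiguous tensor, so it can be viewed as complex again.
    return torch.view_as_complex(repeated)
```

For example, a RoPE frequency table such as `freqs_cis` could be passed through `repeat_complex` instead of calling `.repeat()` on it directly.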

fix: prevent excessive memory leakage on mps

9f6ddae

forcefully flush pending operations with synchronize and empty the cache.

Merge remote-tracking branch 'origin/main' into apple-silicon-support-v2

28574df

meta-cla bot added the CLA Signed label Nov 30, 2025
