Installation Issues with sglang==0.4.6post1 on GH200 (ARM/NGC PyTorch)
I am attempting to install a modified version of sglang==0.4.6post1 from eric-ai-lab/Soft-Thinking on a server node equipped with 4x NVIDIA GH200 GPUs (aarch64/ARM CPU).
For optimal performance, I am working within the recommended NVIDIA NGC PyTorch 25.06 container (nvcr.io/nvidia/pytorch:25.06-py3). This environment, with its pre-compiled CUDA/PyTorch stack, causes several dependency and build failures when I attempt to install the package from source.
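For context, this is roughly how I launch the container; the mounts and resource flags below are an approximation of my cluster setup rather than the literal command:

```bash
# Approximate launch command -- mounts/paths are specific to my cluster
docker run --gpus all -it --rm \
    --ipc=host --ulimit memlock=-1 --ulimit stack=67108864 \
    -v "$PWD":/workspace \
    nvcr.io/nvidia/pytorch:25.06-py3 bash
```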
Problems
1. Failure to Build sgl-kernel==0.1.0 from Source
The required dependency sgl-kernel==0.1.0 lacks a pre-compiled wheel for the aarch64 (ARM) architecture. Attempting to build it from source with make build consistently fails: compilation errors are emitted for almost every file, producing a flood of error messages, and the build process eventually brings down the compute node. I attempted to limit parallelism with MAX_JOBS=2, but the issue persists.
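The exact invocation (with the parallelism cap mentioned above) is essentially:

```bash
cd sglang_soft_thinking_pkg/sgl-kernel
# Cap parallel compilation jobs; the build still fails and eventually takes the node down
MAX_JOBS=2 make build
```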
2. flashinfer-python Dependency Conflict
The dependency chain requires flashinfer-python, which, in turn, requires nvidia-cudnn-frontend>=1.13.0. The NGC container is pre-installed with an older version: nvidia-cudnn-frontend==1.12.0. I am unable to upgrade or reinstall a newer version of this package due to the constraints of the controlled NGC environment.
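For reference, this is how the conflict shows up inside the container; the pip commands are standard, and the version numbers in the comments are simply the ones described above:

```bash
# Inside the running NGC container: confirm the preinstalled frontend version
pip show nvidia-cudnn-frontend    # reports Version: 1.12.0 in nvcr.io/nvidia/pytorch:25.06-py3
# Installing flashinfer-python makes pip try to pull nvidia-cudnn-frontend>=1.13.0,
# which I cannot upgrade in this managed environment
pip install flashinfer-python
```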
Questions
I would greatly appreciate any insights or assistance on the following points:
NGC/GH200 Installation Experience: Has anyone successfully installed sglang (or a similar high-performance kernel library) from source within an NVIDIA NGC PyTorch container on an ARM-based GH200 node? I'm specifically looking for environment configuration tips or known workarounds for this setup.
sgl-kernel Version Compatibility: Since sgl-kernel>=0.3.12 appears to offer official aarch64 wheels, would it be possible to force the modified sglang==0.4.6post1 to use one of these newer versions? What modifications would be needed in the sglang source to support the newer sgl-kernel API? (A rough sketch of what I mean is included after these questions.)
flashinfer-python Conflict Resolution: Is there a known way to install flashinfer-python while bypassing or resolving the hard dependency check on nvidia-cudnn-frontend>=1.13.0? Alternatively, is there a compatible version of flashinfer-python that works with the pre-installed 1.12.0 version in the container? (Again, see the sketch below.)
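To make the last two questions concrete, these are the kinds of untested workaround sketches I have in mind. The pip flags are standard, but the file named in the first comment is an assumption on my part, and whether either approach is actually viable is exactly what I am asking:

```bash
# sgl-kernel question: install a newer release that ships aarch64 wheels, then relax the
# version pin in the sglang source (which file holds the pin, e.g. python/pyproject.toml,
# is my assumption; API differences would still need patching)
pip install "sgl-kernel>=0.3.12"

# flashinfer-python question: install without letting pip touch the preinstalled
# nvidia-cudnn-frontend==1.12.0, accepting that runtime compatibility is unverified
pip install flashinfer-python --no-deps
```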
Thank you for your help in advance!
Reproduction steps

cd sglang_soft_thinking_pkg/sgl-kernel
make build

Packages preinstalled in the NGC container: