[fix] Fix MoE workspace info by storing Torch tensor itself instead of data_ptr #5900
Conversation
[fix] Fix MoE workspace info by storing Torch tensor itself instead of data_ptr

Signed-off-by: Jinyang Yuan <154768711+jinyangyuan-nvidia@users.noreply.github.com>
/bot run

PR_Github #11504 [ run ] triggered by Bot

PR_Github #11504 [ run ] completed with state

/bot run

PR_Github #11510 [ run ] triggered by Bot

PR_Github #11510 [ run ] completed with state

/bot run

PR_Github #11529 [ run ] triggered by Bot

PR_Github #11529 [ run ] completed with state
[fix] Fix MoE workspace info by storing Torch tensor itself instead of data_ptr (NVIDIA#5900)

Signed-off-by: Jinyang Yuan <154768711+jinyangyuan-nvidia@users.noreply.github.com>
Signed-off-by: Yuxin <yuxinz@nvidia.com>
The GPU memory allocated for the MoE workspace may be reallocated to other tensors because the workspace info stores only the raw data_ptr rather than the Torch tensor itself; without a live reference, PyTorch's caching allocator is free to reuse the underlying allocation. This PR fixes the potential issue by storing the tensor in the workspace info, which keeps the allocation alive for as long as the workspace is in use.
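To illustrate the failure mode, here is a minimal, hypothetical sketch (the class names are illustrative, not the actual TensorRT-LLM code): caching only `data_ptr()` lets the workspace tensor be freed at the end of the constructor, so PyTorch's caching allocator may hand the same address to the next allocation of a similar size, while holding the tensor itself keeps the memory reserved.

```python
import torch

# Buggy pattern: only the raw pointer is kept. The tensor created in
# __init__ is garbage-collected when the method returns, so its memory
# goes back to PyTorch's caching allocator and can be reused.
class WorkspaceInfoPtrOnly:
    def __init__(self, size: int):
        workspace = torch.empty(size, dtype=torch.uint8, device="cuda")
        self.data_ptr = workspace.data_ptr()  # dangling after __init__

# Fixed pattern (the approach this PR takes): store the tensor itself,
# so the reference keeps the allocation alive.
class WorkspaceInfoTensor:
    def __init__(self, size: int):
        self.workspace = torch.empty(size, dtype=torch.uint8, device="cuda")
        self.data_ptr = self.workspace.data_ptr()

if torch.cuda.is_available():
    buggy = WorkspaceInfoPtrOnly(1 << 20)
    # A fresh allocation of the same size will often be served from the
    # cached block, silently aliasing the "workspace".
    other = torch.empty(1 << 20, dtype=torch.uint8, device="cuda")
    print(buggy.data_ptr == other.data_ptr())  # often True: memory reused

    fixed = WorkspaceInfoTensor(1 << 20)
    another = torch.empty(1 << 20, dtype=torch.uint8, device="cuda")
    print(fixed.data_ptr == another.data_ptr())  # False: allocation held
```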