@aparna-aketi
Contributor

Summary: Llama3 models use RMSNorm, and we currently rely on functorch to support Fast Gradient Clipping (FGC) for RMSNorm. As a result, FSDP relies on the root node for the all_gather call of the RMSNorm layers. This PR adds a norm_grad_sample method for RMSNorm so that the layer supports layer-wise FSDP.

Differential Revision: D74334633
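
For readers unfamiliar with the idea, here is a minimal sketch of what a per-sample gradient norm for the RMSNorm weight can look like without functorch. It is not the code from this PR: the helper names `rms_norm` and `rms_norm_weight_grad_norms`, the `eps` default, and the tensor shapes are assumptions made for illustration, and only plain PyTorch calls are used. The sketch assumes the layer computes `x / sqrt(mean(x^2) + eps) * weight` over the last dimension.

```python
import torch


def rms_norm(x, weight, eps=1e-6):
    """Reference RMSNorm over the last dimension (illustrative, not the PR's code)."""
    inv_rms = torch.rsqrt(x.pow(2).mean(dim=-1, keepdim=True) + eps)
    return x * inv_rms * weight


def rms_norm_weight_grad_norms(x, grad_out, eps=1e-6):
    """Per-sample L2 norm of the gradient w.r.t. the RMSNorm weight.

    x, grad_out: tensors of shape (batch, ..., dim).
    Hypothetical helper name; returns a tensor of shape (batch,).
    """
    x_hat = x * torch.rsqrt(x.pow(2).mean(dim=-1, keepdim=True) + eps)
    # The weight gradient of one sample is grad_out * x_hat summed over every
    # dimension except the batch (first) and normalized (last) dimensions.
    prod = (grad_out * x_hat).reshape(x.shape[0], -1, x.shape[-1])
    per_sample_grad = prod.sum(dim=1)  # (batch, dim)
    return per_sample_grad.norm(dim=-1)


torch.manual_seed(0)
x = torch.randn(4, 5, 8)          # (batch, sequence, dim), assumed shapes
weight = torch.randn(8, requires_grad=True)
grad_out = torch.randn(4, 5, 8)   # stand-in for the backpropagated gradient

norms = rms_norm_weight_grad_norms(x, grad_out)

# Reference: backpropagate one sample at a time and take the gradient norm.
expected = torch.stack([
    torch.autograd.grad(rms_norm(x[i:i + 1], weight), weight, grad_out[i:i + 1])[0].norm()
    for i in range(x.shape[0])
])
assert torch.allclose(norms, expected, atol=1e-5)
```

Because the norms come directly from the layer's own activations and backpropagated gradients, a computation of this kind needs nothing from outside the layer, which is presumably what makes it compatible with layer-wise FSDP.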

@facebook-github-bot added the CLA Signed label on May 7, 2025
@facebook-github-bot
Contributor

This pull request was exported from Phabricator. Differential Revision: D74334633

Summary:
Pull Request resolved: meta-pytorch#755

Reviewed By: HuanyuZhang

Differential Revision: D74334633

@facebook-github-bot
Contributor

This pull request has been merged in dbb5367.

Labels: CLA Signed, fb-exported, Merged
