
Conversation

@lqf96 lqf96 commented Dec 18, 2020

Rationale

While most of the torch.Generator properties and methods are implemented as thin wrappers around the corresponding at::Generator methods, torch.Generator.get_state() and torch.Generator.set_state() are implemented in legacy Torch code and are not dispatched through the c10::GeneratorImpl interface. This is not well structured and makes implementing generators for new backends (e.g. XLAGeneratorImpl for the XLA backend) inconvenient. As such, this pull request seeks to move these generator state APIs to c10 and ATen.

What is being refactored?

  • Interfaces
    • Added c10::GeneratorImpl::set_state and c10::GeneratorImpl::state for getting and setting the internal state of a random number generator.
    • at::Generator::set_state and at::Generator::state wrap the above-mentioned APIs, since at::Generator is essentially a PIMPL around c10::GeneratorImpl.
    • Added helper function at::detail::check_rng_state for checking the validity of a new RNG state tensor. (A standalone sketch of the resulting structure follows this list.)
  • CPU Generator
    • Renamed and moved THTensor_(setRNGState) and THTensor_(getRNGState) to CPUGeneratorImpl::set_state and CPUGeneratorImpl::state.
    • Renamed and moved THGeneratorState and THGeneratorStateNew to CPUGeneratorStateLegacy and CPUGeneratorState.
  • CUDA Generator
    • Renamed and moved THCRandom_setRNGState and THCRandom_getRNGState to CUDAGeneratorImpl::set_state and CUDAGeneratorImpl::state.
  • PyTorch Bindings
    • THPGenerator_setState and THPGenerator_getState now simply forward to at::Generator::set_state and at::Generator::state.
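
To make the intended structure concrete, below is a minimal standalone sketch of the dispatch pattern described above. It uses a plain byte buffer in place of the real c10::TensorImpl / at::Tensor state tensors, and the names (StateBuffer, GeneratorImpl, CpuGeneratorImpl, Generator, check_rng_state) are simplified stand-ins rather than the actual PyTorch declarations.

```cpp
#include <algorithm>
#include <cstdint>
#include <memory>
#include <stdexcept>
#include <vector>

// Stand-in for the RNG state tensor (the real code passes c10::TensorImpl /
// at::Tensor); everything below is illustrative, not the actual declarations.
using StateBuffer = std::vector<uint8_t>;

// Illustrative counterpart of at::detail::check_rng_state: reject state
// buffers whose size does not match what the generator expects.
inline void check_rng_state(const StateBuffer& state, size_t expected_size) {
  if (state.size() != expected_size) {
    throw std::runtime_error("RNG state has the wrong size");
  }
}

// Counterpart of c10::GeneratorImpl: each backend overrides these two
// virtuals, so state access dispatches through the same interface as the
// other generator methods.
struct GeneratorImpl {
  virtual ~GeneratorImpl() = default;
  virtual void set_state(const StateBuffer& new_state) = 0;
  virtual StateBuffer state() const = 0;
};

// Counterpart of CPUGeneratorImpl; a CUDA or XLA generator would override
// the same two methods with its own serialization format.
class CpuGeneratorImpl final : public GeneratorImpl {
 public:
  void set_state(const StateBuffer& new_state) override {
    check_rng_state(new_state, sizeof(seed_));
    std::copy(new_state.begin(), new_state.end(),
              reinterpret_cast<uint8_t*>(&seed_));
  }
  StateBuffer state() const override {
    const auto* begin = reinterpret_cast<const uint8_t*>(&seed_);
    return StateBuffer(begin, begin + sizeof(seed_));
  }

 private:
  uint64_t seed_ = 0;  // toy internal state for the sketch
};

// Counterpart of at::Generator: a thin PIMPL wrapper that forwards to
// whichever GeneratorImpl it holds; the Python bindings would in turn
// forward torch.Generator.get_state()/set_state() here.
class Generator {
 public:
  explicit Generator(std::shared_ptr<GeneratorImpl> impl)
      : impl_(std::move(impl)) {}
  void set_state(const StateBuffer& new_state) { impl_->set_state(new_state); }
  StateBuffer state() const { return impl_->state(); }

 private:
  std::shared_ptr<GeneratorImpl> impl_;
};

int main() {
  Generator gen(std::make_shared<CpuGeneratorImpl>());
  StateBuffer saved = gen.state();  // analogous to torch.Generator.get_state()
  gen.set_state(saved);             // analogous to torch.Generator.set_state()
  return 0;
}
```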

facebook-github-bot commented Dec 18, 2020

💊 CI failures summary and remediations

As of commit 93decdd (more details on the Dr. CI page):


  • 1/1 failures introduced in this PR

🕵️ 1 new failure recognized by patterns

The following CI failures do not appear to be due to upstream breakages:

See CircleCI build pytorch_linux_bionic_py3_8_gcc9_coverage_test1 (1/1)

Step: "Run tests" (full log | diagnosis details | 🔁 rerun)

Jan 06 11:20:48 [E request_callback_no_python.cpp:636] Received error while processing request type 258: RuntimeError: Can not pickle torch.futures.Future
Jan 06 11:20:48 At:
Jan 06 11:20:48   /opt/conda/lib/python3.8/site-packages/torch/distributed/rpc/internal.py(120): serialize
Jan 06 11:20:48   /opt/conda/lib/python3.8/site-packages/torch/distributed/rpc/internal.py(172): serialize
Jan 06 11:20:48 
Jan 06 11:20:48 [E request_callback_no_python.cpp:636] Received error while processing request type 258: RuntimeError: Can not pickle torch.futures.Future
Jan 06 11:20:48 
Jan 06 11:20:48 At:
Jan 06 11:20:48   /opt/conda/lib/python3.8/site-packages/torch/distributed/rpc/internal.py(120): serialize
Jan 06 11:20:48   /opt/conda/lib/python3.8/site-packages/torch/distributed/rpc/internal.py(172): serialize
Jan 06 11:20:48 
Jan 06 11:20:48 [E request_callback_no_python.cpp:636] Received error while processing request type 258: RuntimeError: Can not pickle torch.futures.Future
Jan 06 11:20:48 
Jan 06 11:20:48 At:
Jan 06 11:20:48   /opt/conda/lib/python3.8/site-packages/torch/distributed/rpc/internal.py(120): serialize
Jan 06 11:20:48   /opt/conda/lib/python3.8/site-packages/torch/distributed/rpc/internal.py(172): serialize
Jan 06 11:20:48 
Jan 06 11:20:48 [W tensorpipe_agent.cpp:547] RPC agent for worker2 encountered error when reading incoming request from worker1: EOF: end of file (this is expected to happen during shutdown)
Jan 06 11:20:48 [W tensorpipe_agent.cpp:547] RPC agent for worker0 encountered error when reading incoming request from worker1: EOF: end of file (this is expected to happen during shutdown)
Jan 06 11:20:48 [W tensorpipe_agent.cpp:547] RPC agent for worker0 encountered error when reading incoming request from worker3: EOF: end of file (this is expected to happen during shutdown)
Jan 06 11:20:49 ok (2.767s)
Jan 06 11:20:51   test_return_future_remote (__main__.TensorPipeRpcTestWithSpawn) ... [W tensorpipe_agent.cpp:547] RPC agent for worker0 encountered error when reading incoming request from worker2: EOF: end of file (this is expected to happen during shutdown)

1 job timed out:

  • pytorch_linux_bionic_py3_8_gcc9_coverage_test1


@lqf96 lqf96 force-pushed the aten-generator-state branch 3 times, most recently from 34b5294 to f7df038 Compare December 19, 2020 19:13
@lqf96 lqf96 changed the title from "[WIP] Move generator state APIs to ATen" to "Move generator state APIs to ATen" Dec 19, 2020
lqf96 commented Dec 19, 2020

I guess this is ready for review now. Since I'm not familiar with the C++ side of the project, maybe @ezyang or someone can take a look at this?

codecov bot commented Dec 19, 2020

Codecov Report

Merging #49589 (7293324) into master (70734f1) will increase coverage by 47.81%.
The diff coverage is 86.36%.

@@             Coverage Diff             @@
##           master   #49589       +/-   ##
===========================================
+ Coverage   32.88%   80.69%   +47.81%     
===========================================
  Files         511     1891     +1380     
  Lines       68918   204977   +136059     
===========================================
+ Hits        22665   165416   +142751     
+ Misses      46253    39561     -6692     

@ngimel ngimel requested a review from pbelevich December 21, 2020 18:37
@ngimel ngimel added the triaged label (this issue has been looked at by a team member, and triaged and prioritized into an appropriate module) Dec 21, 2020
@lqf96 lqf96 force-pushed the aten-generator-state branch from c0a9f7b to 7293324 Compare December 22, 2020 11:02
ezyang commented Jan 4, 2021

@pbelevich let me know if you need extra eyeballs on this PR

@pbelevich

> @pbelevich let me know if you need extra eyeballs on this PR

Will review it and the other PRs today, sorry for the late review.


@facebook-github-bot facebook-github-bot left a comment


@pbelevich has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

lqf96 commented Jan 5, 2021

@pbelevich I revisited the PR today and realized that although the implementation is correct, there are still a few places where the code quality could be improved:

  1. Should we name the new APIs state / set_state or rng_state / set_rng_state? The latter sounds clearer and matches the legacy API names (THTensor_(get|set)RNGState / THCRandom_(get|set)RNGState) better.
  2. c10::GeneratorImpl::state (or rng_state) should accept const c10::TensorImpl& instead of c10::TensorImpl&. Similarly, at::Generator::state (or rng_state) should accept const Tensor&.

What are your opinions on these? Since the PR is already imported to FB's internal systems, should I update this PR or open a new one?
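
As a generic C++ illustration of the const suggestion in point 2 (a standalone sketch with a stand-in type and hypothetical function names, not PyTorch code): taking the state argument by const reference documents that the callee only reads it, and lets callers pass const objects and temporaries.

```cpp
#include <cstdint>
#include <vector>

// Stand-in for the RNG state tensor type; illustrative only.
using StateBuffer = std::vector<uint8_t>;

// A non-const reference parameter forces callers to hold a mutable object,
// even if the function only reads the incoming state.
void set_state_mutable(StateBuffer& new_state) { (void)new_state; }

// A const reference documents read-only access and also accepts const
// lvalues and temporaries.
void set_state_const(const StateBuffer& new_state) { (void)new_state; }

int main() {
  const StateBuffer saved{1, 2, 3};
  // set_state_mutable(saved);         // would not compile: const argument
  set_state_const(saved);              // OK
  set_state_const(StateBuffer{4, 5});  // OK: temporary binds to const&
  return 0;
}
```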


@pbelevich pbelevich left a comment


@lqf96 thanks a lot for this work!

> 1. Should we name the new APIs state / set_state or rng_state / set_rng_state? The latter sounds clearer and matches the legacy API names (THTensor_(get|set)RNGState / THCRandom_(get|set)RNGState) better.

I think we should try to match the Python API names instead of the legacy names to make the C++ API and Python API look similar. So, let's rename state to get_state and keep set_state as is.

> 2. c10::GeneratorImpl::state (or rng_state) should accept const c10::TensorImpl& instead of c10::TensorImpl&. Similarly, at::Generator::state (or rng_state) should accept const Tensor&.

Agreed, let's make them const.

> Since the PR is already imported to FB's internal systems, should I update this PR or open a new one?

Amend this PR; it's easy to reimport it into FB's internal code again.
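
Putting the agreed suggestions together (rename to match the Python get_state/set_state names, const-qualify the state parameter), the base interface from the sketch after the PR description would end up roughly like this. Still a hedged sketch with a stand-in StateBuffer type, not the actual c10 declarations:

```cpp
#include <cstdint>
#include <vector>

using StateBuffer = std::vector<uint8_t>;  // stand-in for the real state tensor

// The getter is renamed from state() to get_state() to mirror the Python
// names torch.Generator.get_state()/set_state(), and the setter takes its
// argument by const reference.
struct GeneratorImpl {
  virtual ~GeneratorImpl() = default;
  virtual void set_state(const StateBuffer& new_state) = 0;
  virtual StateBuffer get_state() const = 0;
};

int main() { return 0; }  // declarations only; shown for the naming/constness shape
```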

@pbelevich

TODO(for pbelevich): amend this PR with meta-pytorch/csprng#106 after reimport to fbcode

@lqf96 lqf96 force-pushed the aten-generator-state branch from 7293324 to 8df4d29 Compare January 6, 2021 02:27
@lqf96 lqf96 force-pushed the aten-generator-state branch from 0520138 to 93decdd Compare January 6, 2021 08:11
lqf96 commented Jan 6, 2021

@pbelevich I applied your suggestions and this PR should be ready for review again.


@facebook-github-bot facebook-github-bot left a comment


@pbelevich has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot

This pull request has been merged in 876dfbd.

facebook-github-bot pushed a commit that referenced this pull request Jan 7, 2021
Pull Request resolved: #49589

Reviewed By: H-Huang

Differential Revision: D25785774

Pulled By: pbelevich

fbshipit-source-id: 8ed79209c4ffb1a0ae8b19952ac8871ac9e0255f
@lqf96 lqf96 deleted the aten-generator-state branch January 7, 2021 04:07
hwangdeyu pushed a commit to hwangdeyu/pytorch that referenced this pull request Jan 14, 2021
ts-alchemist659op added a commit to ts-alchemist659op/csprng that referenced this pull request Oct 20, 2025