Add in API definitions for Rapids UDAF #13870
Signed-off-by: Firestarman <firestarmanllc@gmail.com>
Greptile Overview
Greptile Summary
Introduces API interfaces for GPU-accelerated User Defined Aggregate Functions (UDAF), establishing the foundation for columnar aggregation operations. This is the first step in a phased implementation approach to support UDAF on GPU.
Confidence Score: 5/5
Sequence Diagram
sequenceDiagram
participant User as UDAF Implementation
participant Update as updateAggregation()
participant Merge as mergeAggregation()
participant Result as getResult()
Note over User: Initial Aggregation Phase
User->>Update: Call updateAggregation()
Update->>Update: preStep(numRows, args)
Note over Update: Transform input columns<br/>(optional)
Update->>Update: aggregate/reduce operation
Note over Update: Process raw input data
Update->>Update: postStep(numRows, aggregatedData)
Note over Update: Transform output to buffer format
Note over User: Merge Phase (Distributed)
User->>Merge: Call mergeAggregation()
Merge->>Merge: preStep(numRows, args)
Note over Merge: Prepare buffer data<br/>(optional)
Merge->>Merge: aggregate/reduce operation
Note over Merge: Combine partial results
Merge->>Merge: postStep(numRows, aggregatedData)
Note over Merge: Transform to final buffer format
Note over User: Final Result Phase
User->>Result: getResult(numRows, args, outType)
Result-->>User: Final ColumnVector
Note over User: Returns single column<br/>with final UDAF result
Note over User: Empty Input Case
User->>User: getDefaultValue()
Note over User: Returns Scalar[] for<br/>zero-row reduction
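The phases in the diagram can be pictured with a small CPU-only analogue. The sketch below is an illustration under stated assumptions, not the API introduced by this PR: plain `double[]` arrays stand in for cuDF columns, the buffer layout `{sum, count}` is invented for an "average" aggregate, and every class and method name (`AvgUdafSketch`, `update`, `merge`, `result`, `defaultBuffer`) is hypothetical.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical CPU-only stand-in for the phases in the diagram.
// Plain arrays replace cuDF columns; none of these names come from the PR.
final class AvgUdafSketch {
    // Aggregation buffer layout assumed for this sketch: {sum, count}.

    // Stand-in for the "empty input" case: the buffer for a zero-row reduction.
    static double[] defaultBuffer() {
        return new double[] {0.0, 0.0};
    }

    // Update phase: preStep -> reduce -> postStep over raw input values.
    static double[] update(double[] input) {
        if (input.length == 0) {
            return defaultBuffer();
        }
        double[] prepared = input.clone();   // preStep: optional transform (identity here)
        double sum = 0.0;
        for (double v : prepared) {          // reduce: process the raw input data
            sum += v;
        }
        return new double[] {sum, prepared.length}; // postStep: shape into buffer format
    }

    // Merge phase: combine partial buffers produced by separate update calls.
    static double[] merge(List<double[]> partials) {
        double[] merged = defaultBuffer();
        for (double[] p : partials) {
            merged[0] += p[0];
            merged[1] += p[1];
        }
        return merged;
    }

    // Final result phase: turn the merged buffer into the aggregate value.
    static double result(double[] buffer) {
        return buffer[1] == 0.0 ? Double.NaN : buffer[0] / buffer[1];
    }

    public static void main(String[] args) {
        double[] p1 = update(new double[] {1.0, 2.0, 3.0});
        double[] p2 = update(new double[] {5.0});
        double[] p3 = update(new double[0]); // empty partition uses the default buffer
        System.out.println(result(merge(Arrays.asList(p1, p2, p3))));
    }
}
```

Keeping the buffer as `{sum, count}` rather than a running average is what makes the merge step a plain element-wise addition; the division is deferred to the final result phase.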
3 files reviewed, no comments
build
Signed-off-by: Firestarman <firestarmanllc@gmail.com>
build
3 files reviewed, 2 comments
sql-plugin-api/src/main/java/com/nvidia/spark/RapidsUDAFGroupByAggregation.java
Outdated
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
Signed-off-by: Firestarman <firestarmanllc@gmail.com>
build
3 files reviewed, 1 comment
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
Signed-off-by: Firestarman <firestarmanllc@gmail.com>
3 files reviewed, no comments
build
res-life left a comment
LGTM
Contributes to #13412
Rapids UDAF is designed to execute a UDAF (User Defined Aggregate Function) in a columnar way so that it can be accelerated by the GPU.
Complete support for RapidsUDAF covers too much ground, and a single PR (#13450) is too large to review. It is better to add it piece by piece, and this PR is the first piece: it only introduces the relevant interfaces.
- `RapidsUDAF`: the top-level interface. It defines 5 methods that follow the CPU definition (`UserDefinedAggregateFunction`) as closely as possible to minimize users' learning effort. `updateAggregation` and `mergeAggregation` each return a `RapidsUDAFGroupByAggregation`, which contains the APIs that perform the aggregation.
- `RapidsUDAFGroupByAggregation`: the base interface for GPU-accelerated UDAF aggregation implementations. It provides the contract for different aggregation strategies. It also supports an optional pair of `preStep` and `postStep` to run transformations before and after a "reduce/aggregate" operation, similar to "preMerge" and "postMerge" for the merge-stage aggregate in `GpuAggregateFunction`.
- `RapidsSimpleGroupByAggregation`: a child class of `RapidsUDAFGroupByAggregation` that provides a standard cuDF-based aggregation step using built-in cuDF aggregation operations.

The group-by `aggregate` API is placed in the child class because more types of `aggregate` may be introduced in the future via other child classes. For example, a future `aggregate` could access the grouped data and keys to let users do more customization.
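The reasoning for keeping `aggregate` out of the base interface can be pictured with a minimal, hypothetical Java sketch. It is not the PR's code and does not reproduce its signatures: plain Java collections stand in for cuDF columns, and all type names (`GroupByAggSketch`, `SimpleAggSketch`, `KeyAwareAggSketch`, `L2NormSketch`, `KeyDrivenSketch`, `GroupByDriverSketch`) are invented for illustration. The base contract holds only the optional pre/post transformations, while each child decides what its `aggregate` step gets to see.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical base contract: only the optional transformations around the aggregate step.
interface GroupByAggSketch {
    default List<Double> preStep(List<Double> values) { return values; }
    default double postStep(double aggregated) { return aggregated; }
}

// Hypothetical "simple" child: the aggregate step sees one group's values only.
abstract class SimpleAggSketch implements GroupByAggSketch {
    abstract double aggregate(List<Double> groupValues);
}

// Hypothetical future child: the aggregate step also receives the group's key.
abstract class KeyAwareAggSketch implements GroupByAggSketch {
    abstract double aggregate(String key, List<Double> groupValues);
}

// Example simple aggregation: per-group L2 norm (square, sum, square root).
final class L2NormSketch extends SimpleAggSketch {
    @Override
    public List<Double> preStep(List<Double> values) {
        List<Double> squared = new ArrayList<>();
        for (double v : values) squared.add(v * v);
        return squared;
    }

    @Override
    double aggregate(List<Double> groupValues) {
        double sum = 0.0;
        for (double v : groupValues) sum += v;
        return sum;
    }

    @Override
    public double postStep(double aggregated) { return Math.sqrt(aggregated); }
}

// Example key-aware aggregation: max for the key "max", sum for every other key.
final class KeyDrivenSketch extends KeyAwareAggSketch {
    @Override
    double aggregate(String key, List<Double> groupValues) {
        boolean useMax = "max".equals(key);
        double acc = useMax ? Double.NEGATIVE_INFINITY : 0.0;
        for (double v : groupValues) acc = useMax ? Math.max(acc, v) : acc + v;
        return acc;
    }
}

final class GroupByDriverSketch {
    // Groups values by key, then runs preStep -> aggregate -> postStep per group.
    static Map<String, Double> run(GroupByAggSketch agg, String[] keys, double[] values) {
        Map<String, List<Double>> groups = new LinkedHashMap<>();
        for (int i = 0; i < keys.length; i++) {
            groups.computeIfAbsent(keys[i], k -> new ArrayList<>()).add(values[i]);
        }
        Map<String, Double> out = new LinkedHashMap<>();
        for (Map.Entry<String, List<Double>> e : groups.entrySet()) {
            List<Double> prepared = agg.preStep(e.getValue());
            double reduced;
            if (agg instanceof KeyAwareAggSketch) {
                reduced = ((KeyAwareAggSketch) agg).aggregate(e.getKey(), prepared);
            } else if (agg instanceof SimpleAggSketch) {
                reduced = ((SimpleAggSketch) agg).aggregate(prepared);
            } else {
                throw new IllegalArgumentException("unsupported aggregation type");
            }
            out.put(e.getKey(), agg.postStep(reduced));
        }
        return out;
    }

    public static void main(String[] args) {
        String[] keys = {"a", "max", "a", "max"};
        double[] values = {3.0, 1.0, 4.0, 9.0};
        System.out.println(run(new L2NormSketch(), keys, values));
        System.out.println(run(new KeyDrivenSketch(), keys, values));
    }
}
```

In this sketch, adding the key-aware variant required no change to the base contract or to the simple child; only the driver learned about the new kind of `aggregate`.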