indexAccumulate python api #4066

Draft: wants to merge 7 commits into base branch jjsjann123/index_put
Conversation

@jjsjann123 (Collaborator) commented Mar 12, 2025:

This PR supports embedding backward, which requires torch.index_put_(..., accumulate=True).

Stacked PRs:

What this PR does:

  • Added Python API Tensor fd.ops.index_accumulate(Tensor acc, Tensor index, Tensor value) (a usage sketch follows below)
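
For illustration, here is a minimal usage sketch in the nvFuser Python frontend. The FusionDefinition boilerplate, symbolic shapes, and concrete sizes (taken from the opinfo generator further down) are assumptions for the example, not part of this PR:

    import torch
    from nvfuser import FusionDefinition, DataType

    with FusionDefinition() as fd:
        # acc:   [vocab, hidden] accumulator (e.g. an embedding weight gradient)
        # index: [seq] integer indices into the vocab dimension
        # value: [seq, hidden] rows to scatter-add into acc
        acc = fd.define_tensor(shape=[-1, -1], dtype=DataType.Float)
        index = fd.define_tensor(shape=[-1], dtype=DataType.Int)
        value = fd.define_tensor(shape=[-1, -1], dtype=DataType.Float)
        out = fd.ops.index_accumulate(acc, index, value)
        fd.add_output(out)

    vocab, hidden, seq = 1024, 12, 300
    result, = fd.execute([
        torch.zeros(vocab, hidden, device="cuda"),
        torch.randint(0, vocab, (seq,), device="cuda"),
        torch.randn(seq, hidden, device="cuda"),
    ])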


github-actions bot commented Mar 12, 2025

Review updated until commit 6b96692

Description

  • Added index_accumulate Python API

  • Included opinfo test for index_accumulate

  • Updated fusion record handling for index_accumulate

  • Formatted code with clang-format and black


Changes walkthrough 📝

Relevant files

Enhancement

python_bindings.cpp (csrc/python_frontend/python_bindings.cpp): Add index_accumulate Python API
  • Added index_accumulate function to Python bindings
  +24/-0

fusion_record.cpp (csrc/serde/fusion_record.cpp): Add deserialization for IndexAccumulateOpRecord
  • Added deserialization for IndexAccumulateOpRecord
  +7/-0

fusion_record.h (csrc/python_frontend/fusion_record.h): Add IndexAccumulateOpRecord
  • Added IndexAccumulateOpRecord struct
  +22/-0

Tests

opinfo_input_generators.py (tests/python/opinfo_input_generators.py): Add index_accumulate_generator
  • Added index_accumulate_generator function
  +19/-0

opinfos.py (tests/python/opinfos.py): Add index_accumulate opinfo
  • Added index_accumulate_generator to opinfo list
  • Added index_accumulate_ref function
  • Created index_accumulate_opinfo and appended to shape_ops
  +26/-0

Configuration changes

fusion_cache.fbs (csrc/serde/fusion_cache.fbs): Add IndexAccumulateOp to RecordType
  • Added IndexAccumulateOp to RecordType enum
  +1/-0

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

🧪 PR contains tests
⚡ Recommended focus areas for review

Performance Goal

Ensure a clear performance goal is set and that feedback was sought early regarding the expected performance improvements of the index_accumulate function.

    "index_accumulate",
    [](FusionDefinition::Operators& self,
       Tensor acc,
       Tensor index,
       Tensor value) -> Tensor {
      FUSER_PERF_SCOPE("Operators.index_accumulate");
      NVF_CHECK(
          self.validUse(), "Attempting to add to a completed definition!");
      FusionDefinition* fd = self.fusion_definition;
      Tensor output = fd->defineTensor(acc.dims);
      fd->defineRecord(new IndexAccumulateOpRecord(
          {
              fd->recordingState(acc()),
              fd->recordingState(index()),
              fd->recordingState(value()),
          },
          {fd->recordingState(output())}));
      return output;
    },
    py::arg("acc"),
    py::arg("index"),
    py::arg("value"),
    py::return_value_policy::reference);
Test Coverage

Verify that the test cases in index_accumulate_generator cover a wide range of scenarios and edge cases to ensure the correctness and robustness of the index_accumulate function.

    def index_accumulate_generator(
        op: OpInfo, dtype: torch.dtype, requires_grad: bool = False, **kwargs
    ):
        make_arg = partial(
            make_tensor, device="cuda", dtype=dtype, requires_grad=requires_grad
        )
        make_index = partial(make_tensor, device="cuda", requires_grad=False)
    
        # vocab_size, hidden_size, seq_size
        cases = ((1024, 12, 300),)
    
        for vocab, hidden, seq in cases:
            for index_dtype in [torch.int, torch.long]:
                acc = make_arg((vocab, hidden))
                index = make_index((seq,), low=0, high=vocab, dtype=index_dtype)
                value = make_arg((seq, hidden))
                yield SampleInput(acc, index, value)
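
The walkthrough above mentions an index_accumulate_ref function in opinfos.py. A plausible sketch of such a reference, assuming it simply defers to the out-of-place torch.index_put with accumulate=True (the semantics the PR description targets):

    import torch

    def index_accumulate_ref(acc, index, value):
        # Out-of-place reference: scatter-add rows of `value` into `acc` at `index`.
        return torch.index_put(acc, (index,), value, accumulate=True)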
Error Handling

Ensure that the IndexAccumulateOpRecord operator handles potential errors gracefully, such as mismatched tensor dimensions or unsupported data types.

    IndexAccumulateOpRecord(std::vector<State> args, std::vector<State> outputs)
        : RecordFunctor(
              std::move(args),
              std::move(outputs),
              "ops.index_accumulate",
              serde::RecordType::IndexAccumulateOp) {}
    ~IndexAccumulateOpRecord() override = default;
    RecordFunctor* clone() final {
      return new IndexAccumulateOpRecord(*this);
    }
    
    void operator()(FusionState& fd) final {
      auto acc = fd.getFusionState(args_.at(0).index)->as<TensorView>();
      auto index = fd.getFusionState(args_.at(1).index)->as<TensorView>();
      auto value = fd.getFusionState(args_.at(2).index)->as<TensorView>();
    
      auto output = indexAccumulate(acc, index, value);
      fd.setFusionState(outputs_.at(0).index, output);
    }
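
On the error-handling question: per index_accumulate_generator above, the implied shape contract is acc [vocab, hidden], index [seq], value [seq, hidden]. A hypothetical caller-side illustration of that contract (these checks are an assumption inferred from the generator, not validation the PR adds):

    import torch

    def check_index_accumulate_inputs(acc, index, value):
        # Shape/dtype contract inferred from the opinfo generator above.
        assert acc.ndim == 2 and index.ndim == 1 and value.ndim == 2
        assert index.dtype in (torch.int32, torch.int64)
        assert value.shape == (index.shape[0], acc.shape[1])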

@jjsjann123 mentioned this pull request Mar 14, 2025
@jjsjann123 marked this pull request as ready for review March 14, 2025 01:18
@jjsjann123 requested review from rdspring1 and protonu March 14, 2025 01:19
@jjsjann123 marked this pull request as draft March 14, 2025 01:19
@jjsjann123 (Collaborator, Author) commented:

Marking this as draft to avoid accidental merge. But this PR is good for review as-is.

@rdspring1 (Collaborator) left a comment:

Do you need to define void handle(IndexAccumulateOp* iaop) in csrc/python_frontend/translation.cpp for the python clone and segmentation features?

Otherwise, the PR looks good to me.

    py::arg("acc"),
    py::arg("index"),
    py::arg("value"),
    py::return_value_policy::reference);
A collaborator commented on the lines above:

I'm trying to improve the Python user experience by adding a docstring to new functions.

Docstring generated by Gemini.

        m.def("index_accumulate", &indexAccumulate,
              py::arg("acc_tv"), py::arg("index_tv"), py::arg("value_tv"),
              R"(
            Accumulates values into a tensor at specified indices.
    
            This function performs a restricted version of `torch.index_put(..., accumulate=true)`.
            It adds the values from `value_tv` to the elements of `acc_tv` at the indices
            specified by `index_tv`.
    
            acc_tv: The tensor to accumulate into (in-place modification).
            index_tv: The tensor containing the indices.
            value_tv: The tensor containing the values to accumulate.
    
            Returns:
                A pointer to the modified `acc_tv` tensor.
    
            Note:
                This is a restricted version and may not support all features of the
                full `torch.index_put(..., accumulate=true)` function.
        )");

@jjsjann123 (Collaborator, Author) replied:

Hahaha, thanks for the draft~~~ Will add it in.
