
LLVM lowering #4581


Open · wants to merge 33 commits into main

Conversation

wolfcomos
Collaborator

Goal: Improve host IR latency by replacing functions that currently use expression evaluators with LLVM ORC JIT-compiled code. Follow-up to #4431.

Currently:
- Naive support of output tensor shape inference
- Naive support of output stride inference

@jjsjann123 jjsjann123 self-requested a review June 5, 2025 18:17
@wolfcomos
Collaborator Author

!test

1 similar comment
@wolfcomos
Collaborator Author

!test

@jjsjann123
Collaborator

I'm seeing that workflows from a forked repo require maintainer approval... I didn't know that was a thing.

I have already added wolfcomos as a developer in the repo. Do we need to close this and open a new PR from a branch inside this repo in order to trigger CI? Asking @xwang233.

@xwang233
Collaborator

xwang233 commented Jun 5, 2025

> I'm seeing that workflows from a forked repo require maintainer approval... I didn't know that was a thing.
>
> I have already added wolfcomos as a developer in the repo. Do we need to close this and open a new PR from a branch inside this repo in order to trigger CI? Asking @xwang233.

Yes, the PR has to be initiated from a branch in this repo for CI and AI review to work. Sorry about that!

@jjsjann123
Collaborator

> > I'm seeing that workflows from a forked repo require maintainer approval... I didn't know that was a thing.
> > I have already added wolfcomos as a developer in the repo. Do we need to close this and open a new PR from a branch inside this repo in order to trigger CI? Asking @xwang233.
>
> Yes, the PR has to be initiated from a branch in this repo for CI and AI review to work. Sorry about that!

Oh, it's not your fault. I should have known that. @wolfcomos, I'm closing this PR. Would you mind pushing the branch directly to the nvfuser repo and starting the PR from there instead?

@jjsjann123 jjsjann123 closed this Jun 5, 2025
@wolfcomos
Collaborator Author

!test

@xwang233 xwang233 reopened this Jun 5, 2025
@xwang233
Collaborator

xwang233 commented Jun 6, 2025

!test

@jjsjann123
Collaborator

!test

@jjsjann123 jjsjann123 left a comment

Note to myself: need to continue reviewing the transforms and utils.

CMakeLists.txt Outdated
set(BUILD_LLVM ON)
if(BUILD_LLVM)
# Add LLVM JIT related dependencies
set(LLVM_DIR "/usr/lib/llvm-18/lib/cmake/llvm")
Collaborator

Note to myself: do we need to force LLVM_DIR? It's under /usr/lib, so I think it's unnecessary.

Collaborator

Nevertheless, this probably should have been passed as a -D option instead.

Collaborator

If you have to specify the folder, query llvm-config instead of hardcoding it:

$ llvm-config --libdir
/usr/lib/llvm-18/lib

CMakeLists.txt Outdated
if(BUILD_LLVM)
target_include_directories(test_host_ir PRIVATE ${LLVM_INCLUDE_DIRS})
target_compile_definitions(test_host_ir PRIVATE ${LLVM_DEFINITIONS})
target_link_libraries(test_host_ir PUBLIC ${LLVM_LIBS})
Collaborator

My naive question is: why the mismatch between the handling of the two targets?

For codegen_internal, we only added library dependencies, but here we also need the new include dir and compile definitions?

// }


// run with the following command: NVFUSER_ENABLE=host_ir_lowering ./bin/test_host_ir --gtest_filter=HostIrEvaluatorTest.LaunchKernel2
Collaborator

Setting the enable option should be done once for the whole test suite:

class HostIrIntegrationTest : public NVFuserTest {
 protected:
  HostIrIntegrationTest() {
    EnableOptionsGuard::getCurOptions().set(EnableOption::HostIrLowering);
  }
};

@@ -0,0 +1,143 @@

Collaborator

license header.

@@ -0,0 +1,704 @@
#include <host_ir/lower_to_llvm.h>
Collaborator

ditto

print_compare_tensor(output_tensor, t0);
}

// TEST_F(HostIrEvaluatorTest, LaunchKernel3) {
Collaborator

Why is this test commented out?

Comment on lines 48 to 49
tv1->setAllocationDomain({tv1->axis(0), tv1->axis(1), tv1->axis(2)}, {true, true, true});
tv1->printTransforms();
Collaborator

Suggested change
- tv1->setAllocationDomain({tv1->axis(0), tv1->axis(1), tv1->axis(2)}, {true, true, true});
- tv1->printTransforms();
+ tv1->setAllocationDomain(tv1->getLoopDomain(), true);

auto allocation_domain = tv1->getAllocationDomain();
auto logical_domain = tv1->getLogicalDomain();
std::unique_ptr<llvm::orc::LLJIT> JIT = llvm_jit_init(4);
llvm_jit_compile_shape_infer(JIT, fusion, logical_domain, logical_domain);
Collaborator

Shouldn't the third argument be the logical_domain of the input, i.e. tv0->getLogicalDomain()?

llvm_jit_compile_stride_infer(JIT, fusion, allocation_domain, logical_domain);

auto func_infer_shape = ExitOnErr(JIT->lookup("infer_shape"));
auto func_infer_stride = ExitOnErr(JIT->lookup("infer_stride"));
Collaborator

Note: I'm curious how our benchmark was done when we were comparing the vanilla HostIr allocation/execution vs the compiled code path.

Collaborator

nitpick: we would want to encapsulate error checking
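
For example, a tiny helper could wrap the llvm::Expected returned by LLJIT::lookup so call sites don't repeat the check. A minimal sketch (the helper name is hypothetical; NVF_ERROR is the existing nvfuser macro):

#include <llvm/Support/Error.h>

// Hypothetical helper, not part of this PR: unwrap an llvm::Expected or
// raise an nvfuser error carrying the LLVM error message.
template <typename T>
T unwrapOrThrow(llvm::Expected<T> maybe_value, const char* context) {
  NVF_ERROR(
      static_cast<bool>(maybe_value),
      context,
      ": ",
      llvm::toString(maybe_value.takeError()));
  return std::move(*maybe_value);
}

// Call sites then shrink to:
// auto func_infer_shape =
//     unwrapOrThrow(JIT->lookup("infer_shape"), "JIT symbol lookup failed");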

// Generate shape infer llvm module
llvm::orc::ThreadSafeModule generate_infer_shape_module(std::vector<IterDomain*>& input_logical_domain, std::vector<IterDomain*>& output_logical_domain, Fusion& fusion);

/*
Collaborator

We only need the API from this line below. We should hide everything above it in the .cpp file if we don't need to expose it.

llvm_jit_compile_stride_infer(JIT, fusion, allocation_domain, logical_domain);

auto func_infer_shape = ExitOnErr(JIT->lookup("infer_shape"));
auto func_infer_stride = ExitOnErr(JIT->lookup("infer_stride"));
Collaborator

nitpick: we would want to encapsulate error checking

#include <llvm/ExecutionEngine/Orc/IRCompileLayer.h>
#include <llvm/ExecutionEngine/Orc/RTDyldObjectLinkingLayer.h>
#include <llvm/ExecutionEngine/JITLink/JITLink.h>
#include "llvm/Support/Error.h"
Collaborator

We should also be able to hide the majority of the includes, as long as we push the implementation internals into the .cpp file.


// shape infer compile
void llvm_jit_compile_shape_infer(std::unique_ptr<llvm::orc::LLJIT>& JIT, Fusion& fusion, std::vector<IterDomain*>& input_domain, std::vector<IterDomain*>& output_domain){
std::cout << "llvm_jit_compile shape infer" << std::endl;
Collaborator

Nitpick: let's remove the debug print.

If you want to keep these for debugging purposes, you can add a dump option and put the prints inside an if block.
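
For example (sketch only; the dump option name is hypothetical, and this assumes nvfuser's usual isDebugDumpEnabled/debug() gating utilities):

// Gate the print behind a dump option instead of printing unconditionally.
if (isDebugDumpEnabled(DebugDumpOption::HostIrJit)) {
  debug() << "llvm_jit_compile shape infer" << std::endl;
}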

std::cout << "llvm_jit_compile shape infer" << std::endl;
auto TSM_shape = generate_infer_shape_module(input_domain, output_domain, fusion);
if (auto Err = JIT->addIRModule(std::move(TSM_shape))) {
llvm::errs() << "Error adding module to JIT: " << llvm::toString(std::move(Err)) << "\n";
Collaborator

nitpick: do we need to use llvm::errs()? There's NVF_ERROR for that.
https://github.com/NVIDIA/Fuser/blob/main/csrc/exceptions.h

}

// shape infer compile
void llvm_jit_compile_shape_infer(std::unique_ptr<llvm::orc::LLJIT>& JIT, Fusion& fusion, std::vector<IterDomain*>& input_domain, std::vector<IterDomain*>& output_domain){
Collaborator

Is this supposed to be the concatenated full set of IDs across all inputs?

Nit: name it inputs_domain instead.

llvm::Value* input_ptr = &*arg_it;
llvm::Value* output_ptr = &*arg_it+2;
IdModel id_model(&fusion);
const ValGraph& graph = id_model.buildExactGraph();
Collaborator

We don't want to repetitively build the same IdModel. Let's start thinking about a refactor so we can reuse it across a given fusion.

Collaborator

same here. But we can leave it as an optimization later.
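
One possible shape for that refactor (sketch; the extra ValGraph parameter is a hypothetical signature change, not what the PR currently has): build the IdModel once at the call site and thread the exact graph through the generators.

// Build once per fusion...
IdModel id_model(&fusion);
const ValGraph& exact_graph = id_model.buildExactGraph();

// ...and pass the already-built graph into each generator so none of them
// constructs its own IdModel internally.
auto shape_module = generate_infer_shape_module(
    input_logical_domain, output_logical_domain, fusion, exact_graph);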

IdModel id_model(&fusion);
const ValGraph& graph = id_model.buildExactGraph();
auto exprs = traverse_expr_group(graph, input_logical_domain, output_logical_domain);
for(long unsigned int i = 0; i < input_logical_domain.size(); i++){
Collaborator

Nit: we already have the exact graph, so we don't need to load everything; we only need to load the unique values.

Collaborator

I think from this point on we should treat all the expressions at the granularity of ValGroup and ExprGroup, which should get rid of some repetitive expressions. Since LLVM IR isn't aware of the mapping, it won't be able to CSE those on its own.

};

// Dependency graph entry for the shape inference
class dependency_graph{
Collaborator

IIUC, you just wanted a topo order traversal?

We should revisit this... I think we have utils for sorting already.

Collaborator

If we don't, we should add that 😉
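
For reference, the dependency_graph class seems to boil down to something like Kahn's algorithm. A standalone sketch (illustrative only, not an existing nvfuser utility):

#include <queue>
#include <unordered_map>
#include <vector>

// Kahn's algorithm over an adjacency list: users[n] lists the nodes that
// depend on n. Returns nodes in dependency (topological) order.
template <typename Node>
std::vector<Node> topoSort(
    const std::unordered_map<Node, std::vector<Node>>& users) {
  std::unordered_map<Node, int> in_degree;
  for (const auto& [node, outs] : users) {
    in_degree.try_emplace(node, 0);
    for (const Node& out : outs) {
      ++in_degree[out];
    }
  }
  std::queue<Node> ready;
  for (const auto& [node, degree] : in_degree) {
    if (degree == 0) {
      ready.push(node);
    }
  }
  std::vector<Node> order;
  while (!ready.empty()) {
    Node node = ready.front();
    ready.pop();
    order.push_back(node);
    auto it = users.find(node);
    if (it == users.end()) {
      continue;
    }
    for (const Node& out : it->second) {
      if (--in_degree[out] == 0) {
        ready.push(out);
      }
    }
  }
  // order.size() < in_degree.size() here would indicate a cycle.
  return order;
}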

}

// Allocate the output tensor based on the shape and stride inference
at::Tensor aten_output_allocation(FuncType shape_infer_func, FuncType stride_infer_func, const at::Tensor& input, int64_t output_tensor_dim) {
Collaborator

nit: we should refactor this to encapsulate these compiled functions.

}

for(auto* val : output_vals){
generate_shape_llvm_ir(val, val2graph, builder);
Collaborator

I think we should rethink the traversal part. Let's take this offline.

@wolfcomos wolfcomos requested a review from jjsjann123 June 12, 2025 18:32
at::Tensor output_tensor = jit.allocateOutputTensor({t0});

// Print Output Tensor Info
print_tensor_info(output_tensor);
Collaborator

nit: debug print should be removed.

// [N, H, W, C]
tv1->merge(1);
// [N, H*W, C]
tv1->setAllocationDomain(tv1->getLoopDomain(), {true, true, true});
Collaborator

Is this a legal case?

Since we are re-ordering the split last dimension, can we still merge it back? This might need to trigger an assert.

// [N * H, W * C/4, 2, 2]
TensorView* tv1 = set(tv0);

tv1->setAllocationDomain(tv1->getLoopDomain(), {true, true, true, true});
Collaborator

tv1 is not used?

// [N * H, W, C/4, 4]
tv0->split(3,2);
// [N * H, W, C/4, 2, 2]
tv0->merge(1,2);
Collaborator

QQ: tv0 transformation doesn't really do anything, since it's done on inputs. Is there anything I missed?

TensorView* tv2 = makeContigConcreteTensor({N, H, W});
fusion.addInput(tv2);
auto tv3 = broadcast(tv2, {false, false, false, true});
auto tv4 = add(tv2, tv3);
Collaborator

What's happening with tv4 here?

Shouldn't it be of shape (N, H, W, 1)?

Comment on lines +773 to +785
// Map the output values to the input values if they are the same
for(auto* val : output_values){
auto index = mapToInputDomain(boundary_vals, val, graph);
if(index != -1){
val2llvm_val[graph.toGroup(val)] = val2llvm_val[graph.toGroup(boundary_vals[index])];
}
}

// Store the output values to the preallocated output buffer
for(size_t i = 0; i < output_values.size(); i++){
auto* output_i_ptr = builder.CreateGEP(int64Ty, output_ptr, builder.getInt64(i), "ptr");
builder.CreateStore(val2llvm_val[graph.toGroup(output_values[i])], output_i_ptr);
}
Collaborator

these two for loops could be merged.
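
Something along these lines (rough sketch reusing the snippet's names; it assumes nothing else relies on the aliased map entries the first loop used to write):

for (size_t i = 0; i < output_values.size(); i++) {
  auto* val = output_values[i];
  // Resolve to the mapped input value when one exists, otherwise use the
  // value already computed for this group.
  auto index = mapToInputDomain(boundary_vals, val, graph);
  llvm::Value* llvm_val = (index != -1)
      ? val2llvm_val[graph.toGroup(boundary_vals[index])]
      : val2llvm_val[graph.toGroup(val)];
  auto* output_i_ptr =
      builder.CreateGEP(int64Ty, output_ptr, builder.getInt64(i), "ptr");
  builder.CreateStore(llvm_val, output_i_ptr);
}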

By default, we assume it is in topological order, which means input values are ready to use

*/
void generate_shape_llvm_ir(Expr* expr, llvm::IRBuilder<>& builder, std::unordered_map<ValGroup,llvm::Value*>& val2llvm, std::unordered_map<int, Val*>& boundary_vals, const ValGraph& graph) {
Collaborator

Why do we need boundary_vals? Is this an optimization for simpler math, i.e. just use input shapes whenever possible?

}
else{
input_outer_llvm_val = val2llvm[graph.toGroup(merge_input_outer_val)];
}
Collaborator

I'm seeing this pattern a lot vvv

auto* v = ...;
int index = mapToInputDomain(...);
llvm::Value* val = nullptr;
if (index != -1) {
  val = val2llvm[graph.toGroup(boundary_vals[index])];
} else {
  val = val2llvm[graph.toGroup(v)];
}

Maybe make it a lambda to keep the code easier to read.
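
For example (untested sketch, same names as above):

auto lookupLlvmVal = [&](Val* v) -> llvm::Value* {
  // Prefer the mapped input-domain value when one exists.
  int index = mapToInputDomain(boundary_vals, v, graph);
  return (index != -1) ? val2llvm[graph.toGroup(boundary_vals[index])]
                       : val2llvm[graph.toGroup(v)];
};

// Call sites then collapse to:
// input_outer_llvm_val = lookupLlvmVal(merge_input_outer_val);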

}
}
// outer = input + 1
llvm::Value* minus_1 = builder.CreateSub(input_llvm_val, builder.getInt64(1), "minus_1");
Collaborator

The comment says + 1 but the code is minus_1?

Collaborator

Looks like we are trying to do a ceil_div? Looks like the existing comment is just wrong.
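
For reference, the usual integer ceil-division of the input extent by the split factor is (extent + factor - 1) / factor. A minimal IRBuilder sketch with illustrative variable names:

llvm::Value* factor_minus_one =
    builder.CreateSub(factor, builder.getInt64(1), "factor_minus_one");
llvm::Value* numerator =
    builder.CreateAdd(extent, factor_minus_one, "numerator");
// outer = ceilDiv(extent, factor)
llvm::Value* outer_extent =
    builder.CreateUDiv(numerator, factor, "outer_extent");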

// outer = (input + 1 + inner) / inner
val2llvm[graph.toGroup(split_output_outer_val)] = builder.CreateUDiv(sum_ab, val2llvm[graph.toGroup(split_output_inner_val)], split_output_outer_val->as<IterDomain>()->extent()->toString());
}
else{
Collaborator

Nitpick: this looks like the exact same logic as the inner split. Let's merge the two paths.

@@ -834,7 +864,15 @@ if(BUILD_TEST)
${NVFUSER_ROOT}/tests/cpp/test_host_ir_integration.cpp
${NVFUSER_ROOT}/tests/cpp/test_host_ir_stream_lowering.cpp
)
if(BUILD_LLVM)
list(APPEND HOSTIR_TEST_SRCS
${NVFUSER_ROOT}/tests/cpp/test_host_ir_llvm_lowering.cpp
Collaborator

Suggested change
- ${NVFUSER_ROOT}/tests/cpp/test_host_ir_llvm_lowering.cpp
+ ${NVFUSER_ROOT}/tests/cpp/test_host_ir_compilation.cpp

"LLVM lowering" is also correct, but terminology-wise I tend to use "lowering" for Fusion IR => host IR and "compilation" for host IR => LLVM.

Collaborator

Maybe compile_to_llvm.cpp?

Comment on lines +45 to +46
using ShapeInferFunc = void (*)(const int64_t* input_tensor_shape, int64_t input_tensor_shape_buffer_size,
int64_t* output_tensor_shape, int64_t output_tensor_shape_buffer_size);
Collaborator

Suggested change
- using ShapeInferFunc = void (*)(const int64_t* input_tensor_shape, int64_t input_tensor_shape_buffer_size,
-     int64_t* output_tensor_shape, int64_t output_tensor_shape_buffer_size);
+ using ShapeInferFunc = std::function<void(const int64_t*, int64_t, int64_t*, int64_t)>;
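
If the alias becomes a std::function, the looked-up JIT symbol can still be wrapped after the fact. A sketch assuming LLVM 18's LLJIT, where lookup returns an ExecutorAddr:

// Raw signature of the JIT-compiled symbol.
using RawShapeInferFn = void (*)(const int64_t*, int64_t, int64_t*, int64_t);

auto addr = ExitOnErr(JIT->lookup("infer_shape"));
ShapeInferFunc infer_shape = addr.toPtr<RawShapeInferFn>();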

// Dependency graph entry for the stride inference
struct StrideInfo {
public:
llvm::Value* llvm_extent = nullptr; // LLVM Value for the extent of this IterDomain
Collaborator

Can you run lintrunner -a?

template <typename T>
T ExitOnErr(llvm::Expected<T> &&E) {
if (!E) {
NVF_ERROR(false, "LLVM JIT Initialization Error: " + llvm::toString(E.takeError()));
Collaborator

Suggested change
- NVF_ERROR(false, "LLVM JIT Initialization Error: " + llvm::toString(E.takeError()));
+ NVF_ERROR(false, "LLVM JIT Initialization Error: ", llvm::toString(E.takeError()));

Comment on lines +30 to +33
void compile(TensorView* output_tv);

// Execute the compiled functions to allocate and return an output tensor.
at::Tensor allocateOutputTensor(const std::vector<at::Tensor>& input_tensors);
Collaborator

Is HostIrLlvmJit per TensorView? What happens when I call compile(tv0) and then compile(tv1) on the same object? The function compiled out of tv0 is overwritten?

Collaborator

I'm asking this because the end state is unclear to me.

While I'm happy to take experimental/intermediate code, what do you think about the following?

class HostIrLlvmJit {
 public:
  struct CompileOptions {
    int num_threads;
  };

  HostIrLlvmJit(HostIrContainer* container, CompileOptions options);

  // Used for the first stage, where we use LLVMJIT only for fast allocation.
  at::Tensor allocate(kir::Allocate* allocate);

  // Used for the second stage, where we use LLVMJIT to run the entire
  // HostIrContainer. At that point it even makes sense for HostIrLlvmJit to
  // take ownership of the HostIrContainer, and therefore
  // HostIrLlvmJit(std::unique_ptr<HostIrContainer> container, ...).
  KernelArgumentHolder run(const KernelArgumentHolder& inputs);
};

Collaborator

It definitely shouldn't be per TensorView. We should be able to reuse it across a given HostIrContainer at least. The sketch above looks good to me.
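
To make the intended flow concrete, hypothetical usage of the interface sketched above (every name follows the proposal; none of it is existing API):

HostIrLlvmJit::CompileOptions options{/*num_threads=*/4};
HostIrLlvmJit jit(container.get(), options);

// Stage 1: JIT only the allocation fast path.
at::Tensor out = jit.allocate(allocate_node);

// Stage 2: hand the whole HostIrContainer to the JIT.
KernelArgumentHolder outputs = jit.run(inputs);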

if (!E) {
NVF_ERROR(false, "LLVM JIT Initialization Error: " + llvm::toString(E.takeError()));
llvm::errs() << llvm::toString(E.takeError()) << "\n";
exit(1);
Collaborator

Throw an exception instead? It's not necessarily the end of the world when compilation fails; we could always fall back to interpretation.
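
A sketch of what that fallback could look like at a call site (the fallback helper is hypothetical; this assumes NVF_ERROR throws, so the exit(1) can simply go away):

at::Tensor output;
try {
  // Compiled fast path.
  output = jit.allocateOutputTensor({t0});
} catch (const std::exception& e) {
  // Compilation or symbol lookup failed; fall back to the interpreted
  // HostIrEvaluator-based allocation instead of aborting the process.
  output = allocateViaEvaluator(/*...*/); // hypothetical fallback
}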

std::string op_string = std::string(expr->getOpString());

// Perform the merge -> mul transformation
if(op_string == "Merge"){
Collaborator

Suggested change
- if(op_string == "Merge"){
+ if (auto* merge = dynamic_cast<Merge*>(expr)) {

boundary_vals[i] = input_vals[i];
}

IdModel id_model(&fusion);
Collaborator

I suspect IdModel buys you little. For top-level expressions (e.g. allocation), we mostly care about extents, which are mapped across ops by construction, not indices. I also doubt we exercised IdModel at all on non-SSA IR (e.g. host IR and kernel IR).

Collaborator

Let's discuss this one in our meeting offline.

logical_sharded_shape_result.size());

// Create the output tensor with the computed shape and strides
at::Tensor allocated_tensor = at::empty_strided(logical_sharded_shape_result, logical_stride_result, input_tensors[0].options());
Collaborator

This is far from what I had in mind. I expect the LLVM JIT to take a TensorView* and generate an LLVM IR function like the following:

at::Tensor allocate_tv1(int64_t i0, int64_t i1, ...) { // args are the logical domain extents of tv1
  // calculate local tensor sizes (e.g. for multi-GPU)
  // calculate strides
  return at::empty_strided(local_tensor_sizes, strides);
}

logical_shape_infer_fn and logical_stride_infer_fn shouldn't even exist. They should be inlined into the above allocate_tv1 function for performance and simplicity. at::empty_strided should be called by allocate_tv1 rather than by the JIT.

I'm less worried about this particular IR than what we'll try to deliver by the end of the internship. I'm happy to discuss this in person.

Collaborator

I think we are on the same page, i.e. eventually we want better encapsulation.

I'm trying to limit the scope of this PR so that functional pieces get merged in first. We should refactor this in follow-up PRs during integration.

}

TEST_F(HostIrLLVMTest, AllocationMergeSplit1) {
Fusion fusion;
Collaborator

This looks wrong. HostIrLlvmJit, similar to HostIrEvaluator, should take HostIrContainer, not Fusion.

Collaborator

I think this goes back to the topic of how we expect the inference to be done. The only reason I suggested going with Fusion directly is that I feel more comfortable about IdModel running on it.
