Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Instanced rendering not working with Metal. #891

Open
aleph1 opened this issue Feb 21, 2025 · 12 comments
Open

Instanced rendering not working with Metal. #891

aleph1 opened this issue Feb 21, 2025 · 12 comments

Comments

@aleph1
Copy link
Contributor

aleph1 commented Feb 21, 2025

Describe the bug
As per , instanced rendering is not implemented for Metal on macOS. I took a quick look at pipeline.m.h and it seems like the issue is that the code that iterates the pipeline->inputLayout array does not account for the correct number of input layouts, as other backends account for this. I have a patched version that is based on the Vulkan back-end that no longer triggers a run-time error, however if I compile and run the instanced drawing sample, only one triangle is drawn versus the expected number.

To Reproduce
Steps to reproduce the behavior:

  1. As per download and build the sample for instanced rendering using -g metal
  2. Build the Xcode project.
  3. Run the Xcode project and it will throw a runtime error.

Expected behavior
Instanced rendering should work as expected.

Additional context
I am fairly new to metal, but this is the partially modified code that no longer throws a runtime error, but it still only draws a single triangle, so I am missing something. I have included a note as to where the code modifications in this function begin and end.

void kinc_g5_pipeline_compile(kinc_g5_pipeline_t *pipeline) {
	kinc_log(KINC_LOG_LEVEL_INFO, "kinc_g5_pipeline_compile()");
	MTLRenderPipelineDescriptor *renderPipelineDesc = [[MTLRenderPipelineDescriptor alloc] init];
	renderPipelineDesc.vertexFunction = (__bridge id<MTLFunction>)pipeline->vertexShader->impl.mtlFunction;
	renderPipelineDesc.fragmentFunction = (__bridge id<MTLFunction>)pipeline->fragmentShader->impl.mtlFunction;
	for (int i = 0; i < pipeline->colorAttachmentCount; ++i) {
		renderPipelineDesc.colorAttachments[i].pixelFormat = convert_render_target_format(pipeline->colorAttachment[i]);
		renderPipelineDesc.colorAttachments[i].blendingEnabled =
		    pipeline->blend_source != KINC_G5_BLEND_ONE || pipeline->blend_destination != KINC_G5_BLEND_ZERO ||
		    pipeline->alpha_blend_source != KINC_G5_BLEND_ONE || pipeline->alpha_blend_destination != KINC_G5_BLEND_ZERO;
		renderPipelineDesc.colorAttachments[i].sourceRGBBlendFactor = convert_blending_factor(pipeline->blend_source);
		renderPipelineDesc.colorAttachments[i].destinationRGBBlendFactor = convert_blending_factor(pipeline->blend_destination);
		renderPipelineDesc.colorAttachments[i].rgbBlendOperation = convert_blending_operation(pipeline->blend_operation);
		renderPipelineDesc.colorAttachments[i].sourceAlphaBlendFactor = convert_blending_factor(pipeline->alpha_blend_source);
		renderPipelineDesc.colorAttachments[i].destinationAlphaBlendFactor = convert_blending_factor(pipeline->alpha_blend_destination);
		renderPipelineDesc.colorAttachments[i].alphaBlendOperation = convert_blending_operation(pipeline->alpha_blend_operation);
		renderPipelineDesc.colorAttachments[i].writeMask =
		    (pipeline->colorWriteMaskRed[i] ? MTLColorWriteMaskRed : 0) | (pipeline->colorWriteMaskGreen[i] ? MTLColorWriteMaskGreen : 0) |
		    (pipeline->colorWriteMaskBlue[i] ? MTLColorWriteMaskBlue : 0) | (pipeline->colorWriteMaskAlpha[i] ? MTLColorWriteMaskAlpha : 0);
	}
	renderPipelineDesc.depthAttachmentPixelFormat = MTLPixelFormatInvalid;
	renderPipelineDesc.stencilAttachmentPixelFormat = MTLPixelFormatInvalid;

	MTLVertexDescriptor *vertexDescriptor = [[MTLVertexDescriptor alloc] init];

        // Begin code modifications

	int vertexBindingCount = 0;
	for (int i = 0; i < 16; ++i) {
		if (pipeline->inputLayout[i] == NULL) {
			break;
		}
		//vertexAttributeCount += pipeline->inputLayout[i]->size;
		vertexBindingCount++;
	}

	int attr = 0;
	
	for (int binding = 0; binding < vertexBindingCount; ++binding) {
		int offset = 0;
		int stride = 0;
		for (int i = 0; i < pipeline->inputLayout[binding]->size; ++i) {
			kinc_g5_vertex_element_t element = pipeline->inputLayout[binding]->elements[i];
			int index = findAttributeIndex(renderPipelineDesc.vertexFunction.vertexAttributes, element.name);

			if (index < 0) {
				kinc_log(KINC_LOG_LEVEL_WARNING, "Could not find vertex attribute %s\n", element.name);
			}

			if (index >= 0) {
				vertexDescriptor.attributes[attr].bufferIndex = attr;
				vertexDescriptor.attributes[attr].offset = offset;
				offset += kinc_g4_vertex_data_size(element.data);
				stride += kinc_g4_vertex_data_size(element.data);

				switch (element.data) {
				case KINC_G4_VERTEX_DATA_NONE:
					assert(false);
					break;
				case KINC_G4_VERTEX_DATA_F32_1X:
					vertexDescriptor.attributes[attr].format = MTLVertexFormatFloat;
					break;
				case KINC_G4_VERTEX_DATA_F32_2X:
					vertexDescriptor.attributes[attr].format = MTLVertexFormatFloat2;
					break;
				case KINC_G4_VERTEX_DATA_F32_3X:
					vertexDescriptor.attributes[attr].format = MTLVertexFormatFloat3;
					break;
				case KINC_G4_VERTEX_DATA_F32_4X:
					vertexDescriptor.attributes[attr].format = MTLVertexFormatFloat4;
					break;
				case KINC_G4_VERTEX_DATA_F32_4X4:
					assert(false);
					break;
				case KINC_G4_VERTEX_DATA_I8_1X:
					vertexDescriptor.attributes[attr].format = MTLVertexFormatChar;
					break;
				case KINC_G4_VERTEX_DATA_U8_1X:
					vertexDescriptor.attributes[attr].format = MTLVertexFormatUChar;
					break;
				case KINC_G4_VERTEX_DATA_I8_1X_NORMALIZED:
					vertexDescriptor.attributes[attr].format = MTLVertexFormatCharNormalized;
					break;
				case KINC_G4_VERTEX_DATA_U8_1X_NORMALIZED:
					vertexDescriptor.attributes[attr].format = MTLVertexFormatUCharNormalized;
					break;
				case KINC_G4_VERTEX_DATA_I8_2X:
					vertexDescriptor.attributes[attr].format = MTLVertexFormatChar2;
					break;
				case KINC_G4_VERTEX_DATA_U8_2X:
					vertexDescriptor.attributes[attr].format = MTLVertexFormatUChar2;
					break;
				case KINC_G4_VERTEX_DATA_I8_2X_NORMALIZED:
					vertexDescriptor.attributes[attr].format = MTLVertexFormatChar2Normalized;
					break;
				case KINC_G4_VERTEX_DATA_U8_2X_NORMALIZED:
					vertexDescriptor.attributes[attr].format = MTLVertexFormatUChar2Normalized;
					break;
				case KINC_G4_VERTEX_DATA_I8_4X:
					vertexDescriptor.attributes[attr].format = MTLVertexFormatChar4;
					break;
				case KINC_G4_VERTEX_DATA_U8_4X:
					vertexDescriptor.attributes[attr].format = MTLVertexFormatUChar4;
					break;
				case KINC_G4_VERTEX_DATA_I8_4X_NORMALIZED:
					vertexDescriptor.attributes[attr].format = MTLVertexFormatChar4Normalized;
					break;
				case KINC_G4_VERTEX_DATA_U8_4X_NORMALIZED:
					vertexDescriptor.attributes[attr].format = MTLVertexFormatUChar4Normalized;
					break;
				case KINC_G4_VERTEX_DATA_I16_1X:
					vertexDescriptor.attributes[attr].format = MTLVertexFormatShort;
					break;
				case KINC_G4_VERTEX_DATA_U16_1X:
					vertexDescriptor.attributes[attr].format = MTLVertexFormatUShort;
					break;
				case KINC_G4_VERTEX_DATA_I16_1X_NORMALIZED:
					vertexDescriptor.attributes[attr].format = MTLVertexFormatShortNormalized;
					break;
				case KINC_G4_VERTEX_DATA_U16_1X_NORMALIZED:
					vertexDescriptor.attributes[attr].format = MTLVertexFormatUShortNormalized;
					break;
				case KINC_G4_VERTEX_DATA_I16_2X:
					vertexDescriptor.attributes[attr].format = MTLVertexFormatShort2;
					break;
				case KINC_G4_VERTEX_DATA_U16_2X:
					vertexDescriptor.attributes[attr].format = MTLVertexFormatUShort2;
					break;
				case KINC_G4_VERTEX_DATA_I16_2X_NORMALIZED:
					vertexDescriptor.attributes[attr].format = MTLVertexFormatShort2Normalized;
					break;
				case KINC_G4_VERTEX_DATA_U16_2X_NORMALIZED:
					vertexDescriptor.attributes[attr].format = MTLVertexFormatUShort2Normalized;
					break;
				case KINC_G4_VERTEX_DATA_I16_4X:
					vertexDescriptor.attributes[attr].format = MTLVertexFormatShort4;
					break;
				case KINC_G4_VERTEX_DATA_U16_4X:
					vertexDescriptor.attributes[attr].format = MTLVertexFormatUShort4;
					break;
				case KINC_G4_VERTEX_DATA_I16_4X_NORMALIZED:
					vertexDescriptor.attributes[attr].format = MTLVertexFormatShort4Normalized;
					break;
				case KINC_G4_VERTEX_DATA_U16_4X_NORMALIZED:
					vertexDescriptor.attributes[attr].format = MTLVertexFormatUShort4Normalized;
					break;
				case KINC_G4_VERTEX_DATA_I32_1X:
					vertexDescriptor.attributes[attr].format = MTLVertexFormatInt;
					break;
				case KINC_G4_VERTEX_DATA_U32_1X:
					vertexDescriptor.attributes[attr].format = MTLVertexFormatUInt;
					break;
				case KINC_G4_VERTEX_DATA_I32_2X:
					vertexDescriptor.attributes[attr].format = MTLVertexFormatInt2;
					break;
				case KINC_G4_VERTEX_DATA_U32_2X:
					vertexDescriptor.attributes[attr].format = MTLVertexFormatUInt2;
					break;
				case KINC_G4_VERTEX_DATA_I32_3X:
					vertexDescriptor.attributes[attr].format = MTLVertexFormatInt3;
					break;
				case KINC_G4_VERTEX_DATA_U32_3X:
					vertexDescriptor.attributes[attr].format = MTLVertexFormatUInt3;
					break;
				case KINC_G4_VERTEX_DATA_I32_4X:
					vertexDescriptor.attributes[attr].format = MTLVertexFormatInt4;
					break;
				case KINC_G4_VERTEX_DATA_U32_4X:
					vertexDescriptor.attributes[attr].format = MTLVertexFormatUInt4;
					break;
				default:
					assert(false);
					break;
				}
				attr++;
			}
		}

		vertexDescriptor.layouts[binding].stride = offset;
		vertexDescriptor.layouts[binding].stepFunction = MTLVertexStepFunctionPerVertex;
	}

        // End code modifications

	renderPipelineDesc.vertexDescriptor = vertexDescriptor;

	NSError *errors = nil;
	MTLRenderPipelineReflection *reflection = nil;
	id<MTLDevice> device = getMetalDevice();

	pipeline->impl._pipeline = (__bridge_retained void *)[device newRenderPipelineStateWithDescriptor:renderPipelineDesc
	                                                                                          options:MTLPipelineOptionBufferTypeInfo
	                                                                                       reflection:&reflection
	                                                                                            error:&errors];
	if (errors != nil)
		NSLog(@"%@", [errors localizedDescription]);
	assert(pipeline->impl._pipeline && !errors);

	renderPipelineDesc.depthAttachmentPixelFormat = MTLPixelFormatDepth32Float_Stencil8;
	renderPipelineDesc.stencilAttachmentPixelFormat = MTLPixelFormatDepth32Float_Stencil8;
	pipeline->impl._pipelineDepth = (__bridge_retained void *)[device newRenderPipelineStateWithDescriptor:renderPipelineDesc
	                                                                                               options:MTLPipelineOptionBufferTypeInfo
	                                                                                            reflection:&reflection
	                                                                                                 error:&errors];
	if (errors != nil)
		NSLog(@"%@", [errors localizedDescription]);
	assert(pipeline->impl._pipelineDepth && !errors);

	pipeline->impl._reflection = (__bridge_retained void *)reflection;

	MTLDepthStencilDescriptor *depthStencilDescriptor = [MTLDepthStencilDescriptor new];
	depthStencilDescriptor.depthCompareFunction = convert_compare_mode(pipeline->depthMode);
	depthStencilDescriptor.depthWriteEnabled = pipeline->depthWrite;
	pipeline->impl._depthStencil = (__bridge_retained void *)[device newDepthStencilStateWithDescriptor:depthStencilDescriptor];

	depthStencilDescriptor.depthCompareFunction = MTLCompareFunctionAlways;
	depthStencilDescriptor.depthWriteEnabled = false;
	pipeline->impl._depthStencilNone = (__bridge_retained void *)[device newDepthStencilStateWithDescriptor:depthStencilDescriptor];
}
@aleph1
Copy link
Contributor Author

aleph1 commented Feb 22, 2025

Looking at the Vulkan backend as a point of comparison it seems like Metal is not using the instanced value of each inputLayout to alter the step function for the MTLVertexDescriptor’s layouts. There might be other issues, but I think that is the glaring one.

It seems like Kope has the beginnings of support for this with, typedef enum kope_metal_vertex_step_mode { KOPE_METAL_VERTEX_STEP_MODE_VERTEX, KOPE_METAL_VERTEX_STEP_MODE_INSTANCE } kope_metal_vertex_step_mode; but the pipeline seems to assume the same thing with no branching in the pipeline to adjust the step function.

@RobDangerous
Copy link
Member

Kope (I now call it Kore 3) only started drawing its first triangle recently, it will be a little while.

@aleph1
Copy link
Contributor Author

aleph1 commented Feb 22, 2025

I think I am a bit closer to a fix for the Metal backend in Kore 2, but I’m struggling with one thing. I don’t think the stride value for the memory layouts are being set correctly. Currently the stride value is set based on the MTLVertexFormat values KINC_G4_VERTEX_DATA_, but it looks like stride has to be set using MemoryLayout, but this appears to be Swift only. Any of the working code samples I can find utilize MemoryLayout for calculating the stride, but I have never done any calling of Swift from C.

Swift code from a working example:

let vertexDescriptor = MTLVertexDescriptor()
vertexDescriptor.attributes[0].format = MTLVertexFormat.float3 // the metal pipeline in Kore seems to set this correctly
vertexDescriptor.attributes[0].bufferIndex = 0
vertexDescriptor.attributes[0].offset = 0
vertexDescriptor.layouts[0].stride = MemoryLayout<SIMD3<Float>>.stride // the metal pipeline in Kore sets this using MTLVertexFormat, which can be different than the value provided by MemoryLayout

I suppose I could just create a swift app to output all of the stride values and create a switch based on the KINC_G4_VERTEX_DATA_ types, but it would be nice to be able to utilize MemoryLayout directly.

If you would prefer me to not update this issue with research, please let me know.

@RobDangerous
Copy link
Member

We really don't need "MemoryLayout" to figure out the size of a float3. It's 12. kinc_g4_vertex_data_size does the right thing (and if you doubt it feel free to compare the values you need to whatever you get from "MemoryLayout".

@aleph1
Copy link
Contributor Author

aleph1 commented Feb 23, 2025

Putting together a small test for this, the value for MemoryLayout<SIMD3>.size is actually 16 not 12. This is also mentioned on page 28 of Apple’s documentation for Metal (screenshot attached). As per the documentation the float3 type uses the same size and alignment as float4.

Image

Metal rendering seems to work as expected in Kore 2 when instancing isn’t applied, and I am doing some experimenting with the pipeline for Metal specifically to see if I can get instanced rendering working as expected.

You can run the following in Swift to see that MemoryLayout<SIMD3>.size does in fact return 16, which is inline with what Apple’s Metal documentation says.

//: A UIKit based Playground for presenting user interface
  
import UIKit
import PlaygroundSupport

class MyViewController : UIViewController {
    override func loadView() {
        
        let view = UIView()
        view.backgroundColor = .white
        
        // vertex descriptor set with MemoryLayout<SIMD3<Float32>>.size vs 12
        //let vertexDescriptor = MTLVertexDescriptor();
        //vertexDescriptor.attributes[0].format = MTLVertexFormat.float3
        //vertexDescriptor.attributes[0].offset = 0
        //vertexDescriptor.attributes[0].bufferIndex = 0;
        //vertexDescriptor.layouts[0].stride = MemoryLayout<SIMD3<Float32>>.size;
        //vertexDescriptor.layouts[0].stepRate = 1;
        //vertexDescriptor.layouts[0].stepFunction = MTLVertexStepFunction.perVertex;
        
        let label = UILabel()
        label.frame = CGRect(x: 150, y: 200, width: 200, height: 100)
        label.lineBreakMode = .byWordWrapping
        label.numberOfLines = 3
        label.textColor = .black
        let float_stride = MemoryLayout<Float>.stride * 3;
        let float_size = MemoryLayout<Float>.size * 3
        let stride_size = MemoryLayout<SIMD3<Float32>>.size;
        label.text = "\(float_stride)\n\(float_size)\n\(stride_size)"
        view.addSubview(label)
        
        self.view = view
    }
}
// Present the view controller in the Live View window
PlaygroundPage.current.liveView = MyViewController()

@RobDangerous
Copy link
Member

Good find, I'll trust you on that. Feel free to update kinc_g4_vertex_data_size with an ifdef for Metal.

@aleph1
Copy link
Contributor Author

aleph1 commented Feb 24, 2025

Thanks. I think I have actually identified the main issue with the Metal rendering pipeline in that only the first vertex buffer is used in the kinc_g5_command_list_set_vertex_buffers function in the Metal backend.

void kinc_g5_command_list_set_vertex_buffers(kinc_g5_command_list_t *list, struct kinc_g5_vertex_buffer **buffers, int *offsets, int count) {
kinc_g5_internal_vertex_buffer_set(buffers[0], offsets[0]);
}

If you add kinc_log(KINC_LOG_LEVEL_INFO, "number of buffers %i", count); at the beginning of that function and run the (instanced rendering sample made for Metal)[https://github.com/Kode/Kore-Samples/tree/main/11_instanced_rendering], the logged value will be 2, but only the first is used. It looks like the function should perform similarly to the same function in the Vulkan backend which correctly loops through each index of the buffers and offsets.

https://github.com/Kode/Kore/blob/188e29cad31d623198742292cc2a22d8fac7baa2/Backends/Graphics5/Vulkan/Sources/kinc/backend/graphics5/commandlist.c.h#L465C1-L480C2

I had a go at rewriting this function for the Metal backend using MTLRenderCommandEncoder.setVertexbuffers but I am unsure how to calculate the range value it is expecting from the array of buffers.

This might be of use if you or anyone else wants to finalize the metal implementation.

@RobDangerous
Copy link
Member

range should simply be {0, count}.

@aleph1
Copy link
Contributor Author

aleph1 commented Feb 28, 2025

I think this problem has multiple parts and that I am closer to having it working. These are the things that seem to be causing the issue.

  1. kinc_g5_command_list_set_vertex_buffers current only handles one buffer, but I have made an attempt at rewriting the function which I think it correct, or close to it.
  2. kinc_g5_pipeline_compile needs refactoring, I have started on this, but still having some issues with stride.
  3. The compiled metal shader for the instancing example needs to look a lot different as memory layouts that share per instance and per vertex stepFunctions don't seem to allow a simple use of stage_in in the vertex shader (some sample code below). Note, I think I was wrong about this and managed to fix instancing without changing the shader but will keep the code below just in case.

I believe the kinc_g5_command_list_set_vertex_buffers should be written as follows. It is based on the implementation in the Vulkan backend but utilizes Metal.

void kinc_g5_command_list_set_vertex_buffers(kinc_g5_command_list_t *list, struct kinc_g5_vertex_buffer **vertexBuffers, int *offsets_, int count) {
    id<MTLRenderCommandEncoder> encoder = getMetalEncoder();
    // Array to store Metal buffers and offsets in bytes
    id<MTLBuffer> buffers[count];
    NSUInteger offsets[count];
    
    for (int i = 0; i < count; ++i) {
        buffers[i] = (__bridge id<MTLBuffer>)vertexBuffers[i]->impl.mtlBuffer;
        offsets[i] = (NSUInteger)(offsets_[i] * kinc_g5_vertex_buffer_stride(vertexBuffers[i]));
    }
    
    // Metal allows binding multiple buffers at once with a range of indices
    [encoder setVertexBuffers:buffers offsets:offsets withRange:NSMakeRange(0, count)];
}

I believe the generated shader will end up having to look like this as stage_in does not include per instance variables. I might be able to look at the compiled Vulkan shader to gain some insight as well. Note: I was wrong about this, stage_in seems to be smart enough to pass the per vertex and per instance data as needed.

// shader_vert_main

#include <metal_stdlib>
#include <simd/simd.h>

using namespace metal;

struct shader_vert_main_out
{
    float4 gl_Position [[position]];
};

// Per-vertex data structure
struct VertexData
{
    float3 pos;
};

// Per-instance data structure
struct InstanceData
{
    float3 off;
};

vertex shader_vert_main_out shader_vert_main(
                                             device const VertexData* vertexData [[buffer(0)]],
                                             device const InstanceData* instanceData [[buffer(1)]],
                                             uint vertexID [[vertex_id]],
                                             uint instanceID [[instance_id]]
)
{
    shader_vert_main_out out = {};
    
    // Access data from the appropriate buffers using IDs
    float3 pos = vertexData[vertexID].pos;
    float3 off = instanceData[instanceID].off;
        
    // Use the vertex and instance data
    out.gl_Position = float4(pos.x + off.x, pos.y + off.y, 0.5, 1.0);
    out.gl_Position.z = (out.gl_Position.z + out.gl_Position.w) * 0.5; // Adjust clip-space for Metal
    return out;
}

@aleph1
Copy link
Contributor Author

aleph1 commented Feb 28, 2025

I think I have it working! I was wrong about the requirement of changing the vertex shader format, it looks like stage_in is smart enough to recognize per vertex and per instance attributes. I'll clean up the code changes I've made to the Metal backend and submit them by the weekend at the latest.

The only issue I think I still have to resolve is with kinc_g5_command_list_set_vertex_constant_buffer, as there can be a vertexBuffer index conflict due to atIndex being hardcoded as 1. The Vulkan backend takes a different approach to this function, so I will look closer at that implementation and take the same approach. Does that work for you?

void kinc_g5_command_list_set_vertex_constant_buffer(kinc_g5_command_list_t *list, struct kinc_g5_constant_buffer *buffer, int offset, size_t size) {
	id<MTLBuffer> buf = (__bridge id<MTLBuffer>)buffer->impl._buffer;
	id<MTLRenderCommandEncoder> encoder = getMetalEncoder();
	[encoder setVertexBuffer:buf offset:offset atIndex:1]; // this can conflict with an index that is used by the fixed kinc_g5_command_list_set_vertex_buffers 
}

Screenshot of the latest metal test and some of the updated code:

Image

@RobDangerous
Copy link
Member

Super cool! And no need to hurry.

@aleph1
Copy link
Contributor Author

aleph1 commented Feb 28, 2025

I have moved this into a commit, but require your input on one outstanding issue regarding vertex buffer indexes and the metal code generated by kinc/make. Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants