Universal Inference API #183
Comments
char input_buffer[512];
MyModel model; //generated
printf("The inference result is: %d", result[0]);
I think the metadata memory allocator should be fixed in size at model construction, but I am OK with the data scratchpad being on the heap. Might look something like this:
template<size_t MetaDataSize=2048>
class MyModel {
private:
FixedTensorArenaAllocator<MetaDataSize> defaultMetaDataAllocator;
DynamicTensorArenaAllocator defaultTensorDataAllocator;
...
};
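For illustration, a hypothetical instantiation might look like the following; the 4096 value is arbitrary, and run() is assumed to exist as in the draft below:
MyModel<4096> model; // 4 KB fixed metadata arena; tensor data still comes from the heap
model.run();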
We should be able to update the following draft to the re-arch without problem.
template<size_t MetaDataSize=2048>
class MyModel {
private:
//FixedTensorArenaAllocator<MetaDataSize> defaultMetaDataAllocator;
//DynamicTensorArenaAllocator defaultTensorDataAllocator;
Context& ctx;
public:
//auto generated
struct {
Tensor* tensor0 = nullptr;
Tensor* tensor1 = nullptr;
Tensor* tensor2 = nullptr;
} tensors;
void run(void);
};
template<typename T>
void copy_tensor(S_TENSOR& tensor_src, S_TENSOR& tensor_dst) {
  for (size_t i = 0; i < tensor_src->getSize(); i++) {
    *(tensor_dst->write<T>(0, i)) = *(tensor_src->read<T>(0, i));
  }
}
//auto generated
void MyModel::run(void) {
//allocator to re-use the space in input tensors -> allow modify
//and output tensors
get_deep_mlp_ctx(ctx, tensors.tensor0, tensors.tensor1);
ctx.eval();
S_TENSOR result = ctx.get("tensor2");
//copy the tensor out, as application should own the output memory
copy_tensor<int32_t>(result, tensors.tensor2);
ctx.gc();
}
// Example
char input_buffer[512];
ExampleTensorObject* input_tensor_obj; //a class with Tensor interface
int result[1];
MyModel model; //generated
model.tensors.tensor0 = new RamTensor({10, 10}, i8);
model.tensors.tensor1 = new WrappedRamTensor({10, 10}, input_buffer, i8);
model.tensors.tensor2 = new RamTensor({10, 10}, result, i32); //output
model.run();
printf("%d\n", result[0]);
//do something with input_buffer
model.run();
printf("%d\n", result[0]);
@mbartling Thoughts?
Just as an FYI, my brain is totally dedicated to the re-arch right now, so I might be misreading your concerns. The primary issue here is where the meta-data allocator and RAM data allocators live, and whether they are separate entities at all.
This is the job of the model class, either at construction or at model run.
Honestly, I am in favor of the user requesting references to input/output lists contained by the model itself. This way we are less prone to dealing with the user providing invalid input tensors. I imagine input tensors would be a fixed type of tensor specialization (or tensor handle) that can provide some compile-time guarantees.
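A non-authoritative sketch of that idea; InputHandle, get_input(), and the sizes below are hypothetical names for illustration, not an existing uTensor API:
#include <cstddef>
#include <cstdint>

template<typename T, size_t N>
struct InputHandle {
  // Only a buffer of the exact element type and length can be bound,
  // which gives a compile-time guarantee that the input is valid.
  void bind(const T (&buffer)[N]) { data = buffer; }
  const T* data = nullptr;
};

class MyModel {
public:
  InputHandle<int8_t, 100>& get_input() { return input_; }
  void run() { /* generated inference pass over the bound input */ }
private:
  InputHandle<int8_t, 100> input_; // owned by the model, so it can never dangle
};

void app(void) {
  int8_t input_buffer[100];
  MyModel model;
  model.get_input().bind(input_buffer); // fails to compile for a wrong type or size
  model.run();
}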
Abstract
Individual frameworks such as uTensor and TFLM have their own sets of on-device APIs. In some cases, significant boilerplate code and framework-specific knowledge are required to implement even the simplest inference task. A developer-friendly, universal, high-level inference API would be valuable for on-device ML.
On-device inferencing is generalized into these steps:
The code snippets below illustrate the current API designs of uTensor and TensorFlow Lite for Microcontrollers. The newly proposed API will likely use code generation to create an adaptor layer between the universal interface and the underlying framework-specific APIs.
Examples:
uTensor:
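For reference, a rough sketch of the current uTensor flow, modeled on the generated deep-MLP example; the get_deep_mlp_ctx() function, header names, and the "y_pred:0" tensor name are illustrative placeholders:
#include <cstdio>
#include "models/deep_mlp.hpp" // code-generated from the trained graph
#include "tensor.hpp"

float input_data[1 * 784]; // e.g. a flattened 28x28 image
Context ctx;

void infer(void) {
  Tensor* input_x = new WrappedRamTensor<float>({1, 784}, input_data);
  get_deep_mlp_ctx(ctx, input_x); // build the graph and bind the input
  ctx.eval(); // run inference
  S_TENSOR pred = ctx.get("y_pred:0"); // fetch the output tensor by name
  int label = *(pred->read<int>(0, 0)); // read the predicted class
  printf("predicted label: %d\n", label);
}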
TFLM:
Please refer to this hello-world example
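Roughly, and with the caveat that exact headers and interpreter arguments vary between TFLM versions, the hello-world flow looks like this (model.h and g_model stand for the generated flatbuffer array):
#include <cstdint>
#include "tensorflow/lite/micro/all_ops_resolver.h"
#include "tensorflow/lite/micro/micro_error_reporter.h"
#include "tensorflow/lite/micro/micro_interpreter.h"
#include "tensorflow/lite/schema/schema_generated.h"
#include "model.h" // the .tflite flatbuffer compiled into a C array (g_model)

constexpr int kTensorArenaSize = 2 * 1024;
uint8_t tensor_arena[kTensorArenaSize];

void infer(void) {
  tflite::MicroErrorReporter error_reporter;
  const tflite::Model* model = tflite::GetModel(g_model);
  tflite::AllOpsResolver resolver;
  tflite::MicroInterpreter interpreter(model, resolver, tensor_arena,
                                       kTensorArenaSize, &error_reporter);
  interpreter.AllocateTensors();

  TfLiteTensor* input = interpreter.input(0);
  input->data.f[0] = 1.0f; // write the input value
  interpreter.Invoke(); // run inference
  TfLiteTensor* output = interpreter.output(0);
  float y = output->data.f[0]; // read the result
  (void)y;
}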
Requirements
The newly proposed API should provide a high-level abstraction that accelerates and simplifies application development and helps streamline the edge-ML deployment flow, especially for resource-constrained devices.
The new API should:
Proposals
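As a minimal, non-authoritative sketch of what the generated interface could look like; the class and tensor names (MyModel, input_image, output_scores) are placeholders for whatever names appear in the graph:
#include <cstddef>
#include <cstdint>

// Hypothetical code-generated class: one bind method per graph tensor,
// named after the tensor and typed to match its data type. The empty
// bodies stand in for generator-emitted code.
class MyModel {
public:
  void bind_input_image(const int8_t* data, size_t len) { (void)data; (void)len; }
  void bind_output_scores(int32_t* data, size_t len) { (void)data; (void)len; }
  void run() { /* one inference pass over the bound buffers */ }
};

void app(void) {
  int8_t image[28 * 28];
  int32_t scores[10];

  MyModel model;
  model.bind_input_image(image, sizeof(image));
  model.bind_output_scores(scores, 10);
  model.run();
  // scores now holds the inference result
}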
This is the API at its most minimal. The generated bind methods are named after the tensors in the graph, and their signatures reflect the respective tensor data types. Additional methods can be implemented to support advanced configurations.
What's Next
This issue serves as a starting point for the discussion. It will be reviewed by uTensor core devs, Arduino, ISG data scientists, IPG engineers, and Google. We are particularly interested in reviewing use cases that the currently proposed API cannot cover. We are looking to iterate and converge on a design in the coming weeks.