Universal Inference API #183

neil-tan opened this issue Oct 3, 2019 · 5 comments
neil-tan commented Oct 3, 2019

Abstract
Individual frameworks such as uTensor and TFLM have their own sets of on-device APIs. In some cases, significant boilerplate code and framework-specific knowledge are required to implement even the simplest inference task. A developer-friendly, universal, high-level inference API would be valuable for on-device ML.

On-device inferencing is generalized into these steps:

  • Configuration (optional)
  • Setting up the input
  • Evaluating the model
  • Reading the output

The code snippets below illustrate the current API designs for uTensor and TensorFlow. The newly proposed API will likely rely on code generation to create an adaptor layer between the universal interface and the underlying framework-specific APIs.

Examples:

uTensor:

  Context ctx;  //creating the context class, the stage where inferences take place 
  //wrapping the input data in a tensor class
  Tensor* input_x = new WrappedRamTensor<float>({1, 784}, (float*) input_data);
  get_deep_mlp_ctx(ctx, input_x);  // pass the tensor to the context
  S_TENSOR pred_tensor = ctx.get("y_pred:0");  // getting a reference to the output tensor
  ctx.eval(); //trigger the inference
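  int pred_label = *(pred_tensor->read<int>(0, 0));  //reading the output (line added for illustration; the element type depends on the model's output tensor)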

TFLM:
Please refer to this hello-world example
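For comparison with the uTensor snippet above, the gist of that example is roughly the following (a sketch only; exact symbol names and header paths vary between TFLM versions):

tflite::MicroErrorReporter micro_error_reporter;
const tflite::Model* model = tflite::GetModel(g_model);  // g_model: the model's flatbuffer array

tflite::ops::micro::AllOpsResolver resolver;
constexpr int kTensorArenaSize = 2 * 1024;
uint8_t tensor_arena[kTensorArenaSize];

// Configuration: build the interpreter and allocate tensors from the arena
tflite::MicroInterpreter interpreter(model, resolver, tensor_arena,
                                     kTensorArenaSize, &micro_error_reporter);
interpreter.AllocateTensors();

TfLiteTensor* input = interpreter.input(0);
input->data.f[0] = 0.5f;                      // setting up the input
interpreter.Invoke();                         // evaluating the model
TfLiteTensor* output = interpreter.output(0);
float y = output->data.f[0];                  // reading the output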

Requirements

The newly proposed API should provide a high-level abstraction that accelerates and simplifies application development and helps streamline the edge-ML deployment flow, especially for resource-constrained devices.

The new API should:

  • Be framework/tool- and platform-independent
  • Encapsulate/abstract framework-specific boilerplate code
  • Provide a clear interface that enables collaboration between data scientists and embedded engineers
  • Prioritize developer-experience and simplicity

Proposals

  1. Single-function-call inferencing, by @janjongboom:
uint8_t utensor_mem_pool[4096]; // <-- CLI should tell me how much I need

utensor_something_autogenerated_init(utensor_mem_pool);

float input[33] = { 1, 2, 3, 4 /* ... */ };
float output[5];

utensor_run_something_autogenerated(input, 33, output, 5);
  2. Model object, from a discussion with @sandeepmistry, @mbartling, and @neil-tan:
char input_buffer[512];
int result[1];

MyModel model; //generated
model.setArenaSize(1024);
model.bind_input0(input_buffer, sizeof(input_buffer));
model.bind_prediction0(result, 1);
model.run();

printf("The inference result is: %d", result[0]);

This is the API at its most minimal. The generated bind methods are named after the tensor names in the graph, and their signatures reflect the respective tensor data types. Additional methods can be added to support advanced configurations.
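
To make the code-generation contract more concrete, the generated declarations behind the two proposals might look roughly like this (all names, types, and signatures are illustrative, not a committed design):

// Proposal 1: free functions generated per model
void utensor_something_autogenerated_init(uint8_t* mem_pool);
void utensor_run_something_autogenerated(const float* input, size_t input_len,
                                         float* output, size_t output_len);

// Proposal 2: a generated model class; bind methods are named after the
// graph's tensor names and typed after the tensors' data types
class MyModel {
public:
  void setArenaSize(size_t bytes);
  void bind_input0(char* buffer, size_t length);      // byte/char input tensor
  void bind_prediction0(int* buffer, size_t length);  // int32 output tensor
  void run();
};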

What’s Next

This issue serves as a starting point for the discussion. It will be reviewed by uTensor core devs, Arduino, ISG data scientists, IPG engineers, and Google. We are particularly interested in use cases that the currently proposed APIs cannot cover. We are looking to iterate and converge on a design in the coming weeks.

neil-tan self-assigned this Oct 3, 2019

neil-tan commented Nov 6, 2019

char input_buffer[512];
ExampleTensorObject* input_tensor_obj;
int result[1];

MyModel model; //generated
model.setArenaSize(1024);
model.bind_input0(input_buffer, shape, type);
model.bind_input1(input_tensor_obj);
model.bind_prediction0(result, 1);
model.run();

printf("The inference result is: %d", result[0]);
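
The revision above implies two styles of input binding (the declarations below are just an illustration; "Shape" and "DType" stand in for whatever shape/type descriptors we end up with):

void bind_input0(char* buffer, const Shape& shape, DType type);  // raw buffer plus explicit metadata
void bind_input1(Tensor* tensor);                                // an object implementing the Tensor interface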

@mbartling

I think the metadata memory allocator should be fixed in size at model construction, but I am OK with the data scratchpad being on the heap.

Might look something like this:
MyModel<MetaDataSize> model;
model.setTensorDataMemSize(ScratchPadSize);

@mbartling

template<size_t MetaDataSize=2048> 
class MyModel {
private:
  FixedTensorArenaAllocator<MetaDataSize> defaultMetaDataAllocator;
  DynamicTensorArenaAllocator defaultTensorDataAllocator;
...
};


neil-tan commented Jan 3, 2020

We should be able to adapt the following draft to the re-arch without problems.

template<size_t MetaDataSize=2048> 
class MyModel {
private:
  //FixedTensorArenaAllocator<MetaDataSize> defaultMetaDataAllocator;
  //DynamicTensorArenaAllocator defaultTensorDataAllocator;

  Context& ctx;

public:

  //auto generated
  struct {
    Tensor* tensor0 = nullptr;
    Tensor* tensor1 = nullptr;
    Tensor* tensor2 = nullptr;
  } tensors;

  void run(void);

};

template<typename T>
void copy_tensor(S_TENSOR& tensor_src, S_TENSOR& tensor_dst) {
  for(size_t i = 0; i < tensor_src->getSize(); i++) {
    *(tensor_dst->write<T>(0, i)) = *(tensor_src->read<T>(0, i));
  }
}

//auto generated
template<size_t MetaDataSize>
void MyModel<MetaDataSize>::run(void) {
  //allocator re-uses the space of the input tensors (they may be modified)
  //and of the output tensors
  get_deep_mlp_ctx(ctx, tensors.tensor0, tensors.tensor1);

  ctx.eval();

  S_TENSOR result = ctx.get("tensor2");
  //copy the tensor out, as the application should own the output memory
  copy_tensor(result, tensors.tensor2);

  ctx.gc();
}



// Example

char input_buffer[512];
ExampleTensorObject* input_tensor_obj; //a class with Tensor interface
int result[1];

MyModel<> model; //generated
model.tensors.tensor0 = new RamTensor({10, 10}, i8);
model.tensors.tensor1 = new WrappedRamTensor({10, 10}, input_buffer, i8);
model.tensors.tensor2 = new RamTensor({10, 10}, result, i32);  //output
model.run();

printf("%d\n", result[0]);

//do something with input_buffer
model.run();
printf("%d\n", result[0]);

@mbartling Thoughts?
One issue I have is that tensors cannot be created before the model, unless we want to explicitly instantiate the context and allocators.
Also, what would be a good way to keep the input/output tensors alive? Maybe create a utility tensor-factory class for initializing the context and allocator classes? The purpose of the tensor factory is mainly syntactic sugar, making things more approachable for the hobbyist community.
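
A rough sketch of the tensor-factory idea (all names are illustrative, and it leaves open how exactly the factory and the model share the context/allocators):

TensorFactory factory;  // hypothetical utility that sets up the context and default allocators

MyModel<> model; //generated
model.tensors.tensor0 = factory.ram({10, 10}, i8);                 // created and kept alive by the factory
model.tensors.tensor1 = factory.wrap({10, 10}, input_buffer, i8);  // wraps a user-provided buffer
model.tensors.tensor2 = factory.ram({10, 10}, i32);                // output tensor
model.run();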

@dboyliao visibility for code-gen
@Knight-X


mbartling commented Jan 3, 2020

Just as an FYI, my brain is totally dedicated to the re-arch right now, so I might be misreading your concerns.

The primary issue here is where the meta-data allocator and the RAM data allocator live, or whether they are separate entities at all.

Maybe create a utility tensor-factory class for initializing the context and alloc classes?

This is the job of the model class, either at construction or at model run.

And, what would be a good way to keep the input/output tensors alive?

Honestly, I am in favor of the user requesting references to input/output lists contained by the model itself. This way we are less prone to the user providing invalid input tensors. I imagine input tensors would be a fixed type of tensor specialization (or a tensor handle) that can provide some compile-time guarantees.
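
Something along these lines, as a sketch of the idea (all names and signatures are illustrative, not the re-arch API):

// The model owns its input/output tensors; user code only sees typed handles.
template <typename T>
class InputHandle {
public:
  void write(const T* data, size_t len);   // copy user data into the model-owned tensor
};

template <typename T>
class OutputHandle {
public:
  void read(T* out, size_t len) const;     // copy results out of the model-owned tensor
};

// Usage:
// auto x = model.input<uint8_t>(0);   // handle to the first input tensor
// auto y = model.output<int32_t>(0);  // handle to the first output tensor
// x.write(input_buffer, sizeof(input_buffer));
// model.run();
// int prediction;
// y.read(&prediction, 1);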
