Deploying Tengine with TensorRT
Compilation
Refer to the Build from Source (TensorRT) chapter.
Running
Model format
The TensorRT backend loads Float32 tmfile models. When running in Float16 inference precision mode, Tengine automatically converts the data to Float16 online after loading the Float32 tmfile.
Setting the inference precision
The TensorRT backend supports network inference in Float32, Float16, and Int8 precision. The precision must be set explicitly through struct options opt before calling prerun_graph_multithread(graph_t graph, struct options opt), as shown in the three examples below and the sketch that follows them.
Enable GPU FP32 mode
/* set runtime options */
struct options opt;
opt.num_thread = num_thread;
opt.cluster = TENGINE_CLUSTER_ALL;
opt.precision = TENGINE_MODE_FP32;
opt.affinity = 0;
Enable GPU FP16 mode
/* set runtime options */
struct options opt;
opt.num_thread = num_thread;
opt.cluster = TENGINE_CLUSTER_ALL;
opt.precision = TENGINE_MODE_FP16;
opt.affinity = 0;
Enable GPU Int8 mode
/* set runtime options */
struct options opt;
opt.num_thread = num_thread;
opt.cluster = TENGINE_CLUSTER_ALL;
opt.precision = TENGINE_MODE_INT8;
opt.affinity = 0;
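For completeness, the minimal sketch below (not part of the official demo; the graph handle is assumed to have been created with create_graph beforehand, and the thread count is a placeholder) shows that the chosen precision only takes effect once opt is passed to prerun_graph_multithread.
/* minimal sketch: "graph" is assumed to exist, created earlier by create_graph() */
struct options opt;
opt.num_thread = 1;                  /* placeholder thread count */
opt.cluster = TENGINE_CLUSTER_ALL;
opt.precision = TENGINE_MODE_FP16;   /* or TENGINE_MODE_FP32 / TENGINE_MODE_INT8 */
opt.affinity = 0;

/* a negative return value indicates that prerun failed */
int ret = prerun_graph_multithread(graph, opt);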
Binding the hardware backend
Before loading the model, a TensorRT backend context must be created explicitly and passed in when calling graph_t create_graph(context_t context, const char* model_format, const char* fname, ...).
/* create NVIDIA TensorRT backend */
context_t trt_context = create_context("trt", 1);
add_context_device(trt_context, "TRT");
/* create graph, load tengine model xxx.tmfile */
create_graph(trt_context, "tengine", model_file);
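Putting the pieces together, here is a minimal end-to-end sketch (not the official tm_classification_trt demo; the model path, input shape, and preprocessing are placeholder assumptions) of running a tmfile on the TensorRT backend:
/* minimal sketch, for illustration only; paths and shapes are placeholders */
#include <stdio.h>
#include <stdlib.h>
#include "tengine/c_api.h"

int main(void)
{
    const char* model_file = "mobilenet_v1.tmfile";  /* placeholder model path */
    int dims[4] = {1, 3, 224, 224};                  /* NCHW input shape */
    int input_size = dims[0] * dims[1] * dims[2] * dims[3];

    init_tengine();

    /* create the NVIDIA TensorRT backend context before loading the model */
    context_t trt_context = create_context("trt", 1);
    add_context_device(trt_context, "TRT");

    graph_t graph = create_graph(trt_context, "tengine", model_file);
    if (graph == NULL)
    {
        fprintf(stderr, "create graph failed\n");
        return -1;
    }

    /* bind an input buffer (fill it with preprocessed image data before running) */
    float* input_data = (float*)calloc(input_size, sizeof(float));
    tensor_t input_tensor = get_graph_input_tensor(graph, 0, 0);
    set_tensor_shape(input_tensor, dims, 4);
    set_tensor_buffer(input_tensor, input_data, input_size * sizeof(float));

    /* choose the inference precision and prerun the graph */
    struct options opt;
    opt.num_thread = 1;
    opt.cluster = TENGINE_CLUSTER_ALL;
    opt.precision = TENGINE_MODE_FP16;  /* FP32 / FP16 / INT8 as required */
    opt.affinity = 0;
    if (prerun_graph_multithread(graph, opt) < 0)
    {
        fprintf(stderr, "prerun graph failed\n");
        return -1;
    }

    run_graph(graph, 1);

    /* read back the raw scores for post-processing */
    tensor_t output_tensor = get_graph_output_tensor(graph, 0, 0);
    float* output_data = (float*)get_tensor_buffer(output_tensor);
    (void)output_data;

    /* release resources */
    postrun_graph(graph);
    destroy_graph(graph);
    free(input_data);
    destroy_context(trt_context);
    release_tengine();
    return 0;
}
The full demo referenced below additionally handles image loading, normalization, and top-5 score post-processing.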
Reference Demo
See tm_classification_trt.cpp for the source code.
Execution result
nvidia@xaiver:~/tengine-lite-tq/build-linux-trt$ ./tm_classification_trt -m mobilenet_v1.tmfile -i cat.jpg -g 224,224 -s 0.017,0.017,0.017 -w 104.007,116.669,122.679 -r 10
Tengine plugin allocator TRT is registered.
tengine-lite library version: 1.2-dev
Tengine: Try using inference precision TF32 failed, rollback.
model file : /home/nvidia/tengine-test/models/mobilenet_v1.tmfile
image file : /home/nvidia/tengine-test/images/cat.jpg
img_h, img_w, scale[3], mean[3] : 224 224 , 0.017 0.017 0.017, 104.0 116.7 122.7
Repeat 1 times, thread 1, avg time 2.10 ms, max_time 3.10 ms, min_time 2.03 ms
--------------------------------------
8.574147, 282
7.880117, 277
7.812574, 278
7.286457, 263
6.357487, 281
--------------------------------------