Deploying Tengine with ACL

Compilation

Refer to the Compiling from Source (ACL) section to generate the following library files required for deployment:

3rdparty/acl/lib/
├── libarm_compute.so
├── libarm_compute_core.so
└── libarm_compute_graph.so

build-acl-arm64/install/lib/
└── libtengine-lite.so

Running

Model Format

ACL supports loading Float32 tmfile models directly. When running in Float16 inference mode, the Tengine framework loads the Float32 tmfile and then automatically converts the data to Float16 online for inference.

Setting the Inference Precision

ACL supports network inference at two precisions, Float32 and Float16. The precision must be set explicitly through struct options opt before calling prerun_graph_multithread(graph_t graph, struct options opt).

Enable GPU FP32 mode

/* set runtime options */
struct options opt;
opt.num_thread = num_thread;
opt.cluster = TENGINE_CLUSTER_ALL;
opt.precision = TENGINE_MODE_FP32;
opt.affinity = 0;

Enable GPU FP16 mode

/* set runtime options */
struct options opt;
opt.num_thread = num_thread;
opt.cluster = TENGINE_CLUSTER_ALL;
opt.precision = TENGINE_MODE_FP16;
opt.affinity = 0;
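Once opt is filled in, it is passed to prerun_graph_multithread before the first inference. A minimal sketch under the assumption that the graph was already created with create_graph (the setup_runtime helper name is hypothetical, and the header path follows the usual Tengine-Lite convention):

```c
#include "tengine/c_api.h"  /* Tengine-Lite C API */

/* Bind runtime options to an existing graph; returns 0 on success.
   `graph` is assumed to have been created via create_graph() earlier. */
int setup_runtime(graph_t graph, int num_thread)
{
    struct options opt;
    opt.num_thread = num_thread;
    opt.cluster    = TENGINE_CLUSTER_ALL;
    opt.precision  = TENGINE_MODE_FP16;  /* or TENGINE_MODE_FP32 */
    opt.affinity   = 0;

    /* the options take effect here, before any run_graph() call */
    return prerun_graph_multithread(graph, opt);
}
```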

Binding the Backend Hardware

Before loading the model, the ACL hardware backend context must be created explicitly and passed in when calling graph_t create_graph(context_t context, const char* model_format, const char* fname, ...).

/* create the Arm ACL backend context */
context_t acl_context = create_context("acl", 1);
add_context_device(acl_context, "ACL");

/* create the graph and load the tengine model xxx.tmfile */
graph_t graph = create_graph(acl_context, "tengine", model_file);
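Putting the pieces together, a complete flow looks roughly like the sketch below. This is an outline under assumed conditions (the model path is a placeholder, and filling the input tensor plus error handling are application-specific and elided):

```c
#include "tengine/c_api.h"

int main(void)
{
    const char *model_file = "mobilenet.tmfile";  /* placeholder model path */

    init_tengine();

    /* create the ACL backend context and load the model onto it */
    context_t acl_context = create_context("acl", 1);
    add_context_device(acl_context, "ACL");
    graph_t graph = create_graph(acl_context, "tengine", model_file);

    /* set runtime options, then prerun on the GPU in FP16 mode */
    struct options opt;
    opt.num_thread = 1;
    opt.cluster    = TENGINE_CLUSTER_ALL;
    opt.precision  = TENGINE_MODE_FP16;
    opt.affinity   = 0;
    prerun_graph_multithread(graph, opt);

    /* ... fill the input tensor via get_graph_input_tensor()
       and set_tensor_buffer(), then run inference ... */
    run_graph(graph, 1);

    /* release resources in reverse order of creation */
    postrun_graph(graph);
    destroy_graph(graph);
    destroy_context(acl_context);
    release_tengine();
    return 0;
}
```

Creating the context before the graph matters: the device list attached to the context decides which operators are offloaded to ACL when the graph is built.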

Reference Demo

See tm_classification_acl.c for the source code.

Execution Result

[root@localhost tengine-lite]# ./tm_mssd_acl -m mssd.tmfile -i ssd_dog.jpg -t 1 -r 10
start to run register cpu allocator
start to run register acl allocator
tengine-lite library version: 1.0-dev
run into gpu by acl
Repeat 10 times, thread 2, avg time 82.32 ms, max_time 135.70 ms, min_time 74.10 ms
--------------------------------------
detect result num: 3 
dog     :99.8%
BOX:( 138 , 209 ),( 324 , 541 )
car     :99.7%
BOX:( 467 , 72 ),( 687 , 171 )
bicycle :99.6%
BOX:( 106 , 141 ),( 574 , 415 )
======================================
[DETECTED IMAGE SAVED]:
======================================