Industry Expert Blogs Making the most of Arm NN for GPU inference: FP16 and FastMath arm Blogs - Roberto Lopez Mendez, Arm Jan. 27, 2021 Most operations in deep learning involve massive amounts of data, but simple control logic. As a parallel processor, GPUs are very suitable for this type of task. Current high-end mobile GPUs can provide substantial throughput thanks to having hundreds of Arithmetic Logic Units (ALUs). In fact, GPUs are built with a single purpose – parallel data processing, initially for 3D graphics and later for more general parallel computing. Additionally, GPUs are energy-efficient processors. Nowadays, the number of operations per watt (TOPs/W) is used to evaluate the energy efficiency of mobile processors and embedded devices. GPUs have higher TOPs/W, due to the relatively simple control unit and lower working frequency.