This paper attempts to address and reconcile two different issues: the existence of multiple numerical data formats (such as int8, bfloat16, fp8, etc., often non optimal for the application and not directly compatible with one another) and the necessity to reduce their bandwidth requirements, especially in the case of power hungry and slow DRAM.
Bit Layer Multiplier Accumulator (BLMAC) is an efficient method to perform dot products without multiplications that exploits the bit level sparsity of the weights. A total of 1,980,000 low, high, band pass and band stop type I FIR filters were generated by systematically sweeping through the cut off frequencies and by varying the number of taps from 55 to 255.
This paper discusses the hardware implementation of an encoder and a decoder for the QOI lossless RGB image format. Some results are given for the AMD/Xilinx architecture, reaching over 800 Mpixels/s in Ultrascale+ and more than 4K@30 in Artix-7. A compression improvement up to ~15% is also observed when natural images are scanned along a Hilbert curve instead of the normal raster order.