Atomics in AArch64 : vimarsana.com



CPU fun
Introduction
In this post we’ll look at the performance of a simple atomic operation on a couple of Arm® AArch64 machines. In particular, we’ll show the improvement that comes from using the simple, single-instruction atomics introduced in the Armv8.1-A architecture instead of the more general Load-Locked/Store-Conditional (LL-SC) sequences required by earlier versions of the architecture. The newer architecture’s performance advantage was mentioned in a tweet, and since I already had a benchmark for this written for “The Book”, re-running it and writing up the results seemed worthwhile.
The Problem
Atomics
In a parallel program there are occasions when different threads need to update shared state safely. At a high level, that can be achieved using locks and critical sections. However, that just pushes the problem down a level, since the locks themselves must be implemented. That leads us (and hardware architects!) to realise that the hardware must provide instructions that guarantee an update to a memory location can be made without interference from another logical CPU¹ sharing the same address space.²

