SPRAAK
|
Provide a set of basic & advanced atomic operations. More...
Data Structures | |
struct | SprAtomicSlot |
Functions | |
void | spr_atomic_nop_MBwrite (void) |
void | spr_atomic_nop_MBread (void) |
void | spr_atomic_nop_MBfull (void) |
Impose a full (read+write) memory barrier. More... | |
uint32_t | spr_atomic_get32_MBacquire (volatile uint32_t *ptr) |
uint32_t | spr_atomic_twinget32_MBnone (uint32_t *ptr, uint32_t *val2) |
uint64_t | spr_atomic_twingetc48_18_MBnone (uint64_t *ptr, uint64_t *val2) |
void | spr_atomic_set32_MBrelease (volatile uint32_t *ptr, uint32_t value) |
void | spr_atomic_twinset32_MBrelease (volatile uint32_t *ptr, uint32_t value1, uint32_t value2) |
void | spr_atomic_inc32_MBnone (volatile uint32_t *ptr) |
uint32_t | spr_atomic_dec32_MBnone (volatile uint32_t *ptr) |
uint32_t | spr_atomic_add32_MBnone (volatile uint32_t *ptr, uint32_t value) |
uint32_t | spr_atomic_or32_MBnone (volatile uint32_t *ptr, uint32_t value) |
uint32_t | spr_atomic_and32_MBnone (volatile uint32_t *ptr, uint32_t value) |
uint32_t | spr_atomic_swap32_MBnone (volatile uint32_t *ptr, uint32_t value) |
int | spr_atomic_cas32_MBnone (volatile uint32_t *ptr, uint32_t tst_value, uint32_t new_value) |
int | spr_atomic_twincas32_MBnone (volatile uint32_t *ptr, uint32_t tst_value1, uint32_t tst_value2, uint32_t new_value1, uint32_t new_value2) |
int | spr_atomic_bitset32_MBnone (volatile uint32_t *ptr, int bit_ndx) |
int | spr_atomic_bitclr32_MBnone (volatile uint32_t *ptr, int bit_ndx) |
int | spr_atomic_slot_acquire_nowait (SprAtomicSlot *slot) |
int | spr_atomic_slot_acquire_wait (SprAtomicSlot *slot) |
int | spr_atomic_slot_acquire0_nowait (SprAtomicSlot *slot) |
int | spr_atomic_slot_acquire0_wait (SprAtomicSlot *slot) |
void | spr_atomic_slot_signal_ready (SprAtomicSlot *slot, int ndx) |
int | spr_atomic_slot_signal_ready_acquire0_nowait (SprAtomicSlot *slot, int ndx) |
void | spr_atomic_slot_release (SprAtomicSlot *slot, int ndx) |
void | spr_atomic_slot_release0 (SprAtomicSlot *slot) |
Release slot 0. More... | |
uint32_t | spr_atomic_slot_release_nocheck (SprAtomicSlot *slot, int ndx) |
void | spr_atomic_slot_release0_check (SprAtomicSlot *slot) |
int | spr_atomic_slot_release_check0 (SprAtomicSlot *slot, int ndx) |
int | spr_atomic_slot_acquire_all (SprAtomicSlot *slot) |
void | spr_atomic_slot_release_all (SprAtomicSlot *slot) |
Provide a set of basic & advanced atomic operations.
The basic atomic operations are provided as a set of macro's. We opted to go with macro's for the basic atomic operations instead of (inline) functions for the following two reasons:
The more advanced atomic(-like) operations use (inline) function calls.
The name of every basic atomic operation macro follows the following naming convention:
<op> is the operation and can be any of the following:
<sz> is the operand size and can be any of the following:
All quantities are assumed to be unsigned. Casting to signed (or pointer to signed) will give the desired result.
<mfence> indicates the type of memory synchronisation (barrier) enforced by the atomic operation, and can be any of the following:
In order the better specify the effect of the memory barriers, lets define six distinct sets of operations:
Then the different memory barriers impose the following ordering:
full
memory barrier may look like a safe option, but on certain computer architectures this also has a huge impact on performance.The advanced atomic operations provided by SPRAAK are:
When making multi-threaded applications, two main techniques are used to deal with the problems resulting from concurrent access to shared resources (mostly memory).
The most common (and easiest) way is the use of mutexes (which uses locks, read-write locks, semaphores and conditions to get everything synchronized). Mutexes avoid race conditions by serializing the operations around the critical part of the code (usually when shared data is accessed), i.e. only one thread can execute the protected part of the code. This serialization is also the biggest down-side of mutexes. If the shared data access cannot be isolated in a small and localized fraction of the code, the threads may end-up waiting on each other the better part of the time, hence nullifying any possible speedup when working with multiple threads.
The alternative to mutexes is the use of lock-free algorithms. Such lock-free algorithms use atomic operations to provide coherent access to the shared data (e.g. concurrently accessible queues and hash-tables). Unfortunately, the standard libraries do not yet provide such lock-free algorithms (although there exist proprietary libraries). Furthermore, not all data structures are well suited for full lock-free access (e.g. double linked lists). Another (and possibly the biggest) down-side of concurrent algorithms is that they are extremely hard to develop (e.g. bugs are virtually untraceable and correctness proofs are extremely difficult). Only with better hardware support (multiple outstanding transactions in a single atomic operation) lock-free programming will become an option for the non-experts.
SPRAAK provides the atomic operations that allow the creation of concurrent data structures and algorithms. However, given the above mentioned problems, they are used only when absolutley crucial. In most situation, mutexes are preferred.
In order to guarantee the causality of a program, memory access inside a single CPU-core (read 'thread') will appear to happen in program order. I.e. the compiler or the CPU may reorder memory access but only if this will not affect the outcome of the program. However, to the outside world –this includes any other CPU-cores (read 'threads') accessing the same memory pool– there are no guarantees whatsoever w.r.t. memory access. This is mainly done for efficiency reasons. For example, a core may keep data in its cache and not write it to memory, or it may read data well before its time (pre-fetching), or it may arbitrarily rearrange reads and writes (fusing and/or grouping according to memory bank). In order to still allow concurrency, special instructions have been added to all modern CPU's to ensure that interaction with the outside world happens in an orderly fashion. These memory barriers impose a perceived partial ordering between the memory operations specified on either side of the barrier on all other CPU's of the system. This for example guarantees that all elements in a new list element are initialized and exported to the out-side world before the new list element is added to the list using an atomic operation.
Not all atomic operations are (or even can be) implemented for all platforms.
Developing concurrent algorithms, choosing the correct memory barrier, and to a certain extent every piece of code that needs atomic operations is difficult and bugs will be extremely difficult to trace. So avoid them unless absolutely necessary.