-
Matthew Wilcox authored
This differs slightly from the IDR equivalent in five ways. 1. It can allocate up to UINT_MAX instead of being limited to INT_MAX, like xa_alloc(). Also like xa_alloc(), it will write to the 'id' pointer before placing the entry in the XArray. 2. The 'next' cursor is allocated separately from the XArray instead of being part of the IDR. This saves memory for all the users which do not use the cyclic allocation API and suits some users better. 3. It returns -EBUSY instead of -ENOSPC. 4. It will attempt to wrap back to the minimum value on memory allocation failure as well as on an -EBUSY error, assuming that a user would rather allocate a small ID than suffer an ID allocation failure. 5. It reports whether it has wrapped, which is important to some users. Signed-off-by:
Matthew Wilcox <willy@infradead.org>
Matthew Wilcox authoredThis differs slightly from the IDR equivalent in five ways. 1. It can allocate up to UINT_MAX instead of being limited to INT_MAX, like xa_alloc(). Also like xa_alloc(), it will write to the 'id' pointer before placing the entry in the XArray. 2. The 'next' cursor is allocated separately from the XArray instead of being part of the IDR. This saves memory for all the users which do not use the cyclic allocation API and suits some users better. 3. It returns -EBUSY instead of -ENOSPC. 4. It will attempt to wrap back to the minimum value on memory allocation failure as well as on an -EBUSY error, assuming that a user would rather allocate a small ID than suffer an ID allocation failure. 5. It reports whether it has wrapped, which is important to some users. Signed-off-by:
Matthew Wilcox <willy@infradead.org>
XArray
Author: | Matthew Wilcox |
---|
Overview
The XArray is an abstract data type which behaves like a very large array of pointers. It meets many of the same needs as a hash or a conventional resizable array. Unlike a hash, it allows you to sensibly go to the next or previous entry in a cache-efficient manner. In contrast to a resizable array, there is no need to copy data or change MMU mappings in order to grow the array. It is more memory-efficient, parallelisable and cache friendly than a doubly-linked list. It takes advantage of RCU to perform lookups without locking.
The XArray implementation is efficient when the indices used are densely
clustered; hashing the object and using the hash as the index will not
perform well. The XArray is optimised for small indices, but still has
good performance with large indices. If your index can be larger than
ULONG_MAX
then the XArray is not the data type for you. The most
important user of the XArray is the page cache.
Each non-NULL
entry in the array has three bits associated with
it called marks. Each mark may be set or cleared independently of
the others. You can iterate over entries which are marked.
Normal pointers may be stored in the XArray directly. They must be 4-byte aligned, which is true for any pointer returned from :c:func:`kmalloc` and :c:func:`alloc_page`. It isn't true for arbitrary user-space pointers, nor for function pointers. You can store pointers to statically allocated objects, as long as those objects have an alignment of at least 4.
You can also store integers between 0 and LONG_MAX
in the XArray.
You must first convert it into an entry using :c:func:`xa_mk_value`.
When you retrieve an entry from the XArray, you can check whether it is
a value entry by calling :c:func:`xa_is_value`, and convert it back to
an integer by calling :c:func:`xa_to_value`.
Some users want to store tagged pointers instead of using the marks described above. They can call :c:func:`xa_tag_pointer` to create an entry with a tag, :c:func:`xa_untag_pointer` to turn a tagged entry back into an untagged pointer and :c:func:`xa_pointer_tag` to retrieve the tag of an entry. Tagged pointers use the same bits that are used to distinguish value entries from normal pointers, so each user must decide whether they want to store value entries or tagged pointers in any particular XArray.
The XArray does not support storing :c:func:`IS_ERR` pointers as some conflict with value entries or internal entries.
An unusual feature of the XArray is the ability to create entries which
occupy a range of indices. Once stored to, looking up any index in
the range will return the same entry as looking up any other index in
the range. Setting a mark on one index will set it on all of them.
Storing to any index will store to all of them. Multi-index entries can
be explicitly split into smaller entries, or storing NULL
into any
entry will cause the XArray to forget about the range.
Normal API
Start by initialising an XArray, either with :c:func:`DEFINE_XARRAY`
for statically allocated XArrays or :c:func:`xa_init` for dynamically
allocated ones. A freshly-initialised XArray contains a NULL
pointer at every index.
You can then set entries using :c:func:`xa_store` and get entries
using :c:func:`xa_load`. xa_store will overwrite any entry with the
new entry and return the previous entry stored at that index. You can
use :c:func:`xa_erase` instead of calling :c:func:`xa_store` with a
NULL
entry. There is no difference between an entry that has never
been stored to, one that has been erased and one that has most recently
had NULL
stored to it.
You can conditionally replace an entry at an index by using :c:func:`xa_cmpxchg`. Like :c:func:`cmpxchg`, it will only succeed if the entry at that index has the 'old' value. It also returns the entry which was at that index; if it returns the same entry which was passed as 'old', then :c:func:`xa_cmpxchg` succeeded.
If you want to only store a new entry to an index if the current entry
at that index is NULL
, you can use :c:func:`xa_insert` which
returns -EBUSY
if the entry is not empty.
You can enquire whether a mark is set on an entry by using
:c:func:`xa_get_mark`. If the entry is not NULL
, you can set a mark
on it by using :c:func:`xa_set_mark` and remove the mark from an entry by
calling :c:func:`xa_clear_mark`. You can ask whether any entry in the
XArray has a particular mark set by calling :c:func:`xa_marked`.
You can copy entries out of the XArray into a plain array by calling :c:func:`xa_extract`. Or you can iterate over the present entries in the XArray by calling :c:func:`xa_for_each`. You may prefer to use :c:func:`xa_find` or :c:func:`xa_find_after` to move to the next present entry in the XArray.
Calling :c:func:`xa_store_range` stores the same entry in a range of indices. If you do this, some of the other operations will behave in a slightly odd way. For example, marking the entry at one index may result in the entry being marked at some, but not all of the other indices. Storing into one index may result in the entry retrieved by some, but not all of the other indices changing.
Sometimes you need to ensure that a subsequent call to :c:func:`xa_store`
will not need to allocate memory. The :c:func:`xa_reserve` function
will store a reserved entry at the indicated index. Users of the
normal API will see this entry as containing NULL
. If you do
not need to use the reserved entry, you can call :c:func:`xa_release`
to remove the unused entry. If another user has stored to the entry
in the meantime, :c:func:`xa_release` will do nothing; if instead you
want the entry to become NULL
, you should use :c:func:`xa_erase`.
Using :c:func:`xa_insert` on a reserved entry will fail.
If all entries in the array are NULL
, the :c:func:`xa_empty` function
will return true
.
Finally, you can remove all entries from an XArray by calling :c:func:`xa_destroy`. If the XArray entries are pointers, you may wish to free the entries first. You can do this by iterating over all present entries in the XArray using the :c:func:`xa_for_each` iterator.
Allocating XArrays
If you use :c:func:`DEFINE_XARRAY_ALLOC` to define the XArray, or
initialise it by passing XA_FLAGS_ALLOC
to :c:func:`xa_init_flags`,
the XArray changes to track whether entries are in use or not.
You can call :c:func:`xa_alloc` to store the entry at an unused index in the XArray. If you need to modify the array from interrupt context, you can use :c:func:`xa_alloc_bh` or :c:func:`xa_alloc_irq` to disable interrupts while allocating the ID.
Using :c:func:`xa_store`, :c:func:`xa_cmpxchg` or :c:func:`xa_insert` will
also mark the entry as being allocated. Unlike a normal XArray, storing
NULL
will mark the entry as being in use, like :c:func:`xa_reserve`.
To free an entry, use :c:func:`xa_erase` (or :c:func:`xa_release` if
you only want to free the entry if it's NULL
).
By default, the lowest free entry is allocated starting from 0. If you
want to allocate entries starting at 1, it is more efficient to use
:c:func:`DEFINE_XARRAY_ALLOC1` or XA_FLAGS_ALLOC1
. If you want to
allocate IDs up to a maximum, then wrap back around to the lowest free
ID, you can use :c:func:`xa_alloc_cyclic`.
You cannot use XA_MARK_0
with an allocating XArray as this mark
is used to track whether an entry is free or not. The other marks are
available for your use.
Memory allocation
The :c:func:`xa_store`, :c:func:`xa_cmpxchg`, :c:func:`xa_alloc`, :c:func:`xa_reserve` and :c:func:`xa_insert` functions take a gfp_t parameter in case the XArray needs to allocate memory to store this entry. If the entry is being deleted, no memory allocation needs to be performed, and the GFP flags specified will be ignored.
It is possible for no memory to be allocatable, particularly if you pass a restrictive set of GFP flags. In that case, the functions return a special value which can be turned into an errno using :c:func:`xa_err`. If you don't need to know exactly which error occurred, using :c:func:`xa_is_err` is slightly more efficient.
Locking
When using the Normal API, you do not have to worry about locking. The XArray uses RCU and an internal spinlock to synchronise access:
- No lock needed:
-
- :c:func:`xa_empty`
- :c:func:`xa_marked`
- Takes RCU read lock:
-
- :c:func:`xa_load`
- :c:func:`xa_for_each`
- :c:func:`xa_find`
- :c:func:`xa_find_after`
- :c:func:`xa_extract`
- :c:func:`xa_get_mark`
- Takes xa_lock internally:
-
- :c:func:`xa_store`
- :c:func:`xa_store_bh`
- :c:func:`xa_store_irq`
- :c:func:`xa_insert`
- :c:func:`xa_insert_bh`
- :c:func:`xa_insert_irq`
- :c:func:`xa_erase`
- :c:func:`xa_erase_bh`
- :c:func:`xa_erase_irq`
- :c:func:`xa_cmpxchg`
- :c:func:`xa_cmpxchg_bh`
- :c:func:`xa_cmpxchg_irq`
- :c:func:`xa_store_range`
- :c:func:`xa_alloc`
- :c:func:`xa_alloc_bh`
- :c:func:`xa_alloc_irq`
- :c:func:`xa_reserve`
- :c:func:`xa_reserve_bh`
- :c:func:`xa_reserve_irq`
- :c:func:`xa_destroy`
- :c:func:`xa_set_mark`
- :c:func:`xa_clear_mark`
- Assumes xa_lock held on entry:
-
- :c:func:`__xa_store`
- :c:func:`__xa_insert`
- :c:func:`__xa_erase`
- :c:func:`__xa_cmpxchg`
- :c:func:`__xa_alloc`
- :c:func:`__xa_reserve`
- :c:func:`__xa_set_mark`
- :c:func:`__xa_clear_mark`
If you want to take advantage of the lock to protect the data structures that you are storing in the XArray, you can call :c:func:`xa_lock` before calling :c:func:`xa_load`, then take a reference count on the object you have found before calling :c:func:`xa_unlock`. This will prevent stores from removing the object from the array between looking up the object and incrementing the refcount. You can also use RCU to avoid dereferencing freed memory, but an explanation of that is beyond the scope of this document.
The XArray does not disable interrupts or softirqs while modifying the array. It is safe to read the XArray from interrupt or softirq context as the RCU lock provides enough protection.
If, for example, you want to store entries in the XArray in process context and then erase them in softirq context, you can do that this way:
void foo_init(struct foo *foo)
{
xa_init_flags(&foo->array, XA_FLAGS_LOCK_BH);
}
int foo_store(struct foo *foo, unsigned long index, void *entry)
{
int err;
xa_lock_bh(&foo->array);
err = xa_err(__xa_store(&foo->array, index, entry, GFP_KERNEL));
if (!err)
foo->count++;
xa_unlock_bh(&foo->array);
return err;
}
/* foo_erase() is only called from softirq context */
void foo_erase(struct foo *foo, unsigned long index)
{
xa_lock(&foo->array);
__xa_erase(&foo->array, index);
foo->count--;
xa_unlock(&foo->array);
}
If you are going to modify the XArray from interrupt or softirq context,
you need to initialise the array using :c:func:`xa_init_flags`, passing
XA_FLAGS_LOCK_IRQ
or XA_FLAGS_LOCK_BH
.
The above example also shows a common pattern of wanting to extend the coverage of the xa_lock on the store side to protect some statistics associated with the array.
Sharing the XArray with interrupt context is also possible, either using :c:func:`xa_lock_irqsave` in both the interrupt handler and process context, or :c:func:`xa_lock_irq` in process context and :c:func:`xa_lock` in the interrupt handler. Some of the more common patterns have helper functions such as :c:func:`xa_store_bh`, :c:func:`xa_store_irq`, :c:func:`xa_erase_bh`, :c:func:`xa_erase_irq`, :c:func:`xa_cmpxchg_bh` and :c:func:`xa_cmpxchg_irq`.
Sometimes you need to protect access to the XArray with a mutex because that lock sits above another mutex in the locking hierarchy. That does not entitle you to use functions like :c:func:`__xa_erase` without taking the xa_lock; the xa_lock is used for lockdep validation and will be used for other purposes in the future.
The :c:func:`__xa_set_mark` and :c:func:`__xa_clear_mark` functions are also available for situations where you look up an entry and want to atomically set or clear a mark. It may be more efficient to use the advanced API in this case, as it will save you from walking the tree twice.
Advanced API
The advanced API offers more flexibility and better performance at the cost of an interface which can be harder to use and has fewer safeguards. No locking is done for you by the advanced API, and you are required to use the xa_lock while modifying the array. You can choose whether to use the xa_lock or the RCU lock while doing read-only operations on the array. You can mix advanced and normal operations on the same array; indeed the normal API is implemented in terms of the advanced API. The advanced API is only available to modules with a GPL-compatible license.
The advanced API is based around the xa_state. This is an opaque data structure which you declare on the stack using the :c:func:`XA_STATE` macro. This macro initialises the xa_state ready to start walking around the XArray. It is used as a cursor to maintain the position in the XArray and let you compose various operations together without having to restart from the top every time.
The xa_state is also used to store errors. You can call
:c:func:`xas_error` to retrieve the error. All operations check whether
the xa_state is in an error state before proceeding, so there's no need
for you to check for an error after each call; you can make multiple
calls in succession and only check at a convenient point. The only
errors currently generated by the XArray code itself are ENOMEM
and
EINVAL
, but it supports arbitrary errors in case you want to call
:c:func:`xas_set_err` yourself.
If the xa_state is holding an ENOMEM
error, calling :c:func:`xas_nomem`
will attempt to allocate more memory using the specified gfp flags and
cache it in the xa_state for the next attempt. The idea is that you take
the xa_lock, attempt the operation and drop the lock. The operation
attempts to allocate memory while holding the lock, but it is more
likely to fail. Once you have dropped the lock, :c:func:`xas_nomem`
can try harder to allocate more memory. It will return true
if it
is worth retrying the operation (i.e. that there was a memory error and
more memory was allocated). If it has previously allocated memory, and
that memory wasn't used, and there is no error (or some error that isn't
ENOMEM
), then it will free the memory previously allocated.
Internal Entries
The XArray reserves some entries for its own purposes. These are never
exposed through the normal API, but when using the advanced API, it's
possible to see them. Usually the best way to handle them is to pass them
to :c:func:`xas_retry`, and retry the operation if it returns true
.
Other internal entries may be added in the future. As far as possible, they will be handled by :c:func:`xas_retry`.
Additional functionality
The :c:func:`xas_create_range` function allocates all the necessary memory to store every entry in a range. It will set ENOMEM in the xa_state if it cannot allocate memory.
You can use :c:func:`xas_init_marks` to reset the marks on an entry
to their default state. This is usually all marks clear, unless the
XArray is marked with XA_FLAGS_TRACK_FREE
, in which case mark 0 is set
and all other marks are clear. Replacing one entry with another using
:c:func:`xas_store` will not reset the marks on that entry; if you want
the marks reset, you should do that explicitly.
The :c:func:`xas_load` will walk the xa_state as close to the entry as it can. If you know the xa_state has already been walked to the entry and need to check that the entry hasn't changed, you can use :c:func:`xas_reload` to save a function call.
If you need to move to a different index in the XArray, call :c:func:`xas_set`. This resets the cursor to the top of the tree, which will generally make the next operation walk the cursor to the desired spot in the tree. If you want to move to the next or previous index, call :c:func:`xas_next` or :c:func:`xas_prev`. Setting the index does not walk the cursor around the array so does not require a lock to be held, while moving to the next or previous index does.
You can search for the next present entry using :c:func:`xas_find`. This is the equivalent of both :c:func:`xa_find` and :c:func:`xa_find_after`; if the cursor has been walked to an entry, then it will find the next entry after the one currently referenced. If not, it will return the entry at the index of the xa_state. Using :c:func:`xas_next_entry` to move to the next present entry instead of :c:func:`xas_find` will save a function call in the majority of cases at the expense of emitting more inline code.
The :c:func:`xas_find_marked` function is similar. If the xa_state has not been walked, it will return the entry at the index of the xa_state, if it is marked. Otherwise, it will return the first marked entry after the entry referenced by the xa_state. The :c:func:`xas_next_marked` function is the equivalent of :c:func:`xas_next_entry`.
When iterating over a range of the XArray using :c:func:`xas_for_each`
or :c:func:`xas_for_each_marked`, it may be necessary to temporarily stop
the iteration. The :c:func:`xas_pause` function exists for this purpose.
After you have done the necessary work and wish to resume, the xa_state
is in an appropriate state to continue the iteration after the entry
you last processed. If you have interrupts disabled while iterating,
then it is good manners to pause the iteration and reenable interrupts
every XA_CHECK_SCHED
entries.
The :c:func:`xas_get_mark`, :c:func:`xas_set_mark` and :c:func:`xas_clear_mark` functions require the xa_state cursor to have been moved to the appropriate location in the xarray; they will do nothing if you have called :c:func:`xas_pause` or :c:func:`xas_set` immediately before.
You can call :c:func:`xas_set_update` to have a callback function called each time the XArray updates a node. This is used by the page cache workingset code to maintain its list of nodes which contain only shadow entries.
Multi-Index Entries
The XArray has the ability to tie multiple indices together so that operations on one index affect all indices. For example, storing into any index will change the value of the entry retrieved from any index. Setting or clearing a mark on any index will set or clear the mark on every index that is tied together. The current implementation only allows tying ranges which are aligned powers of two together; eg indices 64-127 may be tied together, but 2-6 may not be. This may save substantial quantities of memory; for example tying 512 entries together will save over 4kB.
You can create a multi-index entry by using :c:func:`XA_STATE_ORDER`
or :c:func:`xas_set_order` followed by a call to :c:func:`xas_store`.
Calling :c:func:`xas_load` with a multi-index xa_state will walk the
xa_state to the right location in the tree, but the return value is not
meaningful, potentially being an internal entry or NULL
even when there
is an entry stored within the range. Calling :c:func:`xas_find_conflict`
will return the first entry within the range or NULL
if there are no
entries in the range. The :c:func:`xas_for_each_conflict` iterator will
iterate over every entry which overlaps the specified range.
If :c:func:`xas_load` encounters a multi-index entry, the xa_index in the xa_state will not be changed. When iterating over an XArray or calling :c:func:`xas_find`, if the initial index is in the middle of a multi-index entry, it will not be altered. Subsequent calls or iterations will move the index to the first index in the range. Each entry will only be returned once, no matter how many indices it occupies.
Using :c:func:`xas_next` or :c:func:`xas_prev` with a multi-index xa_state is not supported. Using either of these functions on a multi-index entry will reveal sibling entries; these should be skipped over by the caller.
Storing NULL
into any index of a multi-index entry will set the entry
at every index to NULL
and dissolve the tie. Splitting a multi-index
entry into entries occupying smaller ranges is not yet supported.