Dynamically flattening struct trees: Generalised indexing scheme for 1D memory layout?
I'm implementing a system of structs, or rather concepts. Concepts define certain groupings of data. They look like structs, but they do not define memory layout, i.e. the ordering of members. Instead they provide field sizes (either as primitives or by reference to other concepts already defined), and hints as to the type of indexing needed for each array in each struct in the tree. These hints point to indexer functions which decide the memory layout (the same indexers are used on read and write) within an adequately sized buffer.
The idea is: By changing the indexer, we change the layout, without touching existing user code.
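For illustration, a concept declaration along these lines might look like the following Python sketch. The `Concept`/`Field` classes, the field names, and the indexer hint strings are all hypothetical, not an existing API:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Field:
    name: str
    bits: int = 0                        # primitive field: size in bits
    concept: Optional["Concept"] = None  # or a reference to a nested concept
    count: int = 1                       # array length for array members
    indexer: str = "linear"              # hint: which indexer lays out this array

@dataclass
class Concept:
    name: str
    fields: list

# Build the tree bottom-up: referenced concepts must already be defined.
link   = Concept("Link",   [Field("isOpen", bits=1)])
isle   = Concept("Isle",   [Field("links",   concept=link,   count=8,  indexer="linear")])
region = Concept("Region", [Field("isles",   concept=isle,   count=16, indexer="morton")])
world  = Concept("World",  [Field("regions", concept=region, count=4,  indexer="linear")])
```

Swapping the `indexer` string on any array member is the only change needed to re-layout that level; user code never sees it.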
Type-information Setup
I define my list of concepts. Where a concept's members are non-primitive, we pass a reference to another concept-type that is already defined. By referencing each other, these form a tree. The total tree size is calculated (after padding, alignment, bitpacking) and allocated in a contiguous buffer of bytes / 32-bit integers that can be shared across threads.
I extract a set of unique types from the struct tree(s) and process them in descending depth order to: (a) calculate shallower / dependent struct sizes, (b) align bitfields into a primitives-only struct header, (c) pad to word and cache-line widths and (d) align all fields (header and arrays).
Once the tree is ready, I flatten all "struct" types' offsets, bit-lengths and access bit-widths into a single, large, contiguous map of values, for rapid runtime indexing.
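That flattening step could be sketched like this (a Python sketch with word-granular sizes only; padding, alignment and bitpacking are omitted for brevity, and the concept-list format is hypothetical):

```python
# Each concept: (name, [(field_name, elem_size_in_words_or_concept_name, count)]).
# Concepts are listed deepest-first, so nested sizes are known when needed.
def layout(concepts):
    sizes, offsets = {}, {}
    for name, fields in concepts:
        off, table = 0, {}
        for fname, elem, count in fields:
            table[fname] = off                     # member offset within this type
            size = elem if isinstance(elem, int) else sizes[elem]
            off += size * count
        sizes[name], offsets[name] = off, table    # off is now the total type size
    return sizes, offsets

concepts = [
    ("Link",   [("isOpen", 1, 1)]),
    ("Isle",   [("links", "Link", 8)]),
    ("Region", [("isles", "Isle", 16)]),
    ("World",  [("regions", "Region", 4)]),
]
sizes, offsets = layout(concepts)
```

With these made-up counts, `sizes["World"]` comes out to 4 × 16 × 8 × 1 = 512 words, and the per-type offset tables are what gets baked into the contiguous runtime map.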
Runtime Data Access
Data access works, but it's messy - there's a lot of indexing when we need to drill down. So for example,
for r = 0..numRegions
  for i = 0..numIsles
    for l = 0..numLinks
      isOpenVal = world.regions[r].isles[i].links[l].isOpen
becomes (simplified pseudocode; ignore member sizes and offsets accessible inside the *Idx functions)
worldIdx = 0 // assume world is at start of heap
for r = 0..numRegions
  regionsIdx = worldIdx + membIdx(regions)
  regionIdx = regionsIdx + elemIdx(world, regions, r)
  for i = 0..numIsles
    islesIdx = regionIdx + membIdx(isles)
    isleIdx = islesIdx + elemIdx(region, isles, i)
    for l = 0..numLinks
      linksIdx = isleIdx + membIdx(links)
      linkIdx = linksIdx + elemIdx(isle, links, l)
      isOpenIdx = linkIdx + membIdx(isOpen)
      isOpenVal = heap[isOpenIdx]
So, for each level of data access, we have a couple of calls. We cache the total index at each step so we can reuse that partial index later.
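In runnable form, the drill-down above might look like this (a Python sketch over a flat list; the element counts are made up, and because each container here has a single array member, `membIdx` degenerates to zero):

```python
# Layout for this sketch: a World holds 2 Regions of 3 Isles of 4 Links
# of 1 word (isOpen), fully interleaved.
NUM_REGIONS, NUM_ISLES, NUM_LINKS = 2, 3, 4
LINK_SIZE = 1
ISLE_SIZE = NUM_LINKS * LINK_SIZE
REGION_SIZE = NUM_ISLES * ISLE_SIZE

heap = list(range(NUM_REGIONS * REGION_SIZE))  # dummy data

def membIdx(member):
    # Offset of a member array within its container; zero here because
    # each container in this sketch has exactly one member.
    return 0

def elemIdx(elem_size, i):
    # Linear (interleaved) indexer: whole elements packed back to back.
    return elem_size * i

worldIdx = 0
values = []
for r in range(NUM_REGIONS):
    regionIdx = worldIdx + membIdx("regions") + elemIdx(REGION_SIZE, r)
    for i in range(NUM_ISLES):
        isleIdx = regionIdx + membIdx("isles") + elemIdx(ISLE_SIZE, i)
        for l in range(NUM_LINKS):
            linkIdx = isleIdx + membIdx("links") + elemIdx(LINK_SIZE, l)
            isOpenIdx = linkIdx + membIdx("isOpen")
            values.append(heap[isOpenIdx])
```

With a linear indexer at every level, the summed partial indices simply walk the heap in order, which is exactly the interleaved layout described below.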
The challenge
It's hard to abstract indexing. Because we're calculating successive indexing offsets and summing them to a final index, writes lay the data out in interleaved fashion.
I want instead to abstract to any layout, not just interleaved. I want to be able to flatten the tree partially or completely (to a series of flat arrays of numbers and nothing more).
(Partial tree-flattening would be e.g. flattening links into its container isle or region, or right down to the root of the heap, which here is represented by world, the single struct that contains all other structs.)
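To make the interleaved-versus-flattened distinction concrete, here is the same logical record addressed both ways (a sketch with hypothetical field names; both functions map the same (field, element) pair to different physical word indices):

```python
# Four links, each with two one-word fields. Field ids: 0 = isOpen, 1 = weight.
n = 4

def interleaved_idx(field, i):
    # Interleaved (per-element structs), what summing successive offsets gives:
    # [isOpen0, weight0, isOpen1, weight1, ...]
    return i * 2 + field

def flattened_idx(field, i):
    # Flattened into the container (one flat array per field):
    # [isOpen0..isOpen3, weight0..weight3]
    return field * n + i
```

Both are bijections onto the same 2n words, so an abstract indexer interface only has to promise a collision-free mapping; the caller never needs to know which one is in force.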
It should be possible to use different indexers at different levels, combined with interleaving; e.g. regions might be indexed linearly (flat), while isles might use Morton ordering (a Z-order curve) for spatial locality... again, these are defined in the concepts tree.
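For the Morton case, the per-element indexer just replaces the linear multiply with bit interleaving. A minimal 2D sketch, assuming isles are addressed by an (x, y) grid position rather than a flat i (16-bit coordinates, standard magic-number bit spreading):

```python
def part1by1(v):
    # Spread the low 16 bits of v so a zero bit sits between each bit.
    v &= 0xFFFF
    v = (v | (v << 8)) & 0x00FF00FF
    v = (v | (v << 4)) & 0x0F0F0F0F
    v = (v | (v << 2)) & 0x33333333
    v = (v | (v << 1)) & 0x55555555
    return v

def morton_elem_idx(elem_size, x, y):
    # Interleave x (even bits) and y (odd bits), then scale by element size,
    # so spatially close isles land close together in memory.
    return (part1by1(x) | (part1by1(y) << 1)) * elem_size
```

Since this has the same shape as the linear `elemIdx` (element size plus coordinates in, word offset out), it can be slotted into one level of the drill-down without touching the levels above or below it.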
But then how to generalise user code indexers?
Thoughts
There exist plenty of non-interleaved indexing schemes for which we do not need to successively calculate offsets this way; in fact, this successive calculation is an optimisation on my part to avoid constantly recalculating the total offset to a given nested element like isle above; instead we calculate parts like regionIdx and then share that between all isles within that region (I do not know whether C compilers also do this under the hood).
What I do see is that interleaving is the most generalised of all approaches; maybe that is why C defined it as it did. At worst, the user can flatten their struct tree manually, without the C compiler's generalised struct layout and access needing to change: when you want your data mostly flat, you can put multiple arrays of primitives into a single very large struct. The problem with that is maintenance: if you decide to change the overall structure used by your app... a common issue these days as we move data between CPU- and GPU-compatible formats, never mind optimising for each.
The point of my approach is that I don't want any compiler (or my own compiler-like code) deciding the layout. I want the indexers to do so, without my having to touch either my concepts or my existing user code. I want to be able to generate numerous different configurations and test their runtime performance to find the best for a given dataset.
How can I best generalise my indexers?