Generalised indexing scheme (in user code) for variable memory layouts?
I'm implementing a system of structs, or rather concepts. Concepts define certain groupings of data. They look like structs, but they do not define memory layout, i.e. the ordering of members. Instead they provide field sizes (either as primitives or by reference to other concepts already defined), and hints as to the type of indexing needed for each array in each struct in the tree. These hints point to indexer functions which decide the memory layout (the same indexers are used on read and write) within an adequately sized buffer.
The idea is: By changing the indexer, we change the layout, without touching existing user code.
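As a minimal sketch of that idea (in C, all names hypothetical): the indexer contract could be a function mapping a logical element index to an offset, so swapping the function swaps the layout without touching any call site.

```c
#include <stddef.h>

/* Hypothetical indexer signature: maps a logical element index i to a byte
 * offset within the array's region of the buffer. */
typedef size_t (*indexer_fn)(size_t elem_size, size_t count, size_t i);

/* Plain linear (interleaved-style) indexer. */
static size_t linear_idx(size_t elem_size, size_t count, size_t i) {
    (void)count;
    return i * elem_size;
}

/* Reversed indexer: same contract, different layout, no user-code change. */
static size_t reversed_idx(size_t elem_size, size_t count, size_t i) {
    return (count - 1 - i) * elem_size;
}
```

User code only ever calls through the function pointer, so the layout decision stays inside the indexer.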
Type-information Setup
Define the list of concepts. Where a concept's member is non-primitive, we pass a reference to another concept-type that is already defined. By referencing each other, these form a tree.
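A concept definition might be sketched as plain data, along these lines (hypothetical names: a field is either a primitive of some bit width or a reference to an already-defined concept, with an array count and an indexer hint per field):

```c
#include <stddef.h>

/* Which indexer function should lay out this array (illustrative). */
enum indexer_kind { IDX_LINEAR, IDX_MORTON };

typedef struct concept concept;

typedef struct {
    const char       *name;
    const concept    *ref;       /* non-NULL: reference to another concept */
    unsigned          bit_width; /* used when ref == NULL (primitive) */
    unsigned          count;     /* array length; 1 for scalar members */
    enum indexer_kind hint;      /* how this array should be indexed */
} field;

struct concept {
    const char  *name;
    const field *fields;
    unsigned     num_fields;
};
```

Because `ref` points at concepts defined earlier, a set of these definitions naturally forms the tree described above.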
Extract the set of unique types from the struct tree(s), then process them in descending depth order to:
- calculate deeper concept / struct sizes first, so that shallower concepts / structs can have their sizes calculated thereafter,
- align bitfields into a primitives-only struct header,
- pad to word and cache line widths and
- align all fields (header and arrays).
Calculate the total tree size (after padding, alignment and bit-packing) and allocate a contiguous buffer of bytes / 32-bit integers for it.
Extract and flatten all struct types' offsets, bit-lengths and bit-widths into a single, large, contiguous map of values, for fast runtime indexing.
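That flattened map might be sketched as a contiguous array of per-(type, member) records, so a runtime lookup is a single array access (names and bounds here are purely illustrative):

```c
#include <stdint.h>

/* Hypothetical flattened layout record: one entry per (type, member),
 * stored contiguously for fast runtime indexing. */
typedef struct {
    uint32_t bit_offset;  /* offset of the member within its parent, in bits */
    uint32_t bit_length;  /* size of one element, in bits */
    uint32_t stride;      /* distance between consecutive array elements */
} member_info;

enum { MAX_MEMBERS_PER_TYPE = 8 };  /* illustrative bound */

/* One flat map for all types: map[typeId * MAX_MEMBERS_PER_TYPE + memberId] */
static member_info layout_map[64];

static const member_info *lookup(unsigned type_id, unsigned member_id) {
    return &layout_map[type_id * MAX_MEMBERS_PER_TYPE + member_id];
}
```

The `membIdx`/`elemIdx` calls below would consult a table like this rather than walking the concept tree at runtime.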
Runtime Data Access
Data access works, but it's messy - heavy indexing occurs when we need to drill down, e.g.
    for r = 0..numRegions
      for i = 0..numIsles
        for l = 0..numLinks
          isOpenVal = world.regions[r].isles[i].links[l].isOpen
becomes (member sizes, offsets and array factors are accessible inside the *Idx functions):
    worldIdx = 0  // assume world is at start of heap
    for r = 0..numRegions
      regionsIdx = worldIdx + membIdx(regions)
      regionIdx = regionsIdx + elemIdx(world, regions, r)
      for i = 0..numIsles
        islesIdx = regionIdx + membIdx(isles)
        isleIdx = islesIdx + elemIdx(region, isles, i)
        for l = 0..numLinks
          linksIdx = isleIdx + membIdx(links)
          linkIdx = linksIdx + elemIdx(isle, links, l)
          isOpenIdx = linkIdx + membIdx(isOpen)
          isOpenVal = heap[isOpenIdx]
So, for each level of data access, we have a couple of calls. We cache the partial index at each loop depth so we can reuse it on later iterations.
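For a concrete, runnable version of this pattern, here is a C sketch in which the `membIdx`/`elemIdx` stand-ins are trivially linear (purely illustrative; real indexers would consult the flattened layout map):

```c
#include <stddef.h>

/* Hypothetical stand-ins for membIdx/elemIdx: both linear here so the
 * example runs; only their bodies would change for another layout. */
static size_t memb_idx(size_t member_offset)       { return member_offset; }
static size_t elem_idx(size_t elem_size, size_t i) { return i * elem_size; }

/* Sum all links, caching the partial index at each loop depth so inner
 * iterations reuse the outer offsets instead of recomputing them. */
static int sum_links(const int *heap,
                     size_t num_regions, size_t num_isles, size_t num_links) {
    int total = 0;
    size_t world_idx   = 0;                       /* world at start of heap */
    size_t region_size = num_isles * num_links;   /* ints per region */
    size_t isle_size   = num_links;               /* ints per isle */
    for (size_t r = 0; r < num_regions; r++) {
        size_t region_idx = world_idx + memb_idx(0) + elem_idx(region_size, r);
        for (size_t i = 0; i < num_isles; i++) {
            size_t isle_idx = region_idx + memb_idx(0) + elem_idx(isle_size, i);
            for (size_t l = 0; l < num_links; l++) {
                size_t link_idx = isle_idx + memb_idx(0) + elem_idx(1, l);
                total += heap[link_idx];
            }
        }
    }
    return total;
}
```

Swapping `memb_idx`/`elem_idx` bodies changes where each element lands without touching this loop nest.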
The challenge
It's hard to abstract indexing. Because we calculate successive offsets and sum them into a final index, the data is laid out in interleaved fashion when we write.
I want instead to abstract to any layout, not just interleaved. I want to be able to flatten the tree partially or completely (to a series of flat arrays of numbers and nothing more).
(Partial tree-flattening would be e.g. flattening links into its containing isle or region; full flattening would push this right down to the tree root or base of the heap, here represented by world, the single struct that contains all other structs.)
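To see how the same logical coordinates can land at different physical offsets, consider a hypothetical isle with two members, one `weight` int followed by `numLinks` `link` ints, indexed both interleaved and flattened (a sketch, not the document's actual layout):

```c
#include <stddef.h>

/* Interleaved: each isle is a contiguous block [w0, links0..., w1, links1...].
 * A link sits after its own isle's weight header. */
static size_t interleaved_link_idx(size_t num_links, size_t i, size_t l) {
    size_t isle_stride = 1 + num_links;  /* weight + links, in ints */
    return i * isle_stride + 1 + l;      /* +1 skips the weight */
}

/* Flattened: all weights first, then all links in one flat pool:
 * [w0, w1, ..., links0..., links1...]. Same coordinates, different offset. */
static size_t flat_link_idx(size_t num_isles, size_t num_links,
                            size_t i, size_t l) {
    return num_isles + i * num_links + l;
}
```

Both functions satisfy the same (i, l) -> offset contract, so user code written against that contract is oblivious to which layout is active.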
It should be possible to use different indexers at different levels, combined with interleaving, e.g. regions might be indexed linearly (flat); isles may be indexed using the Morton ordering / Z-order curve for spatial performance... again, these are defined in the concepts tree.
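A Morton indexer for a 2-D arrangement of isles could be sketched with the standard bit-interleaving trick (assuming isle order is derived from 16-bit (x, y) cell coordinates):

```c
#include <stdint.h>

/* Spread the low 16 bits of v so each bit occupies an even position:
 * ...dcba -> ...0d0c0b0a. */
static uint32_t part1by1(uint32_t v) {
    v &= 0x0000FFFF;
    v = (v | (v << 8)) & 0x00FF00FF;
    v = (v | (v << 4)) & 0x0F0F0F0F;
    v = (v | (v << 2)) & 0x33333333;
    v = (v | (v << 1)) & 0x55555555;
    return v;
}

/* Morton / Z-order index: interleave the bits of x and y so spatially
 * close (x, y) cells tend to be close in memory. */
static uint32_t morton2(uint32_t x, uint32_t y) {
    return part1by1(x) | (part1by1(y) << 1);
}
```

This is exactly the kind of function the concept tree's hints could select per array: `regions` gets a linear indexer, `isles` gets `morton2`.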
But then how to generalise user code indexers?
Thoughts
There exist plenty of non-interleaved indexing schemes for which we do not need to successively calculate offsets this way; in fact, this successive calculation is an optimisation on my part to avoid constantly recalculating the total offset to a given nested element like isle above; instead we calculate parts like regionIdx and then share that between all isles within that region. (I don't know whether C compilers do this kind of partial-index caching in generated assembly.)
Interleaving, however, seems to be the most generalised approach of all; maybe that is why C uses it. At worst, the user can flatten their struct tree manually, without the C compiler's generalised struct layout and access needing to change: when you want your data mostly flat, you can put multiple arrays of primitives into a single very large struct.
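That manual workaround looks like this in C: the interleaved version is what the compiler lays out by default, while the hand-flattened version packs each primitive field into its own large array inside one struct (sizes here are arbitrary):

```c
/* Interleaved: the default C layout, one record per link. */
struct LinkAoS { int isOpen; float weight; };
struct IsleAoS { struct LinkAoS links[8]; };

/* Manually flattened: multiple arrays of primitives in one very large
 * struct. Same data, but every structural change now touches every
 * access site in user code. */
struct IslesSoA {
    int   isOpen[8 * 64];  /* isOpen for all links of up to 64 isles */
    float weight[8 * 64];
};
```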
The problem with that is maintenance; if you decide to change the overall structure used by your app, all of that hand-flattened access code has to change with it... which is a common issue these days, as we move data between CPU- and GPU-compatible formats, never mind optimising for each.
For this reason, perhaps I need to continue to use this interleaved style of access, even if, under the hood, the indexers do not necessarily lay the data out like that. That's the tough part to figure out.
The point of my approach is that I don't want any compiler (or my own compiler-like code) deciding the layout. I want the indexers to do so, without my having to touch either my concepts or my existing user code. I want to be able to generate numerous different configurations and test their runtime performance to find the best for a given dataset.
How can I generalise indexing in user code, such that it can work with any possible indexing scheme at every level of the tree? Is there a framework for considering this problem?