In "Star Trek" memory allocation is fast. Not so much in the real world...
In the grand sceheme of things, dynamic memory allocation is slow. If you need your code to be lightning fast, you should avoid dynamic memory allocation in your inner loops. In this article, I'll explain why that is and give you a handy "frame allocator" that's suitable for use in your projects.
Why is Dynamic Memory Allocation Slow?
Dynamic memory allocation is an interesting problem to solve. There's a plethora of factors to balance when designing and implementing such a system. There are literally tens of solutions to this problem. Each one striking their relative balance between the relevant issues: performance, fragmentation, locality of reference, debuggability, etc.
In short, a memory allocator needs to track all of the allocations and free space in your heap. When a new block of memory is requested, the allocator needs to find an existing free block with enough space and carve some bytes out of it. This can involve rather nasty linear searches through your free space. This isn't a problem for an application that does small amount of allocation. But for a program with millions of allocations, this can be a real issue!
Some heap managers are more clever than this linear search implementation. But they give tradeoffs on fragmentation and per-allocation waste. Depending on your platform, your default malloc / new implementation may be a source of nasty performance hits.
Sometimes You Can't Avoid Allocation
There are several places in my code where dynamic memory allocation in a tight loop in unavoidable. This generally happens in situations where the amount of memory required to process some data is variable or undefined at compile time. One relevant example is code that generates temporary streams of data used for per-frame rendering. Another is preparation of message data or formatting strings to be used and then tossed away. These cases sadly require dynamic memory allocation. And that won't do!
Enter The Frame Allocator
As I stated earlier, there are several possible solutions for this kind of problem. In those cases where I can't avoid making temporary memory allocations, I rely on the "frame allocator" to get the job done. This simple allocator is given a fixed block of RAM and provides constant-time allocation / deallocation of data. It's called a frame allocator because its allocations are only allowed to persist for a single game frame before being freed.
The system is really simple to understand. Here's the basic implementation:
- Allocate a fixed block of RAM based on how much you expect to need in a worst-case simulation frame. Let's call that the scratchpad.
- Maintain a next block pointer that is incremented as memory is allocated. It starts out pointing at the beginning of the scratchpad.
- When memory is allocated, it's a simple matter of returning the next block and advancing it by the size of your allocation.
- At the end of each frame, rewind the next block pointer to the beinning of the scratchpad.
Implementation
Implementing a frame allocator is pretty straightforward. I generally add some nice helper functions to allocate objects using placement new and save / restore allocator state. The latter is nice because it allows you to get more use out of your intra-frame allocations. There are cases where a single function does a bunch of frame allocations and wants to free them before returning. Very handy.
Consider the following implementation:
//
// Simple class to manage a block of memory suitable for rapid intra-frame allocations.
//
class FrameAllocator
{
public:
// maximum count of pointers that can be saved with ::Save
static const int MaxSavedPointer = 16;
protected:
// pointer to the beginning of our scratchpad
void *_ScratchpadStart;
// pointer to the end of our scratchpad
void *_ScratchpadEnd;
// pointer to our next allocation
void *_NextAllocation;
// pointer to our stack of saved pointers
void *_SavedPointers[MaxSavedPointer];
int _NumSavedPointers;
public:
FrameAllocator()
: _ScratchpadStart(NULL)
, _ScratchpadEnd(NULL)
, _NextAllocation(NULL)
, _NumSavedPointers(0)
{
}
~FrameAllocator()
{
if(_ScratchpadStart)
{
free(_ScratchpadStart);
_ScratchpadStart = NULL;
_ScratchpadEnd = NULL;
}
}
void Init(size_t Size)
{
assert(_ScratchpadStart == _ScratchpadEnd && _ScratchpadStart == _NextAllocation && !_NumSavedPointers);
_ScratchpadStart = realloc(_ScratchpadStart, Size);
_ScratchpadEnd = ((char *)_ScratchpadStart) + Size;
_NextAllocation = _ScratchpadStart;
}
//
// Call to allocate "NumBytes" from the allocator. Will return NULL if there's not enough
// space left in the scratchpad.
//
inline void *Allocate(size_t NumBytes)
{
void *NewNextAllocation = ((char *)_NextAllocation) + NumBytes;
// see if this allocation puts us beyond the limit of the scratchpad
if(NewNextAllocation > _ScratchpadEnd)
return NULL;
// allocation is simple, just return the next allocation pointer!
void *Result = _NextAllocation;
_NextAllocation = NewNextAllocation;
return Result;
}
//
// Call to allocate a new structure from the scratchpad. Constructor is invoked!
//
template<typename T>
T *Allocate()
{
void *Storage = Allocate(sizeof(T));
return new(Storage) T();
}
//
// Call each frame to reset this allocator for use.
//
inline void Reset()
{
_NextAllocation = _ScratchpadStart;
}
//
// Return the current next allocation pointer.
//
inline void *GetNextAllocation()
{
return _NextAllocation;
}
//
// Saves the current state of the next allocation pointer and returns it.
//
// Useful if you have limited space in the frame allocator and can want to release
// temporary allocations before the end of the frame.
//
inline void *Save()
{
assert(_NumSavedPointers < MaxSavedPointer);
_SavedPointers[_NumSavedPointers++] = _NextAllocation;
return _NextAllocation;
}
//
// Restores the last saved allocation pointer.
//
inline void Restore()
{
assert(_NumSavedPointers > 0);
_NumSavedPointers--;
_NextAllocation = _SavedPointers[_NumSavedPointers];
}
};
This implementation is suitable for your production coding needs! Enjoy it!
Alignment Issues
So long as you're allocating blocks of memory based on your optimally packed structures, alignment shouldn't be an issue. Compilers already pack structures for proper alignment. So, sizeof(SomeStruct) returns a size suitable for alignment. However, if you're allocating random chunks of RAM, you may need to ensure your alignment is appropriate for your platform. I'll leave that as an exercise for the reader.
Source Code
Click here to get a copy of the source for this allocator. It's released under the MIT license. Enjoy!
Yep, handy.
Another handy use of this is if you pipeline your game loop - ie in frame N, doing simulation for frame N, and rendering for frame N-1. If you can afford the extra frame of latency. In that case you'll have two of these and persist the memory for two frames.
Even more handy than this if you have a real producer/consumer pair of systems - ring buffers :)
Posted by: MikeNicolella | 09/16/2012 at 05:57 PM
Also, this is *almost* production ready, if you remove those offensive leading underscores ;)
I know everyone scoffs at it, but there's the whole "identifiers with leading underscore + uppercase letter are reserved by by implementation"
Posted by: MikeNicolella | 09/16/2012 at 05:59 PM
LOL! Offensive leading underscores, eh? ;)
Posted by: Stephen Nichols | 09/16/2012 at 07:23 PM