Buddy memory allocation

The buddy memory allocation technique is a memory allocation algorithm that divides memory into partitions to satisfy a memory request as suitably as possible. It repeatedly splits memory into halves to approximate a best fit. According to Donald Knuth, the buddy system was invented in 1963 by Harry Markowitz and was first described by Kenneth C. Knowlton (published 1965).^[1] Buddy memory allocation is relatively easy to implement and supports limited but efficient splitting and coalescing of memory blocks.

Algorithm

There are various forms of the buddy system; those in which each block is subdivided into two smaller blocks are the simplest and most common variety. Every memory block in this system has an order, where the order is an integer ranging from 0 to a specified upper limit. The size of a block of order n is proportional to 2ⁿ, so that the blocks are exactly twice the size of blocks that are one order lower. Power-of-two block sizes make address computation simple, because all buddies are aligned on memory address boundaries that are powers of two. When a larger block is split, it is divided into two smaller blocks, and each smaller block becomes a unique buddy to the other. A split block can only be merged with its unique buddy block, which then reforms the larger block they were split from.

First, the size of the smallest block that can be allocated is chosen. If there were no lower limit at all (e.g., if bit-sized allocations were possible), the system would incur a large memory and computational overhead just to track which parts of memory are allocated and unallocated. On the other hand, a fairly low limit is desirable because it minimizes the average memory wasted per allocation (for requests whose size is not a multiple of the smallest block). Typically, the lower limit is small enough to keep the average waste per allocation low but large enough to avoid excessive bookkeeping overhead. The smallest block size is then taken as the size of an order-0 block, so that all higher orders are power-of-two multiples of this size.

The programmer then has to decide on, or to write code to obtain, the highest possible order that can fit in the remaining available memory space. Since the total available memory in a given computer system may not be a power-of-two multiple of the minimum block size, the largest block size may not span the entire memory of the system. For instance, if the system had 2000 K of physical memory and the order-0 block size was 4 K, the upper limit on the order would be 8, since an order-8 block (256 order-0 blocks, 1024 K) is the biggest block that will fit in memory. Consequently, it is impossible to allocate the entire physical memory in a single chunk; the remaining 976 K of memory would have to be allocated in smaller blocks.
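The arithmetic in the 2000 K example can be checked with a short sketch. The function name `max_order` and the use of byte counts are assumptions made for illustration; they do not come from any particular implementation.

```python
def max_order(total_size: int, min_block: int) -> int:
    """Largest n such that min_block * 2**n still fits in total_size.

    Hypothetical helper: sizes are plain byte counts.
    """
    if min_block <= 0 or total_size < min_block:
        raise ValueError("memory must hold at least one order-0 block")
    units = total_size // min_block   # whole order-0 blocks that fit
    return units.bit_length() - 1     # floor(log2(units))


K = 1024
order = max_order(2000 * K, 4 * K)
largest = (4 * K) << order            # size of one block of the highest order
print(order, largest // K, (2000 * K - largest) // K)   # prints: 8 1024 976
```

The last printed value is the memory left over after the single largest block, which has to be served by smaller blocks.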

Example

The following is an example of what happens when a program makes requests for memory. Assume that in this system the smallest possible block is 64 kilobytes in size and the upper limit for the order is 4, so that the largest allocatable block is 2⁴ × 64 K = 1024 K in size. The following table shows a possible state of the system after various memory requests.

Step	Blocks in address order (2ⁿ = block of 2ⁿ × 64 K; each row totals 1024 K)
1	2⁴
2.1	2³	2³
2.2	2²	2²	2³
2.3	2¹	2¹	2²	2³
2.4	2⁰	2⁰	2¹	2²	2³
2.5	A: 2⁰	2⁰	2¹	2²	2³
3	A: 2⁰	2⁰	B: 2¹	2²	2³
4	A: 2⁰	C: 2⁰	B: 2¹	2²	2³
5.1	A: 2⁰	C: 2⁰	B: 2¹	2¹	2¹	2³
5.2	A: 2⁰	C: 2⁰	B: 2¹	D: 2¹	2¹	2³
6	A: 2⁰	C: 2⁰	2¹	D: 2¹	2¹	2³
7.1	A: 2⁰	C: 2⁰	2¹	2¹	2¹	2³
7.2	A: 2⁰	C: 2⁰	2¹	2²	2³
8	2⁰	C: 2⁰	2¹	2²	2³
9.1	2⁰	2⁰	2¹	2²	2³
9.2	2¹	2¹	2²	2³
9.3	2²	2²	2³
9.4	2³	2³
9.5	2⁴

This allocation could have occurred in the following manner:

1. The initial situation.
2. Program A requests memory 34 K, order 0.
  1. No order 0 blocks are available, so an order 4 block is split, creating two order 3 blocks.
  2. Still no order 0 blocks available, so the first order 3 block is split, creating two order 2 blocks.
  3. Still no order 0 blocks available, so the first order 2 block is split, creating two order 1 blocks.
  4. Still no order 0 blocks available, so the first order 1 block is split, creating two order 0 blocks.
  5. Now an order 0 block is available, so it is allocated to A.
3. Program B requests memory 66 K, order 1. An order 1 block is available, so it is allocated to B.
4. Program C requests memory 35 K, order 0. An order 0 block is available, so it is allocated to C.
5. Program D requests memory 67 K, order 1.
  1. No order 1 blocks are available, so an order 2 block is split, creating two order 1 blocks.
  2. Now an order 1 block is available, so it is allocated to D.
6. Program B releases its memory, freeing one order 1 block.
7. Program D releases its memory.
  1. One order 1 block is freed.
  2. Since the buddy block of the newly freed block is also free, the two are merged into one order 2 block.
8. Program A releases its memory, freeing one order 0 block.
9. Program C releases its memory.
  1. One order 0 block is freed.
  2. Since the buddy block of the newly freed block is also free, the two are merged into one order 1 block.
  3. Since the buddy block of the newly formed order 1 block is also free, the two are merged into one order 2 block.
  4. Since the buddy block of the newly formed order 2 block is also free, the two are merged into one order 3 block.
  5. Since the buddy block of the newly formed order 3 block is also free, the two are merged into one order 4 block.

In general, a memory request is handled as follows:

If memory is to be allocated

1. Look for a free memory slot of a suitable size (the smallest 2^k block that is larger than or equal to the requested memory).
  1. If one is found, it is allocated to the program.
  2. If not, the system tries to make a suitable memory slot:
    1. Split a free memory slot larger than the suitable size in half.
    2. Go back to step 1 and repeat until a slot of the suitable size is found. Splitting never goes below the lower limit, so a request smaller than an order-0 block simply receives an order-0 block.
    3. If no larger free slot exists, the request cannot be satisfied.

If memory is to be freed

1. Free the block of memory.
2. Look at the block's buddy: is it free too?
3. If it is, combine the two, and go back to step 2. Repeat until either the upper limit is reached (all memory is freed) or a non-free buddy is encountered.
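The two procedures above can be sketched as a toy allocator that works on offsets rather than real memory. This is only an illustrative sketch under stated assumptions: the class name `BuddyAllocator`, its methods, the per-order free sets, and the policy of always taking the lowest free offset are hypothetical choices, not the design of any specific system.

```python
class BuddyAllocator:
    """Toy buddy allocator over offsets 0 .. min_block * 2**max_order (assumed layout)."""

    def __init__(self, min_block: int, max_order: int):
        self.min_block = min_block
        self.max_order = max_order
        # free[k] holds the start offsets of the free blocks of order k.
        self.free = [set() for _ in range(max_order + 1)]
        self.free[max_order].add(0)
        self.allocated = {}  # start offset -> order of the allocated block

    def order_for(self, size: int) -> int:
        """Smallest order whose block size is at least `size`."""
        order = 0
        while (self.min_block << order) < size:
            order += 1
        return order

    def alloc(self, size: int) -> int:
        order = self.order_for(size)
        # Find the smallest order >= `order` that has a free block.
        k = order
        while k <= self.max_order and not self.free[k]:
            k += 1
        if k > self.max_order:
            raise MemoryError("no free block large enough")
        offset = min(self.free[k])        # lowest address first (assumed policy)
        self.free[k].remove(offset)
        # Split in half until the block has the wanted order,
        # keeping the lower half and freeing the upper half each time.
        while k > order:
            k -= 1
            self.free[k].add(offset + (self.min_block << k))
        self.allocated[offset] = order
        return offset

    def release(self, offset: int) -> None:
        order = self.allocated.pop(offset)
        # Merge with the buddy for as long as the buddy is free.
        while order < self.max_order:
            buddy = offset ^ (self.min_block << order)
            if buddy not in self.free[order]:
                break
            self.free[order].remove(buddy)
            offset = min(offset, buddy)
            order += 1
        self.free[order].add(offset)


K = 1024
heap = BuddyAllocator(min_block=64 * K, max_order=4)
a = heap.alloc(34 * K)   # order 0
b = heap.alloc(66 * K)   # order 1
c = heap.alloc(35 * K)   # order 0
d = heap.alloc(67 * K)   # order 1
print([x // K for x in (a, b, c, d)])   # prints: [0, 128, 64, 256]
for block in (b, d, a, c):
    heap.release(block)
print(heap.free[4])                      # prints: {0}
```

Replaying the requests from the example yields the same block placement as the table, and after the last release the free blocks merge back into a single order-4 block.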

Implementation and efficiency

In comparison to simpler techniques such as dynamic allocation, the buddy memory system has little external fragmentation and allows free blocks to be coalesced with little overhead. Freeing memory is fast: a single release needs at most as many merges as the highest order, which is O(log₂(total memory size / smallest block size)). Typically the buddy memory allocation system is implemented with a binary tree that represents used or unused split memory blocks. The address of a block's "buddy" is equal to the bitwise exclusive OR (XOR) of the block's address and the block's size, provided that addresses are measured relative to the start of the managed region so that every block is aligned to its own size.
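The XOR rule can be illustrated with a small sketch. The helper name `buddy_of` is hypothetical, and offsets are assumed to be relative to the start of the managed region.

```python
def buddy_of(offset: int, size: int) -> int:
    """Offset of the buddy of the block that starts at `offset`.

    Assumes `size` is a power of two and `offset` is a multiple of `size`,
    measured from the start of the managed region.
    """
    if size <= 0 or size & (size - 1):
        raise ValueError("size must be a power of two")
    if offset % size:
        raise ValueError("block must be aligned to its size")
    return offset ^ size   # flips the single bit that distinguishes the two buddies


K = 1024
print(buddy_of(0, 64 * K) // K)          # prints: 64
print(buddy_of(64 * K, 64 * K) // K)     # prints: 0
print(buddy_of(384 * K, 128 * K) // K)   # prints: 256
```

Because the relation is symmetric, applying `buddy_of` twice with the same size returns the original offset.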

However, the problem of internal fragmentation remains: memory is wasted when a request is a little larger than a small block but much smaller than the next larger block. Because of the way the buddy memory allocation technique works, a program that requests 66 K of memory would be allocated 128 K, which wastes 62 K of memory. This problem can be mitigated by slab allocation, which may be layered on top of the coarser buddy allocator to provide finer-grained allocation.
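The waste for a given request can be computed with a short sketch. It assumes the 64 K smallest block from the earlier example; the function name `granted_size` is hypothetical.

```python
def granted_size(request: int, min_block: int) -> int:
    """Size of the smallest power-of-two multiple of min_block that holds `request`."""
    size = min_block
    while size < request:
        size *= 2
    return size


K = 1024
for request in (34 * K, 66 * K, 128 * K):
    granted = granted_size(request, 64 * K)
    # request, block actually handed out, internal fragmentation (all in K)
    print(request // K, granted // K, (granted - request) // K)
# prints:
# 34 64 30
# 66 128 62
# 128 128 0
```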

One version of the buddy allocation algorithm was described in detail by Donald Knuth in volume 1 of The Art of Computer Programming.^[2] The Linux kernel also uses the buddy system, with further modifications to minimise external fragmentation, along with various other allocators to manage the memory within blocks.^[3]

jemalloc^[4] is a modern memory allocator that employs, among others, the buddy technique.

References

[1] Knowlton, Kenneth C. (October 1965). A fast storage allocator. Communications of the ACM 8 (10): 623–625. See also Knowlton, Kenneth C. (August 1966). A programmer's description of L6. Communications of the ACM 9 (8): 616–625.
[2] Knuth, Donald (1997). Fundamental Algorithms. The Art of Computer Programming. Vol. 1 (Second ed.). Reading, Massachusetts: Addison-Wesley. pp. 435–455. ISBN 0-201-89683-4.
[3] Mauerer, Wolfgang (October 2008). Professional Linux Kernel Architecture. Wrox Press. ISBN 978-0-470-34343-2.
[4] Evans, Jason (16 April 2006). A Scalable Concurrent malloc(3) Implementation for FreeBSD (PDF). pp. 4–5.
