Interleaved memory

In computing, interleaved memory is a design which compensates for the relatively slow speed of dynamic random-access memory (DRAM) or core memory, by spreading memory addresses evenly across memory banks. That way, contiguous memory reads and writes use each memory bank in turn, resulting in higher memory throughput due to reduced waiting for memory banks to become ready for the operations.

It differs from multi-channel memory architectures primarily in that interleaved memory does not add more channels between the main memory and the memory controller. However, channel interleaving is also possible, for example in Freescale i.MX6 processors, which allow interleaving between two channels.^[citation needed]

Overview

With interleaved memory, memory addresses are allocated to each memory bank in turn. For example, in an interleaved system with two memory banks (assuming word-addressable memory), if logical address 32 belongs to bank 0, then logical address 33 would belong to bank 1, logical address 34 would belong to bank 0, and so on. An interleaved memory is said to be n-way interleaved when there are $n$ banks and memory location $i$ resides in bank $i \bmod n$.
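The mapping above can be sketched in a few lines of Python. This is only an illustration: the function name `locate` and the choice of $\lfloor i / n \rfloor$ as the offset within a bank are assumptions for the example, not something the text prescribes.

```python
def locate(address: int, n_banks: int) -> tuple[int, int]:
    """Return (bank, offset within bank) for a word address.

    Assumes n-way interleaving where word i lives in bank i mod n.
    Using i // n as the offset inside the bank is an illustrative choice.
    """
    return address % n_banks, address // n_banks


# Two-way interleaving, as in the example with addresses 32, 33, 34.
for addr in range(32, 36):
    bank, offset = locate(addr, 2)
    print(f"address {addr} -> bank {bank}, offset {offset}")
# address 32 -> bank 0, offset 16
# address 33 -> bank 1, offset 16
# address 34 -> bank 0, offset 17
# address 35 -> bank 1, offset 17
```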

Memory interleaving example with 4 banks. Red banks are refreshing and can't be used.

With interleaved memory, contiguous reads (common both in multimedia and in program execution) and contiguous writes (frequent when filling storage or communication buffers) use each memory bank in turn instead of hitting the same bank repeatedly. Because each bank has a minimum waiting time between successive reads and writes, spreading accesses across banks results in significantly higher memory throughput.
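A toy timing model makes the throughput effect concrete. The model is an assumption made for illustration, not a description of any real controller: at most one access is issued per cycle, and a bank that has just been accessed cannot accept another access for `bank_busy` cycles.

```python
def cycles_for_sequential(n_accesses: int, n_banks: int, bank_busy: int) -> int:
    """Cycles needed to issue n_accesses consecutive word accesses.

    Assumed model: one access can be issued per cycle, and each bank is
    unavailable for bank_busy cycles after it starts an access.
    """
    ready_at = [0] * n_banks  # earliest cycle at which each bank is free
    now = 0
    for address in range(n_accesses):
        bank = address % n_banks          # n-way interleaving
        start = max(now, ready_at[bank])  # wait if the bank is still busy
        ready_at[bank] = start + bank_busy
        now = start + 1
    return now


for banks in (1, 2, 4):
    print(banks, "bank(s):", cycles_for_sequential(16, banks, bank_busy=4), "cycles")
# 1 bank(s): 61 cycles
# 2 bank(s): 30 cycles
# 4 bank(s): 16 cycles
```

In this model, once the number of banks reaches the bank busy time, a contiguous run no longer stalls on any bank.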

Interleaved DRAM

Main memory (random-access memory, RAM) is usually composed of a collection of DRAM memory chips, a number of which can be grouped together to form a memory bank. With a memory controller that supports interleaving, these banks can then be laid out so that they are interleaved.

Data in DRAM is stored in units of pages. Each DRAM bank has a row buffer that serves as a cache for accessing any page in the bank. Before a page in the DRAM bank is read, it is first loaded into the row buffer. If the requested page is already in the row buffer (a row-buffer hit), the access has the shortest possible latency and completes in one memory cycle. Otherwise the access is a row-buffer miss, also called a row-buffer conflict, and is slower because the new page has to be loaded into the row buffer before it is read. Row-buffer misses happen when successive requests to the same bank target different memory pages, and each one incurs a substantial delay. In contrast, memory accesses to different banks can proceed in parallel with high throughput.
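The hit and miss behaviour described above can be sketched with a small counter. This is a simplified model under stated assumptions: each bank holds exactly one open page in its row buffer, and the function name and the `(bank, page)` request format are hypothetical.

```python
def count_row_buffer_events(accesses, n_banks):
    """Count row-buffer hits and misses for a sequence of (bank, page) requests.

    Assumed model: each bank keeps exactly one page open in its row buffer;
    a request for any other page in that bank replaces it (a miss).
    """
    open_page = [None] * n_banks  # page currently held in each row buffer
    hits = misses = 0
    for bank, page in accesses:
        if open_page[bank] == page:
            hits += 1
        else:
            misses += 1
            open_page[bank] = page
    return hits, misses


# Two pages alternating in the same bank: every access is a conflict.
print(count_row_buffer_events([(0, 0), (0, 1), (0, 0), (0, 1)], n_banks=2))
# (0, 4)

# The same two pages placed in different banks: only the first touches miss.
print(count_row_buffer_events([(0, 0), (1, 1), (0, 0), (1, 1)], n_banks=2))
# (2, 2)
```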

The issue of row-buffer conflicts has been well studied, and an effective solution exists.^[1] The size of a row buffer is normally the size of a memory page managed by the operating system. Row-buffer conflicts or misses come from a sequence of accesses to different pages in the same memory bank. The study^[1] shows that a conventional memory interleaving method would propagate address-mapping conflicts at the cache level to the memory address space, causing row-buffer misses in a memory bank. The permutation-based interleaved memory method solved the problem with a trivial microarchitecture cost.^[1] Sun Microsystems quickly adopted the permutation interleaving method in their products.^[2] This patent-free method can be found in many commercial microprocessors from vendors such as AMD, Intel and NVIDIA, for embedded systems, laptops, desktops, and enterprise servers.^[3]
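The idea behind a permutation-based mapping can be sketched as follows. This is a minimal illustration, assuming the scheme forms the bank index by XORing the conventional bank-index bits with an equal number of low-order cache-tag bits. The field widths (`PAGE_BITS`, `BANK_BITS`, `TAG_SHIFT`) are hypothetical values chosen for the example and are not taken from the cited study.

```python
PAGE_BITS = 11  # hypothetical: 2 KiB page (row) per bank
BANK_BITS = 2   # hypothetical: 4 banks
TAG_SHIFT = 20  # hypothetical: bit position where the cache tag starts

BANK_MASK = (1 << BANK_BITS) - 1


def conventional_bank(address: int) -> int:
    """Bank index taken directly from the bits just above the page offset."""
    return (address >> PAGE_BITS) & BANK_MASK


def permuted_bank(address: int) -> int:
    """Bank index XORed with the low-order tag bits (assumed scheme)."""
    tag_bits = (address >> TAG_SHIFT) & BANK_MASK
    return conventional_bank(address) ^ tag_bits


# Addresses that differ only in their tag bits would conflict in the cache.
addrs = [i << TAG_SHIFT for i in range(4)]
print([conventional_bank(a) for a in addrs])  # [0, 0, 0, 0]
print([permuted_bank(a) for a in addrs])      # [0, 1, 2, 3]
```

Because XOR with a fixed value is a bijection on the bank indices, consecutive pages that share the same tag bits still land in distinct banks, so the spreading of contiguous accesses is preserved.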

In traditional (flat) layouts, each memory bank is allocated a contiguous block of memory addresses. This is very simple for the memory controller and, for completely random access patterns, performs as well as interleaving. In practice, however, memory reads are rarely random because of locality of reference, and interleaved layouts perform far better for accesses to nearby addresses.
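The difference between the two layouts for a contiguous run can be seen by listing which bank serves each address. The sketch below assumes four banks of 1024 words each; the sizes and function names are illustrative only.

```python
def flat_bank(address: int, bank_size: int) -> int:
    """Flat layout: each bank owns one contiguous block of bank_size words."""
    return address // bank_size


def interleaved_bank(address: int, n_banks: int) -> int:
    """Interleaved layout: consecutive words rotate through the banks."""
    return address % n_banks


run = range(32, 40)  # eight consecutive word addresses
print([flat_bank(a, bank_size=1024) for a in run])
# [0, 0, 0, 0, 0, 0, 0, 0]
print([interleaved_bank(a, n_banks=4) for a in run])
# [0, 1, 2, 3, 0, 1, 2, 3]
```

In the flat layout the whole run queues on bank 0, while the interleaved layout rotates it across all four banks.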

The way memory is addressed has no effect on the access time for memory locations which are already cached, having an impact only on memory locations which need to be retrieved from DRAM.

History

Early research into interleaved memory was performed at IBM in the 1960s and 1970s in relation to the IBM 7030 Stretch computer,^[4] and development continued for decades afterwards, improving design, flexibility and performance to produce modern implementations.

References

[1] Zhao Zhang, Zhichun Zhu, and Xiaodong Zhang (2000). A Permutation-based Page Interleaving Scheme to Reduce Row-buffer Conflicts and Exploit Data Locality. MICRO-33.
[2] "Sun letter to the Director of the Technology Transfer Office of the College of William and Mary" (PDF). July 15, 2005.
[3] "Professor Xiaodong Zhang Receives 2020 ACM Microarchitecture Test of Time Award". Department of Computer Science and Engineering, College of Engineering, Ohio State University. January 19, 2021.
[4] Mark Smotherman (July 2010). "IBM Stretch (7030) — Aggressive Uniprocessor Parallelism". clemson.edu. Retrieved 2013-12-07.
