Distributed Memory Programming

2.1.6 Interleaved

[Figure: Interleave]

On shared-memory machines such as the Cray Y-MP, certain data-object dimensions (e.g., the extents of a multidimensional array) and strides give rise to repeated accesses to the same memory bank. A stride is the spacing in memory between successive elements that one operation processes together. For example, fetching a row of a column-major matrix, as a linear transformation may require, steps through memory by the length of an entire column.
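To make the interaction between array dimensions and strides concrete, here is a minimal Python sketch. It assumes low-order interleaving, where word address `a` lives in bank `a mod B`, and a column-major matrix layout. The bank count of 8, the matrix sizes, and the function names `bank_of` and `row_banks` are hypothetical choices for illustration, not parameters of the Cray Y-MP.

```python
def bank_of(address: int, num_banks: int) -> int:
    """Bank holding a word under (assumed) low-order interleaving."""
    return address % num_banks


def row_banks(row: int, n_rows: int, n_cols: int, num_banks: int) -> list[int]:
    """Banks touched when reading one row of a column-major n_rows x n_cols matrix.

    Element (i, j) sits at word address i + j * n_rows, so walking along a row
    advances the address by n_rows: the stride equals the column length.
    """
    return [bank_of(row + j * n_rows, num_banks) for j in range(n_cols)]


NUM_BANKS = 8  # hypothetical bank count

# Column length 64 is a multiple of 8, so every element of the row hits bank 0.
print(row_banks(0, 64, 8, NUM_BANKS))  # [0, 0, 0, 0, 0, 0, 0, 0]

# Column length 65 (one extra word per column) spreads the row over all 8 banks.
print(row_banks(0, 65, 8, NUM_BANKS))  # [0, 1, 2, 3, 4, 5, 6, 7]
```

The second call shows that the conflict comes from the array dimension, not from the data itself: the same row access is conflict-free once the column length is no longer a multiple of the bank count.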

The processor's clock cycle is much shorter than the memory access time, and a bank that has just delivered a word needs time to recover before it can begin another access. Repeatedly hitting the same bank therefore leads to very poor performance. For this reason, high-performance memory systems are typically interleaved: successive words are placed in physically distinct memory banks (see illustration above, and notice how successive rows for a given column are each handled by a different memory bank). This improves bandwidth for unit-stride access (consecutive words), but it is catastrophic for strided access when the stride is a multiple of the number of memory banks, because every element then maps to the same bank. Assuming the bank is single-ported (i.e., it has only one I/O interface to the communications bus), it can return only one data element at a time. When it is called upon to deliver more than one element in the same logical operation, each additional element must wait while the bank recovers.
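The cost of bank conflicts can be estimated with a small timing model. The Python sketch below is an illustration under stated assumptions, not a model of any particular machine: low-order interleaving, at most one request issued per clock cycle in program order, and a fixed bank recovery time `bank_busy`. The function names and the parameter values (8 banks, 4-cycle recovery, 64 loads) are hypothetical. It also counts the distinct banks a constant-stride stream touches, `B / gcd(stride, B)`, which generalizes the "multiple of the number of banks" case.

```python
from math import gcd


def access_cycles(stride: int, n_accesses: int, num_banks: int, bank_busy: int) -> int:
    """Cycles until n strided loads complete under a simple in-order model.

    Assumptions (illustrative only):
      * word address a lives in bank a % num_banks (low-order interleaving);
      * the processor issues at most one request per clock cycle, in order;
      * a bank that starts an access is busy for bank_busy cycles and cannot
        start another until it has recovered.
    """
    free_at = [0] * num_banks  # cycle at which each bank can next start an access
    next_issue = 0             # earliest cycle the next request may issue
    finish = 0
    for k in range(n_accesses):
        bank = (k * stride) % num_banks
        start = max(next_issue, free_at[bank])  # stall until the bank recovers
        free_at[bank] = start + bank_busy
        finish = start + bank_busy
        next_issue = start + 1  # in-order issue: one request per cycle at most
    return finish


def banks_used(stride: int, num_banks: int) -> int:
    """Distinct banks visited by a constant-stride stream: B / gcd(stride, B)."""
    return num_banks // gcd(stride, num_banks)


for stride in (1, 2, 4, 8, 16):
    print(stride, banks_used(stride, 8), access_cycles(stride, 64, 8, 4))
# 1 8 67
# 2 4 67
# 4 2 129
# 8 1 256
# 16 1 256
```

Under these assumptions, stride 2 still runs at full speed because four banks in rotation cover the four-cycle recovery time, stride 4 roughly halves throughput, and any stride that is a multiple of 8 serializes every access on a single bank.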