Distributed Memory Programming

4. Time Hierarchy

Different amounts of memory exist at different "time-distances" from any given processor -- the longer you're willing to wait, the more memory you have to make use of.

Access

Intra-node (via memory bus)

Within the node, there are essentially three levels of memory, distinguished by both capacity and "time-distance". This does not count what may be called mass storage: what we're looking at here is memory that all uses the same communication mechanism, and, within the node, that's the internal memory bus:

  1. Registers (single cycle)

    Registers are banks of fast-access data cells, sometimes split by both type of usage (instruction or data) and type of data (integer or floating point). There are typically between 10 and 200 or so registers, and the more registers, the more expensive the chip. Access is on the order of a single cycle, as the registers are, in effect, directly addressable right on the chip.

  2. Cache (a couple of cycles)

    Caches are small banks of memory, usually built from faster (and more expensive) static RAM rather than the dynamic RAM that makes up the next category, main memory, and are located either on the CPU chip itself or right on the motherboard alongside it. Caches, too, can be found split between instruction and data, and usually range in size from 100KB up to 0.5MB or more, with access times usually lower than 10 cycles.

  3. Main memory (tens of cycles)

    Main memory, once the darling of applications programmers, is now increasingly relegated to the role of working storage, while sophisticated schemes are used by both compiler and programmer to ensure that important data is loaded into either registers or cache well before it is needed. Main memory, even for personal computers, is rarely less than 4MB, and, for high-end workstations, can range up to 1GB. Access time is generally under 100 cycles.
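To put a number on "time-distance" within the node, here is a minimal Python sketch that estimates the average cost of a memory reference when a cache sits in front of main memory. The cycle counts are picked loosely from the ranges quoted above (a couple of cycles for cache, tens for main memory); the hit rates and the function name `average_access_cycles` are illustrative assumptions, not measurements of any particular machine.

```python
def average_access_cycles(cache_cycles, memory_cycles, cache_hit_rate):
    """Expected cycles per memory reference with a single cache level.

    Every reference pays for the cache lookup; a miss additionally
    pays for the trip out to main memory.
    """
    if not 0.0 <= cache_hit_rate <= 1.0:
        raise ValueError("cache_hit_rate must be between 0 and 1")
    miss_rate = 1.0 - cache_hit_rate
    return cache_cycles + miss_rate * memory_cycles


if __name__ == "__main__":
    # Assumed figures: a 2-cycle cache in front of 50-cycle main memory.
    for hit_rate in (0.50, 0.90, 0.99):
        cycles = average_access_cycles(2, 50, hit_rate)
        print(f"hit rate {hit_rate:.0%}: about {cycles:.1f} cycles per reference")
```

Under these assumed figures, the average cost moves from close to main-memory speed toward cache speed as the hit rate climbs, which is why compilers and programmers work so hard to get data into cache ahead of time.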

External I/O (rotating media via I/O bus) (millions of cycles)

Long-term storage that is still intended for well-nigh immediate access is the domain of the various kinds of disks, be they cartridge, SCSI, RAID-array, CD, etc. This is distinct from "long-term storage where we don't know if we'll ever want it again, but we might", i.e., tape, where the amount of data that can be stored is virtually infinite, as is the amount of time necessary to find what you so desperately need when a squirrel fries both itself on a transformer and your disk-pack with the resulting surge and power loss.

Disks actually serve two distinct uses:

  • Virtual memory

    In order to allow computers to concurrently execute many more programs than can actually fit into the available main memory, programs are loaded as separate pages, each of a size that fits naturally within the segmentation setup of main memory. Only those pages actually needed (or most recently used, or selected by other, more sophisticated criteria) are kept in main memory. All other pages associated with the program are written out to virtual-memory storage areas on disk, paged in as needed, and paged out when they are not needed (and something else is).

  • Application defined file I/O

    Whether the "application" is your shell, or some program that actually does something important, anything that reads from or writes to "files" ends up, sooner or later, accessing disk. Lots of things can get in the way: filling up I/O buffers (in order to make the communication more "efficient"), higher-priority tasks (disk I/O is characteristically considered fairly low-priority in the whole scheme of things), queued-up requests, other traffic on the bus, waiting for the right sector to come under the heads. When you look at the whole suite of considerations, you sometimes marvel that, as long as a disk access takes, it ever gets done at all ... if it does, of course.
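The paging behavior described under virtual memory can be sketched with a small simulation. The Python below counts page faults for a sequence of page references under least-recently-used (LRU) replacement, which is just one of the "last used" style criteria mentioned above; real operating systems use various approximations, and the function name, the reference sequence, and the frame counts here are all made up for illustration.

```python
from collections import OrderedDict


def count_page_faults(reference_string, frame_count):
    """Count page faults under least-recently-used (LRU) replacement."""
    if frame_count < 1:
        raise ValueError("frame_count must be at least 1")
    resident = OrderedDict()  # pages in main memory, least recently used first
    faults = 0
    for page in reference_string:
        if page in resident:
            # Hit: the page is already in memory; mark it most recently used.
            resident.move_to_end(page)
            continue
        faults += 1  # Miss: the page has to be paged in from disk.
        if len(resident) >= frame_count:
            # Memory is full: page out the least recently used page.
            resident.popitem(last=False)
        resident[page] = None
    return faults


if __name__ == "__main__":
    references = [1, 2, 3, 1, 4, 1, 2, 5, 1, 2]  # hypothetical page numbers
    for frames in (2, 3, 4):
        print(frames, "frames:", count_page_faults(references, frames), "faults")
```

Each fault counted here stands for a disk access, so with the access times discussed in this section, even a handful of extra faults costs far more than all of the in-memory hits put together.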

Can you remember when PCs came complete with (gasp!) a "10MB hard-drive"? Today, 10MB is sneered at for main memory -- you can't even hold an operating system on a 10MB disk, much less actually boot it. You could buy hard drives in the under-100MB range, but who'd want to, since it costs about the same to get anywhere from 200 to 300MB? If you've got the money, a single drive holding 1GB will cost you on the order of $750, and terabytes (10^12 bytes) of disk are not uncommon in large supercomputer facilities. Access time is in the not-unexpected (given what was said earlier) 10^6 cycles range.

Inter-node (via I/O bus and interconnection network)

Attaching many fully-loaded uni-processors to a communications network, and giving them the means to request data from one another, opens up a whole new dimension of storage possibilities:

  • Contents of memory modules on other processors (hundreds to thousands of cycles)

    If we're interested in data that's already been loaded into some remote main memory (or cache, or registers, but main memory is the limiting case here), then the time it takes the remote processor to act on the request (see the earlier description of main memory, and consider it from the point of view of the remote processor being the local one) is completely overshadowed by the time it takes to communicate that data back to the requester. Network latencies and bandwidths being what they are, and assuming a local-area network (i.e., a cluster-computing situation), hundreds to thousands of cycles isn't claiming too much. If we widen our scope to include wide-area networks, even out to cross-country ones, but also allow ourselves the most state-of-the-art networking architecture (i.e., vBNS), current capabilities are in the 20MB/sec range, while it's expected that 200-300MB/sec will be achievable around the Millennium.

    One late-breaking note: a recent announcement claims that a cluster-computing communications benchmark, using the newly developed SCI (Scalable Coherent Interface) protocol, has demonstrated inter-processor latencies on the same order as intra-processor ones, i.e., the hundreds to thousands of cycles have dropped to the ~10 cycle range.

  • Contents of disks on other processors (millions to tens of millions of cycles)

    Once you've gotten the data request all the way across the network to the remote processor that controls the data, it's reasonable to expect that some of your requests will require a remote disk access. Here again, the time it takes for the remote processor to access its disk is so much greater than the time it takes to return the data that the entire operation is comparable to the local processor accessing its own disk. Okay, maybe on a moderate-use day, with a few people playing xtrek and a few more running heavy emacs sessions, it takes a bit longer, but, still, this is no more than a few extra finger-drums while you stare blankly at your screen.
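The two inter-node cases come down to the same bit of arithmetic: a remote request costs the network trip plus whatever the remote node spends servicing it, and one of the two terms swamps the other. The Python sketch below works through that, using round numbers picked from the ranges quoted in this section (about 1,000 cycles on the network, about 50 for a main-memory access, about 1,000,000 for a disk access). The function names and the specific figures are assumptions chosen for illustration only.

```python
def remote_access_cycles(network_cycles, service_cycles):
    """Total cycles for a remote request: network trip plus remote service."""
    if network_cycles < 0 or service_cycles < 0:
        raise ValueError("cycle counts must be non-negative")
    return network_cycles + service_cycles


def network_share(network_cycles, service_cycles):
    """Fraction of the total time spent on the network."""
    total = remote_access_cycles(network_cycles, service_cycles)
    if total == 0:
        raise ValueError("total cycle count must be positive")
    return network_cycles / total


if __name__ == "__main__":
    NETWORK = 1_000  # assumed local-area network cost, in cycles
    cases = [
        ("remote main memory", 50),       # assumed memory service time
        ("remote disk", 1_000_000),       # assumed disk service time
    ]
    for label, service in cases:
        total = remote_access_cycles(NETWORK, service)
        share = network_share(NETWORK, service)
        print(f"{label}: {total} cycles, {share:.1%} spent on the network")
```

With these assumed numbers, the network accounts for nearly all of a remote memory access and almost none of a remote disk access, which is the sense in which a remote disk access is "comparable" to a local one.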