Distributed Memory Programming

9.3 Scaling

Now the problem becomes one of determining how well your application will handle growth: bigger data sets, more processors, more memory, more, more, MORE! It's not simply a matter of making your data arrays larger or tossing more processors at your problem. In fact, any decision you make about growing one part of the picture will probably end up affecting every other part of it.

The intent here isn't to talk you out of scaling up your application, but to help you understand that doing so intelligently will probably mean scaling up several dimensions of the environment together, so that the system as a whole stays balanced.

  1. Independent variables

    These are the components that are relatively unaffected by changes elsewhere, but which tend to have massive effects on everything else when they themselves change:

    • Problem size

      Once your program is tuned to your initial data set and the answers coming out look the way they should, you almost invariably get an overwhelming urge to see what happens when you crank up the data volume: let's stop tossing toy problems at this thing and see what it can do on some real data! More timesteps, higher precision, more frequent sampling, sampling over a larger domain ... making things more realistic, and necessarily more complex. But just because it's more realistic doesn't mean that it's going to run, as we'll see a bit further on in the discussion.

    • Processor count

      You generally start small, just to get things running, and to have a smaller area to search for the inevitable bugs that hatch in new codes. But once things seem to be running smoothly on 1, 2 and 4 processors, then, just as with problem size, you suddenly start asking yourself, "I wonder what would happen if I turned the full 2 million DEC-Alphas in my closet loose on this problem?"

      Just as with problem size, the number of processors you have available is not itself affected by anything else that might change in the application environment ... after all, they're going to keep sucking watts and churning out cycles regardless of what's happening to your application. But their addition to the computational environment will have very marked effects on the other components.

  2. Dependent variables

    The following components of your execution environment behave in ways that are affected both by the independent variables just described and by the other dependent ones discussed here:

    • Execution time between communication events

      The more processors you use for a given amount of data, the less data, in general, gets distributed to any particular processor, and the less work each one will (again, in general) have to do, which drives down the amount of time separating communication events. The less time between messages, the higher the load on the network and the more use the buffering system gets, which cuts down on the amount of memory available for other things, and so on.

      In general, you want to strive for as high a computation-to-communication ratio as you can achieve, simply because processors are so much better at computing than networks are at carrying communication between them. However, it isn't quite that cut-and-dried: in many applications, the more computation you do, the more space is taken up by material waiting to be communicated, and that leads to other potential bottlenecks. Even so, it's usually better to put off communication until you've got a lot of it to do, and then do it all at once.
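      To put rough numbers on that trade-off, here is a minimal sketch of a toy model. It assumes the data divides evenly across processors and that each communication event costs a fixed amount of time; the function names and every number in it are made up for illustration, not measured on any real machine.

```python
# Toy model only: even data distribution and a fixed cost per
# communication event are simplifying assumptions.

def compute_time_per_step(total_items, procs, seconds_per_item):
    """Time one processor spends computing between two communication events."""
    return (total_items / procs) * seconds_per_item


def comp_to_comm_ratio(total_items, procs, seconds_per_item, comm_seconds):
    """Computation time divided by communication time for one step."""
    compute = compute_time_per_step(total_items, procs, seconds_per_item)
    return compute / comm_seconds


if __name__ == "__main__":
    # Hypothetical values: 1e6 items, 1 microsecond of work per item,
    # 1 millisecond per communication event.
    for procs in (1, 2, 4, 8, 16, 64):
        t = compute_time_per_step(1_000_000, procs, 1e-6)
        r = comp_to_comm_ratio(1_000_000, procs, 1e-6, 1e-3)
        print(f"{procs:3d} procs: {t:.6f} s between messages, ratio {r:8.1f}")
```

      In this model, doubling the processor count for a fixed problem size halves both the time between messages and the ratio, which is the effect described above.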

    • Per node memory requirements

      As data size goes up, the amount of data held on a given node probably goes up too; more messages mean more space needed for buffers; more processors may mean larger routing tables ...

      The amount of memory (both RAM and disk) that you have available on a given node is usually a parameter that you have no control over -- it's a constraint of the environment that you have to live within and make allowances for. So you have to be aware of the space requirements that scaling up your application will translate into, and be ready to scale back down to something that is actually supportable within your particular execution environment ... and this means looking at the entire computational context, all of the users and their applications, not simply your own.
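      A back-of-the-envelope check along these lines can be sketched as follows. The breakdown into data, message buffers, and routing entries follows the list above, but the specific model (a fixed number of buffers and one routing entry per peer, plus a fraction of the node reserved for other users) is an assumption made for illustration; real message-passing systems allocate their space differently.

```python
# Rough per-node memory estimate. All parameters are hypothetical.

def per_node_bytes(total_elements, procs, bytes_per_element,
                   buffers_per_peer, buffer_bytes, routing_entry_bytes):
    """Estimate the bytes one node needs for data, buffers, and routing."""
    share = -(-total_elements // procs)        # ceiling division
    data = share * bytes_per_element
    peers = procs - 1                          # every other processor
    buffers = peers * buffers_per_peer * buffer_bytes
    routing = peers * routing_entry_bytes
    return data + buffers + routing


def fits_on_node(needed_bytes, node_bytes, reserved_fraction=0.25):
    """True if the estimate fits once a share is left for everyone else."""
    usable = node_bytes * (1.0 - reserved_fraction)
    return needed_bytes <= usable


if __name__ == "__main__":
    node_bytes = 2 * 1024**3                   # assume 2 GiB per node
    for procs in (4, 16, 64, 256):
        need = per_node_bytes(10**9, procs, 8, 2, 64 * 1024, 64)
        print(procs, need, fits_on_node(need, node_bytes))
```

      Note that adding processors shrinks the data term but grows the buffer and routing terms, so the total does not fall in simple proportion to the processor count.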

    • Size and number of messages

      More data usually leads to more messages, larger ones, or both; more processors usually lead to more messages, both data and control. In general, it is considered safer and more economical to send fewer, larger messages than many smaller ones, although how well this maxim applies depends quite heavily on the latency and bandwidth characteristics of the communications environment.
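      One common way to reason about this is a simple latency-plus-bandwidth cost model, in which each message pays a fixed startup cost and then a per-byte cost. The sketch below assumes that model; the function name and the numbers are illustrative only, and the model ignores contention and any message-size limits.

```python
# Latency/bandwidth cost model: time = n * (latency + bytes / bandwidth).
# This is a simplification, not a description of any particular network.

def transfer_time(n_messages, bytes_per_message, latency_s, bandwidth_bps):
    """Total time to send n_messages of the given size, one after another."""
    return n_messages * (latency_s + bytes_per_message / bandwidth_bps)


if __name__ == "__main__":
    latency = 1e-4        # assumed 100 microseconds per message
    bandwidth = 1e8       # assumed 100 MB/s, in bytes per second
    total = 1_000_000     # the same 1 MB payload either way
    for n in (1, 10, 100, 1000):
        t = transfer_time(n, total // n, latency, bandwidth)
        print(f"{n:5d} messages of {total // n:8d} bytes: {t:.4f} s")
```

      Under this model, the penalty for splitting a payload into many messages is entirely the repeated latency term: on a network with negligible latency the two strategies cost about the same, which is why the maxim depends on the characteristics of your environment.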

    • Number of global synchronization events

      The more processors you have, the more control messages need to be sent around. You should try to keep the number of synchronization events required by your application to a minimum, simply because every one you use adds to the overall latency experienced in your runs. But it's better to have longer-running codes producing correct results than faster ones producing errors, so use them if the design calls for it.
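      As a rough illustration of how that latency adds up, the sketch below assumes each global synchronization completes in ceil(log2(P)) rounds of messages, as in tree-style or dissemination-style barrier schemes, with each round costing one network latency. Whether your system's barrier behaves this way is an assumption; the function names and numbers are hypothetical.

```python
# Toy cost model for global synchronization. Assumes a barrier that
# needs ceil(log2(procs)) message rounds, each costing one latency.

def barrier_rounds(procs):
    """ceil(log2(procs)) for procs >= 1, using integer arithmetic."""
    return (procs - 1).bit_length()


def total_sync_time(procs, n_syncs, latency_s):
    """Time spent in n_syncs global synchronizations under this model."""
    return n_syncs * barrier_rounds(procs) * latency_s


if __name__ == "__main__":
    # Hypothetical run: 10,000 synchronizations, 100 microsecond latency.
    for procs in (2, 16, 128, 1024):
        t = total_sync_time(procs, 10_000, 1e-4)
        print(f"{procs:5d} procs: {t:.1f} s spent synchronizing")
```

      Even in this optimistic model the cost grows with both the processor count and the number of synchronization points, so removing synchronizations you don't need pays off more as you scale up.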