Distributed Memory Programming

9.1 Owner Rule

Remember that computation is cheap and communication is expensive. Try to ensure that, as far as possible, each process works on local data:

  1. Dynamically updated global data should be distributed across processes

    Global data is, after all, global, and needs to be made available to all participants as soon after its production as possible. The longer a remote process has to wait for a refresh, the higher the effective latency and the more likely it is that an invalid (stale) value will be used.
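
    As a minimal sketch of what "distributed" can mean here, the following assumes a block distribution, which is only one common choice and is not prescribed by these notes. Every element of a global array of length n gets exactly one owning process, which is the process responsible for updating it and publishing new values. The names block_range and owner_of are hypothetical.

```python
def block_range(rank, nprocs, n):
    """Indices of a length-n global array owned by `rank` (block distribution)."""
    base, extra = divmod(n, nprocs)
    # The first `extra` ranks each hold one additional element.
    start = rank * base + min(rank, extra)
    size = base + (1 if rank < extra else 0)
    return range(start, start + size)


def owner_of(index, nprocs, n):
    """Rank that owns global `index` under the same block distribution."""
    base, extra = divmod(n, nprocs)
    boundary = extra * (base + 1)  # first index held by a rank with `base` elements
    if index < boundary:
        return index // (base + 1)
    return extra + (index - boundary) // base
```

    A process can then test owner_of(i, nprocs, n) == my_rank to decide whether element i is local, and only communicate for the elements that are not.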

  2. Maintaining logical consistency of distributed data is the programmer's responsibility

    Building on that last point, any time your application uses distributed data, you must take care to guarantee that every process agrees on the currently correct values for that data. Consider tagging data values with sequence indices so that you can identify the phase of the run in which a value was produced.
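
    The sequence-tagging idea can be sketched as follows. This is an illustration under stated assumptions, not a prescribed protocol: each value carries the iteration number that produced it, a local copy refuses updates that arrive out of order, and a reader states which phase it needs. The names Tagged and LocalCopy are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tagged:
    seq: int      # phase (iteration) of the run that produced the value
    value: float


class LocalCopy:
    """A process's local copy of one remotely owned value."""

    def __init__(self):
        self.current = None

    def apply(self, update):
        """Install `update` unless it is older than (or equal to) what we hold."""
        if self.current is not None and update.seq <= self.current.seq:
            return False  # late or duplicate message: ignore it
        self.current = update
        return True

    def read(self, required_seq):
        """Return the value only if it is at least as new as `required_seq`."""
        if self.current is None or self.current.seq < required_seq:
            raise LookupError("local copy is stale for the requested phase")
        return self.current.value
```

    Raising an error on a stale read is a design choice made here for clarity. A real application might instead wait for the missing update.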

  3. Try to obtain remote data in advance of its local use

    When you know that a particular piece of data will be needed, and that a remote access will be required to obtain it, start a non-blocking acquisition of that data far enough in advance that the data is very likely to be local by the time the computation requires it.
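
    The pattern can be sketched with a background thread standing in for a non-blocking receive. This is a single-machine illustration, not a real distributed program: fetch_remote is a hypothetical placeholder that sleeps to mimic network latency, and process is a placeholder for the local computation. The request for block b+1 is issued before block b is processed, so the transfer overlaps with the computation.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fetch_remote(block_id):
    """Hypothetical stand-in for a remote access; the sleep mimics latency."""
    time.sleep(0.05)
    return [block_id * 10 + k for k in range(4)]


def process(block):
    """Placeholder for the local computation on one block."""
    return sum(block)


def run(num_blocks):
    results = []
    with ThreadPoolExecutor(max_workers=1) as pool:
        pending = pool.submit(fetch_remote, 0)      # request the first block
        for b in range(num_blocks):
            block = pending.result()                # waits only if not yet arrived
            if b + 1 < num_blocks:
                pending = pool.submit(fetch_remote, b + 1)  # prefetch next block
            results.append(process(block))          # overlaps with the prefetch
    return results
```

    In MPI, the analogous pattern posts MPI_Irecv early and calls MPI_Wait just before the data is used.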

  4. Frequently used static data can be replicated across processes to reduce communication overhead

    Before the application gets underway, make sure that every process has its own copy of any data that will never change during the run, either by compiling the data into the code or by distributing it during initialization.
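
    The effect on message traffic can be illustrated with a small single-process simulation. Everything here is hypothetical: Owner and Worker are ordinary objects standing in for processes, and the table entries "dt" and "gamma" are made-up constants. With replication, the worker pays for one transfer at start-up and every later lookup is local. Without it, every lookup costs a request.

```python
from types import MappingProxyType


class Owner:
    """Stand-in for the process that holds the master copy of a static table."""

    def __init__(self, table):
        self._table = dict(table)
        self.requests_served = 0   # counts simulated messages

    def remote_lookup(self, key):
        self.requests_served += 1
        return self._table[key]

    def snapshot(self):
        self.requests_served += 1  # one message carries the whole table
        return dict(self._table)


class Worker:
    def __init__(self, owner, replicate):
        self.owner = owner
        # A read-only view makes it explicit that the replica never changes.
        self.local = MappingProxyType(owner.snapshot()) if replicate else None

    def lookup(self, key):
        if self.local is not None:
            return self.local[key]            # local access, no communication
        return self.owner.remote_lookup(key)  # one message per access


def count_messages(replicate, lookups=1000):
    owner = Owner({"dt": 0.01, "gamma": 1.4})
    worker = Worker(owner, replicate)
    for _ in range(lookups):
        worker.lookup("gamma")
    return owner.requests_served
```

    Replication is only safe because the data is static. If any process could modify the table, the consistency concerns of guideline 2 would apply to every replica.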