You've got a serial code in hand, or an idea in mind for what you'd like a parallel application to do; in either case, the single most important consideration is how to discover parallelism.
What is to be partitioned and distributed is the question at the heart of determining which major style of parallelism your application is going to exhibit: functional (different functions running in parallel) or data (different data processed in parallel). These two categories contain an ever-growing number of sub-specializations, but the demarcation is still largely relevant. Another useful dimension is execution type, either concurrent or pipelined; combining these two dimensions leads to the following categorization of parallel designs:

- Functional parallelism
Multiple, independent processes.
- Data parallelism
Multiple, independent data streams.
- Concurrent Execution
Processes are all working at roughly the same time, i.e., concurrently.
- Pipelined Execution
Processes are arranged in a strict head-to-tail queue, so that the output of one becomes the input to the next, and so on down the line.
- Function-parallel/Concurrent-execution (MPMD)
The most general form of distributed parallelism: all others can be constructed as special cases of Multiple-Processes, Multiple-Data (streams).
- Data-parallel/Concurrent-execution (SPMD)
Single-Process, Multiple-Data (streams) means that there is one process image, even though there might be multiple copies of it in actual use (if you didn't know that they were all the same, you'd swear you were looking at an MPMD case). Separate copies are not the only way to realize this, however: the older designation SIMD, which stands for Single-Instruction, Multiple-Data (streams) and is a special case of the more general SPMD, had several implementations in which each instruction was communicated to every processor, along with control information indicating whether or not that processor should actually execute it.
SPMD-type parallelism is very often called domain decomposition: the domain is the data set, and the decomposition is the splitting up of that data among a number of processes, all of which run exactly the same code but on different parts of the same overall data object.
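As a rough illustration of domain decomposition, here is a minimal Python sketch. It is not taken from any particular application: the names `decompose`, `partial_sum_of_squares` and `spmd_sum_of_squares` are made up for this example, and a thread pool stands in for the separate processes a real distributed code would use. What matters is the shape: one function, several workers, each on its own slice of the data.

```python
from concurrent.futures import ThreadPoolExecutor


def decompose(data, n_workers):
    """Split data into n_workers contiguous chunks whose sizes differ by at most one."""
    base, extra = divmod(len(data), n_workers)
    chunks, start = [], 0
    for rank in range(n_workers):
        # The first `extra` workers each take one leftover element.
        size = base + (1 if rank < extra else 0)
        chunks.append(data[start:start + size])
        start += size
    return chunks


def partial_sum_of_squares(chunk):
    """The single program that every worker runs, each on its own part of the domain."""
    return sum(x * x for x in chunk)


def spmd_sum_of_squares(data, n_workers=4):
    """Decompose the domain, run the same code on every part, then combine the results."""
    chunks = decompose(data, n_workers)
    # Assumption: threads stand in for the processes of a real SPMD program.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(partial_sum_of_squares, chunks))
    return sum(partials)
```

Calling `spmd_sum_of_squares(list(range(10)), n_workers=3)` splits the ten values into chunks of 4, 3 and 3 elements and sums the three partial results. In a real code, the final combining step is usually where the communication cost shows up.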
- Function-parallel/Pipelined-execution (Frequency Filters)
You have a signal coming in, full of interference in several wide frequency bands, and possibly carrying information in several much narrower ones. One useful approach to cleaning that data stream up and extracting the data is to arrange a number of different filters (programs targeted at specific parts of the spectrum and implementing different strategies for either masking or enhancing signals) in a pipeline, and feed the signal into one end and through each successive filter until the data is clean enough to be of use.
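The pipeline arrangement can be sketched as a chain of workers joined by queues. This Python sketch is an assumption-laden toy: `stage`, `run_pipeline` and the three "filters" (`remove_offset`, `clip`, `amplify`) are hypothetical names, and the filters are trivial per-sample functions standing in for real spectral filters. Each stage runs a different function, and the output of one is the input of the next.

```python
import queue
import threading

_DONE = object()  # sentinel marking the end of the stream


def stage(func, inbox, outbox):
    """One pipeline stage: apply func to each item from inbox and pass the result on."""
    while True:
        item = inbox.get()
        if item is _DONE:
            outbox.put(_DONE)  # forward the end-of-stream marker downstream
            return
        outbox.put(func(item))


def run_pipeline(samples, filters):
    """Feed samples through the filters in order, one thread per filter."""
    queues = [queue.Queue() for _ in range(len(filters) + 1)]
    threads = [
        threading.Thread(target=stage, args=(f, queues[i], queues[i + 1]))
        for i, f in enumerate(filters)
    ]
    for t in threads:
        t.start()
    for s in samples:
        queues[0].put(s)  # the head of the pipeline
    queues[0].put(_DONE)
    results = []
    while True:
        item = queues[-1].get()  # the tail of the pipeline
        if item is _DONE:
            break
        results.append(item)
    for t in threads:
        t.join()
    return results


# Hypothetical stand-ins for real filters, each a different function.
def remove_offset(x):
    return x - 1.0


def clip(x):
    return max(-1.0, min(1.0, x))


def amplify(x):
    return 2.0 * x
```

With `run_pipeline([0.0, 1.5, 3.0], [remove_offset, clip, amplify])`, the second sample can be in `remove_offset` while the first is already in `clip`. That overlap is what the pipelined arrangement buys you once the pipeline is full.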
- Data-parallel/Pipelined-execution (Systolic Arrays)