The Data Model (Introduction)

2.2 The "Connections" Component
Connections are encoded as integer vectors. The integers are the indices of the positions as if the positions were listed out explicitly and numbered. To understand how that applies, we have to digress to an important related topic.

DX is inherently a "C" oriented language; that implies two conventions: counting starts at 0 (not 1), and arrays are listed such that the last element is the fastest changing ("lastest is fastest" is my mnemonic). Both of these are in contrast to FORTRAN oriented data conventions where counting starts at 1 and arrays are listed such that the first element varies fastest (not to worry, DX can read FORTRAN style data as well). Let's show an example.

We declare a small spatial system with a 2x3 [X, Y] orthogonal grid starting at the origin with deltas of 1 in each dimension. We could use the compact positions declaration form I just showed above to describe this grid to DX. That would look like:

Origin 0 0
Deltas 1 0  // X
       0 1  // Y
Counts 2 3

Or we could write out all the grid coordinate pairs in a list explicitly if we chose to. That list, in the order that DX would expect it to be presented, would be:

(X Y)   // column heads are not required nor normally included
0 0    // in data files, except for "spreadsheet" format data
0 1
0 2
1 0
1 1
1 2

Note that the last listed coordinate changes faster than the first: Y varies faster, then after all the Y values have been listed for one X, we advance X, and vary for all Y's again. This is referred to as "row major" order since the rows (X) vary more slowly; the minor (Y) dimension varies more quickly. The opposite is the FORTRAN convention called "column major."
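If it helps to see the expansion mechanically, here is a minimal Python sketch of how the compact Origin/Deltas/Counts form unrolls into the explicit list. This is only an illustration, not DX code: the function name expand_positions and its argument layout are assumptions made up for this example.

```python
from itertools import product

def expand_positions(origin, deltas, counts):
    """List grid positions in row-major order (last index varies fastest).

    origin: starting point, one value per spatial dimension
    deltas: one delta vector per grid axis
    counts: number of positions along each grid axis
    """
    positions = []
    # itertools.product varies its last argument fastest: "lastest is fastest"
    for index in product(*(range(n) for n in counts)):
        point = list(origin)
        for step, delta in zip(index, deltas):
            for d in range(len(point)):
                point[d] += step * delta[d]
        positions.append(tuple(point))
    return positions

grid = expand_positions(origin=(0, 0), deltas=[(1, 0), (0, 1)], counts=(2, 3))
for x, y in grid:
    print(x, y)
```

The printed pairs come out in the same order as the explicit list above, with Y advancing before X does.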

 

Without peeking below, draw this grid on a piece of scratch paper. How many vertices are there? How many grid cells (spaces) are there? Label each vertex with its X, Y coordinates.

Answer

The counting started at 0, so there are 3 Y values: 0, 1, 2. There is no value 3 for Y. C programmers are hard-wired to understand this; others lose hair and sleep over it.

 

If I told you I had a 3D [X, Y, Z] grid in row major order, which axis varies fastest? Which second fastest? Work up a small example (2x2x3) and write out the coordinates in order to test your understanding of this; assume the grid starts at the origin and has deltas of 1 for all axes. How many positions are there in your positions Component?

Answer

 

Now assume the same grid but in column major order. List the coordinates in the order they would be found in a data file as output by a FORTRAN program in "column major" order.

Answer
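One way to check your answers to the last two exercises is to compute where each grid point lands in the list under each ordering. The sketch below is plain Python, not part of DX, and the helper names are invented for this illustration.

```python
def row_major_index(i, j, k, counts):
    """List position of grid point (i, j, k) when the last axis (Z) varies fastest."""
    nx, ny, nz = counts
    return (i * ny + j) * nz + k

def column_major_index(i, j, k, counts):
    """List position of grid point (i, j, k) when the first axis (X) varies fastest."""
    nx, ny, nz = counts
    return i + nx * (j + ny * k)

counts = (2, 2, 3)  # the 2x2x3 grid from the exercise
points = [(i, j, k) for i in range(2) for j in range(2) for k in range(3)]

# Sort the same 12 points by where each ordering would place them in a file
row_major = sorted(points, key=lambda p: row_major_index(*p, counts))
column_major = sorted(points, key=lambda p: column_major_index(*p, counts))

print(row_major[:4])     # Z varies fastest
print(column_major[:4])  # X varies fastest
```

Both lists contain the same 12 positions; only the order in which they would appear in a file differs.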

 

Optional: View this Technical Aside if you want more detail on this subject; the information is not required to understand the upcoming material.

Let's now add some data that is sampled at the coordinate points of the 2x3 grid we discussed earlier. If we sampled data at the intersections of the grid, we would have 6 data values (2x3). Let's use 1, 5, 6, -2, 4, 9 for test data. You should realize that very often the data values appear by themselves in a data file: no "coordinates" appear in that file (typically a data array would have many more values, but the principles are identical regardless of the data count).

We present the data array to DX in the order just listed, inform DX that the data is in row-major order, and declare it to be sampled on a 2x3 grid. DX would map the data to the spatial points like so:

X Y Data
0 0 1
0 1 5
0 2 6
1 0 -2
1 1 4
1 2 9

The data alone, without reference to the positions, appears to be a single stream of numbers. It is neither shown as nor stored as a two-dimensional grid. In fact, the "data" Component's metadata contains no reference to the fact that the spatial sampling coordinate system was two-dimensional; it declares 6 items in a list. It is the Field that links the "data" stream to the "positions" vectors in the order just described, thus mapping the data back onto the space it was sampled from. Let me say it again: without the "positions," the "data" has no spatial referent. You cannot tell by inspection of the "data" Component if the data is supposed to be mapped onto a 1x6, 6x1, 2x3, or 3x2 grid. It could be any of those. Or it could be scattered data (no grid at all).
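To make the point concrete, here is a small Python sketch (an illustration only, not how DX stores anything internally; map_onto_grid is a made-up helper) that pairs the same flat list of six values with three different grids. Because these grids start at the origin with deltas of 1, each grid index doubles as the coordinate.

```python
from itertools import product

data = [1, 5, 6, -2, 4, 9]  # the test data, stored as a flat list

def map_onto_grid(values, counts):
    """Pair each value with a grid index, last index varying fastest."""
    indices = list(product(*(range(n) for n in counts)))
    if len(indices) != len(values):
        raise ValueError("grid size does not match the number of data values")
    return list(zip(indices, values))

# The same six numbers, interpreted on three different grids
for counts in [(2, 3), (3, 2), (6,)]:
    print(counts, map_onto_grid(data, counts))
```

The value -2 is always the fourth item in the list, but it sits at a different location on each grid. Nothing in the list itself says which one is right.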

 

Here's a teaser: in DX, you can rip the data out of one Field, say a 2x3 grid Field, and paste it into another Field with 1x6 spatial positions if you wanted to visualize the data on a 1D line.

Here's another case: this data is measured along a one-dimensional line, for example, elevations along the centerline of a straight highway. If the spatial grid lay along X for 6 locations, the map could be:

X Data
0 1
1 5
2 6
3 -2
4 4
5 9

Note that the data would be provided in exactly the same array of 6 values in the same order as our previous example. The spatial coordinate system is not contained in the data. It is described within the context of the Field that maps the data onto the specified space; in this case, the space is one-dimensional. It is neither 1x6 nor 6x1, it is [6].

 

Why isn't it 6x1? Well, then why isn't it 6x1x1? Because we made no declaration of the existence of any other spatial dimensions. We've declared only a 1D space (X axis). Our 1D declaration of the grid would be:

Origin 0
Deltas 1
Counts 6

The Origin is a 1-vector. The Delta is a single 1-vector. There is only one Count.

What if the spatial samples were collected along Y (the vertical direction in DX's world); for example, temperatures at different elevations over a fixed point? To represent the spatial coordinates directly in DX terms, you would declare that the coordinate system was 2D, with a fixed value of X and varying Y's. However, you could also use the same X-Data map we just saw, import the file, then rotate the axis by 90° when you got the data into DX. You can't declare an array that is only one-dimensional in Y, since DX always assumes the first coordinate is X, second Y, and so on.

The 2D declaration would be:

Origin 0 0
Deltas 0 0   // X delta vector (for each point, don't move)
        0 1   // Y delta vector (for each, move up 1)
Counts 1 6   // note that we have to show 1 for X: there is
             // one X column and 1x6 = 6 data values

Or, in the fully expanded version:

X Y Data
0 0 1
0 1 5
0 2 6
0 3 -2
0 4 4
0 5 9

The nice part about using this seemingly more complicated or redundant form is that the spatial system is now inherently upward pointing when you plot the data in DX. That is, it accurately recreates the actual locations in a 2D space as it was sampled without resorting to complicated rotational schemes (that are hard to keep straight in your mind). In general, I recommend that you describe the "native" spatial coordinate system. You save virtually nothing by trying to take a shortcut in a case like this. Then everything will be right side up when you import the data and visualize it.

While I may appear to have gone off on a tangent, this discussion was preface to a more complete description of "connections." You'll notice that in none of the above "positions" lists did there appear an actual "index" value. If it had, it would have looked like:

Index  X Y Data
0  0 0 1
1  0 1 5
2  0 2 6
3  1 0 -2
4  1 1 4
5  1 2 9

for our 2D grid example, and:

Index  X Y Data
0  0 0 1
1  0 1 5
2  0 2 6
3  0 3 -2
4  0 4 4
5  0 5 9

for the example of the vertical measures plotted in 2-space along a line.

Let's take the line first. It's reasonable to assume that temperature varies linearly in our sample domain. We'd like to consider the 6 points to be connected together in a continuous line so we can find any value along this line by interpolating temperature data values from sampled positions. By inspection, the value at position [0, 0.5] should be 3 (halfway between the data sample values of 1 and 5 positioned at location indices 0 and 1).
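The arithmetic behind that value is ordinary linear interpolation. Here is a minimal Python sketch of it; this is not DX code, the function name is invented for the example, and it simply assumes that consecutive samples are joined to each other.

```python
positions = [(0, y) for y in range(6)]  # the vertical line: X fixed at 0
data = [1, 5, 6, -2, 4, 9]

def interpolate_along_line(y, positions, data):
    """Linearly interpolate the data at height y between neighbouring samples."""
    for index in range(len(positions) - 1):
        y0 = positions[index][1]
        y1 = positions[index + 1][1]
        if y0 <= y <= y1:
            t = (y - y0) / (y1 - y0)  # fraction of the way from sample index to index + 1
            return data[index] + t * (data[index + 1] - data[index])
    raise ValueError("y lies outside the sampled range")

print(interpolate_along_line(0.5, positions, data))  # 3.0
```

Notice that the loop only ever pairs position index with position index + 1. That pairing is exactly the assumption about which points are neighbours.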

But DX needs to be explicitly told that these 6 sample positions are in fact to be considered to lie along a continuum. We do this with the "connections" declaration. This says that position index 0 connects to position index 1, position 1 to position 2, and so on, using a "lines" connection element.

 

The formal description is not used here: the syntax of the DX native file format is described in the Users Guide. You may wish to read about this after you've completed the introductory materials provided in this workshop. The link is:

http://opendx.npaci.edu/docs/pdf/userguide.pdf

See Appendix B.2

Big Rule of Connections: No connections: no interpolation permitted.

In the case of the 2D grid, each point has an implicit index number, which appears in the order that the grid coordinates occur if fully expanded, following the rule that "lastest is fastest." In this case, we would like to describe a surface made up of quadrilaterals (rectangles, if you like) that cover all the area between adjacent sets of 4 points. The convention used by DX to list the connected points is as follows:

Quad 0 is made by connecting points 0 1 3 4. Note that this describes an "N" shaped path (it is in neither clockwise nor counterclockwise order, but in fact is a Peano curve). See the illustration.

Quad 1 is the next one up in Y, so is described by points 1 2 4 5.

 

Note that the connection element list for quadrilaterals has 4 indices in it, as you might expect to describe a 4-point object. You don't need to list the first point again to "close" the curve. The connection element is called "quads" in this case.
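The quad numbering above follows directly from the implicit position indices. As a rough Python sketch (the function name is an assumption for this example, and DX can describe regular grid connections far more compactly than an explicit list), the index vectors for a regular 2D grid could be generated like this:

```python
def quad_connections(counts):
    """Quad elements for a regular 2D grid whose positions are in row-major order."""
    nx, ny = counts
    quads = []
    for i in range(nx - 1):
        for j in range(ny - 1):
            base = i * ny + j  # index of the quad's lowest-numbered corner
            # Same Y then next Y at this X, then the same pair at the next X
            quads.append((base, base + 1, base + ny, base + ny + 1))
    return quads

print(quad_connections((2, 3)))  # [(0, 1, 3, 4), (1, 2, 4, 5)]
```

A grid with nx by ny positions yields (nx - 1) times (ny - 1) quads, which is why the 2x3 grid has only two.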

For the earlier "lines" example, there are only 2 points in each connection integer vector, such as:

0 1  // connects the first two points
1 2  // connects the second to the third point

and so on. Other connection elements in DX include "triangles" (3 indices each), "tetrahedra" (4, but connecting 3D positions), and "cubes" (8, in 3D, but these need not be pure cubes, i.e., with equal length sides, it's just easier to remember and spell than "hexahedra").
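For reference, the element sizes just listed and the full "lines" list for our 6-point example can be written out in a few lines of Python. This is an illustrative sketch only; the dictionary and function names are not DX identifiers.

```python
# Number of position indices in each connection element, as listed in the text
VERTICES_PER_ELEMENT = {
    "lines": 2,
    "triangles": 3,
    "quads": 4,
    "tetrahedra": 4,
    "cubes": 8,
}

def line_connections(point_count):
    """Connect each position index to the next one along the line."""
    return [(i, i + 1) for i in range(point_count - 1)]

lines = line_connections(6)
print(lines)
```

Six positions give five line elements, each holding two indices.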

 

Optional: View this Technical Aside if you want more detail on this subject; the information is not required to understand the upcoming material.

There are many other standard Components in DX, such as "colors," "opacities," "normals," and "invalid positions." And, the writer of a DX program can create his or her own custom components. Usually, this is done so that the same "positions" and "connections" can carry more than one data Component. Each of the user's data Components can then be appropriately named, for example, "temperature," "pressure," "wind vector," etc. to keep them straight.
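As a loose mental model only, you can picture a Field as a set of named Components that share one "positions" list. The Python dictionary below is a hypothetical stand-in, not DX's actual storage or API. Calling the test data "temperature" is my assumption, the "pressure" values are placeholders rather than real data, and the check assumes every data Component holds one value per position.

```python
field = {
    "positions": [(0, 0), (0, 1), (0, 2), (1, 0), (1, 1), (1, 2)],
    "connections": [(0, 1, 3, 4), (1, 2, 4, 5)],  # quads on the 2x3 grid
    "temperature": [1, 5, 6, -2, 4, 9],           # the test data from the text
    "pressure": [0.0] * 6,                        # placeholder values, not real data
}

def check_field(field, data_names):
    """Verify that data Components and connections agree with the positions."""
    count = len(field["positions"])
    for name in data_names:
        if len(field[name]) != count:
            raise ValueError(f"{name!r} does not have one value per position")
    for element in field["connections"]:
        if any(not 0 <= index < count for index in element):
            raise ValueError(f"connection {element} refers to a missing position")
    return True

print(check_field(field, ["temperature", "pressure"]))  # True
```

Both data Components ride on the same positions and connections, which is the reason for giving each one its own name.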

 

Optional: View this Technical Aside if you want more detail on this subject; the information is not required to understand the upcoming material.

We will discuss more details about Components in another section of this workshop, but for now, we've introduced sufficient information to understand many simple array data sets in visualization. As we begin to work with the DX modules that manipulate these Fields and the Components of Fields, I'll point out how the details we've discussed apply.