Manipulating Data

This section covers arithmetic operations, setting limits, averages, as well as some other analyses.

Note that when specific time periods or locations are not selected, these operations are applied to the time and spatial grids in their entirety by default. It is also important to note that when multiple data variables are to be compared, as is common in the examples in this section, the time and spatial grids of those variables must be identical. You will see this issue addressed frequently.

Basic Arithmetic Operations

Adding a number to a field

Example: Add 2.5°F to the minimum temperature data from Bismarck, ND.

Start at the GLOBALSOD* dataset main page.

Select the station in Bismarck, ND. CHECK
Select the minimum temperature data variable. CHECK EXPERT
When adding a number to the field, the units of that data variable are automatically used. Note the units of the minimum temperature data variable under the Other Info heading.

CHECK

While in expert mode, enter the following line below the text already there.

2.5 add
Click "OK". CHECK

To see the results of this operation:
Select Tables link > Agree button > columnar table link. CHECK Compare with ORIGINAL DATA

Adding fields

Example: Create the observed monthly SST data by adding the monthly climatological SST data and the monthly SST anomaly data.

Start at the Reyn_Smith* dataset main page.

Because we are combining an observational data string with a climatological data string, we do not need to worry about the time grids matching. We must only make sure that the two strings have the same time scale (temporal resolution). In this example, the time grids of each of the data variables look like this:

Climatological SST
Time grid: /T (months since 01-Jan) periodic Jan to Dec by 1. N= 12 pts :grid

SSTA
Time grid: /T (months since 1960-01-01) ordered Nov 1981 to Jul 2002 by 1. N= 249 pts :grid

Note that both of the data variables have a monthly time scale and that the time grid for the climatological data is periodic. Python will automatically match that periodic data properly with the SSTA time grid.

Select the monthly SSTA and climatological SST data variables. EXPERT

CHECK

While in expert mode, enter the following line below the text already there.

add
Click "OK". CHECK

To see the results of this operation:
Select a small region (175.5-150°W, 10-15°N) and a single time step (May 1988) to make the size of the data file more manageable. CHECK START
Select Tables link > Agree button > columnar table link. CHECK Compare with CLIMO DATA, ANOMALY DATA and OBSERVED DATA.
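
The matching of a periodic 12-month climatology against an ordered monthly time grid can be pictured with a short standalone sketch that runs outside expert mode. The values are made up, and the function name add_periodic and the modulo lookup are illustrative assumptions about the idea, not the Data Library's implementation.

```python
# Hypothetical values for a single grid point, in degrees Celsius.
# climatology[m] is the long-term mean for calendar month m (0 = Jan ... 11 = Dec).
climatology = [26.0, 26.5, 27.0, 27.5, 28.0, 28.5, 28.5, 28.0, 27.5, 27.0, 26.5, 26.0]

# Anomalies on an ordered monthly grid that starts in Nov 1981 (month index 10).
start_month = 10
anomalies = [0.5, -0.25, 0.0, 1.0]  # Nov 1981, Dec 1981, Jan 1982, Feb 1982


def add_periodic(anomalies, climatology, start_month):
    """Add the climatological value of the matching calendar month to each anomaly."""
    period = len(climatology)
    return [
        anomaly + climatology[(start_month + step) % period]
        for step, anomaly in enumerate(anomalies)
    ]


observed = add_periodic(anomalies, climatology, start_month)
print(observed)  # [27.0, 25.75, 26.0, 27.5]
```

The lookup wraps from December back to January, which is why the two grids only need to share the same monthly time scale.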

Subtracting a number from a field

Example: Subtract 2.5°F from the minimum temperature data from Bismarck, ND.
Refer to the example in the section "Adding a number to a field".

Substitute the following Python command for 2.5 add.

2.5 sub
Click "OK". CHECK
Note that Python subtracts the second number/field listed (e.g., 2.5) from the first number/field listed (e.g., min. temperature).
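
The operand order described above can be modeled with a small stack-based evaluator. This is a standalone sketch of the ordering rule only; the function name evaluate_postfix is hypothetical, and treating the expert-mode text as a simple stack is an assumption made for illustration.

```python
def evaluate_postfix(tokens):
    """Evaluate a list of numbers and operator names written in postfix order."""
    operations = {
        "add": lambda first, second: first + second,
        "sub": lambda first, second: first - second,
        "mul": lambda first, second: first * second,
        "div": lambda first, second: first / second,
    }
    stack = []
    for token in tokens:
        if token in operations:
            second = stack.pop()  # the operand listed last
            first = stack.pop()   # the operand listed before it
            stack.append(operations[token](first, second))
        else:
            stack.append(float(token))
    return stack.pop()


# A made-up minimum temperature of 10.0 followed by "2.5 sub" and "2.5 div".
print(evaluate_postfix([10.0, 2.5, "sub"]))  # 7.5
print(evaluate_postfix([10.0, 2.5, "div"]))  # 4.0
```

Swapping the two operands changes the result for sub and div but not for add and mul, which is why the order in which variables are selected matters only for subtraction and division.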

Subtracting fields

Example: Create the monthly climatological SST data by subtracting the monthly SSTA data from the observed monthly SST data.

Start at the Reyn_Smith* dataset main page.

Note the time grids of the two variables to be compared.

SSTA
Time grid: /T (months since 1960-01-01) ordered Nov 1981 to Jul 2002 by 1. N= 249 pts :grid

SST
Time grid: /T (months since 1960-01-01) ordered Nov 1981 to Jul 2002 by 1. N= 249 pts :grid

As is common when comparing two variables from the same dataset, their time grids match exactly. However, you should always make a point of checking this.

Make sure that the spatial grids of these two variables match.
Now that we are sure that the grids match properly, this example is very much like that in Section 1.b where we added the two fields. The primary difference here is the order in which the variables are listed. As noted in Section 1.c, Python subtracts the second field listed from the first field listed.

Select the monthly SST and then the monthly SSTA data variables. EXPERT

CHECK

While in expert mode, enter the following line below the text already there.

sub
Click "OK". CHECK

Multiplying a field by a number

Example: Convert the units of the mean sea level pressure data from mb to Pa by multiplying the field by 100.
Note: there is a Python command that converts units themselves instead of just the data values. This is just an example and the units of the data will still appear as mb after the arithmetic operation.

Start at the GLOBALSOD* dataset main page.
Select the mean sea level pressure data variable.
CHECK EXPERT
While in expert mode, enter the following line below the text already there.

100 mul
Click "OK". EXPERT
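
The point that only the values change, while the units label stays the same, can be illustrated with a standalone sketch. The dictionary layout and the name multiply_values are hypothetical, and the pressures are made up.

```python
# Hypothetical field: a list of values plus a units label.
pressure = {"values": [1013.25, 1000.0, 987.5], "units": "mb"}


def multiply_values(field, factor):
    """Scale the values only; the units label is copied unchanged."""
    return {
        "values": [value * factor for value in field["values"]],
        "units": field["units"],
    }


scaled = multiply_values(pressure, 100)
print(scaled["values"])  # [101325.0, 100000.0, 98750.0]
print(scaled["units"])   # mb
```

The numbers are now in Pa even though the label still reads mb, so a later step that relies on the label would be misled.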

Multiplying fields

This operation works just like that covered in Sections 1.b and 1.d.

Ensure that the grids of the variables match, select both of them, and use mul as the operator in expert mode. The mul command can also be used to find common entries in two different data strings.

Dividing a field by a number

Example: Convert the units of the precipitation data from inches to cm by dividing the field by 0.3937 (1 cm ≈ 0.3937 in, so dividing by 0.3937 is equivalent to multiplying by 2.54).
Note: there is a Python command that converts units themselves instead of just the data values. This is just an example and the units of the data will still appear as inches after the arithmetic operation.

Start at the GLOBALSOD* dataset main page.
Select the precipitation data variable.
CHECK EXPERT
While in expert mode, enter the following line below the text already there.

0.3937 div
Click "OK". CHECK

Note that Python divides the first number/field listed (e.g., precipitation) by the second number/field listed (e.g., 0.3937).

Dividing fields

This operation works just like that covered in Sections 1.b and 1.d.

Ensure that the grids of the variables match, select both of them, and use div as the operator in expert mode. Again, note that Python divides the first field listed by the second field listed.

Setting Limits

It is often useful to limit data values. You may want to set minimum and maximum limits on data as a means of quality control or only use data that meets particular criteria. Python makes these types of operations very easy. Below are some common examples.

Setting a minimum/maximum

Example: Create a data string where all minimum temperature data values less than 0°C are given a value of 0°C.

Start at the GLOBALSOD* dataset main page.
Select the minimum temperature data variable.
CHECK EXPERT

As previously described, it is a good idea to note the units of the data in question as all values in Python are automatically referenced to the units of the data variable.

Note the units of temperature by looking at the information under the Other Info heading.
In this case, the units are in Fahrenheit and we must therefore give our desired minimum temperature in Fahrenheit.

While in expert mode, enter the following line below the text already there.

32. max
Click "OK".

To see the results of this operation:
Select a single station (Vienna, WMO ID:110360) and short time period (e.g., 1996) to make the size of the data file more manageable. CHECK

Select Tables link > Agree button > columnar table link. CHECK Compare with the ORIGINAL DATA.

An analogous operation, setting a maximum value of 0°C, can be done by replacing the command 32. max with 32. min.
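
The effect of these two limits can be checked with a standalone sketch on made-up values. It also shows the unit conversion that leads to the threshold of 32; the helper name celsius_to_fahrenheit is introduced here for illustration.

```python
def celsius_to_fahrenheit(celsius):
    """Convert a temperature from degrees Celsius to degrees Fahrenheit."""
    return celsius * 9.0 / 5.0 + 32.0


# The data are in Fahrenheit, so the 0 degree Celsius limit must be converted first.
limit_f = celsius_to_fahrenheit(0.0)  # 32.0

# Hypothetical minimum temperatures in degrees Fahrenheit.
min_temp_f = [25.0, 32.0, 40.5, -4.0]

# Taking the larger of each value and the limit sets a minimum (a floor).
floored = [max(value, limit_f) for value in min_temp_f]
# Taking the smaller of each value and the limit sets a maximum (a ceiling).
capped = [min(value, limit_f) for value in min_temp_f]

print(floored)  # [32.0, 32.0, 40.5, 32.0]
print(capped)   # [25.0, 32.0, 32.0, -4.0]
```

Note the slightly counterintuitive naming: the max operation is what enforces a minimum value, and the min operation enforces a maximum.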

Finding a minimum/maximum

There are two common uses of this feature. You may want to find a minimum/maximum value in a particular region or time period. Let's look at examples of these operations.

Example: Find the largest SSTAs for the entire time grid.
This example finds the largest SSTA from the entire time grid for each grid point. The result is the largest SSTA as a function of X (longitude) and Y (latitude). Of course, you can limit the time grid to find the largest SSTAs in a more specific time period.

Start at the Reyn_Smith* dataset main page.
Select the monthly SSTA data variable.
CHECK EXPERT

To find the largest positive SSTA:
While in expert mode, enter the following line below the text already there.

[T] maxover
Click "OK". CHECK

To find the largest negative SSTA:
While in expert mode, enter the following line below the text already there.

[T] minover
Click "OK". CHECK

To see the results of this operation:
Select views icon furthest to the left in the function bar that has the land in black. CHECK
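
A reduction over the time grid can be sketched in standalone code as follows. The nested-list layout ssta[t][p] (time step t, grid point p), the function names, and the values are all assumptions made for illustration.

```python
# ssta[t][p]: hypothetical anomaly at time step t and grid point p.
ssta = [
    [0.5, -1.0, 0.25],
    [1.5, -0.5, -0.75],
    [-2.0, 0.0, 0.5],
]


def max_over_time(field):
    """Largest value at each grid point, taken across all time steps."""
    return [max(series) for series in zip(*field)]


def min_over_time(field):
    """Smallest value at each grid point, taken across all time steps."""
    return [min(series) for series in zip(*field)]


print(max_over_time(ssta))  # [1.5, 0.0, 0.5]
print(min_over_time(ssta))  # [-2.0, -1.0, -0.75]
```

The time dimension disappears from the result and one value per grid point remains, which is why the output can be drawn as a map.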

Example: Find the largest SSTAs for the entire spatial grid.
This example finds the largest SSTA from the entire spatial grid for each time step. The result is the maximum global SSTA as a function of T (time). Of course, you can limit the spatial grid to find the largest SSTAs in a specific region.

Start at the Reyn_Smith* dataset main page.
Select the monthly SSTA data variable.
CHECK EXPERT
To find the largest positive SSTA:
While in expert mode, enter the following line below the text already there.

[X Y] maxover
Click "OK". CHECK

To find the largest negative SSTA:
While in expert mode, enter the following line below the text already there.

[X Y] minover
Click "OK". CHECK

To see the results of this operation:
Select Tables link > columnar table link. CHECK
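
A reduction over both spatial grids can be sketched in the same spirit. Here the assumed layout is ssta[t][y][x], and the names and values are again hypothetical.

```python
# ssta[t][y][x]: hypothetical anomaly at time step t, latitude row y, longitude column x.
ssta = [
    [[0.5, -1.0], [0.25, 0.0]],
    [[1.5, -0.5], [-0.75, 0.25]],
]


def max_over_space(field):
    """Largest value on the whole spatial grid for each time step."""
    return [max(value for row in grid for value in row) for grid in field]


def min_over_space(field):
    """Smallest value on the whole spatial grid for each time step."""
    return [min(value for row in grid for value in row) for grid in field]


print(max_over_space(ssta))  # [0.5, 1.5]
print(min_over_space(ssta))  # [-1.0, -0.75]
```

Both spatial dimensions are removed, leaving one value per time step, so the result is naturally viewed as a table or a time series.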

Creating a numerical mask

Masks make data values that meet a particular threshold equal to NaN.

Example: Mask out the maximum temperature values greater than 100°F.

Start at the GLOBALSOD* dataset main page.
Select the maximum temperature data variable.
CHECK
Note that the temperature unit is Fahrenheit.
This is good because our mask threshold is also in Fahrenheit. If the units had not agreed, then we would have had to convert the mask threshold to the units of the data variable.

While in expert mode, enter the following line below the text already there.

100. maskgt
Click "OK". CHECK

An analogous operation, masking out the maximum temperature values less than 100°F, can be done by replacing maskgt with masklt.

To see the results of this operation:
Select a single station (Damascus, WMO ID: 400800) and a short time period (Jun 1994) to make the size of the data file more manageable. CHECK

Select Tables link > Agree button > columnar table link. CHECK Compare with the ORIGINAL DATA. Note that the data value from June 15 is missing. (Tables exclude NaN values.)
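
The masking step, and the way a table that skips NaN values ends up with a missing row, can be sketched in standalone code. The function name mask_greater_than is hypothetical, the temperatures are made up, and treating a value exactly equal to the threshold as unmasked is an assumption.

```python
import math


def mask_greater_than(values, threshold):
    """Replace every value strictly greater than the threshold with NaN."""
    return [math.nan if value > threshold else value for value in values]


# Hypothetical daily maximum temperatures in degrees Fahrenheit.
max_temp_f = [95.0, 101.5, 100.0, 88.0]

masked = mask_greater_than(max_temp_f, 100.0)

# A table that excludes NaN values simply has fewer rows than the original.
table_rows = [value for value in masked if not math.isnan(value)]
print(table_rows)  # [95.0, 100.0, 88.0]
```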

Flagging Data

Flags create a binary version of any variable based on a particular threshold. Those data that meet the threshold are given a value of 1 and those that do not receive a value of 0.

Example: Flag snow depth values greater than 1 meter.

Start at the GLOBALSOD* dataset main page.
Select the snow depth data variable.
CHECK
Note that the depth unit is inches.
Our flag threshold is in meters, so we must convert that depth to give Python the threshold in the units of the data variable.

While in expert mode, enter the following line below the text already there.

39.4 flaggt
Click "OK". CHECK

To see the results of this operation:
Select a single station (Pellston, MI, WMO ID: 727347) and a short time period (Jan-Feb 1996) to make the size of the data file more manageable. CHECK

Select Tables link > Agree button > columnar table link. CHECK Compare with the ORIGINAL DATA.

An analogous operation, flagging snow depths less than 1 meter, can be done by replacing flaggt with flaglt.
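
The threshold conversion and the resulting binary field can be sketched in standalone code. The snow depths are made up, the function name flag_greater_than is hypothetical, and a strict greater-than comparison is assumed.

```python
INCHES_PER_METER = 39.37  # rounded conversion factor


def flag_greater_than(values, threshold):
    """Return 1 where a value is strictly greater than the threshold, else 0."""
    return [1 if value > threshold else 0 for value in values]


# The data are in inches, so the 1 meter threshold is converted and rounded to 39.4.
threshold_in = round(1.0 * INCHES_PER_METER, 1)

# Hypothetical snow depths in inches.
snow_depth_in = [12.0, 40.0, 39.0, 55.5]

flags = flag_greater_than(snow_depth_in, threshold_in)
print(threshold_in)  # 39.4
print(flags)         # [0, 1, 0, 1]
```

Because the flags are 0 or 1, summing them counts how many values met the threshold.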

Creating Averages

Spatial averages

When creating a spatial average of station data, one typically wants to take into account the location of each station (e.g., weighted average). That operation is beyond the scope of this tutorial. However, creating a spatial average of gridded data is much more straightforward and an example is given here.

Example: Find the spatial average of monthly SST data in a region in the Gulf of Mexico defined by 83°-97°W, 21°-30°N for Jan-Dec 1998.

Start at the Reyn_Smith* dataset main page.
Select the monthly SST data variable.
CHECK
Select the 1998 time period and the lat/lon defined region. CHECK EXPERT

Enter expert mode and enter the following line below the text already there.

[X Y] average
Click "OK". CHECK

To see the results of this operation:
Select one of the views links in the function bar.

This procedure can be easily applied to other types of spatial averaging. For example, if you wanted to create a zonal average, then you would use the following line of Python instead.

[X] average
This creates a zonal average as a function of T (time) and Y (latitude). Click here to see an example of this operation.
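
The difference between averaging over both spatial grids and averaging over longitude only can be sketched in standalone code. The layout sst[t][y][x], the function names, and the values are assumptions; the averages here are plain unweighted means, and no claim is made about how the Data Library weights grid cells.

```python
# sst[t][y][x]: hypothetical values; rows are latitudes, columns are longitudes.
sst = [
    [[20.0, 22.0], [24.0, 26.0]],
    [[21.0, 23.0], [25.0, 27.0]],
]


def mean(values):
    """Unweighted arithmetic mean of an iterable of numbers."""
    values = list(values)
    return sum(values) / len(values)


def area_average(field):
    """Average over X and Y: one value per time step."""
    return [mean(value for row in grid for value in row) for grid in field]


def zonal_average(field):
    """Average over X only: one value per time step and latitude."""
    return [[mean(row) for row in grid] for grid in field]


print(area_average(sst))   # [23.0, 24.0]
print(zonal_average(sst))  # [[21.0, 25.0], [22.0, 26.0]]
```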

Seasonal/chunk averages

Example: Create seasonal averages (DJF, MAM, JJA, SON) of monthly SST data from 1990-1999.

Start at the Reyn_Smith* dataset main page.
Select the monthly SST data variable.
CHECK
Select the Dec 1989-Nov 1999 time period. CHECK EXPERT

While in expert mode, enter the following line below the text already there.

T 3 boxAverage
Click "OK". CHECK

To see an animation of the seasonal averages you just created:
Select the views link furthest to the left in the function bar.
Enter "Jan 1990 to Oct 1999" in the time text box at the top of the data viewer.
Click "Redraw".
CHECK

Note that if you had wanted JFM, AMJ, JAS, OND seasonal averages, then the selected time period would have been Jan 1990 to Dec 1999. Another important point here is that the step over which the average is created is always in the temporal units of the data variable in question. For example, had the SST data been at a daily time scale, the above Python command would have created a 3-day average instead of a 3-month average. Therefore, it is an excellent idea to get in the habit of making sure the units of the data variable and the step agree with each other. The technique used in this example is particularly useful when 12 is evenly divisible by the step over which you want to average. The next example addresses the cases when this is not true.
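
The chunk averaging described above can be sketched in standalone code. The function name box_average is hypothetical, the values are made up, and dropping an incomplete final chunk is an assumption of this sketch.

```python
def box_average(values, step):
    """Average consecutive, non-overlapping chunks of `step` values."""
    usable = len(values) - len(values) % step  # ignore an incomplete final chunk
    return [sum(values[i:i + step]) / step for i in range(0, usable, step)]


# Twelve hypothetical monthly values. If the first one is December,
# the chunks are DJF, MAM, JJA, SON; if it is January, they are JFM, AMJ, JAS, OND.
monthly = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 11.0, 12.0]

print(box_average(monthly, 3))  # [2.0, 5.0, 8.0, 11.0]
```

The function counts grid steps, not calendar months, so the same call on daily values would produce 3-day averages.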

Example: Create May-Sept averages of SSTA data for the time period 1985-1994.
This example creates an average over 5 months. Twelve is not evenly divisible by this step (i.e., 5 months), so we must use a different technique than the one above.

Start at the Reyn_Smith* dataset main page.
Select the monthly SSTA data variable.
CHECK
Select the Jan 1985- Dec 1994 time period. CHECK EXPERT

While in expert mode, enter the following line below the text already there.

T 12 splitstreamgrid
Click "OK". CHECK

This Python command splits the time grid with a period of 12. That is, in this example, it creates a dataset of Jan data, a dataset of Feb data, etc. This is an important step, but we are not quite finished.

Select May-Sept grids and average over them with the following Python commands.

T (May) (Jun) (Jul) (Aug) (Sep) VALUES

[T] average
Click "OK". CHECK
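
The two steps above (split by calendar month, then select May through September and average) can be sketched in standalone code. The function name may_to_sep_averages and the data are hypothetical, and the series is assumed to cover whole years.

```python
def may_to_sep_averages(monthly, start_month=0):
    """Return one May-Sep average per year.

    monthly: values on an ordered monthly grid covering whole years.
    start_month: calendar month of the first value (0 = Jan ... 11 = Dec).
    """
    # Step 1: split the series by calendar month (all Januaries, all Februaries, ...).
    by_month = {month: [] for month in range(12)}
    for step, value in enumerate(monthly):
        by_month[(start_month + step) % 12].append(value)

    # Step 2: keep May (4) through Sep (8) and average across them for each year.
    selected = [by_month[month] for month in range(4, 9)]
    return [sum(year_values) / len(year_values) for year_values in zip(*selected)]


# Two hypothetical years: values 1..12 in the first year, 10..120 in the second.
series = [float(m) for m in range(1, 13)] + [float(10 * m) for m in range(1, 13)]

print(may_to_sep_averages(series))  # [7.0, 70.0]
```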

There is also a convenient option if you want to create averages/climatologies of single months.

Example: Create a monthly climatology of SST data for the time period 1982-2001.

Start at the Reyn_Smith* dataset main page.
Select the monthly SST data variable.
CHECK
Select the Jan 1982-Dec 2001 time period. CHECK EXPERT

Select the Filters link in the function bar.
Select the monthly climatology link.

CHECK EXPERT

Note that this command can only be applied to monthly data.
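
What a monthly climatology computes can be sketched in standalone code: each calendar month is averaged across all years. The function name monthly_climatology and the data are hypothetical, and the series is assumed to contain every calendar month at least once.

```python
def monthly_climatology(monthly, start_month=0):
    """Average all values that fall in the same calendar month.

    monthly: values on an ordered monthly grid.
    start_month: calendar month of the first value (0 = Jan ... 11 = Dec).
    """
    sums = [0.0] * 12
    counts = [0] * 12
    for step, value in enumerate(monthly):
        month = (start_month + step) % 12
        sums[month] += value
        counts[month] += 1
    return [sums[month] / counts[month] for month in range(12)]


# Two hypothetical years: 1..12 in the first year, 3..14 in the second.
series = [float(m) for m in range(1, 13)] + [float(m + 2) for m in range(1, 13)]

climatology = monthly_climatology(series)
print(climatology[:3])  # [2.0, 3.0, 4.0]
```

The result always has 12 values regardless of how many years go in, which matches the periodic Jan to Dec time grid of a climatological variable.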

Running averages

This operation offers a fast and easy way to smooth data temporally. Let's look at an example.

Example: Create a 15-day running average of precipitation data.

Start at the GLOBALSOD* dataset main page.
Select the precipitation data variable.
CHECK

At this point, it is a good habit to check the temporal unit to make sure it agrees with how you want to define your average step. In this example, we want to create a 15-day running mean. Therefore, the unit over which we want to average is a day.

Make sure that the temporal unit of the precipitation data is the same as the unit over which you want to average.
While in expert mode, enter the following line below the text already there.

T 15 runningAverage
Click "OK". CHECK

Note that this operation will truncate the data to fit the step. In this example, we have a step of 15 days and are using the full time grid of Jan 8, 1994 - Dec 25, 1999. Therefore, after the running mean is created, the data will include the dates Jan 15, 1994 - Dec 18, 1999.
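
The truncation can be reproduced with a standalone sketch of a centered running mean that keeps only positions with a full window. The function name running_average is hypothetical; the centered window is an assumption, chosen because it is consistent with the dates quoted above.

```python
from datetime import date, timedelta


def running_average(values, window):
    """Running mean over `window` consecutive values; partial windows are dropped."""
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


# Seven hypothetical daily values smoothed with a 3-day window lose one value at each end.
print(running_average([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0], 3))  # [2.0, 3.0, 4.0, 5.0, 6.0]

# With a centered 15-day window, 7 days are lost at each end of the time grid.
half_window = 15 // 2
first_valid = date(1994, 1, 8) + timedelta(days=half_window)
last_valid = date(1999, 12, 25) - timedelta(days=half_window)
print(first_valid, last_valid)  # 1994-01-15 1999-12-18
```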

To see the results of this operation:
Select a station (Barcelona, Spain, WMO ID: 81810).

START

Select one of the views links in the function bar. Compare with those from the line, bar, and scatter plots of the original data.

Statistical and Other Mathematical Operations

Anomalies

Earth science data is commonly viewed in terms of anomalies (i.e., difference between observations and climatology) rather than as raw values. Anomalies can be produced with Python by first calculating a climatology and then calculating the difference between it and the observed data. However, Python also has a single command that does all of these calculations. Let's look at an example.

Example: Recreate the SSTA data for the time period 1982-2001.

Start at the Reyn_Smith* dataset main page.
Select the monthly SST data variable.
CHECK
Select the Jan 1982-Dec 2001 time period. CHECK EXPERT

CHECK

While in expert mode, enter the following line below the text already there.

yearly-anomalies
Click "OK". CHECK

You have just created the SST anomalies for the time period 1982-2001 based on a 1982-2001 climatology. While convenient, this operation is a bit limited in that it can only be applied to monthly data. Like the yearly-climatology command, you can find this option via the "Filters" link in the function bar.
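
The two-step calculation that this single command replaces (compute a climatology, then subtract it from the observations) can be sketched in standalone code. The function name monthly_anomalies and the data are hypothetical, and the climatology is assumed to be computed from the same period as the data.

```python
def monthly_anomalies(monthly, start_month=0):
    """Subtract the calendar-month mean of the series from each of its values.

    monthly: values on an ordered monthly grid.
    start_month: calendar month of the first value (0 = Jan ... 11 = Dec).
    """
    # Step 1: climatology, the mean of each calendar month over the whole series.
    sums = [0.0] * 12
    counts = [0] * 12
    for step, value in enumerate(monthly):
        month = (start_month + step) % 12
        sums[month] += value
        counts[month] += 1
    climatology = [sums[m] / counts[m] if counts[m] else 0.0 for m in range(12)]

    # Step 2: observed value minus the climatological value of the same month.
    return [
        value - climatology[(start_month + step) % 12]
        for step, value in enumerate(monthly)
    ]


# Two hypothetical years: 1..12 in the first year, 3..14 in the second.
series = [float(m) for m in range(1, 13)] + [float(m + 2) for m in range(1, 13)]

anomalies = monthly_anomalies(series)
print(anomalies[0], anomalies[12])  # -1.0 1.0
```

Because the climatology comes from the same period, the anomalies for each calendar month average to zero over that period.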

Correlation

This is an excellent example that combines many of the techniques covered to this point. It will calculate a Pearson product-moment correlation.

Example: Find the correlation between sea surface temperature anomalies and the Southern Oscillation Index from January 1987 to December 2001.
Note: in order to correlate two sets of data, they must have the exact same temporal unit.

Up to this point, we have been using the GLOBALSOD dataset for our station data. However, you can see in the time grid information of this dataset that its temporal unit is days while that of the Reyn_Smith SST data we have been using is months since 1960. For simplicity, let's use another dataset, the sea level pressure data that has monthly data defined as months since Jan 1951.

Select the NOAA NCEP EMC CMB GLOBAL dataset by either searching for it or through the SOURCES option. CHECK
Select the Reyn_SmithOIv2 > monthly > Sea Surface Temperature Anomaly dataset. CHECK
Select the Jan 1987-Dec 2001 time period. CHECK EXPERT

At this point, when the first dataset selections have been made, it is typically easiest to make the second dataset selections in expert mode.

While in expert mode, enter the following lines below the text already there. All of these commands should look familiar to you from previous examples.

SOURCES .Indices .soi .standardized
T (Jan 1987) (Dec 2001) RANGEEDGES
Click "OK". CHECK

You now have two data fields with identical time grids. Let's correlate these fields. While in expert mode, enter the following lines below the text already there.

[T] correlate
Click "OK". CHECK

To view the correlation data you just produced:
Select one of the views under the "Data Views" tab, e.g. the "Colors with land" view CHECK
or
select the "Data Tables" tab and then select one of the table links shown, e.g. "columnar table" under the "Columnar Tables" header CHECK
You can correlate over the spatial grids as well by replacing the [T] in "[T] correlate" with [X], [Y], [X Y], etc. in the last line from expert mode.
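
A correlation over the time grid produces one Pearson coefficient per grid point. The standalone sketch below shows the idea; the layout ssta[t][p], the function names, and the values are assumptions, and the length check mirrors the requirement that the two time grids be identical.

```python
import math


def pearson(x, y):
    """Pearson product-moment correlation of two equally long sequences."""
    if len(x) != len(y):
        raise ValueError("the two series must share the same time grid")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    variance_x = sum((a - mean_x) ** 2 for a in x)
    variance_y = sum((b - mean_y) ** 2 for b in y)
    return covariance / math.sqrt(variance_x * variance_y)


def correlate_over_time(field, index):
    """Correlate the time series at each grid point of field[t][p] with the index."""
    points = len(field[0])
    return [pearson([row[p] for row in field], index) for p in range(points)]


# Hypothetical index values and a two-point anomaly field.
soi = [1.0, -2.0, 0.5, 3.0]
ssta = [[3.0, -1.0], [-3.0, 2.0], [2.0, -0.5], [7.0, -3.0]]

correlations = correlate_over_time(ssta, soi)
print([round(r, 6) for r in correlations])  # [1.0, -1.0]
```

In this made-up field the first point rises and falls exactly with the index and the second does the opposite, so the coefficients sit at the two extremes of the possible range.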

Trigonometric functions

Basic trig functions are typically used with the spatial grids. The results of these functions can then be used as part of a broader technique, such as spatial weighting.

Example: Find the cosine of a latitudinal grid of weekly SST data.

Start at the Reyn_Smith* dataset main page.
Select the weekly SST data variable.
CHECK
While in expert mode, enter the following line after the text already there.

Y cosd
Click "OK". CHECK
To find the sine of data, replace cosd with sind.
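
Trigonometric functions that take degrees can be sketched in standalone code with the standard math module. The local helpers below reuse the names cosd and sind for readability; they are illustrative stand-ins, and the latitudes are made up.

```python
import math


def cosd(degrees):
    """Cosine of an angle given in degrees."""
    return math.cos(math.radians(degrees))


def sind(degrees):
    """Sine of an angle given in degrees."""
    return math.sin(math.radians(degrees))


# Hypothetical latitude grid in degrees north.
latitudes = [-60.0, 0.0, 60.0]

# The cosine of latitude is a common weight for grid cells,
# since cells cover less area toward the poles.
weights = [cosd(latitude) for latitude in latitudes]
print([round(weight, 3) for weight in weights])  # [0.5, 1.0, 0.5]
```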

To view the data you just produced:
Select one of the views links in the function bar.
OR
Select Tables link > columnar table link. CHECK