## Statistical Techniques in the Streaming Data Library (SDL): A Tutorial

Statistical techniques are essential tools for analyzing large datasets, so this tutorial covers skills that many Streaming Data Library (SDL) users need.

One of the most common quantities used to summarize a set of data is its center. The
center is a single value, chosen so that it gives a reasonable approximation of the
typical, or "normal," value of the data.
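The three most widely used measures of center are the mean, the median, and the mode. The sketch below uses Python's standard-library `statistics` module rather than any SDL routine. The helper name `center_measures` and the sample values are hypothetical and serve only as an illustration.

```python
import statistics

def center_measures(data):
    """Return three common measures of the center of a sample (illustrative helper)."""
    return {
        "mean": statistics.mean(data),      # arithmetic average
        "median": statistics.median(data),  # middle value of the sorted data
        "mode": statistics.mode(data),      # most frequent value
    }

# Hypothetical temperatures; 30.0 is a deliberate outlier
temps = [12.0, 14.5, 13.0, 13.0, 30.0]
print(center_measures(temps))  # {'mean': 16.5, 'median': 13.0, 'mode': 13.0}
```

The single outlier pulls the mean up to 16.5, while the median and mode stay at 13.0. This is why the median is often preferred as a robust measure of center.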

Running (moving) averages and weighted averages are important filtering methods in
statistical analysis: they smooth out short-term fluctuations so that longer-term
behavior is easier to see.
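The following is a minimal sketch of both filters in plain Python, written as an illustration and not as SDL's own implementation. The function names are hypothetical, and the 1-2-1 weights are just one common illustrative choice of kernel.

```python
def running_mean(values, window):
    """Unweighted running (moving) mean over a sliding window of `window` points."""
    if not 1 <= window <= len(values):
        raise ValueError("window must be between 1 and len(values)")
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]

def weighted_running_mean(values, weights):
    """Running mean in which each position in the window has its own weight."""
    w = len(weights)
    total = sum(weights)
    return [sum(v * k for v, k in zip(values[i:i + w], weights)) / total
            for i in range(len(values) - w + 1)]

series = [0, 0, 6, 0, 0]  # hypothetical single spike
print(running_mean(series, 3))                   # [2.0, 2.0, 2.0]
print(weighted_running_mean(series, [1, 2, 1]))  # [1.5, 3.0, 1.5]
```

Both filters spread the spike out, but the weighted version keeps more of it at the center of the window. Each output point is a window average, so a window of length *w* returns *w* - 1 fewer points than the input.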

Climatology is commonly known as the study of climate, but the term has other important
meanings as well. A climatology is also defined as the long-term average of a given
variable, often taken over a period of 20 to 30 years.
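In the second sense, one common calculation is a monthly climatology. Each calendar month is averaged over all available years, and anomalies are then computed as departures from that average. The sketch below assumes the data arrive as hypothetical `(year, month, value)` tuples and uses only three years for brevity. A real climatology would span the 20 to 30 years described above.

```python
from collections import defaultdict

def monthly_climatology(records):
    """Average each calendar month over all years in `records`.

    `records` is an iterable of (year, month, value) tuples (assumed layout).
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for _year, month, value in records:
        sums[month] += value
        counts[month] += 1
    return {m: sums[m] / counts[m] for m in sorted(sums)}

def anomalies(records, climatology):
    """Departure of each value from its calendar-month climatology."""
    return [(y, m, v - climatology[m]) for y, m, v in records]

# Hypothetical January and July values for three years
records = [(1991, 1, -2.0), (1991, 7, 20.0),
           (1992, 1, -4.0), (1992, 7, 22.0),
           (1993, 1, -3.0), (1993, 7, 21.0)]
clim = monthly_climatology(records)
print(clim)  # {1: -3.0, 7: 21.0}
print(anomalies(records, clim))
```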

It is often important to determine whether a set of data is homogeneous before any
statistical technique is applied to it. Homogeneous data are drawn from a single population.

A random variable or random process is said to be stationary if all of its statistical
properties are independent of time. Although many statistical techniques assume that the
data are stationary, most atmospheric processes are visibly nonstationary; seasonal
cycles and long-term trends are common examples.

While measures of central tendency are used to estimate "normal" values of a dataset,
measures of dispersion are important for describing the spread of the data, or its
variation around a central value.
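Common measures of dispersion include the range, the variance, the standard deviation, and the interquartile range (IQR). The sketch below computes all four with Python's standard-library `statistics` module. The helper name `dispersion` and the sample are hypothetical. Note that `statistics.variance` and `statistics.stdev` use the sample (n - 1) denominator, and `statistics.quantiles` uses its default "exclusive" method. Other quantile conventions give slightly different IQRs.

```python
import statistics

def dispersion(data):
    """Common measures of spread around the center of a sample (illustrative helper)."""
    q1, _median, q3 = statistics.quantiles(data, n=4)  # quartile cut points
    return {
        "range": max(data) - min(data),
        "variance": statistics.variance(data),  # sample variance, n - 1 denominator
        "std_dev": statistics.stdev(data),      # square root of the sample variance
        "iqr": q3 - q1,                         # spread of the middle 50% of the data
    }

sample = [1.0, 2.0, 3.0, 4.0, 5.0]  # hypothetical values
print(dispersion(sample))
```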

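Correlation is a measure of the linear association between two variables. This
association is usually summarized by a single value, the correlation coefficient, which
ranges from -1 (a perfect negative linear relationship) through 0 (no linear
relationship) to +1 (a perfect positive linear relationship).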
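The most common correlation coefficient is Pearson's *r*. It is the covariance of the two variables divided by the product of their standard deviations. The plain-Python sketch below computes it directly from that definition. The function name `pearson_r` is hypothetical, and in Python 3.10 or later `statistics.correlation` provides the same quantity.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    # Sums of cross-products and squared deviations about the means
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

print(pearson_r([1, 2, 3, 4, 5], [2, 4, 6, 8, 10]))     # 1.0
print(pearson_r([1, 2, 3, 4, 5], [10, 8, 6, 4, 2]))     # -1.0
print(pearson_r([-2, -1, 0, 1, 2], [4, 1, 0, 1, 4]))    # 0.0
```

The last case shows why correlation measures only *linear* association. Here *y* = *x*² is perfectly determined by *x*, yet *r* is zero.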

Indices are diagnostic tools used to describe the state of a climate system. Climate
indices are most often represented with a time series; each point in time corresponds
to one index value.
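One common way to build a climate index is to convert a time series into standardized anomalies. You subtract the series mean and divide by its standard deviation, so the index has zero mean and unit variance. The sketch below is an illustration under that assumption and does not reproduce the definition of any particular published index. The function name and the input values are hypothetical.

```python
import statistics

def standardized_index(series):
    """Convert a time series into standardized anomalies (zero mean, unit std)."""
    mean = statistics.mean(series)
    std = statistics.stdev(series)  # sample standard deviation
    return [(x - mean) / std for x in series]

# Hypothetical yearly values of some climate variable
raw = [11.0, 13.0, 12.0, 11.0, 13.0]
index = standardized_index(raw)
print(index)  # one index value per point in time
```

Standardizing makes indices built from variables with different units or magnitudes directly comparable.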

A frequency distribution is one of the most common tools used to describe a single
population. It is a tabulation of the frequencies of each value (or range of values),
and it is usually displayed graphically, for example as a histogram.
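The sketch below tabulates a frequency distribution for ranges of values. It assigns each observation to a bin of fixed width, counts the bins (including empty ones), and prints a crude text histogram. The helper name `frequency_table`, its `bin_width`/`origin` parameters, and the data are hypothetical.

```python
import math
from collections import Counter

def frequency_table(data, bin_width, origin=0.0):
    """Count values in each bin [lo, lo + bin_width), including empty bins."""
    counts = Counter(math.floor((x - origin) / bin_width) for x in data)
    bins = range(min(counts), max(counts) + 1)
    return {origin + k * bin_width: counts[k] for k in bins}

observations = [0.5, 1.2, 1.8, 2.1, 2.9, 3.0, 5.7]  # hypothetical values
table = frequency_table(observations, bin_width=1.0)
for lo, n in table.items():
    print(f"[{lo:.1f}, {lo + 1.0:.1f})  {'#' * n}")
```

Keeping empty bins (here the bin starting at 4.0) matters when the table is plotted, because dropping them would visually close the gap in the distribution.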

Singular value decomposition (SVD) is quite possibly the most widely used multivariate
statistical technique in the atmospheric sciences. The technique was first introduced
to meteorology in a 1956 paper by Edward Lorenz, in which he referred to the process
as empirical orthogonal function (EOF) analysis. Today, it is also commonly known
as principal-component analysis (PCA). Strictly speaking, SVD is the matrix factorization
most often used to compute EOFs and principal components, but all three names are still
in use, and within the Streaming Data Library (SDL) they refer to the same set of procedures.
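The following is a minimal sketch of EOF/PCA computed with SVD. It uses NumPy's `numpy.linalg.svd` and is not an SDL routine. It assumes a hypothetical data matrix with one row per time step and one column per location. Each location's time mean is removed, and the resulting anomaly matrix *A* is factored as *A* = *U S Vᵀ*. The rows of *Vᵀ* are the EOF spatial patterns, the columns of *U S* are the principal-component time series, and *s²* / Σ*s²* gives the fraction of variance explained by each mode. The function name and the synthetic test data are hypothetical.

```python
import numpy as np

def eof_analysis(data):
    """EOF/PCA via SVD of a (time x space) data matrix.

    Returns EOF spatial patterns (one per row), principal-component time
    series (one per column), and the fraction of variance of each mode.
    """
    anomalies = data - data.mean(axis=0)  # remove each location's time mean
    u, s, vt = np.linalg.svd(anomalies, full_matrices=False)
    pcs = u * s                           # PC time series, scaled by singular values
    eofs = vt                             # spatial patterns
    explained = s**2 / np.sum(s**2)       # variance fraction per mode
    return eofs, pcs, explained

# Synthetic data: one oscillating spatial pattern plus small random noise
rng = np.random.default_rng(0)
t = np.arange(100)
pattern = np.array([1.0, 0.5, -0.5, -1.0])  # hypothetical pattern at 4 locations
data = np.outer(np.sin(2 * np.pi * t / 25), pattern) + 0.1 * rng.standard_normal((100, 4))

eofs, pcs, explained = eof_analysis(data)
print(explained)  # the first mode should dominate
```

The sign of each EOF/PC pair is arbitrary in SVD output, so compare patterns up to a factor of -1.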

Interpolation is the process of using known data values to estimate unknown values at
points that lie between them.
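The simplest case is linear interpolation. It estimates a value by drawing a straight line between the two nearest known points. The plain-Python sketch below is an illustration only. The function name and data are hypothetical, and it refuses to extrapolate beyond the known range. For arrays, `numpy.interp` performs the same one-dimensional linear interpolation.

```python
from bisect import bisect_right

def linear_interp(x, xs, ys):
    """Estimate y at x by linear interpolation between known points (xs, ys).

    `xs` must be sorted in increasing order, and x must lie within [xs[0], xs[-1]].
    """
    if not xs[0] <= x <= xs[-1]:
        raise ValueError("x is outside the known data range (extrapolation)")
    i = bisect_right(xs, x) - 1  # index of the known point at or left of x
    if i == len(xs) - 1:         # x equals the last known point
        return ys[-1]
    x0, x1 = xs[i], xs[i + 1]
    y0, y1 = ys[i], ys[i + 1]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)

# Hypothetical hourly temperatures
hours = [0.0, 1.0, 2.0, 3.0]
temps = [10.0, 12.0, 13.0, 12.5]
print(linear_interp(1.5, hours, temps))  # 12.5
print(linear_interp(2.5, hours, temps))  # 12.75
```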