Adjacency (or contiguity)

Two polygons are either adjacent or not. Different criteria may be used to determine adjacency dependent on whether polygons must share an edge (Rook's case) or need only share a corner (Queen's case). Adjacency relations among a set of polygons or areas may be summarized in an adjacency matrix where each row-column entry is either 1 (for adjacent polygons) or 0 (for non-adjacent polygons).
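
As an illustration, the adjacency matrix for the cells of a small regular grid can be built under either criterion. This is only a sketch: the function name grid_adjacency and the row-by-row cell numbering are assumptions made for the example, and real polygon data would need a geometric test for shared edges or corners.

```python
def grid_adjacency(n_rows, n_cols, queen=False):
    """Return the adjacency matrix for the cells of a regular grid.

    Cells are numbered row by row. Rook's case counts only shared edges;
    Queen's case also counts shared corners.
    """
    n = n_rows * n_cols
    matrix = [[0] * n for _ in range(n)]
    for r in range(n_rows):
        for c in range(n_cols):
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    if dr == 0 and dc == 0:
                        continue  # a cell is not adjacent to itself
                    if not queen and dr != 0 and dc != 0:
                        continue  # diagonal neighbors share only a corner
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < n_rows and 0 <= cc < n_cols:
                        matrix[r * n_cols + c][rr * n_cols + cc] = 1
    return matrix


rook = grid_adjacency(2, 2)
queen = grid_adjacency(2, 2, queen=True)
```

In the 2 by 2 grid, the first cell is adjacent to two cells in the Rook's case and to all three other cells in the Queen's case.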

Affine transformation

A coordinate transformation that combines a translation, a rotation and a scaling of the original coordinates. No distortion in the shape of entities occurs as a result of an affine transformation. Such a transformation can be represented by a 3 by 3 matrix for two dimensional coordinates.
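
A minimal sketch of the 3 by 3 matrix form follows. The order of operations (scale, then rotate, then translate), the use of a single uniform scale factor, and the function names are assumptions made for the example.

```python
import math


def affine_matrix(tx, ty, angle_deg, scale):
    """Build a 3x3 matrix that scales, then rotates, then translates."""
    a = math.radians(angle_deg)
    cos_a, sin_a = math.cos(a), math.sin(a)
    return [
        [scale * cos_a, -scale * sin_a, tx],
        [scale * sin_a, scale * cos_a, ty],
        [0.0, 0.0, 1.0],
    ]


def apply_affine(m, x, y):
    """Apply the matrix to the point (x, y), treated as the column (x, y, 1)."""
    return (m[0][0] * x + m[0][1] * y + m[0][2],
            m[1][0] * x + m[1][1] * y + m[1][2])


# Double the size, rotate 90 degrees counterclockwise, shift by (10, 5).
m = affine_matrix(tx=10.0, ty=5.0, angle_deg=90.0, scale=2.0)
x_new, y_new = apply_affine(m, 1.0, 0.0)
```

Here the point (1, 0) is carried to approximately (10, 7), up to floating point rounding.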

Alternative hypothesis

In hypothesis testing, the hypothesis that is considered opposite to the null hypothesis and which is considered supported if the null is rejected. Usually, the alternative hypothesis is a consequence of the theory that is being tested, and the investigator hopes to confirm support for it by statistical analysis.

Anisotropic

Spatial variation that is dependent on direction is anisotropic. See also isotropic.

Area

The two-dimensional region of space enclosed by a polygon. See also point and line.

Aspect

The compass direction that a surface faces, that is, the direction of its maximum slope (or gradient). A standard raster or map algebra operation. A key parameter in calculating hillshading for a digital elevation model.

Attribute

A characteristic of an object describable by a value. Values may be nominal, ordinal, interval, or ratio.

Autocorrelation

A measure of the degree to which attribute values in a data set are related to one another over space. If locations near one another tend to have similar values, the data set is positively autocorrelated. If locations near one another tend to be different (a 'checkerboard' effect), the data set is negatively autocorrelated. Where there is no particular spatial pattern to the data there is no autocorrelation. See also spatial dependence.

Bar chart

A statistical display showing for each of a series of categories the number or relative number of cases in a dataset that are members of that category. Not to be confused with a histogram.

Boolean algebra

The mathematical rules for combining 'truth values'. There are two values in Boolean algebra, TRUE and FALSE. There are three basic operators NOT, AND and OR. Boolean algebra specifies the results of combining truth values using different operations. For example, NOT TRUE = FALSE, FALSE AND TRUE = FALSE, and so forth.

Box plot

A statistical display that shows the median, upper and lower quartiles (see quantile), and outliers in a dataset.

Classical spatial analysis

The approach to spatial analysis focused on the relationship between spatial processes and the patterns they generate. The fundamental question in spatial analysis is "could this (observed) pattern have been produced by this process?" Spatial analysis attempts to answer the question using statistical methods.

Class boundaries

The values that determine when cases in a data set occur in one class rather than another, particularly when applied to choropleth maps. For example, in a data set where one attribute is defined as a percentage (say percentage of population under 18), class boundaries might be 10, 25, 50, 75 and 90. These would define classes from 0 to 10, 10 to 25, 25 to 50, 50 to 75, 75 to 90 and 90 to 100 percent.
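
The example boundaries can be applied with a simple lookup. This is a sketch only: the function name class_index is hypothetical, and assigning a value that falls exactly on a boundary to the higher class is an assumption, since the definition above does not say which class owns the boundary value.

```python
from bisect import bisect_right

# Class boundaries from the percentage example above.
boundaries = [10, 25, 50, 75, 90]


def class_index(value, boundaries):
    """Return the index of the class a value falls in (0 is the lowest class).

    Assumption: a value equal to a boundary is placed in the higher class.
    """
    return bisect_right(boundaries, value)


classes = [class_index(v, boundaries) for v in (5, 30, 95)]
```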

Cluster detection

While point pattern analysis is concerned with determining whether or not point patterns are evenly-spaced, random, or clustered, cluster detection methods attempt to identify the location of clusters. Knowing the location of clusters is often more important because it may indicate locations that require special attention.

Clustered

In point pattern analysis, events may have a tendency to group closely together in space, in which case they are considered clustered.

Contiguity

See adjacency.

Contour lines

Lines on a surface joining points with the same value, used particularly for lines of equal elevation on earth's surface. A more general term is isoline, which can refer to lines of equal value of any surface or field phenomenon, such as atmospheric pressure, population density, soil pH, etc.

Control point

Data that record a spatial location and an attribute value. Control points are used to represent a surface or field phenomenon. Weather station data or spot heights are examples.

Co-registration

The process of ensuring that different spatial data sets are properly aligned so that the same geographical locations in each data set are aligned to the same points in the GIS data layers. Co-registration is an essential prerequisite to overlay analysis, and may involve affine transformations of one or more data sets.

Delaunay triangulation

A method that joins a set of points into a collection of interconnected triangles that completely fill the study area. The Delaunay triangulation has the property that the triangles are as close to equilateral as possible, given the spatial distribution of the points. Delaunay triangulations are often used in triangulated irregular networks. The Delaunay triangulation is also the dual of the proximity polygons of a set of points, where each edge in the triangulation corresponds to a pair of Voronoi polygons that share an edge.

Density estimation

Any process used to measure the density or intensity of events in space, expressed in numbers per unit area. The most commonly used method uses kernel functions in kernel density estimation.

Dependent variable

In regression the variable that the model is set up to predict or estimate.

Deterministic

A spatial process is described as deterministic if its operation always results in the same outcome.

Digital elevation model (DEM)

A digital representation of earth's surface elevation, or more generally, of the attribute values of a field, where data are stored as a raster or grid of values.

Distance

The spatial separation between two entities. Most simply, this is measured in a straight line or 'as the crow flies'. On the plane, this is the Euclidean distance, or on a globe the Great Circle distance. See also statistical distance, adjacency, interaction.
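
Both straight-line measures can be sketched briefly. The great circle version below uses the haversine formula on a sphere; the mean earth radius of 6371 km and the function names are assumptions made for the example.

```python
import math


def euclidean_distance(p, q):
    """Straight-line distance between two (x, y) points on the plane."""
    return math.hypot(q[0] - p[0], q[1] - p[1])


def great_circle_distance(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Great circle distance between two latitude/longitude pairs in degrees.

    Uses the haversine formula and assumes a spherical earth.
    """
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    h = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(h))


planar = euclidean_distance((0, 0), (3, 4))
# A quarter of the way around the equator.
quarter_equator = great_circle_distance(0.0, 0.0, 0.0, 90.0)
```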

Edge effect

A common difficulty in spatial analysis is that there is an inherent asymmetry between places included in the study area and those considered outside the study area. Thus, for example, in point pattern analysis, points near the edge of the study area are much more likely to have a nearest neighbor toward the center of the study area than away from it. Similarly in determining autocorrelation for a set of polygons, lack of knowledge of the state of polygons adjacent to but outside the study area may affect the results. Such effects are collectively termed edge effects.

Entity

Any real thing in the geographical world represented by an object in a GIS.

Exploratory spatial data analysis (ESDA)

A set of methods often used in spatial analysis based on linking statistical displays (see box plot, histogram, scatter plot, parallel coordinate plot) and map views to enable exploration of patterns in spatial data.

Evenly-spaced

In point pattern analysis, events may have a tendency to be roughly equally spaced apart in space, in which case they are considered to be evenly-spaced.

Event

See point pattern analysis.

F function

A function in point pattern analysis based on the cumulative frequency of the shortest distance between events in a point pattern and a set of randomly placed locations in the study area. See also G function, K function.

Field (or surface)

A field is a representation in data of a phenomenon that could, in principle, be measured at all locations. The most obvious example is earth's elevation. Usually, it is assumed that there is only one attribute value at each spatial location, although this is a simplification with regard to actual fields (think of caves or overhanging cliffs, when there are two or more elevation values at a location). Fields may be stored as a digital elevation model (DEM), as a set of control points, as a triangulated irregular network (TIN), or as a mathematical equation relating spatial location to attribute values. Any field can be visualized as a surface. See also object.

First order spatial variation

First order variation in a spatial pattern is that part of the pattern that may be attributed to trends across the study area. The most obvious example is a steady (say) south-north decrease in average temperatures across a region. The part of a pattern that is considered first order is likely to vary with spatial and temporal scale. Marked first order effects in a pattern can be problematic for some methods such as kriging, when universal kriging should be used to account for the first order effect. First order components in a spatial pattern may be effectively modeled using trend surface analysis. See also second order spatial variation.

Focal operator

In map algebra, an operation based on a location and some defined neighborhood of other locations around it. So-called, because the values used in the operation are drawn from locations focused on the location. See also global operator, local operator, zonal operator.

G function

A function in point pattern analysis based on the cumulative frequency of the shortest inter-event distances between events in a point pattern. See also F function, K function.

Global operator

In map algebra, an operation based on a location and all the other locations in the study area. See also focal operator, local operator, zonal operator.

Gradient

The slope of a surface in its direction of greatest slope is its gradient. See also aspect.

Histogram

A statistical display showing for each of a series of numerical ranges the number or relative number of cases in a dataset that fall in that range. Note that the numerical ranges need not be equally sized, but that in the display they should be drawn to scale, such that the area of the column associated with each range correctly represents the number of cases in that range. Not to be confused with a bar chart.

Hypothesis testing

A procedure in inferential statistics, whereby the probability of an observed outcome is assessed in light of a proposed null hypothesis. A low assessed probability, or p-value, provides grounds for rejection of the null hypothesis, and support for the alternative hypothesis.

Independent variable

In regression the variable (or variables) that are assumed to lead to changes in the dependent variable.

Interactive brushing

In exploratory spatial data analysis interactive brushing highlights map locations and their associated points in statistical displays such as scatter plots. The GeoDa software provides a good example of the method in practice.

Interaction

A generalization of other measures (such as distance and adjacency) of the degree to which spatial locations are related. Usually, interaction is measured on a scale of 0 to 1 where 0 indicates little or no relationship and 1 indicates a strong relationship.

Interval

A scale of measurement for attribute data that expresses the attribute numerically such that the differences between cases are directly comparable, that is, a given numerical difference is equivalent regardless of the relative values of the numbers. See also nominal, ordinal, and ratio.

Inverse distance

A common method of measuring the interaction between locations is to use a mathematical formula such that the interaction is inversely proportional to the distance between the locations. Further refinement of the approach may involve raising the distance to some power before the inversion operation.
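
A sketch of an inverse distance weights matrix for a handful of points is given below. The function name, the zero weight on the diagonal, and the default power of 1 are assumptions made for the example; coincident points would need special handling because their distance is zero.

```python
import math


def inverse_distance_weights(points, power=1.0):
    """Return a matrix of weights 1 / d**power between every pair of points.

    The diagonal is left at zero, so a point has no interaction with itself.
    """
    n = len(points)
    weights = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                d = math.dist(points[i], points[j])
                weights[i][j] = 1.0 / d ** power
    return weights


points = [(0, 0), (3, 4), (6, 8)]
w1 = inverse_distance_weights(points)             # plain inverse distance
w2 = inverse_distance_weights(points, power=2.0)  # inverse distance squared
```

Raising the power makes the interaction fall away more quickly with distance.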

Isotropic

Spatial variation that is uniform regardless of direction is isotropic. See also anisotropic.

K function

A function in point pattern analysis based on all the inter-event distances between events in a point pattern. See also F function, G function.

Kernel density estimation

A method of density estimation making use of kernel functions. For each event to be included in the density estimates a kernel function is used to 'spread' its effect across space. A density surface is produced by summing all the individual functions across the study region.
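
The summing of kernels can be sketched as follows, using a conical kernel (linear decay from the center). The bandwidth value, the function names, and the normalizing constant 3 / (pi * h^2), chosen so that each kernel integrates to 1 over the plane, are assumptions made for the example.

```python
import math


def conical_kernel(distance, bandwidth):
    """Weight that falls linearly from the center to zero at the bandwidth."""
    if distance >= bandwidth:
        return 0.0
    # 3 / (pi * h^2) scales the cone so that its volume is 1.
    return 3.0 / (math.pi * bandwidth ** 2) * (1.0 - distance / bandwidth)


def kernel_density(location, events, bandwidth):
    """Estimated density at a location: the sum of all event kernels there."""
    return sum(conical_kernel(math.dist(location, e), bandwidth)
               for e in events)


events = [(0.0, 0.0), (1.0, 0.0), (4.0, 4.0)]
density = kernel_density((0.5, 0.0), events, bandwidth=2.0)
```

Evaluating kernel_density at every cell of a grid covering the study region would give the density surface.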

Kernel function

A function with the general effect that points close to its center are weighted at least as strongly as those further from the center. The simplest example is a circle. More commonly used are functions that weight the center more heavily than more remote locations. The simplest such example has a conical shape with some weight applied to the center location and a uniform linear decay in all directions. More complex functions have bell shapes, and may have a Gaussian (normal curve) 'bell' shape.

Line

A line joins two or more points. Often, when more than two points form the line, it is referred to as a polyline. See also area.

Local operator

In map algebra, operators applied at only one map location. Examples include simple mathematical operators (such as negation, or squaring), more complex mathematical operators (such as sine, cosine, etc.) that can be applied to a single map layer. More commonly local operations are applied between two or more map layers. Examples include summation, maximum, minimum, average, etc. See also focal operator, global operator, zonal operator.

Logistic regression

A method of regression that assigns probabilities between 0 and 1 to an outcome (the dependent variable) based on the combination of values of input variables (the independent variables). This method is commonly applied in spatial analysis to predicting the likelihood of change in landcover type. For example, the chance of deforestation or urbanization at locations can be estimated based on other landcover types or measures. Logistic regression is a possible method of overlay analysis, or equivalently map algebra.

Map algebra

A formalisation of the mathematical manipulation of values in GIS data layers. In map algebra values of attributes at map locations are altered by operations applied locally, focally, zonally or globally to individual map layers or between two or more. See also local operator, focal operator, global operator, zonal operator.
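
The difference between local, focal, and zonal operations can be sketched on small grids held as lists of rows. The layer values, the 3 by 3 focal neighborhood, and the function names are assumptions made for the example.

```python
layer_a = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
layer_b = [[10, 10, 10], [20, 20, 20], [30, 30, 30]]
zones = [[1, 1, 2], [1, 2, 2], [3, 3, 3]]


def local_sum(a, b):
    """Local operation: add two layers cell by cell."""
    return [[x + y for x, y in zip(row_a, row_b)]
            for row_a, row_b in zip(a, b)]


def focal_mean(grid, r, c):
    """Focal operation: mean of the 3x3 neighborhood around cell (r, c).

    At the grid edge only the cells that exist are averaged.
    """
    values = [grid[rr][cc]
              for rr in range(max(r - 1, 0), min(r + 2, len(grid)))
              for cc in range(max(c - 1, 0), min(c + 2, len(grid[0])))]
    return sum(values) / len(values)


def zonal_mean(grid, zones):
    """Zonal operation: mean of the grid values within each zone."""
    totals, counts = {}, {}
    for row_v, row_z in zip(grid, zones):
        for v, z in zip(row_v, row_z):
            totals[z] = totals.get(z, 0) + v
            counts[z] = counts.get(z, 0) + 1
    return {z: totals[z] / counts[z] for z in totals}


summed = local_sum(layer_a, layer_b)
center = focal_mean(layer_a, 1, 1)
by_zone = zonal_mean(layer_a, zones)
```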

Matrix

A grid of numbers. There are mathematical rules for adding, multiplying and inverting matrices and their use is a key topic in modern statistics and algebra. In spatial analysis matrices are often used to represent relationships among spatial entities, such as distance, adjacency, and interaction.

Mean

A measure of central tendency in statistics determined by summing all values in a dataset and dividing by the number of values. The mean gives an idea of the typical value in a dataset, but is prone to distortion by unusually high or low outlier values. See also median, mode.

Median

A measure of central tendency in statistics determined from the middle data value when data values are arranged in order. The median is much less affected by outlier values than the mean. See also mean, mode.

Mode

A measure of central tendency in statistics determined by finding the most frequently occurring value in a dataset. See also mean, median.

Modifiable areal unit problem (MAUP)

When spatial data are derived from counting or averaging data within areal units, the form of those areal units affects the data recorded, and any statistical measures derived from the data. Modifying the areal units therefore changes the data. Two effects are involved: a zoning effect arising from the particular choice of areas at a given scale; and an aggregation effect arising from the extent to which data are aggregated over smaller or larger areas. The modifiable areal unit problem arises in part from edge effect.

Monte-Carlo simulation (or method)

A statistical method of generating a sampling distribution for a statistic, usually in the absence of a well understood analytically derived method. In spatial analysis it is often difficult to derive expected distributions of statistical measures, so Monte Carlo simulation is often used. Based on an understanding of the processes at work in generating the spatial data under analysis, a computer simulation is used to generate a number of synthetic datasets, which are then analyzed using the measure under test. The results from multiple synthetic datasets provide a synthetic sampling distribution against which observed measurements can be assessed.
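
A sketch of the idea follows, using mean nearest neighbor distance as the measure under test. Everything specific here is an assumption made for illustration: the made-up observed pattern, the unit square study area, the simulated process (events placed independently and uniformly at random), the 99 simulations, and the fixed random seed.

```python
import math
import random


def mean_nn_distance(events):
    """Mean distance from each event to its nearest other event."""
    total = 0.0
    for i, p in enumerate(events):
        total += min(math.dist(p, q) for j, q in enumerate(events) if j != i)
    return total / len(events)


def simulate_mean_nn(n_events, n_sims, rng):
    """Measure n_sims synthetic random patterns in the unit square."""
    results = []
    for _ in range(n_sims):
        pattern = [(rng.random(), rng.random()) for _ in range(n_events)]
        results.append(mean_nn_distance(pattern))
    return results


rng = random.Random(42)  # fixed seed so the sketch is repeatable
observed_pattern = [(0.50, 0.50), (0.52, 0.51), (0.49, 0.53),
                    (0.51, 0.48), (0.48, 0.50)]
observed = mean_nn_distance(observed_pattern)
simulated = simulate_mean_nn(n_events=5, n_sims=99, rng=rng)

# Share of patterns (observed included) with a mean nearest neighbor
# distance at least as small as the one observed.
p_value = (1 + sum(s <= observed for s in simulated)) / (len(simulated) + 1)
```

The list simulated plays the role of the synthetic sampling distribution, and p_value places the observed measurement within it.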

Multidimensional scaling

A multivariate statistical method that uses statistical distances between observations to reduce the dimensionality of the data to a manageable number, usually two or three, so that data may be conveniently visualized as a scatter plot.

Nearest neighbor distance (or mean nearest neighbor distance)

For any event in a point pattern one other event is its nearest neighbor. The distance to that event is the nearest neighbor distance. By determining nearest neighbor distance for all events in a pattern, the mean nearest neighbor distance can easily be determined, and may give an indication of whether the pattern is clustered or evenly-spaced.
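
The calculation can be sketched directly from the definition. The function name and the three example events are assumptions made for illustration; the brute-force comparison of every pair is only practical for small patterns.

```python
import math


def nearest_neighbor_distances(events):
    """Distance from each event to its nearest other event."""
    return [min(math.dist(p, q) for j, q in enumerate(events) if j != i)
            for i, p in enumerate(events)]


events = [(0, 0), (1, 0), (5, 0)]
nn = nearest_neighbor_distances(events)
mean_nn = sum(nn) / len(nn)
```

For these three events the nearest neighbor distances are 1, 1 and 4, giving a mean of 2.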

Neighborhood

The neighborhood of a spatial object is either a region of space considered for analysis to be associated with it, or a set of other objects considered as that object's neighbors. There is no standard way of defining neighborhoods for objects.

Nominal

A scale of measurement for attribute data that expresses the attribute only in terms of different descriptions, that is, the attribute is non-numeric. See also interval, ordinal, and ratio.

Nugget

The value of the semivariogram function at zero distance. See also range, sill.

Null hypothesis

A statement about statistical data set up by the investigator to allow statistical testing of an aspect of the data. The null hypothesis is often the opposite of what the investigator believes to be true, but is put forward in the hope that the data will contradict it. When this happens the null hypothesis is rejected and the alternative hypothesis (usually what the investigator is interested in showing) is understood to be supported by the data. This rather tortuous approach is favored because it avoids the logical problem of assuming the truth of a hypothesis on the basis of one measurement or set of measurements (since there is no logical way to be certain that measuring again wouldn't give a different answer).

Object

In an object representation of the world, geographic entities are each represented by a digital object, typically a geometric object (point, line or area), with associated attributes that describe the state of the entity. The object view is usually associated with vector datasets.

Outlier

Outlier values in a dataset are those that are not typical of the dataset in general. Outliers are not clearly defined but usually rely on a statistical summary of the dataset in terms of its typical values and spread. These are usually assessed using statistical measures of central tendency (see mean, median, mode) and of spread (see standard deviation, variance, quantile).

Ordinal

A scale of measurement for attribute data that expresses the attribute so that cases may be ranked in order, but precise differences are not calculable between the categories. See also interval, nominal, and ratio.

Overlay analysis

Superimposing two or more maps registered to a common coordinate system, to show relationships between features in the same study area. In a GIS context the superimposition occurs digitally, although overlay analysis has its origins in the overlay of transparent map layers in landscape planning, where the method was used to identify areas suitable for different types of development. Overlay analysis is closely related to map algebra.

Parallel coordinate plots

A graphical display for multivariate data. Each variable in the display is represented as one of a set of parallel axes. Each observation is shown by a 'string' connecting points on the axes corresponding to the observation's value on each variable. Increasingly common in exploratory spatial data analysis.

Pattern

A characteristic arrangement of spatial objects, possibly produced by a spatial process. See also point pattern.

Point

A spatial object with no dimension and just a single location. In GIS, usually only two geographic coordinates define a point, although three is increasingly common. See also line, area.

Point pattern

A set of events in a study region. An event is an observation of the type we are interested in, at some point location.
A point pattern must be mapped on a plane, with an objectively determined study area. It should also not be a sample, but a census, with one to one correspondence between events and the objects of interest. This means that event locations should be 'proper', not, for example the centroids of polygons.

Point pattern analysis

Any of a number of methods of statistically assessing a point pattern. Classic point pattern analysis is concerned with determining whether a point pattern could have been produced by a stochastic spatial process, and if not, whether it tends to be clustered or evenly-spaced. See quadrat analysis, F function, G function, K function, and nearest neighbor distance. See also cluster detection.

Polygon

A closed multi-point line, that is, one where the 'last' point on the line is the same as the 'first'. See also area.

Population

In statistics the (potentially infinite) set of cases that defines the object of study. Populations cannot usually be studied in any practical way, and so a sample from the population is studied instead. Inferential statistics is concerned with what can be inferred about a population from the study of a sample.

Process

See spatial process.

Proximity polygon

For a set of non-overlapping spatial objects, a set of polygons can be defined such that each polygon encloses the area closest to one of the spatial objects. Proximity polygons are usually defined for a set of points, and are also known as Voronoi polygons and Thiessen polygons. See also Delaunay triangulation.

P-value

The probability associated with the observed (actual) result of measuring a sample statistic, relative to some hypothesis. When a null hypothesis has been established, if the measured p-value is low, it means that the observation actually made would be unlikely to have occurred if the null hypothesis were true. Depending on how low the p-value is (0.05 is the most common standard used) the null hypothesis can be rejected, thus providing support for the alternative hypothesis. See also hypothesis testing.

Quadrat analysis

A method of point pattern analysis in which the study region is sampled using a set of similar shapes (quadrats) and counting the number of events in each. Analysis of the resulting quadrat counts can help determine whether the pattern is evenly spaced or clustered. Quadrat analysis may be based on a set of randomly located quadrats (common in fieldwork) or on a census where the quadrats fill the study regions without overlaps.

Quantile

A quantile in a dataset is a subset of the data obtained by first arranging observations in rank order, and then dividing the data into a number of equal-sized sets. Common quantiles are quartiles (when there are four quantiles) and deciles (when there are ten). Quantiles are used as one basis for determining class boundaries in choropleth mapping when each quantile is colored differently. They also form the basis for one measure of spread in statistics when the inter-quartile range may be used. The data is rank ordered and divided in two (the middle value is the median). Each half of the data is also divided in two to give two more values, called the upper and lower quartile. The difference between these values is the inter-quartile range (IQR).
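
The halving procedure for the inter-quartile range can be sketched as below. The function name is hypothetical, and leaving the middle value out of both halves when the number of observations is odd is an assumption; other conventions exist and give slightly different quartiles.

```python
from statistics import median


def quartiles(data):
    """Return (lower quartile, median, upper quartile) by repeated halving.

    Assumption: with an odd number of values, the middle value is left out
    of both halves.
    """
    ordered = sorted(data)
    half = len(ordered) // 2
    lower_half = ordered[:half]
    upper_half = ordered[-half:]
    return median(lower_half), median(ordered), median(upper_half)


q1, q2, q3 = quartiles([7, 1, 5, 3, 8, 2, 6, 4])
iqr = q3 - q1
```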

Queen's case adjacency

See adjacency.

Range

(1) The range of a dataset is the difference between its highest and lowest values.

(2) A feature of the semivariogram. The range is the distance beyond which there is no particular relationship between data values that can be distinguished from the variance in the dataset. See also sill, nugget.

Raster GIS

In a raster GIS, data are stored as grids of attribute values. Such systems are particularly suited to handling field data. See also vector GIS.

Raster

A term for a grid.

Ratio

A scale of measurement for attribute data that expresses the attribute numerically such that the relative values of cases expressed as a ratio are meaningful across the full range of data values. See also interval, nominal, and ordinal.

Realization

Each time a spatial process runs, the outcome pattern is a realization.

Regression

A statistical method that expresses average variation in one variable (attribute) in terms of one or more independent variables (or attributes). The result of a regression analysis summarizes how much we expect the value of the dependent variable to change given change in the independent variable(s). May also form the basis of a type of model-based overlay analysis. See also logistic regression.

Rook's case adjacency

See adjacency.

Rotation

One component of an affine transformation is a rigid rotation of the coordinate system about a defined origin, which preserves the shape and spatial relations among all spatial objects.

Sample

A subset of the study population used as a basis for study of the whole population. Statistics is concerned with understanding the relationships between measurements made on a sample and the likely values of those measurements for the population from which the sample was drawn.

Sampling distribution

Any statistic measured for a sample is subject to sampling variation: if a different sample had been taken a different value of the statistic would have been arrived at. The sampling distribution for a particular statistic describes the expected variability in the sample statistic usually given information about the sample size, and on the assumption that the sample is random. The sampling distribution of many statistics conforms to the normal distribution.

Scalar

Scalar values have magnitude only, and as such can be expressed with a single numeric value. See also vector.

Scaling

One component of an affine transformation is a change in scale, which preserves the shape and relative spatial relations among all spatial objects but may increase or decrease their size.

Scatter plot

A statistical display showing values of two (sometimes three) variables plotted relative to a set of orthogonal axes.

Scatter plot matrix

A statistical display consisting of multiple scatter plots each showing two variables plotted against one another. Used to show the relationships among many variables. Common in exploratory spatial data analysis.

Second order spatial variation

Second order variation in a spatial pattern is that part of the pattern that may be attributed to interaction effects among elements in the pattern. Alternatively, it is what is left of the pattern once first order variation or trend effects have been removed. Against some background trend, the second order effects are more localized variations in pattern intensity.

Semivariogram

A mathematical function that summarizes the relationship between spatial difference (distance) and attribute difference (variance) in a spatial data set. Derivation of the semivariogram is generally complex but may start with the variogram cloud. Features of the semivariogram are the nugget, sill, and range.

Sill

A feature of the semivariogram. The sill is the steady value of the semivariogram function at distances greater than the range. See also nugget.

Slope

The rate at which a surface increases with distance. See also aspect, gradient.

Spatial autocorrelation

See autocorrelation.

Spatial cluster identification

See cluster detection.

Spatial data types

See point, line, area and field.

Spatial dependence

That property of any process that leads to its varying in a systematic way across space. Spatial dependence may exhibit first order or second order effects.

Spatial process

A mechanism for creating spatial patterns (see point pattern). The mechanism may be summarized by a mathematical formula, by a verbal description, or implemented in a computer program. The relationship between spatial processes and the spatial patterns they generate is central to classical spatial analysis. Processes may be deterministic or stochastic.

Spatial relationships

See distance, adjacency, neighborhood, and interaction.

Standard deviation

A measure of spread in statistics. The standard deviation is calculated as the square root of the variance. The standard deviation has most relevance in relation to the normal distribution when 68.3% of observations are within one standard deviation, 95.5% of observations are within two standard deviations, and 99.7% of observations within three standard deviations either side of the mean value. While these relationships only hold precisely for a true normal distribution, they form the basis for a rule-of-thumb that observations more than three standard deviations away from the mean data value are outliers.

Statistical displays

See histogram, box plot, bar chart, parallel coordinate plot and scatter plot.

Statistical distance

A generalization of the concept of distance which measures the degree of difference between two observations by combining their difference on one or more variable in some defined mathematical way. The most common way of arriving at a statistical distance is to treat the set of attribute values for each observation as a set of coordinates in a multidimensional space, to calculate the difference in each attribute and combine them into a single distance using Pythagoras's theorem. Statistical distance is the central concept in most forms of multivariate statistical analysis.

Statistical significance

The statistical significance of a measurement is judged according to its p-value, so that only those observations deemed unlikely relative to the known or assumed probabilities of the process being measured are considered significant.

Statistical spatial analysis

See classical spatial analysis.

Stochastic

In a spatial process, variation between realizations may occur due to chance variation. A process that incorporates chance elements in its definition is termed stochastic. The stochastic elements in a process do not imply that real spatial processes are a matter of chance, but that some elements may be poorly understood, or that the process may be subject to variations arising from the interactions of multiple elements, which while not due to chance, appear due to chance, or can be effectively modeled as if they were due to chance. See also deterministic.

Suitability mapping

A variant of overlay analysis particularly concerned with identifying geographic areas suitable for some purpose (residential development or storage of nuclear waste, for example), based on the combination of two or more geographically varying factors.

Surface

See field.

Thiessen polygon

See proximity polygon.

Tobler's ‘First law of geography’

The 'first law' is not a law, it is an observation, namely that while everything is related to everything else, near things tend to be more related than distant things. This observation is fundamental to the analysis of geographic data. It foreshadows autocorrelation. More fundamentally it is what makes geography (and hence spatial data) worth studying at all! If the law were not generally true, then there would be no particular spatial pattern to phenomena, and the geographic study of patterns would be pointless.

Translation

One component of an affine transformation is a translation, which preserves the shape and relative spatial relations among all spatial objects but moves them relative to coordinate systems.

Trend surface

A trend surface summarizes the first order variation in spatial pattern. Trend surfaces may be used in kriging, and can be developed by applying regression with spatial coordinates as the independent variables.

Triangulated irregular networks (TIN)

A method of storing field data. Control points are stored with attribute values recorded, and the points are connected together in a triangular mesh (often a Delaunay triangulation). A TIN has the dual advantages of compactness relative to a digital elevation model, and ease of visualization, since triangular surfaces lend themselves to the calculations required for rapid display in 3D.

Variance

A measure of spread in statistics. Variance is calculated as the average of the squared difference between each data value and the mean. Variance is not used as frequently as the standard deviation as a summary statistic because the squaring operation makes it hard to interpret. Variance is the basis for the semivariogram.
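
The calculation, together with the standard deviation derived from it, can be sketched as follows. The example data are made up, and dividing by the number of values (rather than by one less, as is common for samples) follows the wording of the definition above.

```python
import math


def variance(data):
    """Average of the squared differences between each value and the mean."""
    mean = sum(data) / len(data)
    return sum((x - mean) ** 2 for x in data) / len(data)


def standard_deviation(data):
    """Square root of the variance."""
    return math.sqrt(variance(data))


data = [2, 4, 4, 4, 5, 5, 7, 9]
var = variance(data)
sd = standard_deviation(data)
```

For these values the mean is 5, the variance is 4 and the standard deviation is 2, which is in the same units as the data and so easier to interpret.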

Variogram cloud

A scatter plot showing for every pair of cases in a spatial dataset the distance between them (horizontal axis) and the squared difference between their values. The variogram cloud gives an idea of the spatial dependence in a dataset, and is the basis for calculation of the semivariogram.
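
The points of the cloud can be computed as in the sketch below, which leaves the plotting itself aside. The function name, the (x, y, value) layout of each case, and the example values are assumptions made for illustration.

```python
import math
from itertools import combinations


def variogram_cloud(cases):
    """Return (distance, squared value difference) for every pair of cases.

    Each case is an (x, y, value) tuple.
    """
    return [(math.dist((x1, y1), (x2, y2)), (v1 - v2) ** 2)
            for (x1, y1, v1), (x2, y2, v2) in combinations(cases, 2)]


cases = [(0, 0, 10), (3, 4, 14), (6, 8, 11)]
cloud = variogram_cloud(cases)
```

Each pair in cloud is one point of the scatter plot, with distance on the horizontal axis.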

Vector

A quantity having both size and direction. An example is wind which has both a speed and a direction. A two dimensional vector can be described by two values, while in general a quantity described by n numbers is an n-dimensional vector. See also scalar.

Vector field

A field where each value is a vector.

Vector GIS

A GIS that stores data as geometric shapes i.e. points, lines, and areas. See also raster GIS.

Viewshed

The area that is visible from a specified location.

Voronoi polygon

See proximity polygon.

Weights matrix

A matrix where each element is the interaction weight between the spatial objects corresponding to the row and column position of the element.

Weights-of-evidence

A method of overlay analysis based on the combined probability of different data attributes occurring at the same locations.

Zonal operator

In map algebra, an operation based on a location and all other locations in a zone associated with that location. See also focal operator, global operator, local operator.