LESSON 4: POINT PATTERN ANALYSIS

Lesson 4 Overview

Introduction

In the previous lesson we saw how a spatial process can be described in mathematical terms so that the patterns it is expected to produce can be predicted. In this lesson we apply this knowledge to the analysis of point patterns. Point pattern analysis is the application in which these ideas are most thoroughly developed, so it is the best place to learn about this approach.

Point pattern analysis has become an extremely important application in recent years, particularly in crime analysis, in epidemiology, and in facility location planning and management. Point pattern analysis also goes all the way back to the very beginning of spatial analysis in Dr. John Snow's work on the London cholera epidemic of 1854.

See http://www.ph.ucla.edu/epi/snow.html for more information about Snow's work.

Learning Objectives

By the end of this lesson, you should be able to

define point pattern analysis and list the conditions necessary for it to work well
explain how quadrat analysis of a point pattern is performed and distinguish between quadrat census and quadrat sampling methods
discuss relevant factors in determining an appropriate quadrat size for point pattern analysis
describe in outline kernel density estimation and understand how it transforms point data into a field representation
describe distance-based measures of point patterns (mean nearest neighbor distance and the G, F and K functions)
explain how distance-based methods of point pattern measurement are derived from a distance matrix
describe how the independent random process and expected values of point pattern measures are used to evaluate point patterns, and to make statistical statements about point patterns
explain how Monte Carlo methods are used when analytical results for spatial processes are difficult to derive
justify the stochastic process approach to spatial statistical analysis
discuss the merits of point pattern analysis versus cluster detection, and outline the issues involved in real world applications of these methods

Reading Assignment

This week the reading is detailed, demanding, and long. I therefore recommend that you start it as soon as possible, and also that you read the material twice. First time through, you should quickly skim the material to become familiar with the overall plan. On the second pass, you should read more closely taking note of the details. Whatever you do, don't leave the reading to the last minute this week!

Chapter 4, "Point Pattern Analysis," pages 77-114
Chapter 5, Section 5.1, "Practical Point Pattern Analysis," pages 115-123
You need only read the first section of Chapter 5 (noted above), although I hope that you will find all of it useful as a guide to current and future developments in point pattern analysis.

After you've completed the reading, or at the very least skimmed the material, get back online and supplement your reading from the commentary material, then test your knowledge with the self-test quizzes.

Lesson 4 Deliverables

This lesson is one week in length. The following items must be completed by the end of the week. See the Calendar tab, above, for the specific date.

Complete the two self-test quizzes satisfactorily (you have an unlimited number of attempts and must score 90% or more).
Complete the Project 4 activities. This involves running and interpreting point pattern analysis of some crime data for St. Louis. (The materials for Project 4 can be found under the Lessons tab, in the Lesson 4 folder.)
There is no specific activity in the quarter-long project this week, as the core of this week's lesson will keep you busy enough. However, don't forget that you have to submit a review of two other research proposals next week—I will tell you which ones this week.

Questions?

If you have any questions now or at any point during this lesson, please feel free to post them to the Lesson 4 thread on the Lesson Content Discussion Forum. (That Discussion Forum can be accessed at any time by clicking on the Communicate tab, above, and then scrolling down to the Discussion Forums section.)

Ready to continue? Click on the "Next" link, above, to continue with this lesson.

Commentary - Chapter 4, "Point Pattern Analysis"

Section 4.2, "Describing a point pattern," pages 79-81

It should be pointed out that the distinction between first- and second-order effects is a fine one. In fact, it is often scale-dependent, and often an analytical convenience, rather than a hard and fast distinction. This becomes particularly clear when you realize that an effect that is first-order at one scale may become second-order at a smaller scale (that is, when you 'zoom out').

The simplest example is a steady east-west rise in land elevation: viewed at a regional scale it is a first-order trend, but on zooming out to the continental scale it becomes a more localized topographic feature. This is yet another example of the scale-dependence effects inherent in spatial analysis and noted in Lesson 1.

Section 4.3, "Density-based point pattern measures," pages 81-88

It is worth emphasizing the point that quadrats need not be square, although it is rare for them not to be.

With regard to kernel density estimation (KDE), it is worth pointing out the strongly scale-dependent nature of this analysis method. This becomes apparent when we view the effect of varying the KDE bandwidth on the estimated event density map in the following sequence of maps, all generated from the same pattern of homicide events in downtown St. Louis, Missouri, in 1982.

Using a large KDE bandwidth results in a very generalized impression of the event density.

The map generated using a small KDE bandwidth is also problematic, as it focuses too much on individual events.

An intermediate choice of bandwidth results in a more satisfactory map that enables distinct regions of high density of events (clusters) to be identified.
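The bandwidth effect described above can be reproduced numerically. The following is a minimal sketch, assuming a quartic kernel (one common choice, not necessarily the one used to make these maps) and a small made-up set of event coordinates. The function name `quartic_kde` and the data are hypothetical.

```python
import math

def quartic_kde(x, y, events, bandwidth):
    """Kernel density estimate at (x, y) using a quartic kernel.

    Each event within `bandwidth` of (x, y) contributes
    (3 / (pi * h^2)) * (1 - d^2 / h^2)^2, which integrates to 1 over the plane.
    """
    h2 = bandwidth ** 2
    total = 0.0
    for ex, ey in events:
        d2 = (x - ex) ** 2 + (y - ey) ** 2
        if d2 < h2:
            total += (3.0 / (math.pi * h2)) * (1.0 - d2 / h2) ** 2
    return total

# Hypothetical events: a tight cluster of three plus one isolated event.
events = [(1.0, 1.0), (1.2, 1.1), (0.9, 1.3), (5.0, 5.0)]

# Compare the estimate at the cluster with the estimate in the empty gap
# between the cluster and the isolated event, for three bandwidths.
for h in (0.5, 2.0, 10.0):
    at_cluster = quartic_kde(1.0, 1.0, events, h)
    in_gap = quartic_kde(3.0, 3.0, events, h)
    print(f"bandwidth={h:5.1f}  cluster={at_cluster:.4f}  gap={in_gap:.4f}")
```

With the smallest bandwidth the gap receives no density at all, while with the largest bandwidth the cluster and the gap become almost indistinguishable, which mirrors the over-focused and over-generalized maps.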

Section 4.4, "Distance-based point pattern measures," pages 88-95

It may be helpful to briefly distinguish the four major distance methods discussed here:

Mean nearest neighbor distance is exactly what the name says!
G function is the cumulative frequency distribution of the nearest neighbor distance. For a specified distance, it gives the probability that the distance from an event to its nearest neighbor in the pattern is less than that distance.
F function is the cumulative frequency distribution of the distance to the nearest event in the pattern from random locations not in the pattern.
K function is based on all inter-event distances, not simply nearest neighbor distances. Interpretation of the K function is tricky for the raw figures and makes more sense when statistical analysis is carried out as discussed in a later section.

It is useful to see these measures as forming a progression from least to most informative (with an accompanying rise in complexity).
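To make the progression concrete, here is a minimal sketch of how each measure can be computed from a distance matrix. The function names, the sample coordinates, and the study-area value are all hypothetical, and the K function is computed here without any edge correction, so treat it as an illustration rather than the exact estimator in the text.

```python
import math

def distance_matrix(points):
    """Full matrix of Euclidean distances between all pairs of events."""
    return [[math.dist(p, q) for q in points] for p in points]

def nearest_neighbor_distances(dmat):
    """Smallest off-diagonal entry in each row of the distance matrix."""
    return [min(d for j, d in enumerate(row) if j != i)
            for i, row in enumerate(dmat)]

def mean_nn_distance(dmat):
    nn = nearest_neighbor_distances(dmat)
    return sum(nn) / len(nn)

def g_function(dmat, d):
    """Fraction of events whose nearest neighbor is closer than d."""
    nn = nearest_neighbor_distances(dmat)
    return sum(1 for v in nn if v < d) / len(nn)

def f_function(points, locations, d):
    """Fraction of the given locations whose nearest event is closer than d."""
    nearest = [min(math.dist(loc, p) for p in points) for loc in locations]
    return sum(1 for v in nearest if v < d) / len(nearest)

def k_function(dmat, d, area):
    """Naive K estimate: area / n^2 times the number of ordered pairs within d."""
    n = len(dmat)
    pairs = sum(1 for i, row in enumerate(dmat)
                for j, v in enumerate(row) if i != j and v <= d)
    return area * pairs / (n * n)

# Hypothetical pattern in a 6 x 6 study area.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0)]
dmat = distance_matrix(points)
print(mean_nn_distance(dmat))
print(g_function(dmat, 1.5))
print(f_function(points, [(0.5, 0.0), (3.0, 3.0)], 1.0))
print(k_function(dmat, 1.5, area=36.0))
```

Note that the mean nearest neighbor distance and G use only one entry per row of the matrix, whereas K uses every entry, which is why it carries more information.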

Section 4.5, "Assessing point patterns statistically," pages 95-108

The measures discussed in the preceding two sections can all be tested statistically for deviations from the expected values associated with a random point process. In fact, deviations from any well-defined process can be tested, although the mathematics required becomes more complex.

This section simply outlines how each of the measures described in previous sections may be tested statistically. The most complex of these is the K function, where the additional concept of an L function is introduced to make it easier to detect large deviations from a random pattern.
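For reference, the standard expected values under the independent random process with intensity λ (events per unit area), ignoring edge effects, can be summarized as follows. The L function is written here in the form that subtracts d, so that it is approximately zero for a random pattern; some texts omit the subtraction, so check the definition used in the course text.

```latex
E[\bar{d}_{\min}] = \frac{1}{2\sqrt{\lambda}}, \qquad
G(d) = 1 - e^{-\lambda \pi d^{2}}, \qquad
E[K(d)] = \pi d^{2}, \qquad
L(d) = \sqrt{\frac{K(d)}{\pi}} - d \approx 0
```

Values of L(d) well above zero suggest clustering at distance d, and values well below zero suggest even spacing.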

More important, in practical terms, is the Monte Carlo procedure discussed on pages 104-108. Monte Carlo methods are common in statistics generally, but are particularly useful in spatial analysis, where mathematical derivation of the expected values of a pattern measure can be very difficult. Instead of trying to derive analytical results, we simply resort to the computer's ability to randomly generate many patterns according to the process description we have in mind, and then compare our observed result to the simulated distribution of results. This approach is explored in more detail in the project for this lesson.
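A minimal sketch of the Monte Carlo idea, assuming a rectangular study area, the mean nearest neighbor distance as the pattern measure, and a one-sided test for clustering. The function names, the number of simulations, and the example pattern are hypothetical choices for illustration.

```python
import math
import random

def mean_nn_distance(points):
    """Mean distance from each event to its nearest neighbor."""
    nn = [min(math.dist(p, q) for j, q in enumerate(points) if j != i)
          for i, p in enumerate(points)]
    return sum(nn) / len(nn)

def simulate_irp(n, width, height, rng):
    """One realization of n independent, uniformly placed events."""
    return [(rng.uniform(0, width), rng.uniform(0, height)) for _ in range(n)]

def monte_carlo_p_value(observed_points, width, height, n_sims=999, seed=0):
    """Pseudo p value for clustering: the share of patterns (simulated plus
    the observed one) whose mean NN distance is as small as the observed."""
    rng = random.Random(seed)
    observed = mean_nn_distance(observed_points)
    n = len(observed_points)
    sims = [mean_nn_distance(simulate_irp(n, width, height, rng))
            for _ in range(n_sims)]
    as_extreme = sum(1 for s in sims if s <= observed)
    return observed, (as_extreme + 1) / (n_sims + 1)

# Hypothetical, strongly clustered pattern of 20 events in a 10 x 10 area.
clustered = [(5 + 0.01 * i, 5 + 0.01 * j) for i in range(5) for j in range(4)]
observed, p = monte_carlo_p_value(clustered, 10.0, 10.0)
print(f"observed mean NN distance = {observed:.3f}, pseudo p value = {p:.3f}")
```

The smallest p value this procedure can report is 1 / (n_sims + 1), which is why stringent significance levels require many simulation runs.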

Ready? Take the Chapter 4 Quiz to check your knowledge! Click on the "Next" link, above, to access the self-test quiz for Chapter 4. You have an unlimited number of attempts and must score 90% or more.

Ready to continue? Click on the "Next" link, above, to begin the Chapter 4 Quiz.

Commentary - Chapter 5, Section 5.1, "Point Pattern Analysis Versus Cluster Detection"

The key issue here is that classic point pattern analysis allows us to say that a pattern is 'evenly-spaced' or 'clustered' relative to some null spatial process (usually the independent random process), but it does not allow us to say where the pattern is clustered. This is important in most real world applications. A criminal investigator takes it for granted that crime is more common at particular 'hotspots', i.e., that the pattern is clustered, so statistical confirmation of this assumption might be nice, but it is not particularly useful. However, an indication of where the crime hotspots are located is definitely useful.

The problem is that detecting clusters in the presence of background variation in the affected population is very difficult. This is especially so for rare events. You can get some idea of the degree of difficulty from the description of the Geographical Analysis Machine (GAM) on pages 119-122. Although GAM has not been widely adopted by epidemiologists, the approach suggested by it was ground-breaking and other more recent tools use very similar methods. (See the optional 'Try This' box below for more on this.)

The basic idea is very simple: repeatedly examine circular areas on the map and compare the observed number of events of interest to the number that would be expected under some null hypothesis (usually spatial randomness). Tag all those circles that are statistically unusual. That's it!
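The following is a deliberately simplified sketch of that idea, not the GAM itself. It assumes a rectangular study area, a uniform expected rate (spatial randomness with no population at risk), a single circle radius, and a Poisson test in each circle. The function names, parameters, and data are hypothetical.

```python
import math

def poisson_upper_tail(observed, expected):
    """P(X >= observed) for X ~ Poisson(expected)."""
    if observed <= 0:
        return 1.0
    term = math.exp(-expected)   # P(X = 0)
    cdf = 0.0
    for k in range(observed):    # accumulate P(X = 0) ... P(X = observed - 1)
        cdf += term
        term *= expected / (k + 1)
    return max(0.0, 1.0 - cdf)

def scan_circles(cases, width, height, radius, step, alpha=0.01):
    """Examine circles centred on a regular grid and tag the unusual ones."""
    # Expected count per circle under spatial randomness (edge effects ignored).
    expected = len(cases) * math.pi * radius ** 2 / (width * height)
    flagged = []
    for ix in range(int(width / step) + 1):
        for iy in range(int(height / step) + 1):
            centre = (ix * step, iy * step)
            observed = sum(1 for c in cases if math.dist(c, centre) <= radius)
            p = poisson_upper_tail(observed, expected)
            if p < alpha:
                flagged.append((centre[0], centre[1], observed, p))
    return flagged

# Hypothetical cases: ten events near (2, 2) and four scattered events.
cluster = [(2.0 + 0.05 * i, 2.0 - 0.05 * i) for i in range(10)]
scattered = [(7.0, 7.0), (8.0, 3.0), (5.0, 9.0), (9.0, 9.0)]
for cx, cy, observed, p in scan_circles(cluster + scattered, 10.0, 10.0,
                                        radius=1.0, step=1.0):
    print(f"circle at ({cx:.0f}, {cy:.0f}): {observed} cases, p = {p:.2g}")
```

The sketch sidesteps exactly the complications that make the real procedure hard: it uses no covariates when computing the expected count, and it makes no adjustment for running many overlapping tests.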

Three things make this conceptually simple procedure tricky.

First is the statistical theory associated with determining an expected number of events, perhaps dependent on a number of spatially varying covariates of the events of interest, such as populations in different age subgroups. Thus, for a disease (say) associated with older members of the population, we would naturally expect to see more cases of the disease in places where more older people live. This has to be accounted for in determining the number of events expected.
Second, there are some conceptual difficulties in carrying out multiple statistical significance tests on a series of (usually) overlapping circles. The rather sloppy statistical theory in the original presentation of the GAM goes a long way to explaining the reluctance of statistical epidemiologists to adopt the tool, even though more recent tools are rather similar.
Third is the enormous amount of computation required for exhaustive searching for clusters. This is especially so if stringent levels of statistical significance are required, since many more Monte Carlo simulation runs are then needed.

If you are interested, take a look at the SaTScan website. SaTScan is a tool developed by the Biometry Research Group of the National Cancer Institute in the United States. SaTScan works in a very similar way to the original GAM tool, but has wider acceptance among epidemiological researchers. You can download a free copy of the software and try it on some sample data.

Ready? Take the Section 5.1 Quiz to check your knowledge! Click on the "Next" link, above, to access the self-test quiz for Section 5.1. You have an unlimited number of attempts and must score 90% or more.

Ready to continue? Click on the "Next" link, above, to begin the Section 5.1 Quiz.

PROJECT 4: POINT PATTERN ANALYSIS

Overview

Background

In this week's project you will use some of the point pattern analysis tools available in ArcGIS together with one we've made specially for this course to investigate a point pattern of crime events in St. Louis.

Project Resources

The ArcMap template file and data files you need for Project 4 are available here for download. If you have any difficulty downloading these files, please contact me.

PtPatternAnalysis_v9.2.mxt is an ArcMap template file with additional custom functionality to support quadrat analysis. (That file is 783 Kb and will take approximately 2 minutes to download over a 56 Kbps modem.) Many, many thanks to Jim Detwiler for programming this file!
project4_ptData.zip is a zip file that contains two shapefiles: gunHomicide.shp, which records the locations of homicides in St. Louis, Missouri, in 1982, and attemptedStreetRobbery.shp, which records the locations of incidents of attempted 'non-residential burglary' over the same period. (That file is 14 Kb and should be quick to download even over a 56 Kbps modem. Once you have downloaded the file, double-click on the project4_ptData.zip file to launch WinZip, PKZip, 7-Zip, or another file compression utility. Follow your software's prompts to decompress the file.)
project4_data.zip is a zipped version of an ArcGIS geodatabase file, StLouisCrime.mdb, with layers of background topography, principally the street network, for St. Louis, Missouri. This serves no actual purpose in the analysis as such, but gives some context for the exercise, and may help in your discussions of results. (That file is 1.9 Mb and will take around 5 minutes to download over a 56 Kbps modem. Once you have downloaded the file, double-click on the project4_data.zip file to launch WinZip, PKZip, 7-Zip, or another file compression utility. Follow your software's prompts to decompress the file.)

Open a new ArcMap map from the .mxt template file. Do this either:

By double-clicking on PtPatternAnalysis_v9.mxt in the explorer, or
From ArcMap by selecting File - New... and navigating to the PtPatternAnalysis.mxt template, or
From the ArcMap start up dialog, by selecting the Start Using ArcMap with a Template option.

You should immediately set the File - Map Properties... - Data Source Options... to Store Relative Path Names, and save the new project (even with no data) to a new .mxd file.

Once you've done that, load in the shapefiles from project4_ptData.zip along with the background layers in the StLouisCrime.mdb file.

Summary of Project 4 Deliverables

For Project 4, the items you are required to submit are as follows:

Create and insert a map showing standard deviational ellipses for the two crime patterns, with commentary on the relative locations of the two patterns.
Calculate nearest neighbor distance statistics for both crime patterns and comment on these results.
Create and insert maps of quadrat analyses of the two crime patterns, along with commentary on each, and details of the analysis results in each case. You have to choose the quadrat size and analysis method (census- or sample-based) and should provide some explanation of your choices.
Create density maps of the gunHomicide and attemptedStreetRobbery data. Insert the maps into your write-up along with commentary explaining your choice of parameters, particularly the bandwidth.
Finally, comment on the study area in all these examples: it has effectively been set for you by the dataset. Do you think more extensive data would lead to different conclusions? How would the results be affected?

Questions?

If you have any questions now or at any point during this project, please feel free to post them to the Project 4 thread on the Project Discussion Forum. (That Discussion Forum can be accessed at any time by clicking on the In Touch tab, above, and then scrolling down to the Discussion Forums section.)

Familiarization with the Quadrat Analysis Tool

As part of this project, a quadrat analysis tool has been developed by Jim Detwiler. This page explains its operation.

The quadrat analysis tool is available in any map file created from the PtPatternAnalysis_v9.mxt template and appears as a toolbar:

The Point Pattern Analysis toolbar

If the toolbar is not available, right-click on the ArcMap program window and select it from the drop-down menu:

The right-click menu that allows you to enable Point Pattern Analysis

Defining the study area

To define a study area for analysis click on the New Rectangle button then mouse-drag a rectangle across from one corner to the diagonally opposite corner of the rectangular study area you wish to define. This creates a graphic object that is currently selected.
To define this as the study area for either census-based or sample-based quadrat analysis, click on Census - Create Grid or on Sampling - Define Study Area, respectively. What happens next is explained in the next two sections.

If you want to define a quadrat analysis with precise dimensions, then before clicking on the Create Grid or Define Study Area menu items, you should right-click on the rectangle, and select Properties - Size and Position and define the width and height you want. You can also define how the rectangle is drawn. Making it transparent will make it easier to position it precisely where you want by dragging.

Defining a grid for census-based analysis and running the analysis

The Census - Create Grid option asks you to define the parameters for a regular rectangular grid inside the study area rectangle just defined. This is done in terms of the required numbers of rows and columns of (rectangular) quadrats.

Once you have defined the number of rows and columns, a new shapefile will be created and added to the map.

To use the new shapefile for quadrat analysis, click Census - Run Analysis. In the dialog that appears specify the point layer containing the pattern to analyze, specify whether or not to use an attribute of each point as an event count, and specify the shapefile containing the census quadrats to use. Then click on Calculate Stats to run the analysis.

Use this dialog to define parameters for the quadrat analysis.
When the analysis is complete, the following results window will appear:

Results from quadrat analysis. These are formatted in a similar way to Table 4.3 on page 99 of the course text.
You can save the results to a text file in a spreadsheet readable format by clicking the Export button.

The analysis adds a field to the census grid shapefile named 'K' which, for each quadrat records how many events occurred in that quadrat. You may find it helpful in understanding the method to color the layer using this attribute.
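The counting step behind a census-based analysis can be sketched as follows. This is not the code of the course tool; it is a minimal illustration, assuming a rectangular study area divided into equal rows and columns. The function names are hypothetical, and the variance is computed as a sample variance (dividing by the number of quadrats minus one).

```python
from collections import Counter

def quadrat_counts(points, xmin, ymin, width, height, rows, cols):
    """Count events in each cell of a rows x cols grid over the study area."""
    counts = [[0] * cols for _ in range(rows)]
    for x, y in points:
        if not (xmin <= x <= xmin + width and ymin <= y <= ymin + height):
            continue  # events outside the study area are ignored
        # min() keeps events on the far edges inside the last row or column.
        col = min(int((x - xmin) / (width / cols)), cols - 1)
        row = min(int((y - ymin) / (height / rows)), rows - 1)
        counts[row][col] += 1
    return counts

def variance_mean_ratio(counts):
    """Mean, sample variance, and variance/mean ratio of the quadrat counts."""
    flat = [k for row in counts for k in row]
    m = len(flat)
    mean = sum(flat) / m
    variance = sum((k - mean) ** 2 for k in flat) / (m - 1)
    return mean, variance, variance / mean

# Hypothetical events in a 2 x 2 study area split into a 2 x 2 grid.
points = [(0.5, 0.5), (0.6, 0.4), (1.5, 0.5), (1.5, 1.5)]
counts = quadrat_counts(points, 0.0, 0.0, 2.0, 2.0, rows=2, cols=2)
frequency = Counter(k for row in counts for k in row)  # quadrats per count K
print(counts, dict(frequency), variance_mean_ratio(counts))
```

A variance/mean ratio near 1 is what the independent random process would produce; values well above 1 suggest clustering and values well below 1 suggest even spacing.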

Defining a quadrat for sample-based analysis

For sample-based analysis, first define a study region by drawing it with the New Rectangle tool and clicking on Sampling - Define Study Area. Then click on New Rectangle again, draw a second rectangle, and click on Sampling - Define Quadrat to specify that this is the quadrat shape required.
Once both shapes are defined, click on Sampling - Run Analysis to calculate the results as for a census-based analysis.
When you have completed sample-based analysis, you can remove all the graphic objects it creates using the Remove Count Labels and Remove Quadrats buttons. Note that the Remove Quadrats button will leave both the study area rectangle and one quadrat intact, in case you want to repeat the analysis. You should select and delete these by hand if you want a 'clean' display.
You can make non-rectangular quadrats for sample-based analysis using the ArcMap drawing tools. However, the Remove Quadrats button will not work properly on them, and you will have to clean up by hand.
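For comparison with the census approach, here is a minimal sketch of sample-based counting, assuming rectangular quadrats dropped at uniformly random positions that lie wholly inside the study area. How the course tool actually places its quadrats is not stated here, so the placement rule, the function name, and the parameters are all assumptions.

```python
import random

def sample_quadrat_counts(points, study, quadrat_w, quadrat_h,
                          n_quadrats, seed=0):
    """Count events in randomly placed quadrats.

    `study` is (xmin, ymin, xmax, ymax). Quadrats may overlap one another,
    and an event may be counted in more than one quadrat.
    """
    xmin, ymin, xmax, ymax = study
    rng = random.Random(seed)
    counts = []
    for _ in range(n_quadrats):
        # Lower-left corner chosen so the quadrat stays inside the study area.
        qx = rng.uniform(xmin, xmax - quadrat_w)
        qy = rng.uniform(ymin, ymax - quadrat_h)
        counts.append(sum(1 for x, y in points
                          if qx <= x < qx + quadrat_w
                          and qy <= y < qy + quadrat_h))
    return counts

# Hypothetical events in a 10 x 10 study area, sampled with 2 x 2 quadrats.
points = [(1.0, 1.0), (1.5, 1.2), (6.0, 7.0), (8.5, 2.5), (4.0, 4.0)]
print(sample_quadrat_counts(points, (0.0, 0.0, 10.0, 10.0), 2.0, 2.0, 20))
```

The resulting list of counts can be summarized in the same way as census counts, for example with a variance/mean ratio.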

Exporting the results to a text file and determining a p value

Click on Export in the Quadrat Count Statistics dialog to create a tab-separated text file summarizing the analysis results.
Read the tab-separated text file with a spreadsheet program to determine the p value associated with the analysis. In OpenOffice.org Calc or Microsoft Excel, the function required to calculate this is CHIDIST.
You can also use a spreadsheet program to plot a histogram of the analysis results, which you may also find helpful.
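If you prefer to check the spreadsheet result in code, the sketch below computes the same upper-tail chi-square probability that CHIDIST returns, using only the Python standard library. It assumes the usual quadrat test statistic, the sum of squared deviations of the counts from their mean divided by the mean, with degrees of freedom equal to the number of quadrats minus one. The function names are hypothetical, and the series used is adequate for moderate statistic values only.

```python
import math

def regularized_lower_gamma(a, x, max_iter=10000):
    """Regularized lower incomplete gamma function P(a, x) by series expansion."""
    if x <= 0:
        return 0.0
    term = 1.0 / a
    total = term
    for n in range(1, max_iter):
        term *= x / (a + n)
        total += term
        if term < total * 1e-15:
            break
    return total * math.exp(-x + a * math.log(x) - math.lgamma(a))

def chi_square_upper_tail(chi2, df):
    """Upper-tail probability of the chi-square distribution.

    Computed as 1 - P(df/2, chi2/2), so very small p values lose precision.
    """
    return 1.0 - regularized_lower_gamma(df / 2.0, chi2 / 2.0)

def quadrat_chi_square(counts):
    """Chi-square statistic and degrees of freedom for a list of quadrat counts."""
    m = len(counts)
    mean = sum(counts) / m
    chi2 = sum((k - mean) ** 2 for k in counts) / mean
    return chi2, m - 1

# Hypothetical quadrat counts.
chi2, df = quadrat_chi_square([2, 1, 0, 1])
print(chi2, df, chi_square_upper_tail(chi2, df))
```

A small p value indicates that the counts vary more than the independent random process would suggest.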

Kernel Density Analysis

In this part of the project, you use built-in ArcMap Spatial Analyst functionality to help understand the crime data. Two maps and accompanying commentary are required. Both are made using the Spatial Analyst - Density... tool.

First, open the Spatial Analyst - Density... tool. You will see the following dialog box:
The density estimation dialog box. Specify parameters for kernel density estimation here (see text).
Specify the following parameters:
- Input data - the point pattern data set
- Population field - the attribute that includes a count of the number of events occurring at one location.
- Density type - Kernel or Simple, as discussed in the text. In this project you should select Kernel.
- Search radius - the kernel bandwidth.
- Area units - the units that will be used in the density estimate calculation.
- Output cell size - the resolution of the grid across which density estimates will be made.
- Output raster - a file name for saving the analysis result permanently.
Use this dialog for the analyses outlined below.
- Create density maps of the gunHomicide and attemptedStreetRobbery data. Place the maps in your write-up along with commentary explaining your choice of parameters, particularly the bandwidth.
- Finally, comment on the study area in all these examples: it has effectively been set for you by the dataset. Do you think more extensive data would lead to different conclusions? How would the results be affected?

Checklist of Project 4 Deliverables

Here is a summary of the deliverables for Project 4, to be posted as a PDF (preferably) or MS Word document. Once posted, please provide a link to the document in the discussion forum "Posted project links" thread for this week. Note that if you don't have access to any webspace for posting, you can simply attach the document to your post to the discussion forum. Make sure you have completed each item!

Create a map showing standard deviational ellipses for the two crime patterns, with commentary on the relative locations of the two patterns.
Calculate nearest neighbor distance statistics for both crime patterns and report and comment on the results.
Create maps of quadrat analyses of the two crime patterns, along with commentary on each, and details of the analysis results in each case. You have to choose the quadrat size and analysis method (census- or sample-based) and should provide some explanation of your choices.
Create density maps of the gunHomicide and attemptedStreetRobbery data. Put the maps in your write-up along with commentary explaining your choice of parameters, particularly the bandwidth.
Finally, comment on the study area in all these examples: it has effectively been set for you by the dataset. Do you think more extensive data would lead to different conclusions? How would the results be affected?

That's it for Project 4!

QUARTER-LONG PROJECT

Week 4: Beginning the Peer Review Process

There is no specific deliverable for this week; however, you should use this week to begin the peer review process for the preliminary proposals. Early this week I will send an email letting you know which two other students' proposals you have been assigned to review. Begin by looking at the two proposals as posted on the 'Project Initial Proposal discussion board' (you can get to this by clicking on 'Previous' above). Then, simply post your comments as a response to the assigned project proposal message. Your peer reviews are due by the end of Week 5, although you are welcome to post them at any point between now and then.

You should consider the following aspects in writing comments for the authors of the proposals:

Are the goals reasonable and achievable? It is a common mistake to aim too high and attempt to do too much. Suggest possible amendments to the proposals' aims that might make them more achievable in the time frame.
Are the data adequate for the task proposed? Do you foresee problems in obtaining or organizing the data? Suggest how these problems could be avoided.
Are the proposed analysis methods appropriate? Suggest alternative methods, or enhancements to the proposed methods that would also help.
Provide any additional input that you feel is appropriate. This could include suggestions for additional outputs (e.g., maps) not specifically mentioned by the author, or suggestions as to further data sources, relevant things to read, relevant other examples to look at, and so on.

Remember... you will be receiving two reviews from other students of your own proposal, so you should include the types of useful feedback that you would like to see in those commentaries. Criticism is fine, provided that it includes constructive inputs and suggestions. If something is wrong, how can it be fixed?

Meanwhile, I will be reviewing the preliminary proposals, and providing each of you with feedback and suggestions. I will aim to complete my reviews and mail them to you this week.

Questions?

If you have any questions now or at any point during this project, please feel free to post them to the Quarter-long Project Discussion Forum. (That Discussion Forum can be accessed at any time by clicking on the In Touch tab, above, and then scrolling down to the Discussion Forums section.)
