Tuesday, May 11, 2010

R and ArcGIS United


Modifiable Areal Unit Problem (MAUP)
The modifiable areal unit problem (MAUP) is a statistical bias caused by the choice of district boundaries. Because boundaries in spatial analysis (e.g., Census districts, tracts, and blocks) are not randomly selected, two variables may appear to be correlated when the apparent correlation is really a symptom of where the boundaries were drawn.
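As a rough illustration of how zoning alone can change a correlation, here is a minimal sketch in R. Everything in it is hypothetical and separate from my simulation: the variables `pos`, `v1`, and `v2`, the helper `zone_cor`, the noise level, and the seed are all assumptions chosen for the example. Two noisy variables both depend weakly on location; averaging them within zones removes much of the noise, so the correlation between zone means tends to be far higher than the correlation between individuals.

```r
# Hypothetical data: two noisy variables that both depend on location.
set.seed(1)                       # arbitrary seed, for reproducibility only
n   <- 1000
pos <- runif(n, 0, 100)           # location along a single axis
v1  <- pos + rnorm(n, sd = 50)
v2  <- pos + rnorm(n, sd = 50)

# Correlation between the two variables at the individual level
r_individual <- cor(v1, v2)

# Correlation between zone means when the axis is cut into k equal zones
zone_cor <- function(k) {
  breaks <- seq(0, 100, length.out = k + 1)
  zone   <- cut(pos, breaks = breaks, include.lowest = TRUE, labels = FALSE)
  cor(tapply(v1, zone, mean), tapply(v2, zone, mean))
}

r_50_zones <- zone_cor(50)
r_5_zones  <- zone_cor(5)

print(c(individual = r_individual, zones_50 = r_50_zones, zones_5 = r_5_zones))
```

The same individuals produce a different correlation depending only on how many zones they are grouped into, which is the heart of MAUP.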

The following series of maps illustrates MAUP by showing how the choice of boundaries can make correlations between spatial locations appear or disappear.

Ecological Fallacy
The ecological fallacy is the use of aggregate data to draw inferences about individuals. In this fallacy, one infers that a group's mean on a given characteristic describes each individual in the group. The inference is invalid because it assumes that groups are homogeneous.
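A tiny made-up example in R shows why the group mean can mislead. The two districts and their values are invented for illustration and are not part of my simulation.

```r
# Hypothetical individual-level values in two districts
district_a <- c(10, 10, 10, 90)
district_b <- c(25, 25, 25, 25)

mean_a <- mean(district_a)   # 30
mean_b <- mean(district_b)   # 25

# District A has the higher mean, yet most of its individuals
# fall below every individual in district B.
below_b <- sum(district_a < min(district_b))   # 3

print(c(mean_a = mean_a, mean_b = mean_b, a_below_b = below_b))
```

Concluding that a given person in district A scores higher than a person in district B, just because A's mean is higher, would be wrong for three of the four people in A.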

The following series of maps helps to illustrate the Ecological Fallacy as well. They show how reducing the size of a group to the individual level reveals the heterogeneity of individuals.

My Simulation
In my simulation, I created a world where 250 X values and 250 Y values were both randomly drawn from a normal distribution with a mean of 50 and a standard deviation of 10. Using the X and Y coordinates, I created a hypothetical world where each point on a scatterplot represented a single case of Swine Flu at a geographic location. Figure 1 is a scatterplot I created in R to reveal the actual distribution of "Swine Flu" cases in my hypothetical world.
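A scatterplot like the one described above could be drawn along these lines. This is a sketch, not the exact code behind the figure: the seed, axis labels, title, and plotting symbol are assumptions added here, and a different seed will give a different cloud of points.

```r
# Assumed seed so that the example is reproducible
set.seed(2010)

# 250 X and 250 Y coordinates, each drawn from Normal(mean = 50, sd = 10)
x1 <- rnorm(250, mean = 50, sd = 10)
y1 <- rnorm(250, mean = 50, sd = 10)

# Each point stands for one hypothetical Swine Flu case
plot(x1, y1,
     xlab = "X coordinate",
     ylab = "Y coordinate",
     main = "Distribution of Swine Flu Cases in Hypothetical Territory",
     pch  = 19)
```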

Figure 1 Distribution of Swine Flu Cases in Hypothetical Territory.


















When I create maps that are intended to illustrate the distribution of Swine Flu cases in a given geographic area, the distribution in those maps should mirror the actual distribution that I know to exist in the world that I created.

In ArcMap, I created a series of 5 maps with varying resolutions, or district sizes. Each successive map has a higher resolution (more districts), starting with 9 districts and ending with 625.
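The district counts behind maps like these can also be approximated in R by laying a k x k grid over the points and counting the cases in each cell. The sketch below makes several assumptions that are not taken from the maps themselves: the seed, the helper name `count_cases`, and above all the extent of the territory, which is taken here to be a square just large enough to hold every point.

```r
set.seed(2010)                          # assumed seed
x1 <- rnorm(250, mean = 50, sd = 10)
y1 <- rnorm(250, mean = 50, sd = 10)

# Assumed territory: the smallest integer-bounded square containing all points
lo <- floor(min(x1, y1))
hi <- ceiling(max(x1, y1))

# Count cases in each cell of a k x k grid of equal-sized districts
count_cases <- function(x, y, k, lo, hi) {
  breaks <- seq(lo, hi, length.out = k + 1)
  col_id <- factor(cut(x, breaks, include.lowest = TRUE, labels = FALSE),
                   levels = seq_len(k))
  row_id <- factor(cut(y, breaks, include.lowest = TRUE, labels = FALSE),
                   levels = seq_len(k))
  table(row_id, col_id)                 # k x k table, empty districts included
}

resolutions <- c(3, 5, 10, 15, 25)
grids <- lapply(resolutions, function(k) count_cases(x1, y1, k, lo, hi))
names(grids) <- paste0(resolutions, "x", resolutions)

n_districts <- sapply(grids, length)    # 9, 25, 100, 225, 625
n_cases     <- sapply(grids, sum)       # 250 at every resolution
print(grids[["3x3"]])
```

Every grid holds the same 250 cases; only the way they are divided among districts changes.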

Map 1 3x3 Resolution: This map reveals the general pattern that is illustrated in Figure 1 (most cases found in the center and then diminishing as one moves to the perimeter); however, it fails to reveal the diversity of cases within the overall territory. This map is a good example of the ecological fallacy, because the visual gives the impression that each of these districts is composed of homogeneous individuals.

This map also illustrates MAUP. By creating a world with very few districts, I have actually increased the correlation between location and the likelihood of contracting the Swine Flu virus. I know that as I increase the number of districts the correlation between these two variables will decrease.



















Map 2 5x5 Resolution: The problem with the first map is still present in this second map. Increasing the number of districts from 9 to 25 gives the viewer no new information. The only difference between Map 1 and Map 2 is the addition of a perimeter of districts around the exact same pattern of Swine Flu cases found in Map 1. An observer still cannot see the diversity of cases within districts.


















Map 3 10x10 Resolution: This third map finally begins to reveal the diversity of occurrences and a more truthful depiction of the actual distribution of cases found in Figure 1 (the scatterplot). Most cases are found in the center and then the cases begin to slowly diminish as you move to the perimeter. This map closely resembles the true distribution.

This map also reveals how increasing the number of districts has decreased the correlation between location and likelihood of contracting the virus. The relationship between the two variables is more accurately depicted in this map than in the previous two maps.


















Map 4 15x15 Resolution: This fourth map best matches the actual distribution of Swine Flu cases found in Figure 1. The map shows that most cases of Swine Flu occurred in the center of the territory, but were not completely isolated in one or two districts. As you move away from the center, the number of observed cases diminishes, but at a slower rate than in the previous maps. This is the pattern found in Figure 1.


















Map 5 25x25 Resolution: This final map best exemplifies MAUP. The impression from this map is that there is no relationship between districts and Swine Flu occurrences. However, we already know that there is a relationship. We know that those living in the center of this territory were more likely to contract the virus than those who lived on the perimeter. This map, however, describes a world where location has no effect on the likelihood of contracting the virus. By creating too many districts, I have actually eliminated the correlation between the two variables.

Although this map gets down to the individual level, it shows how accounting for the ecological fallacy can go too far. This map implies that there are no similarities within districts and we know that those in the center of the territory were more likely to contract the virus.
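One reason the finest grid looks so patternless is simple arithmetic: 250 cases can occupy at most 250 of 625 districts, so at least 60% of the districts must be empty no matter where the cases fall. The sketch below computes the share of empty districts at each resolution. The seed, the helper name `empty_share`, and the extent of the territory (a square just large enough to hold every point) are assumptions made for this example.

```r
set.seed(2010)                          # assumed seed
x1 <- rnorm(250, mean = 50, sd = 10)
y1 <- rnorm(250, mean = 50, sd = 10)

# Assumed territory: the smallest integer-bounded square containing all points
lo <- floor(min(x1, y1))
hi <- ceiling(max(x1, y1))

# Share of districts in a k x k grid that contain no cases at all
empty_share <- function(k) {
  breaks   <- seq(lo, hi, length.out = k + 1)
  col_id   <- cut(x1, breaks, include.lowest = TRUE, labels = FALSE)
  row_id   <- cut(y1, breaks, include.lowest = TRUE, labels = FALSE)
  occupied <- length(unique(paste(row_id, col_id)))
  1 - occupied / k^2
}

resolutions <- c(3, 5, 10, 15, 25)
shares <- sapply(resolutions, empty_share)
names(shares) <- paste0(resolutions, "x", resolutions)
print(round(shares, 2))
```

With so many empty or single-case districts, a map at this resolution has little room to show that the center is denser than the perimeter.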


















R Code

# Create the values of the X variable by randomly drawing 250 numbers from a normal distribution with mean 50 and standard deviation 10.

x1 <- rnorm(250, mean = 50, sd = 10)


# Create the values of the Y variable by randomly drawing 250 numbers from a normal distribution with mean 50 and standard deviation 10.

y1 <- rnorm(250, mean = 50, sd = 10)


# Column-bind the two vectors together into a matrix.

mypoints1 <- cbind(x1, y1)


# Save the matrix to a CSV file.

write.csv(mypoints1, file = "F:/mypoints1.csv")

