Tuesday, May 18, 2010

Kern County Nearest Neighbor

Here is my attempt to replicate the Afghanistan in-class practice using census data for Kern County. For some reason, I was unable to merge the demographic data with the shapefile, so I don't really have anything too exciting to share or reveal.
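
A failed merge of this kind often comes down to mismatched key types or to the attribute rows being reordered. Below is a minimal sketch of an order-preserving join in base R, not a diagnosis of what went wrong here. The data frames, the key column name geo_id, and all values are hypothetical stand-ins; with the real data, shape_attrs would be the shapefile's attribute table (census@data).

```r
# Toy stand-ins: 'shape_attrs' plays the role of the shapefile's attribute
# table and 'demog' the demographic table. The column name 'geo_id' and all
# values are hypothetical.
shape_attrs <- data.frame(geo_id = c("A3", "A1", "A2"),
                          stringsAsFactors = FALSE)
demog <- data.frame(geo_id = c("A1", "A2", "A3", "A4"),
                    population = c(1200, 950, 1810, 400),
                    stringsAsFactors = FALSE)

# Compare keys as character strings; factor or numeric IDs (which can drop
# leading zeros) are a common reason a join silently fails.
idx <- match(as.character(shape_attrs$geo_id), as.character(demog$geo_id))

# Pull the demographic rows into polygon order instead of calling merge(),
# which sorts its result by default and would detach rows from polygons.
shape_attrs$population <- demog$population[idx]

# Polygons with no matching demographic record would show up as NA in idx.
unmatched <- sum(is.na(idx))
print(shape_attrs)
```

The design choice is to leave the polygon order untouched and only look up values, so row i of the attribute table still describes polygon i.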

library(maptools)

library(spdep)

library(classInt)

library(RColorBrewer)


census <- readShapePoly("/Users/carrielevan/Documents/Geo299/ShapeFileHW/06029_Kern_County/tl_2009_06029_bg00/tl_2009_06029_bg00.shp" ,proj4string=CRS("+proj=longlat"))


#summarize new R object;

summary(census)


#view shapefile

plot(census)























coordinates(census)


#store the centroids in an object and convert it to a data frame;

centers = coordinates(census)

centers = data.frame(centers)


#plot the coordinates

points(centers,col="blue",cex=1.2)



























#Adding Labels

text(centers,labels=rownames(centers),cex=1.5)
























#MEASURING SPACE: nearest neighbors & distance-based neighbors


kern.centers = coordinates(census)


#how many neighbors, k, are of interest? Why?

k=1


#determine the k nearest neighbors for each point in kern.centers;

knn1 = knearneigh(kern.centers,k,longlat=T)


#create a neighbors list from the knn1 object;

kern.knn1 = knn2nb(knn1)


#map k-nearest neighbors;

plot(census)

plot(kern.knn1,kern.centers,col="blue",add=T)
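
To make clear what the k-nearest-neighbor step computes, here is a minimal base R sketch on toy points. It assumes planar coordinates and Euclidean distance, whereas knearneigh with longlat=T works on great-circle distances; the points and the helper name nearest_k are hypothetical.

```r
# Toy planar coordinates (hypothetical, not Kern County centroids).
pts <- rbind(c(0, 0), c(1, 0), c(0, 2), c(5, 5), c(6, 5))

# Hypothetical helper: row i of the result holds the indices of the k points
# closest to point i, nearest first.
nearest_k <- function(xy, k) {
  d <- as.matrix(dist(xy))   # pairwise Euclidean distances
  diag(d) <- Inf             # a point is not its own neighbor
  idx <- lapply(seq_len(nrow(d)), function(i) order(d[i, ])[seq_len(k)])
  do.call(rbind, idx)        # one row per point, k columns
}

nn1 <- nearest_k(pts, 1)
nn2 <- nearest_k(pts, 2)
print(nn1)
print(nn2)
```

Note that the relationship is not symmetric: point 3's nearest neighbor is point 1, but point 1's nearest neighbor is point 2. Every point always gets exactly k neighbors, no matter how isolated it is.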











#2 Nearest Neighbors

k=2

knn2 = knearneigh(kern.centers,k,longlat=T)

kern.knn2 = knn2nb(knn2)

plot(census)

plot(kern.knn2,kern.centers,col="green",add=T)











#3 Nearest Neighbors

k=3

knn3 = knearneigh(kern.centers,k,longlat=T)

kern.knn3 = knn2nb(knn3)

plot(census)

plot(kern.knn3,kern.centers,col="red",add=T)























#4 Nearest Neighbors

k=4

knn4 = knearneigh(kern.centers,k,longlat=T)

kern.knn4 = knn2nb(knn4)

plot(census)

plot(kern.knn4,kern.centers,col="purple",add=T)


























#5 Nearest Neighbors

k=5

knn5 = knearneigh(kern.centers,k,longlat=T)

kern.knn5 = knn2nb(knn5)

plot(census)

plot(kern.knn5,kern.centers,col="pink",add=T)


























##Neighbors based on distance in kilometers

d = .05

kern.dist.05 = dnearneigh(kern.centers,0,d,longlat=T)

plot(census)

plot(kern.dist.05,kern.centers,add=T,lwd=2,col="green")
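
Because the coordinates are longitude/latitude and longlat=T is set, the distance bounds are read as kilometers, so the thresholds in this post (.05, .1 and .25) correspond to 50, 100 and 250 meters. The sketch below shows how a kilometer threshold decides whether two points are linked. It uses the haversine formula with an assumed mean Earth radius of 6371 km, which may differ slightly from what spdep computes internally; the helper name and the two points are made up.

```r
# Hypothetical helper: great-circle distance in kilometers between two
# longitude/latitude points, using the haversine formula.
haversine_km <- function(lon1, lat1, lon2, lat2, radius_km = 6371) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * radius_km * asin(pmin(1, sqrt(a)))
}

# Two made-up points at latitude 35.37, 0.001 degrees of longitude apart
# (roughly 0.09 km).
d_km <- haversine_km(-119.020, 35.37, -119.019, 35.37)

# Which of the thresholds used in this post would link the two points?
thresholds <- c(0.05, 0.1, 0.25)
linked <- d_km <= thresholds
print(d_km)
print(linked)
```

Unlike k nearest neighbors, a distance threshold can leave a point with no neighbors at all when nothing falls inside the band.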


























d = .1

kern.dist.1 = dnearneigh(kern.centers,0,d,longlat=T)

plot(census)

plot(kern.dist.1,kern.centers,add=T,lwd=2,col="blue")



























d = .25


#create a distance based neighbors object (kern.dist.25) with a .25km threshold;

kern.dist.25 = dnearneigh(kern.centers,0,d,longlat=T)


#map neighbors based on distance;

plot(census)

plot(kern.dist.25,kern.centers,add=T,lwd=2,col="red")


























#obtain summary report of kern.dist.25 object

summary(kern.dist.25)







Tuesday, May 11, 2010

R and ArcGIS United


Modifiable Areal Unit Problem (MAUP)
The modifiable areal unit problem (MAUP) is a statistical bias caused by the choice of district boundaries. Because the boundaries used in spatial analysis (e.g., census districts, tracts, and blocks) are not randomly selected, two variables may appear to be correlated when the correlation is really an artifact of where the boundaries were drawn.

The following series of maps helps to illustrate MAUP, because they show how the choice of boundaries affects the appearance or lack of appearance of correlations between spatial locations.

Ecological Fallacy
The ecological fallacy is the use of aggregate data to draw inferences about individuals. In this fallacy, one infers that a group's mean on a given characteristic describes each individual in the group. This is a mistake, because it assumes that groups are homogeneous.

The following series of maps helps to illustrate the Ecological Fallacy as well. They show how reducing the size of a group to the individual level reveals the heterogeneity of individuals.

My Simulation
In my simulation, I created a world where 250 X values and 250 Y values were both randomly drawn from a normal distribution with a mean of 50 and a standard deviation of 10. Using the X and Y coordinates, I created a hypothetical world where each point on a scatterplot represents a single Swine Flu case at a geographic location. Figure 1 is a scatterplot I created in R to reveal the actual distribution of "Swine Flu" cases in my hypothetical world.

Figure 1 Distribution of Swine Flu Cases in Hypothetical Territory.


















When I create maps that are intended to illustrate the distribution of Swine Flu cases in a given geographic area, the distribution in those maps should mirror the actual distribution that I know to exist in the world that I created.

In ArcMap, I created a series of 5 maps with increasing resolution (more, smaller districts), starting with 9 districts and ending with 625.
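
The same aggregation can be sketched directly in R by counting simulated cases per grid cell at each resolution. This is only an approximation of the ArcMap workflow: the square 0 to 100 extent, the seed, and the helper name grid_counts are my assumptions, so the counts will not match the maps exactly.

```r
set.seed(1)  # arbitrary seed so the sketch is repeatable
x1 <- rnorm(250, 50, sd = 10)
y1 <- rnorm(250, 50, sd = 10)

# Hypothetical helper: count points in each cell of an n-by-n grid laid over
# an assumed square extent (the extent used in ArcMap is not given here).
grid_counts <- function(x, y, n, lower = 0, upper = 100) {
  breaks <- seq(lower, upper, length.out = n + 1)
  col_id <- cut(x, breaks, include.lowest = TRUE)
  row_id <- cut(y, breaks, include.lowest = TRUE)
  table(row_id, col_id)   # n x n table; empty cells are kept as zeros
}

# Same five resolutions as the maps: 9, 25, 100, 225 and 625 districts.
for (n in c(3, 5, 10, 15, 25)) {
  counts <- grid_counts(x1, y1, n)
  cat(n, "x", n, ": districts =", length(counts),
      " empty =", sum(counts == 0),
      " max per district =", max(counts), "\n")
}
```

Comparing the number of empty districts and the largest district count across resolutions gives a rough numeric counterpart to what the maps show visually.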

Map 1 3x3 Resolution: This map reveals the general pattern that is illustrated in Figure 1 (most cases found in the center and then diminishing as one moves toward the perimeter); however, it fails to reveal the diversity of cases within the overall territory. This map is a good example of the ecological fallacy, because the visual gives the impression that each of these districts is composed of homogeneous individuals.

This map also illustrates MAUP. By creating a world with very few districts, I have actually increased the correlation between location and the likelihood of contracting the Swine Flu virus. I know that as I increase the number of districts the correlation between these two variables will decrease.



















Map 2 5x5 Resolution: The problem with the first map is still present in this second map. Increasing the number of districts from 9 to 25 gives the viewer no new information. The only difference between Map 1 and Map 2 is the addition of a perimeter of districts around the exact same pattern of Swine Flu cases found in Map 1. An observer still cannot see the diversity of cases within districts.


















Map 3 10x10 Resolution: This third map finally begins to reveal the diversity of occurrences and a more truthful depiction of the actual distribution of cases found in Figure 1 (the scatterplot). Most cases are found in the center and then the cases begin to slowly diminish as you move to the perimeter. This map closely resembles the true distribution.

This map also reveals how increasing the number of districts has decreased the correlation between location and likelihood of contracting the virus. The relationship between the two variables is more accurately depicted in this map than in the previous two maps.


















Map 4 15x15 Resolution: This fourth map best matches the actual distribution of Swine Flu cases found in Figure 1. The map shows that most occurrences of Swine Flu were in the center of the territory, but were not completely isolated in one or two districts. As you move away from the center, the number of observed cases diminishes, but at a slower rate than in the previous maps. This is the pattern found in Figure 1.


















Map 5 25x25 Resolution: This final map best exemplifies MAUP. The impression from this map is that there is no relationship between districts and Swine Flu occurrences. However, we already know that there is a relationship: those living in the center of this territory were more likely to contract the virus than those who lived on the perimeter. This map, however, describes a world where location has no effect on the likelihood of contracting the virus. By creating too many districts, I have actually eliminated the correlation between the two variables.

Although this map gets down to the individual level, it shows how accounting for the ecological fallacy can go too far. This map implies that there are no similarities within districts and we know that those in the center of the territory were more likely to contract the virus.


















R Code

#Creating the Values of the X Variable by randomly drawing 250 numbers from a normal distribution with mean 50 and standard deviation 10.

x1<-rnorm(250, 50, sd=10)


#Creating the Values of the Y Variable by randomly drawing 250 numbers from a normal distribution with mean 50 and standard deviation 10.

y1<-rnorm(250, 50, sd=10)


#Column Binding the two vectors together into a matrix

mypoints1<-cbind(x1, y1)


#Saving the matrix into a csv File

write.csv(mypoints1, file="F:/mypoints1.csv")


Wednesday, May 5, 2010

The Effects of Church Attendance and Republican Vote



I began by investigating regions of the country. I wanted to know if religion played the same role in determining vote choice in both Southern and non-Southern states. Furthermore, I wanted to know if this relationship has changed over time.



















In this first image, I compare the effect of church attendance on Republican vote choice in Southern and non-Southern states between the years 1948 and 1978. What is most noticeable here is that individuals in non-Southern states are more influenced by religion than those in Southern states. The figure tells us that as an individual attends church more frequently, he is more likely to vote for a Republican candidate. This pattern also held for individuals in the South; however, the relationship is not as strong.




















This second figure compares the same two variables (church attendance and Republican voting), but for the years 1978-1992. What is most evident in this visual is that there is a shift occurring. Religion is becoming less of a determinant of vote choice for individuals in non-Southern states and more influential in Southern states.

I wanted to continue this investigation, but did not have sufficient cases to investigate this relationship for 1992-2008. However, the pattern detected in this visual implies that the shift may continue.

Since I was unsatisfied with this finding, I wanted to continue to investigate the influence of religion on vote choice. I also looked at how religion affects vote choice in different age brackets (18-30 and 65-90). Again, I wanted to track this relationship over time, in order to see if religion has always influenced young and old voters in the same way or if there is an emerging pattern among today's voters.





















In this third image, I compare the effect of church attendance on vote choice for two different age brackets between the years 1948 and 1978. What is most evident is that during this time period, older voters' vote choices were more influenced by their faith than younger voters' were. Despite this finding, for both groups, the more an individual attended church in this time period, the more likely he/she was to vote for a Republican candidate.




















This fourth visual compares the exact same variables as the previous image, except during a different time period (1978-1992). What is most obvious here is that there is a shift occurring. Religion is more strongly influencing younger voters' vote choices and is not as important for older voters.




















This final graph compares the influence of religion on vote choice between the years 1992 and 2004 for old and young voters. What is most obvious in this graph is that religion now has more influence on young voters' vote choice than on old voters'. This implies that in today's political climate, the more religious a young person is, the more likely he/she is to vote for a Republican.

All three of these graphs in combination help tell a story of change in American politics. They reveal that religion is influencing younger voters' decisions today in a way that it has not influenced past generations.





