{"id":80,"date":"2016-12-30T08:55:25","date_gmt":"2016-12-30T08:55:25","guid":{"rendered":"https:\/\/sqlshep.com\/?p=80"},"modified":"2017-01-29T23:52:25","modified_gmt":"2017-01-29T23:52:25","slug":"visualization-gateway-drug-ii","status":"publish","type":"post","link":"https:\/\/sqlshep.com\/?p=80","title":{"rendered":"Visualization, The gateway drug II"},"content":{"rendered":"<p>In the last blog you were able to get a dataset with county and population data to display on a US map and zoom in on a state, and maybe even a county if you went exploring.\u00a0 In this demo we will be using the same choroplethr package but this time we will be using external data.\u00a0 Specifically, we will focus on one state, and check out the education level per county for one state.<\/p>\n<p>The data is hosted by the <a href=\"https:\/\/data.ers.usda.gov\/reports.aspx?ID=18243\">USDA Economic Research Division<\/a>,\u00a0 under Data Products \/ County-level Data Sets.\u00a0 What will be demonstrated is the proportion\u00a0of the population who have completed college, the datasets &#8220;completed some college&#8221;, &#8220;completed high school&#8221;, and &#8220;did not complete high school&#8221; are also available on the USDA site.<\/p>\n<p><a href=\"https:\/\/sqlshep.com\/wp-content\/uploads\/2016\/12\/Screen-Shot-2016-12-29-at-8.42.14-PM.png\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-medium wp-image-99\" src=\"https:\/\/sqlshep.com\/wp-content\/uploads\/2016\/12\/Screen-Shot-2016-12-29-at-8.42.14-PM-300x238.png\" alt=\"\" width=\"300\" height=\"238\" srcset=\"https:\/\/sqlshep.com\/wp-content\/uploads\/2016\/12\/Screen-Shot-2016-12-29-at-8.42.14-PM-300x238.png 300w, https:\/\/sqlshep.com\/wp-content\/uploads\/2016\/12\/Screen-Shot-2016-12-29-at-8.42.14-PM-768x610.png 768w, https:\/\/sqlshep.com\/wp-content\/uploads\/2016\/12\/Screen-Shot-2016-12-29-at-8.42.14-PM-1024x813.png 1024w, https:\/\/sqlshep.com\/wp-content\/uploads\/2016\/12\/Screen-Shot-2016-12-29-at-8.42.14-PM-624x496.png 624w, https:\/\/sqlshep.com\/wp-content\/uploads\/2016\/12\/Screen-Shot-2016-12-29-at-8.42.14-PM.png 1780w\" sizes=\"auto, (max-width: 300px) 100vw, 300px\" \/><\/a><\/p>\n<p>For this effort, You can grab the data off <a href=\"https:\/\/github.com\/sqlshep\/SQLShepBlog\/blob\/master\/FloridaData\/Edu_CollegeDegree-FL.csv\">my GitHub site <\/a> or the data is at the bottom of this blog post, copy it out into a plain text file. Make sure you change the name of the file in the script below, or make sure the file you create is &#8220;Edu_CollegeDegree-FL.csv&#8221;.<\/p>\n<p>Generally speaking when you start working with GIS data of any sort you enter a whole new world of acronyms and in many cases mathematics to deal with the craziness.\u00a0 The package we are using eliminates almost all of this for quick and dirty graphics via the choroplethr package.\u00a0 The county choropleth takes two values, the first is the region which must be the <a href=\"https:\/\/en.wikipedia.org\/wiki\/FIPS_county_code\">FIPS code<\/a> for that county.\u00a0 If you happen to be working with states, then the <a href=\"https:\/\/en.wikipedia.org\/wiki\/Federal_Information_Processing_Standard_state_code\">FIPS state code<\/a> must be used for region.\u00a0 To make it somewhat easier, the first two digits of the county FIPS code is the state code, the remainder is the county code for the data we will be working with.<\/p>\n<p>So let\u2019s get to it;<\/p>\n<p>Install and load the choroplethr package<\/p>\n<pre><code>\r\ninstall.packages(\"choroplethr\")\r\nlibrary(choroplethr)\r\n<\/code><\/pre>\n<p>Use the <a href=\"https:\/\/stat.ethz.ch\/R-manual\/R-devel\/library\/base\/html\/getwd.html\">setwd()<\/a> to set the local working directory, <a href=\"https:\/\/stat.ethz.ch\/R-manual\/R-devel\/library\/base\/html\/getwd.html\">getwd()<\/a> will display what the current R working directory.<\/p>\n<pre><code>\r\n\r\nsetwd(\"\/Users\/Data\")\r\ngetwd()\r\n\r\n\r\n<\/code><\/pre>\n<p><a href=\"http:\/\/stat.ethz.ch\/R-manual\/R-devel\/library\/utils\/html\/read.table.html\">Read.csv<\/a> will read in a comma delimited file. <a href=\"https:\/\/stat.ethz.ch\/R-manual\/R-devel\/library\/base\/html\/assignOps.html\">&#8220;&lt;-&#8220;<\/a> is the assignment operator, much like using the &#8220;=&#8221;. The &#8220;=&#8221; can be used as well. Which to assignment operator to use is a bit if a religious argument in the R community, i will stay out of it.<\/p>\n<pre><code>\r\n# read a csv file from my working directory \r\nedu.CollegeDegree <- read.csv(\"Edu_CollegeDegree-FL.csv\")\r\n\r\n\r\n<\/code><\/pre>\n<p>View() will open a new tab and display the contents of the data frame.<\/p>\n<pre><code>\r\nView(edu.CollegeDegree) \r\n<\/code><\/pre>\n<p>str() will display the structure of the data frame, essentially what are the data types of the data frame<\/p>\n<pre><code>\r\nstr(edu.CollegeDegree)\r\n<\/code><\/pre>\n<p>Looking at the structure of the dataframe we can see that the counties imported as <a href=\"https:\/\/www.stat.berkeley.edu\/classes\/s133\/factors.html\">Factors<\/a>, for this task it will not matter as i will not need the county names, but in the future it may become a problem. To nip this we will reimport using stringsAsFactors option of read.csv we will get into factors later, but for now we don't need them.<\/p>\n<pre><code>\r\nedu.CollegeDegree <- read.csv(\"Edu_CollegeDegree-FL.csv\",stringsAsFactors=FALSE)\r\n\r\n#Recheck our structure \r\nstr(edu.CollegeDegree)\r\n\r\n <\/code><\/pre>\n<p>Now the region\/county name is a character however, the there is actually more data in the file than we need. While we only have 68 counties, we have more columns\/variables than we need. The only year i am interested in is the CollegeDegree2010.2014 so there are several ways to remove the unwanted columns.<\/p>\n<p>The following is actually using index to include only columns 1,2,3,8 much like using column numbers in SQL vs the actual column name, this can bite you in the butt if the order or number of columns change though not required for this import, header=True never hurts. You only need to run one of the following commands below, but you can see two ways to reference columns.<\/p>\n<pre><code>\r\nedu.CollegeDegree <- read.csv(\"Edu_CollegeDegree-FL.csv\", header=TRUE,stringsAsFactors=FALSE)[c(1,2,3,8)]\r\n\r\n# or Use the colun names\r\n\r\nedu.CollegeDegree <- read.csv(\"Edu_CollegeDegree-FL.csv\", header=TRUE,stringsAsFactors=FALSE)[c(\"FIPS\",\"region\",\"X2013RuralUrbanCode\",\"CollegeDegree2010.2014\")]\r\n\r\n#Lets check str again\r\nstr(edu.CollegeDegree)\r\n\r\n<\/code><\/pre>\n<p>Using summary() we can start reviewing the data from statistical perspective. The CollegeDegree2010.2014 variable, we can see the county with the lowest proportion of college graduates is .075, or 7.5% of the population of that county the max value is 44.3%. The average across all counties is 20.32% that have completed college.<\/p>\n<pre><code>\r\n\r\nsummary(edu.CollegeDegree)\r\n\r\n<\/code><\/pre>\n<p>Looking at the data we can see that we have a FIPS code, and the only other column we are interested in for mapping is CollegeDegree2010.2014, so lets create a dataframe with just what we need.<\/p>\n<pre><code>\r\nView(edu.CollegeDegree)\r\n\r\n# the follwoing will create a datafram with just the FIPS and percentage of college grads\r\nflCollege <- edu.CollegeDegree[c(1,4)]\r\n\r\n# Alternatively, you can use the column names vs. the positions. Probably smarter ;-) \r\nflCollege <- edu.CollegeDegree[c(\"FIPS\",\"CollegeDegree2010.2014\")]\r\n\r\n# the following will create a dataframe with just the FIPS and percentage of college grads\r\n\r\nflCollege <\/code><\/pre>\n<p>But, from reading the help file on county_choropleth, it requires that only two variables(columns) be passed in, region, and value. Region must be a FIPS code so, we need to rename the columns using <a href=\"https:\/\/stat.ethz.ch\/R-manual\/R-devel\/library\/base\/html\/colnames.html\">colnames()<\/a>.<\/p>\n<pre><code>\r\n\r\ncolnames(flCollege)[which(colnames(flCollege) == 'FIPS')] <- 'region'\r\ncolnames(flCollege)[which(colnames(flCollege) == 'CollegeDegree2010.2014')] <- 'value'\r\n\r\n<\/code><\/pre>\n<p>So, lets map it!<\/p>\n<p>Since we are only using Florida, set the state_zoom, it will work without the zoom but you will get many warnings. You will also notice a warning that 12000 is not mappable. Looking at the data you will see that 12000 is the entire state of Florida.<\/p>\n<pre><code>\r\n\r\ncounty_choropleth(flCollege,\r\n                  title = \"Proportion of College Graduates \",\r\n                  legend=\"Proportion\",\r\n                  num_colors=9,\r\n                  state_zoom=\"florida\")\r\n\r\n<\/code><\/pre>\n<p>For your next task, go find a different state and a different data set from the USDA or anywhere else for that matter and create your own map. Beware of the \"value\", that must be an integer, sometimes these get imported as character if there is a comma in the number. This may be a good opportunity for you to learn about gsub and as.numeric, it would look something like the following command. Florida is the dataframe, and MedianIncome is the column.<\/p>\n<pre><code>\r\n\r\nflorida$MedianIncome <- as.numeric(gsub(\",\", \"\",florida$MedianIncome))\r\n\r\n\r\n<\/code><\/pre>\n<p>USDA Economic Research Division Sample Data<\/p>\n<pre><code>\r\n\r\nFIPS,region,2013RuralUrbanCode,CollegeDegree1970,CollegeDegree1980,CollegeDegree1990,CollegeDegree2000,CollegeDegree2010-2014\r\n12001,\"Alachua, FL\",2,0.231,0.294,0.346,0.387,0.408\r\n12003,\"Baker, FL\",1,0.036,0.057,0.057,0.082,0.109\r\n12005,\"Bay, FL\",3,0.092,0.132,0.157,0.177,0.216\r\n12007,\"Bradford, FL\",6,0.045,0.076,0.081,0.084,0.104\r\n12009,\"Brevard, FL\",2,0.151,0.171,0.204,0.236,0.267\r\n12011,\"Broward, FL\",1,0.097,0.151,0.188,0.245,0.302\r\n12013,\"Calhoun, FL\",6,0.06,0.069,0.082,0.077,0.092\r\n12015,\"Charlotte, FL\",3,0.088,0.128,0.134,0.176,0.209\r\n12017,\"Citrus, FL\",3,0.06,0.071,0.104,0.132,0.168\r\n12019,\"Clay, FL\",1,0.098,0.168,0.179,0.201,0.236\r\n12021,\"Collier, FL\",2,0.155,0.185,0.223,0.279,0.323\r\n12023,\"Columbia, FL\",4,0.083,0.093,0.11,0.109,0.141\r\n12027,\"DeSoto, FL\",6,0.048,0.082,0.076,0.084,0.099\r\n12029,\"Dixie, FL\",6,0.056,0.049,0.062,0.068,0.075\r\n12031,\"Duval, FL\",1,0.089,0.14,0.184,0.219,0.265\r\n12033,\"Escambia, FL\",2,0.092,0.141,0.182,0.21,0.239\r\n12035,\"Flagler, FL\",2,0.047,0.137,0.173,0.212,0.234\r\n12000,Florida,0,0.103,0.149,0.183,0.223,0.268\r\n12037,\"Franklin, FL\",6,0.046,0.09,0.124,0.124,0.16\r\n12039,\"Gadsden, FL\",2,0.046,0.086,0.112,0.129,0.163\r\n12041,\"Gilchrist, FL\",2,0.027,0.071,0.074,0.094,0.11\r\n12043,\"Glades, FL\",6,0.031,0.078,0.071,0.098,0.103\r\n12045,\"Gulf, FL\",3,0.057,0.068,0.092,0.101,0.147\r\n12047,\"Hamilton, FL\",6,0.055,0.059,0.07,0.073,0.108\r\n12049,\"Hardee, FL\",6,0.045,0.074,0.086,0.084,0.1\r\n12051,\"Hendry, FL\",4,0.076,0.076,0.1,0.082,0.106\r\n12053,\"Hernando, FL\",1,0.061,0.086,0.097,0.127,0.157\r\n12055,\"Highlands, FL\",3,0.081,0.097,0.109,0.136,0.159\r\n12057,\"Hillsborough, FL\",1,0.086,0.145,0.202,0.251,0.298\r\n12059,\"Holmes, FL\",6,0.034,0.06,0.074,0.088,0.109\r\n12061,\"Indian River, FL\",3,0.107,0.155,0.191,0.231,0.267\r\n12063,\"Jackson, FL\",6,0.064,0.081,0.109,0.128,0.142\r\n12065,\"Jefferson, FL\",2,0.061,0.113,0.147,0.169,0.178\r\n12067,\"Lafayette, FL\",9,0.048,0.085,0.052,0.072,0.116\r\n12069,\"Lake, FL\",1,0.091,0.126,0.127,0.166,0.21\r\n12071,\"Lee, FL\",2,0.099,0.133,0.164,0.211,0.253\r\n12073,\"Leon, FL\",2,0.241,0.32,0.371,0.417,0.443\r\n12075,\"Levy, FL\",6,0.051,0.078,0.083,0.106,0.105\r\n12077,\"Liberty, FL\",8,0.058,0.08,0.073,0.074,0.131\r\n12079,\"Madison, FL\",6,0.07,0.083,0.097,0.102,0.104\r\n12081,\"Manatee, FL\",2,0.096,0.124,0.155,0.208,0.275\r\n12083,\"Marion, FL\",2,0.074,0.096,0.115,0.137,0.172\r\n12085,\"Martin, FL\",2,0.079,0.16,0.203,0.263,0.312\r\n12086,\"Miami-Dade, FL\",1,0.108,0.168,0.188,0.217,0.264\r\n12087,\"Monroe, FL\",4,0.091,0.159,0.203,0.255,0.297\r\n12089,\"Nassau, FL\",1,0.049,0.091,0.125,0.189,0.23\r\n12091,\"Okaloosa, FL\",3,0.132,0.166,0.21,0.242,0.281\r\n12093,\"Okeechobee, FL\",4,0.047,0.057,0.098,0.089,0.107\r\n12095,\"Orange, FL\",1,0.116,0.157,0.212,0.261,0.306\r\n12097,\"Osceola, FL\",1,0.067,0.092,0.112,0.157,0.178\r\n12099,\"Palm Beach, FL\",1,0.119,0.171,0.221,0.277,0.328\r\n12101,\"Pasco, FL\",1,0.049,0.068,0.091,0.131,0.211\r\n12103,\"Pinellas, FL\",1,0.1,0.146,0.185,0.229,0.283\r\n12105,\"Polk, FL\",2,0.088,0.114,0.129,0.149,0.186\r\n12107,\"Putnam, FL\",4,0.062,0.081,0.083,0.094,0.116\r\n12113,\"Santa Rosa, FL\",2,0.098,0.144,0.186,0.229,0.265\r\n12115,\"Sarasota, FL\",2,0.142,0.177,0.219,0.274,0.311\r\n12117,\"Seminole, FL\",1,0.094,0.195,0.263,0.31,0.35\r\n12109,\"St. Johns, FL\",1,0.085,0.144,0.236,0.331,0.414\r\n12111,\"St. Lucie, FL\",2,0.081,0.109,0.131,0.151,0.19\r\n12119,\"Sumter, FL\",3,0.047,0.07,0.078,0.122,0.264\r\n12121,\"Suwannee, FL\",6,0.056,0.065,0.082,0.105,0.119\r\n12123,\"Taylor, FL\",6,0.064,0.086,0.098,0.089,0.1\r\n12125,\"Union, FL\",6,0.033,0.059,0.079,0.075,0.086\r\n12127,\"Volusia, FL\",2,0.107,0.13,0.148,0.176,0.213\r\n12129,\"Wakulla, FL\",2,0.018,0.084,0.101,0.157,0.172\r\n12131,\"Walton, FL\",3,0.067,0.096,0.119,0.162,0.251\r\n12133,\"Washington, FL\",6,0.04,0.063,0.074,0.092,0.114\r\n\r\n<\/code><\/pre>\n","protected":false},"excerpt":{"rendered":"<p>In the last blog you were able to get a dataset with county and population data to display on a US map and zoom in on a state, and maybe even a county if you went exploring.&nbsp; In this demo we will be using the same choroplethr package but this time we will be using [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[12,9,5,1,11],"tags":[14,10,8,13],"class_list":["post-80","post","type-post","status-publish","format-standard","hentry","category-choropleth","category-r","category-sqlshep","category-uncategorized","category-visualization","tag-choroplethr","tag-r","tag-sqlshep","tag-visualization"],"_links":{"self":[{"href":"https:\/\/sqlshep.com\/index.php?rest_route=\/wp\/v2\/posts\/80","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/sqlshep.com\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/sqlshep.com\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/sqlshep.com\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/sqlshep.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=80"}],"version-history":[{"count":23,"href":"https:\/\/sqlshep.com\/index.php?rest_route=\/wp\/v2\/posts\/80\/revisions"}],"predecessor-version":[{"id":336,"href":"https:\/\/sqlshep.com\/index.php?rest_route=\/wp\/v2\/posts\/80\/revisions\/336"}],"wp:attachment":[{"href":"https:\/\/sqlshep.com\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=80"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/sqlshep.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=80"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/sqlshep.com\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=80"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}