{"id":386,"date":"2017-11-29T10:00:59","date_gmt":"2017-11-29T10:00:59","guid":{"rendered":"https:\/\/sqlshep.com\/?p=386"},"modified":"2018-01-08T19:02:41","modified_gmt":"2018-01-08T19:02:41","slug":"getting-started-r-sql-regardless-sql-version-using-odbc","status":"publish","type":"post","link":"https:\/\/sqlshep.com\/?p=386","title":{"rendered":"Getting Started with R and SQL (Regardless of SQL Version) Using ODBC"},"content":{"rendered":"<p>So here you are, you know SQL or you at least do something with it everyday and are wondering what all the hoopla is about R and data science.  Lets break down the first barrier, R and data science actually have little to do with each other, R is a language, data science is an abstract field of work, sort of like saying I am a computer scientist, that will narrow it down but not by much.  What is your industry, what languages do you use, what is your education, hacker, bachelors, masters, phd&#8230;?  You can be a Data Scientist and never use R.<\/p>\n<p>But we are going to use R, today, right now, get ready. <\/p>\n<p><!--more--><\/p>\n<p>I have <a href=\"https:\/\/sqlshep.com\/?p=353\" target=\"_blank\">done prior posts about my environment<\/a>, I will not bore you with the details, just know that I am using a <a href=\"https:\/\/azure.microsoft.com\/en-us\/services\/virtual-machines\/data-science-virtual-machines\/\" target=\"_blank\">Data Science Virtual Machine in Azure<\/a>, mostly because I am lazy and don\u2019t want to build my own any more, its worth a few bucks a month to me for someone else to host all this crap and take care of setup and updates for me.  <\/p>\n<p>To get us started, I have <a target=\"_blank\" href=\"https:\/\/cran.r-project.org\/bin\/windows\">R<\/a> from <a target=\"_blank\" href=\"https:\/\/cran.r-project.org\">CRAN<\/a> installed, if you don\u2019t, go do it, there are dozens of videos and blogs on how to.  <a target=\"_blank\" href=\"https:\/\/vimeo.com\/203516510\">Here is one for R!<\/a> and, <a target=\"_blank\" href=\"https:\/\/vimeo.com\/203516968\">Here is one for RStudio!<\/a><\/p>\n<p>I have R Studio installed, it is just the IDE for R, Imagine R Studio as the Management Studio to R. If you are old school and hate yourself you can do everything via command line using the Base R gui too.<\/p>\n<p><a href=\"https:\/\/sqlshep.com\/wp-content\/uploads\/2017\/11\/Screen-Shot-2017-11-27-at-2.18.17-PM.png\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/sqlshep.com\/wp-content\/uploads\/2017\/11\/Screen-Shot-2017-11-27-at-2.18.17-PM-1024x660.png\" alt=\"\" width=\"625\" height=\"403\" class=\"alignnone size-large wp-image-387\" srcset=\"https:\/\/sqlshep.com\/wp-content\/uploads\/2017\/11\/Screen-Shot-2017-11-27-at-2.18.17-PM-1024x660.png 1024w, https:\/\/sqlshep.com\/wp-content\/uploads\/2017\/11\/Screen-Shot-2017-11-27-at-2.18.17-PM-300x193.png 300w, https:\/\/sqlshep.com\/wp-content\/uploads\/2017\/11\/Screen-Shot-2017-11-27-at-2.18.17-PM-768x495.png 768w, https:\/\/sqlshep.com\/wp-content\/uploads\/2017\/11\/Screen-Shot-2017-11-27-at-2.18.17-PM-624x402.png 624w, https:\/\/sqlshep.com\/wp-content\/uploads\/2017\/11\/Screen-Shot-2017-11-27-at-2.18.17-PM.png 1414w\" sizes=\"auto, (max-width: 625px) 100vw, 625px\" \/><\/a><\/p>\n<p>You can see my R version from the R command &#8220;version&#8221;, and the R Studio version from About RStudio, as we go forward I will try to share the package versions and the R and R studio version, with this many moving parts breaking something is really easy. With that in mind, if you ever update R, RStudio, or a package and something stops working, you can bet it is related to an update somewhere.  As annoying as it can be, don&#8217;t be afraid to stay 2 or 3 versions behind on R or and RStudio, unless you have unlimited time, then by all means install pre-prod code directly from github, i will discuss that later.  <\/p>\n<p>In the title of the post it states \u201cRegardless of SQL Version\u201d, for this we are going to use a package called <a href=\"http:\/\/db.rstudio.com\/odbc\/\">ODBC<\/a>, it simply uses the ODBC on your machine to connect to SQL, and if you have other odbcs unrelated to SQL you can connect to those databases as well.  While SQL Server requires Revo packages to call out to R and back in and a SQL version of 2016 or higher, connecting <strong>to<\/strong> SQL from R does not require Revolution packages or tools or the SQL Launcher and does not require a specific version of SQL.  The link above goes into the nasty details of DBI, RODBC and RODBCDBI, its an interesting read, i suppose, its basically telling you how awesome odbc is.  I say we accept that fact and move on. BTW, if you ever need help on an R command, just put a question mark in front of it, like; ?odbc or ?version in teh R Console. <\/p>\n<p>To be clear, if you are doing some big data crunching there are benefits to using Revolution packages and tools, and we will get into that at a way later date, but not today, today we just connect R to a SQL Server of any version. <\/p>\n<p>Create a new R Script file to get you started, start saving them&#8230;<\/p>\n<p><a href=\"https:\/\/sqlshep.com\/wp-content\/uploads\/2017\/11\/Screen-Shot-2017-11-28-at-4.53.33-PM.png\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/sqlshep.com\/wp-content\/uploads\/2017\/11\/Screen-Shot-2017-11-28-at-4.53.33-PM.png\" alt=\"\" width=\"579\" height=\"146\" class=\"alignnone size-full wp-image-440\" srcset=\"https:\/\/sqlshep.com\/wp-content\/uploads\/2017\/11\/Screen-Shot-2017-11-28-at-4.53.33-PM.png 579w, https:\/\/sqlshep.com\/wp-content\/uploads\/2017\/11\/Screen-Shot-2017-11-28-at-4.53.33-PM-300x76.png 300w\" sizes=\"auto, (max-width: 579px) 100vw, 579px\" \/><\/a><br \/>\nLets start with something simple, connect to any version of SQL using ODBC. <\/p>\n<pre>\r\n<code>\r\n#you only need to install the package once per machine \r\n#unless there is an update to the package, \r\n#be sure to look for errors in the console output\r\ninstall.packages(\"odbc\")\r\n\r\n# library will load the package into memory and make it available for use\r\n# This will need to be run every time R is started \r\n\r\nlibrary(odbc)\r\n\r\n# ODBC requires the timezone to be set,\r\n# If you skip this step it will remind you \r\n\r\nSys.setenv(TZ='GMT')\r\n\r\n# If you have ever connected using odbc, this will look familiar\r\n# build the connection \r\nmaster <- dbConnect(odbc::odbc(),\r\n                     Driver    = \"SQL Server\", \r\n                     Server    = \"localhost\",\r\n                     Database  = \"master\",\r\n                     Trusted_Connection = 'Yes')\r\n\r\n# prepare a sql statement to send\r\nSQLStmt <- sprintf(\"SELECT [session_id]\r\n                        ,[login_time]\r\n                   ,[host_name]\r\n                   ,[program_name]\r\n                   FROM [master].[sys].[dm_exec_sessions]\")\r\n\r\n# Alternatively, you can run @@version, just to connect and get out\r\n# SQLStmt <- sprintf(\"select @@version\")\r\n\r\n\r\n# Submit the query, if your syntax is wrong this is where you will see it the error \r\nrs <- dbSendQuery(master, SQLStmt)\r\n\r\n# Receive the results into a data frame called MyStuff\r\nMyStuff <- dbFetch(rs)\r\n\r\n# house keeping \r\ndbClearResult(rs)\r\ndbDisconnect(master)\r\n<\/code>\r\n<\/pre>\n<p>If you received no errors you should now have a dataframe called MyStuff that has some number of rows and 4 columns, you will notice that R calls these observations (rows), and variables (columns).  This is leftover from the statistical language of a study, a case is an observation or a row to you and me, and a variable is a column.  <\/p>\n<p>If you change the query to Select * from sys].[dm_exec_sessions],  and receive the error, \u201cnanodbc.cpp:3102: 07009: [Microsoft][ODBC SQL Server Driver]Invalid Descriptor Index\u201d, in the case of this query it is because the tidyverse odbc has some difficulty with guids, so, take the guids in the query and cast them as varchar(32) and you will get past it(\u201ccast([original_security_id] as varchar(36))\u201d).  Yes, you will have to blow out the entire column list and find all of the guids, but then again you should not be using select * anyway, or so people tell me. \ud83d\ude1b<\/p>\n<p>My Versions today:<\/p>\n<pre><code>\r\n> version\r\n               _                           \r\nplatform       x86_64-w64-mingw32          \r\narch           x86_64                      \r\nos             mingw32                     \r\nsystem         x86_64, mingw32             \r\nstatus                                     \r\nmajor          3                           \r\nminor          3.3                         \r\nyear           2017                        \r\nmonth          03                          \r\nday            06                          \r\nsvn rev        72310                       \r\nlanguage       R                           \r\nversion.string R version 3.3.3 (2017-03-06)\r\nnickname       Another Canoe  \r\n\r\n> packageVersion(\"odbc\")\r\n\r\n[1] \u20181.0.1\u2019\r\n\r\n## If you ran \"select @@version\"\r\n\r\n> MyStuff\r\n                                                                                                                                                                                                                                            \r\n1 Microsoft SQL Server 2016 (SP1-CU1) (KB3208177) - 13.0.4411.0 (X64) \\n\\tJan  6 2017 14:24:37 \\n\\tCopyright (c) Microsoft Corporation\\n\\tDeveloper Edition (64-bit) on Windows Server 2016 Datacenter 6.3 <X64> (Build 14393: ) (Hypervisor)\\n\r\n<\/code><\/pre>\n","protected":false},"excerpt":{"rendered":"<p>So here you are, you know SQL or you at least do something with it everyday and are wondering what all the hoopla is about R and data science. Lets break down the first barrier, R and data science actually have little to do with each other, R is a language, data science is an [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[46,9,38,33,5],"tags":[41,40,42,43,45,44,10,39],"class_list":["post-386","post","type-post","status-publish","format-standard","hentry","category-odbc","category-r","category-rstudio","category-sql","category-sqlshep","tag-41","tag-2008-r2","tag-42","tag-43","tag-connectivity","tag-odbc","tag-r","tag-sql"],"_links":{"self":[{"href":"https:\/\/sqlshep.com\/index.php?rest_route=\/wp\/v2\/posts\/386","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/sqlshep.com\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/sqlshep.com\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/sqlshep.com\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/sqlshep.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=386"}],"version-history":[{"count":25,"href":"https:\/\/sqlshep.com\/index.php?rest_route=\/wp\/v2\/posts\/386\/revisions"}],"predecessor-version":[{"id":541,"href":"https:\/\/sqlshep.com\/index.php?rest_route=\/wp\/v2\/posts\/386\/revisions\/541"}],"wp:attachment":[{"href":"https:\/\/sqlshep.com\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=386"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/sqlshep.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=386"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/sqlshep.com\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=386"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}