SQL Server 2016 - R Service

One More Language to Learn!

So, we have had SQL, TSQL, MDX, DMX, DAX to keep us busy in the SQL Server world for years now so I guess it was about time another language came along to join them!

SQL Server 2016 enters the world of Big Data with its PolyBase and R Services.

 

The SQL Server R Service

The R Service enables integration of R scripts into TSQl scripts. 

What is R?

Well this is what you get if you take a look at Wiki:

 

It is an Open Source language for staistical analysis and graphics, and is commonly used by statisticians and data miners is what it says here.

 

The R Project

R is free and downloadable from the GNU R Project website: https://www.r-project.org/

 

 

As well as providing libraries/packages that enable complex analysis to be carried out on data sets, there are many libraries/packages that assist in cleansing data before it can be analysed.

 

SQL Server & R Scripting

SQL Server 2016 has introduced the R service that provides stored procedures to allow R scripts to be executed from Transact SQL scripts. The following example uses the new WideWorldImporters sample database and shows how an R regular expression library called stringr can be used to employ powerful regular expressions to retrieve values from poorly structured relational data:

 

The sp_execute_external_script stored procedure is called specifying the R language (@language variable) as the scripting language to be used.

The @script variable contains the actual R script to be executed.

The R script firstly defines variables that will be use:

Street <- unlist(Customer_AddressLine2$Street)

The R unlist function is used to extract the values from all records for the column aliased as Street in the Input Data Object called Customer_AddressLine2.

CustomerName <- unlist(Customer_AddressLine2$CustomerName)

The same is done to extract the customer name values.

housenumber <- "[0-9]{1,}[a-z]?"

The housenumber variable is a regular expression that matching a string of consecutive digits 1 to 9 (1 ore more) immediately followed by 0 or 1 lowercase letters.

hnumber <- str_match(Street, housenumber)

hnumber is set to the result of the str_match function, a member of the stringr library, which extracts just the matching string for the regular expression in housenumber.

OutputDataSet <- data.frame(Street, hnumber, CustomerName)

The results are returned via the OutputDataSet variable which in turn is populated by the data.frame function with the threee values of interest for the results set.

 

As can be seen from the screen shot above, the result is that the building number is extracted from the street line of the customer address.

Here is the full script for you:

EXECUTE sp_execute_external_script
        @language = N'R'
       ,@script
	        = N'   library(stringr)
               Street <- unlist(Customer_AddressLine2$Street)
			   CustomerName <- unlist(Customer_AddressLine2$CustomerName)
               housenumber <- "[0-9]{1,}[a-z]?"
               hnumber <- str_match(Street, housenumber)
			   OutputDataSet <- data.frame(Street, hnumber, CustomerName)'
	   ,@input_data_1 = N'SELECT CustomerName, DeliveryAddressLine2 AS Street
				   FROM Sales.Customers AS c'

	   ,@input_data_1_name = N'Customer_AddressLine2'
WITH RESULT SETS (( Street VARCHAR(MAX), hnumber VARCHAR(MAX), 
					CustomerName VARCHAR(MAX) ));
GO

 

Did you find this interesting? Would you like to find out more? Drop us a line at info@ptr.co.uk.

Share this post