Tableau Prep Builder

  1. Tableau Prep Builder Download
  2. Tableau Prep Builder 2020.3

Tableau Prep Builder recently released an update (2019.3) that included a brand-new tool: The Script Step. This step lets us run an R or Python Script upon data in a flow, which opens our analysis up to the vast range of functions available in both programs. Seriously, the possibilities are endless!

In this article we will focus on running an R Script, as:

  1. a) there aren’t many R-specific guides out there, and
  2. b) R has a fairly intuitive scripting language and is easier to get up-and-running than Python (in my opinion).

Why should I bother?

Tableau Prep Builder; Answer Here are some potential options of improving Tableau Prep Builder performance. Avoid wildcard union of a large amount of Excel / Text files. Use a database as a data source instead of flat files to avoid the use of union. Or, save the data in only one file or a few files. Additional Information TC19: Best. To get started with Tableau Prep on the browser, simply upgrade Tableau Server to version 2020.4, then enable flows in web authoring. For more information, read about these settings and topology changes. Now you're ready to start creating flows in the browser. Downloaded a trial version of Tableau Prep Builder? Cleaning data for the first time? Want to know how to begin? This is the video for you.

Whilst Tableau Prep can do a lot of great things, I believe it is missing some essential steps, namely:

  • Indexing
  • Sorting
  • Flagging outliers
  • Ranking*

*Tableau Prep version 2020.1 now allows users to create Level of Detail Calculations, including Rank, so you will only need to use an R script to Rank if using an earlier version

I would also include Top N filtering on this list, but we can do this with two steps: Ranking (via R script) and Filtering (via Tableau Prep).

There is also the huge range of data science functionality that R offers (clustering, modelling, machine learning…) but we’ll leave that for a future article!

Before we get started

  1. 1. Download & Install the latest version of R from here
  2. 2. Download & Install Tableau Prep 2019.3 (or above) from here

You can also download the materials & examples used in this article (including the packaged flow file) here.

How it works

Tableau Prep runs your R Script via the Rserve Server. This means that you need to install and run the Rserve package within R Console before you can run your script.

To install and run Rserve:

  1. 1. Open R Console

If using Rserve for the first time:

  1. 2. Type in the following code into R and press Enter:

install.packages(“Rserve”)

  1. 3. At this point you will be asked to select a CRAN mirror for use in this session – select one in the country closest to you (here we used UK (London 1) [https]).

If Rserve is already installed:

  1. 4. Type in the following code into R and press Enter:

library(Rserve)
Rserve()

You should now have the Rserve Server up and running. Your R workspace should look as follows:

Setting up the Script step in Prep

1. Add a Script step to your data

  1. 2. Make sure the connection type is Rserve
  1. 3. Select “Connect to Rserve Server” input the following credentials, and click Sign In.

Server: localhost

Port: 6311

Note: Port 6311 is the default port for plaintext Rserve servers. If your server is SSL-encrypted you will need Port 4912 and you may need to enter a username and password.

You are now ready to upload an R script and apply it to your data.

How an R Script works

R Scripts are lines of code that perform certain tasks within R. They can be saved and then referenced later, removing the need to write new lines of code every time.

You will need to write and save your R code separately to Tableau Prep as a *.R file, then upload it via the script step.

There are some important things to know about the format of an R script.

  • Your input and output will always be in the form of a data frame. The table of data that flows into the script step in your prep flow forms the data frame (called “df”) by R.
  • You need to reference specific column names. You cannot design an R script that dynamically takes the first column in your data, for example.
  • The columns being referenced in the R script cannot have spaces, special characters (e.g. % or £) or begin with a number, so you may need to rename columns before the script tool. This is because R will interpret these characters as a command.
  • You must list all columns that you want to output in your data frame. Any missing columns will not be returned in the output of the Script step.
    • For this reason, it may be easier to only apply a script to the columns you need to manipulate and then join these back onto the rest of the data afterwards.
  • Your script should contain a function that does something specific to your data. All manipulation is done via an R function. You can explore R functions in R’s official documentation here.
  • You will need to specify the data types of your outputs. This means including an additional function called getOutputSchema.

Creating an R Script

  1. 1. Within R Console, choose File > New Script

2. Once you have written your script, save your file via File > Save

This is the file you will upload to Tableau Prep in the File Name section (you will also need to specify a function).

Layout of an R Script in Tableau Prep

The generic format of an R script that will work in Tableau Prep is as follows:

Practical Examples:

The examples below will use the following simple table of data. Feel free to copy it into Excel and connect to the file in Tableau Prep.

You can also download the materials & examples used in this article (including the data) here.

Note that all the example scripts can be easily customised to match your own data. Just change the names of the columns according to the generic layout given beforehand.

Example 1 – Indexing a Column

Aim: to produce a column of integers that label each record in our dataset from 1 to 12*.

* 12 is not a hard-coded number; the script will calculate how many rows are in the data.

Desired Output:

R Script:

Index_function <- function(df) {

Name <- df$Name

df$Age <- df$Age

df$Index <- 1:nrow(df)

return(df)

}

getOutputSchema <- function() {

return(data.frame(

Name = prep_string(),

Age = prep_int(),

Index = prep_int()

))

}

Note that the R Function 1:nrow(df) takes the entire dataframe, df, as an argument.

In Tableau Prep, upload the R script above and type “Index_function” (the name of the function) into the Function Name field. Press Enter to apply your script.

Example 2 – Sorting a Column

Aim: to sort the data by the value of a column, in this case Age in ascending order (smallest to largest).

Desired Output:

R Script:

Sort_Age <- function(df) {

Name <- df$Name

Age <- df$Age

df <- df[order(df$Age),]

return(df)

}

getOutputSchema <- function() {

return(data.frame(

Name = prep_string(),

Age = prep_int()

))

}

The command line df <- df[order(df$Age),] replaces the data frame with a new data frame in the correct order. Note that the R Function order takes the column df$Age as an argument, and sorts it in ascending order.

To reverse the order of the sorting (largest to smallest), simply replace df$Age with -df$Age

In Tableau Prep, upload the R script above and type “Index_function” (the name of the function) into the Function Name field. Press Enter to apply your script.

NOTE: You can also divide your data into groups and sort by group – e.g. split by gender and then sort by age. To do this you would include the gender column as the first argument of the order function:

df <- df[order(df$Gender, df$Age),]

Example 3– Creating an “Outlier” Flag

Aim: to produce a flag for when an Age is an outlier, defined as a data point above Q3 + 1.5 times the IQR. Here 1 = outlier and 0 = not outlier.

Desired Output:

R Script:

Outlier <- function(df) {

Name <- df$Name

Age <- df$Age

df$Outlier_Flag <- ifelse(test = Age > quantile(Age, probs = 0.75) + IQR(Age) * 1.5,

yes = 1,

no = 0)

return(df)

}

getOutputSchema <- function() {

return(data.frame(

Name = prep_string(),

Age = prep_int(),

Outlier_Flag = prep_int()

))

}

BuilderTableau prep builder 2020.3

Note that the R Function ifelse takes the column Age as an argument, and also references other functions inside it.

In Tableau Prep, upload the R script above and type “Index_function” (the name of the function) into the Function Name field. Press Enter to apply your script.

Example 4 – Ranking a Column

*Tableau Prep version 2020.1 now allows users to create Level of Detail Calculations, including Rank, so you will only need to use an R script to Rank if using an earlier version

Aim: to produce a column of integers that rank each value in the ‘Age’ column from 1 (highest age) to 12 (lowest age).

Desired Output:

R Script:

Rank_Age <- function(df) {

Name <- df$Name

Age <- df$Age

df$Rank_of_Age <- (nrow(df)+1) – rank(Age , ties.method= “max”)

return(df)

}

getOutputSchema <- function() {

return(data.frame(

Name = prep_string(),

Age = prep_int(),

Rank_of_Age = prep_int()

))

}

Note that the R Function rank(Age , ties.method= “max”) takes one column, Age, as an argument.

NOTE: This ranking method uses the Competition (1,2,2,4) method. For other options, look into the ties.method options within the rank() function.

We could reverse the order of the Rank by replacing (nrow(df)+1)- rank(Age, ties.method= “max”) with rank(Age, ties.method= “min”). Again, this would use the Competition (10,9,9,7) ranking method.

In Tableau Prep, upload the R script above and type “Rank_Age” (the name of the function) into the Function Name field. Press Enter to apply your script.

You can also download the materials & examples used in this article (including the packaged flow file) here.

Testing your Script in R

I find it helpful to test the script in R before attempting it within a script tool. Some tips here:

  • Output your data as a *.csv file from Tableau Prep at the point before you will use a script tool
  • Upload this file to R using the read.csv tool.

Ages_Data <- read.csv(“[file path & file name].csv”)

  • Turn this into a data frame called “df” by using the data.frame function.
    df <- data.frame(Ages_Data)
  • Check the column names of your data frame using the names function. If they do not match your data then use the second line of this script.
    NOTE: Command lines beginning with # are comments and do not affect the script
Prep

# To check column names
names(df)

# To change column names
names(df) <- c(“Name”, “Age”)

  • Highlight the first half of your function (everything above getOutputSchema) and press CTRL+R. This will run your function.
  • Test your function

Rank_Age(df)

What next?

I spoke at the Tableau Prep Virtual User Group on Thursday 30th April 2020, showing users how you can expand the current capabilities of Tableau Prep by leveraging R Scripts, including how to sort, rank, index your data, flag outliers, and more. Catch up on the video here

I believe the community is only just beginning to scratch the surface of the possibilities within the Script tool. Look out for future articles on data science functionality & advanced analytics, and please help spread the word by sharing this article!

How to use R Script with Prep Builder is also covered in more detail in our 2-day Prep Builder training course, which touches on all aspects of Prep functionality. Please contact uswith any enquiries and booking requests, we would be happy to accommodate you.

Tableau Prep now supports web authoring for flows. Starting in version 2020.4.1, you can create flows to clean and prepare your data using either Tableau Prep Builder, Tableau Server, or Tableau Online. You can also manually run flows on the web and the Data Management Add-on is not required.

While most of the same Tableau Prep Builder functionality is also supported on the web, there are a few differences when creating and working with your flows.

Important: To create and edit flows on the web you must have a Creator license. However, the Data Management add-on is only required if you want to run your flows on a schedule using Tableau Prep Conductor.

Installation and Deployment

To enable users to create and edit flows on the web, configure the following settings on your server:

  • Web Authoring: Controls whether users can create and edit flows on Tableau Server or Tableau Online. For more information, see Create and Interact with Flows on the Web(Link opens in a new window) in the Tableau Server help.

  • Run Now: Controls whether users or only administrators can run flows manually using the Run Now option.

  • Tableau Prep Conductor: If the Data Management Add-on is enabled, enable this option to let users schedule and track flows in Tableau Server and Tableau Online.

  • Autosave: Enabled by default, this feature automatically saves a user's flow work every few seconds.

    While not recommended, administrators can disable autosave on a site using the Tableau Server REST API method 'Update Site' and flowAutoSaveEnabled setting. For more information, see Tableau Server REST API Site Methods: Update Site(Link opens in a new window). For more information about autosave on the web, see Autosave and working with drafts.

    For more information about configuring site settings, see Site Settings Reference(Link opens in a new window) in the Tableau Server help.

On Tableau Server, administrators can fine-tune the configuration of the Tableau Prep Flow Authoring processes. For more information, see Tableau Prep Flow Authoring.

Sample data and processing limits

To maintain performance while working with flows on the web, limits are applied to the amount of data you can include in a flow.

Tableau Prep Builder Download

The following limits apply:

  • When connecting to files, the maximum file size is 1GB.
  • The data sampling option to include all data is not available. The default sample data limit is 1 million rows.
  • The maximum number of rows that a user can select when using large data sets is configured by the administrator. As a user, you can select the number of rows up to that limit. For more information, see tsm configuration set Options(Link opens in a new window) in the Tableau Server help.
  • In Tableau Online, the number of flow runs you can perform in a day is also limited by the site administrator. For more information, see Tableau Online Site Capacity(Link opens in a new window) in the Tableau Online help.

Available features on the web

When you create and edit flows on the web you may notice a few differences in navigation and the availability of certain features. While most features are available across all platforms, some features are limited or not yet supported in Tableau Server or Tableau Online. The following table identifies features where differences might apply.

FeatureTableau Prep BuilderTableau ServerTableau Online
Connect to Data*
Build and Organize your Flow
Set your data sample size**
Union files and database tables in the Input step***
Clean and Shape Data
Aggregate, Join, or Union Data
Use R and Python scripts in your flow ****
Create reusable flow steps (version 2019.3.2 and later)
Automatically save your flows on the web Not Applicable
Automatic file recovery Not ApplicableNot Applicable
View flow output in Tableau Desktop
Create an extract to a file
Create a published data source
Save flow output data to external databases

* Some connectors may not be supported on the web. Open the Connect pane on your server to see supported connectors.

** In Tableau Server and Tableau Online, the data sample size is subject to limits set by your administrator.

*** Input unions can't be edited or created in Tableau Server or Tableau Online. Only in Tableau Prep Builder.

**** Script steps can't be added when creating or editing a flow in Tableau Online. This is currently supported only in Tableau Prep builder and Tableau Server.

Tableau Prep Builder 2020.3

Autosave and working with drafts

When you create or edit flows on the server, your work is automatically saved as a draft every few seconds so that in the event of a crash, or when closing a tab by accident, you don't lose your work.

Drafts are saved to the server and project you are signed into. You can't save or publish a draft to another server, but you can save the flow to another project on that server using the File > Publish As menu option.

Draft content can only be seen by you until you publish it. If you publish changes and need to revert them, you can use the Revision History dialog to view and revert to a previously published version. For more information about saving flows on the web, see Automatically save your flows on the web .

Publishing flows and embedding credentials

Whether you are publishing flows from Tableau Prep Builder to Tableau Server or Tableau Online or publishing a new or edited flow on the web, you have the option to embed credentials for database connections. Embedding your credentials enables flows that include database connections to run seamlessly on the server without requiring manual intervention.

Note: Embedding credentials only applies to running published flows on your server. You will still manually need to enter your credentials when editing a flow connected to a database in either Tableau Prep Builder or on the web.

When creating or editing flows on the web, you must first publish a flow before you can run it to generate your flow output for your data analysis. During publish, you can choose to embed your credentials for your database connections. By default credentials are not embedded.

You can also change your credential option from the top menu under File > Connection Credentials. For more information, see Publish a flow in web authoring.

Thanks for your feedback!