Tableau Prep Use Cases

In this case, use Tableau Prep, because producing customized data sources will be significantly easier. For example, imagine you’re working with sales data for both the Finance and Marketing departments.

  1. Tableau Case Studies. Below are well-known Tableau case studies that will help you in practice and in interviews. JP Morgan Chase & Co. is a leading multinational bank and financial services company based in the USA. It is the largest investment bank in the US and the sixth largest in the world.
  2. How to Use Tableau Prep. Use case: a Superstore wants to analyze its product sales and profits over the last four years using Tableau. The Superstore's data has been collected, tracked, and entered differently for each region, so a lot of data cleaning is required.
  3. Read this case study from MoneySQ to learn how they improved collaboration, communication, and decision-making with the ability to use near real-time data. Before leveraging big data with Tableau, MoneySQ employees tracked their business targets manually, entering duplicated versions of data into multiple platforms like Excel and other online.

On 8/12/2020, Tableau released Tableau Desktop version 2020.3 that included some fun new features, including Write to Database in Tableau Prep, Export to Crosstab Button, and the IN function. There are lots of great new features in this release, but the IN function caught my eye specifically. Let’s dive deeper into understanding the IN functionality and how you can leverage this in your Tableau development.

Those who are familiar with SQL already know the IN concept, as it’s a common way of defining criteria in a WHERE clause. The IN function in Tableau works much like its SQL counterpart. See an example SQL query below:

The IN function in Tableau is used to create groupings of different values within a dimension or measure that you specify in the function criteria. The values that you specify in this IN group are essentially creating a permanent Set based upon those criteria.

The syntax and its explanation are shown below.

<expr1> IN <expr2>

Returns TRUE if <expr1> matches any value in <expr2>.

<expr2> can be a Set, a list of literal values, or a combined field.

Practically, the syntax is going to look like this when assigning values to groups:
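As a rough illustration of the grouping logic, here is a minimal Python sketch (not Tableau syntax). In Tableau itself the test takes a form like `IF [State] IN ('Ohio', 'Iowa') THEN 'Midwest' END`. The state lists, the 'Other' fallback, and the function name are assumptions made for the example.

```python
# Illustrative only: mirrors an IF ... IN (...) THEN ... grouping in plain Python.
# The lists below are hypothetical examples, not a complete region mapping.
MIDWEST = {"Ohio", "Iowa", "Indiana", "Michigan"}
WEST = {"California", "Oregon", "Washington"}


def region_for(state: str) -> str:
    """Return the group a state belongs to, or 'Other' if it is in no list."""
    if state in MIDWEST:
        return "Midwest"
    if state in WEST:
        return "West"
    return "Other"


states = ["Ohio", "Oregon", "Texas"]
regions = [region_for(s) for s in states]
print(regions)  # ['Midwest', 'West', 'Other']
```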

You can also control which values a measure is shown for, using mostly the same syntax with a different ending:

In addition to text values, you can define numeric criteria. With this method you do not put quotation marks around the values, as they are not text.
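Both ideas can be sketched in Python (again as an illustration of the logic, not Tableau syntax). The sample orders, field names, and the quantities 5, 10, and 20 are assumptions for the example; `None` stands in for the NULL a Tableau calculation would return when no branch matches.

```python
# Illustrative only: the same membership test, used to keep a measure and
# with numeric literals. All data below is made up for the example.
orders = [
    {"state": "Ohio", "quantity": 2, "sales": 120.0},
    {"state": "Texas", "quantity": 5, "sales": 300.0},
    {"state": "Iowa", "quantity": 10, "sales": 80.0},
]

# Return the measure only for the listed states (None plays the role of NULL).
midwest_sales = [
    o["sales"] if o["state"] in ("Ohio", "Iowa") else None for o in orders
]

# Numeric criteria take bare numbers, not quoted strings.
bulk_flags = [o["quantity"] in (5, 10, 20) for o in orders]

print(midwest_sales)  # [120.0, None, 80.0]
print(bulk_flags)     # [False, True, True]
```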

Lastly, you can use parameter values within the IN function as well. Although this may not be a great way to define your criteria, I am simply stating that this is possible.

Whether or not it’s best practice, the most commonly used instance of this new feature will assuredly be creating some sort of criteria based upon a list of text (like below). If your list of text is extremely long, then you could place your items in an Excel sheet and create a formula to put quotes around each word and commas between each word. You could then paste this into your IN syntax.
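If a spreadsheet formula feels clumsy, the same quoting step can be done with a few lines of Python. This is only a sketch: the helper name `to_in_list` and the sample values are hypothetical, and it does not handle values that themselves contain a single quote.

```python
def to_in_list(values):
    """Quote each non-empty value and join with commas, ready to paste into IN (...)."""
    cleaned = [v.strip() for v in values if v.strip()]
    return ", ".join(f"'{v}'" for v in cleaned)


# Hypothetical input, e.g. copied from a column of a spreadsheet.
states = ["Ohio", " Iowa", "Indiana ", "", "Michigan"]
in_clause = f"IN ({to_in_list(states)})"
print(in_clause)  # IN ('Ohio', 'Iowa', 'Indiana', 'Michigan')
```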

If your calculations are already structured in this way, this new function might help make some of your old formulas more understandable, both for you and anyone else who might look through your calculations (including other developers making changes). See how much easier it is to create and comprehend the formula with IN rather than the long list of OR statements.

The calculation using IN is below. Notice how much easier it is to create as well as consume.
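The readability difference is the same in most languages. As a sketch in Python (the state names are assumed for illustration), the two functions below are logically equivalent, but the second is shorter to write and easier to extend:

```python
def is_midwest_or(state: str) -> bool:
    """Long form: one equality test per value, chained with OR."""
    return (
        state == "Ohio"
        or state == "Iowa"
        or state == "Indiana"
        or state == "Michigan"
    )


def is_midwest_in(state: str) -> bool:
    """Short form: a single membership test against a list of values."""
    return state in ("Ohio", "Iowa", "Indiana", "Michigan")


sample = ["Ohio", "Texas", "Michigan", "Iowa", "Utah"]
results_or = [is_midwest_or(s) for s in sample]
results_in = [is_midwest_in(s) for s in sample]
print(results_or == results_in)  # True
```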

Admittedly, when I saw the announcement of this function, I wasn’t very excited about it. I thought that everything you could do with IN you could already accomplish using Sets. I still think this is mostly true, but an advantage of using IN instead of Sets to group data values is that in order for a dimension value to be in a Set, it has to already have occurred in your data. Let me explain what I mean.

As an example, I want to create a Set containing only states within a certain Region, and then use that Set in calculations. Once I’ve created the Set, the calculation to get the region value would look like below.

The only issue using this methodology is that only states within your data set would be able to be placed in a set. Let’s say you’ve got sales orders from every state in the Midwest except for Ohio. If you created this Set, you would not be able to have this calculation automatically update once you get a sale from Ohio, BUT, if you used your WHEN IN(‘Ohio’,’Iowa’,’Indiana’,’Michigan’, etc) function it would automatically assign Midwest to your Ohio sales.
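The difference can be modelled in a few lines of Python. This is a simplified sketch of the behaviour described above, not of Tableau's internals: the Set is built only from values present in the data at the time, while the literal list is independent of the data. All names and data are assumptions for the example.

```python
# States that have appeared in the data so far (no Ohio sales yet).
initial_states = ["Iowa", "Indiana", "Michigan", "Texas"]

# The literal list you would type into an IN (...) expression.
midwest_names = ("Ohio", "Iowa", "Indiana", "Michigan")

# A Set can only be built from members that already exist in the data.
midwest_set = {s for s in initial_states if s in midwest_names}


def region_via_set(state: str) -> str:
    return "Midwest" if state in midwest_set else "Other"


def region_via_in(state: str) -> str:
    return "Midwest" if state in midwest_names else "Other"


# The first Ohio sale arrives later.
print(region_via_set("Ohio"))  # Other
print(region_via_in("Ohio"))   # Midwest
```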

I want to caveat these examples by saying that if you’re mapping states to region, stores to group, or whatever your situation may be, the optimal solution would be to get those values in your original data set in the back end. If that’s not possible, a mapping table would also be a more efficient solution than using IN (especially given Tableau’s new Relationships). I want to reiterate that my use cases using IN are just examples and may not offer the most performant results.

Thanks so much for reading! If you have any comments, suggestions, or feedback make sure to email me at [email protected]


As of version 2020.1.3, Tableau Prep can create analytic calculations and {FIXED} LOD calcs, something I’ve been hoping for ever since I first saw Prep back in 2018! They help us answer questions that previously needed table calcs, hidden rows, and tricks in Tableau Desktop, or that simply weren’t possible without some data prep before Tableau.

A caveat of building fields with these calcs in Tableau Prep is that they’re not dynamic (they won’t respond to filters, for example), so you may find yourself coming back to Prep to create new fields for each specific question. Nevertheless, these calcs make tackling the cases below much easier than before.

Being in Coronavirus lockdown has given me time to write up my favourite uses (not in order of importance) for these new calcs; this Prep flow has a worked example for each use case.

1. Adding a simple row identifier to your data:

More often than any of us would like, we come across (or get given) datasets that have no unique identifier, or each row’s uniqueness is a combination of different columns (and we’re left to work out which ones they are for ourselves). Worst of all is where there’s nothing to identify that each row is a distinct ‘thing’, save for the fact that it’s a different row!

To create a row id, click the Create Calculated Field… icon and use the calculation:

Note: There’s a catch to be aware of with the ROW_NUMBER() function, in that you have to order by a field (it’s not possible to retain the incoming row order). What this means is that as it stands, it’s not possible to create a calc that records the original sort order (yet).
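In Prep, a row-number calculation takes a form along the lines of `{ORDERBY [Order Date] ASC: ROW_NUMBER()}` (the field name here is an assumption). The Python sketch below shows the same idea on made-up data: the id is assigned after sorting by a field, so the incoming row order is not what gets numbered.

```python
from datetime import date

# Hypothetical rows, deliberately not in date order.
rows = [
    {"customer": "B", "order_date": date(2020, 3, 1)},
    {"customer": "A", "order_date": date(2020, 1, 15)},
    {"customer": "C", "order_date": date(2020, 2, 10)},
]

# Sort by the chosen field, then number the rows from 1.
ordered = sorted(rows, key=lambda r: r["order_date"])
with_ids = [{**r, "row_id": i} for i, r in enumerate(ordered, start=1)]

for r in with_ids:
    print(r["row_id"], r["customer"], r["order_date"])
```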

Now we can use our Row ID as a dimension in Tableau and display individual rows where we need to.

2. What did the customer buy in their 3rd order?

Ordering sequential events:

I worked with a customer running a subscription business who wanted to know all about customers’ third order, since their first two came under an introductory offer. Answering questions about a customer’s nth order can be fiddly using Desktop/Web Edit alone: it involves table calculations to get the nth order, hiding rows, and difficulties doing deeper analysis on those orders. The RANK() function can make this simpler by ranking a customer’s orders by date (or any other field for that matter). If we write our calculation

then we get an incrementing number for each order date – telling us the nth order for each customer:

Now if we want to analyse each customer’s third order (if they had one), we no longer need table calcs; we can just filter on our Nth order column.
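In Prep, a rank within each customer is written in a form like `{PARTITION [Customer ID]: {ORDERBY [Order Date] ASC: RANK()}}` (field names assumed). As a rough Python sketch of the same logic on made-up data, each order is ranked by date within its customer and then filtered to the third order:

```python
from collections import defaultdict
from datetime import date

# Hypothetical orders: (customer, order date, product).
orders = [
    ("alice", date(2020, 1, 5), "Starter box"),
    ("alice", date(2020, 2, 5), "Starter box"),
    ("alice", date(2020, 3, 5), "Family box"),
    ("bob", date(2020, 1, 20), "Starter box"),
    ("bob", date(2020, 2, 20), "Veggie box"),
]


def add_nth_order(rows):
    """Append each order's rank by date within its customer (ties share a rank)."""
    by_customer = defaultdict(list)
    for row in rows:
        by_customer[row[0]].append(row)

    ranked = []
    for group in by_customer.values():
        dates = [r[1] for r in group]
        for r in group:
            # Rank = 1 + number of this customer's orders with an earlier date.
            nth = 1 + sum(1 for d in dates if d < r[1])
            ranked.append((*r, nth))
    return ranked


ranked = add_nth_order(orders)
third_orders = [r for r in ranked if r[3] == 3]
print(third_orders)
```

Only alice has a third order here, so the filter returns a single row; bob simply drops out, with no table calc or hidden rows involved.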

3. What’s the average interval between a customer’s orders? Can we analyse those ‘high frequency’ customers?

Making comparisons across rows:

Segmenting your customers by the frequency at which they normally ‘do stuff’ (like placing a new order) means that you can track and treat them differently, and possibly intervene when they haven’t ordered for a while. Whilst calculating the time difference between orders is easy with a table calc, it’s tricky to go further and analyse those ‘frequent orderers’, since table calcs require you to have all the data in your view.

To calculate the time difference between orders, we need to look at the row below each one and compare the dates – effectively a lookup function. This capability isn’t immediately apparent since there isn’t a lookup function in Prep as such; however (and this is sometimes how I do it in SQL anyway) we can take our Nth Order field above and then join the data to itself like so (written in SQL style, nowhere in Prep will you have to type anything like this):

Order 1 from the left side will be paired with Order 2 from the right for each customer, so now the columns from your right-hand side represent the next order in sequence. The linked Prep flow will illustrate this better than I can articulate it!
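For those who prefer to read code, the self-join can be sketched in Python as follows. The join condition assumed here is "same customer, and the right-hand order's number is the left-hand order's number plus one", kept as a left join so that each customer's last order survives with an empty next order. Field names and data are made up for the example.

```python
from datetime import date

# Hypothetical orders that already carry an Nth order number per customer.
orders = [
    {"customer": "alice", "nth": 1, "order_date": date(2020, 1, 5)},
    {"customer": "alice", "nth": 2, "order_date": date(2020, 2, 4)},
    {"customer": "alice", "nth": 3, "order_date": date(2020, 2, 14)},
    {"customer": "bob", "nth": 1, "order_date": date(2020, 1, 20)},
]

# Index the "right side" of the join by (customer, nth).
right_side = {(o["customer"], o["nth"]): o for o in orders}

joined = []
for left in orders:
    # Pair order n on the left with order n + 1 on the right.
    nxt = right_side.get((left["customer"], left["nth"] + 1))
    joined.append(
        {
            **left,
            "next_order_date": nxt["order_date"] if nxt else None,
            "days_to_next": (nxt["order_date"] - left["order_date"]).days
            if nxt
            else None,
        }
    )

for row in joined:
    print(row["customer"], row["nth"], row["days_to_next"])
```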

Once we have the time interval, we can easily group our customers into buckets and begin our analysis. More on this below…

4. Pre-computing LOD calcs for performance (at the expense of flexibility)

Now that we’ve got our time interval, can we group our customers into buckets based on their average order frequency? What we want is the average order interval at a customer level.

LOD calcs in Desktop / Web Edit are perfect for this, but with lots of them on lots of data, you can start running into performance issues. You might choose to pre-compute your calculations in Tableau Desktop to alleviate this – however, since LOD calcs are dynamic based on your view, they can’t be pre-computed. The new LOD calc offers a way of pre-computing this into your data source and saving the performance hit of LODs.

This has actually already been possible in Prep using a combination of an aggregation followed by a self-join, but was a time-consuming workaround and ran into problems if you didn’t have a nice unique id for the thing you were aggregating.

Warning: given the caveat about dynamic-ness (dynamism? dynamicity?) above, I would only recommend this where performance is a significant issue and you have already decided what views you need, since your LOD calc will no longer take any filters into account. As such, I’d only recommend this as part of a performance fix on a finalised and relatively fixed dashboard view!

LOD calculation syntax is exactly the same in Prep as in Desktop/Web Edit:

Now that we have this, it’s easy to bucket up our customers in Tableau with some calculations and save re-computing the LOD.
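A FIXED LOD such as `{FIXED [Customer ID]: AVG([Days To Next])}` (field names assumed) computes one value per customer and repeats it on every row for that customer. The Python sketch below imitates that on made-up data and then buckets the customers; the 30-day threshold and bucket labels are purely hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical rows: one per order, with the interval to the next order.
rows = [
    {"customer": "alice", "days_to_next": 30},
    {"customer": "alice", "days_to_next": 10},
    {"customer": "alice", "days_to_next": None},
    {"customer": "bob", "days_to_next": 60},
    {"customer": "bob", "days_to_next": None},
]

# Collect the intervals per customer, skipping empty values as an average would.
intervals = defaultdict(list)
for r in rows:
    if r["days_to_next"] is not None:
        intervals[r["customer"]].append(r["days_to_next"])
avg_interval = {c: mean(v) for c, v in intervals.items()}


def bucket(avg_days):
    """Assign a frequency bucket; thresholds are illustrative assumptions."""
    if avg_days is None:
        return "Single order"
    return "High frequency" if avg_days <= 30 else "Low frequency"


# Write the customer-level value back onto every row, as a FIXED LOD would.
enriched = [{**r, "avg_interval": avg_interval.get(r["customer"])} for r in rows]
for r in enriched:
    r["frequency_bucket"] = bucket(r["avg_interval"])
    print(r["customer"], r["avg_interval"], r["frequency_bucket"])
```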

5. What percentile does each order sit in? Can I filter out outliers?

Calculating percentiles:

Just as in Desktop/Web Edit, the RANK_PERCENTILE() calc calculates what percentile each row is based on a measure, within a certain set of rows – for example, which orders are the top 5% in terms of total profit made in a given year?

Like the examples above, this is something that can be done with a table calc but it’s difficult to analyse further. Happily, Prep makes this easier and the syntax is just like our RANK() above:

Note that you can’t do calculations inline – and if you try you’ll get this error message:

Now we have the percentile that is fixed in our data, so it’s easy to filter and aggregate without disrupting the percentile calculation.
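To make the idea concrete, here is a small Python sketch of a percentile rank. It assumes the common definition (rank - 1) / (number of rows - 1) in ascending order, and it uses distinct values so that tie handling does not matter; Prep's exact treatment of ties and partitions may differ. The profit figures and the 95% cut-off are made up for the example.

```python
def percent_rank(values):
    """Percentile rank of each value: share of the other rows that are strictly smaller."""
    n = len(values)
    if n < 2:
        return [0.0 for _ in values]
    return [sum(1 for other in values if other < v) / (n - 1) for v in values]


# Hypothetical order profits for a single year.
profits = [120.0, -40.0, 900.0, 60.0, 15.0]
pct = percent_rank(profits)

# Because the percentile is stored per row, filtering does not change it.
top_orders = [p for p, r in zip(profits, pct) if r >= 0.95]

print(pct)         # [0.75, 0.0, 1.0, 0.5, 0.25]
print(top_orders)  # [900.0]
```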

For the moment, these are the top use cases that have come to my mind but I’m sure there are more out there – please let me know of any in the comments below or tweet me @honytoad !