During my Corporate Tableau Training in Gurgaon , i get questions many time regarding Blending in Tableau .
Tableau comes with the feature of “Cross Database Joins” that allows you to create Joins from Data fetched from different data source. We have already covered “Cross Database Joins in tableau 10.0” in our earlier post in this blog. If you have not gone through it I would suggest you to visit the post once before you go through this so as to make complete sense.
For Corporate training and Online Training contact at info@instrovate.com
The direct link to the post is
https://technicaljockey.com/tableau-training-consultant/cross-database-join-in-tableau-10-0/
Once you go through cross database joins in Tableau , the obvious question that pops up in our mind is, are we done with Blend. Would we not be requiring Blend any further? Or If we still need blend what would be the scenarios in which we would still need to create Blend.
So we will go step by step and cover both Blend and Joins with a Hands On Example so that it all makes sense.
First of all, what is Blending? The definition from tableau guide “Data blending is a method for combining data that supplements a table of data from one data source with columns of data from a second data source.”
In Tableau we generally prefer Joins for this kind of Data combining, however for the Below two Scenarios we will still have to use “Blending”
-
Data is at Different Level of Detail say for e.g. Actual Sales Data that we receive are on City Level while the Planned Sales Data is at State level. In such scenarios creating join would cause Duplicate data and since Blending works after aggregation, Blending would be required.
-
If we want to combine Data from Different Data Source that is still not supported by cross Database Joins.
To have a hands on of the first scenario i.e. Data is at Different Level of Detail , we will go through the example of Sample Super Store Actual Sales are at City Level / Postal Code Level , while the Planned Sales that we get is say at state level .
For this example, we already have the Sample Super Store Data where we have the Actual Sales on Postal Code Level. We will quickly create a Planned Sales Data at State Level so that we can create a Blend between the two data set and analyze the result.
Creating Planned Sales Data
Connect Tableau to Sample Super Store , and create a visualization of State and Actual Sales.
[ akriti.lal@TechnicalJockey.com ]
Export this data into excel sheet, and just to create some planned sales data, we will generate a random number using the rand () in excel, add 1 to it, and then multiply this to actual sales to generate the Planned Sales.
There after we can copy and paste only the value of planned sales in one column and remove the other columns to have the state and planned sales.
State |
Planned Sales |
Alabama |
21507 |
Arizona |
36160 |
Arkansas |
13052 |
California |
852693 |
Colorado |
35660 |
Connecticut |
24736 |
Delaware |
28576 |
District of Columbia |
5506 |
Florida |
148570 |
Georgia |
77233 |
Idaho |
8214 |
Illinois |
86360 |
Indiana |
88142 |
Iowa |
6561 |
Kansas |
4740 |
Kentucky |
48421 |
Louisiana |
11960 |
Maine |
2490 |
Maryland |
24795 |
Massachusetts |
52479 |
Michigan |
94419 |
Minnesota |
52313 |
Mississippi |
13112 |
Missouri |
29685 |
Montana |
9800 |
Nebraska |
10413 |
Nevada |
24066 |
New Hampshire |
12941 |
New Jersey |
67531 |
New Mexico |
6777 |
New York |
517621 |
North Carolina |
108422 |
North Dakota |
1301 |
Ohio |
96662 |
Oklahoma |
20812 |
Oregon |
19966 |
Pennsylvania |
143148 |
Rhode Island |
39893 |
South Carolina |
12271 |
South Dakota |
2597 |
Tennessee |
37711 |
Texas |
295102 |
Utah |
12713 |
Vermont |
15720 |
Virginia |
119945 |
Washington |
247698 |
West Virginia |
1881 |
Wisconsin |
54708 |
Wyoming |
2275 |
[ akriti.lal@TechnicalJockey.com ]
What we have done here has nothing to do with Cross Data base joins or Blend, we are just preparing the Planned Data set to simulate the real world scenario where in we can have the Planning done at State Level, while the Actual Sales that we would get would be at Postal Code level.
Now Our Boss wants us to do the analytics and show the Planned Sales for each State and the Actual Sales next to it, so as to compare how we did compare to our planning.Now if we join our tableau to the new “Planned Sales Data” created using Add button , joins get created
There after when we visualize the Planned Sales for each state
We can see that we are not getting the correct Data , like for e.g. if you see for Alabama the Planned Sales is coming as 1,311,902 , while the actual planning (can be seen in the planned sales excel) is only 21,507. This means some duplication is happening. To find the root cause , if you drill down to the Data detail for Alabama , you can see the below
[ info@instrovate.com ]
Actually what has happened since we have created join, the planned sales 21,506.59 has been duplicated for 61 times ( as many city as state Alabama has ) and so we are getting as 1,311,902 .
So In order to do our analysis correctly what we want to do is Aggregate the “Actual Sales” data at State Level, when we Blend it with Planned Sales which is at “State Level”. So this is an example of Data at different level of details and why we would need to use Blending rather than joining. In order to proceed further go to Data Tab and remove the “Planned Sales Per State” Join Condition.
Blending to Visualize the Actual Sales and Planned Sales at State level
[ info@instrovate.com ]
In order to create Blend between Actual Sales and Planned Sales , follow the step below . Add the new Data source as below
You will be able to connect to Planned Sales excel as below
After creating this new connection, when you go to the sheet, you can see the new Data source added for creating Blend
[ info@instrovate.com ]
Now we need to define the “Edit Relationship” in order to create the Blend
This opens the “Edit relationship” window that and here we can define the blend condition which would be “State” in our case
Now before visualizing the planend sales we need to validate if the Orange Link is enabled in the secondary sheet ( Planned Sales)
There after you can drag Planned Sales as below
So we can see, we get the right Planned Sales and Actual Sales now as Blend works after aggregation. So basically when Actual Sales was blended with Planned Sales, the Planned Sales for the entire state did not get duplicated rather was calculated at the state level of aggregation.