Tableau Integration with Python
During my Corporate Tableau Training in Gurgaon , i get questions many time regarding Python Integration with Tableau . This course is intended to show how to leverage Python to extend Tableau capabilities and visualize outputs from Python.
This is not a Tutorial on Python.
What is Python
-
Python is a widely used general-purpose programming language, popular among academia and industry alike.
-
It provides a wide variety of statistical and machine learning techniques, and is highly extensible.
-
Together, Python and Tableau is the data science dream team to cover any organization’s data analysis needs.
-
In 2013 Tableau introduced the R Integration [since Tableau 8.1], the ability to call R scripts in calculated fields using R Server [ Rserve()].
-
With the release of Tableau 10.1, you can use Python scripts as part of your calculated fields in Tableau, just as you’ve been able to do with R
-
The Python Integration happens through the Tableau Python Server – [ TabPy ] .
-
Tableau works with both the version of Python i.e. 2.7 and 3.x. Python 3.x support has been recently added.
You can install Tableau Python Server by either of the below methods.
TabPy Installation Method 1: (using pip install)
If you are familiar with Python environments and have already set one up or prefer not to use Anaconda and just want to start the server process, you can skip the setup script, install the dependencies and run the process directly from the command line.
The manual installation instructions assume either Conda or Python are defined as part of PATH. It is optional but recommended you create a new Conda environment for this project:
conda create –name Tableau-Python-Server python=2.7 anaconda
The example above creates a Python 2.7 environment but Tableau Python Server is supported on both Python 2.7+ and Python 3.5+.
Now activate the newly created environment. On Windows:
activate Tableau-Python-Server
You can install TabPy by simply running the following command from cmd.
pip install tabpy-server
As the packages are installed, you will see the install locations listed in the command line. These might look like /Users/username/anaconda/envs/Tableau-Python-Server/lib/python2.7/site-packages or \Users\username\anaconda\envs\Tableau-Python-Server\lib\site-packages depending on your environment.
cd c:\Users\User Name\Anaconda\envs\Tableau-Python-Server\Lib\site-packages\tabpy_server
And type → startup and press enter
Navigate to the tabpy_server folder under site-packages and run startup.bat or startup.sh on Windows and Linux/MacOS respectively. You can specify a custom port number as an argument e.g. startup.bat 9001.
You can stop the server simply by terminating the process (for example, with CTRL-C) or Closing the window.
If you are not able to install TabPy from pip install , you can do the installation as below
TabPy Installation Method 2: From Tableau GitHub Site
-
You can Download TabPy from the Tableau’s Git Hub account : https://github.com/tableau/TabPy
-
Click on the Clone or download button in the upper right corner ( see below ) of the TabPy repository page, downloading the zip file and extracting it.
Extract TabPy-master.zip
-
Within the TabPy-master directory, execute setup.bat .
The script carries out the following steps:
-
Downloads and installs Anaconda, under the current user account for example C:\users\yourUserName\ unless Anaconda is in the PATH or has a registry entry. Anaconda provides an exhaustive set of Python packages, including ML-related libraries that can be used by Python code executed by the server.
-
Creates a Python environment named Tableau-Python-Server if one doesn’t already exist. The script then activates the environment.
-
Installs the required Python packages into the new environment, as well as the client package, as it contains common functionality that the server depends on.
-
Initializes the server, displays the install location and instructions on how to start the server next time.
-
This script downloads and installs Python, TabPy and all necessary dependencies.
-
After completion, TabPy starts up and listens on port 9004 or the custom port number that you specified while installing TabPy server
-
When you get the below message it means the python server is successfully installed and running fine.
-
Also Note the path for starting python server next time or else you will keep doing the same process again and again
-
So only the first time the python server takes time to install. From next time you need to go to your location as above and do startup and this gets started instantaneously.
-
So for next time to start python tableau server (tabpy) ,go to the below path
cd c:\Users\User Name\Anaconda\envs\Tableau-Python-Server\Lib\site-packages\tabpy_server
And type → startup and press enter
Setting Up Tableau Desktop with Python
Configure a TabPy Connection on Tableau
On the Help menu in Tableau Desktop choose Settings and Performance > Manage External Service Connection to open the TabPy connection dialog box.
Enter or select a server name using a domain or an IP address.
The drop-down list includes localhost and the server you most recently connected to.
Specify a port. Port 9004 is the default port for TabPy servers.
Click Test Connection.
Click OK.
Pass Expressions to Python
-
In order to let tableau know that the calculations need to go to Python, it must be passed through one of the 4 functions.
-
These 4 functions are : SCRIPT_BOOL , SCRIPT_INT , SCRIPT_REAL , SCRIPT_STR
-
Python Functions are computed as Table calculations in Tableau.
-
Since these are table calculations, all the Fields being passed to Python must be aggregated like Sum(PROFIT), MIN(Profit), Max (Profit), ATTR( Category) etc.
Python Functions in Tableau
Run a Python script on Tableau
SCRIPT_BOOL
Returns a Boolean result from the specified expression. The expression is passed directly to a running external service instance. In Python expressions, use _argn (with a leading underscore ) to reference parameters ( _arg1, _arg2, etc.).
-
In this python example, _arg1 is equal to SUM([Profit])
-
All the Fields being passed to python must be aggregated like Sum(PROFIT), MIN(Profit), Max (Profit), ATTR( Category) etc..
SCRIPT_BOOL : Finding Profit Greater Zero by python
PythonBoolPositive : Python Calculated Field Function Code
SCRIPT_BOOL(“
lst= []
for i in _arg1 :
lst.append(i>0)
return lst
“,
SUM([Profit])
)
Another Simpler Approach :
SCRIPT_BOOL(“return [i > 0 for i in _arg1]”,SUM([Sales]))
SCRIPT_INT – Example → Multiply Sales with 2 from Python
PythonIntegerMultiplyBy2 : Python Calculated Field Function Code
SCRIPT_INT(“
lst= []
for i in _arg1 :
lst.append(i*2)
return lst
“,
SUM([Sales])
)
Another Simpler Approach
SCRIPT_REAL(“return [i * 2 for i in _arg1]”,SUM([Sales]))
SCRIPT_REAL – Example – Finding log of sales by Python
PythonRealLog : Python Calculated Field Function Code
SCRIPT_REAL(“
import math
lst = []
for i in _arg1:
if math.isnan(i):
lst.append(0)
else :
lst.append(math.log(i))
return lst”,
SUM([Sales])
)
Another Simpler Approach
SCRIPT_REAL(“import math
return map(math.log,_arg1)”,SUM([Sales]))
SCRIPT_STR – Concatenate Two Strings using Python
PythonStringConcatenate : Python Calculated Field Function Code
SCRIPT_STR(“
lst= []
for i in range(0,len(_arg1)) :
lst.append(_arg1[i]+_arg2[i])
return lst
“,
ATTR([Category]),ATTR([Sub-Category])
)
Finding correlation coefficient of Sales & Profit by Python
PythonCorrCoeff: Python Calculated Field Function Code
SCRIPT_REAL(“
import numpy as np
return np.corrcoef(_arg1,_arg2)[0,1]”,
SUM([Sales]),sum([Profit])
)