AdventureWorksDW – Python Series#1 – Computing & Plotting Yearly Sales for a Territory
Prerequisites to follow this exercise:
- Microsoft SQL Server Express Edition and the AdventureWorks data warehouse: if you do not have SQL Server Express and want to install it on your system together with AdventureWorksDW, follow https://instrovate.com/2019/05/22/download-install-free-microsoft-sql-server-install-adventureworks-database-data-warehouse/
- Python installed on your system: if you are new to Python and want to know how to install it via the Anaconda distribution, you can go through the step-by-step blog post I have written on installing Python via Anaconda and getting started with Jupyter Notebook: https://instrovate.com/2019/06/09/python-anaconda-distribution-how-to-download-and-install-it-and-run-the-first-python-program/
Once SQL Server Express Edition and Python are installed on your system, you are ready to follow the use case and example below.
Computing & Plotting Yearly Sales for a Territory in AdventureWorksDW – Python
- SQL Server 2017 is used as the back end that stores AdventureWorksDW.
- We will connect Python to SQL Server to fetch the required data.
- We will use Python's pyodbc library to make that connection.
- The territory for which we will compute sales is United Kingdom.
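The connection step from the list above can be sketched as follows. This is a minimal sketch under assumptions, not the exact code of this post: the driver name ODBC Driver 17 for SQL Server, the use of Windows authentication (Trusted_Connection), and the helper names build_connection_string and open_connection are all illustrative, so adjust them to your own installation.

```python
def build_connection_string(server, database,
                            driver="ODBC Driver 17 for SQL Server"):
    """Return an ODBC connection string that uses Windows authentication.

    The default driver name is an assumption; use whichever SQL Server
    ODBC driver is installed on your machine.
    """
    return (
        f"DRIVER={{{driver}}};"
        f"SERVER={server};"
        f"DATABASE={database};"
        "Trusted_Connection=yes;"
    )


def open_connection(server, database):
    """Open a pyodbc connection to the given server and database.

    pyodbc is imported here, not at module level, so the string builder
    above can be used even where pyodbc is not installed.
    """
    import pyodbc  # third-party: pip install pyodbc
    return pyodbc.connect(build_connection_string(server, database))
```

For example, open_connection("localhost\\SQLEXPRESS", "AdventureWorksDW") would connect to a local Express instance, assuming that is how your instance and database are named.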
Order transactions are stored in the fact table FactInternetSales in the AdventureWorksDW database.
The SalesAmount column in FactInternetSales holds the sales amount of each order.
The dimension table DimSalesTerritory holds the details of the territories from which orders are received.
The SalesTerritoryKey column is the joining column between the fact table FactInternetSales and the dimension table DimSalesTerritory. The SalesTerritoryCountry column holds the country name. For this problem we take United Kingdom as the territory for the yearly sales analysis.
The final output, the yearly sales of the United Kingdom territory, therefore draws on data from two tables: FactInternetSales and DimSalesTerritory.
The SQL query that fetches these details joins the two tables on SalesTerritoryKey and aggregates the sales by year.
YEAR is SQL Server's built-in function that extracts the year from a date field. The query sums the SalesAmount column of FactInternetSales, so every other column in the SELECT clause must also appear in the GROUP BY clause.
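The query described above can be sketched as a Python string, ready to be passed to pyodbc. It is an illustration under assumptions: the order date is read from the OrderDate column of FactInternetSales, only three columns are selected, and the names YEARLY_SALES_QUERY and TERRITORY are hypothetical. The ? placeholder is pyodbc's parameter marker, so the territory name is supplied at execution time instead of being pasted into the SQL text.

```python
# Assumption: the year is derived from FactInternetSales.OrderDate.
# Every non-aggregated column in SELECT is repeated in GROUP BY,
# which is what SQL Server requires when SUM() is used.
YEARLY_SALES_QUERY = """
SELECT t.SalesTerritoryCountry,
       YEAR(f.OrderDate)  AS OrderYear,
       SUM(f.SalesAmount) AS TotalSales
FROM FactInternetSales AS f
JOIN DimSalesTerritory AS t
  ON f.SalesTerritoryKey = t.SalesTerritoryKey
WHERE t.SalesTerritoryCountry = ?
GROUP BY t.SalesTerritoryCountry, YEAR(f.OrderDate)
ORDER BY OrderYear;
"""

# The territory analysed in this post.
TERRITORY = "United Kingdom"
```

With this column order, each result row carries the country at index 0, the year at index 1 and the total sales at index 2.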
To learn how to connect Python to SQL Server, you can refer to the Instrovate Technologies blog post below:
Below is the Python code to solve the problem:
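As a hedged sketch of the fetch step, the function below runs a query on an already open connection and returns every row. The helper name fetch_rows is hypothetical, and the function relies only on the standard cursor(), execute() and fetchall() calls, which pyodbc connections provide.

```python
def fetch_rows(connection, query, params=()):
    """Run `query` on an open database connection and return all rows.

    `params` holds the values for any ? placeholders in the query.
    """
    cursor = connection.cursor()
    try:
        if params:
            cursor.execute(query, params)
        else:
            cursor.execute(query)
        return cursor.fetchall()
    finally:
        # Release the cursor even if the query fails.
        cursor.close()
```

Keeping the fetch in its own function means the connection details and the SQL text can change without touching the plotting code.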
Code key points:
- After the query has been executed in Python, two Python lists, years and total_sales, are initialized.
- While iterating through the rows, the list years is populated with the second-indexed value of each row and total_sales with the fourth-indexed value.
- Please note that the index count starts from zero.
- Python's matplotlib library is used to plot total sales against year.
- The plot function from matplotlib.pyplot is used to plot two-dimensional data. For more details, the blog post below can be referred to:
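The key points above can be sketched as two small functions. This is a sketch under assumptions, not the original listing: the names split_rows and plot_yearly_sales are hypothetical, and the row positions are parameters because they depend on the column order of the query (the defaults assume a three-column result with the year at index 1 and the total at index 2).

```python
def split_rows(rows, year_index=1, sales_index=2):
    """Build the parallel lists `years` and `total_sales` from result rows.

    Indexes start from zero; set them to match the column order
    of the query that produced `rows`.
    """
    years = []
    total_sales = []
    for row in rows:
        years.append(row[year_index])
        # SQL Server money values arrive as Decimal; float is enough for plotting.
        total_sales.append(float(row[sales_index]))
    return years, total_sales


def plot_yearly_sales(years, total_sales, territory):
    """Plot total sales against year as a simple line chart."""
    import matplotlib.pyplot as plt  # third-party: pip install matplotlib
    plt.plot(years, total_sales, marker="o")
    plt.xticks(years)  # one tick per year instead of fractional years
    plt.xlabel("Year")
    plt.ylabel("Total sales")
    plt.title(f"Yearly sales: {territory}")
    plt.show()
```

Given rows fetched from SQL Server, calling years, total_sales = split_rows(rows) followed by plot_yearly_sales(years, total_sales, "United Kingdom") would display the chart.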
After executing the above Python program, the following output is displayed:
AdventureWorks DW Series 4 : How to Create Histogram Visualization Using Python – https://instrovate.com/2019/05/20/adventureworks-dw-series-4-how-to-create-histogram-visualization-using-python/
AdventureWorks DW Series 5 : Box Plot to identify Outliers and Targeted Customers in Python – https://instrovate.com/2019/05/28/adventureworks-dw-series-5-box-plot-to-identify-outliers-and-targeted-cutomers-in-python/