Fully Automated Tableau Dashboard Using Python, SQL, and Excel

This project automates file ingestion and the updating of a Tableau dashboard using a Python script, an SQL server, and Excel files. In this scenario it is June 2018, and the company has just received the report for May. The script is written so that it reads every file in a folder, which means it can be modified to process data at any granularity, not just monthly reports.

The dataset is a summary report of a US superstore in 2017-2018, totalling 2,930 rows and 26 columns. The dashboard is made in Tableau, which is connected to a PostgreSQL database, while the file reader is a Python script. The script reads a .csv file and automatically writes it into the SQL server, and Tableau then picks up the changes automatically. In practice it is relatively straightforward to read any file, such as a .pdf or .txt file, as long as it is in a tabular-friendly form, or even to update the SQL database directly, depending on the data source.
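Because pandas handles the parsing, switching formats mostly comes down to changing the read call. A minimal sketch (the filenames and the tab separator here are assumptions, for illustration only):

import pandas as pd

# a tab-separated .txt report fits the same DataFrame workflow
df = pd.read_csv("report.txt", sep="\t")

# fixed-width text reports can also be read directly
df = pd.read_fwf("report_fixed_width.txt")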

The Python Script

The script utilises pandas and SQLAlchemy. It uses glob to find every .csv file in the same folder as the script, reads each one into a DataFrame, and appends it to the SQL database. Because it works with DataFrame objects, pandas can easily read any tabular file, such as .txt files. After a file is read, it is moved to a separate folder so that only new files are processed. The script can also be modified to execute automatically as soon as a new file arrives in the folder (see the watcher sketch after the script below).

A drawback is that new files must match the column format of the previous ones, or the script will fail, because the columns must match those in the SQL database. The script can be modified to account for changed column names by renaming the DataFrame columns to the desired format before appending to the SQL database, though if the number of columns varies this becomes more complicated. In essence, consistency between the files is important for the script to work properly.
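For example, a rename step could map incoming headers onto the database schema before the append. A minimal sketch (the filename and column names here are hypothetical, for illustration only):

import pandas as pd

# hypothetical incoming report whose headers differ from the database schema
df = pd.read_csv("new_report.csv")

# hypothetical mapping from the new headers to the table's column names
column_map = {
    "Order Date": "order_date",
    "Sales ($)": "sales",
}
df = df.rename(columns=column_map)

# keep only the expected columns, in a fixed order, before appending
df = df[["order_date", "sales"]]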

import os
import glob
import shutil

import pandas as pd
from sqlalchemy import create_engine  # psycopg2 is used as the driver

# connect to the local PostgreSQL database
engine = create_engine("postgresql://localhost/superstore")

path = os.getcwd()
csv_files = glob.glob(os.path.join(path, "*.csv"))

# folder the files are moved to once they have been read
destination = '/Users/39mac/Documents/Projects/Project FORTRAN/Project Tukey/pipeline/file reader/done'

for f in csv_files:
    # read the csv file into a DataFrame
    df = pd.read_csv(f)

    # append the rows to the 'data' table before moving the file,
    # so every file (not just the last one) reaches the database
    df.to_sql('data', engine, if_exists='append', index=False)

    # move the file to the done folder so it is not read again
    shutil.move(f, destination)
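As mentioned above, the script can also be triggered automatically whenever a new file arrives. A minimal sketch using the third-party watchdog package (an assumption; a cron job or any other scheduler would also work), reusing the same engine and destination as the script above:

import time
import shutil

import pandas as pd
from sqlalchemy import create_engine
from watchdog.observers import Observer
from watchdog.events import FileSystemEventHandler

engine = create_engine("postgresql://localhost/superstore")
destination = '/Users/39mac/Documents/Projects/Project FORTRAN/Project Tukey/pipeline/file reader/done'

class CsvHandler(FileSystemEventHandler):
    def on_created(self, event):
        # react only to newly created .csv files, not directories
        if event.is_directory or not event.src_path.endswith(".csv"):
            return
        # note: in production, wait until the file is fully written
        df = pd.read_csv(event.src_path)
        df.to_sql('data', engine, if_exists='append', index=False)
        shutil.move(event.src_path, destination)

observer = Observer()
observer.schedule(CsvHandler(), path=".", recursive=False)
observer.start()
try:
    while True:
        time.sleep(1)
except KeyboardInterrupt:
    observer.stop()
observer.join()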

The Tableau Dashboard

The dashboard is made in Tableau, which is connected to a localhost PostgreSQL server. Tableau has a great built-in connector for SQL servers, and because SQL and Python also work well together, a Tableau dashboard is one of the best options for automatically visualising data. Here, after the database has been updated, the data in Tableau must be refreshed manually, although this process can be automated, say every 5 minutes, for more granular data.

The dashboard is highly interactive. We can see how monthly sales compare to last year, filter by department, consumer segment, or state, and more. The dashboard can be adapted to any desired metrics, even with a different dataset, for visualising trends and comparisons.

Comparing sales between May 2018 and May 2017. Sales in the former are up 46.9%.

Data for office supplies in Texas for the home office consumer segment.
