Fully Automated Tableau Dashboard Using Python, SQL, and Excel
This project automates loading report files into a database and updating a Tableau dashboard using a Python script, an SQL server, and Excel files. In this scenario the dashboard currently holds data through May 2018, and the report for June has just arrived. The script is written so that it reads every file in a folder, so it can be modified to process data at any granularity, not just monthly reports.
The dataset is a summary report of a US superstore in 2017-2018. In total there are 2930 rows and 26 columns. The dashboard is built in Tableau, which is connected to a PostgreSQL database, while the file reader is a Python script. The script reads a .csv file exported from Excel and automatically writes it into the SQL server, and Tableau then picks up the changes automatically. In practice it is relatively straightforward to read other file types, such as PDF or .txt files, as long as they are in a tabular-friendly form, or even to update the SQL database directly, depending on the data source.
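For example, a minimal sketch of reading a tab-delimited .txt report with pandas (the file name and separator here are hypothetical):

import pandas as pd

# a tab-delimited text report reads the same way as a .csv file
df = pd.read_csv("report.txt", sep="\t")

# a PDF table would need an extraction library such as tabula-py instead, e.g.
# df = tabula.read_pdf("report.pdf", pages="all")[0]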
The Python Script
The script utilises pandas and SQLAlchemy. It reads every .csv file in its own folder using glob and creates a DataFrame, which is then appended to the SQL database. Because it works with a DataFrame object, pandas can easily read any tabular file, such as .txt files. After a file has been read, it is moved to a separate folder so that only new files are picked up. The script can also be modified to run automatically as soon as a new file arrives in the folder, as sketched below.
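A minimal sketch of that modification using the watchdog library (the CsvHandler class and process_file function are illustrative names, not part of the original script):

import time

from watchdog.observers import Observer
from watchdog.events import FileSystemEventHandler

class CsvHandler(FileSystemEventHandler):
    def on_created(self, event):
        # only react to newly created .csv files
        if not event.is_directory and event.src_path.endswith(".csv"):
            process_file(event.src_path)  # the read-and-append logic shown below

def process_file(path):
    ...  # read the csv, append it to the database, move the file

observer = Observer()
observer.schedule(CsvHandler(), path=".", recursive=False)
observer.start()
try:
    while True:
        time.sleep(1)
except KeyboardInterrupt:
    observer.stop()
observer.join()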
A drawback is that new files must match the column format of the previous ones, or the script will fail, because the columns must match those in the SQL database. It is possible to modify the script to account for changes by renaming the DataFrame columns into the desired format before appending to the SQL database, as in the sketch below, although if the number of columns varies this process becomes more complicated. In essence, consistency between the files is important to ensure the script works properly.
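A rough sketch of that renaming step, assuming the incoming column names are known in advance (the file name and mapping below are hypothetical):

import pandas as pd

# hypothetical incoming report whose headers differ from the SQL table
df = pd.read_csv("new_report.csv")

# map the incoming column names onto the names used in the SQL table
column_map = {"Order Date": "order_date", "Sales ($)": "sales"}
df = df.rename(columns=column_map)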
import os
import glob
import shutil

import pandas as pd
from sqlalchemy import create_engine

# connect to the local PostgreSQL database (psycopg2 is the driver)
engine = create_engine("postgresql://localhost/superstore")

# collect every .csv file in the same folder as the script
path = os.getcwd()
csv_files = glob.glob(os.path.join(path, "*.csv"))

# folder destination when done reading
destination = '/Users/39mac/Documents/Projects/Project FORTRAN/Project Tukey/pipeline/file reader/done'

for f in csv_files:
    # read the csv file
    df = pd.read_csv(f)

    # append its rows to the existing table in the SQL database
    df.to_sql('data', engine, if_exists='append', index=False)

    # move the file to the done folder so it is not read twice
    shutil.move(f, destination)
The Tableau Dashboard
The dashboard is built in Tableau, which is connected to a PostgreSQL server on localhost. Tableau has a great built-in connector for SQL servers, and because SQL and Python also work well together, a Tableau dashboard is one of the best options for automatically visualising data. Here, after the SQL database has been updated, we must refresh the data in Tableau manually, although this process can be automated, say every 5 minutes, if the data is more granular.
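As a rough sketch, assuming the data source is published as an extract to Tableau Server and tabcmd is installed and already logged in (the data source name here is hypothetical), the refresh could be scheduled from Python:

import subprocess
import time

while True:
    # ask Tableau Server to refresh the published extract
    subprocess.run(["tabcmd", "refreshextracts", "--datasource", "superstore"])
    time.sleep(300)  # wait 5 minutes between refreshes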
The dashboard is highly interactive. We can see how monthly sales compare to last year, filter by department, consumer segment, or state, and more. It can be adjusted to show any desired metrics, even with a different dataset, to visualise trends and comparisons.