How to query on .csv and .xlsx (excel) files.

Joy Maitra
2 min readOct 28, 2020

We who work with the data and require to run analysis on it, often come across this problem. The data from different source systems are made available in excel or csv formats. after a lot of research for solution that will let me write SQL queries on the data, I finally framed a solution that suits me hopefully might help you too. The solution is not straight forward. Will take a little time to do the initial setup. But there after it is a smooth ride. The concept is to create a python function that will open a .csv or .xlsx file and put the content into a table in a file with extension .db which will act as a sqlite3 database. Next install the open source DB tool DBeaver and connect to the database, and you are done. Now we are ready to run SQL query on the data of the files, join the data from 2 different files and analyse as much required. Below is the flow diagram for the process.

Flow diagram

The code is in the github link provided below available for download :

Download and unzip the code in any folder in your desktop. and follow the steps to setup. Only prerequisite is to have python installed in your system.

Step1: open the command prompt in the location of the unzipped files.

Step 2: create a virtual python environment using the following command : >>python -m venv test

Step 3: activate the virtual environment : >>.\test\Scripts\activate

Step 4: execute the main script >>python main.py

Step 5: the script on execution will ask for the full path of the file and the target table name

That’s it !!!

Now open DBeaver and configure the SQLite DB with the file in the Database folder of the unzipped folder.

Hope this helps !!

--

--

Joy Maitra

I am a Data Practitioner, with experience in python.