STEM with Python – Part 11: Reading an Excel File

Use Python to interact with Excel.

I know, I know – Excel is a Microsoft-exclusive software and you can never use it in Linux. That’s probably how you might have believed… until now. All thanks to super geeks in the open-source community, here are some ways you can actually interact with Excel by utilizing shellscript and Python!

In this post, we’ll learn how we can actually read Excel files with Python and shellscript in Linux. I know Excel is a Windows app and never has been officially supported on any Linux distros. But there’s always a way to interact with the Windows-exclusive app from Linux all thanks to the open-source platform’s rich resources and Python’s flexibility.

The code (shellscript):

First, let’s take a look at how we can implement the idea into shellscript.

read_excel.sh

#!/bin/bash

# Check if the required command line tools are installed
if ! command -v xlsx2csv &> /dev/null
then
    echo "Please install the xlsx2csv command first. You can install it using the command 'apt install xlsx2csv' or 'yum install xlsx2csv'"
    exit 1
fi

if ! command -v column &> /dev/null
then
    echo "Please install the column command first. You can install it using the command 'apt install bsdmainutils' or 'yum install bsdmainutils'"
    exit 1
fi

if [ $# -ne 1 ]; then
    echo "Please specify the excel file as an argument"
    exit 1
fi

# Check if the specified file exists
if [ ! -f "$1" ]; then
    echo "The specified file does not exist"
    exit 1
fi

# Convert the Excel file to CSV format and display it in the console
xlsx2csv "$1" | column -s , -t | less -#2 -N -S

Interesting point:

As you can see, what is most interesting in the code is the special and Linux-only command xlsx2csv. As the name suggests, the command will convert an excel file into a CSV format and display it in the console. Cool, isn’t it

The Excel file:

The executed result:

And this is the executed result. It successfully displayed the file content to the console. This is something I’ve never seen before… Interesting!!

$./read_excel.sh sample.xlsx
      1 Apple  Orange  Water melon
      2 1      13      25
      3 2      14      26
      4 3      15      27
      5 4      16      28
      6 5      17      29
      7 6      18      30
      8 7      19      31
      9 8      20      32
     10 9      21      33
     11 10     22      34
     12 11     23      35
     13 12     24      36
     14 13     25      37
(END)

The code (Python):

Now, we’re going to take a look at how we do it in Python, and the both executed result and the approach to interact with the file are quite different from the shellscript.

import pandas as pd
import sys

if len(sys.argv) < 2:
    print("Please specify the excel file as an argument")
    sys.exit(1)

try:
    df = pd.read_excel(sys.argv[1], engine='openpyxl')
    print(df)
except FileNotFoundError:
    print(f"Error: The file {sys.argv[1]} could not be found.")
    sys.exit(1)
except Exception as e:
    print(f"An error occurred while reading the file: {e}")
    sys.exit(1)

As you can see, we take advantage of Python’s powerful libraries – here we use pandas and openpyxl. As we’ve seen pandas in STEM with Python – Part 7: Pandas and Pandasql, I assume you know what it is. What needs your attention is the other one – openpyxl. Openpyxl is a library responsible for interacting with Excel and other open-source spreadsheets, such as LibreOffice.

As it specifies the library in the following line as its engine to interact with the file, it is a part of pandas – pandas are basically SQL of Python (correct me if I’m wrong XD).

df = pd.read_excel(sys.argv[1], engine='openpyxl')

The executed result:

While shellscript converted the Excel file into CSV format, Python directly interacts with the file all thanks to the Pandas’ power and outputs the scanned result on the console.

$python3 read_excel.py sample.xlsx
    Apple  Orange  Water melon
0       1      13           25
1       2      14           26
2       3      15           27
3       4      16           28
4       5      17           29
5       6      18           30
6       7      19           31
7       8      20           32
8       9      21           33
9      10      22           34
10     11      23           35
11     12      24           36
12     13      25           37

Afterthoughts:

I have a personal history with openpyxl. When I was looking for a new job as a programmer back in 2019, I was coding and experimenting with my ideas in Python. Since I already mastered VBA at that time, what I did was re-programmed my VBA projects in Python/openpyxl.

And eventually, my programming activities somewhat attracted my current employer, and got this job. And personally, I love spreadsheets, both Excel and LibreOffice Calc, and love the interaction with lots of data. So, it’s always enjoyable to play with spreadsheets, numbers, and data!

Kee coding and be happy!

Leave a Reply