Plot graphs in Python


Last Updated on July 14, 2022 by Jay

Excel makes plotting a graph very easy. So does Python! Today we’ll take a quick look at how to plot graphs in Python.

This tutorial is part of the “Integrate Python with Excel” series; you can find the table of contents here for easier navigation.

Excel makes pretty graphs, why bother using Python?

We are in the Internet age. Everything is online, and the Internet is the largest public database out there. One thing that makes Python a better plotting tool than Excel is that we can easily get data from the Internet and then plot it, all in Python. If we want to plot some online data in Excel, what do we do? Maybe download it to our laptop, then graph it. Or maybe use clunky VBA or Power Query to get the data, then graph it. If you have done either before, you know it is not a great experience. That’s why we should use Python for seamless and painless data extraction, manipulation, and plotting!

Prepare a dataframe for demo

You don’t believe getting data from the Internet is easy using Python? Let’s take a look. For this tutorial, we’ll use Johns Hopkins University’s COVID-19 database to plot the confirmed cases over time. Their daily updated file of global confirmed COVID cases can be found here: https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_global.csv

We’ll use the pandas library to process the data. With 1 line of code, we can load the data into Python as a table-like dataframe.

import pandas as pd

df = pd.read_csv('https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_global.csv')

>>> df
    Province/State      Country/Region        Lat  ...  9/1/20  9/2/20  9/3/20
0              NaN         Afghanistan  33.939110  ...   38196   38243   38288
1              NaN             Albania  41.153300  ...    9606    9728    9844
2              NaN             Algeria  28.033900  ...   44833   45158   45469
3              NaN             Andorra  42.506300  ...    1184    1199    1199
4              NaN              Angola -11.202700  ...    2729    2777    2805
..             ...                 ...        ...  ...     ...     ...     ...
261            NaN  West Bank and Gaza  31.952200  ...   23281   23875   24471
262            NaN      Western Sahara  24.215500  ...      10      10      10
263            NaN               Yemen  15.552727  ...    1962    1976    1979
264            NaN              Zambia -13.133897  ...   12381   12415   12523
265            NaN            Zimbabwe -19.015438  ...    6559    6638    6678

[266 rows x 230 columns]

There are many countries in the reported data. To make this tutorial easy to follow, we’ll just look at the global confirmed numbers. If you want to focus on a specific country, simply filter the dataframe on the Country/Region column for your desired country.
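As a rough sketch of such a country filter, the example below builds a tiny stand-in table instead of downloading the real file. The numbers are made up for illustration, and the "Long" column name is an assumption about the fourth geographical column (the output above only shows the first three).

```python
import pandas as pd

# Tiny stand-in for the JHU table; the case counts are made up for illustration.
# "Long" is an assumed name for the 4th geographical column.
sample = pd.DataFrame({
    "Province/State": [None, "Ontario", "Quebec"],
    "Country/Region": ["Albania", "Canada", "Canada"],
    "Lat": [41.1533, 51.2538, 52.9399],
    "Long": [20.1683, -85.3232, -73.5491],
    "1/22/20": [0, 1, 0],
    "1/23/20": [0, 2, 1],
})

# A boolean mask keeps only the rows for the chosen country.
canada = sample[sample["Country/Region"] == "Canada"]

# Drop the 4 geographical columns, then add up the provinces for each date.
canada_num = canada.iloc[:, 4:].sum()
print(canada_num)
```

Some countries are reported province by province, so a single country can span several rows. Summing after the filter collapses them into one number per date.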

Since the first 4 columns are just geographical information, we can get rid of them and focus on the daily numbers only.

df = df.iloc[:,4:]
global_num = df.sum()

>>> global_num
1/22/20         555
1/23/20         654
1/24/20         941
1/25/20        1434
1/26/20        2118
             ...   
8/30/20    25222709
8/31/20    25484767
9/1/20     25749642
9/2/20     26031410
9/3/20     26304856
Length: 226, dtype: int64
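To see what those two lines do on a small scale, here is a minimal sketch on a made-up two-row table (the country names and numbers are hypothetical). `iloc[:, 1:]` selects all rows and every column from position 1 onward, and `sum()` with its default `axis=0` adds each column down the rows.

```python
import pandas as pd

# Hypothetical mini table: one label column followed by two date columns.
sample = pd.DataFrame({
    "Country/Region": ["A", "B"],
    "1/22/20": [1, 2],
    "1/23/20": [3, 5],
})

# All rows, columns from position 1 onward (skips the label column).
daily = sample.iloc[:, 1:]

# Default axis=0: one total per column, returned as a pandas Series
# whose index holds the column names (the dates).
totals = daily.sum()
print(totals)
```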

Now we have a 1-dimensional table (a pandas Series): the dates and the confirmed COVID cases on each date. We’ll use this to plot the global COVID cases over time. pandas depends on another library called matplotlib for plotting, so we’ll have to import that as well; otherwise, the pandas plot won’t show up. If you haven’t already, pip install it first. By convention, we import matplotlib.pyplot under the name plt.

pip install matplotlib

pandas provides a convenient way to plot graphs directly from a dataframe, so all we need is dataframe.plot(). But we have to remember to let matplotlib display the plot after we draw it, and that is the magic word plt.show().

import matplotlib.pyplot as plt
global_num.plot()
plt.show()
pandas plot of global confirmed COVID cases over time

Quite impressive already, considering we only used 2 lines of code (including the magic word). We didn’t even tell pandas which column is the x-axis and which one is the y-axis! pandas used the index (the dates) for the x-axis and the values for the y-axis. We’ll talk about how to make prettier graphs in the next couple of chapters.
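Because the x-axis comes from the index, one optional tweak is to convert the date strings into real dates before plotting. The sketch below does this on the first three values from the output above; the axis labels are my own choice, not part of the original example.

```python
import pandas as pd
import matplotlib.pyplot as plt

# First three global totals from the output above, indexed by date strings.
global_num = pd.Series([555, 654, 941], index=["1/22/20", "1/23/20", "1/24/20"])

# Parse the month/day/2-digit-year strings into real timestamps,
# so the x-axis is treated as dates rather than plain text labels.
global_num.index = pd.to_datetime(global_num.index, format="%m/%d/%y")

# Series.plot() returns the matplotlib Axes, which we can then label.
ax = global_num.plot()
ax.set_xlabel("Date")
ax.set_ylabel("Confirmed cases")
```

When running this interactively, follow it with plt.show() to display the figure, just like in the example above.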
