BW #67: Electric cars (solution)

BW #67: Electric cars (solution)

This week, we looked at electric cars — where people are (and aren’t) buying them, and how much the picture has changed in the last few years.

This topic was inspired by a combination of current events — specifically, the Biden Administration’s plans to levy massive tariffs on Chinese-made electric cars (https://www.nytimes.com/2024/05/15/business/economy/china-electric-vehicles-biden-tariffs.html?unlocked_article_code=1.t00.LpnP.Iciy7ilqMVES&smid=url-share), along with my surprise at seeing so few electric cars on American roads, when compared with my experience in Israel, Iceland, and Europe.

Data and six questions

This week's data comes from the most recent “Global EV outlook” released last month by the International Energy Agency (IEA), available at https://www.iea.org/reports/global-ev-outlook-2024. The data is downloadable from their "data explorer" page:

https://www.iea.org/data-and-statistics/data-tools/global-ev-data-explorer

You can download the data, or parts of it, by going to the above URL and selecting "ev sales," "cars," and "world" on the pull-down menus. Then click on "download data," and the CSV file should be downloaded to you.

Here are my six tasks and questions for this week. They have to do with filtering, grouping, multi-indexes, pivoting, piping, and also styling Pandas data frames. As always, I'll be back tomorrow with my full solutions, including the Jupyter notebook I used in my solution:

Read the EV data into a data frame. Remove rows from the "world" region. Keep only those rows with "EV sales" and "EV stock" parameters.

Let’s start, as usual, by loading Pandas:

import pandas as pd

With that in place, we can then load the CSV file into a data frame, using “read_csv”:

filename = 'IEA-EV-dataEV stock shareHistoricalCars.csv'


df = (
    pd
    .read_csv(filename)    
)

However, we only want to keep those rows in which the “parameter” column has a value of either “EV sales” or “EV stock”. I’ll use “.loc” to retrieve those rows that match this description by passing it a “lambda” expression, one that gets a data frame as an argument. I’ll use a parameter name of “df_” to indicate that it’s a local variable temporarily assigned a data frame.

But what should our “lambda” expression do? Return a boolean series, indicating where the row contains the value “EV sales” or “EV stock”. There are a few ways we can do that, but the “isin” method, passed a list of strings, is my favorite way to do this:

df = (
    pd
    .read_csv(filename)    
    .loc[lambda df_: df_['parameter'].isin(['EV sales', 'EV stock'])]
)

The resulting data frame now only has “EV sales” and “EV stock” rows. However, I also asked you to remove rows in which the “region” column contains the value “World”. We can use a similar “lambda” expression there:

df = (
    pd
    .read_csv(filename)    
    .loc[lambda df_: df_['parameter'].isin(['EV sales', 'EV stock'])]
    .loc[lambda df_: df_['region'] != 'World']
)

The resulting data frame, which we have assigned to “df”, now contains regions that aren’t “World” and with only EV sales and EV stock info.

Create a data frame showing the number of cars sold each year, in each country, with a BEV ("battery electric vehicle") powertrain.

To create such a data frame, we’ll need to take our existing “df” and then cut it down a bit. First, we’ll keep only those rows in which the “parameters” column contains the value “EV sales”, again using “loc”:

(
    df
    .loc[lambda df_: df_['parameter'] == 'EV sales']
)

Next, we’ll keep only those rows in which the “powertrain” column contains the value “BEV”, for “battery electric vehicle:

(
    df
    .loc[lambda df_: df_['parameter'] == 'EV sales']
    .loc[lambda df_: df_['powertrain'] == 'BEV']
)

But now we need to pull out the big guns. We want to get a new data frame in which:

  • The index contains the years, the distinct values from the “year” column
  • The column names are the distinct values from the “region” column
  • The values are taken from the “value” column

To do this, we can create a pivot table on “df”, using the “pivot_table” method:

(
    df
    .loc[lambda df_: df_['parameter'] == 'EV sales']
    .loc[lambda df_: df_['powertrain'] == 'BEV']
    .pivot_table(index='year',
                columns='region',
                values='value')   
)

The result is a rejiggering of our data, letting us see just how many EVs were sold in each country over the last number of years.

In many cases, the differences are staggering — such as 4 (yes, four) vehicles in Sweden in 2010, and 110,000 in 2013. Or 1,100 vehicles in China in 2010, versus 5,400,000 in 2013.