BW #81: School

It's late August – which means, at least for many people in the northern hemisphere, that a new school year has already begun, or will be starting soon. It's still fairly quiet at the schools I pass on my morning walk, including the upper school at the end of my street. But we know from years of experience that we're now in the calm before the storm. A few days from now, every morning will bring a new traffic jam to our street, and every afternoon will bring a sea of teenagers to our local shopping center.

With the start of school in the air, I thought it would be interesting and appropriate to look at education-related statistics for countries and regions from all over the world.

Data and six questions

This week's data comes from the World Bank (https://worldbank.org), which lends money to countries that want to invest in infrastructure and education. The data is from the World Bank's page for educational data, at:

https://data.worldbank.org/topic/education

To get the data, click on the "CSV" link on that page, or use the following link instead:

https://api.worldbank.org/v2/en/topic/4?downloadformat=csv

This will download a zipfile containing three CSV files to your computer. We'll use those files to answer a number of education-related questions. The files are:

  • The data itself
  • Metadata about the indicators
  • Metadata about the countries

The filenames are all very long, but you can readily identify the metadata-related files, because their names start with the word "Metadata". The third file, whose name starts with API_4, contains the main data.
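If you'd rather script the download than click the link, here is a minimal sketch using only Python's standard library. It downloads the zipfile from the link above, extracts it, and sorts the three CSV files by the start of their names. The function names `download_zip` and `find_worldbank_files`, the default filename `education.zip`, and the dictionary keys are my own hypothetical choices. I'm also assuming that the two metadata files start with `Metadata_Indicator` and `Metadata_Country`, so check the actual names in your download.

```python
import urllib.request
import zipfile
from pathlib import Path

# The CSV download link from the World Bank's education page
URL = "https://api.worldbank.org/v2/en/topic/4?downloadformat=csv"


def download_zip(url=URL, dest="education.zip"):
    """Download the zipfile to dest (hypothetical default name) and return its path."""
    urllib.request.urlretrieve(url, dest)
    return Path(dest)


def find_worldbank_files(zip_path, extract_to="."):
    """Extract the zipfile and return a dict mapping each file's role to its path.

    Assumption: the metadata files start with "Metadata_Indicator" and
    "Metadata_Country", and the main data file starts with "API_4".
    """
    files = {}
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(extract_to)
        for name in zf.namelist():
            base = Path(name).name
            if not base.endswith(".csv"):
                continue
            path = Path(extract_to) / name
            if base.startswith("Metadata_Indicator"):
                files["indicators"] = path
            elif base.startswith("Metadata_Country"):
                files["countries"] = path
            elif base.startswith("API_4"):
                files["main"] = path
    return files
```

For example, `find_worldbank_files(download_zip())` should give you a dictionary whose `main`, `indicators`, and `countries` keys point at the extracted files, ready to pass to `pd.read_csv`.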

This week's learning goals include joins, grouping, plotting, and filtering columns and rows.

Here are my six questions and tasks for this week. I'll be back tomorrow with my complete answers, as well as a downloadable Jupyter notebook with my solutions:

  • Import the main file into a data frame. Set its index to be the "Country Code" column, drop the "Unnamed: 68" column, and keep only those rows where "Indicator Code" starts with "SE.".
  • Import the two metadata files into data frames, and join them together with the main data frame. Remove any columns that start with the word "Unnamed". Then make the country code into the index.
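Here is one way the first two tasks might look in pandas. It's a sketch under a few assumptions, not a definitive solution. It assumes that the main file begins with four lines of preamble before its header row (hence `skiprows=4`), that the country metadata file has a "Country Code" column, and that the indicator metadata file identifies indicators in a column called "INDICATOR_CODE". The function names `load_main` and `join_metadata` are hypothetical, and I use `merge` for the joins, though `join` would also work.

```python
import pandas as pd


def load_main(path):
    """Task 1: load the main file, index by country code, keep only "SE." indicators."""
    # Assumption: World Bank CSV exports have four preamble lines before the header
    df = pd.read_csv(path, skiprows=4, index_col="Country Code")
    df = df.drop(columns="Unnamed: 68")
    # na=False treats any missing indicator code as "doesn't match"
    return df[df["Indicator Code"].str.startswith("SE.", na=False)]


def join_metadata(main, countries_path, indicators_path):
    """Task 2: join both metadata files onto the main data frame."""
    countries = pd.read_csv(countries_path)
    indicators = pd.read_csv(indicators_path)
    df = (
        main.reset_index()  # turn "Country Code" back into a regular column
        # Assumption: the country metadata's key column is "Country Code"
        .merge(countries, on="Country Code", how="left")
        # Assumption: the indicator metadata's key column is "INDICATOR_CODE"
        .merge(indicators, left_on="Indicator Code",
               right_on="INDICATOR_CODE", how="left")
    )
    # Remove every column whose name starts with "Unnamed"
    df = df.loc[:, ~df.columns.str.startswith("Unnamed")]
    return df.set_index("Country Code")
```

Using left joins keeps every education row from the main file, even if a country code or indicator code is missing from one of the metadata files; in that case, the metadata columns for that row will contain `NaN`.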