BW #99: Literacy and numeracy

First and foremost: Happy New Year! I hope that 2025 proves to be a good one for you and those close to you.

Second: Earlier this week, I sent a wrap-up of what I had done during 2024. However, I accidentally set it such that only paid subscribers could read the message. No matter your subscription status, you can see what I've been up to in the last year (and what I plan to do in the coming year) here: https://www.bambooweekly.com/my-2024-in-review/

Third: If you're enjoying and learning from Bamboo Weekly, would you please share it with others? There are millions (!) of Pandas users, and most don't yet know that they can get this sort of regular practice, based on real-world problems and with real-world data.

And now, onto the questions:

This week, we're looking at the "Survey of Adult Skills" run by the Organisation for Economic Co-operation and Development (OECD, https://oecd.org/), what the Economist calls a "club of mostly rich countries." This survey is part of their Programme for the International Assessment of Adult Competencies, known as PIAAC.

The OECD publicized the results of its most recent survey, covering 160,000 people in 31 countries, in December. The researchers asked people to perform a variety of tasks relevant to adult life in the modern world -- from reading graphs to interpreting rules to answering questions based on a thermometer. I first read about this survey in the Economist's December 14th issue, in the article, "Off the books: Are adults forgetting how to read?" (https://www.economist.com/finance-and-economics/2024/12/10/are-adults-forgetting-how-to-read). The idea is to track literacy and numeracy across countries and populations.

This week, we'll look at the data from this second round of the Survey of Adult Skills. (A first round was conducted 10 years ago, and we could theoretically use the data to see which countries are doing better and worse, but we won't be doing that.) Along the way, we'll get some insights into what kinds of skills people do well at, and how that breaks down along different ages and countries.

Data and six questions

The data this week comes from the OECD's survey. The survey's home page is at

https://www.oecd.org/en/publications/the-survey-of-adult-skills_f70238c7-en.html

The data describes the responses that nearly 160,000 people gave to the survey. The data frame will have one row per respondent, and one column per question asked, for a total of 2,483 (!) columns.

You can access and download the data by going to this page:

https://survey.oecd.org/index.php?r=survey/index&sid=424913&lang=en

You'll then have to fill out a short survey. After filling it out, you'll immediately have access to the files in CSV format (as well as SAS and SPSS formats). And yes, it's annoying to download the files in this way, but I believe that I can't link directly to the page from which the downloads take place, or make them distributable elsewhere (e.g., as a single zipfile), which would definitely be easier for everyone.

The data dictionary, in the form of an Excel file, is here:

https://www.oecd.org/content/dam/oecd/en/about/programmes/edu/piaac/data-materials/cycle-2/piaac-cy2-international-codebook.xlsx

The learning goals for this week include working with multiple files, working with oddly defined CSV files, text manipulations, joining data together, and plotting.

Here are my six tasks and questions; I'll be back tomorrow with my solutions and explanations, along with the Jupyter notebook I used to solve things myself.

  • Download all of the available PIAAC CSV files into a directory (the Netherlands hasn't yet released its data), and read all of them into a single data frame, treating all five "unknown" values (., .d, .n, .r, and .v) as NaN.
  • The CNTRYID column indicates the country in which the respondent lives, but as an integer. Retrieve the country names from row 3 of the data dictionary, and add a CNTRYNAME column to the data frame so that we'll know the country's name, not just its number.
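
If you want a warm-up before touching the real files, here is a minimal sketch of the general technique behind the first two tasks: reading several CSV files into one data frame with custom NaN markers, then mapping an integer ID to a name. It is not my solution, and it makes several assumptions. The two tiny files, the AGE column, the read_all_csvs helper, and the country_names dict are all made up for illustration; the real files may use a different delimiter (hence the sep parameter), and in the real task the names come from the data dictionary rather than from a hand-written dict.

```python
import tempfile
from pathlib import Path

import pandas as pd

# The five "unknown" markers named in the first task
NA_MARKERS = [".", ".d", ".n", ".r", ".v"]


def read_all_csvs(directory, pattern="*.csv", sep=","):
    """Hypothetical helper: read every matching CSV file into one data frame."""
    frames = [
        pd.read_csv(path, sep=sep, na_values=NA_MARKERS, low_memory=False)
        for path in sorted(Path(directory).glob(pattern))
    ]
    # ignore_index gives the combined frame a fresh 0..n-1 index
    return pd.concat(frames, ignore_index=True)


# Toy stand-ins for two country files (made-up values, not PIAAC data)
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "aaa.csv").write_text("CNTRYID,AGE\n40,.d\n40,31\n")
    Path(tmp, "bbb.csv").write_text("CNTRYID,AGE\n56,.\n56,.v\n")
    df = read_all_csvs(tmp)

# Illustrative lookup only; the real names should come from the codebook
country_names = {40: "Austria", 56: "Belgium"}
df["CNTRYNAME"] = df["CNTRYID"].map(country_names)

print(df)
```

Passing na_values to read_csv adds these markers to the strings that Pandas already treats as missing, so a column such as AGE ends up numeric with NaN wherever a marker appeared, rather than as a column of strings.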