BW #98: Retail sales
The year is nearly over, which means that retailers are doing everything they can to push us to buy things. Whether it's Black Friday (which has spread far outside of the borders of the United States, where Thanksgiving is celebrated), Christmas, Hanukkah, or just a generic end-of-year sale, we're seeing lots of retail pushes.
This is certainly true where I live in Israel, even though neither Christmas or January 1st is a national holiday, and where gift giving on Hanukkah is far less common than in the United States. The reason has to do with accounting; at the end of the year, retail stores need to count their inventory, and report on its value to the government. Stores will often have big sales before the end of the year to clear out as much inventory as possible – then close for up to a day, to count and report on whatever is left.
I thought it might thus be appropriate to look at retail sales this week. For the sake of variety, since I often look at data from the United States, I instead decided to look at information from the United Kingdom, as provided by ONS, the Office of National Statistics (ONS, https://www.ons.gov.uk). They track a wide variety of businesses, looking at how much they're selling, how that compares with the previous month, and also how it compares with the same month in the previous year. Looking at this data can give us a sense not only of how well businesses are doing, but also which sectors are doing better (and worse),
Data and six questions
This week's data comes from the "Retail Sales Index time series" data set from the ONS, which captures recent retail sales. You can get the data from:
https://www.ons.gov.uk/businessindustryandtrade/retailindustry/datasets/retailsales/current
The most recently updated data files are available under the "Latest version" headline at the top of the page; I downloaded the Excel file via the link there, which is:
Sadly, there doesn't seem to be a single data dictionary for this data set. The column names tend to be long and descriptive, though, so it's not as much of a problem as you might think. That said, the structure of the data set is a bit odd, as you'll soon see.
Learning goals for this week include: Data cleaning, plotting, working with time series, and window functions.
I'll be back tomorrow with my detailed solutions and explanations, along with a downloadable version of the Jupyter notebook that I used to solve these problems.
- Read the data from the Excel spreadsheet into a Pandas data frame. Keep the (long!) titles as the column names. Keep only those rows where there is a year and a three-letter month name, such as "2024 NOV" (the most recent one), turn those into datetime values, and make them the index.
- The dataset combines a variety of different types of information, each measured in a different way. Among them are (a) percentage change since the prevous month, (b) percentage change since the same month one year ago, (c) relative to a base index, seasonally adjusted, (d) relative to a base index, not seasonally adjusted, (e) in absolute terms, typically reported in millions of pounds (£), seasonally adjusted, and (f) in absolute terms, typically reported in millions of pounds (£), not seasonally adjusted. How many columns are in each of these categories?