BW #45: Netflix
Netflix has always been extremely secretive about who is watching what shows. Until last week, that is, when it released viewing data for the first half of 2023 — data that we'll explore with Pandas.
Remember what it was like, back in olden times, to watch a movie at home? You had to leave your house, go all the way to the local video store, accept the fact that your first choices weren't in stock, and return home with something that you hoped would be good. You then had a handful of days before needing to return it.
Things got significantly better with Netflix: They had an almost infinite selection and let you prioritize your selections via a spiffy app. When we lived in Chicago, we would receive most DVDs within two days of them being sent out. It was similarly quick to return them by mail. This system was a big step up in convenience and selection.
Things got even better when Netflix launched video streaming. Pretty soon, the selection was massive, with a growing number of productions that were exclusive to Netflix.
What did people actually watch on Netflix? That was a closely guarded secret. After all, why would you tell people which shows were popular, and which ones weren't, if you didn't have to? A private, subscription-only service could remain mum about the popularity of various shows, thus making it seem like they were all big hits.
Netflix did give us some hints as to what was popular with its "top 10" (https://www.netflix.com/tudum/top10/) and "most popular" (https://www.netflix.com/tudum/top10/most-popular?week=2023-12-03) lists, but they didn't go nearly as far as people wanted.
However, not all is well in the streaming world: Netflix now has a number of competitors, including Amazon, Apple, and Disney. And streaming, while popular, isn't necessarily profitable. To offset production costs, and make subscriptions more affordable to consumers, several streaming platforms have begun to offer cheaper, advertising-supported plans.
Would you want to advertise on a platform that didn't tell you how many viewers a given show had? Probably not, which is why Netflix's announcement that it would include advertising was accompanied by commentary predicting that the company would have to share more detailed viewership data with advertisers and investors.
And indeed, on December 12th, Netflix released a report (https://about.netflix.com/en/news/what-we-watched-a-netflix-engagement-report) about what people had been watching during the first half of 2023. Not only did they report this information, but they also provided (limited) raw data in Excel format for us to look over. The Washington Post, among other publications, analyzed the data in a story (https://www.washingtonpost.com/entertainment/tv/2023/12/19/netflix-viewership-report-seven-takeaways/).
This week, we'll explore the data set that Netflix provided, and see if we can learn anything interesting from it, limited as it might be.
Data and 8 questions
We're looking at the data set that Netflix has provided. Their "engagement report" is at https://about.netflix.com/en/news/what-we-watched-a-netflix-engagement-report, but what really interested me was the data itself, which we can download in Excel format from here:
This week, I have 8 questions to ask you about the data. Along the way, we'll explore working with Excel files, handling date/time information, working with text, applying external Python packages to data frames, grouping, and plotting.
As always, I'll be back tomorrow with my solutions, including my Jupyter notebook.
- Read the data from Excel into a data frame. Parse "Release Date" as a date.
- Which columns, if any, have missing data? Is this significant?
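If you'd like a starting point for these first two questions, here is a minimal sketch rather than a full solution. It makes several assumptions: `load_report` and its `header_row` parameter are hypothetical names of my own, and the stand-in frame uses made-up rows and guessed column names (only "Release Date" comes from the questions above) so that the missing-data check can run without the real file. The real spreadsheet may well have banner rows above the column names, in which case you would need to adjust `header_row`.

```python
import pandas as pd


def load_report(source, header_row=0):
    """Read the Excel report and parse "Release Date" as a datetime column.

    `source` is the path or URL of the downloaded file (not given here).
    `header_row` is a hypothetical knob: set it to the 0-based row that
    holds the column names if the sheet starts with banner rows.
    """
    return pd.read_excel(source, header=header_row, parse_dates=["Release Date"])


# Stand-in data with made-up values, NOT taken from the Netflix report.
# Column names other than "Release Date" are assumptions.
sample = pd.DataFrame(
    {
        "Title": ["Show A", "Show B", "Film C"],
        "Release Date": ["2023-03-23", None, "2022-12-30"],
        "Hours Viewed": [1000, 2000, 3000],
    }
)

# Same effect as parse_dates= in read_excel: strings become datetimes,
# and missing values become NaT.
sample["Release Date"] = pd.to_datetime(sample["Release Date"])

# Count missing values per column, and express them as a share of all rows.
missing_counts = sample.isna().sum()
missing_share = sample.isna().mean()

# Keep only the names of columns that have at least one missing value.
columns_with_gaps = missing_counts[missing_counts > 0].index.tolist()

print(missing_counts)
print(missing_share)
print(columns_with_gaps)
```

With the real file, you would call `load_report` on the downloaded spreadsheet and run the same `isna` checks on the result. Looking at the share of missing values, and not just the count, is one way to start judging whether the gaps are significant.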