BW #38: Telework
Last week, the Bureau of Labor Statistics shared data regarding telework — that is, people who work part- or full-time from outside of the office. Which occupations lend themselves more to such work?
Reminder: PythonDAB 4
The fourth cohort of PythonDAB, my Python Data Analytics Bootcamp, will be starting on November 8th. It’s an online, four-month introduction to Python, Git, NumPy, and Pandas with twice-weekly office hours and projects with real-world data.
PythonDAB is your chance to get personal coaching on your Python and Pandas career with me and a small, intimate cohort of peers. Plus, it’s a lot of fun!
For more information, go to https://PythonDAB.com, where you can read about the course and sign up for a free coaching session with me. In that session, we'll discuss your career and whether PythonDAB is a good fit for you.
Telework
I have been working from home since 1995, when I opened my consulting practice. I always knew some other people who worked from home, but things have accelerated quite a bit in the last decade, as technology improved. I taught a number of courses for a company in London, and discovered on my second trip that I knew my way around the building better than the full-time employees who were taking the course, because they all worked from home, and almost never came into the office.
Things obviously changed quite a bit during the covid-19 pandemic, when companies were basically forced to let their employees work from home. There was some initial worry about whether people would be productive, but many companies discovered that actually, people got quite a lot done. There were other issues around collaboration and office relationships, leading many companies to start bringing people back to the office. Many of the companies for which I do Python and Pandas training now require people to come in at least three days per week, although every company and every group has slightly different policies.
The Bureau of Labor Statistics, part of the US Department of Labor, recently released information about how many Americans are working from home ("telework," in their language), based on the Current Population Survey conducted by the US Census Bureau (https://www.census.gov/programs-surveys/cps.html). They summarized the current state of affairs in a short article (https://www.bls.gov/opub/ted/2023/nearly-half-of-workers-in-financial-activities-teleworked-in-september-2023.htm), which showed that nearly half of people working in financial activities were working from home.
This week, we're going to look at the raw data provided by the BLS. But whereas we usually use Pandas to analyze data around here, this week we're going to use a newer library that has been getting a lot of attention, namely Polars (https://www.pola.rs/). Polars is largely written in Rust (https://www.rust-lang.org/), a language that is increasingly popular as an alternative to C, thanks to its fast execution, compile-time error checking, memory safety, and support for concurrency, among other things. Polars isn't a drop-in replacement for Pandas, but its DataFrame API will feel familiar to Pandas users, and many people see it as a faster, cleaner, more efficient library, with lazy evaluation and a smart query optimizer. I've been asked about Polars numerous times in the last few weeks alone, and decided that it was time to dig in and use it.
If you're interested in learning Polars, here are a few good resources on the topic:
- A good introduction from Real Python, at https://realpython.com/polars-python/
- The Polars documentation, at https://pola-rs.github.io/polars/py-polars/html/index.html
We'll be doing some simple things with Polars this week. But I promise that I'll be using Polars quite a bit more in the future, and we'll try a number of different, more advanced techniques at that time -- including comparisons between the speed of Pandas and Polars.
Data and six questions
This week, we'll look at some of the data behind the BLS article, summarized at https://www.bls.gov/cps/telework.htm. The survey has been collecting data about telework for one year, with the data published in monthly Excel spreadsheets. We'll look at the latest one, based on data from September 2023:
https://www.bls.gov/cps/telework/telework-tables-2023-09.xlsx
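The first question refers to Table 2 by its Excel coordinates (G11 to K39), while Polars works with zero-based positions. The following is a small, hypothetical helper (`a1_range_to_offsets` is my own name, not part of Polars) that converts such a range into Python-style, end-exclusive row and column offsets.

```python
import re


def a1_range_to_offsets(cell_range):
    """Convert an Excel range such as "G11:K39" into zero-based,
    end-exclusive (row_start, row_stop, col_start, col_stop) offsets."""

    def parse_cell(cell):
        match = re.fullmatch(r"([A-Z]+)(\d+)", cell.strip().upper())
        if match is None:
            raise ValueError(f"Not a cell reference: {cell!r}")
        letters, digits = match.groups()
        col = 0
        for letter in letters:
            # Column letters work like base 26 with A=1 ... Z=26 (so AA=27)
            col = col * 26 + (ord(letter) - ord("A") + 1)
        # Excel rows and columns are 1-based; Python positions are 0-based
        return int(digits) - 1, col - 1

    first, last = cell_range.split(":")
    row_start, col_start = parse_cell(first)
    row_stop, col_stop = parse_cell(last)
    # Add 1 so the stop values are exclusive, just like Python slices
    return row_start, row_stop + 1, col_start, col_stop + 1


print(a1_range_to_offsets("G11:K39"))  # (10, 39, 6, 11)
```

With a sheet loaded positionally, those offsets could feed something like `df.slice(10, 39 - 10).select(df.columns[6:11])`. Treat this as a starting point: whether the offsets line up exactly depends on whether the reader used the first row as a header and whether it dropped empty rows or columns, so it's worth eyeballing the result.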
I have six questions and tasks for you this week. The main goal is to learn how to do some basic tasks with Polars, including reading files, selecting rows and columns, and analyzing the data that we have loaded. Learning Polars can take some time, and the queries can be a bit long and complex, but the documentation is excellent, and the more experience you have with Pandas, the easier (I believe) it'll be to work with Polars.
Here are my questions; tomorrow, as usual, I'll provide detailed answers and explanations, along with the Jupyter notebook I used to solve these:
- Use Polars to load the Excel file for September 2023. We're interested in the data from Table 2, showing the percentages of people in different occupations who work from home. This means that we'll want the subset of the spreadsheet from G11 to K39. Rename the columns to something easier to remember and understand.
- Show only those rows having to do with major occupational categories (i.e., ignoring the subcategories). Which three had the greatest percentage of teleworkers, either part- or full-time? Show the full name of the category, along with the total percentage of teleworkers.