[Don't forget that my new "Hands-on Projects with Python" course starts on Sunday! Learn more at https://speakerdeck.com/reuven/hoppy-1, or sign up at https://store.lerner.co.il/hoppy-1.]
The United States is a complex country in many ways. One of those is the fact that you have overlapping jurisdictions -- a combination of federal, state, and local laws. This extends to taxation: Everyone has to pay (or at least report) their income to the IRS, the federal internal revenue service. But many states levy their own income tax. (There are other taxes as well, including property and sales taxes.)
Last week, I read that the Tax Foundation (https://taxfoundation.org/) a think tank that monitors US tax policy -- and which is generally against taxes -- had released a new report on income tax rates in various states. I honestly don't know much about the Tax Foundation, but I thought that it might be interesting to analyze their data. Then I looked at their spreadsheet, and decided that this was definitely worth looking into, mostly because the spreadsheet offered so many opportunities to clean up real-world data.
This week, we're thus looking at data from the Tax Foundation's latest report. After reading the data into Pandas, we'll ask some questions and make some comparisons across states, in order to better understand US state income taxes.
Data and five questions
This week's data is related to the Tax Foundation's recent report on local income taxes:
https://taxfoundation.org/research/all/state/local-income-taxes-2023/
I'll admit off the bad that I'm far from an expert on this subject; I'll concentrate on the Pandas aspect of things, which I know better than the intricacies of US tax law. To their credit, the Tax Foundation provides an Excel spreadsheet with a full list of state income tax brackets (for single filers and married couples), as well as any deducation they might be allowed or any personal exemption. You can download it from:
The spreadsheet includes a huge number of footnotes; for the sake of simplifying our analysis, we are going to completely ignore these, something you shouldn't do when you're calculating your taxes.
Learning goals this week including working with Excel files, cleaning data, regular expressions, and handling multi-indexes.
Paid subscribers can download the data file from a link at the bottom of this message.
Here are the five questions for this week:
- Read the Excel spreadsheet into a Pandas data frame. The index should contain state names (well, abbreviations) from the "State" column, without the footnotes. Remove the footnote lines at the end. You can remove the columns containing ">" symbols. You can treat both "none" and "n.a." as 0 values. Cells that contain text (i.e., not numbers) can be turned into 0.
- Which state has the highest rate for single filers? Which states have no income tax at all?