BW #91: Roller coasters

[I'll soon be starting the sixth cohort of my four-month, online Python Data Analytics Bootcamp. This is the best (and most intensive) course that I offer, giving you a chance to level up your Python and Pandas knowledge with frequent meetings, interactions, and collaboration. Learn more at https://PythonDAB.com, or join me at a free webinar I'm giving tomorrow (Thursday) at 7 p.m. in Israel, aka 12 noon Eastern: https://store.lerner.co.il/pythondab-6-info-webinar-1 . ]

This week's news has been a bit heavy in both Israel and the US. So I decided not to do the topic that I originally considered, instead looking at something fun and whimsical – roller coasters.

I haven't been on that many roller coasters, but I do enjoy them, along with other amusement-park rides. So I decided to choose a topic for this week that puts a smile on my face, and a pit in my stomach.

Data and six questions

This week's data comes from RCDB, the Roller Coaster Database, at https://rcdb.com . I had heard of RCDB before, and may even have explored it. But wow: I enjoy going on roller coasters every so often, whereas the people at RCDB take this topic quite seriously.

The data will come partly from RCDB, including its overall location data (https://rcdb.com/location.htm), and partly from the Wikipedia entry about the G20 countries (https://en.wikipedia.org/wiki/G20).

Here are six tasks and questions. The learning goals for this week include scraping data from Web sites, cleaning data, text encoding, joining, method chaining, and crosstabs.

I'll be back tomorrow with my solutions, including a downloadable Jupyter notebook.

  • The "Locations" page at RCDB, https://rcdb.com/location.htm, lists every country and territory, and the number of roller coasters that it has. Create a Pandas data frame from the table on this page. Make the "Location" column the index, and make the "Roller Coasters" column an integer value, treating missing values as 0.
  • Three country names came out garbled, with non-alphabetical characters. Which ones? How can we resolve this?
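
To make the first two tasks concrete, here is a minimal sketch of the cleaning steps. It is not the official solution. It uses an invented three-row table in place of the scraped page (in practice, the table would likely come from something like pd.read_html), and it assumes that the garbling comes from UTF-8 text being decoded as Latin-1, which is a common cause but not something I've confirmed for RCDB. The sample rows and the fix_mojibake helper are made up for illustration, and aren't necessarily the names that come out garbled on the real page.

```python
import pandas as pd

# Simulate a garbled name: UTF-8 bytes wrongly decoded as Latin-1.
bad_name = "Curaçao".encode("utf-8").decode("latin-1")

# Invented sample rows standing in for the scraped "Locations" table.
raw = pd.DataFrame({
    "Location": ["Exampleland", bad_name, "Testonia"],
    "Roller Coasters": [12, None, 3],
})


def fix_mojibake(name: str) -> str:
    """Hypothetical helper: undo UTF-8 text that was decoded as Latin-1."""
    try:
        return name.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return name  # leave names that don't fit the pattern untouched


# Names that change when repaired are the garbled ones.
garbled = raw["Location"][raw["Location"] != raw["Location"].map(fix_mojibake)]

df = (
    raw
    # Treat missing counts as 0, then convert the column to integers.
    .assign(**{"Roller Coasters": lambda d: d["Roller Coasters"].fillna(0).astype(int)})
    .set_index("Location")
    # Repair the index labels one by one.
    .rename(index=fix_mojibake)
)
```

If the assumption about the encoding holds, another option might be to tell Pandas the right encoding when reading the page in the first place, rather than repairing the names afterward.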