BW #102: WordPress
Want to set up a Web site? A blog? An online store? If you answered "yes" to any of these questions, then the odds are good that you'll use, or at least consider, WordPress. WordPress started as a system that let you create, edit, and manage a blog without needing too much technical knowledge: Install WordPress, or choose a hosting provider who will do that for you, and you're able to do everything via a Web interface.
Better yet, WordPress is open-source software, meaning that anyone can download, use, install, modify, or distribute it free of charge. And over more than 20 years, a community of developers has done just that, not only improving the software but also writing many thousands of plugins and add-ons. Some of these plugins are paid products, while many others are released as open source. And sure, there are problems with WordPress ... but for many people, the advantages have long outweighed the disadvantages. A recent report (https://www.wpbeginner.com/research/ultimate-list-of-wordpress-stats-facts-and-other-research/) says that WordPress powers more than 40 percent of Web sites, and more than 60 percent of sites built with a content-management system. It's hard to imagine a bigger success.
But in the last few months, WordPress has become famous for something else, namely a whole lot of drama and lawsuits. Matt Mullenweg, the co-creator of WordPress and founder and CEO of Automattic, a company that hosts WordPress sites at WordPress.com, accused a rival hosting company, WPEngine, of not contributing its fair share to the open-source WordPress project.
Every WordPress installation relies, by default, on WordPress.org (the home of the open-source project) to install and upgrade third-party plugins. While WordPress.com and WordPress.org are theoretically distinct, Mullenweg has made it hard to know where the open-source project ends and his commercial interests begin. When WPEngine refused to accept Mullenweg's demands, he blocked WPEngine customers from using WordPress.org to install and upgrade plugins. And things only got worse from there.
It has all been quite messy, with lots of back-and-forth arguments, as well as lawsuits. A court ordered Automattic and Mullenweg to restore WPEngine's access to WordPress.org. The Register summarized the latest on January 14th (https://www.theregister.com/2025/01/14/wordpress_leader_matthew_mullenweg_exiles/), and a new site collects articles about this drama (https://mullenweg.wtf/).
Lots of people are following this story, from WordPress developers, to businesses that depend on WordPress, to WPEngine customers, to business executives who use open-source software and are a bit nervous about whether this could happen to other packages they use. It has been mentioned a few times in the Pragmatic Engineer newsletter (https://newsletter.pragmaticengineer.com/), as well. It has certainly raised questions about open-source project governance, and about the nature of profit and competition in the open-source world.
Data and six questions
This week's data is a bit different than our usual fare. I thought that it would be interesting to find statistics on WordPress usage, and how it has grown and changed over the years. However, I couldn't find any public data sets with that information.
But WordPress is an open-source project, and like many such projects, its development process happens in public. This week, we'll thus examine the Git repository used in WordPress development. We'll do this by looking at two CSV files I created based on the Git logs. The first file contains all of the commits in the trunk branch of the WordPress project. The second file contains the commits, along with the number of lines added to and/or removed from the project in each commit.
The data this week comes from the Git logs of the WordPress project. WordPress is officially developed on its own servers, but there is a mirror of its repositories on GitHub at https://github.com/WordPress/wordpress-develop . I cloned this repository and then created two CSV files from it: one with each of the commits in the trunk branch, and another with the number of lines added and removed in each commit.
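The post doesn't say exactly how the two CSV files were generated. One plausible approach, offered only as a sketch, is to run git log --numstat --format=%H inside the clone and total the per-file added/removed counts for each commit. The function below parses that kind of output from a string, so it can be tried without a clone. The helper names (parse_numstat, write_numstats) and the sample commit IDs are hypothetical, and the assumed log format is labeled in the code.

```python
import csv
import io
import re

# Assumption: the log was produced with --format=%H, so each commit starts
# with its full 40-character SHA-1 on a line of its own.
SHA_LINE = re.compile(r"^[0-9a-f]{40}$")


def parse_numstat(text):
    """Return a list of (sha, lines_added, lines_removed), one per commit.

    Each numstat line looks like "added<TAB>removed<TAB>path". Binary files
    show "-" instead of numbers; this sketch counts them as 0.
    """
    totals = {}   # sha -> [added, removed]
    order = []    # keep commits in the order the log lists them
    current = None
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped:
            continue  # blank separator lines
        if SHA_LINE.match(stripped):
            current = stripped
            totals[current] = [0, 0]
            order.append(current)
            continue
        parts = line.split("\t")
        if current is None or len(parts) < 3:
            continue  # not a numstat line
        added, removed = parts[0], parts[1]
        totals[current][0] += int(added) if added.isdigit() else 0
        totals[current][1] += int(removed) if removed.isdigit() else 0
    return [(sha, *totals[sha]) for sha in order]


def write_numstats(rows, out):
    """Write (sha, added, removed) rows as CSV, without a header row."""
    csv.writer(out).writerows(rows)


# Hypothetical log text standing in for real git output
sha1, sha2 = "a" * 40, "b" * 40
sample = "\n".join([
    sha1, "",
    "10\t2\twp-includes/example.php",
    "-\t-\twp-includes/images/example.png",
    sha2, "",
    "3\t0\treadme.html",
])
rows = parse_numstat(sample)
buf = io.StringIO()
write_numstats(rows, buf)
print(buf.getvalue())
```

Against a real clone, the text could come from subprocess.run(["git", "log", "--numstat", "--format=%H"], capture_output=True, text=True, check=True).stdout, run inside the repository directory. Merge commits normally show no numstat lines, so they would end up with zero totals here.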
Here are the files:
I have six tasks and questions for you this week. Learning goals include grouping, pivot tables, plotting, and joining.
I'll be back tomorrow with my solutions and explanations, as well as the Jupyter notebook I used to solve these problems.
- Import the wordpress-gitlog.csv file into a data frame. Make sure the date column has a datetime type. From the email column, create two new columns, email_user and email_domain, from the parts before and after the @ sign in the e-mail address. From the subject column, create a category column containing the category from before the first colon (:) character.
- Read the wordpress-numstats.csv file into a data frame. The three columns are the commit ID (SHA-1), the number of lines added in that commit, and the number of lines removed. Join the latter two columns into the main data frame.
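Here is a minimal pandas sketch of the first task, assuming that wordpress-gitlog.csv has a header row with columns named date, email, and subject (the names used in the task). The real file may contain other columns and a different date format. To keep the example runnable on its own, it reads a few hypothetical rows from a string rather than the real file; load_gitlog is a hypothetical helper name.

```python
import io

import pandas as pd


def load_gitlog(source):
    """Load the commit log and add the derived columns from the task."""
    df = pd.read_csv(source)

    # Git dates usually carry per-commit UTC offsets; utc=True normalizes
    # them into one timezone-aware datetime column.
    df["date"] = pd.to_datetime(df["date"], utc=True)

    # partition always yields three columns: before "@", "@", after "@".
    parts = df["email"].str.partition("@")
    df["email_user"] = parts[0]
    df["email_domain"] = parts[2]

    # Text before the first colon; subjects without a colon get NaN.
    df["category"] = (
        df["subject"].str.extract(r"^([^:]*):", expand=False).str.strip()
    )
    return df


# Hypothetical sample rows standing in for wordpress-gitlog.csv
sample_csv = io.StringIO(
    "date,email,subject\n"
    "2024-01-14T10:00:00+00:00,alice@example.org,Editor: Fix toolbar focus.\n"
    "2024-01-15T08:30:00-05:00,bob@example.com,Update readme\n"
)
gitlog = load_gitlog(sample_csv)
print(gitlog[["date", "email_user", "email_domain", "category"]])
```

Leaving category as NaN for subjects without a colon keeps "no category" distinct from a real one. If every commit should get a value, df["subject"].str.split(":", n=1).str[0] would instead fall back to the whole subject.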
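For the second task, a sketch along these lines should work. It assumes that wordpress-numstats.csv has no header row and that the main data frame stores the commit ID in a column called commit; that column name and the add_numstats helper are hypothetical, so adjust them to match the real files. Summing by commit before merging is a defensive step: if a commit ever appears more than once, the join still produces one row per commit.

```python
import io

import pandas as pd


def add_numstats(gitlog, numstats_source, key="commit"):
    """Attach lines added/removed to each commit in gitlog.

    Assumes the numstats file has no header and three columns:
    SHA-1, lines added, lines removed. 'key' is a hypothetical name
    for the commit-ID column in gitlog.
    """
    numstats = pd.read_csv(
        numstats_source,
        header=None,
        names=[key, "added", "removed"],
        dtype={key: str},  # keep SHA-1 values as strings
    )
    # Collapse to one row per commit (a no-op if it already is).
    totals = numstats.groupby(key, as_index=False)[["added", "removed"]].sum()
    # Left join keeps every commit, even ones missing from numstats.
    return gitlog.merge(totals, on=key, how="left")


# Hypothetical data standing in for the real files
gitlog = pd.DataFrame({
    "commit": ["a" * 40, "b" * 40],
    "subject": ["Editor: Fix toolbar focus.", "Docs: Fix a typo."],
})
numstats_csv = io.StringIO(f"{'a' * 40},10,2\n{'b' * 40},1,1\n")
combined = add_numstats(gitlog, numstats_csv)
print(combined)
```

With a left join, commits that have no numstats row get NaN in added and removed, which also turns those columns into floats. An inner join would drop such commits instead.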