BW #101: Los Angeles Fires (solution)

This week, we're looking at data about the wildfires raging in and near Los Angeles, California. The fires have caused astonishing harm to both people and property, and have been described as the largest natural disaster in US history.

Our questions this week use data about the LA fires, collected and published by NASA. NASA, along with the US Forest Service, runs FIRMS (https://firms.modaps.eosdis.nasa.gov/usfs/), which provides information about wildfires from the satellites in its EOS (Earth Observing System) network. EOS, as the name indicates, looks back at Earth rather than out into space. Thanks to a variety of sensors, we can learn where fires are burning and how hot they are. Moreover, the data is updated frequently, so it tells us about the current California fires, not just past ones.

We'll use this data to examine and visualize the fires. Along the way, we'll get a chance to explore some ideas and techniques with GeoPandas.

Data and six questions

This week's main data, as I indicated above, comes from FIRMS. The data files are available at

https://firms.modaps.eosdis.nasa.gov/usfs/active_fire/

This page contains links to lots of different data files, in many different formats. We're going to use the 7-day VIIRS data from NOAA-20. You can download that from the above page, or from this link:

https://firms.modaps.eosdis.nasa.gov/data/active_fire/noaa-20-viirs-c2/csv/J1_VIIRS_C2_USA_contiguous_and_Hawaii_7d.csv

This CSV file contains much of the data we want. However, we'll also use some data about Los Angeles and the surrounding counties. We'll get that from the Census Bureau's TIGER 2023 data, which includes everything we need to work with counties:

https://www2.census.gov/geo/tiger/TIGER2023/COUNTY/tl_2023_us_county.zip

Because this is Census data, states aren't identified by name, but by numeric codes in the STATEFP column. You can translate those codes using this file:

https://www2.census.gov/geo/docs/reference/codes2020/national_state2020.txt
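As a sketch of how that translation might work, here is a non-geo join with merge on a tiny in-memory stand-in. The pipe-delimited layout and the STATEFP / STATE_NAME column names are assumptions modeled on the Census reference file, so check them against the real header. Reading STATEFP as a string keeps the leading zero in codes such as '06' (California).

```python
import io

import pandas as pd

# Hypothetical stand-in for the state-codes file; the pipe delimiter and
# column names are assumptions, not copied from the real file
states_txt = """STATEFP|STATE_NAME
06|California
32|Nevada
"""

# Read the code as a string, so that '06' doesn't become the integer 6
states = pd.read_csv(io.StringIO(states_txt), sep='|', dtype={'STATEFP': str})

# Made-up county rows, standing in for the TIGER county data
counties = pd.DataFrame({
    'NAME': ['Los Angeles', 'Ventura', 'Clark'],
    'STATEFP': ['06', '06', '32'],
})

# Non-geo join: attach a state name to each county via STATEFP
counties_named = counties.merge(states, on='STATEFP', how='left')
print(counties_named)
```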

The learning goals for this week mainly involve working with GeoPandas, including joining and plotting. But we'll also do some work with dates and times, non-geo joins, grouping, and pivot tables.

Here are my six tasks and questions. A link to the Jupyter notebook I used to solve these problems is at the bottom of this message.

Create a Pandas data frame from the VIIRS / NOAA-20 data that NASA provides. Include a date column, of dtype datetime, based on the acq_date and acq_time columns. The latter is in HHMM format, reflecting the time (GMT) at which the data was collected. Remove acq_date, acq_time, and satellite when you're done.

For starters, I'll load both Pandas and GeoPandas:

import pandas as pd
import geopandas 

Next, I want to create a Pandas data frame from the downloaded CSV file. We can do that with read_csv:

filename = 'J1_VIIRS_C2_USA_contiguous_and_Hawaii_7d.csv'
df = (
    pd
    .read_csv(filename)
)

This is fine, but we want to combine the acq_date and acq_time columns into a single date column. We can do that if we have a string column in a format that pd.to_datetime recognizes – or we can pass a format string (as specified in such places as https://www.strfti.me/) to give it a further hint.
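To see what a format string does, here is a minimal sketch using a made-up timestamp in the same shape as our combined columns. The directives %Y, %m, %d, %H, and %M are the standard strftime codes for year, month, day, hour, and minute.

```python
import pandas as pd

# A made-up date and HHMM time, combined the way we'll combine our columns
s = '2025-01-10 0930'

# The explicit format tells pd.to_datetime exactly how to split the string
ts = pd.to_datetime(s, format='%Y-%m-%d %H%M')
print(ts)  # 2025-01-10 09:30:00
```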

The problem is that read_csv treats acq_time as an integer column, so any leading zeros are lost. The time is supposed to be four digits in HHMM format, but 0930 comes through as 930, and a time just after midnight, such as 0045, as only 45.
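Here's a small sketch of that problem, using a made-up two-row CSV rather than the real file: once read_csv decides that acq_time holds integers, the leading zeros are gone.

```python
import io

import pandas as pd

# Made-up rows; only the column names mirror the FIRMS file
csv_text = """acq_date,acq_time
2025-01-10,0930
2025-01-10,0045
"""

toy = pd.read_csv(io.StringIO(csv_text))
print(toy['acq_time'].dtype)     # int64
print(toy['acq_time'].tolist())  # [930, 45]
```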

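What we'll do is use assign to create a new date column. Its contents will be the result of invoking pd.to_datetime on a combination of acq_date and acq_time. We need a lambda here because pd.to_datetime is a top-level Pandas function rather than a data frame method, so we can't chain it directly. When assign receives a callable, it calls it with the data frame as it exists at that point in the chain, giving us access to its columns.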
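A minimal illustration of that behavior, on a made-up frame: the lambda receives the intermediate data frame, even though no variable names it yet.

```python
import pandas as pd

result = (
    pd.DataFrame({'a': [1, 2], 'b': [10, 20]})
    # df_ is the intermediate frame produced by the previous step
    .assign(total=lambda df_: df_['a'] + df_['b'])
)
print(result['total'].tolist())  # [11, 22]
```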

However, we can't use acq_time as-is, because it's an integer column. Instead, we'll use astype to turn it into a string column, and then use str.zfill to pad each string with leading zeros, ensuring that it's four characters long.
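Here's a quick sketch of astype and str.zfill on some made-up times. zfill(4) pads on the left with zeros until a string is four characters long, and leaves strings that are already four characters untouched.

```python
import pandas as pd

# Made-up acq_time values, as read_csv would give them (leading zeros lost)
times = pd.Series([5, 45, 930, 1345])

padded = times.astype(str).str.zfill(4)
print(padded.tolist())  # ['0005', '0045', '0930', '1345']
```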

The result will be a string in the format YYYY-MM-DD HHMM. We can tell pd.to_datetime to expect this format by passing the format keyword argument '%Y-%m-%d %H%M'. In other words, we end up with:

filename = 'J1_VIIRS_C2_USA_contiguous_and_Hawaii_7d.csv'
df = (
    pd
    .read_csv(filename)
    .assign(date = lambda df_: pd.to_datetime(
        df_['acq_date'] + ' ' + df_['acq_time'].astype(str).str.zfill(4),
        format='%Y-%m-%d %H%M'))
)

Finally, we invoke drop to remove the three columns we no longer need:

filename = 'J1_VIIRS_C2_USA_contiguous_and_Hawaii_7d.csv'
df = (
    pd
    .read_csv(filename)
    .assign(date = lambda df_: pd.to_datetime(
        df_['acq_date'] + ' ' + df_['acq_time'].astype(str).str.zfill(4),
        format='%Y-%m-%d %H%M'))
    .drop(columns=['acq_date', 'acq_time', 'satellite'])
)

The result is a data frame with 7,895 rows and 11 columns.
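If you'd like to check the chain without downloading the real file, here's a sketch that runs the same steps on a tiny CSV. The values in every column are invented, and the real file has more columns than this one.

```python
import io

import pandas as pd

# Invented rows; only the column names follow the FIRMS file
csv_text = """latitude,longitude,acq_date,acq_time,satellite
34.07,-118.55,2025-01-08,945,N20
34.19,-118.10,2025-01-08,2112,N20
"""

sample = (
    pd
    .read_csv(io.StringIO(csv_text))
    .assign(date = lambda df_: pd.to_datetime(
        df_['acq_date'] + ' ' + df_['acq_time'].astype(str).str.zfill(4),
        format='%Y-%m-%d %H%M'))
    .drop(columns=['acq_date', 'acq_time', 'satellite'])
)
print(sample.dtypes)
```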

Create a GeoDataFrame based on the data in the regular Pandas data frame you created. Use the latitude and longitude columns to create the special geometry column. Use the EPSG:4326 coordinate reference system (CRS).

GeoPandas defines a subclass of DataFrame known as a GeoDataFrame. The big difference between the two is that every GeoDataFrame has a geometry column, which we can use to perform geographic calculations. In other respects, a GeoDataFrame works just like a regular data frame.
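A quick way to see the subclass relationship is with two made-up points built directly with shapely, which GeoPandas depends on. The result passes an isinstance check for DataFrame, and its geometry column exposes geographic attributes such as x and y.

```python
import geopandas
import pandas as pd
from shapely.geometry import Point

points_gdf = geopandas.GeoDataFrame(
    {'name': ['a', 'b']},
    geometry=[Point(-118.55, 34.07), Point(-118.10, 34.19)],
    crs='EPSG:4326',
)

# A GeoDataFrame is still a DataFrame, so regular Pandas methods work
print(isinstance(points_gdf, pd.DataFrame))  # True

# ...but its geometry column supports geographic attributes
print(points_gdf.geometry.x.tolist())  # [-118.55, -118.1]
```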

To get a GeoDataFrame from what we've created in df, we need to invoke geopandas.GeoDataFrame, passing it df. But then we need to tell it how to define the geometry column. In this case, it's pretty simple – df has longitude and latitude columns, and GeoPandas has a special geopandas.points_from_xy function, designed for precisely these occasions.

We pass points_from_xy the longitude column as x and the latitude column as y. We also have to indicate which coordinate reference system (CRS) we want to use; in this case, I asked you to choose EPSG:4326, also known as WGS 84, the system used by GPS (https://epsg.io/4326).

We assign the resulting GeoDataFrame to gdf. It's just like df, except that it also has a geometry column of POINT objects, each representing the location of one satellite detection.
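Here's a minimal sketch of that call, run on a made-up two-row frame standing in for df. geopandas.points_from_xy takes x (longitude) first and y (latitude) second, and the crs keyword on GeoDataFrame records that the coordinates are in EPSG:4326.

```python
import geopandas
import pandas as pd

# Made-up stand-in for df; only the column names match the FIRMS data
sample = pd.DataFrame({
    'latitude': [34.07, 34.19],
    'longitude': [-118.55, -118.10],
})

sample_gdf = geopandas.GeoDataFrame(
    sample,
    # x is longitude and y is latitude, in that order
    geometry=geopandas.points_from_xy(sample['longitude'], sample['latitude']),
    crs='EPSG:4326',
)
print(sample_gdf)
```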