Western United States Fire History Map

Inspiration

Since the whole state of California is on fire as I write this, I wondered what sort of data existed for historical fires in the state and whether I could make a visualization that would make it easy to see what fires happened where and when. A bit of searching led me to this page where CALFIRE has data on fires from 1878 to 2019.

The Process

The first step was figuring out what data I had. The data itself comes as a .gdb, which is a GIS database and pretty easy to load in QGIS¹. Looking at the data in the file, there are 3 layers: 2 dealing with (I think?) prescribed burns and one with fires. Inside the fire layer, there is a perimeter polygon for each fire along with a whole bunch of associated fields, including things like the reporting agency, the responsible agency, and internal ID numbers, but also more interesting things like the name of the fire, how big it is, and what caused it. Some of these columns are a little cryptic, but this page explains them all pretty well.

Now that I had the data, I needed to figure out how to visualize it. I decided to try MapBox, since what I had in mind should fit well inside their free tier and I was interested in seeing how it worked anyway. To get the data I wanted into MapBox, I exported the firep19_1 layer to a .geojson file, making sure to use WGS84 as the CRS for the export, since any downstream tools were likely going to expect that.

At this point, I was able to create a web page that displayed a map and loaded the (quite large) .geojson directly on top of it. This worked, but there were a couple of downsides: the .geojson file would have to be hosted somewhere, and it’s hundreds of megabytes (granted, it could be shrunk down a fair bit if need be). Looking at what MapBox offered, I noticed that it lets you create a custom map with arbitrary data that they will host for you. Perfect! The .geojson file was a bit large to upload directly, so I first converted it to an .mbtiles² file with the tippecanoe tool from MapBox, like so³:

tippecanoe --force  -zg -y "YEAR_" -y "FIRE_NAME" -y "CAUSE" -y "GIS_ACRES" -y "ALARM_DATE" --use-attribute-for-id="OBJECTID" -o perimeters_with_attrs.mbtiles perimeters.geojson

The end result was a file that was only about 25 MB instead of nearly 300 MB. From there, it was pretty easy to use the online MapBox Studio tools to create a new map that contained the complete data I had downloaded from CALFIRE.

To display this data, I used MapBox GL JS to build a basic web page that lets the user select a year range to view fire data from. Clicking on any fire perimeter polygon displays the data for that fire, and clicking on a fire in the table highlights it on the map.
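
For the curious, here’s roughly what that wiring looks like with MapBox GL JS. This is only a sketch: the access token, style URL, layer id (fire-perimeters), source name (fires), and source-layer name (perimeters_with_attrs) are placeholders for whatever your MapBox Studio style actually defines, while the attribute names (YEAR_, FIRE_NAME, GIS_ACRES, OBJECTID) are the ones kept by the tippecanoe command above:

import mapboxgl from "mapbox-gl";

mapboxgl.accessToken = "<your access token>";

const map = new mapboxgl.Map({
  container: "map",
  style: "mapbox://styles/<username>/<style-id>", // the custom style built in MapBox Studio
  center: [-119.5, 37.5],
  zoom: 5,
});

map.on("load", () => {
  // Show only fires whose YEAR_ falls inside the selected range.
  function filterByYears(minYear: number, maxYear: number): void {
    map.setFilter("fire-perimeters", [
      "all",
      [">=", ["to-number", ["get", "YEAR_"]], minYear],
      ["<=", ["to-number", ["get", "YEAR_"]], maxYear],
    ]);
  }
  filterByYears(1878, 2019);

  // Clicking a perimeter polygon pops up that fire's details.
  map.on("click", "fire-perimeters", (e) => {
    const props = e.features?.[0]?.properties;
    if (!props) return;
    new mapboxgl.Popup()
      .setLngLat(e.lngLat)
      .setHTML(`<b>${props.FIRE_NAME}</b> (${props.YEAR_}): ${props.GIS_ACRES} acres`)
      .addTo(map);
  });

  // Because tippecanoe was run with --use-attribute-for-id="OBJECTID", each
  // feature's id is its OBJECTID, so the row clicked in the table can be
  // highlighted on the map with feature state (called from the table's click
  // handler, which isn't shown here).
  function highlightFire(objectId: number): void {
    map.setFeatureState(
      { source: "fires", sourceLayer: "perimeters_with_attrs", id: objectId },
      { selected: true }
    );
  }
});

For the highlight to actually show up, the layer’s paint properties need a feature-state expression, e.g. a brighter fill color when ["feature-state", "selected"] is true; using feature state this way avoids keeping a second copy of the geometry in a separate highlight layer.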

Of course, once I showed this to a few friends, they started asking why I had only done California and not other states as well. So I started looking for data for other states and stumbled upon the National Interagency Fire Center’s Perimeter History dataset, which covers the entire United States! The data from NIFC is pretty similar to the data from CALFIRE, but it lacks consistent start date and cause fields, so I’ve omitted those for the non-California states. I also used QGIS to ensure that every feature had a valid area, as some entries in the NIFC dataset had an empty area field.
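
That clean-up could also be scripted instead of done in QGIS; here’s a minimal sketch using Turf.js. The file names and the GIS_ACRES property name are assumptions, so substitute whatever the dataset you exported actually uses:

import { readFileSync, writeFileSync } from "node:fs";
import area from "@turf/area";
import type { FeatureCollection, MultiPolygon, Polygon } from "geojson";

// One acre is exactly 4046.8564224 square meters.
const SQUARE_METERS_PER_ACRE = 4046.8564224;

const fc = JSON.parse(
  readFileSync("nifc_perimeters.geojson", "utf8")
) as FeatureCollection<Polygon | MultiPolygon>;

for (const feature of fc.features) {
  const props = feature.properties ?? {};
  // Only fill in acreage where the source data left it blank.
  if (props.GIS_ACRES == null || props.GIS_ACRES === "") {
    // Turf's area() returns square meters for WGS84 geometry.
    props.GIS_ACRES = area(feature) / SQUARE_METERS_PER_ACRE;
    feature.properties = props;
  }
}

writeFileSync("nifc_perimeters_fixed.geojson", JSON.stringify(fc));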

👉👉 CLICK HERE FOR THE MAP! 👈👈

Disclaimer

I have only put a little bit of work into cleaning the source data and some of it is pretty dirty, especially the data from NIFC. There are definitely duplicate entries, missing entries, misspelled entries, etc. So if you’re going to try and use this for anything serious, you should probably look directly at the source data.


  1. Underneath, it’s more or less just a SQLite database. The trick to opening it in QGIS is to set the source type to “Directory”.

  2. mbtiles is a compressed format for vector tiles created by MapBox.

  3. The -y arguments specify which columns from the input dataset to keep in the output dataset.