Creating Choropleth Visualizations with Altair
By Mark Sussman
I recently completed the Data Science Graduation Certificate program at Georgetown University, where I led a team Capstone project that tried to determine whether the District of Columbia's new dockless bikeshare pilot is impacting demand for its traditional bikeshare system, Capital Bikeshare. You can learn more about my team's Capstone on our GitHub.
Being able to visualize data geographically was paramount for my team's Capstone project, and after being introduced to Altair in the program's visualization course, I wanted to explore the GIS capabilities of this lesser-known but up-and-coming Python visualization package.
In addition to Altair's excellent documentation, there are a good number of general Altair walkthroughs that I highly recommend you browse prior to this one, as I will be focusing solely on how to build choropleth maps from scratch in Altair. Below are some recommended Altair tutorials:
In addition to Altair, which I recommend installing by following the instructions here, we'll be using the packages below for this demonstration.
Requests - To pull the DC geopolitical GeoJSON from the Open Data DC website.
Pandas - To read in the cleaned DC population data that we'll add as the choropleth layer on our map.
Geopandas - To join the DC population and GeoJSON data together.
JSON - To convert the Geopandas dataframe into JSON, which is required by Altair. More context on the incompatibility between Altair and Geopandas can be found here.
A full requirements file is located on my GitHub here.
import altair as alt
import requests
import pandas as pd
import geopandas as gpd
import json
In order to render Altair plots in Jupyter Notebook, you must enable the notebook renderer with alt.renderers.enable('notebook').
If you're interested in saving your maps or any other plot as a JPEG, it's highly recommended that you also enable the 'opaque' theme, as the default Altair theme is transparent. The default theme will cause your JPEG to have a checkered background.
Note that while Jupyter Notebook is fully supported by Altair, the developers recommend using JupyterLab for a better experience.
alt.renderers.enable('notebook')
alt.themes.enable('opaque')
# Output: ThemeRegistry.enable('opaque')
Create Baselayer of Map from GeoJSON
For this demonstration, we'll leverage the GeoJSON for DC's Advisory Neighborhood Commission (ANC) districts provided by Open Data DC. There is a more recent version of this GeoJSON, but the 2000 and 2010 Census population data we'll be adding later is based on this iteration of the ANC geopolitical districts.
First, we use the "download_json" function to download the ANC GeoJSON from the Open Data DC website. We can then create the base layer of our map by passing the GeoJSON directly to Altair and marking the geoshape accordingly, as shown in the "gen_base" function.
def download_json():
    '''Downloads ANC JSON from Open Data DC'''
    url = "https://opendata.arcgis.com/datasets/bfe6977cfd574c2b894cd67cf6a787c3_2.geojson"
    resp = requests.get(url)
    return resp.json()


def gen_base(geojson):
    '''Generates baselayer of DC ANC map'''
    base = alt.Chart(alt.Data(values=geojson)).mark_geoshape(
        stroke='black',
        strokeWidth=1
    ).encode(
    ).properties(
        width=400,
        height=400
    )
    return base


anc_json = download_json()
base_layer = gen_base(geojson=anc_json)
base_layer
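Before building on the downloaded file, it can help to confirm that it has the shape we expect. The snippet below is a minimal sketch of such a check, using only the standard library. The helper name "summarize_feature_collection" and the two-feature sample are hypothetical stand-ins, not part of the original notebook; in practice you would pass in "anc_json".

```python
from collections import Counter


def summarize_feature_collection(geojson):
    """Return the feature count, geometry-type counts, and property keys."""
    if geojson.get("type") != "FeatureCollection":
        raise ValueError("expected a GeoJSON FeatureCollection")
    features = geojson.get("features", [])
    geometry_types = Counter(f["geometry"]["type"] for f in features)
    property_keys = sorted({key for f in features for key in f["properties"]})
    return {
        "n_features": len(features),
        "geometry_types": dict(geometry_types),
        "property_keys": property_keys,
    }


# Hypothetical two-feature stand-in for the downloaded ANC GeoJSON.
sample_geojson = {
    "type": "FeatureCollection",
    "features": [
        {
            "type": "Feature",
            "properties": {"ANC_ID": "1A"},
            "geometry": {
                "type": "Polygon",
                "coordinates": [[[0, 0], [1, 0], [1, 1], [0, 0]]],
            },
        },
        {
            "type": "Feature",
            "properties": {"ANC_ID": "1B"},
            "geometry": {
                "type": "Polygon",
                "coordinates": [[[1, 1], [2, 1], [2, 2], [1, 1]]],
            },
        },
    ],
}

summary = summarize_feature_collection(sample_geojson)
print(summary)
```

Knowing which property keys are present is useful later, since the join and the map labels both rely on the "ANC_ID" property.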
Convert to Geopandas Dataframe
Next, we'll convert the GeoJSON used to create the base layer of the map to a Geopandas dataframe in order to join on the ANC-specific population data and make some additional data manipulations. Geopandas dataframes function almost exactly like standard Pandas dataframes, except they have additional functionality for geographic geometry like points and polygons.
# Convert GeoJSON to Geopandas Dataframe
gdf = gpd.GeoDataFrame.from_features(anc_json)
gdf.head()
Add Population Data to Geopandas Dataframe
Now that we have a Geopandas dataframe, we can join on our 2000 and 2010 population data, which comes from the DC Office of Planning. This data is in PDF format, which we could leverage here directly, but to simplify the process I've provided a CSV of this data in my GitHub. We will read this CSV directly into a dataframe and join it onto our Geopandas dataframe.
pop_df = pd.read_csv('../data/anc_population.csv')
gdf = gdf.merge(pop_df, on='ANC_ID', how='inner')
gdf.head()
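Because the merge above is an inner join, any ANC whose ID appears in only one of the two sources is silently dropped from the map. The sketch below shows one way to check for that beforehand using only the standard library. The helper "unmatched_ids", the sample features, and the sample CSV contents are all hypothetical; only the "ANC_ID", "pop_2000", and "pop_2010" column names come from this walkthrough.

```python
import csv
import io


def unmatched_ids(features, csv_rows, key="ANC_ID"):
    """Return the IDs that appear on only one side of the join."""
    geo_ids = {f["properties"][key] for f in features}
    csv_ids = {row[key] for row in csv_rows}
    return {
        "only_in_geojson": sorted(geo_ids - csv_ids),
        "only_in_csv": sorted(csv_ids - geo_ids),
    }


# Hypothetical GeoJSON features (geometry omitted for brevity).
sample_features = [
    {"properties": {"ANC_ID": "1A"}},
    {"properties": {"ANC_ID": "1B"}},
    {"properties": {"ANC_ID": "2A"}},
]

# Hypothetical CSV contents with made-up population figures.
sample_csv = io.StringIO(
    "ANC_ID,pop_2000,pop_2010\n"
    "1A,100,120\n"
    "1B,200,210\n"
    "9Z,50,60\n"
)
rows = list(csv.DictReader(sample_csv))

report = unmatched_ids(sample_features, rows)
print(report)
```

If either list is non-empty, those districts would be missing after the inner join, which would show up as holes in the choropleth.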
Determine Center of Each ANC Polygon
For the next data preparation step, we'll calculate the centroid (center) coordinates of each ANC polygon so that we can later add a centered ANC label to each polygon. The Geopandas centroid attribute makes this calculation easy.
gdf['centroid_lon'] = gdf['geometry'].centroid.x
gdf['centroid_lat'] = gdf['geometry'].centroid.y
gdf.head()
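To make the centroid calculation less of a black box, here is a minimal pure-Python sketch of the area-weighted centroid of a simple polygon, based on the shoelace formula. It assumes a single ring with no holes and is only an illustration of the idea; it is not the Geopandas implementation, and the function name "polygon_centroid" is hypothetical.

```python
def polygon_centroid(ring):
    """Area-weighted centroid of a simple polygon ring [(x, y), ...]."""
    # Close the ring if the first vertex is not repeated at the end.
    if ring[0] != ring[-1]:
        ring = list(ring) + [ring[0]]
    twice_area = 0.0
    cx = 0.0
    cy = 0.0
    for (x0, y0), (x1, y1) in zip(ring, ring[1:]):
        cross = x0 * y1 - x1 * y0  # signed area contribution of this edge
        twice_area += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    if twice_area == 0:
        raise ValueError("degenerate polygon with zero area")
    # The centroid formula divides by 6 * area, i.e. 3 * twice_area.
    return cx / (3 * twice_area), cy / (3 * twice_area)


# A 4-by-2 rectangle, whose center should be at (2, 1).
rectangle = [(0, 0), (4, 0), (4, 2), (0, 2)]
center = polygon_centroid(rectangle)
print(center)
```

Note that this is not the same as averaging the vertices: a polygon with many closely spaced vertices along one edge would pull a vertex average toward that edge, while the area-weighted centroid stays put.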
Convert Geopandas Dataframe back to GeoJSON
Now that we have all the data we need to create the map, we can convert the Geopandas dataframe back to GeoJSON and pass the features from the GeoJSON into Altair.
choro_json = json.loads(gdf.to_json())
choro_data = alt.Data(values=choro_json['features'])
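One consequence of this conversion is that every non-geometry column ends up nested under each feature's "properties" key, which is why the field names used later in the map carry a "properties." prefix. The sketch below illustrates that nesting with a hypothetical feature and a small hypothetical helper, "get_by_path", that resolves a dotted field name the way one would read it by hand. The values are made up for illustration.

```python
def get_by_path(record, dotted_path):
    """Follow a dotted field name such as 'properties.pop_2000' into a dict."""
    value = record
    for key in dotted_path.split("."):
        value = value[key]
    return value


# Hypothetical feature shaped like one entry of choro_json['features'].
feature = {
    "type": "Feature",
    "properties": {
        "ANC_ID": "1A",
        "pop_2000": 100,
        "centroid_lon": -77.03,
        "centroid_lat": 38.92,
    },
    "geometry": {
        "type": "Polygon",
        "coordinates": [[[0, 0], [1, 0], [1, 1], [0, 0]]],
    },
}

print(get_by_path(feature, "properties.ANC_ID"))
print(get_by_path(feature, "properties.pop_2000"))
```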
Add Choropleth and Label Layers to the Map
Having now compiled all the data we need into a GeoJSON, we can expand the "gen_base" function from before to add all three layers to the map: the base layer, the choropleth layer, and the labels layer.
With the choropleth map of ANC population in 2000 looking exactly how we want it, here are some finer points to note in the "gen_map" function:
Color Scheme: The color scheme is explicitly set to 'bluegreen' through the scheme parameter of alt.Scale when encoding the choropleth layer. Since Altair is built on Vega, the available color schemes are the ones predefined in Vega.
Specifying Data Types: In the labels layer, the data types are explicitly defined as quantitative ":Q" and ordinal ":O". This is necessary because we're passing JSON, not a dataframe, into the Altair Chart class, so the data types cannot be inferred by Altair. Most Altair plots leverage a dataframe, so this step isn't generally necessary, but it is a good habit to ensure Altair is rendering the data as intended.
Adding Layers: In the return statement, we use the "+" operator to add the layers on top of each other, which highlights the elegant simplicity that separates Altair from other visualization packages.
def gen_map(geodata, color_column, title):
    '''Generates DC ANC map with population choropleth and ANC labels'''
    # Add Base Layer
    base = alt.Chart(geodata, title=title).mark_geoshape(
        stroke='black',
        strokeWidth=1
    ).encode(
    ).properties(
        width=400,
        height=400
    )
    # Add Choropleth Layer
    choro = alt.Chart(geodata).mark_geoshape(
        fill='lightgray',
        stroke='black'
    ).encode(
        alt.Color(color_column,
                  type='quantitative',
                  scale=alt.Scale(scheme='bluegreen'),
                  title="DC Population")
    )
    # Add Labels Layer
    labels = alt.Chart(geodata).mark_text(baseline='top'
    ).properties(
        width=400,
        height=400
    ).encode(
        longitude='properties.centroid_lon:Q',
        latitude='properties.centroid_lat:Q',
        text='properties.ANC_ID:O',
        size=alt.value(8),
        opacity=alt.value(1)
    )
    return base + choro + labels


pop_2000_map = gen_map(geodata=choro_data, color_column='properties.pop_2000', title='2000')
pop_2000_map
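The ":Q" and ":O" suffixes in the labels layer are Altair's shorthand for a field name plus a data type. As a rough illustration of how that shorthand reads, the sketch below splits a string like 'properties.centroid_lon:Q' into its field and type. This is a simplified stand-in written for this post, not Altair's actual parser, and the function name "parse_shorthand" is hypothetical.

```python
# Single-letter type codes used in Altair's encoding shorthand.
TYPE_CODES = {
    "Q": "quantitative",
    "O": "ordinal",
    "N": "nominal",
    "T": "temporal",
}


def parse_shorthand(shorthand):
    """Split 'field:X' into a field name and a full type name."""
    field, sep, code = shorthand.rpartition(":")
    if sep and code in TYPE_CODES:
        return {"field": field, "type": TYPE_CODES[code]}
    # No recognized type suffix: treat the whole string as the field name.
    return {"field": shorthand}


print(parse_shorthand("properties.centroid_lon:Q"))
print(parse_shorthand("properties.ANC_ID:O"))
print(parse_shorthand("properties.pop_2000"))
```

The choropleth layer reaches the same result in a different way: it passes the bare field name along with type='quantitative' as a separate argument.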
Add 2010 Population to Map
Lastly, we generate a second choropleth for 2010 population using the "gen_map" function and combine the two maps into one plot. In Altair, the "|" operator concatenates two plots horizontally and the "&" operator concatenates them vertically, which again highlights Altair's ease of use.
When the two maps are combined, the color scale automatically adjusts to span both maps.
pop_2010_map = gen_map(geodata=choro_data, color_column='properties.pop_2010', title='2010')
pop_2000_map | pop_2010_map
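To see what a color scale that spans both maps means in practice, the sketch below computes the value range covered by each year on its own and by the two years together. It assumes the shared scale simply covers the union of both columns' values, and the population figures and the helper name "shared_domain" are hypothetical.

```python
def shared_domain(*columns):
    """Return the (min, max) range covering every value in all columns."""
    values = [v for column in columns for v in column]
    return min(values), max(values)


# Hypothetical population figures for three districts in each year.
pop_2000 = [100, 250, 400]
pop_2010 = [150, 300, 520]

print(shared_domain(pop_2000))            # range for 2000 alone
print(shared_domain(pop_2010))            # range for 2010 alone
print(shared_domain(pop_2000, pop_2010))  # range spanning both maps
```

With a shared range, the same shade means the same population on both maps, which is what makes the side-by-side comparison meaningful.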
Conclusion: Observations on Final Maps
Now that we can view the two maps side-by-side, some trends jump out immediately:
ANCs 5A, 5B and 5C are both the largest ANCs by size and population, which is why they were cut up during the 2013 redistricting.
Migration toward the center of the District can be seen in the population gains concentrated in ANCs 1B, 5C, 6B, and 6C.
Wards 7 and 8 are largely losing population in both absolute and relative terms.
I hope this walkthrough has piqued your interest in Altair. You can find the notebook that this post is based on here.
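Observations like these can be backed up numerically by computing the absolute and relative change for each district. The sketch below shows the arithmetic with made-up figures; the district names "ANC_X" and "ANC_Y" and their populations are hypothetical placeholders, not real DC data.

```python
def population_change(pop_2000, pop_2010):
    """Return (absolute change, percent change) from 2000 to 2010."""
    absolute = pop_2010 - pop_2000
    percent = 100.0 * absolute / pop_2000
    return absolute, percent


# Hypothetical districts with made-up populations: (2000, 2010).
sample_populations = {
    "ANC_X": (20000, 23000),
    "ANC_Y": (30000, 27000),
}

changes = {
    name: population_change(p2000, p2010)
    for name, (p2000, p2010) in sample_populations.items()
}
for name, (absolute, percent) in changes.items():
    print(f"{name}: {absolute:+d} people ({percent:+.1f}%)")
```

Applied to the real "pop_2000" and "pop_2010" columns, the same calculation could feed a third choropleth that maps the change directly.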