Creating Choropleth Visualizations with Altair

By Mark Sussman

I recently completed the Data Science Graduation Certificate program at Georgetown University, where I led a team Capstone project that tried to determine whether the District of Columbia's new dockless bikeshare pilot is impacting demand for its traditional bikeshare system, Capital Bikeshare. You can learn more about my team's Capstone on our GitHub.

Being able to visualize data geographically was paramount for my team's Capstone project. After being introduced to Altair in the program's visualization course, I wanted to explore the GIS capabilities of this lesser-known but up-and-coming Python visualization package.

In addition to Altair's excellent documentation, there are a good number of general Altair walkthroughs that I highly recommend you browse before this one, as I will be focusing solely on how to build choropleth maps from scratch in Altair. Below are some recommended Altair tutorials:

Required Packages

In addition to Altair (I recommend following its installation instructions here), we'll be using the packages below for this demonstration.

  • Requests - to pull the DC geopolitical GeoJSON from the Open Data DC website.

  • Pandas - to read in the cleaned DC population data that we'll add as the choropleth layer on our map.

  • GeoPandas - to join the DC population and GeoJSON data together.

  • json (Python standard library) - to convert the GeoPandas dataframe back into JSON, which Altair requires. More context on the incompatibility between Altair and GeoPandas can be found here.

A full requirements file is located on my GitHub here.

import altair as alt
import requests
import pandas as pd
import geopandas as gpd
import json

Altair Settings

To render Altair plots in Jupyter Notebook, you must enable the notebook renderer with alt.renderers.enable('notebook').

If you're interested in saving your maps (or any other plot) as an image, it's highly recommended that you also enable the 'opaque' theme, because the default Altair theme has a transparent background, which makes your saved image show a checkered background in many image viewers.

Note that while Jupyter Notebook is fully supported by Altair, the developers recommend using JupyterLab for a better experience.

alt.renderers.enable('notebook')
alt.themes.enable('opaque')


Create Baselayer of Map from GeoJSON

For this demonstration, we'll use the GeoJSON of DC's Advisory Neighborhood Commission (ANC) districts provided by Open Data DC. There is a more recent version of this GeoJSON, but the 2000 and 2010 Census population data we'll add later is based on this iteration of the ANC districts.

First, we use the "download_json" function to download the ANC GeoJSON from the Open Data DC website. We can then create the base layer of our map by passing the GeoJSON directly to Altair and drawing it with mark_geoshape, as shown in the "gen_base" function.

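def download_json():
    '''Downloads ANC GeoJSON from Open Data DC'''
    url = "https://opendata.arcgis.com/datasets/bfe6977cfd574c2b894cd67cf6a787c3_2.geojson"
    resp = requests.get(url, timeout=30)
    # Fail loudly on HTTP errors instead of trying to parse an error page as GeoJSON
    resp.raise_for_status()
    return resp.json()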

def gen_base(geojson):
    '''Generates baselayer of DC ANC map'''
    base = alt.Chart(alt.Data(values=geojson)).mark_geoshape(
        stroke='black',
        strokeWidth=1
    ).encode(
    ).properties(
        width=400,
        height=400
    )
    return base

anc_json = download_json()
base_layer = gen_base(geojson=anc_json)
base_layer
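If you haven't worked with GeoJSON before, it helps to know what "download_json" hands to Altair: presumably a FeatureCollection (the standard top-level GeoJSON object), whose "features" list holds one Feature per ANC, each with a "geometry" and a "properties" dict. The sketch below uses a tiny hand-made FeatureCollection with hypothetical coordinates, and assumes the ANC identifier lives in an ANC_ID property (as the later merge on ANC_ID implies), to show how that structure can be inspected in plain Python.

```python
# A tiny, hand-made FeatureCollection with the same overall shape as the ANC GeoJSON.
# The coordinates and IDs are hypothetical placeholders, not real ANC boundaries.
toy_geojson = {
    "type": "FeatureCollection",
    "features": [
        {
            "type": "Feature",
            "properties": {"ANC_ID": "1A"},
            "geometry": {
                "type": "Polygon",
                # One exterior ring of [longitude, latitude] pairs, closed by repeating the first point
                "coordinates": [[[-77.03, 38.92], [-77.02, 38.92],
                                 [-77.02, 38.93], [-77.03, 38.93],
                                 [-77.03, 38.92]]],
            },
        },
        {
            "type": "Feature",
            "properties": {"ANC_ID": "1B"},
            "geometry": {
                "type": "Polygon",
                "coordinates": [[[-77.02, 38.91], [-77.01, 38.91],
                                 [-77.01, 38.92], [-77.02, 38.92],
                                 [-77.02, 38.91]]],
            },
        },
    ],
}


def summarize_features(geojson):
    """Return (ANC_ID, geometry type, exterior-ring vertex count) for each feature.

    Assumes Polygon geometries; a MultiPolygon nests its coordinates one level deeper.
    """
    summary = []
    for feature in geojson["features"]:
        geometry = feature["geometry"]
        exterior_ring = geometry["coordinates"][0]
        summary.append((feature["properties"]["ANC_ID"], geometry["type"], len(exterior_ring)))
    return summary


print(summarize_features(toy_geojson))
# [('1A', 'Polygon', 5), ('1B', 'Polygon', 5)]
```

Running the same function on the real anc_json is a quick way to confirm which property names are available before merging.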


Convert to Geopandas Dataframe

Next, we'll convert the GeoJSON used to create the base layer of the map into a GeoPandas dataframe so that we can join on the ANC-specific population data and make some additional data manipulations. GeoPandas dataframes function almost exactly like standard Pandas dataframes, except they have additional functionality for geometric objects like points and polygons.

# Convert GeoJSON to GeoPandas dataframe
gdf = gpd.GeoDataFrame.from_features(anc_json)
gdf.head()

Geopandas Dataframe
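Conceptually, GeoDataFrame.from_features turns each feature into one row: the keys of its "properties" dict become columns, and its geometry becomes the "geometry" column (converted to a shapely geometry). The plain-Python sketch below mimics only the row-building part and keeps geometries as raw dicts; it is a simplified illustration, not how GeoPandas is implemented. The NAME property and the point geometries are hypothetical, used for brevity.

```python
def features_to_rows(features):
    """Flatten GeoJSON features into row dicts, roughly as from_features lays out columns.

    Simplified illustration: GeoPandas also converts each geometry into a shapely
    object, whereas here the raw GeoJSON geometry dict is kept as-is.
    """
    rows = []
    for feature in features:
        row = dict(feature.get("properties") or {})  # property keys become columns
        row["geometry"] = feature["geometry"]        # geometry gets its own column
        rows.append(row)
    return rows


# Hypothetical features; real ANC features carry polygons and their own property names.
toy_features = [
    {"type": "Feature", "properties": {"ANC_ID": "6B", "NAME": "ANC 6B"},
     "geometry": {"type": "Point", "coordinates": [-76.99, 38.88]}},
    {"type": "Feature", "properties": {"ANC_ID": "6C", "NAME": "ANC 6C"},
     "geometry": {"type": "Point", "coordinates": [-77.00, 38.90]}},
]

rows = features_to_rows(toy_features)
print(sorted(rows[0]))  # ['ANC_ID', 'NAME', 'geometry']
```

This flattening is why ANC_ID is available as an ordinary column for the merge in the next step.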

Add Population Data to Geopandas Dataframe

Now that we have a GeoPandas dataframe, we can join on our 2000 and 2010 population data, which comes from the DC Office of Planning. This data is published as a PDF, which we could parse directly, but to simplify the process I've provided a CSV version of it on my GitHub. We'll read this CSV into a Pandas dataframe and merge it onto our GeoPandas dataframe on the shared ANC_ID column; because the merge is an inner join, only ANCs present in both tables are kept.

pop_df = pd.read_csv('../data/anc_population.csv')
gdf = gdf.merge(pop_df, on='ANC_ID', how='inner')
gdf.head()

Population Data
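Because the merge uses how='inner', any ANC whose ANC_ID doesn't appear in both tables is silently dropped from the map. One way to catch that, sketched below with small hypothetical frames standing in for gdf and pop_df, is to run an outer merge with pandas' indicator=True first and look for rows not marked "both". Passing validate='one_to_one' additionally raises an error if an ID is duplicated on either side.

```python
import pandas as pd

# Hypothetical stand-ins for gdf (IDs from the GeoJSON) and pop_df (from the CSV).
# The "2A" vs "2B" mismatch and the population values are deliberate placeholders.
geo_ids = pd.DataFrame({"ANC_ID": ["1A", "1B", "2A"]})
pop_df = pd.DataFrame({"ANC_ID": ["1A", "1B", "2B"],
                       "pop_2000": [100, 200, 300],
                       "pop_2010": [110, 190, 320]})

# Outer merge with an indicator column recording where each ANC_ID was found;
# validate="one_to_one" raises MergeError if either side has duplicate IDs.
check = geo_ids.merge(pop_df, on="ANC_ID", how="outer",
                      indicator=True, validate="one_to_one")

# Rows that an inner merge would silently drop.
unmatched = check.loc[check["_merge"] != "both", ["ANC_ID", "_merge"]]
print(unmatched)
```

Here 2A would appear as left_only (map geometry with no population row) and 2B as right_only (population row with no geometry); an empty result means the inner merge keeps every ANC.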

Determine Center of Each ANC Polygon

For the next data preparation step, we'll calculate the centroid (center) coordinates of each ANC polygon so that we can later place a label at the center of each ANC. The GeoPandas centroid property makes this calculation easy.

# Compute each polygon's centroid once, then store its x (longitude) and y (latitude)
centroids = gdf['geometry'].centroid
gdf['centroid_lon'] = centroids.x
gdf['centroid_lat'] = centroids.y
gdf.head()

ANC Polygon Centroids
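For a simple polygon, the centroid that GeoPandas (via shapely) returns is the area-weighted center given by the shoelace formula. The pure-Python sketch below computes it for a single exterior ring and ignores holes and multi-part polygons, which shapely handles for you. Two caveats carry over to the real data: the ANC coordinates are longitude/latitude degrees, so the result is a planar approximation (adequate for placing labels at neighborhood scale), and the centroid of a concave polygon can fall outside it. If a label ever lands outside its ANC, GeoPandas' representative_point() is guaranteed to return a point inside the polygon.

```python
def polygon_centroid(ring):
    """Area-weighted centroid of a simple polygon via the shoelace formula.

    `ring` is a list of (x, y) vertices; the closing vertex may or may not be repeated.
    Holes and multi-part polygons are not handled in this sketch.
    """
    if ring[0] == ring[-1]:
        ring = ring[:-1]  # drop the repeated closing vertex; the loop wraps around itself
    twice_area = 0.0
    cx = cy = 0.0
    n = len(ring)
    for i in range(n):
        x0, y0 = ring[i]
        x1, y1 = ring[(i + 1) % n]  # next vertex, wrapping back to the first
        cross = x0 * y1 - x1 * y0
        twice_area += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    area = twice_area / 2.0  # signed: negative for clockwise rings, which cancels below
    if area == 0:
        raise ValueError("degenerate polygon with zero area")
    return cx / (6.0 * area), cy / (6.0 * area)


print(polygon_centroid([(0, 0), (2, 0), (2, 2), (0, 2)]))  # (1.0, 1.0)
print(polygon_centroid([(0, 0), (3, 0), (0, 3), (0, 0)]))  # (1.0, 1.0)

# A U-shaped polygon: its centroid (1.5, ~1.36) sits in the notch, outside the shape.
print(polygon_centroid([(0, 0), (3, 0), (3, 3), (2, 3), (2, 1), (1, 1), (1, 3), (0, 3)]))
```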

Convert Geopandas Dataframe back to GeoJSON

Now that we have all the data we need to create our map, we can convert the GeoPandas dataframe back to GeoJSON and pass its features to Altair.

choro_json = json.loads(gdf.to_json())
choro_data = alt.Data(values=choro_json['features'])
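It's worth seeing why the encodings later refer to fields such as properties.pop_2000 rather than pop_2000. gdf.to_json() returns a GeoJSON FeatureCollection as a string, and every non-geometry column ends up inside each feature's "properties" dict; Vega-Lite then treats a dot in a field name as a path into nested data. The sketch below reproduces that nesting in plain Python with one hypothetical row (placeholder values) and a hypothetical get_field helper that follows a dotted name the way the encoding does.

```python
import json

# One hypothetical row as it might look after the merge and centroid steps
# (values are placeholders, not real ANC figures).
row = {"ANC_ID": "6B", "pop_2000": 100, "pop_2010": 120,
       "centroid_lon": -76.99, "centroid_lat": 38.88}
geometry = {"type": "Point", "coordinates": [-76.99, 38.88]}

# Build a FeatureCollection string shaped like gdf.to_json() output:
# geometry stays separate, every other column goes under "properties".
geojson_str = json.dumps({
    "type": "FeatureCollection",
    "features": [{"type": "Feature", "properties": row, "geometry": geometry}],
})

features = json.loads(geojson_str)["features"]  # what alt.Data(values=...) receives


def get_field(datum, dotted_name):
    """Follow a dotted field name such as 'properties.pop_2000' into nested dicts."""
    value = datum
    for key in dotted_name.split("."):
        value = value[key]
    return value


print(get_field(features[0], "properties.pop_2000"))  # 100
```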

Add Choropleth and Label Layers to the Map

Having now compiled all the data we need into GeoJSON, we can expand the "gen_base" function from before into a "gen_map" function that adds all three layers to our map:

  1. Base

  2. Choropleth

  3. ANC Labels

With the 2000 ANC population choropleth looking exactly how we want it, here are some finer points to note in the "gen_map" function.

  • Color Scheme: The color scheme is explicitly set to 'bluegreen' through the scheme parameter of alt.Scale when encoding the choropleth layer. Since Altair is built on Vega-Lite, which compiles to Vega, the available color schemes are the ones predefined in Vega.

  • Specifying Data Types: In the labels layer, the data types are explicitly declared as quantitative (":Q") and ordinal (":O"), and the choropleth's color encoding does the same through type='quantitative'. This is necessary because we're passing JSON rather than a dataframe into the Altair Chart, so Altair cannot infer the data types itself. Most Altair plots use a dataframe, so this step usually isn't necessary, but it's a good habit for ensuring Altair renders the data as intended.

  • Adding Layers: In the return statement, we use the "+" operator to layer the charts on top of each other, which highlights the elegant simplicity that sets Altair apart from other visualization packages.

def gen_map(geodata, color_column, title):
    '''Generates DC ANC map with population choropleth and ANC labels'''
    # Add Base Layer
    base = alt.Chart(geodata, title = title).mark_geoshape(
        stroke='black',
        strokeWidth=1
    ).encode(
    ).properties(
        width=400,
        height=400
    )
    # Add Choropleth Layer
    choro = alt.Chart(geodata).mark_geoshape(
        fill='lightgray',
        stroke='black'
    ).encode(
        alt.Color(color_column, 
                  type='quantitative', 
                  scale=alt.Scale(scheme='bluegreen'),
                  title = "DC Population")
    )
    # Add Labels Layer
    labels = alt.Chart(geodata).mark_text(
        baseline='top'
    ).properties(
        width=400,
        height=400
    ).encode(
        longitude='properties.centroid_lon:Q',
        latitude='properties.centroid_lat:Q',
        text='properties.ANC_ID:O',
        size=alt.value(8),
        opacity=alt.value(1)
    )

    return base + choro + labels

pop_2000_map = gen_map(geodata=choro_data, color_column='properties.pop_2000', title='2000')
pop_2000_map
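The ":Q" and ":O" suffixes used in the labels layer are Altair's shorthand for the Vega-Lite type of an encoding field (Q quantitative, O ordinal, N nominal, T temporal). To make the mapping concrete, the hypothetical expand_shorthand helper below turns a shorthand string into the {"field": ..., "type": ...} definition that appears in the generated Vega-Lite spec. It is a simplified illustration, not Altair's actual parser, which supports more syntax (such as aggregates like sum(field)).

```python
# Single-letter type codes used in Altair shorthand and the Vega-Lite types they stand for.
TYPE_CODES = {
    "Q": "quantitative",
    "O": "ordinal",
    "N": "nominal",
    "T": "temporal",
}


def expand_shorthand(shorthand):
    """Expand 'field:X' into a Vega-Lite-style field definition (simplified sketch).

    Without a recognized suffix only {"field": ...} is returned; the type must then be
    supplied another way (e.g. type='quantitative') or inferred from a DataFrame.
    """
    field, sep, code = shorthand.rpartition(":")
    if not sep or code not in TYPE_CODES:
        return {"field": shorthand}
    return {"field": field, "type": TYPE_CODES[code]}


# The label layer's encodings from gen_map, expanded.
label_encoding = {
    channel: expand_shorthand(spec)
    for channel, spec in {
        "longitude": "properties.centroid_lon:Q",
        "latitude": "properties.centroid_lat:Q",
        "text": "properties.ANC_ID:O",
    }.items()
}
print(label_encoding["text"])  # {'field': 'properties.ANC_ID', 'type': 'ordinal'}
```

Calling to_dict() on an Altair chart is a convenient way to inspect the real spec and compare it against this simplified picture.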


Add 2010 Population to Map

Lastly, we generate a second choropleth for 2010 population using the "gen_map" function and combine the two maps into one plot. In Altair, the "|" operator concatenates two plots horizontally and the "&" operator concatenates them vertically, again highlighting Altair's ease of use.

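Because the two maps are combined into a single chart, they share one color scale by default, so the scale automatically spans the population values of both years and the colors are directly comparable between the two maps.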

pop_2010_map = gen_map(geodata=choro_data, color_column='properties.pop_2010', title='2010')
pop_2000_map | pop_2010_map
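A shared color scale matters because each color is determined by the scale's domain. The simplified model below treats the scale as a plain linear mapping of values onto [0, 1] (ignoring the domain rounding and color interpolation Vega actually performs) and uses made-up population values, to show that with independent domains the same population could get different colors in the two maps, while a shared domain (the union of both years' extents) keeps them comparable. If you ever want separate scales instead, Altair's resolve_scale(color='independent') requests them on a concatenated chart.

```python
def extent(values):
    """(min, max) of a sequence of numbers, i.e. a linear scale's data domain."""
    return min(values), max(values)


def normalize(value, domain):
    """Map a value linearly from `domain` onto [0, 1], its position along a color ramp."""
    low, high = domain
    return (value - low) / (high - low)


# Hypothetical population values for three ANCs in each year (placeholders, not real data).
pop_2000 = [2000, 4000, 6000]
pop_2010 = [3000, 6000, 9000]

independent_2000 = extent(pop_2000)   # (2000, 6000)
independent_2010 = extent(pop_2010)   # (3000, 9000)
shared = extent(pop_2000 + pop_2010)  # (2000, 9000): union of both years

# With separate domains, 6000 residents get the darkest color in 2000
# but only a mid-ramp color in 2010...
print(normalize(6000, independent_2000), normalize(6000, independent_2010))  # 1.0 0.5
# ...whereas with the shared domain, 6000 maps to the same color in both maps.
print(normalize(6000, shared))
```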


Conclusion: Observations on Final Maps

Now that we can view the two maps side-by-side, some trends jump out immediately:

  • ANCs 5A, 5B and 5C are the largest ANCs by both area and population, which is why they were split up during the 2013 redistricting.

  • Migration toward the center of the District shows up as concentrated population gains in ANCs 1B, 5C, 6B, and 6C.

  • Wards 7 and 8 are largely losing population in both absolute and relative terms.
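The last observation distinguishes absolute from relative change. If you'd like to map those measures directly, one option is to add change columns to the GeoPandas dataframe before calling gdf.to_json(); they then appear under "properties" and can be mapped the same way as the population columns (for example with a copy of "gen_map" whose legend title reflects the new measure). The sketch below shows the calculation on a small Pandas dataframe with placeholder IDs and numbers; the pop_change and pop_pct_change column names are hypothetical additions, while pop_2000 and pop_2010 match the columns used above.

```python
import pandas as pd

# Placeholder IDs and values for illustration only; in the walkthrough these columns live in gdf.
df = pd.DataFrame({
    "ANC_ID": ["X1", "X2", "X3"],
    "pop_2000": [10000, 8000, 5000],
    "pop_2010": [9000, 7600, 6000],
})

# Absolute change in residents between the two censuses.
df["pop_change"] = df["pop_2010"] - df["pop_2000"]
# Relative change, as a percentage of the 2000 baseline.
df["pop_pct_change"] = 100 * df["pop_change"] / df["pop_2000"]

print(df[["ANC_ID", "pop_change", "pop_pct_change"]])
```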

I hope this walkthrough has piqued your interest in Altair. You can find the notebook that this post is based on here.


District Data Labs provides data science consulting and corporate training services. We work with companies and teams of all sizes, helping them make their operations more data-driven and enhancing the analytical abilities of their employees. Interested in working with us? Let us know!


 
