How to make choropleths with different data structures in Python
Choropleth maps are used to show the variations in data over geographic regions (Population Education). I’ve used choropleths to show the number of available rental apartments across ZIP Codes in New York City and to show the number of mortgage transactions per ZIP Code over a given period. Python’s Folium library lets users build several kinds of custom maps, including choropleths, which you can share as .html files with external users who do not know how to code.
U.S. government websites often have the geographic data files necessary to create maps. NYC’s OpenData site and the U.S. Census Bureau’s website have geographic boundary files available in multiple datatypes. Python allows you to load multiple filetypes, including GeoJSON (.geojson) files and shapefiles (.shp). These files contain the spatial boundaries of a given location.
Folium’s documentation for the folium.Choropleth() method states that the geo_data parameter accepts GeoJSON geometries to create the map: “URL, file path, or data (json, dict, geopandas, etc) to your GeoJSON geometries” (Folium documentation). No matter how we load the file, we must convert the geometry data so that it works properly with this method. The key_on parameter of this method binds the geometry of each specific location (GeoJSON data) to the data for that location (e.g. population).
GeoJSON
GeoJSON files store geometric shapes, in this case the boundaries of a location, and their associated attributes. For instance, the code to load the GeoJSON file with the boundaries of NYC ZIP Codes (referenced above) is as follows:
# Code to open a .geojson file and store its contents in a variable
import json
with open('nyczipcodetabulationareas.geojson', 'r') as jsonFile:
    nycmapdata = json.load(jsonFile)
The variable nycmapdata contains a dictionary with at least two keys. One of the keys is called features, and it holds a list of dictionaries where each dictionary represents a location. An excerpt of the main GeoJSON structure with the first location is below:
{'type': 'FeatureCollection',
'features': [{'type': 'Feature',
'properties': {'OBJECTID': 1,
'postalCode': '11372',
'PO_NAME': 'Jackson Heights',
'STATE': 'NY',
'borough': 'Queens',
'ST_FIPS': '36',
'CTY_FIPS': '081',
'BLDGpostal': 0,
'@id': 'http://nyc.pediacities.com/Resource/PostalCode/11372',
'longitude': -73.883573184,
'latitude': 40.751662187},
'geometry': {'type': 'Polygon',
'coordinates': [[[-73.86942457284177, 40.74915687096788],
[-73.89143129977276, 40.74684466041932],
[-73.89507143240859, 40.746465470812154],
[-73.8961873786782, 40.74850942518088],
[-73.8958395418514, 40.74854687570604],
[-73.89525242774397, 40.748306609450246],
[-73.89654041085562, 40.75054199814359],
[-73.89579868613829, 40.75061972133262],
[-73.89652230661434, 40.75438879610903],
[-73.88164812188481, 40.75595161704187],
[-73.87221855882478, 40.75694324806748],
[-73.87167992356792, 40.75398717439604],
[-73.8720704651389, 40.753862007052064],
[-73.86942457284177, 40.74915687096788]]]}}, ... ]}
The key_on parameter of the folium.Choropleth() method requires users to reference the unique index key in the location dictionaries within the GeoJSON file as a string:
key_on (string, default None): Variable in the geo_data GeoJSON file to bind the data to. Must start with ‘feature’ and be in JavaScript objection notation. Ex: ‘feature.id’ or ‘feature.properties.statename’.
In the above case the index key is the ZIP Code, so the data associated with each location must also have a ZIP Code index key or column. The key_on parameter for the above example would be the following string:
'feature.properties.postalCode'
Note: The first portion of the string must always be the singular word feature. It is not plural like the features key of the parent dictionary, which holds the list of individual location dictionaries.
The key_on parameter accesses the properties key of each specific location. The properties key itself holds a dictionary with eleven keys; in this case the postalCode key is the index value that will link the geometric shape to whatever value we wish to plot.
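To make the lookup concrete, the sketch below follows a key_on string through one feature dictionary the way you would by hand. The resolve_key_on function is a hypothetical helper written for illustration and is not part of Folium, which performs its own lookup internally; the feature is a trimmed copy of the first location in the excerpt.

```python
def resolve_key_on(feature, key_on):
    """Follow a dotted key_on path through a single GeoJSON feature dictionary.

    Hypothetical helper for illustration only; it is not a Folium function.
    """
    parts = key_on.split(".")
    if parts[0] != "feature":
        raise ValueError("key_on must start with the singular word 'feature'")
    value = feature
    for part in parts[1:]:
        value = value[part]  # step one level deeper for each path segment
    return value


# Trimmed copy of the first feature in the NYC ZIP Code excerpt
first_feature = {
    "type": "Feature",
    "properties": {"postalCode": "11372", "PO_NAME": "Jackson Heights"},
    "geometry": {"type": "Polygon", "coordinates": []},
}

zip_code = resolve_key_on(first_feature, "feature.properties.postalCode")
print(zip_code)  # 11372
```

The value that comes back is the one that has to appear, with the same type and formatting, in the key column of the data you want to plot.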
GeoPandas
Another way to load geographic data is to use Python’s GeoPandas library. This library is useful when loading shapefiles, which are provided on the U.S. Census’ website (Cartographic Boundary Files - Shapefile). GeoPandas works similarly to Pandas, only it can store and perform operations on geometric data. For instance, the code to load the shapefile with the boundaries of all U.S. states is as follows:
# Using GeoPandas
import geopandas as gpd
usmap_gdf = gpd.read_file('cb_2018_us_state_500k/cb_2018_us_state_500k.shp')
If you were to call the first row’s (Mississippi) geometry column in Jupyter Notebook you would see the following:
usmap_gdf["geometry"].iloc[0]
Unlike the contents of the GeoJSON dictionary, there is no features key with inner dictionaries to access and there is no properties column. The key_on parameter of the folium.Choropleth() method still requires the first portion of the string to be feature; however, instead of referencing a GeoJSON’s location dictionaries, this method will be referencing columns in a GeoPandas dataframe. In this case the key_on parameter will equal "feature.properties.GEOID", where GEOID is the column that contains the unique state codes that will bind our data to the geographic boundary. The GEOID column has leading zeros; the California GEOID is 06. You could also use the STATEFP column as an index. Either way, make sure the columns used, their formats, and their data types are consistent.
Geographic data and the associated data to plot can be stored as two separate variables or all together. It is important to keep track of the data types of the columns and to make sure the index (key_on) column is the same for the geographic data and the associated data for the location.
I accessed the U.S. Census API’s American Community Survey and Population Estimates and Projections tables to obtain population and demographic data from 2019 to 2021. The head of the dataframe is as follows:
I saved the data as a .csv file. In some cases this will change the datatypes of the columns; for instance, strings may become numerical values. The datatypes when .info() is called are as follows:
Another important thing to note is that all leading zeros in the state column disappear after loading the data frame. This must be corrected; the id must match and be the same data type (i.e. it cannot be an integer in one data frame and a string in another).
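A minimal sketch of the problem and two possible fixes, assuming pandas and a small inline CSV with made-up population values in place of the saved file. The column names state and Total_Pop_2021 follow the ones used in this article; everything else is illustrative.

```python
import io

import pandas as pd

# Made-up rows standing in for the saved census .csv file
csv_text = (
    "state,NAME,Total_Pop_2021\n"
    "01,Alabama,100\n"
    "06,California,200\n"
    "28,Mississippi,300\n"
)

# Default parsing reads the codes as integers, so '06' becomes 6
as_loaded = pd.read_csv(io.StringIO(csv_text))

# Option 1: tell pandas to keep the column as text while reading
kept_as_text = pd.read_csv(io.StringIO(csv_text), dtype={"state": str})

# Option 2: rebuild two-character codes after loading
as_loaded["state_str"] = as_loaded["state"].astype(str).str.zfill(2)

print(as_loaded["state"].tolist())      # [1, 6, 28]
print(kept_as_text["state"].tolist())   # ['01', '06', '28']
print(as_loaded["state_str"].tolist())  # ['01', '06', '28']
```

Which option fits depends on whether the geographic data stores its codes as zero-padded strings or as integers; the point is only that both sides end up with the same type and format.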
As discussed above, Folium allows you to create maps using geographic datatypes, including GeoJSON and GeoPandas. These datatypes need to be formatted for use with the Folium library and it is not always intuitive (to me, at least) why certain errors occur. The following examples describe how to prepare both the geographic data (in this case U.S. state boundaries) and associated plotting data (the population of the states) for use with the folium.Choropleth() method.
Method 1: With Pandas and GeoJSON, without Specifying an ID Column
This method most closely resembles the documentation’s example for choropleth maps. It uses a GeoJSON file which contains the state boundary data and a Pandas dataframe to create the map.
As I started with a GeoPandas file I will need to convert it to GeoJSON using GeoPandas’ to_json() method. As a reminder the usmap_gdf GeoPandas dataframe looks like:
I then apply the .to_json() method and specify that we are dropping the id from the dataframe, if it exists:
usmap_json_no_id = usmap_gdf.to_json(drop_id=True)
Note: usmap_json_no_id is the variable holding the JSON string in this scenario.
This method returns a string. I formatted it so it would be easier to read and show it up to the first set of coordinates below:
'{"type": "FeatureCollection",
"features": [{"type": "Feature",
"properties": {"AFFGEOID": "0400000US28",
"ALAND": 121533519481,
"AWATER": 3926919758,
"GEOID": 28,
"LSAD": "00",
"NAME": "Mississippi",
"STATEFP": "28",
"STATENS": "01779790",
"STUSPS": "MS"},
"geometry": {"type": "MultiPolygon",
"coordinates": [[[[-88.502966, 30.215235]'
Note: The feature dictionary has no key called "id"; the state code is only available as "GEOID" inside the "properties" dictionary.
Now we are ready to connect the newly created JSON variable with the US Census dataframe obtained in a previous section, the head of which is below:
Using folium’s Choropleth() method, we create the map object:
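A call along the following lines fits that description. It is a sketch rather than the exact code behind the map: the starting view, the fill_color palette, the toy inputs and the geoids_without_data helper are all assumptions added for illustration, while the parameter values for geo_data, data, columns and key_on follow the article.

```python
import json

import pandas as pd


def build_method1_map(usmap_json_no_id, all_states_census_df):
    """Sketch of Method 1: GeoJSON string plus pandas dataframe, no feature id."""
    import folium  # imported here so the key check below also runs without folium

    # Assumed starting view: roughly the center of the contiguous U.S.
    state_map = folium.Map(location=[39.8, -98.6], zoom_start=4)
    folium.Choropleth(
        geo_data=usmap_json_no_id,
        data=all_states_census_df,
        columns=["state", "Total_Pop_2021"],  # [key column, value column]
        key_on="feature.properties.GEOID",
        fill_color="YlGnBu",  # assumed palette
        legend_name="Total Population By State",
    ).add_to(state_map)
    return state_map


def geoids_without_data(geojson_str, census_df):
    """Hypothetical pre-check: GEOID values in the GeoJSON with no matching row."""
    features = json.loads(geojson_str)["features"]
    geo_keys = {feature["properties"]["GEOID"] for feature in features}
    return geo_keys - set(census_df["state"].tolist())


# Toy inputs with made-up values, shaped like the excerpts in this article
toy_geojson = json.dumps({
    "type": "FeatureCollection",
    "features": [{
        "type": "Feature",
        "properties": {"GEOID": 28, "NAME": "Mississippi"},
        "geometry": {"type": "Polygon",
                     "coordinates": [[[-91, 30], [-88, 30], [-88, 35], [-91, 30]]]},
    }],
})
toy_census = pd.DataFrame({"state": [28], "Total_Pop_2021": [100]})

missing = geoids_without_data(toy_geojson, toy_census)
print(missing)  # set()
```

With folium installed, calling build_method1_map(usmap_json_no_id, all_states_census_df) returns the map object, which can then be written out with its save() method.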
The geo_data parameter is set to the newly created usmap_json_no_id variable and the data parameter is set to the all_states_census_df dataframe. As no id was specified when creating the GeoJSON variable, the key_on parameter must reference a specific key from the geodata, which works like a dictionary ('GEOID' is a key inside the 'properties' dictionary). In this case the GEOID key holds the state code which connects the state geometric boundary data to the corresponding US Census data in the all_states_census_df dataframe. The choropleth is below:
Method 2: With Pandas and GeoJSON, and Specifying an ID Column
This process is almost exactly the same as above, except that an index will be set prior to calling the .to_json() method.
The usmap_gdf dataframe did not have an index in the above example. To correct this I will set the index to the GEOID column and then immediately call the .to_json() method:
usmap_json_with_id = usmap_gdf.set_index(keys="GEOID").to_json()
The resulting string, up until the first pair of coordinates for the first state’s data, is below:
'{"type": "FeatureCollection",
"features": [{"id": "28",
"type": "Feature",
"properties": {"AFFGEOID": "0400000US28",
"ALAND": 121533519481,
"AWATER": 3926919758,
"LSAD": "00",
"NAME": "Mississippi",
"STATEFP": "28",
"STATENS": "01779790",
"STUSPS": "MS"},
"geometry": {"type": "MultiPolygon",
"coordinates": [[[[-88.502966, 30.215235],'
The "properties" dictionary no longer has the GEOID key because it is now stored as a new key called id in the outer dictionary. You should also note that the id value is now a string instead of an integer. As mentioned previously, you will have to make sure that the data types of the connecting data are consistent. This can become tedious if leading and trailing zeroes are involved. To fix this issue I create a new column called state_str from the state column in the all_states_census_df dataframe:
all_states_census_df["state_str"] = all_states_census_df["state"].astype("str")
Now we can create the choropleth:
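The call could look like the sketch below, which also shows why the string column is needed. The starting view, palette and toy inputs are assumptions for illustration; the key_on value and the state_str column follow the article.

```python
import json

import pandas as pd


def build_method2_map(usmap_json_with_id, all_states_census_df):
    """Sketch of Method 2: the join key is the top-level id of each feature."""
    import folium  # imported here so the comparison below also runs without folium

    state_map = folium.Map(location=[39.8, -98.6], zoom_start=4)  # assumed view
    folium.Choropleth(
        geo_data=usmap_json_with_id,
        data=all_states_census_df,
        columns=["state_str", "Total_Pop_2021"],  # string key column
        key_on="feature.id",
        fill_color="YlGnBu",  # assumed palette
        legend_name="Total Population By State",
    ).add_to(state_map)
    return state_map


# Toy inputs with made-up values
toy_geojson = json.dumps({
    "type": "FeatureCollection",
    "features": [{
        "id": "28",
        "type": "Feature",
        "properties": {"NAME": "Mississippi"},
        "geometry": {"type": "Polygon",
                     "coordinates": [[[-91, 30], [-88, 30], [-88, 35], [-91, 30]]]},
    }],
})
toy_census = pd.DataFrame({"state": [28], "Total_Pop_2021": [100]})
toy_census["state_str"] = toy_census["state"].astype("str")

feature_ids = {feature["id"] for feature in json.loads(toy_geojson)["features"]}
matched_as_int = feature_ids & set(toy_census["state"].tolist())
matched_as_str = feature_ids & set(toy_census["state_str"].tolist())
print(matched_as_int)  # set()
print(matched_as_str)  # {'28'}
```

The integer column matches nothing because "28" and 28 are different keys, while the string column lines up with the feature ids.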
The difference between this code and the code used previously is that the key_on parameter references id and not properties.GEOID. The resulting map is exactly the same as in Method 1:
Method 3: With Pandas and GeoPandas’ Python Feature Collection
This method creates a GeoJSON-like object (a Python feature collection) from the original GeoPandas dataframe with the __geo_interface__ property.
I set the index of the usmap_gdf dataframe (US geographic data) to the STATEFP column, which stores the state ids, with leading zeroes, as a string:
usmap_gdf.set_index("STATEFP", inplace=True)
I then created a matching column in the all_states_census_df dataframe (US Census data) by padding single-digit codes with a leading zero:
all_states_census_df["state_str"] = all_states_census_df["state"].astype("str").apply(lambda x: x.zfill(2))
Finally, I used the __geo_interface__ property of the usmap_gdf GeoPandas dataframe to get a Python feature collection of geometric state boundaries, stored as a dictionary, similar to the ones from the first two methods:
us_geo_json = gpd.GeoSeries(data=usmap_gdf["geometry"]).__geo_interface__
An excerpt of the us_geo_json variable is below:
{'type': 'FeatureCollection',
'features': [{'id': '28',
'type': 'Feature',
'properties': {},
'geometry': {'type': 'MultiPolygon',
'coordinates': [(((-88.502966, 30.215235), ...))]
Finally, we create the choropleth:
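A sketch of that call is below, assuming the same starting view and palette as a typical Folium example. Because the feature collection built from the geometry column has empty properties, the feature id is the only key available to bind on; the available_key_paths helper is hypothetical and only illustrates that point.

```python
def build_method3_map(us_geo_json, all_states_census_df):
    """Sketch of Method 3: a feature-collection dictionary as geo_data."""
    import folium  # imported here so the check below also runs without folium

    state_map = folium.Map(location=[39.8, -98.6], zoom_start=4)  # assumed view
    folium.Choropleth(
        geo_data=us_geo_json,
        data=all_states_census_df,
        columns=["state_str", "Total_Pop_2021"],  # zero-padded string key
        key_on="feature.id",
        fill_color="YlGnBu",  # assumed palette
        legend_name="Total Population By State",
    ).add_to(state_map)
    return state_map


def available_key_paths(feature):
    """Hypothetical helper: list the key_on strings one feature could support."""
    paths = ["feature.id"] if "id" in feature else []
    paths += [f"feature.properties.{name}" for name in feature["properties"]]
    return paths


# Toy feature shaped like the excerpt: an id and an empty properties dictionary
toy_feature = {
    "id": "28",
    "type": "Feature",
    "properties": {},
    "geometry": {"type": "Polygon",
                 "coordinates": [[[-91, 30], [-88, 30], [-88, 35], [-91, 30]]]},
}
paths = available_key_paths(toy_feature)
print(paths)  # ['feature.id']
```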
The map looks the same as the ones from above, so I excluded it.
Method 4: With GeoPandas’ Geometry Type Column
Here we stick with GeoPandas. I created a GeoPandas dataframe called us_data_gdf which combines the geometric data and the census data in one variable:
import pandas as pd
us_data_gdf = pd.merge(left = usmap_gdf,
                       right = all_states_census_df,
                       how = "left",
                       left_on = ["GEOID", "NAME"],
                       right_on = ["state", "NAME"]
                       )
Note: all_states_census_df is a pandas dataframe of US Census data and usmap_gdf is a GeoPandas dataframe storing state geometric boundary data.
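One consequence of how = "left" is worth seeing on a small scale: every boundary row is kept, and rows with no census match end up with missing values. The sketch below uses plain pandas dataframes with made-up population values as stand-ins for the two real dataframes, so the names boundaries and census are illustrative only.

```python
import pandas as pd

# Plain dataframes stand in for usmap_gdf and all_states_census_df (made-up values)
boundaries = pd.DataFrame({"GEOID": [28, 72], "NAME": ["Mississippi", "Puerto Rico"]})
census = pd.DataFrame({"state": [28], "NAME": ["Mississippi"], "Total_Pop_2021": [100]})

merged = pd.merge(
    left=boundaries,
    right=census,
    how="left",  # keep every boundary row, matched or not
    left_on=["GEOID", "NAME"],
    right_on=["state", "NAME"],
)

# Boundary rows that found no census row carry NaN in the value column
rows_without_census = merged.loc[merged["Total_Pop_2021"].isna(), "NAME"].tolist()
print(rows_without_census)  # ['Puerto Rico']
```

The key columns on both sides also need the same data type for the merge to pair rows correctly, which is the same consistency requirement that applies to key_on.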
The code to create a choropleth with a GeoPandas dataframe is below:
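A sketch of such a call, under the assumption that us_data_gdf carries both the geometry and the Total_Pop_2021 column. The constant names, starting view and palette are illustrative choices, not the original code.

```python
KEY_COLUMN = "GEOID"
VALUE_COLUMN = "Total_Pop_2021"
# Deriving key_on from the column name keeps the two from drifting apart
KEY_ON = f"feature.properties.{KEY_COLUMN}"


def build_method4_map(us_data_gdf):
    """Sketch of Method 4: one GeoPandas dataframe supplies shapes and values."""
    import folium  # imported here so the constants above can be reused without folium

    state_map = folium.Map(location=[39.8, -98.6], zoom_start=4)  # assumed view
    folium.Choropleth(
        geo_data=us_data_gdf,  # boundaries come from the geometry column
        data=us_data_gdf,      # values come from the same dataframe
        columns=[KEY_COLUMN, VALUE_COLUMN],
        key_on=KEY_ON,
        fill_color="YlGnBu",  # assumed palette
        legend_name="Total Population By State",
    ).add_to(state_map)
    return state_map


print(KEY_ON)  # feature.properties.GEOID
```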
In the above example the geo_data parameter and the data parameter both reference the same GeoPandas dataframe, as the information is stored in one place. As I did not set an index, the key_on parameter equals "feature.properties.GEOID". Even with GeoPandas, folium requires the key_on parameter to act as if it is referencing a dictionary-like object.
As before, the map looks the same as the ones from above, so I excluded it.
Method 5: With GeoPandas Geometry Type and Branca
Here we create a more stylish map using the Branca library and folium’s examples with it. The first step with Branca, aside from installing it, is to create a ColorMap object:
import branca.colormap
colormap = branca.colormap.LinearColormap(
    vmin=us_data_gdf["Total_Pop_2021"].quantile(0.0),
    vmax=us_data_gdf["Total_Pop_2021"].quantile(1),
    colors=["red", "orange", "lightblue", "green", "darkgreen"],
    caption="Total Population By State",
)
In the above code we access the branca.colormap.LinearColormap class. Here we can set the colors we use and what values to use for the color scale. For this choropleth I want the colors to scale proportionally to the lowest and highest population values in the US Census data. To set these values I use the vmin and vmax parameters as above. If I forget to do this then the areas with no values will be considered in the color scale; the results without these parameters set are below:
Once the ColorMap object is created we can create a choropleth (the full code is below):
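The sketch below follows the pattern in folium's GeoJson examples, where a style function colors each feature and features without a value are left transparent. The function names, the styling values and the stand-in colormap are assumptions for illustration; in practice the colormap argument would be the branca object created above.

```python
import math


def population_style(feature, colormap, column="Total_Pop_2021"):
    """Style one GeoJSON feature; features without a value stay transparent."""
    value = feature["properties"].get(column)
    missing = value is None or (isinstance(value, float) and math.isnan(value))
    return {
        "fillColor": "transparent" if missing else colormap(value),
        "color": "black",    # outline color (assumed styling)
        "weight": 1,
        "fillOpacity": 0.4,
    }


def build_branca_map(us_data_gdf, colormap):
    """Sketch of Method 5: folium.GeoJson styled by a branca colormap."""
    import folium  # imported here so population_style can be tried without folium

    state_map = folium.Map(location=[39.8, -98.6], zoom_start=4)  # assumed view
    folium.GeoJson(
        us_data_gdf,
        style_function=lambda feature: population_style(feature, colormap),
    ).add_to(state_map)
    colormap.add_to(state_map)  # draws the legend for the color scale
    return state_map


def example_colormap(value):
    """Stand-in for the branca colormap so the styling logic can be tried alone."""
    return "#ff0000"


styled = population_style({"properties": {"Total_Pop_2021": 100}}, example_colormap)
unstyled = population_style({"properties": {"Total_Pop_2021": None}}, example_colormap)
print(styled["fillColor"])    # #ff0000
print(unstyled["fillColor"])  # transparent
```

Keeping the styling rule in its own function makes it easy to check how missing values are handled before rendering the full map.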
I adapted the examples on folium’s website to use the us_data_gdf GeoPandas dataframe. The example allows us to exclude (render as transparent) the portions of the geographic data which do not have associated census data. If the population for a state was null, its color on the choropleth would be black unless it was excluded. The resulting choropleth is below:
Branca is customizable but the explanations of how to use it are few and far between. The ReadMe for its repository states:
There’s no documentation, but you can browse the examples gallery.
You have to practice using it to make the kind of map you want.
Folium can be used to make informative maps, like choropleths, for those with and without coding knowledge. Government websites often have the geographic data necessary to create location boundaries for your data, which can also be obtained from government sites. It is important to understand your datatypes and filetypes, as misunderstanding them can lead to unnecessary frustration. These maps are highly customizable; for instance, you can add tooltips to annotate your map. It takes practice to make use of this library’s full potential.
My repository for this article can be found here. Happy coding.