Thursday, June 16, 2022

Creating Choropleth Maps with Python’s Folium Library | by Alex Mitrani | Jun, 2022


How to make choropleths with different data structures in Python

A choropleth of available rental apartments in NYC, April 2019 (GitHub)

Choropleth maps are used to show the variations in data over geographic regions (Population Education). I’ve used choropleths to show the number of available rental apartments across ZIP Codes in New York City and to show the number of mortgage transactions per ZIP Code over a given period. Python’s Folium library enables users to build multiple kinds of custom maps, including choropleths, which you can share as .html files with external users who do not know how to code.

U.S. government websites often have the geographic data files necessary to create maps. NYC’s OpenData site and the U.S. Census Bureau’s site have geographic boundary files available in multiple datatypes. Python allows you to load multiple filetypes, including GeoJSON (.geojson) files and shapefiles (.shp). These files contain the spatial boundaries of a given location.

Folium’s documentation for the folium.Choropleth() method states that the geo_data parameter accepts GeoJSON geometries as a string to create the map, “URL, file path, or data (json, dict, geopandas, etc) to your GeoJSON geometries” (Folium documentation). No matter how we load the file, we must convert the geometry data to function properly with this method. The key_on parameter of this method binds the geometry for each specific location (GeoJSON data) with the data for that location (e.g. population).

GeoJSON

GeoJSON files store geometric shapes, in this case the boundaries of a location, and their associated attributes. For instance, the code to load the GeoJSON file with the boundaries of NYC ZIP Codes (referenced above) is as follows:

# Code to open a .geojson file and store its contents in a variable
import json

with open('nyczipcodetabulationareas.geojson', 'r') as jsonFile:
    nycmapdata = json.load(jsonFile)

The variable nycmapdata contains a dictionary with at least two keys; one of them is called features, and it holds a list of dictionaries where each dictionary represents a location. An excerpt of the main GeoJSON structure with the first location is below:

{'type': 'FeatureCollection',
'features': [{'type': 'Feature',
'properties': {'OBJECTID': 1,
'postalCode': '11372',
'PO_NAME': 'Jackson Heights',
'STATE': 'NY',
'borough': 'Queens',
'ST_FIPS': '36',
'CTY_FIPS': '081',
'BLDGpostal': 0,
'@id': 'http://nyc.pediacities.com/Resource/PostalCode/11372',
'longitude': -73.883573184,
'latitude': 40.751662187},
'geometry': {'type': 'Polygon',
'coordinates': [[[-73.86942457284177, 40.74915687096788],
[-73.89143129977276, 40.74684466041932],
[-73.89507143240859, 40.746465470812154],
[-73.8961873786782, 40.74850942518088],
[-73.8958395418514, 40.74854687570604],
[-73.89525242774397, 40.748306609450246],
[-73.89654041085562, 40.75054199814359],
[-73.89579868613829, 40.75061972133262],
[-73.89652230661434, 40.75438879610903],
[-73.88164812188481, 40.75595161704187],
[-73.87221855882478, 40.75694324806748],
[-73.87167992356792, 40.75398717439604],
[-73.8720704651389, 40.753862007052064],
[-73.86942457284177, 40.74915687096788]]]}}, ... ]}

The key_on parameter of the folium.Choropleth() method requires users to reference the unique index key in the location dictionaries within the GeoJSON file as a string:

key_on (string, default None): Variable in the geo_data GeoJSON file to bind the data to. Must start with ‘feature’ and be in JavaScript objection notation. Ex: ‘feature.id’ or ‘feature.properties.statename’.

In the above case the index key is the ZIP Code; the data associated with each location must also have a ZIP Code index key or column. The key_on parameter for the above example would be the following string:

'feature.properties.postalCode'

Note: The first portion of the string must always be the singular word feature; it is not plural like the features key of the parent dictionary, which holds the list of individual location dictionaries.

The key_on parameter is accessing the properties key of each specific location. The properties key itself holds a dictionary with eleven keys; in this case the postalCode key is the index value that will link the geometric shape to whatever value we wish to plot.
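To make the path notation concrete, here is a minimal sketch of how a key_on string can be read against one location dictionary. The helper resolve_key_on is hypothetical and written only for illustration; it is not Folium's own implementation, and the feature below is a trimmed copy of the excerpt above.

```python
def resolve_key_on(feature, key_on):
    """Walk a dotted key_on path through one GeoJSON feature dictionary.

    Hypothetical helper for illustration only; not part of Folium.
    """
    parts = key_on.split(".")
    if parts[0] != "feature":
        raise ValueError("key_on must start with the singular word 'feature'")
    value = feature
    for part in parts[1:]:
        value = value[part]
    return value


# Trimmed copy of the first location from the excerpt (geometry omitted).
feature = {
    "type": "Feature",
    "properties": {"postalCode": "11372", "PO_NAME": "Jackson Heights"},
    "geometry": None,
}

zip_code = resolve_key_on(feature, "feature.properties.postalCode")
print(zip_code)  # 11372
```

The returned value is the string '11372', so the column it is matched against in the plotting data would need to hold ZIP Codes as strings as well.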

GeoPandas

Another way to load geographic data is to use Python’s GeoPandas library (link). This library is useful when loading shapefiles, which are provided on the U.S. Census’ website (Cartographic Boundary Files - Shapefile). GeoPandas works similarly to Pandas, only it can store and perform functions on geometric data. For instance, the code to load the shapefile with the boundaries of all U.S. states is as follows:

# Using GeoPandas
import geopandas as gpd

usmap_gdf = gpd.read_file('cb_2018_us_state_500k/cb_2018_us_state_500k.shp')

The head of the usmap_gdf dataframe

If you were to call the first row’s (Mississippi) geometry column in Jupyter Notebook you would see the following:

usmap_gdf["geometry"].iloc[0]

When a specific geometry value is called you see a geometric image instead of the string representing the boundary of the shape; the above is the geometry value for the first row (Mississippi)

Unlike the contents of the GeoJSON dictionary, there is no features key with inner dictionaries to access and there is no properties column. The key_on parameter of the folium.Choropleth() method still requires the first portion of the string to be feature; however, instead of referencing a GeoJSON’s location dictionaries this method will be referencing columns in a GeoPandas dataframe. In this case the key_on parameter will equal “feature.properties.GEOID”, where GEOID is the column that contains the unique state codes that will bind our data to the geographic boundary. The GEOID column has leading zeros; the California GEOID is 06. You could also use the STATEFP column as an index; make sure you are consistent in the columns used, their formats, and their data types.

Geographic data and the associated data to plot can be stored as two separate variables or all together. It is important to keep track of the data types of the columns and to make sure the index (key_on) column is the same for the geographic data and the associated data for the location.

I accessed the U.S. Census API’s American Community Survey (link) and Population Estimates and Projections (link) tables to obtain population and demographic data from 2019 to 2021. The head of the dataframe is as follows:

The head of the U.S. Census dataframe

I saved the data as a .csv file; in some cases this will change the datatypes of the columns, for instance strings may become numerical values. The datatypes when .info() is called are as follows:

The data types for the census data before and after saving and loading the data frame as a CSV file

Another important thing to note is that the leading zeros in the state column no longer appear after loading the data frame. This must be corrected; the id must match and be the same data type (i.e. it cannot be an integer in one data frame and a string in another).

As discussed above, Folium allows you to create maps using geographic datatypes, including GeoJSON and GeoPandas. These datatypes need to be formatted for use with the Folium library and it is not always intuitive (to me, at least) why certain errors occur. The following examples describe how to prepare both the geographic data (in this case U.S. state boundaries) and the associated plotting data (the population of the states) for use with the folium.Choropleth() method.
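The CSV round trip described above can be reproduced with a small sketch. The two-row dataframe is a made-up stand-in for the census data (only the state and NAME column names come from this article), and an in-memory buffer replaces the file on disk. It shows the codes turning into integers on reload, plus two possible repairs: asking pandas to read the column as text, or padding the integers back out.

```python
import io

import pandas as pd

# Made-up stand-in for the census dataframe; state codes stored as strings.
census_df = pd.DataFrame({"state": ["06", "28"], "NAME": ["California", "Mississippi"]})

# Write to CSV text and read it back with default type inference.
csv_text = census_df.to_csv(index=False)
reloaded = pd.read_csv(io.StringIO(csv_text))
print(reloaded["state"].tolist())  # [6, 28]: integers, leading zero lost

# Repair option 1: tell pandas to keep the column as text when loading.
kept_as_text = pd.read_csv(io.StringIO(csv_text), dtype={"state": str})
print(kept_as_text["state"].tolist())  # ['06', '28']

# Repair option 2: convert the integers back to zero-padded strings.
repaired = reloaded["state"].astype(str).str.zfill(2)
print(repaired.tolist())  # ['06', '28']
```

Either repair works; the first avoids the problem at load time, while the second is handy when the dataframe has already been loaded.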

Method 1: With Pandas and GeoJSON, without Specifying an ID Column

This method most closely resembles the documentation’s example for choropleth maps. The method uses a GeoJSON file which contains the state boundaries data and a Pandas dataframe to create the map.

As I started with a GeoPandas file I will need to convert it to a GeoJSON file using GeoPandas’ to_json() method. As a reminder the usmap_gdf GeoPandas dataframe looks like:

The head of the usmap_gdf dataframe

I then apply the .to_json() method and specify that we are dropping the id from the dataframe, if it exists:

usmap_json_no_id = usmap_gdf.to_json(drop_id=True)

Note: usmap_json_no_id is the variable holding the JSON string in this scenario

This method returns a string; I formatted it so it would be easier to read and show it up to the first set of coordinates below:

'{"type": "FeatureCollection",
"features": [{"type": "Feature",
"properties": {"AFFGEOID": "0400000US28",
"ALAND": 121533519481,
"AWATER": 3926919758,
"GEOID": 28,
"LSAD": "00",
"NAME": "Mississippi",
"STATEFP": "28",
"STATENS": "01779790",
"STUSPS": "MS"},
"geometry": {"type": "MultiPolygon",
"coordinates": [[[[-88.502966, 30.215235]'

Note: The “properties” dictionary has no key called “id”

Now we are ready to connect the newly created JSON variable with the US Census dataframe obtained in a previous section, the head of which is below:

The head of the U.S. Census dataframe, called all_states_census_df below

Using folium’s Choropleth() method, we create the map object:

The code to create a Choropleth with a GeoJSON variable which doesn't specify an id

The geo_data parameter is set to the newly created usmap_json_no_id variable and the data parameter is set to the all_states_census_df dataframe. As no id was specified when creating the GeoJSON variable, the key_on parameter must reference a specific key from the geodata, which works like a dictionary (‘GEOID’ is a key inside the ‘properties’ dictionary). In this case the GEOID key holds the state code which connects the state geometric boundary data to the corresponding US Census data in the all_states_census_df dataframe. The choropleth is below:

The resulting choropleth from the above method

Method 2: With Pandas and GeoJSON, and Specifying an ID Column

This process is almost exactly the same as above, except an index will be set prior to calling the .to_json() method.

The usmap_gdf dataframe did not have an index in the above example; to correct this I will set the index to the GEOID column and then immediately call the .to_json() method:

usmap_json_with_id = usmap_gdf.set_index(keys="GEOID").to_json()

The resulting string, up until the first pair of coordinates for the first state’s data, is below:

'{"type": "FeatureCollection",
"features": [{"id": "28",
"type": "Feature",
"properties": {"AFFGEOID": "0400000US28",
"ALAND": 121533519481,
"AWATER": 3926919758,
"LSAD": "00",
"NAME": "Mississippi",
"STATEFP": "28",
"STATENS": "01779790",
"STUSPS": "MS"},
"geometry": {"type": "MultiPolygon",
"coordinates": [[[[-88.502966, 30.215235],'

The “properties” dictionary no longer has the GEOID key because it is now stored as a new key called id in the outer dictionary. You should also note that the id value is now a string instead of an integer. As mentioned previously, you will have to make sure that the data types of the connecting data are consistent. This can become tedious if leading and trailing zeroes are involved. To fix this issue I create a new column called state_str from the state column in the all_states_census_df:

all_states_census_df["state_str"] = all_states_census_df["state"].astype("str")

Now we can create the choropleth:

The code to create a choropleth with a GeoJSON variable which specifies an id

The difference between this code and the code used previously is that the key_on parameter references id and not properties.GEOID. The resulting map is exactly the same as in method 1:

The resulting choropleth from the above method
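Because a mismatch between the feature ids and the data keys tends to show up only as an unexpectedly blank map, a quick check before plotting can help. The sketch below is one possible check using only the standard library; unmatched_ids is a hypothetical helper, and the two-feature GeoJSON string (with zero-padded ids, as STATEFP would give) is a made-up stand-in for usmap_json_with_id.

```python
import json


def unmatched_ids(geojson_text, data_keys):
    """Return the feature ids with no exact, type-sensitive match in data_keys.

    Hypothetical helper for illustration only.
    """
    features = json.loads(geojson_text)["features"]
    keys = set(data_keys)
    return [feature["id"] for feature in features if feature["id"] not in keys]


# Made-up stand-in for a GeoJSON string whose features carry string ids.
geojson_text = json.dumps({
    "type": "FeatureCollection",
    "features": [
        {"id": "28", "type": "Feature", "properties": {}, "geometry": None},
        {"id": "06", "type": "Feature", "properties": {}, "geometry": None},
    ],
})

print(unmatched_ids(geojson_text, [28, 6]))        # ['28', '06']: integers never equal strings
print(unmatched_ids(geojson_text, ["28", "6"]))    # ['06']: the leading zero is missing
print(unmatched_ids(geojson_text, ["28", "06"]))   # []: every feature has a match
```

An empty list means every boundary has a matching key in the data.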

Method 3: With Pandas and GeoPandas’ Python Feature Collection

This method creates a GeoJSON-like object (a Python feature collection) from the original GeoPandas dataframe with the __geo_interface__ property.

I set the index of the usmap_gdf dataframe (US geographic data) to the STATEFP column, which stores the state ids, with leading zeroes, as a string:

usmap_gdf.set_index("STATEFP", inplace=True)

I then created a matching column in the all_states_census_df dataframe (US Census data) by padding single-digit codes with one leading zero:

all_states_census_df["state_str"] = all_states_census_df["state"].astype("str").apply(lambda x: x.zfill(2))

Finally, I used the __geo_interface__ property of the usmap_gdf GeoPandas dataframe to get a Python feature collection of geometric state boundaries, stored as a dictionary, similar to the ones from the first two methods:

us_geo_json = gpd.GeoSeries(data=usmap_gdf["geometry"]).__geo_interface__

An excerpt of the us_geo_json variable is below:

{'type': 'FeatureCollection',
'features': [{'id': '28',
'type': 'Feature',
'properties': {},
'geometry': {'type': 'MultiPolygon',
'coordinates': [(((-88.502966, 30.215235), ...))]

Finally, we create the choropleth:

The code to create a choropleth with GeoPandas' __geo_interface__ property

The map looks the same as the ones from above, so I excluded it.
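To see the shape of the feature collection without downloading the shapefile, here is a small sketch that builds a GeoSeries from two invented rectangles (stand-ins for real state boundaries) indexed by zero-padded state codes. It assumes geopandas and shapely are installed; the rectangle coordinates are placeholders.

```python
import geopandas as gpd
from shapely.geometry import box

# Invented rectangles standing in for state boundaries, indexed by state code.
geometries = gpd.GeoSeries(
    [box(-91.0, 30.0, -88.0, 35.0), box(-124.0, 32.0, -114.0, 42.0)],
    index=["28", "06"],
)

# A plain Python dictionary laid out like a GeoJSON FeatureCollection.
feature_collection = geometries.__geo_interface__

first = feature_collection["features"][0]
print(feature_collection["type"])
print(first["id"], first["properties"], first["geometry"]["type"])
```

Each feature's id is the index value as a string and properties is empty, so a dictionary like this would presumably be bound with key_on set to "feature.id" and a string key column such as state_str.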

Method 4: With GeoPandas’ Geometry Type Column

Here we stick to GeoPandas. I created a GeoPandas dataframe called us_data_gdf which combines the geometric data and the census data in one variable:

us_data_gdf = pd.merge(left = usmap_gdf,
                       right = all_states_census_df,
                       how = "left",
                       left_on = ["GEOID", "NAME"],
                       right_on = ["state", "NAME"]
                       )

Note: all_states_census_df is a pandas dataframe of US Census data and usmap_gdf is a GeoPandas dataframe storing state geometric boundary data.

The code to create a choropleth with a GeoPandas dataframe is below:

The code to create a choropleth using a GeoPandas dataframe

In the above example the geo_data parameter and the data parameter both reference the same GeoPandas dataframe, as the information is stored in one place. As I did not set an index, the key_on parameter equals “feature.properties.GEOID”. Even with GeoPandas, folium requires the key_on parameter to act as if it is referencing a dictionary-like object.

As before, the map looks the same as the ones from above, so I excluded it.
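One way to confirm that a merge like this actually matched every boundary is the indicator option of pd.merge. The sketch below uses plain pandas dataframes with made-up rows as stand-ins for usmap_gdf and all_states_census_df, and it assumes the GEOID codes are zero-padded strings while the census codes are integers, so a string key column is built first.

```python
import pandas as pd

# Made-up stand-ins: boundary table with string codes, census table with integers.
geo_df = pd.DataFrame({
    "GEOID": ["28", "06", "72"],
    "NAME": ["Mississippi", "California", "Puerto Rico"],
})
census_df = pd.DataFrame({
    "state": [28, 6],
    "NAME": ["Mississippi", "California"],
    "Total_Pop_2021": [100, 200],  # placeholder values
})

# Build a key with the same type and format as GEOID before merging.
census_df["state_str"] = census_df["state"].astype(str).str.zfill(2)

merged = pd.merge(
    left=geo_df,
    right=census_df,
    how="left",
    left_on=["GEOID", "NAME"],
    right_on=["state_str", "NAME"],
    indicator=True,  # adds a _merge column: 'both' or 'left_only'
)

# Boundaries that found no census row would be drawn without a value.
unmatched = merged.loc[merged["_merge"] != "both", "GEOID"].tolist()
print(unmatched)  # ['72']
```

Rows flagged left_only are the ones that would end up with a null population after the merge.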

Method 5: With GeoPandas Geometry Type and Branca

Here we create a more stylish map using the Branca library and folium’s examples with it. The first step with Branca, aside from installing it, is to create a ColorMap object:

colormap = branca.colormap.LinearColormap(
    vmin=us_data_gdf["Total_Pop_2021"].quantile(0.0),
    vmax=us_data_gdf["Total_Pop_2021"].quantile(1),
    colors=["red", "orange", "lightblue", "green", "darkgreen"],
    caption="Total Population By State",
)

In the above code we access the branca.colormap.LinearColormap class. Here we can set the colors we use and what values to use for the color scale. For this choropleth I want the colors to scale proportionally to the lowest and highest population values in the US Census data. To set these values I use the vmin and vmax parameters as above. If I forget to do this then the areas with no values will be considered in the color scale; the results without these parameters set are below:

A Branca choropleth without the vmin and vmax parameters set

Once the ColorMap object is created we can create a choropleth (the full code is below):

Making a choropleth with a GeoPandas dataframe and the Branca library

I adapted the examples on folium’s website to use the us_data_gdf GeoPandas dataframe. The example allows us to exclude portions of the geographic data which do not have associated census data, so that they appear transparent (if the population for a state was null then its color on the choropleth would be black unless it was excluded). The resulting choropleth is below:

A choropleth made with Branca and GeoPandas

Branca is customizable but the explanations of how to use it are few and far between. The ReadMe for its repository states:

There’s no documentation, but you can browse the examples gallery.

You have to practice using it to make the kind of map you want.

Folium can be used to make informative maps, like choropleths, for those with and without coding knowledge. Government websites often have the geographic data necessary to create location boundaries for your data, which can also be obtained from government sites. It is important to understand your datatypes and filetypes, as misunderstanding them can lead to unnecessary frustration. These maps are highly customizable; for instance, you can add tooltips to annotate your map. It takes practice to make use of this library’s full potential.

My repository for this article can be found here. Happy coding.
