The bblocks package

bblocks is a Python package with tools to download and analyse development data. These tools are meant to be the building blocks of further analysis.

We have built bblocks to support our work at ONE, but we hope that it will be useful to others working with development data. We welcome feedback, feature requests, and collaboration.

bblocks is organised around the following main features:

Import tools to help with importing data from:
- The World Bank (building on the wbgapi package)
- The IMF World Economic Outlook (building on the weo package)
- The IMF data on Special Drawing Rights
- The World Food Programme (WFP) data on food security and inflation
- The FAO (notably the price index)
- The UNDP Human Development Report data
- UNAIDS
- The WHO Government Health Expenditure data
Cleaning tools to help with:
- Cleaning numbers/numeric series
- Transforming country identifiers (ISO2, ISO3, WB, UN, etc., building on the country_converter package)
- Transforming text to datetime objects, and datetime objects to text
- Formatting numbers as text (percentages, millions, billions, etc.)
Analysis tools to help with:
- Calculating period averages
- Calculating the change from one period to another
DataFrame tools to help with:
- Adding a population column to a DataFrame
- Adding a “share of population” / “per capita” column to a DataFrame
- Adding a population density column to a DataFrame
- Adding a GDP column to a DataFrame
- Adding a “share of GDP” column to a DataFrame
- Adding a poverty ratio column to a DataFrame
- Adding a government expenditure column to a DataFrame
- Adding a “share of government expenditure” column to a DataFrame
- Adding a “World Bank income level” column to a DataFrame
- Adding a column with short country names to a DataFrame
- Adding a column with ISO3 codes to a DataFrame
- Adding the median observation of a group
- Adding a column with geojson geometries to a DataFrame
Other tools like:
- Dictionaries mapping ISO3 codes to the following (and vice versa):
  - OECD DAC codes
  - WB income groups
  - geojson geometries
  - G7, EU27, G20 countries
  - Income levels
  - Life expectancy
  - Population

More information is available:

Documentation: https://bblocks.readthedocs.io/
GitHub: https://github.com/ONECampaign/bblocks
PyPI: https://pypi.org/project/bblocks/

Installation

bblocks can be installed from PyPI using pip:

pip install bblocks --upgrade

The package is compatible with Python 3.10 and above.

Basic usage

To get started, import the package. It is strongly recommended that you specify the path to the folder where you want to store the data.

You only have to do this once per file/notebook.

from bblocks import set_bblocks_data_path

# Set to the folder you want
set_bblocks_data_path("path/to/data/folder")

All the examples below assume that you have done this.
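
If the data folder does not exist yet, you may want to create it before passing it to set_bblocks_data_path. The following is a minimal sketch using only the standard library. `ensure_data_folder` is a hypothetical helper written for this example, not part of bblocks, and the sketch makes no assumption about whether bblocks creates missing folders itself.

```python
from pathlib import Path
import tempfile


def ensure_data_folder(folder: Path) -> Path:
    """Create the folder (and any missing parents) and return its absolute path."""
    path = Path(folder).expanduser().resolve()
    path.mkdir(parents=True, exist_ok=True)
    return path


# Demonstration in a temporary directory so the sketch leaves no files behind.
with tempfile.TemporaryDirectory() as tmp:
    data_folder = ensure_data_folder(Path(tmp) / "bblocks_data")
    folder_exists = data_folder.is_dir()
    # With bblocks installed, the path could then be passed on:
    # set_bblocks_data_path(str(data_folder))
```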

Importing data from the World Bank

from bblocks import WorldBankData

# create a WorldBankData object. This object will allow you
# to download indicators from the World Bank and get them as DataFrames
wb = WorldBankData()

# For example, to get "primary completion rate" (SE.PRM.CMPT.ZS) from 2010 to 2020:
# If the data is not already in your data folder, it will be downloaded
wb.load_data(
    indicator="SE.PRM.CMPT.ZS",
    start_year=2010,
    end_year=2020
)

# Get the data as a DataFrame
df = wb.get_data()

# Print a sample of 10 rows
print(df.sample(10))

The above would return a DataFrame like this:

date	iso_code	indicator_code	value
2010-01-01	LMC	SE.PRM.CMPT.ZS	87.753189
2012-01-01	SWZ	SE.PRM.CMPT.ZS	84.697472
2013-01-01	NAM	SE.PRM.CMPT.ZS	93.020042
2012-01-01	PAK	SE.PRM.CMPT.ZS	63.486210
2015-01-01	LIC	SE.PRM.CMPT.ZS	63.463470
2016-01-01	BGD	SE.PRM.CMPT.ZS	NaN
2019-01-01	SYR	SE.PRM.CMPT.ZS	NaN
2013-01-01	NAC	SE.PRM.CMPT.ZS	99.025703
2011-01-01	AND	SE.PRM.CMPT.ZS	NaN
2013-01-01	GRL	SE.PRM.CMPT.ZS	NaN
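
Note that the sample contains missing values (NaN) as well as World Bank aggregates such as LMC, LIC and NAC alongside individual countries. The sketch below shows one way to filter both out, written in plain Python so it runs without any dependencies. The rows are copied from the sample above, and the `AGGREGATES` set is hand-written for this example and is not a complete list. On a pandas DataFrame, missing values can be dropped with `df.dropna(subset=["value"])`.

```python
import math

# Rows copied from the sample output above: (date, iso_code, value).
rows = [
    ("2010-01-01", "LMC", 87.753189),
    ("2012-01-01", "SWZ", 84.697472),
    ("2013-01-01", "NAM", 93.020042),
    ("2012-01-01", "PAK", 63.486210),
    ("2016-01-01", "BGD", math.nan),
    ("2019-01-01", "SYR", math.nan),
]

# Hand-written for this example: only the aggregate codes seen in the sample.
AGGREGATES = {"LMC", "LIC", "NAC"}

# Keep rows that refer to a single country and have an actual value.
countries_with_data = [
    (date, iso_code, value)
    for date, iso_code, value in rows
    if iso_code not in AGGREGATES and not math.isnan(value)
]

for date, iso_code, value in countries_with_data:
    print(date[:4], iso_code, round(value, 1))
```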

You can also get the latest data (most recent non-empty observation) for one or more indicators:

from bblocks import WorldBankData

# create a WorldBankData object.
wb_data = WorldBankData()

# Load the indicators. If they have not been downloaded yet, they will be downloaded
wb_data.load_data(
    indicator=["SH.XPD.CHEX.PC.CD", "SH.XPD.CHEX.GD.ZS"],
    most_recent_only=True
)

# Get the data as a DataFrame
df = wb_data.get_data(indicators="all")

# Print a sample of the data
print(df.sample(10))

This would return a DataFrame like this:

date	iso_code	indicator_code	value
2019-01-01	HRV	SH.XPD.CHEX.PC.CD	1040.085693
2019-01-01	ERI	SH.XPD.CHEX.GD.ZS	4.458767
2019-01-01	JAM	SH.XPD.CHEX.PC.CD	327.403534
2019-01-01	MYS	SH.XPD.CHEX.PC.CD	436.612030
2019-01-01	BHS	SH.XPD.CHEX.GD.ZS	5.749775
2015-01-01	YEM	SH.XPD.CHEX.PC.CD	73.176743
2019-01-01	PER	SH.XPD.CHEX.PC.CD	370.109955
2019-01-01	IDA	SH.XPD.CHEX.PC.CD	52.076285
2019-01-01	ERI	SH.XPD.CHEX.PC.CD	25.267935
2019-01-01	WLD	SH.XPD.CHEX.PC.CD	1115.008730
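
To make "most recent non-empty observation" concrete, here is a minimal sketch of the selection logic in plain Python. It is an illustration of the idea only, not the bblocks implementation. The country codes and values are made up, and `most_recent_non_empty` is a hypothetical helper.

```python
import math

# Hypothetical observations for one indicator: (iso_code, year, value).
# The codes and values are made up for illustration.
observations = [
    ("AAA", 2017, 10.0),
    ("AAA", 2018, 12.0),
    ("AAA", 2019, math.nan),  # newest year is empty, so 2018 should be kept
    ("BBB", 2018, 5.0),
    ("BBB", 2019, 6.0),
]


def most_recent_non_empty(rows):
    """Return {iso_code: (year, value)} holding the latest non-missing value."""
    latest = {}
    for iso_code, year, value in rows:
        if math.isnan(value):
            continue  # skip empty observations
        if iso_code not in latest or year > latest[iso_code][0]:
            latest[iso_code] = (year, value)
    return latest


latest = most_recent_non_empty(observations)
print(latest)  # {'AAA': (2018, 12.0), 'BBB': (2019, 6.0)}
```

Selecting per country in this way would explain why a sample can mix years, as with the 2015 row for YEM above.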

In all cases, if you have already downloaded the data and want to refresh it, call .update_data() after loading the data:

wb_data.update_data(reload_data=True)

Importing data from UNAIDS

from bblocks import Aids

# create an Aids object. This object will allow you
# to download indicators from UNAIDS and get them as DataFrames
aids = Aids()

# To view all the indicators that can be downloaded using this tool
# you can use the `.available_indicators` property
aids.available_indicators

Here are the first 10 indicators, but over 50 are available:

	indicator	category
0	Trend of new HIV infections	Epidemic transition metrics
1	Trend of AIDS-related deaths	Epidemic transition metrics
2	Incidence:prevalence ratio	Epidemic transition metrics
3	Incidence:mortality ratio	Epidemic transition metrics
4	People living with HIV - All ages	People living with HIV
5	People living with HIV - Children (0-14)	People living with HIV
6	People living with HIV - Adolescents (10-19)	People living with HIV
7	People living with HIV - Young people (15-24)	People living with HIV
8	People living with HIV - Adults (15+)	People living with HIV
9	People living with HIV - Adults (15-49)	People living with HIV

# to load/download indicators, you can use the `.load_data` method
# you can also specify whether to download "country", "region", or "all"
aids.load_data(
    indicator="Trend of AIDS-related deaths",
    area_grouping="region"
)

# get the data as a DataFrame
df = aids.get_data()

# print a sample of 10 rows
print(df.sample(10))

area_name	area_id	year	indicator	dimension	value
Global	03M49WLD	2013	Trend of AIDS-related deaths	All ages estimate	1.061395e+06
Latin America	UNALA	2021	Trend of AIDS-related deaths	All ages estimate	2.916500e+04
Middle East and North Africa	UNAMENA	2018	Trend of AIDS-related deaths	All ages lower estimate	4.089657e+03
Western & Central Europe and North America	UNAWCENA	2019	Trend of AIDS-related deaths	All ages estimate	1.305140e+04
Caribbean	UNACAR	2021	Trend of AIDS-related deaths	All ages lower estimate	4.213485e+03
Middle East and North Africa	UNAMENA	2021	Trend of AIDS-related deaths	All ages upper estimate	6.867407e+03
Western & Central Europe and North America	UNAWCENA	2016	Trend of AIDS-related deaths	All ages upper estimate	1.771698e+04
Western & Central Europe and North America	UNAWCENA	2020	Trend of AIDS-related deaths	All ages upper estimate	1.632782e+04
Eastern Europe and Central Asia	UNAEECA	2017	Trend of AIDS-related deaths	All ages upper estimate	4.553729e+04
Latin America	UNALA	2020	Trend of AIDS-related deaths	All ages upper estimate	4.577862e+04
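
The data comes in long format: the central, lower and upper estimates are separate rows, distinguished by the dimension column. If you prefer one record per area and year, the rows can be pivoted. The sketch below shows the idea in plain Python. The area name and numbers are made up for illustration, and the short column names are our own choice. On a pandas DataFrame, `DataFrame.pivot` serves the same purpose.

```python
from collections import defaultdict

# Hypothetical long-format rows: (area_name, year, dimension, value).
# The area and the numbers are made up for illustration.
rows = [
    ("Region A", 2020, "All ages estimate", 100.0),
    ("Region A", 2020, "All ages lower estimate", 80.0),
    ("Region A", 2020, "All ages upper estimate", 130.0),
    ("Region A", 2021, "All ages estimate", 90.0),
    ("Region A", 2021, "All ages lower estimate", 70.0),
    ("Region A", 2021, "All ages upper estimate", 120.0),
]

# Map each dimension label to a short column name (our own naming).
COLUMN_NAMES = {
    "All ages estimate": "estimate",
    "All ages lower estimate": "lower",
    "All ages upper estimate": "upper",
}

# Build one record per (area, year) with a key for each estimate type.
wide = defaultdict(dict)
for area, year, dimension, value in rows:
    wide[(area, year)][COLUMN_NAMES[dimension]] = value

for (area, year), values in sorted(wide.items()):
    print(area, year, values)
```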

As with other bblocks tools, you can also get multiple indicators at once (see the WorldBank example).

In all cases, if you have already downloaded the data and want to refresh it, call .update_data() after loading the data:

aids.update_data(reload_data=True)

Importing SDR data from the IMF

# Import the SDR object from the sdr module of "import_tools"
from bblocks.import_tools.sdr import SDR

# Create an SDR object
sdr = SDR()

# To view the latest date for which data is available,
# call the `.latest_date()` method
sdr.latest_date()

# To download the latest data
sdr.load_data(date="latest")

# Get the data as a DataFrame. You can request a
# specific indicator with the 'indicator' argument. In this case,
# we'll get holdings (allocations are also available)
df = sdr.get_data(indicator="holdings")

# Print a sample of 10 rows
print(df.sample(10))

entity	indicator	value	date
Samoa	holdings	1.584296e+07	2023-01-31
Iraq	holdings	3.301367e+07	2023-01-31
Lao People's Democratic Republic	holdings	5.870183e+07	2023-01-31
Haiti	holdings	9.169516e+07	2023-01-31
Bahamas, The	holdings	1.245326e+08	2023-01-31
Total	holdings	6.606989e+11	2023-01-31
Libya	holdings	3.187335e+09	2023-01-31
Namibia	holdings	1.783556e+08	2023-01-31
Tajikistan, Republic of	holdings	1.891507e+08	2023-01-31
Malta	holdings	2.499760e+08	2023-01-31
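
Note that the data includes a "Total" row alongside the individual entities, so it should be set aside before summing or ranking. The sketch below computes each entity's share of the total in plain Python, using rows copied from the sample above. It assumes that the "Total" row is the sum across all holders, which the sample alone does not confirm.

```python
# Rows copied from the sample output above: (entity, holdings).
rows = [
    ("Samoa", 1.584296e07),
    ("Iraq", 3.301367e07),
    ("Haiti", 9.169516e07),
    ("Total", 6.606989e11),
    ("Libya", 3.187335e09),
    ("Namibia", 1.783556e08),
]

# Separate the aggregate row from the individual entities.
total = next(value for entity, value in rows if entity == "Total")
entities = [(entity, value) for entity, value in rows if entity != "Total"]

# Share of total holdings, in percent, for each entity in the sample.
shares = {entity: 100 * value / total for entity, value in entities}

# Print the entities from largest to smallest share.
for entity, share in sorted(shares.items(), key=lambda item: -item[1]):
    print(f"{entity}: {share:.3f}%")
```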

In all cases, if you have already downloaded the data and want to refresh it, call .update_data() after loading the data:

sdr.update_data(reload_data=True)

Adding World Bank income levels to a DataFrame

For this example, we will continue using the SDR data as above.

from bblocks import add_income_level_column

# We can add the column by passing the dataframe to the function

df = add_income_level_column(
    df=df,
    id_column="entity",
    id_type="regex",  # so the text can be matched to the right country
)

This adds the income level column:

entity	indicator	value	date	income_level
Montenegro, Republic of	holdings	7.404593e+07	2023-01-31	Upper middle income
Gambia, The	holdings	5.857020e+07	2023-01-31	Low income
Suriname	holdings	1.211070e+08	2023-01-31	Upper middle income
Syrian Arab Republic	holdings	5.636629e+08	2023-01-31	Low income
Iran, Islamic Republic of	holdings	4.976198e+09	2023-01-31	Lower middle income
Uruguay	holdings	6.330507e+08	2023-01-31	High income
South Africa	holdings	4.424154e+09	2023-01-31	Upper middle income
Nigeria	holdings	3.755370e+09	2023-01-31	Lower middle income
Dominican Republic	holdings	4.498683e+08	2023-01-31	Upper middle income
Trinidad and Tobago	holdings	7.722810e+08	2023-01-31	High income

An optional argument can be passed to the function to redownload the income classification data from the World Bank.

df = add_income_level_column(
    df=df,
    id_column="entity",
    id_type="regex",
    update_data=True,
)
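
Once the income level is available, a typical next step is to aggregate by it. The sketch below sums holdings per income level in plain Python, using rows copied from the sample shown earlier in this section. On a pandas DataFrame, the equivalent is `df.groupby("income_level")["value"].sum()`.

```python
from collections import defaultdict

# Rows copied from the sample shown earlier: (entity, holdings, income_level).
rows = [
    ("Montenegro, Republic of", 7.404593e07, "Upper middle income"),
    ("Gambia, The", 5.857020e07, "Low income"),
    ("Suriname", 1.211070e08, "Upper middle income"),
    ("Syrian Arab Republic", 5.636629e08, "Low income"),
    ("Uruguay", 6.330507e08, "High income"),
    ("Nigeria", 3.755370e09, "Lower middle income"),
    ("Trinidad and Tobago", 7.722810e08, "High income"),
]

# Sum the holdings of all entities that share an income level.
totals = defaultdict(float)
for _entity, holdings, income_level in rows:
    totals[income_level] += holdings

for income_level, holdings in sorted(totals.items()):
    print(f"{income_level}: {holdings:,.0f}")
```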

Adding a GDP share column to a DataFrame

For this example, we will work with data on military expenditure downloaded using the World Bank tool.

# First import the function from the `add` module of `dataframe_tools`
from bblocks.dataframe_tools.add import add_gdp_share_column
from bblocks import WorldBankData

# this data is in local currency units
df = WorldBankData().load_data(indicator="MS.MIL.XPND.CN", most_recent_only=True).get_data()

date	iso_code	indicator_code	value
2021-01-01	BDI	MS.MIL.XPND.CN	1.351000e+11
2014-01-01	YEM	MS.MIL.XPND.CN	3.685000e+11
2021-01-01	AFG	MS.MIL.XPND.CN	2.304000e+10
2021-01-01	PER	MS.MIL.XPND.CN	9.086000e+09
2021-01-01	AUS	MS.MIL.XPND.CN	4.229595e+10

# Then call the function, passing the DataFrame and the column name
df = add_gdp_share_column(
    df=df,
    id_column="iso_code",
    id_type="ISO3",
    date_column="date",  # to match the gdp values with the year of the data
    value_column="value",
    decimals=1,
    usd=False,  # since the data is in local currency units
    include_estimates=True,  # to include official data and IMF estimates for GDP
)

print(df.sample(10))

This returns a DataFrame with an extra column, “gdp_share”:

date	iso_code	indicator_code	value	gdp_share
2021-01-01	GIN	MS.MIL.XPND.CN	2.406750e+12	1.5
2014-01-01	ARE	MS.MIL.XPND.CN	8.356800e+10	5.6
2021-01-01	NGA	MS.MIL.XPND.CN	1.783120e+12	1.0
2021-01-01	GNQ	MS.MIL.XPND.CN	9.439700e+10	1.4
2021-01-01	ISL	MS.MIL.XPND.CN	0.000000e+00	0.0
2021-01-01	ESP	MS.MIL.XPND.CN	1.652680e+10	1.4
2021-01-01	BHR	MS.MIL.XPND.CN	5.194000e+08	3.6
2021-01-01	GEO	MS.MIL.XPND.CN	9.723000e+08	1.6
2021-01-01	MDA	MS.MIL.XPND.CN	9.144000e+08	0.4
2013-01-01	LAO	MS.MIL.XPND.CN	1.782500e+11	0.2
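
Judging by the sample, the gdp_share values are percentages of GDP rounded to the requested number of decimals. The arithmetic can be sketched as follows. The `gdp_share` function here is a hypothetical illustration, not the bblocks implementation, and the figures are made up. Both amounts must be in the same currency, which is why usd=False is passed above for data in local currency units.

```python
def gdp_share(value: float, gdp: float, decimals: int = 1) -> float:
    """Express value as a percentage of gdp, rounded to the given decimals."""
    return round(100 * value / gdp, decimals)


# Hypothetical figures in the same local currency (made up for illustration).
spending = 1.5e9
gdp = 1.0e11

share = gdp_share(spending, gdp)
print(share)  # 1.5
```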

Cleaning a numeric series which contains numbers with text

Sometimes DataFrames contain columns in which numbers are mixed with text. For example:

	iso_code	value
0	USA	10%
1	GBR	+12%
2	FRA	13.4%
3	DEU	%14.3
4	ITA	15.3 %
5	ESP	16%
6	CAN	17%
7	JPN	18%
8	AUS	19%
9	CHN	20%

bblocks can help clean that data.

from bblocks import clean_numeric_series

df['value'] = clean_numeric_series(
    data=df['value'],
    to=float  # or if dealing with integers, use to=int
)

This returns a clean version of the data:

	iso_code	value
0	USA	10.0
1	GBR	12.0
2	FRA	13.4
3	DEU	14.3
4	ITA	15.3
5	ESP	16.0
6	CAN	17.0
7	JPN	18.0
8	AUS	19.0
9	CHN	20.0
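
To give an idea of what this kind of cleaning involves, here is a minimal sketch using a regular expression from the standard library. It is not how clean_numeric_series is implemented. `to_float` is a hypothetical helper, and it assumes that commas are thousands separators.

```python
import re

# Matches an optionally signed integer or decimal number.
NUMBER = re.compile(r"[-+]?\d*\.?\d+")


def to_float(text: str) -> float:
    """Extract the first number found in text and return it as a float."""
    # Assumption: commas are thousands separators and can be dropped.
    match = NUMBER.search(text.replace(",", ""))
    if match is None:
        raise ValueError(f"no number found in {text!r}")
    return float(match.group())


# Values taken from the messy example above.
raw = ["10%", "+12%", "13.4%", "%14.3", "15.3 %"]
cleaned = [to_float(item) for item in raw]
print(cleaned)  # [10.0, 12.0, 13.4, 14.3, 15.3]
```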

Contributing

Interested in contributing to the package? Please reach out.