bblocks.dataframe_tools.add

Functions

`__validate_add_column_params`(→ tuple)	Validate the parameters passed to an add-column function
`add_population_column`(→ pandas.DataFrame)	Add population column to a dataframe
`add_poverty_ratio_column`(→ pandas.DataFrame)	Add poverty headcount column to a dataframe
`add_population_density_column`(→ pandas.DataFrame)	Add population density column to a dataframe
`add_gdp_column`(→ pandas.DataFrame)	Add GDP column to a dataframe
`add_gov_expenditure_column`(→ pandas.DataFrame)	Add Government Expenditure column to a dataframe
`add_gdp_share_column`(→ pandas.DataFrame)	Add value as share of GDP column to a dataframe
`add_population_share_column`(→ pandas.DataFrame)	Add population share column to a dataframe
`add_gov_exp_share_column`(→ pandas.DataFrame)	Add value as share of Government Expenditure column to a dataframe
`add_income_level_column`(→ pandas.DataFrame)	Add an income level column to a dataframe
`add_short_names_column`(→ pandas.DataFrame)	Add short names column to a dataframe
`add_iso_codes_column`(→ pandas.DataFrame)	Add ISO3 column to a dataframe
`add_median_observation`(→ pandas.DataFrame)	Add median observation column to a dataframe
`add_flourish_geometries`(→ pandas.DataFrame)	Add Flourish geometries column to a dataframe
`add_value_as_share`(→ pandas.DataFrame)

Module Contents

bblocks.dataframe_tools.add.__validate_add_column_params(*, df: pandas.DataFrame, id_column: str, id_type: str | None, date_column: str | None) → tuple

Validate the parameters passed to an add-column function.

bblocks.dataframe_tools.add.add_population_column(df: pandas.DataFrame, id_column: str, id_type: str | None = None, date_column: str | None = None, target_column: str = 'population', update_data: bool = False) → pandas.DataFrame

Add population column to a dataframe

Parameters:

df – the dataframe to which the column will be added
id_column – the column containing the name, ISO3, ISO2, DACcode, UN code, etc.
id_type – the type of ID used in the id_column. The default ‘regex’ tries to infer the type using the rules from the ‘country_converter’ package. For DAC codes, “DACcode” must be passed.
date_column – Optionally, a date column can be specified. If so, the population for that year will be used. If no population data is available for that year, the value in the returned column will be missing as well. If no date column is specified, the most recent population data from the World Bank is used.
target_column – the column where the population data will be stored.
update_data – whether to update the underlying data or not.

Returns:

the original DataFrame with a new column containing the population data.

Return type:

DataFrame
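The year-matching behaviour of date_column can be sketched with plain pandas. This is a minimal illustration under stated assumptions, not the bblocks implementation: POPULATION is a hypothetical lookup table with placeholder values, the ids in id_column are assumed to already be ISO3 codes (so no id_type conversion happens), and date_column is assumed to hold datetime-like values.

```python
import pandas as pd

# Hypothetical lookup table; values are placeholders, not World Bank figures.
POPULATION = pd.DataFrame(
    {
        "iso_code": ["KEN", "KEN", "GHA", "GHA"],
        "year": [2020, 2021, 2020, 2021],
        "population": [100.0, 110.0, 50.0, 55.0],
    }
)


def add_population_sketch(df, id_column, date_column=None, target_column="population"):
    """Illustrative sketch only; assumes id_column already holds ISO3 codes."""
    lookup = POPULATION.rename(
        columns={"iso_code": id_column, "year": "_year", "population": target_column}
    )
    if date_column is None:
        # No date column: keep the most recent year available for each country.
        latest = lookup.sort_values("_year").drop_duplicates(id_column, keep="last")
        return df.merge(latest.drop(columns="_year"), on=id_column, how="left")
    # Date column given: match on id and calendar year; unmatched rows stay missing.
    years = pd.to_datetime(df[date_column]).dt.year.astype("int64")
    merged = df.assign(_year=years).merge(lookup, on=[id_column, "_year"], how="left")
    return merged.drop(columns="_year")
```

A left merge keeps every original row in its original order, which is why a year with no population data shows up as a missing value rather than a dropped row.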

bblocks.dataframe_tools.add.add_poverty_ratio_column(df: pandas.DataFrame, id_column: str, id_type: str | None = None, date_column: str | None = None, target_column: str = 'poverty_ratio', update_data: bool = False) → pandas.DataFrame

Add poverty headcount column to a dataframe

Parameters:

df – the dataframe to which the column will be added
id_column – the column containing the name, ISO3, ISO2, DACcode, UN code, etc.
id_type – the type of ID used in the id_column. The default ‘regex’ tries to infer the type using the rules from the ‘country_converter’ package. For DAC codes, “DACcode” must be passed.
date_column – Optionally, a date column can be specified. If so, the poverty headcount ratio for that year will be used. If no data is available for that year, the value in the returned column will be missing as well. If no date column is specified, the most recent data is used.
target_column – the column where the poverty data will be stored.
update_data – whether to update the underlying data or not.

Returns:

the original DataFrame with a new column containing the poverty data.

Return type:

DataFrame

bblocks.dataframe_tools.add.add_population_density_column(df: pandas.DataFrame, id_column: str, id_type: str | None = None, date_column: str | None = None, target_column: str = 'population_density', update_data: bool = False) → pandas.DataFrame

Add population density column to a dataframe

Parameters:

df – the dataframe to which the column will be added
id_column – the column containing the name, ISO3, ISO2, DACcode, UN code, etc.
id_type – the type of ID used in the id_column. The default ‘regex’ tries to infer the type using the rules from the ‘country_converter’ package. For DAC codes, “DACcode” must be passed.
date_column – Optionally, a date column can be specified. If so, the population density for that year will be used. If no data is available for that year, the value in the returned column will be missing as well. If no date column is specified, the most recent data is used.
target_column – the column where the population density data will be stored.
update_data – whether to update the underlying data or not.

Returns:

the original DataFrame with a new column containing the population density data.

Return type:

DataFrame

bblocks.dataframe_tools.add.add_gdp_column(df: pandas.DataFrame, id_column: str, id_type: str | None = None, date_column: str | None = None, target_column: str = 'gdp', usd: bool = True, include_estimates: bool = False, update_data: bool = False) → pandas.DataFrame

Add GDP column to a dataframe

Parameters:

df – the dataframe to which the column will be added
id_column – the column containing the name, ISO3, ISO2, DACcode, UN code, etc.
id_type – the type of ID used in the id_column. The default ‘regex’ tries to infer the type using the rules from the ‘country_converter’ package. For DAC codes, “DACcode” must be passed.
date_column – Optionally, a date column can be specified. If so, the GDP for that year will be used. If no GDP data is available for that year, the value in the returned column will be missing as well. If no date column is specified, the most recent data is used.
include_estimates – Whether to include years for which the WEO data is labelled as estimates.
usd – Whether to add the data in US dollars or in local currency units.
target_column – the column where the GDP data will be stored.
update_data – whether to update the underlying data or not.

Returns:

the original DataFrame with a new column containing GDP data from the IMF World Economic Outlook.

Return type:

DataFrame
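How usd and include_estimates could interact when no date_column is given is sketched below: rows flagged as estimates are dropped unless requested, the currency column is chosen, and the latest remaining year is kept. The WEO table, its column names and the estimate flag are hypothetical stand-ins for the IMF World Economic Outlook data, not its real layout.

```python
import pandas as pd

# Hypothetical WEO-style extract; column names and values are illustrative assumptions.
WEO = pd.DataFrame(
    {
        "iso_code": ["KEN", "KEN", "KEN"],
        "year": [2021, 2022, 2023],
        "gdp_usd": [110.0, 115.0, 120.0],
        "gdp_lcu": [12000.0, 13500.0, 15000.0],
        "estimate": [False, False, True],
    }
)


def latest_gdp_sketch(usd=True, include_estimates=False):
    """Return the most recent GDP value per country (illustrative only)."""
    # Drop years labelled as estimates unless the caller asks for them.
    data = WEO if include_estimates else WEO.loc[~WEO["estimate"]]
    # Pick US dollars or local currency units.
    value_col = "gdp_usd" if usd else "gdp_lcu"
    latest = data.sort_values("year").drop_duplicates("iso_code", keep="last")
    return latest.set_index("iso_code")[value_col]
```

With the defaults the 2023 estimate is ignored, so the latest value comes from 2022; passing include_estimates=True moves it forward to 2023.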

bblocks.dataframe_tools.add.add_gov_expenditure_column(df: pandas.DataFrame, id_column: str, id_type: str | None = None, date_column: str | None = None, target_column: str = 'gov_exp', usd: bool = True, include_estimates: bool = False, update_data: bool = False) → pandas.DataFrame

Add Government Expenditure column to a dataframe

Parameters:

df – the dataframe to which the column will be added
id_column – the column containing the name, ISO3, ISO2, DACcode, UN code, etc.
id_type – the type of ID used in the id_column. The default ‘regex’ tries to infer the type using the rules from the ‘country_converter’ package. For DAC codes, “DACcode” must be passed.
date_column – Optionally, a date column can be specified. If so, the expenditure for that year will be used. If no expenditure data is available for that year, the value in the returned column will be missing as well. If no date column is specified, the most recent data is used.
include_estimates – Whether to include years for which the WEO data is labelled as estimates.
usd – Whether to add the data in US dollars or in local currency units.
target_column – the column where the expenditure data will be stored.
update_data – whether to update the underlying data or not.

Returns:

the original DataFrame with a new column containing government expenditure data from the IMF World Economic Outlook.

Return type:

DataFrame

bblocks.dataframe_tools.add.add_gdp_share_column(df: pandas.DataFrame, id_column: str, id_type: str | None = None, date_column: str | None = None, value_column: str = 'value', target_column: str = 'gdp_share', decimals: int = 2, usd: bool = False, include_estimates: bool = False, update_data: bool = False) → pandas.DataFrame

Add value as share of GDP column to a dataframe

Parameters:

df – the dataframe to which the column will be added
id_column – the column containing the name, ISO3, ISO2, DACcode, UN code, etc.
id_type – the type of ID used in the id_column. The default ‘regex’ tries to infer the type using the rules from the ‘country_converter’ package. For DAC codes, “DACcode” must be passed.
date_column – Optionally, a date column can be specified. If so, the GDP for that year will be used. If no GDP data is available for that year, the value in the returned column will be missing as well. If no date column is specified, the most recent data is used.
value_column – the column containing the value to be converted to a share of GDP.
decimals – the number of decimal places to round the returned column to.
include_estimates – Whether to include years for which the WEO data is labelled as estimates.
usd – Whether to use GDP data in US dollars or in local currency units.
target_column – the column where the GDP share will be stored.
update_data – whether to update the underlying data or not.

Returns:

the original DataFrame with a new column containing the values as a share of GDP, using GDP data from the IMF World Economic Outlook.

Return type:

DataFrame
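Once the denominator is in the dataframe, the share arithmetic behind this function (and the analogous population and government-expenditure share functions) can be sketched in a few lines. Two assumptions apply: the share is expressed as a percentage, since the descriptions above do not state the scale, and the value and GDP columns are already in the same currency, which is what the usd parameter is there to control. add_share_sketch is a hypothetical helper, not part of bblocks.

```python
import pandas as pd


def add_share_sketch(df, value_column, denominator_column, target_column, decimals=2):
    """Hypothetical helper: value_column as a percentage of denominator_column."""
    share = 100 * df[value_column] / df[denominator_column]
    # assign returns a new DataFrame, leaving the input unchanged.
    return df.assign(**{target_column: share.round(decimals)})


example = pd.DataFrame({"iso_code": ["KEN", "GHA"], "value": [1.5, 2.0], "gdp": [100.0, 300.0]})
result = add_share_sketch(example, "value", "gdp", "gdp_share")
```

Because pandas division propagates missing values, rows whose GDP could not be matched simply get a missing share instead of raising an error.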

bblocks.dataframe_tools.add.add_population_share_column(df: pandas.DataFrame, id_column: str, id_type: str | None = None, date_column: str | None = None, value_column: str = 'value', target_column: str = 'population_share', decimals: int = 2, update_data: bool = False) → pandas.DataFrame

Add population share column to a dataframe

Parameters:

df – the dataframe to which the column will be added
id_column – the column containing the name, ISO3, ISO2, DACcode, UN code, etc.
id_type – the type of ID used in the id_column. The default ‘regex’ tries to infer the type using the rules from the ‘country_converter’ package. For DAC codes, “DACcode” must be passed.
date_column – Optionally, a date column can be specified. If so, the population for that year will be used. If no population data is available for that year, the value in the returned column will be missing as well. If no date column is specified, the most recent population data from the World Bank is used.
value_column – the column containing the value to be used in the calculation.
target_column – the column where the population share will be stored.
decimals – the number of decimal places to round the returned column to.
update_data – whether to update the underlying data or not.

Returns:

the original DataFrame with a new column containing the value as a share of population.

Return type:

DataFrame

bblocks.dataframe_tools.add.add_gov_exp_share_column(df: pandas.DataFrame, id_column: str, id_type: str | None = None, date_column: str | None = None, value_column: str = 'value', target_column: str = 'gov_exp_share', usd: bool = False, include_estimates: bool = False, update_data: bool = False) → pandas.DataFrame

Add value as share of Government Expenditure column to a dataframe

Parameters:

df – the dataframe to which the column will be added
id_column – the column containing the name, ISO3, ISO2, DACcode, UN code, etc.
id_type – the type of ID used in the id_column. The default ‘regex’ tries to infer the type using the rules from the ‘country_converter’ package. For DAC codes, “DACcode” must be passed.
date_column – Optionally, a date column can be specified. If so, the expenditure data for that year will be used. If no expenditure data is available for that year, the value in the returned column will be missing as well. If no date column is specified, the most recent data is used.
value_column – the column containing the value to be converted to a share of expenditure.
include_estimates – Whether to include years for which the WEO data is labelled as estimates.
usd – Whether to use expenditure data in US dollars or in local currency units.
target_column – the column where the expenditure share will be stored.
update_data – whether to update the underlying data or not.

Returns:

the original DataFrame with a new column containing the values as a share of government expenditure, using data from the IMF World Economic Outlook.

Return type:

DataFrame

bblocks.dataframe_tools.add.add_income_level_column(df: pandas.DataFrame, id_column: str, id_type: str | None = None, target_column: str = 'income_level', update_data: bool = False) → pandas.DataFrame

Add an income level column to a dataframe

Parameters:

df – the dataframe to which the column will be added
id_column – the column containing the name, ISO3, ISO2, DACcode, UN code, etc.
id_type – the type of ID used in the id_column. The default ‘regex’ tries to infer the type using the rules from the ‘country_converter’ package. For DAC codes, “DACcode” must be passed.
target_column – the column where the income level data will be stored.
update_data – whether to update the underlying data or not.

Returns:

the original DataFrame with a new column containing the income level data.

Return type:

DataFrame
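Adding a categorical column such as the income level amounts to a lookup on the id column. In this sketch INCOME_LEVELS is a hypothetical table with placeholder classifications, and ids not found in it come back as missing values. The real function presumably converts ids according to id_type before the lookup; that step is omitted here.

```python
import pandas as pd

# Hypothetical classification table with placeholder values.
INCOME_LEVELS = {"KEN": "Lower middle income", "DEU": "High income"}


def add_income_level_sketch(df, id_column, target_column="income_level"):
    """Map each ID to its income group; IDs absent from the table become NaN."""
    return df.assign(**{target_column: df[id_column].map(INCOME_LEVELS)})
```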

bblocks.dataframe_tools.add.add_short_names_column(df: pandas.DataFrame, id_column: str, id_type: str | None = None, target_column: str = 'name_short') → pandas.DataFrame

Add short names column to a dataframe

Parameters:

df – the dataframe to which the column will be added
id_column – the column containing the name, ISO3, ISO2, DAC code, UN code, etc.
id_type – the type of ID used in the id_column. The default ‘regex’ tries to infer the type using the rules from the ‘country_converter’ package. For DAC codes, “DAC” must be passed.
target_column – the column where the short names will be stored.

Returns:

the original DataFrame with a new column containing short names.

Return type:

DataFrame

bblocks.dataframe_tools.add.add_iso_codes_column(df: pandas.DataFrame, id_column: str, id_type: str | None = None, target_column: str = 'iso_code') → pandas.DataFrame

Add ISO3 column to a dataframe

Parameters:

df – the dataframe to which the column will be added
id_column – the column containing the name, ISO3, ISO2, DAC code, UN code, etc.
id_type – the type of ID used in the id_column. The default ‘regex’ tries to infer the type using the rules from the ‘country_converter’ package. For DAC codes, “DAC” must be passed.
target_column – the column where the ISO3 codes will be stored.

Returns:

the original DataFrame with a new column containing ISO3 codes.

Return type:

DataFrame

bblocks.dataframe_tools.add.add_median_observation(df: pandas.DataFrame, group_by: str | list = None, value_columns: str | list[str] = 'value', append: bool = True, group_name: str | None = None) → pandas.DataFrame

Add median observation column to a dataframe

Parameters:

df – the dataframe to which the column will be added
group_by – the column(s) by which to group the data to calculate the median.
value_columns – the column(s) containing the values to be used for the median.
append – if True, the median observations will be appended to the dataframe as new rows. If False, they will be stored in a new column.
group_name – the name of the group, used either in the id column or as the name of the column containing the median observations.

Returns:

the original dataframe with added rows for the median (if append is True) or a new column containing the median observations (if append is False).

Return type:

DataFrame
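The two modes of append can be sketched with groupby. This is a simplified illustration, not the bblocks implementation: it handles a single value column and takes a hypothetical id_column argument (add_median_observation itself has no such parameter) so the appended rows can be labelled with group_name.

```python
import pandas as pd


def add_median_sketch(df, group_by, id_column, value_column="value", append=True, group_name="median"):
    """Illustrative only; id_column is a hypothetical parameter used to label appended rows."""
    if append:
        # One median row per group, labelled with group_name in the id column.
        medians = df.groupby(group_by, as_index=False)[value_column].median()
        medians[id_column] = group_name
        return pd.concat([df, medians], ignore_index=True)
    # Otherwise broadcast each group's median back onto its rows as a new column.
    return df.assign(**{group_name: df.groupby(group_by)[value_column].transform("median")})


data = pd.DataFrame(
    {
        "year": [2020, 2020, 2020, 2021, 2021],
        "iso_code": ["KEN", "GHA", "NGA", "KEN", "GHA"],
        "value": [1.0, 3.0, 10.0, 4.0, 6.0],
    }
)
appended = add_median_sketch(data, "year", "iso_code")
as_column = add_median_sketch(data, "year", "iso_code", append=False)
```

Appending yields one extra row per year (medians 3.0 and 5.0 here), while append=False keeps the row count and repeats each group's median on every row of that group.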

bblocks.dataframe_tools.add.add_flourish_geometries(df: pandas.DataFrame, id_column: str, id_type: str | None = None, target_column: str = 'geometry') → pandas.DataFrame

Add Flourish geometries column to a dataframe

Parameters:

df – the dataframe to which the column will be added
id_column – the column containing the name, ISO3, ISO2, DAC code, UN code, etc.
id_type – the type of ID used in the id_column. The default ‘regex’ tries to infer the type using the rules from the ‘country_converter’ package. For DAC codes, “DAC” must be passed.
target_column – the column where the Flourish geometries will be stored.

Returns:

the original DataFrame with a new column containing the Flourish geometries.

Return type:

DataFrame

bblocks.dataframe_tools.add.add_value_as_share(df: pandas.DataFrame, value_col: str, share_of_value_col: str, target_col: str | None = None, decimals: int = 2) → pandas.DataFrame