bblocks.dataframe_tools.add

Functions

__validate_add_column_params(→ tuple)

Validate the parameters used by the add-column functions

add_population_column(→ pandas.DataFrame)

Add population column to a dataframe

add_poverty_ratio_column(→ pandas.DataFrame)

Add poverty headcount column to a dataframe

add_population_density_column(→ pandas.DataFrame)

Add population density column to a dataframe

add_gdp_column(→ pandas.DataFrame)

Add GDP column to a dataframe

add_gov_expenditure_column(→ pandas.DataFrame)

Add Government Expenditure column to a dataframe

add_gdp_share_column(→ pandas.DataFrame)

Add value as share of GDP column to a dataframe

add_population_share_column(→ pandas.DataFrame)

Add population share column to a dataframe

add_gov_exp_share_column(→ pandas.DataFrame)

Add value as share of Government Expenditure column to a dataframe

add_income_level_column(→ pandas.DataFrame)

Add an income levels column to a dataframe

add_short_names_column(→ pandas.DataFrame)

Add short names column to a dataframe

add_iso_codes_column(→ pandas.DataFrame)

Add ISO3 column to a dataframe

add_median_observation(→ pandas.DataFrame)

Add median observation column to a dataframe

add_flourish_geometries(→ pandas.DataFrame)

Add flourish geometries column to a dataframe

add_value_as_share(→ pandas.DataFrame)

Module Contents

bblocks.dataframe_tools.add.__validate_add_column_params(*, df: pandas.DataFrame, id_column: str, id_type: str | None, date_column: str | None) tuple

Validate the parameters used by the add-column functions

bblocks.dataframe_tools.add.add_population_column(df: pandas.DataFrame, id_column: str, id_type: str | None = None, date_column: str | None = None, target_column: str = 'population', update_data: bool = False) pandas.DataFrame

Add population column to a dataframe

Parameters:
  • df – the dataframe to which the column will be added

  • id_column – the column containing the name, ISO3, ISO2, DACcode, UN code, etc.

  • id_type – the type of ID used in the id_column. The default ‘regex’ tries to infer using the rules from the ‘country_converter’ package. For the DAC codes, “DACcode” must be passed.

  • date_column – Optionally, a date column can be specified. If so, the population for that year will be used. If the population for that year is missing, it will be missing in the returned column as well. If the date isn’t specified, the most recent population data from the World Bank is used.

  • target_column – the column where the population data will be stored.

  • update_data – whether to update the underlying data or not.

Returns:

the original DataFrame with a new column containing the population data.

Return type:

DataFrame
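
A rough usage sketch based on the signature above. The input frame, its column names, and the `with_population` wrapper are illustrative assumptions, and the date column is assumed to hold values pandas can parse as dates. The call sits inside a function because it needs bblocks installed and may download World Bank data on first use.

```python
import pandas as pd

# Hypothetical input: one observation per country and date.
df = pd.DataFrame(
    {
        "country": ["France", "Kenya", "Brazil"],
        "date": pd.to_datetime(["2019-01-01", "2020-01-01", "2021-01-01"]),
        "value": [10.0, 20.0, 30.0],
    }
)


def with_population(data: pd.DataFrame) -> pd.DataFrame:
    """Return `data` with a population column for each row's year.

    Requires bblocks; imported here so the sketch loads without it.
    """
    from bblocks.dataframe_tools.add import add_population_column

    return add_population_column(
        df=data,
        id_column="country",  # id_type is left unset so the type is inferred
        date_column="date",
        target_column="population",
    )
```

Leaving out `date_column` should, per the parameter description, fall back to the most recent population figure for every row.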

bblocks.dataframe_tools.add.add_poverty_ratio_column(df: pandas.DataFrame, id_column: str, id_type: str | None = None, date_column: str | None = None, target_column: str = 'poverty_ratio', update_data: bool = False) pandas.DataFrame

Add poverty headcount column to a dataframe

Parameters:
  • df – the dataframe to which the column will be added

  • id_column – the column containing the name, ISO3, ISO2, DACcode, UN code, etc.

  • id_type – the type of ID used in the id_column. The default ‘regex’ tries to infer using the rules from the ‘country_converter’ package. For the DAC codes, “DACcode” must be passed.

  • date_column – Optionally, a date column can be specified. If so, the poverty ratio for that year will be used. If the poverty ratio for that year is missing, it will be missing in the returned column as well. If the date isn’t specified, the most recent data is used.

  • target_column – the column where the poverty ratio data will be stored.

  • update_data – whether to update the underlying data or not.

Returns:

the original DataFrame with a new column containing the poverty data.

Return type:

DataFrame

bblocks.dataframe_tools.add.add_population_density_column(df: pandas.DataFrame, id_column: str, id_type: str | None = None, date_column: str | None = None, target_column: str = 'population_density', update_data: bool = False) pandas.DataFrame

Add population density column to a dataframe

Parameters:
  • df – the dataframe to which the column will be added

  • id_column – the column containing the name, ISO3, ISO2, DACcode, UN code, etc.

  • id_type – the type of ID used in the id_column. The default ‘regex’ tries to infer using the rules from the ‘country_converter’ package. For the DAC codes, “DACcode” must be passed.

  • date_column – Optionally, a date column can be specified. If so, the population density for that year will be used. If the population density for that year is missing, it will be missing in the returned column as well. If the date isn’t specified, the most recent data is used.

  • target_column – the column where the population density data will be stored.

  • update_data – whether to update the underlying data or not.

Returns:

the original DataFrame with a new column containing the population density data.

Return type:

DataFrame

bblocks.dataframe_tools.add.add_gdp_column(df: pandas.DataFrame, id_column: str, id_type: str | None = None, date_column: str | None = None, target_column: str = 'gdp', usd: bool = True, include_estimates: bool = False, update_data: bool = False) pandas.DataFrame

Add GDP column to a dataframe

Parameters:
  • df – the dataframe to which the column will be added

  • id_column – the column containing the name, ISO3, ISO2, DACcode, UN code, etc.

  • id_type – the type of ID used in the id_column. The default ‘regex’ tries to infer using the rules from the ‘country_converter’ package. For the DAC codes, “DACcode” must be passed.

  • date_column – Optionally, a date column can be specified. If so, the GDP for that year will be used. If the GDP for that year is missing, it will be missing in the returned column as well. If the date isn’t specified, the most recent data is used.

  • include_estimates – Whether to include years for which the WEO data is labelled as estimates.

  • usd – Whether to add the data as US dollars or Local Currency Units.

  • target_column – the column where the GDP data will be stored.

  • update_data – whether to update the underlying data or not.

Returns:

the original DataFrame with a new column containing the GDP data from the IMF World Economic Outlook.

Return type:

DataFrame
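
The `usd` and `include_estimates` flags can be combined as in the sketch below. It is only an illustration of the documented signature: the sample frame, the `gdp_usd` and `gdp_lcu` column names, and the `with_gdp_columns` wrapper are assumptions, and the call requires bblocks plus access to the IMF World Economic Outlook data.

```python
import pandas as pd

# Hypothetical input identified by country name; with no date column,
# the most recent GDP data is used according to the description above.
df = pd.DataFrame({"country": ["France", "Kenya"], "value": [1.0, 2.0]})


def with_gdp_columns(data: pd.DataFrame) -> pd.DataFrame:
    """Add GDP in US dollars and in local currency units as two columns.

    Requires bblocks; imported here so the sketch loads without it.
    """
    from bblocks.dataframe_tools.add import add_gdp_column

    data = add_gdp_column(
        df=data,
        id_column="country",
        target_column="gdp_usd",
        usd=True,
        include_estimates=True,  # also use years the WEO labels as estimates
    )
    return add_gdp_column(
        df=data,
        id_column="country",
        target_column="gdp_lcu",
        usd=False,
        include_estimates=True,
    )
```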

bblocks.dataframe_tools.add.add_gov_expenditure_column(df: pandas.DataFrame, id_column: str, id_type: str | None = None, date_column: str | None = None, target_column: str = 'gov_exp', usd: bool = True, include_estimates: bool = False, update_data: bool = False) pandas.DataFrame

Add Government Expenditure column to a dataframe

Parameters:
  • df – the dataframe to which the column will be added

  • id_column – the column containing the name, ISO3, ISO2, DACcode, UN code, etc.

  • id_type – the type of ID used in the id_column. The default ‘regex’ tries to infer using the rules from the ‘country_converter’ package. For the DAC codes, “DACcode” must be passed.

  • date_column – Optionally, a date column can be specified. If so, the expenditure for that year will be used. If the expenditure for that year is missing, it will be missing in the returned column as well. If the date isn’t specified, the most recent data is used.

  • include_estimates – Whether to include years for which the WEO data is labelled as estimates.

  • usd – Whether to add the data as US dollars or Local Currency Units.

  • target_column – the column where the expenditure data will be stored.

  • update_data – whether to update the underlying data or not.

Returns:

the original DataFrame with a new column containing the expenditure data from the IMF World Economic Outlook.

Return type:

DataFrame

bblocks.dataframe_tools.add.add_gdp_share_column(df: pandas.DataFrame, id_column: str, id_type: str | None = None, date_column: str | None = None, value_column: str = 'value', target_column: str = 'gdp_share', decimals: int = 2, usd: bool = False, include_estimates: bool = False, update_data: bool = False) pandas.DataFrame

Add value as share of GDP column to a dataframe

Parameters:
  • df – the dataframe to which the column will be added

  • id_column – the column containing the name, ISO3, ISO2, DACcode, UN code, etc.

  • id_type – the type of ID used in the id_column. The default ‘regex’ tries to infer using the rules from the ‘country_converter’ package. For the DAC codes, “DACcode” must be passed.

  • date_column – Optionally, a date column can be specified. If so, the GDP for that year will be used. If the GDP for that year is missing, it will be missing in the returned column as well. If the date isn’t specified, the most recent data is used.

  • value_column – the column containing the value to be converted to a share of GDP.

  • decimals – the number of decimals to use in the returned column.

  • include_estimates – Whether to include years for which the WEO data is labelled as estimates.

  • usd – Whether the GDP data used is in US dollars or Local Currency Units.

  • target_column – the column where the share of GDP will be stored.

  • update_data – whether to update the underlying data or not.

Returns:

the original DataFrame with a new column containing the value as a share of GDP, using data from the IMF World Economic Outlook.

Return type:

DataFrame

bblocks.dataframe_tools.add.add_population_share_column(df: pandas.DataFrame, id_column: str, id_type: str | None = None, date_column: str | None = None, value_column: str = 'value', target_column: str = 'population_share', decimals: int = 2, update_data: bool = False) pandas.DataFrame

Add population share column to a dataframe

Parameters:
  • df – the dataframe to which the column will be added

  • id_column – the column containing the name, ISO3, ISO2, DACcode, UN code, etc.

  • id_type – the type of ID used in the id_column. The default ‘regex’ tries to infer using the rules from the ‘country_converter’ package. For the DAC codes, “DACcode” must be passed.

  • date_column – Optionally, a date column can be specified. If so, the population for that year will be used. If the population for that year is missing, it will be missing in the returned column as well. If the date isn’t specified, the most recent population data from the World Bank is used.

  • value_column – the column containing the value to be used in the calculation.

  • target_column – the column where the population share will be stored.

  • decimals – the number of decimals to use in the returned column.

  • update_data – whether to update the underlying data or not.

Returns:

the original DataFrame with a new column containing value as share of population.

Return type:

DataFrame

bblocks.dataframe_tools.add.add_gov_exp_share_column(df: pandas.DataFrame, id_column: str, id_type: str | None = None, date_column: str | None = None, value_column: str = 'value', target_column: str = 'gov_exp_share', usd: bool = False, include_estimates: bool = False, update_data: bool = False) pandas.DataFrame

Add value as share of Government Expenditure column to a dataframe

Parameters:
  • df – the dataframe to which the column will be added

  • id_column – the column containing the name, ISO3, ISO2, DACcode, UN code, etc.

  • id_type – the type of ID used in the id_column. The default ‘regex’ tries to infer using the rules from the ‘country_converter’ package. For the DAC codes, “DACcode” must be passed.

  • date_column – Optionally, a date column can be specified. If so, the expenditure data for that year will be used. If the expenditure for that year is missing, it will be missing in the returned column as well. If the date isn’t specified, the most recent data is used.

  • value_column – the column containing the value to be converted to a share of expenditure.

  • include_estimates – Whether to include years for which the WEO data is labelled as estimates.

  • usd – Whether the expenditure data used is in US dollars or Local Currency Units.

  • target_column – the column where the share of expenditure will be stored.

  • update_data – whether to update the underlying data or not.

Returns:

the original DataFrame with a new column containing the value as a share of expenditure, using data from the IMF World Economic Outlook.

Return type:

DataFrame

bblocks.dataframe_tools.add.add_income_level_column(df: pandas.DataFrame, id_column: str, id_type: str | None = None, target_column: str = 'income_level', update_data: bool = False) pandas.DataFrame

Add an income levels column to a dataframe

Parameters:
  • df – the dataframe to which the column will be added

  • id_column – the column containing the name, ISO3, ISO2, DACcode, UN code, etc.

  • id_type – the type of ID used in the id_column. The default ‘regex’ tries to infer using the rules from the ‘country_converter’ package. For the DAC codes, “DACcode” must be passed.

  • target_column – the column where the income level data will be stored.

  • update_data – whether to update the underlying data or not.

Returns:

the original DataFrame with a new column containing the income level data.

Return type:

DataFrame

bblocks.dataframe_tools.add.add_short_names_column(df: pandas.DataFrame, id_column: str, id_type: str | None = None, target_column: str = 'name_short') pandas.DataFrame

Add short names column to a dataframe

Parameters:
  • df – the dataframe to which the column will be added

  • id_column – the column containing the name, ISO3, ISO2, DAC code, UN code, etc.

  • id_type – the type of ID used in the id_column. The default ‘regex’ tries to infer using the rules from the ‘country_converter’ package. For the DAC codes, “DAC” must be passed.

  • target_column – the column where the short names will be stored.

Returns:

the original DataFrame with a new column containing short names.

Return type:

DataFrame

bblocks.dataframe_tools.add.add_iso_codes_column(df: pandas.DataFrame, id_column: str, id_type: str | None = None, target_column: str = 'iso_code') pandas.DataFrame

Add ISO3 column to a dataframe

Parameters:
  • df – the dataframe to which the column will be added

  • id_column – the column containing the name, ISO3, ISO2, DAC code, UN code, etc.

  • id_type – the type of ID used in the id_column. The default ‘regex’ tries to infer using the rules from the ‘country_converter’ package. For the DAC codes, “DAC” must be passed.

  • target_column – the column where the ISO3 codes will be stored.

Returns:

the original DataFrame with a new column containing ISO3 codes.

Return type:

DataFrame

bblocks.dataframe_tools.add.add_median_observation(df: pandas.DataFrame, group_by: str | list = None, value_columns: str | list[str] = 'value', append: bool = True, group_name: str | None = None) pandas.DataFrame

Add median observation column to a dataframe

Parameters:
  • df – the dataframe to which the column will be added

  • group_by – the column(s) by which to group the data to calculate the median.

  • value_columns – the column(s) containing the values to be used for the median.

  • append – if True, the median observation will be appended to the dataframe. If False, the median observation will be stored in a new column.

  • group_name – the name of the group to be used in the id_column or as the name of the column containing the median observations.

Returns:

the original dataframe with added rows for the median (if append is True) or a new column containing the median observations (if append is False).

Return type:

DataFrame
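
The two `append` modes described above can be pictured with plain pandas. This is a minimal sketch of the described behaviour, not the library's implementation: the sample frame, the `"median"` label, and all variable names are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical data: countries grouped into two regions.
df = pd.DataFrame(
    {
        "region": ["A", "A", "A", "B", "B"],
        "country": ["a1", "a2", "a3", "b1", "b2"],
        "value": [1.0, 5.0, 3.0, 10.0, 20.0],
    }
)

# Median of the value column within each group.
medians = df.groupby("region", as_index=False)["value"].median()

# Analogue of append=True: the medians become extra rows, labelled
# with a group name in the identifier column.
median_rows = medians.assign(country="median")
appended = pd.concat([df, median_rows], ignore_index=True)

# Analogue of append=False: the group median sits in a new column
# next to every original observation.
with_column = df.assign(
    median=df.groupby("region")["value"].transform("median")
)
```

In this sketch `appended` gains one row per group, while `with_column` keeps the original row count and repeats each group's median on its rows.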

bblocks.dataframe_tools.add.add_flourish_geometries(df: pandas.DataFrame, id_column: str, id_type: str | None = None, target_column: str = 'geometry') pandas.DataFrame

Add flourish geometries column to a dataframe

Parameters:
  • df – the dataframe to which the column will be added

  • id_column – the column containing the name, ISO3, ISO2, DAC code, UN code, etc.

  • id_type – the type of ID used in the id_column. The default ‘regex’ tries to infer using the rules from the ‘country_converter’ package. For the DAC codes, “DAC” must be passed.

  • target_column – the column where the flourish geometries will be stored.

Returns:

the original DataFrame with a new column containing the flourish geometries.

Return type:

DataFrame

bblocks.dataframe_tools.add.add_value_as_share(df: pandas.DataFrame, value_col: str, share_of_value_col: str, target_col: str | None = None, decimals: int = 2) pandas.DataFrame