bblocks.cleaning_tools.filter

Functions

filter_latest_by(→ pandas.DataFrame)

Return the latest value of one or more columns over a period of time.

_filter_by(→ pandas.DataFrame)

Helper function to filter a DataFrame by membership to a specific grouping.

filter_by_continent(→ pandas.DataFrame)

Filter a DataFrame by continent.

filter_by_un_region(→ pandas.DataFrame)

Filter a DataFrame by UN region (for example, "Western Africa").

filter_african_countries(→ pandas.DataFrame)

Filter a DataFrame to keep only African countries.

filter_eu_countries(→ pandas.DataFrame)

Filter a DataFrame to keep only European Union (EU) member countries.

Module Contents

bblocks.cleaning_tools.filter.filter_latest_by(data: pandas.DataFrame, date_column: str, value_columns: str | list | None = None, group_by: str | list | None = None) → pandas.DataFrame

Return the latest value of one or more columns over a period of time.

Parameters:
  • data – a DataFrame with a date column (datetime or int) and one or more numeric columns

  • date_column – the name of the date (datetime or int) column

  • value_columns – one or more columns for which to return the latest value

  • group_by – optionally, the column(s) to group by when determining the latest value

Returns:

A DataFrame with the latest value of the specified columns
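
The idea can be illustrated in plain pandas. The sketch below is not the library's implementation: it assumes that "latest" means the row with the greatest date_column value within each group_by group, and the helper name latest_by_sketch and the sample data are hypothetical.

```python
import pandas as pd


def latest_by_sketch(data, date_column, value_columns, group_by):
    """Keep, for each group, the row with the most recent date (illustrative only)."""
    # Sort so the most recent observation comes last within each group
    ordered = data.sort_values(date_column)
    # tail(1) keeps the final row of every group, i.e. the latest one
    latest = ordered.groupby(group_by, as_index=False).tail(1)
    return latest[[*group_by, date_column, *value_columns]].reset_index(drop=True)


# Hypothetical sample data: yearly observations for two countries
df = pd.DataFrame(
    {
        "iso_code": ["FRA", "FRA", "KEN", "KEN"],
        "year": [2020, 2021, 2019, 2020],
        "value": [1.0, 2.0, 3.0, 4.0],
    }
)

result = latest_by_sketch(df, "year", ["value"], ["iso_code"])
```

Here result holds one row per iso_code: the 2021 observation for FRA and the 2020 observation for KEN.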

bblocks.cleaning_tools.filter._filter_by(df: pandas.DataFrame, by: str, by_value: str, id_column: str = 'iso_code', id_type: str = 'regex') → pandas.DataFrame

Helper function to filter a DataFrame by membership of a specific grouping. The available groupings are the classification schemes provided by the country_converter package. More information is available at: https://github.com/konstantinstadler/country_converter#classification-schemes

Parameters:
  • df – the DataFrame to filter

  • by – the type of grouping to filter by (e.g. “Continent”, “UNRegion”, “EU”)

  • by_value – the value of the grouping to filter by (e.g. “Africa”, “Europe”, “EU”)

  • id_column – the name of the column to use for the id (default: “iso_code”)

  • id_type – the type of id to use (default: “regex”)

Returns:

A filtered copy of the DataFrame.
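
Membership filtering of this kind can be sketched in plain pandas. The example below is an assumption-laden illustration, not the library's code: the CLASSIFICATION dictionary is a hypothetical stand-in for the classification data that country_converter supplies, the helper name filter_by_sketch is invented, and ids are assumed to be ISO3 codes rather than matched by regex.

```python
import pandas as pd

# Hypothetical lookup standing in for country_converter's classification
# schemes; only a handful of ISO3 codes are included for illustration.
CLASSIFICATION = {
    "Continent": {"KEN": "Africa", "NGA": "Africa", "FRA": "Europe", "NOR": "Europe"},
    "EU": {"FRA": "EU"},
}


def filter_by_sketch(df, by, by_value, id_column="iso_code"):
    """Return a copy of df with only the rows whose id belongs to by_value."""
    # Map each id to its group under the chosen scheme; unknown ids become NaN
    groups = df[id_column].map(CLASSIFICATION[by])
    # Keep matching rows and copy so the caller's DataFrame is not modified
    return df.loc[groups == by_value].copy()


df = pd.DataFrame({"iso_code": ["KEN", "FRA", "NOR"], "value": [1, 2, 3]})

africa = filter_by_sketch(df, "Continent", "Africa")
europe = filter_by_sketch(df, "Continent", "Europe")
eu = filter_by_sketch(df, "EU", "EU")
```

In this sample, NOR appears in europe but not in eu, which shows why filtering by continent and filtering by EU membership give different results.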

bblocks.cleaning_tools.filter.filter_by_continent(df: pandas.DataFrame, continent: str, id_column: str = 'iso_code', id_type: str = 'regex') → pandas.DataFrame

Filter a DataFrame by continent.

Parameters:
  • df – the DataFrame to filter

  • continent – the continent to filter by (e.g. “Africa”, “Europe”, “Asia”)

  • id_column – the name of the column to use for the id (default: “iso_code”)

  • id_type – the type of id to use (default: “regex”)

Returns:

A filtered copy of the DataFrame.

bblocks.cleaning_tools.filter.filter_by_un_region(df: pandas.DataFrame, region: str, id_column: str = 'iso_code', id_type: str = 'regex') → pandas.DataFrame

Filter a DataFrame by UN region. This includes, for example, “Western Africa”, “Eastern Africa”, “Southern Asia”, “Northern America”, “Central America”, “Eastern Asia”.

Parameters:
  • df – the DataFrame to filter

  • region – the region to filter by (e.g. “Western Africa”, “Eastern Africa”, etc.)

  • id_column – the name of the column to use for the id (default: “iso_code”)

  • id_type – the type of id to use (default: “regex”)

Returns:

A filtered copy of the DataFrame.

bblocks.cleaning_tools.filter.filter_african_countries(df: pandas.DataFrame, id_column: str = 'iso_code', id_type: str = 'regex') → pandas.DataFrame

Filter a DataFrame to keep only African countries.

Parameters:
  • df – the DataFrame to filter

  • id_column – the name of the column to use for the id (default: “iso_code”)

  • id_type – the type of id to use (default: “regex”)

Returns:

A filtered copy of the DataFrame.

bblocks.cleaning_tools.filter.filter_eu_countries(df: pandas.DataFrame, id_column: str = 'iso_code', id_type: str = 'regex') → pandas.DataFrame

Filter a DataFrame to keep only European Union (EU) member countries. The current list of EU members is always used.

Parameters:
  • df – the DataFrame to filter

  • id_column – the name of the column to use for the id (default: “iso_code”)

  • id_type – the type of id to use (default: “regex”)

Returns:

A filtered copy of the DataFrame.