bblocks.cleaning_tools.filter
Functions
filter_latest_by | Calculate the latest value of one or more columns over a period of time.
_filter_by | Helper function to filter a DataFrame by membership of a specific grouping.
filter_by_continent | Filter a DataFrame by continent.
filter_by_un_region | Filter a DataFrame by UN region, for example “Western Africa”.
filter_african_countries | Filter a DataFrame to keep only African countries.
filter_eu_countries | Filter a DataFrame to keep only European Union member countries.
Module Contents
- bblocks.cleaning_tools.filter.filter_latest_by(data: pandas.DataFrame, date_column: str, value_columns: str | list | None = None, group_by: str | list | None = None) → pandas.DataFrame
Calculate the latest value of one or more columns over a period of time.
- Parameters:
data – a DataFrame with a date column (datetime or int) and one or more numeric columns
date_column – the name of the date (datetime or int) column
value_columns – one or more columns for which to find the latest value
group_by – optionally, one or more columns to group by; the latest value is then found separately for each group
- Returns:
A DataFrame with the latest value of the specified columns
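To make “latest value” concrete, the following is a minimal pandas sketch of the idea. It is a hypothetical re-implementation named `latest_by`, not the bblocks source; ignoring rows whose value columns are all missing, and defaulting `value_columns` to every column that is neither the date nor a grouping column, are assumptions.

```python
from typing import Optional, Union

import pandas as pd


def latest_by(
    data: pd.DataFrame,
    date_column: str,
    value_columns: Optional[Union[str, list]] = None,
    group_by: Optional[Union[str, list]] = None,
) -> pd.DataFrame:
    """Hypothetical sketch: keep the most recent row (per group, if given)."""
    # Accept a single column name or a list of names
    if isinstance(value_columns, str):
        value_columns = [value_columns]
    if isinstance(group_by, str):
        group_by = [group_by]

    # Assumption: by default, every column that is neither the date nor a
    # grouping column is treated as a value column
    if value_columns is None:
        excluded = {date_column, *(group_by or [])}
        value_columns = [c for c in data.columns if c not in excluded]

    # Assumption: ignore rows where all value columns are missing, then sort
    # chronologically with a stable sort so ties keep their original order
    df = data.dropna(subset=value_columns, how="all").sort_values(
        date_column, kind="mergesort"
    )

    if group_by is None:
        # No grouping: the last row after sorting is the latest overall
        return df.tail(1).reset_index(drop=True)

    # Within each group, keep the last (i.e. most recent) row
    return (
        df.drop_duplicates(subset=group_by, keep="last")
        .sort_values(group_by)
        .reset_index(drop=True)
    )


# Usage: one row per country, taken from its latest non-missing year
df = pd.DataFrame(
    {
        "iso_code": ["KEN", "KEN", "GHA", "GHA"],
        "year": [2019, 2020, 2019, 2020],
        "value": [1.0, 2.0, 3.0, None],
    }
)
print(latest_by(df, date_column="year", value_columns="value", group_by="iso_code"))
```

In this sketch, GHA's 2020 row has no value, so its latest usable observation is 2019, while KEN's is 2020.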
- bblocks.cleaning_tools.filter._filter_by(df: pandas.DataFrame, by: str, by_value: str, id_column: str = 'iso_code', id_type: str = 'regex') → pandas.DataFrame
Helper function to filter a DataFrame by membership of a specific grouping. The available groupings are the classification schemes provided by the country_converter package. For details, see: https://github.com/konstantinstadler/country_converter#classification-schemes
- Parameters:
df – the DataFrame to filter
by – the type of grouping to filter by (e.g. “Continent”, “UNRegion”, “EU”)
by_value – the value of the grouping to filter by (e.g. “Africa”, “Europe”, “EU”)
id_column – the name of the column containing the country ids (default: “iso_code”)
id_type – the type of id stored in id_column (default: “regex”)
- Returns:
A filtered copy of the DataFrame.
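The membership filter can be pictured as a lookup followed by a boolean mask. In the sketch below, the hypothetical `CLASSIFICATIONS` table stands in for country_converter's classification data, and `filter_by_group` is a hypothetical helper. It assumes the ids are already ISO3 codes, so the `id_type` conversion step handled by the real helper is left out.

```python
import pandas as pd

# Hypothetical classification table standing in for country_converter's data:
# one row per ISO3 code, one column per grouping scheme
CLASSIFICATIONS = pd.DataFrame(
    {
        "ISO3": ["KEN", "GHA", "FRA", "DEU", "GBR", "IND"],
        "continent": ["Africa", "Africa", "Europe", "Europe", "Europe", "Asia"],
        "UNregion": [
            "Eastern Africa",
            "Western Africa",
            "Western Europe",
            "Western Europe",
            "Northern Europe",
            "Southern Asia",
        ],
        "EU": ["non-EU", "non-EU", "EU", "EU", "non-EU", "non-EU"],
    }
).set_index("ISO3")


def filter_by_group(
    df: pd.DataFrame, by: str, by_value: str, id_column: str = "iso_code"
) -> pd.DataFrame:
    """Hypothetical helper: keep rows whose id belongs to `by_value` under `by`."""
    # Map each id to its group; ids missing from the table become NaN
    groups = df[id_column].map(CLASSIFICATIONS[by])
    # NaN never equals by_value, so unknown ids are dropped; .copy() returns
    # an independent DataFrame rather than a view of the input
    return df.loc[groups == by_value].copy()


data = pd.DataFrame({"iso_code": ["KEN", "FRA", "IND", "XXX"], "value": [1, 2, 3, 4]})
print(filter_by_group(data, by="continent", by_value="Africa"))
```

Returning a copy means that modifying the filtered result leaves the input DataFrame unchanged, which matches the “filtered copy” return value described above.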
- bblocks.cleaning_tools.filter.filter_by_continent(df: pandas.DataFrame, continent: str, id_column: str = 'iso_code', id_type: str = 'regex') → pandas.DataFrame
Filter a DataFrame by continent.
- Parameters:
df – the DataFrame to filter
continent – the continent to filter by (e.g. “Africa”, “Europe”)
id_column – the name of the column containing the country ids (default: “iso_code”)
id_type – the type of id stored in id_column (default: “regex”)
- Returns:
A filtered copy of the DataFrame.
- bblocks.cleaning_tools.filter.filter_by_un_region(df: pandas.DataFrame, region: str, id_column: str = 'iso_code', id_type: str = 'regex') → pandas.DataFrame
Filter a DataFrame by UN region. This includes, for example, “Western Africa”, “Eastern Africa”, “Southern Asia”, “Northern America”, “Central America”, “Eastern Asia”.
- Parameters:
df – the DataFrame to filter
region – the region to filter by (e.g. “Western Africa”, “Eastern Africa”, etc.)
id_column – the name of the column containing the country ids (default: “iso_code”)
id_type – the type of id stored in id_column (default: “regex”)
- Returns:
A filtered copy of the DataFrame.
- bblocks.cleaning_tools.filter.filter_african_countries(df: pandas.DataFrame, id_column: str = 'iso_code', id_type: str = 'regex') → pandas.DataFrame
Filter a DataFrame to keep only African countries.
- Parameters:
df – the DataFrame to filter
id_column – the name of the column containing the country ids (default: “iso_code”)
id_type – the type of id stored in id_column (default: “regex”)
- Returns:
A filtered copy of the DataFrame.
- bblocks.cleaning_tools.filter.filter_eu_countries(df: pandas.DataFrame, id_column: str = 'iso_code', id_type: str = 'regex') → pandas.DataFrame
Filter a DataFrame to keep only European Union member countries. The current list of EU members is always used.
- Parameters:
df – the DataFrame to filter
id_column – the name of the column containing the country ids (default: “iso_code”)
id_type – the type of id stored in id_column (default: “regex”)
- Returns:
A filtered copy of the DataFrame.
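Because the current membership list is used, rows are kept or dropped based on a country's status today, not its status at the date of the row. The sketch below illustrates that consequence with a hypothetical, deliberately partial set of current members (`CURRENT_EU_MEMBERS`) and a hypothetical helper `keep_current_eu`; it does not reproduce the bblocks implementation.

```python
import pandas as pd

# Hypothetical, deliberately partial set of *current* EU members (ISO3 codes).
# The United Kingdom ("GBR") left the EU in 2020, so it is not included.
CURRENT_EU_MEMBERS = {"AUT", "BEL", "DEU", "FRA", "IRL", "ITA", "NLD"}


def keep_current_eu(df: pd.DataFrame, id_column: str = "iso_code") -> pd.DataFrame:
    """Hypothetical helper: keep rows for current EU members, ignoring row dates."""
    return df.loc[df[id_column].isin(CURRENT_EU_MEMBERS)].copy()


data = pd.DataFrame(
    {"iso_code": ["GBR", "FRA", "GBR", "FRA"], "year": [2015, 2015, 2022, 2022]}
)
# GBR rows are dropped even for 2015, when the UK was still a member
print(keep_current_eu(data))
```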