bblocks.cleaning_tools.clean
Functions
clean_number – Clean a string and return it as a float or an integer.
clean_numeric_series – Clean a Pandas Series, or one or more columns of a Pandas DataFrame, that are meant to be numeric.
to_date_column – Converts a Pandas Series into a date series.
convert_id – Takes a Pandas Series with country IDs and converts them into the desired type.
date_to_str – Converts a Pandas Series into a string series.
format_number – Formats a numeric Pandas Series into a formatted string series.
convert_to_datetime – Custom function to convert values to datetime.
Module Contents
- bblocks.cleaning_tools.clean.clean_number(number: str | pandas.Series, to: Type = float) → float | int
Clean a string and return it as a float or an integer. When to=int is selected, Python's default rounding behaviour is used.
- Parameters:
number – the string to clean
to – the type to convert to (int or float)
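The rounding note above matters in practice: Python's built-in round() rounds ties to the nearest even integer, so 2.5 becomes 2 while 3.5 becomes 4. The sketch below illustrates this with a hypothetical helper, clean_number_sketch. It is not the bblocks implementation, and the rule used to strip characters is an assumption made purely for illustration.

```python
import re


def clean_number_sketch(number: str, to: type = float):
    """Hypothetical stand-in for illustration; not the bblocks implementation."""
    # Assumption: keep only ASCII digits, the decimal point and the minus sign
    cleaned = re.sub(r"[^0-9.\-]", "", number)
    value = float(cleaned)
    # round() without ndigits returns an int and rounds ties to the even neighbour
    return round(value) if to is int else value


as_float = clean_number_sketch("1,234.50")
tie_down = clean_number_sketch("$ 2.5", to=int)
tie_up = clean_number_sketch("3.5", to=int)
```

Here the two tie cases round in different directions, which is worth keeping in mind when converting cleaned values to integers.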
- bblocks.cleaning_tools.clean.clean_numeric_series(data: pandas.Series | pandas.DataFrame, series_columns: str | list | None = None, to: Type = float) → pandas.DataFrame | pandas.Series
Clean a Pandas Series, or one or more columns of a Pandas DataFrame, that are meant to be numeric. When to=int is selected, Python's default rounding behaviour is used.
- Parameters:
data – a Series or a DataFrame. If a DataFrame is passed, the column(s) to clean must be specified in series_columns
series_columns – the column or columns to clean. Only used when data is a DataFrame.
to – the type to convert to (int or float)
- bblocks.cleaning_tools.clean.to_date_column(series: pandas.Series, date_format: str | None = None) → pandas.Series
Converts a Pandas Series into a date series. The series must contain integers or strings that can be converted into datetime objects.
- bblocks.cleaning_tools.clean.convert_id(series: pandas.Series, from_type: str = 'regex', to_type: str = 'ISO3', not_found: str | None = None, *, additional_mapping: dict = None) → pandas.Series
Takes a Pandas Series with country IDs and converts them into the desired type.
- Parameters:
series – the Pandas series to convert
from_type – the classification type according to which the series is encoded. Available types come from the country_converter package (https://github.com/konstantinstadler/country_converter#classification-schemes), for example ISO3, ISO2, name_short, DACcode, etc.
to_type – the target classification type. Same options as from_type
not_found – what to do if the value is not found. Can pass a string or None. If None, the original value is passed through.
additional_mapping – Optionally, a dictionary with additional mappings can be used. The keys are the values to be converted and the values are the converted values. The keys follow the same datatype as the original values. The values must follow the same datatype as the target type.
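The interaction between not_found and additional_mapping can be sketched with plain dictionaries. Everything in the block below is hypothetical: convert_id_sketch is not the bblocks function, the small base dictionary merely stands in for the country_converter lookup, "ATL" is an invented code, and the choice to let additional mappings take precedence over the base lookup is an assumption the source does not confirm.

```python
def convert_id_sketch(values, base_mapping, not_found=None, additional_mapping=None):
    """Hypothetical illustration of the not_found / additional_mapping semantics."""
    # Assumption: entries in additional_mapping override the base lookup
    mapping = {**base_mapping, **(additional_mapping or {})}
    converted = []
    for value in values:
        if value in mapping:
            converted.append(mapping[value])
        elif not_found is None:
            # not_found=None: the original value is passed through
            converted.append(value)
        else:
            converted.append(not_found)
    return converted


# Tiny stand-in for a name-to-ISO3 lookup
base = {"France": "FRA", "Germany": "DEU"}
names = ["France", "Germany", "Atlantis"]

passed_through = convert_id_sketch(names, base)
flagged = convert_id_sketch(names, base, not_found="not found")
# "ATL" is an invented code used only for this example
extended = convert_id_sketch(names, base, additional_mapping={"Atlantis": "ATL"})
```

Note that the keys of the extra dictionary use the source representation (names here) and its values use the target representation (ISO3-style codes), mirroring the description of additional_mapping.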
- bblocks.cleaning_tools.clean.date_to_str(series: pandas.Series, date_format: str = '%d %B %Y') → pandas.Series
Converts a Pandas Series of dates into a series of formatted date strings.
- Parameters:
series – the Pandas series to convert to a formatted date string
date_format – the format to use for the date string. The default is “%d %B %Y”
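To show what such a format string produces, the sketch below applies a few patterns to a single date with the standard library. It assumes date_format is interpreted with the usual strftime directives, as the default value suggests; it does not call bblocks itself.

```python
from datetime import date

d = date(2023, 3, 5)

# %d = zero-padded day, %B = full month name, %Y = four-digit year
default_style = d.strftime("%d %B %Y")
# ISO-like year-month-day
iso_style = d.strftime("%Y-%m-%d")
# %b = abbreviated month name
short_style = d.strftime("%b %Y")
```

Month names produced by %B and %b depend on the active locale; under Python's default locale they are in English.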
- bblocks.cleaning_tools.clean.format_number(series: pandas.Series, as_units: bool = False, as_percentage: bool = False, as_millions: bool = False, as_billions: bool = False, decimals: int = 2, add_sign: bool = False, other_format: str = '{:,.2f}') → pandas.Series
Formats a numeric Pandas Series into a formatted string series.
- Parameters:
series – the series to convert to a formatted string
as_units – formatted with commas to separate thousands and the specified decimals
as_percentage – formatted as a percentage with the specified decimals. This assumes that the series contains numbers where 1 would equal 100%.
as_millions – divided by 1 million, formatted with commas and the specified decimals
as_billions – divided by 1 billion, formatted with commas and the specified decimals
decimals – the number of decimals to use
add_sign – add a plus sign to positive numbers
other_format – Other formats to use. This option can only be used if all others are false. Examples are available at: https://mkaz.blog/code/python-string-format-cookbook/
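The options above correspond closely to Python's format specification mini-language, which is also what other_format expects. The sketch below applies comparable format strings to single values using only built-in string formatting. It is an illustration, not the library's code: whether format_number appends any unit suffix or handles signs exactly this way is not stated in the source.

```python
value = 1234567.891
share = 0.256  # a share where 1 would equal 100%

# Thousands separators with two decimals (the other_format default)
as_units = "{:,.2f}".format(value)
# Divide by 1 million first, then format
as_millions = "{:,.2f}".format(value / 1e6)
# The % presentation type multiplies by 100 and appends a percent sign
as_percentage = "{:.2%}".format(share)
# A leading + in the spec adds a sign to positive numbers
with_sign = "{:+,.2f}".format(value)
# Changing the precision changes the number of decimals
one_decimal = "{:,.1f}".format(value)
```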
- bblocks.cleaning_tools.clean.convert_to_datetime(date: str | int | pandas.Series) → pandas.Series | pandas.Timestamp
Custom function to convert values to datetime. It handles integers or strings that represent only a year.
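As a rough illustration of the year-only case, the sketch below converts a four-digit year, given as an integer or a string, into a datetime using only the standard library. The helper year_to_datetime_sketch is hypothetical, and mapping a bare year to 1 January of that year is an assumption; the source does not say which day the library picks.

```python
from datetime import datetime


def year_to_datetime_sketch(value):
    """Hypothetical helper; not the bblocks implementation."""
    text = str(value).strip()
    if text.isdecimal() and len(text) == 4:
        # Assumption: a bare year is mapped to 1 January of that year
        return datetime(int(text), 1, 1)
    raise ValueError(f"not a year-only value: {value!r}")


from_int = year_to_datetime_sketch(2020)
from_str = year_to_datetime_sketch("1999")
```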