bblocks.cleaning_tools.clean
============================

.. py:module:: bblocks.cleaning_tools.clean


Functions
---------

.. autoapisummary::

   bblocks.cleaning_tools.clean.clean_number
   bblocks.cleaning_tools.clean.clean_numeric_series
   bblocks.cleaning_tools.clean.to_date_column
   bblocks.cleaning_tools.clean.convert_id
   bblocks.cleaning_tools.clean.date_to_str
   bblocks.cleaning_tools.clean.format_number
   bblocks.cleaning_tools.clean.convert_to_datetime


Module Contents
---------------

.. py:function:: clean_number(number: str | pandas.Series, to: Type = float) -> float | int

   Clean a string and return it as a float or an integer.
   When ``to=int`` is selected, the default Python rounding behaviour is used.

   :param number: the string to clean
   :param to: the type to convert to (int or float)


.. py:function:: clean_numeric_series(data: pandas.Series | pandas.DataFrame, series_columns: str | list | None = None, to: Type = float) -> pandas.DataFrame | pandas.Series

   Clean a pandas Series, or one or more columns of a pandas DataFrame, that
   are meant to be numeric.
   When ``to=int`` is selected, the default Python rounding behaviour is used.

   :param data: a Series or a DataFrame. If a DataFrame is passed, the
       column(s) to clean must be specified.
   :param series_columns: the column or columns to clean. Only used when
       ``data`` is a DataFrame.
   :param to: the type to convert to (int or float)


.. py:function:: to_date_column(series: pandas.Series, date_format: str | None = None) -> pandas.Series

   Convert a pandas Series into a date Series.
   The Series must contain integers or strings that can be converted into
   datetime objects.


.. py:function:: convert_id(series: pandas.Series, from_type: str = 'regex', to_type: str = 'ISO3', not_found: str | None = None, *, additional_mapping: dict = None) -> pandas.Series

   Take a pandas Series of country IDs and convert them into the desired type.

   :param series: the pandas Series to convert
   :param from_type: the classification type according to which the series is
       encoded. Available types come from the country_converter package
       (https://github.com/konstantinstadler/country_converter#classification-schemes).
       For example: ISO3, ISO2, name_short, DACcode, etc.
   :param to_type: the target classification type. Same options as ``from_type``.
   :param not_found: what to do if the value is not found. Can be a string or
       None. If None, the original value is passed through.
   :param additional_mapping: optionally, a dictionary with additional
       mappings. The keys are the values to be converted and the values are
       the converted values. The keys must follow the same datatype as the
       original values. The values must follow the same datatype as the
       target type.


.. py:function:: date_to_str(series: pandas.Series, date_format: str = '%d %B %Y') -> pandas.Series

   Convert a pandas Series of dates into a Series of formatted date strings.

   :param series: the pandas Series to convert to a formatted date string
   :param date_format: the format to use for the date string. The default is
       "%d %B %Y".


.. py:function:: format_number(series: pandas.Series, as_units: bool = False, as_percentage: bool = False, as_millions: bool = False, as_billions: bool = False, decimals: int = 2, add_sign: bool = False, other_format: str = '{:,.2f}') -> pandas.Series

   Format a numeric pandas Series as a Series of formatted strings.

   :param series: the series to convert to a formatted string
   :param as_units: formatted with commas to separate thousands and the
       specified decimals
   :param as_percentage: formatted as a percentage with the specified
       decimals. This assumes that the series contains numbers where 1 equals
       100%.
   :param as_millions: divided by 1 million, formatted with commas and the
       specified decimals
   :param as_billions: divided by 1 billion, formatted with commas and the
       specified decimals
   :param decimals: the number of decimals to use
   :param add_sign: add a plus sign to positive numbers
   :param other_format: another format string to use. This option can only be
       used if all the other formatting options are False. Examples are
       available at: https://mkaz.blog/code/python-string-format-cookbook/


.. py:function:: convert_to_datetime(date: str | int | pandas.Series) -> pandas.Series | pandas.Timestamp

   Custom function to convert values to datetime.
   It handles integers or strings that represent only a year.
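
The rounding and formatting conventions referred to above can be illustrated with the Python standard library alone. The sketch below does not call ``bblocks``. It only shows what Python's built-in ``round``, a format string such as the ``other_format`` default ``'{:,.2f}'``, and the ``date_format`` default ``'%d %B %Y'`` produce on their own. The variable names are illustrative, and it is an assumption that the library's output matches these built-ins exactly.

```python
from datetime import date

# Python's built-in round() rounds halves to the nearest even integer.
# Assumption: this is the "default Python rounding behaviour" that the
# to=int option refers to.
rounded = [round(x) for x in (0.5, 1.5, 2.5, 3.5)]
print(rounded)  # [0, 2, 2, 4]

# The default value of other_format, applied with str.format:
# comma as the thousands separator and two decimals.
formatted = "{:,.2f}".format(1234567.891)
print(formatted)  # 1,234,567.89

# The default value of date_format, applied with strftime.
# The month name depends on the active locale (English by default).
date_text = date(2023, 3, 1).strftime("%d %B %Y")
print(date_text)  # 01 March 2023
```

Because halves are rounded to the nearest even integer, values such as 0.5 and 2.5 round down, which can be surprising when converting cleaned values to integers.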