bblocks
=======

.. py:module:: bblocks


Submodules
----------

.. toctree::
   :maxdepth: 1

   /autoapi/bblocks/analysis_tools/index
   /autoapi/bblocks/cleaning_tools/index
   /autoapi/bblocks/config/index
   /autoapi/bblocks/dataframe_tools/index
   /autoapi/bblocks/import_tools/index
   /autoapi/bblocks/logger/index
   /autoapi/bblocks/other_tools/index


Attributes
----------

.. autoapisummary::

   bblocks.__version__


Classes
-------

.. autoapisummary::

   bblocks.WorldBankData
   bblocks.GHED
   bblocks.WFPData
   bblocks.WorldEconomicOutlook
   bblocks.Aids
   bblocks.DebtIDS


Functions
---------

.. autoapisummary::

   bblocks.get_dsa
   bblocks.add_iso_codes_column
   bblocks.add_income_level_column
   bblocks.add_short_names_column
   bblocks.clean_number
   bblocks.clean_numeric_series
   bblocks.to_date_column
   bblocks.convert_id
   bblocks.date_to_str
   bblocks.format_number
   bblocks.filter_by_continent
   bblocks.filter_by_un_region
   bblocks.filter_eu_countries
   bblocks.filter_african_countries
   bblocks.filter_latest_by
   bblocks.set_bblocks_data_path


Package Contents
----------------

.. py:data:: __version__
   :value: '1.4.1'


.. py:class:: WorldBankData

   Bases: :py:obj:`bblocks.import_tools.common.ImportData`


   An object to help download data from the World Bank.

   To use it, create an instance of this class. Then call the `load_data`
   method to load an indicator. This can be done multiple times. If the data
   for an indicator has never been downloaded, it will be downloaded. If it
   has been downloaded, it will be loaded from disk.

   If `update_data` is set to True when creating the object, the data will be
   updated from the World Bank for each indicator.

   You can force an update by calling `update_data` if you want to refresh the
   data stored on disk.

   You can get a dataframe of the data by calling `get_data`.


   .. py:attribute:: _indicators
      :type: dict[str, tuple[pandas.DataFrame, dict]]


   .. py:method:: load_data(indicator: str | list[str], start_year: int | None = None, end_year: int | None = None, most_recent_only: bool = False, db: int = 2, **kwargs) -> WorldBankData

      Get an indicator from the World Bank API

      :param indicator: the code from the World Bank data portal (e.g. "SP.POP.TOTL")
      :param start_year: The first year to include in the data
      :param end_year: The last year to include in the data
      :param most_recent_only: If True, only get the most recent non-empty value for each country
      :param db: The database to use. By default, use the WDI database (2)

      :returns: The same object to allow chaining


   .. py:method:: update_data(reload_data: bool = True) -> bblocks.import_tools.common.ImportData

      Update the data saved on disk for the different indicators

      When called, it will go through each indicator and update the data saved
      based on the parameters passed to `load_data`.

      :returns: The same object to allow chaining


   .. py:method:: get_data(indicators: str | list = 'all', **kwargs) -> pandas.DataFrame


.. py:class:: GHED

   Bases: :py:obj:`bblocks.import_tools.common.ImportData`


   An object to extract GHED data

   To use, create an instance of the class and call the `load_data` method.
   If the data is already downloaded, it will be loaded from disk. If not, it
   will be downloaded. If `update_data` is set to True, the data will be
   downloaded regardless of whether it is already on disk.
   To force an update, call the `update_data` method. To get the data, call
   the `get_data` method. To get the metadata, call the `get_metadata` method.


   .. py:attribute:: _metadata
      :type: pandas.DataFrame
      :value: None


   .. py:method:: load_data() -> bblocks.import_tools.common.ImportData

      Load GHED data

      :returns: The same object to allow chaining


   .. py:method:: update_data(reload_data: bool) -> bblocks.import_tools.common.ImportData

      Update GHED data

      :returns: The same object to allow chaining


   .. py:method:: get_metadata() -> pandas.DataFrame

      Get GHED metadata as a pandas dataframe

      :returns: A pandas dataframe with the metadata


.. py:class:: WFPData

   Bases: :py:obj:`bblocks.import_tools.common.ImportData`


   Class to download and read WFP inflation and insufficient food data


   .. py:property:: available_indicators
      :type: KeysView

      View the available indicators from WFP


   .. py:method:: _country_codes() -> dict


   .. py:method:: load_data(indicator: str | list) -> None

      Load an indicator into the WFPData object


   .. py:method:: update_data(reload_data: bool = True) -> None

      Update the data for all the indicators currently loaded


.. py:class:: WorldEconomicOutlook

   Bases: :py:obj:`bblocks.import_tools.common.ImportData`


   World Economic Outlook data


   .. py:attribute:: year
      :type: Optional[int]
      :value: None


   .. py:attribute:: release
      :type: Optional[int]
      :value: None


   .. py:method:: __post_init__() -> None


   .. py:method:: __repr__() -> str


   .. py:method:: __load_data() -> None

      Load WEO as a clean dataframe

      :param latest_y: optionally passed to override the default behaviour of
                       getting the latest release year for the WEO.
      :param latest_r: optionally passed to override the default behaviour of
                       getting the latest release value (1 or 2).


   .. py:method:: _check_indicators(indicators: str | list | None = None) -> None | dict


   .. py:method:: load_data(indicator: str | list) -> bblocks.import_tools.common.ImportData

      Loads a specific indicator from the World Economic Outlook data


   .. py:method:: update_data(reload_data: bool = True) -> None

      Update the stored WEO data, using the WEO package.


   .. py:method:: available_indicators() -> None

      Print the available indicators in the dataset


   .. py:method:: get_data(indicators: str | list = 'all', keep_metadata: bool = False) -> pandas.DataFrame


.. py:class:: Aids

   Bases: :py:obj:`bblocks.import_tools.common.ImportData`


   An object to extract data from UNAIDS.

   To use, create an instance of the class. Then load indicators using the
   `load_data` method. This can be done multiple times. To return a dataframe
   of all available indicators to load, use the `available_indicators`
   property. If the data for an indicator has never been downloaded, it will
   be downloaded. If it has been downloaded, it will be loaded from disk. If
   `update_data` is set to True, the data will be downloaded each time an
   indicator is loaded.

   You can force an update by calling `update_data`, and all indicators will
   be reloaded into the object.

   You can get a dataframe by calling `get_data` and passing the indicator
   name(s) (or None, which returns all indicators) and the area grouping(s)
   ('all' by default).


   .. py:property:: available_indicators
      :type: pandas.DataFrame

      Returns a dataframe of available indicators


   .. py:method:: load_data(indicator: str, area_grouping: str = 'all') -> bblocks.import_tools.common.ImportData

      Load an indicator to the object

      :param indicator: The name of the indicator to load. To see a DataFrame
                        of available indicators, use the `available_indicators` property.
      :type indicator: str
      :param area_grouping: The grouping to use. Choose from ["country", "region", "all"].
      :type area_grouping: str

      :returns: The same object to allow chaining


   .. py:method:: update_data(reload_data: bool)

      Update all loaded indicators saved on the disk

      When called, it will go through each loaded indicator/area grouping
      combination and update the data saved on disk.

      :returns: The same object to allow chaining


   .. py:method:: get_data(indicators: Optional[str | list] = None, area_grouping: str = 'all') -> pandas.DataFrame

      Get the data as a Pandas DataFrame

      :param indicators: By default, all indicators are returned in a single DataFrame.
                         If a list of indicators is passed, only those indicators will be returned.
                         A single indicator can be passed as a string as well.
      :param area_grouping: The area grouping to use. Choose from ["country", "region", "all"].
                            Default is "all".
      :type area_grouping: str

      :returns: A Pandas DataFrame with the requested indicator data


.. py:class:: DebtIDS

   Bases: :py:obj:`bblocks.import_tools.common.ImportData`


   Import data from the World Bank's International Debt Statistics database.

   To use this object, first create an instance of it. Then use the
   `load_data` method to load indicators. One or more indicators can be loaded
   at a time, and a start and end year must be specified.

   If the data has not been downloaded before, it will be downloaded from the
   World Bank API. If the data has been downloaded before, it will be loaded
   from the local data folder.

   To get a DataFrame, use the `get_data` method. You can get the data for one
   or more indicators, or for all indicators at once.

   To update the data, use the `update_data` method. This will download the
   latest data from the World Bank API and overwrite the local data.

   - To get a list of available indicators, use the `get_available_indicators` method.
   - To get a list of available debt service indicators, use the `debt_service_indicators` method.
   - To get a list of available debt stocks indicators, use the `debt_stocks_indicators` method.


   .. py:method:: __post_init__()

      Set the path to the data folder and create it if it doesn't exist


   .. py:method:: _check_stored_data(indicator: str, start_year: int, end_year: int) -> str | bool

      Check if the data is already stored locally

      This also checks if the years requested are inside another file.

      :param indicator: The indicator to check
      :type indicator: str
      :param start_year: The start year of the data
      :type start_year: int
      :param end_year: The end year of the data
      :type end_year: int

      :returns: The filename of the data if it exists, or False if the data doesn't exist
      :rtype: str | bool


   .. py:method:: _indicator_parameters(indicator: str) -> tuple[str, int, int]
      :staticmethod:

      Get the indicator, start year and end year from the indicator name.


   .. py:method:: get_available_indicators() -> dict
      :classmethod:

      Get a dictionary of all available indicators in the IDS database.


   .. py:method:: debt_service_indicators(detailed_category: bool = True) -> dict
      :classmethod:

      Get a dictionary of Debt Service indicators in the IDS database.


   .. py:method:: debt_stocks_indicators(detailed_category: bool = True) -> dict
      :classmethod:

      Get a dictionary of Debt Stocks indicators in the IDS database.


   .. py:method:: _get_indicator(indicator: str, start_year: int, end_year: int) -> bblocks.import_tools.common.ImportData

      Get data for an indicator. This method is not meant to be accessed
      directly. Instead, use the `.get_data()` method.

      :param indicator: The indicator to get. They must be in the IDS format
                        (e.g. DT.DOD.DECT.CD). To view all available indicators,
                        call `.get_available_indicators()`.

      :returns: The same object to allow chaining of methods


   .. py:method:: load_data(indicators: str | list, start_year: int, end_year: int) -> bblocks.import_tools.common.ImportData

      Load the data for an indicator or a list of indicators.

      :param indicators: The indicator(s) to load. They must be in the IDS format
                         (e.g. DT.DOD.DECT.CD). To view all available indicators,
                         call `.get_available_indicators()`.
      :param start_year: The first year to include in the data
      :param end_year: The last year to include in the data


   .. py:method:: update_data(reload_data: bool = True) -> bblocks.import_tools.common.ImportData

      Update the data for all loaded indicators.


   .. py:method:: get_data(indicators: str | list = 'all', **kwargs) -> pandas.DataFrame

      Get the data for an indicator or a list of indicators.

      :param indicators: The indicator(s) to get. They must be in the IDS format
                         (e.g. DT.DOD.DECT.CD). To get all available indicators,
                         set `indicators="all"`.

      :returns: A pandas dataframe with the requested data.


.. py:function:: get_dsa(update=False, local_path: str = None) -> pandas.DataFrame

   Extract DSA data from the IMF website.

   Extract the most recent Debt Sustainability Assessment (DSA) data for
   PRGT-eligible countries from the IMF website.
   URL = https://www.imf.org/external/Pubs/ft/dsa/DSAlist.pdf

   :param local_path: where the downloaded PDF will be stored
   :param update: if True, updates the data from the IMF website. Otherwise it loads
                  the data from the local file. If a local file does not exist, the data
                  will be extracted from the website.
   :type update: bool

   :returns: pandas dataframe with country, latest publication date, and risk of debt distress


.. py:function:: add_iso_codes_column(df: pandas.DataFrame, id_column: str, id_type: str | None = None, target_column: str = 'iso_code') -> pandas.DataFrame

   Add ISO3 column to a dataframe

   :param df: the dataframe to which the column will be added
   :param id_column: the column containing the name, ISO3, ISO2, DAC code, UN code, etc.
   :param id_type: the type of ID used in the id_column. The default 'regex' tries to infer
                   using the rules from the 'country_converter' package. For the DAC codes,
                   "DAC" must be passed.
   :param target_column: the column where the iso codes will be stored.

   :returns: the original DataFrame with a new column containing ISO3 codes.
   :rtype: DataFrame


.. py:function:: add_income_level_column(df: pandas.DataFrame, id_column: str, id_type: str | None = None, target_column: str = 'income_level', update_data: bool = False) -> pandas.DataFrame

   Add an income levels column to a dataframe

   :param df: the dataframe to which the column will be added
   :param id_column: the column containing the name, ISO3, ISO2, DACcode, UN code, etc.
   :param id_type: the type of ID used in the id_column. The default 'regex' tries to infer
                   using the rules from the 'country_converter' package. For the DAC codes,
                   "DACcode" must be passed.
   :param target_column: the column where the income level data will be stored.
   :param update_data: whether to update the underlying data or not.

   :returns: the original DataFrame with a new column containing the income level data.
   :rtype: DataFrame


.. py:function:: add_short_names_column(df: pandas.DataFrame, id_column: str, id_type: str | None = None, target_column: str = 'name_short') -> pandas.DataFrame

   Add short names column to a dataframe

   :param df: the dataframe to which the column will be added
   :param id_column: the column containing the name, ISO3, ISO2, DAC code, UN code, etc.
   :param id_type: the type of ID used in the id_column. The default 'regex' tries to infer
                   using the rules from the 'country_converter' package. For the DAC codes,
                   "DAC" must be passed.
   :param target_column: the column where the short names will be stored.

   :returns: the original DataFrame with a new column containing short names.
   :rtype: DataFrame


.. py:function:: clean_number(number: str | pandas.Series, to: Type = float) -> float | int

   Clean a string and return it as a float or an integer. When selecting to=int,
   the default Python round behaviour is used.

   :param number: the string to clean
   :param to: the type to convert to (int or float)


.. py:function:: clean_numeric_series(data: pandas.Series | pandas.DataFrame, series_columns: str | list | None = None, to: Type = float) -> pandas.DataFrame | pandas.Series

   Clean a numeric column in a Pandas DataFrame or a Pandas Series which is meant to be
   numeric. When selecting to=int, the default Python round behaviour is used.

   :param data: it accepts a series or a dataframe. If a dataframe is passed, the column(s)
                to clean must be specified
   :param series_columns: optionally declared (only when `data` is a dataframe). To apply to
                          one or more columns.
   :param to: the type to convert to (int or float)


.. py:function:: to_date_column(series: pandas.Series, date_format: str | None = None) -> pandas.Series

   Converts a Pandas series into a date series. The series must contain integers or
   strings that can be converted into datetime objects.


.. py:function:: convert_id(series: pandas.Series, from_type: str = 'regex', to_type: str = 'ISO3', not_found: str | None = None, *, additional_mapping: dict = None) -> pandas.Series

   Takes a Pandas series with country IDs and converts them into the desired type.

   :param series: the Pandas series to convert
   :param from_type: the classification type according to which the series is encoded.
                     Available types come from the country_converter package
                     (https://github.com/konstantinstadler/country_converter#classification-schemes).
                     For example: ISO3, ISO2, name_short, DACcode, etc.
   :param to_type: the target classification type. Same options as from_type
   :param not_found: what to do if the value is not found. Can pass a string or None.
                     If None, the original value is passed through.
   :param additional_mapping: Optionally, a dictionary with additional mappings can be used.
                              The keys are the values to be converted and the values are the
                              converted values. The keys follow the same datatype as the
                              original values. The values must follow the same datatype as
                              the target type.


.. py:function:: date_to_str(series: pandas.Series, date_format: str = '%d %B %Y') -> pandas.Series

   Converts a Pandas series into a string series.

   :param series: the Pandas series to convert to a formatted date string
   :param date_format: the format to use for the date string. The default is "%d %B %Y"


.. py:function:: format_number(series: pandas.Series, as_units: bool = False, as_percentage: bool = False, as_millions: bool = False, as_billions: bool = False, decimals: int = 2, add_sign: bool = False, other_format: str = '{:,.2f}') -> pandas.Series

   Formats a Pandas numeric series into a formatted string series.

   :param series: the series to convert to a formatted string
   :param as_units: formatted with commas to separate thousands and the specified decimals
   :param as_percentage: formatted as a percentage with the specified decimals. This assumes
                         that the series contains numbers where 1 would equal 100%.
   :param as_millions: divided by 1 million, formatted with commas and the specified decimals
   :param as_billions: divided by 1 billion, formatted with commas and the specified decimals
   :param decimals: the number of decimals to use
   :param add_sign: add a plus sign to positive numbers
   :param other_format: Other formats to use. This option can only be used if all others
                        are false. Examples are available at:
                        https://mkaz.blog/code/python-string-format-cookbook/


.. py:function:: filter_by_continent(df: pandas.DataFrame, continent: str, id_column: str = 'iso_code', id_type: str = 'regex') -> pandas.DataFrame

   Filter a DataFrame by continent.

   :param df: the DataFrame to filter
   :param continent: the continent to filter by (e.g. "Africa", "Europe", "EU")
   :param id_column: the name of the column to use for the id (default: "iso_code")
   :param id_type: the type of id to use (default: "regex")

   :returns: A filtered copy of the DataFrame.


.. py:function:: filter_by_un_region(df: pandas.DataFrame, region: str, id_column: str = 'iso_code', id_type: str = 'regex') -> pandas.DataFrame

   Filter a DataFrame by UN region. This includes, for example, "Western Africa",
   "Eastern Africa", "Southern Asia", "Northern America", "Central America",
   "Eastern Asia".

   :param df: the DataFrame to filter
   :param region: the region to filter by (e.g. "Western Africa", "Eastern Africa", etc.)
   :param id_column: the name of the column to use for the id (default: "iso_code")
   :param id_type: the type of id to use (default: "regex")

   :returns: A filtered copy of the DataFrame.


.. py:function:: filter_eu_countries(df: pandas.DataFrame, id_column: str = 'iso_code', id_type: str = 'regex') -> pandas.DataFrame

   Filter a DataFrame to keep only European Union countries. The current list of
   members of the European Union is always used.

   :param df: the DataFrame to filter
   :param id_column: the name of the column to use for the id (default: "iso_code")
   :param id_type: the type of id to use (default: "regex")

   :returns: A filtered copy of the DataFrame.


.. py:function:: filter_african_countries(df: pandas.DataFrame, id_column: str = 'iso_code', id_type: str = 'regex') -> pandas.DataFrame

   Filter a DataFrame to keep only African countries.

   :param df: the DataFrame to filter
   :param id_column: the name of the column to use for the id (default: "iso_code")
   :param id_type: the type of id to use (default: "regex")

   :returns: A filtered copy of the DataFrame.


.. py:function:: filter_latest_by(data: pandas.DataFrame, date_column: str, value_columns: str | list | None = None, group_by: str | list | None = None) -> pandas.DataFrame

   Get the latest value of one or more columns over a period of time.

   :param data: a DataFrame with a date column (datetime or int) and one or more numeric columns
   :param date_column: the name of the date (datetime or int) column
   :param value_columns: one or more columns for which to get the latest value
   :param group_by: Optionally, specify which columns to consider for the latest operation

   :returns: A DataFrame with the latest value of the specified columns


.. py:function:: set_bblocks_data_path(path)
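
The note on ``clean_number`` and ``clean_numeric_series`` that ``to=int`` uses
the default Python round behaviour may be easier to follow with a small
example, because the built-in ``round`` rounds exact halves to the nearest
even integer rather than always up. The sketch below is not the bblocks
implementation: ``clean_number_sketch`` is a hypothetical helper, and the set
of characters it strips is an assumption made purely for illustration.

.. code-block:: python

   from __future__ import annotations

   import re


   def clean_number_sketch(number: str, to: type = float) -> float | int:
       """Hypothetical stand-in for a number cleaner; not the bblocks code."""
       # Assumption: keep only digits, the decimal point and a minus sign.
       cleaned = re.sub(r"[^\d.\-]", "", number)
       value = float(cleaned)
       # Built-in round() rounds exact halves to the nearest even integer.
       return round(value) if to is int else value


   print(clean_number_sketch("1,234.56"))       # 1234.56
   print(clean_number_sketch("$ 2.5", to=int))  # 2
   print(clean_number_sketch("3.5", to=int))    # 4

Under these assumptions, ``2.5`` becomes ``2`` while ``3.5`` becomes ``4``,
which is worth keeping in mind when converting cleaned values to integers.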