bblocks
=======

.. py:module:: bblocks


Submodules
----------

.. toctree::
   :maxdepth: 1

   /autoapi/bblocks/analysis_tools/index
   /autoapi/bblocks/cleaning_tools/index
   /autoapi/bblocks/config/index
   /autoapi/bblocks/dataframe_tools/index
   /autoapi/bblocks/import_tools/index
   /autoapi/bblocks/logger/index
   /autoapi/bblocks/other_tools/index


Attributes
----------

.. autoapisummary::

   bblocks.__version__


Classes
-------

.. autoapisummary::

   bblocks.WorldBankData
   bblocks.GHED
   bblocks.WFPData
   bblocks.WorldEconomicOutlook
   bblocks.Aids
   bblocks.DebtIDS


Functions
---------

.. autoapisummary::

   bblocks.get_dsa
   bblocks.add_iso_codes_column
   bblocks.add_income_level_column
   bblocks.add_short_names_column
   bblocks.clean_number
   bblocks.clean_numeric_series
   bblocks.to_date_column
   bblocks.convert_id
   bblocks.date_to_str
   bblocks.format_number
   bblocks.filter_by_continent
   bblocks.filter_by_un_region
   bblocks.filter_eu_countries
   bblocks.filter_african_countries
   bblocks.filter_latest_by
   bblocks.set_bblocks_data_path


Package Contents
----------------

.. py:data:: __version__
   :value: '1.4.1'


.. py:class:: WorldBankData

   Bases: :py:obj:`bblocks.import_tools.common.ImportData`


   An object to help download data from the World Bank.
   To use it, create an instance of this class.
   Then call the `load_data` method to load an indicator. This can be done multiple times.
   If the data for an indicator has never been downloaded, it will be downloaded.
   If it has been downloaded, it will be loaded from disk.
   If `update_data` is set to True when creating the object, the data will be updated
   from the World Bank for each indicator.
   You can force an update by calling `update_data` if you want to refresh the data stored on disk.
   You can get a dataframe of the data by calling `get_data`.
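
   A minimal usage sketch of the workflow described above. It assumes ``bblocks`` is
   installed, a data folder has been configured, and the World Bank API is reachable.
   The helper name ``population_frame`` and the chosen years are illustrative and not
   part of the library.

   .. code-block:: python

      def population_frame(start_year=2015, end_year=2020):
          """Load one World Bank indicator and return it as a DataFrame."""
          # Imported inside the helper so that defining it needs no network access.
          from bblocks import WorldBankData

          wb = WorldBankData()
          # load_data returns the same object, which allows chaining.
          wb.load_data("SP.POP.TOTL", start_year=start_year, end_year=end_year)
          return wb.get_data("SP.POP.TOTL")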


   .. py:attribute:: _indicators
      :type:  dict[str, tuple[pandas.DataFrame, dict]]


   .. py:method:: load_data(indicator: str | list[str], start_year: int | None = None, end_year: int | None = None, most_recent_only: bool = False, db: int = 2, **kwargs) -> WorldBankData

      Get an indicator from the World Bank API

      :param indicator: the code from the World Bank data portal (e.g. "SP.POP.TOTL")
      :param start_year: The first year to include in the data
      :param end_year: The last year to include in the data
      :param most_recent_only: If True, only get the most recent non-empty value for each country
      :param db: The database to use. By default, use the WDI database (2)

      :returns: The same object to allow chaining


   .. py:method:: update_data(reload_data: bool = True) -> bblocks.import_tools.common.ImportData

      Update the data saved on disk for the different indicators

      When called, it will go through each indicator and update the data saved
      based on the parameters passed to `load_data`.

      :returns: The same object to allow chaining


   .. py:method:: get_data(indicators: str | list = 'all', **kwargs) -> pandas.DataFrame


.. py:class:: GHED

   Bases: :py:obj:`bblocks.import_tools.common.ImportData`


   An object to extract data from the Global Health Expenditure Database (GHED)

   To use, create an instance of the class and call the `load_data` method.
   If the data is already downloaded, it will be loaded from disk. If not, it will be downloaded.
   If `update_data` is set to True, the data will be downloaded regardless of whether it is already on disk.
   To force an update, call the `update_data` method.
   To get the data, call the `get_data` method.
   To get the metadata, call the `get_metadata` method.
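
   The steps above can be sketched as follows, assuming ``bblocks`` is installed and
   the GHED source is reachable on first use. The helper name ``ghed_frames`` is
   hypothetical.

   .. code-block:: python

      def ghed_frames():
          """Return the GHED data and its metadata as two DataFrames."""
          # Imported inside the helper so that defining it has no side effects.
          from bblocks import GHED

          ghed = GHED()
          # Loads from disk if already stored, otherwise downloads.
          ghed.load_data()
          return ghed.get_data(), ghed.get_metadata()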


   .. py:attribute:: _metadata
      :type:  pandas.DataFrame
      :value: None


   .. py:method:: load_data() -> bblocks.import_tools.common.ImportData

      Load GHED data

      :returns: The same object to allow chaining


   .. py:method:: update_data(reload_data: bool) -> bblocks.import_tools.common.ImportData

      Update GHED data

      :returns: The same object to allow chaining


   .. py:method:: get_metadata() -> pandas.DataFrame

      Get GHED metadata as a pandas dataframe

      :returns: A pandas dataframe with the metadata


.. py:class:: WFPData

   Bases: :py:obj:`bblocks.import_tools.common.ImportData`


   Class to download and read WFP inflation and insufficient food data


   .. py:property:: available_indicators
      :type: KeysView


      View the available indicators from WFP


   .. py:method:: _country_codes() -> dict


   .. py:method:: load_data(indicator: str | list) -> None

      Load an indicator into the WFPData object


   .. py:method:: update_data(reload_data: bool = True) -> None

      Update the data for all the indicators currently loaded


.. py:class:: WorldEconomicOutlook

   Bases: :py:obj:`bblocks.import_tools.common.ImportData`


   World Economic Outlook data


   .. py:attribute:: year
      :type:  Optional[int]
      :value: None


   .. py:attribute:: release
      :type:  Optional[int]
      :value: None


   .. py:method:: __post_init__() -> None


   .. py:method:: __repr__() -> str


   .. py:method:: __load_data() -> None

      Load the WEO as a clean dataframe

      :param latest_y: passed only optionally to override the behaviour to get the latest
                       release year for the WEO.
      :param latest_r: passed only optionally to override the behaviour to get the latest
                       released value (1 or 2).


   .. py:method:: _check_indicators(indicators: str | list | None = None) -> None | dict


   .. py:method:: load_data(indicator: str | list) -> bblocks.import_tools.common.ImportData

      Loads a specific indicator from the World Economic Outlook data


   .. py:method:: update_data(reload_data: bool = True) -> None

      Update the stored WEO data, using the WEO package.


   .. py:method:: available_indicators() -> None

      Print the available indicators in the dataset


   .. py:method:: get_data(indicators: str | list = 'all', keep_metadata: bool = False) -> pandas.DataFrame


.. py:class:: Aids

   Bases: :py:obj:`bblocks.import_tools.common.ImportData`


   An object to extract data from UNAIDS.

   To use, create an instance of the class.
   Then load indicators using the `load_data` method. This can be done multiple times.
   To return a dataframe of all available indicators to load, use the `available_indicators` property.
   If the data for an indicator has never been downloaded, it will be downloaded.
   If it has been downloaded, it will be loaded from disk. If `update_data` is set to True,
   the data will be downloaded each time an indicator is loaded.
   You can force an update by calling `update_data`, and all indicators will be reloaded into the object.
   You can get a dataframe by calling `get_data` and passing the indicator name(s)
   (or None, which returns all indicators) and the area grouping(s) ('all' by default).
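
   A short sketch of that workflow, assuming ``bblocks`` is installed and UNAIDS data
   can be downloaded. The helper name ``aids_country_data`` is hypothetical, and the
   indicator name is left to the caller because valid names come from the
   ``available_indicators`` property.

   .. code-block:: python

      def aids_country_data(indicator):
          """Load one UNAIDS indicator and return its country-level rows."""
          # Imported inside the helper so that defining it has no side effects.
          from bblocks import Aids

          aids = Aids()
          # aids.available_indicators is a DataFrame listing the loadable indicators.
          aids.load_data(indicator, area_grouping="country")
          return aids.get_data(indicator, area_grouping="country")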


   .. py:property:: available_indicators
      :type: pandas.DataFrame


      Returns a dataframe of available indicators


   .. py:method:: load_data(indicator: str, area_grouping: str = 'all') -> bblocks.import_tools.common.ImportData

      Load an indicator to the object

      :param indicator: The name of the indicator to load. To see a DataFrame of available
                        indicators, use the `available_indicators` property.
      :type indicator: str
      :param area_grouping: The grouping to use. Choose from ["country", "region", "all"].
      :type area_grouping: str

      :returns: The same object to allow chaining


   .. py:method:: update_data(reload_data: bool)

      Update all loaded indicators saved on the disk

      When called, it will go through each loaded indicator/area grouping combination
      and update the data saved on disk.

      :returns: The same object to allow chaining


   .. py:method:: get_data(indicators: Optional[str | list] = None, area_grouping: str = 'all') -> pandas.DataFrame

      Get the data as a Pandas DataFrame

      :param indicators: By default, all indicators are returned in a single DataFrame.
                         If a list of indicators is passed, only those indicators will be returned.
                         A single indicator can be passed as a string as well.
      :param area_grouping: The area grouping to use. Choose from ["country", "region", "all"].
                            Default is "all".
      :type area_grouping: str

      :returns: A Pandas DataFrame with the requested indicator data


.. py:class:: DebtIDS

   Bases: :py:obj:`bblocks.import_tools.common.ImportData`


   Import data from the World Bank's International Debt Statistics database.

   To use this object, first create an instance of it.
   Then use the `load_data` method to load indicators. One or more indicators can
   be loaded at a time, and a starting and end year must be specified.

   If the data has not been downloaded before, it will be downloaded from the
   World Bank API. If the data has been downloaded before, it will be loaded from
   the local data folder.

   To get a DataFrame, use the `get_data` method. You can get the data for one or more,
   or for all indicators at once.

   To update the data, use the `update_data` method. This will download the latest
   data from the World Bank API and overwrite the local data.

   - To get a list of available indicators, use the `get_available_indicators` method.
   - To get a list of available debt service indicators, use the
     `debt_service_indicators` method.
   - To get a list of available debt stocks indicators, use the
     `debt_stocks_indicators` method.
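
   The workflow above might look like this in practice. This is a sketch that assumes
   ``bblocks`` is installed and the World Bank API is reachable. The helper name
   ``debt_indicator_frame`` and the default years are illustrative only.

   .. code-block:: python

      def debt_indicator_frame(start_year=2015, end_year=2020):
          """Load one IDS indicator for a range of years and return a DataFrame."""
          # Imported inside the helper so that defining it has no side effects.
          from bblocks import DebtIDS

          ids = DebtIDS()
          # A start and an end year must both be given when loading.
          ids.load_data("DT.DOD.DECT.CD", start_year=start_year, end_year=end_year)
          return ids.get_data("DT.DOD.DECT.CD")

   ``DebtIDS.get_available_indicators()`` is a classmethod, so the available codes
   can be inspected without creating an instance.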


   .. py:method:: __post_init__()

      Set the path to the data folder and create it if it doesn't exist


   .. py:method:: _check_stored_data(indicator: str, start_year: int, end_year: int) -> str | bool

      Check if the data is already stored locally

      This also checks if the years requested are inside another file.

      :param indicator: The indicator to check
      :type indicator: str
      :param start_year: The start year of the data
      :type start_year: int
      :param end_year: The end year of the data
      :type end_year: int

      :returns: The filename of the data if it exists, or False if the data doesn't exist
      :rtype: str | bool


   .. py:method:: _indicator_parameters(indicator: str) -> tuple[str, int, int]
      :staticmethod:


      Get the indicator, start year and end year from the indicator name.


   .. py:method:: get_available_indicators() -> dict
      :classmethod:


      Get a dictionary of all available indicators in the IDS database.


   .. py:method:: debt_service_indicators(detailed_category: bool = True) -> dict
      :classmethod:


      Get a dictionary of Debt Service indicators in the IDS database.


   .. py:method:: debt_stocks_indicators(detailed_category: bool = True) -> dict
      :classmethod:


      Get a dictionary of Debt Stocks indicators in the IDS database.


   .. py:method:: _get_indicator(indicator: str, start_year: int, end_year: int) -> bblocks.import_tools.common.ImportData

      Get data for an indicator. This method is not meant to be accessed
      directly. Instead, use the `.get_data()` method.

      :param indicator: The indicator to get. They must be in the IDS format
                        (e.g. DT.DOD.DECT.CD). To view all available indicators, call
                        `.get_available_indicators()`.

      :returns: The same object to allow chaining of methods


   .. py:method:: load_data(indicators: str | list, start_year: int, end_year: int) -> bblocks.import_tools.common.ImportData

      Load the data for an indicator or a list of indicators.

      :param indicators: The indicator(s) to load. They must be in the IDS format
                         (e.g. DT.DOD.DECT.CD). To view all available indicators, call
                         `.get_available_indicators()`.
      :param start_year: The first year to include in the data
      :param end_year: The last year to include in the data


   .. py:method:: update_data(reload_data: bool = True) -> bblocks.import_tools.common.ImportData

      Update the data for all loaded indicators.


   .. py:method:: get_data(indicators: str | list = 'all', **kwargs) -> pandas.DataFrame

      Get the data for an indicator or a list of indicators.

      :param indicators: The indicator(s) to get. They must be in the IDS format
                         (e.g. DT.DOD.DECT.CD). To get all available indicators, set
                         `indicators="all"`.

      :returns: A pandas dataframe with the requested data.


.. py:function:: get_dsa(update=False, local_path: str = None) -> pandas.DataFrame

   Extract DSA data from the IMF website

   Extract the most recent Debt Sustainability Assessment (DSA) data
   for PRGT-Eligible Countries from the IMF website.
   URL = https://www.imf.org/external/Pubs/ft/dsa/DSAlist.pdf

   :param local_path: where the downloaded PDF will be stored
   :param update: if True, updates the data from the IMF website. Otherwise
                  it loads the data from the local file. If a local file does not exist,
                  the data will be extracted from the website.
   :type update: bool

   :returns: pandas dataframe with country, latest publication date, and risk of debt distress


.. py:function:: add_iso_codes_column(df: pandas.DataFrame, id_column: str, id_type: str | None = None, target_column: str = 'iso_code') -> pandas.DataFrame

   Add ISO3 column to a dataframe

   :param df: the dataframe to which the column will be added
   :param id_column: the column containing the name, ISO3, ISO2, DAC code, UN code, etc.
   :param id_type: the type of ID used in the id_column. The default 'regex' tries to infer
                   using the rules from the 'country_converter' package. For the DAC codes,
                   "DAC" must be passed.
   :param target_column: the column where the ISO codes will be stored.

   :returns: the original DataFrame with a new column containing ISO3 codes.
   :rtype: DataFrame
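
   A small usage sketch, assuming ``bblocks`` and ``pandas`` are installed. The helper
   name ``with_iso_codes`` and the sample rows are made up for illustration.

   .. code-block:: python

      def with_iso_codes():
          """Return a tiny DataFrame with an added ISO3 column."""
          # Imported inside the helper so that defining it has no side effects.
          import pandas as pd
          from bblocks import add_iso_codes_column

          df = pd.DataFrame({"country": ["France", "Kenya"], "value": [1, 2]})
          # id_type is left at its default so the identifier type is inferred.
          return add_iso_codes_column(df, id_column="country", target_column="iso_code")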


.. py:function:: add_income_level_column(df: pandas.DataFrame, id_column: str, id_type: str | None = None, target_column: str = 'income_level', update_data: bool = False) -> pandas.DataFrame

   Add an income levels column to a dataframe

   :param df: the dataframe to which the column will be added
   :param id_column: the column containing the name, ISO3, ISO2, DACcode, UN code, etc.
   :param id_type: the type of ID used in the id_column. The default 'regex' tries to infer
                   using the rules from the 'country_converter' package. For the DAC codes,
                   "DACcode" must be passed.
   :param target_column: the column where the income level data will be stored.
   :param update_data: whether to update the underlying data or not.

   :returns: the original DataFrame with a new column containing the income level data.
   :rtype: DataFrame


.. py:function:: add_short_names_column(df: pandas.DataFrame, id_column: str, id_type: str | None = None, target_column: str = 'name_short') -> pandas.DataFrame

   Add short names column to a dataframe

   :param df: the dataframe to which the column will be added
   :param id_column: the column containing the name, ISO3, ISO2, DAC code, UN code, etc.
   :param id_type: the type of ID used in the id_column. The default 'regex' tries to infer
                   using the rules from the 'country_converter' package. For the DAC codes,
                   "DAC" must be passed.
   :param target_column: the column where the short names will be stored.

   :returns: the original DataFrame with a new column containing short names.
   :rtype: DataFrame


.. py:function:: clean_number(number: str | pandas.Series, to: Type = float) -> float | int

   Clean a string and return as float or integer.
   When selecting to=int, the default python round behaviour is used.

   :param number: the string to clean
   :param to: the type to convert to (int or float)
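
   The "default python round behaviour" mentioned above is round-half-to-even for the
   built-in ``round``. The snippet below illustrates only that built-in. It does not
   call ``clean_number``, and how ``clean_number`` applies rounding internally is
   not shown in this reference.

   .. code-block:: python

      # Built-in round() sends exact halves to the nearest even integer.
      halves = [0.5, 1.5, 2.5, 3.5]
      rounded = [round(x) for x in halves]
      # rounded == [0, 2, 2, 4]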


.. py:function:: clean_numeric_series(data: pandas.Series | pandas.DataFrame, series_columns: str | list | None = None, to: Type = float) -> pandas.DataFrame | pandas.Series

   Clean a numeric column in a Pandas DataFrame or a Pandas Series which is
   meant to be numeric. When selecting to=int, the default python round behaviour
   is used.

   :param data: it accepts a series or a dataframe. If a dataframe is passed, the column(s)
                to clean must be specified
   :param series_columns: optionally declared (only when `data` is a dataframe). To apply to
                          one or more columns.
   :param to: the type to convert to (int or float)


.. py:function:: to_date_column(series: pandas.Series, date_format: str | None = None) -> pandas.Series

   Converts a Pandas series into a date series.
   The series must contain integers or strings that can be converted into
   datetime objects


.. py:function:: convert_id(series: pandas.Series, from_type: str = 'regex', to_type: str = 'ISO3', not_found: str | None = None, *, additional_mapping: dict = None) -> pandas.Series

   Takes a pandas Series with country IDs and converts them into the desired type.

   :param series: the Pandas series to convert
   :param from_type: the classification type according to which the series is encoded.
                     Available types come from the country_converter package
                     (https://github.com/konstantinstadler/country_converter#classification-schemes)
                     For example: ISO3, ISO2, name_short, DACcode, etc.
   :param to_type: the target classification type. Same options as from_type
   :param not_found: what to do if the value is not found. Can pass a string or None.
                     If None, the original value is passed through.
   :param additional_mapping: Optionally, a dictionary with additional mappings can be used.
                              The keys are the values to be converted and the values are the converted values.
                              The keys follow the same datatype as the original values. The values must follow
                              the same datatype as the target type.
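
   A usage sketch for the parameters above, assuming ``bblocks`` and ``pandas`` are
   installed. The helper name ``names_to_iso3`` and its ``extra`` argument are
   hypothetical. ``extra`` is simply forwarded as ``additional_mapping``.

   .. code-block:: python

      def names_to_iso3(names, extra=None):
          """Convert an iterable of country names to a Series of ISO3 codes."""
          # Imported inside the helper so that defining it has no side effects.
          import pandas as pd
          from bblocks import convert_id

          series = pd.Series(list(names))
          # not_found=None passes unmatched values through unchanged.
          return convert_id(
              series,
              from_type="regex",
              to_type="ISO3",
              not_found=None,
              additional_mapping=extra,
          )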


.. py:function:: date_to_str(series: pandas.Series, date_format: str = '%d %B %Y') -> pandas.Series

   Converts a pandas date Series into a Series of formatted date strings.

   :param series: the Pandas series to convert to a formatted date string
   :param date_format: the format to use for the date string. The default is "%d %B %Y"
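
   The default ``date_format`` uses strftime-style codes. The standard-library
   snippet below shows what that pattern produces for a single date. It illustrates
   the format codes only and does not call ``date_to_str``.

   .. code-block:: python

      from datetime import date

      # "%d" is the zero-padded day, "%B" the full month name, "%Y" the four-digit year.
      formatted = date(2023, 3, 7).strftime("%d %B %Y")
      # formatted == "07 March 2023" under the default C locale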


.. py:function:: format_number(series: pandas.Series, as_units: bool = False, as_percentage: bool = False, as_millions: bool = False, as_billions: bool = False, decimals: int = 2, add_sign: bool = False, other_format: str = '{:,.2f}') -> pandas.Series

   Formats a numeric pandas Series into a formatted string Series.

   :param series: the series to convert to a formatted string
   :param as_units: formatted with commas to separate thousands and the specified decimals
   :param as_percentage: formatted as a percentage with the specified decimals. This assumes
                         that the series contains numbers where 1 would equal 100%.
   :param as_millions: divided by 1 million, formatted with commas and the specified decimals
   :param as_billions: divided by 1 billion, formatted with commas and the specified decimals
   :param decimals: the number of decimals to use
   :param add_sign: add a plus sign to positive numbers
   :param other_format: Other formats to use. This option can only be used if all others
                        are false. Examples are available at:
                        https://mkaz.blog/code/python-string-format-cookbook/
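
   The ``other_format`` argument takes a Python format string such as the default
   ``'{:,.2f}'``. The snippet below shows how a few such format strings behave with
   plain ``str.format``. It is an illustration of the format syntax and does not
   call ``format_number``.

   .. code-block:: python

      # Thousands separators with two decimals (the default other_format).
      default_style = "{:,.2f}".format(1234567.891)
      # Percentage formatting treats 1 as 100%.
      percent_style = "{:.1%}".format(0.256)
      # A leading "+" in the spec adds a sign to positive numbers.
      signed_style = "{:+,.0f}".format(1500)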


.. py:function:: filter_by_continent(df: pandas.DataFrame, continent: str, id_column: str = 'iso_code', id_type: str = 'regex') -> pandas.DataFrame

   Filter a DataFrame by continent.

   :param df: the DataFrame to filter
   :param continent: the continent to filter by (e.g. "Africa", "Europe", "EU")
   :param id_column: the name of the column to use for the id (default: "iso_code")
   :param id_type: the type of id to use (default: "regex")

   :returns: A filtered copy of the DataFrame.
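
   A brief usage sketch, assuming ``bblocks`` is installed and that ``df`` already
   has an ``iso_code`` column (the default ``id_column``). The helper name
   ``african_rows`` is hypothetical.

   .. code-block:: python

      def african_rows(df):
          """Return only the rows of df whose iso_code belongs to Africa."""
          # Imported inside the helper so that defining it has no side effects.
          from bblocks import filter_by_continent

          # id_column and id_type are left at their documented defaults.
          return filter_by_continent(df, continent="Africa")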


.. py:function:: filter_by_un_region(df: pandas.DataFrame, region: str, id_column: str = 'iso_code', id_type: str = 'regex') -> pandas.DataFrame

   Filter a DataFrame by UN region. This includes, for example, "Western Africa",
   "Eastern Africa", "Southern Asia", "Northern America", "Central America", "Eastern Asia".

   :param df: the DataFrame to filter
   :param region: the region to filter by (e.g. "Western Africa", "Eastern Africa", etc.)
   :param id_column: the name of the column to use for the id (default: "iso_code")
   :param id_type: the type of id to use (default: "regex")

   :returns: A filtered copy of the DataFrame.


.. py:function:: filter_eu_countries(df: pandas.DataFrame, id_column: str = 'iso_code', id_type: str = 'regex') -> pandas.DataFrame

   Filter a DataFrame to keep only European Union (EU) member countries. The current
   list of EU members is always used.

   :param df: the DataFrame to filter
   :param id_column: the name of the column to use for the id (default: "iso_code")
   :param id_type: the type of id to use (default: "regex")

   :returns: A filtered copy of the DataFrame.


.. py:function:: filter_african_countries(df: pandas.DataFrame, id_column: str = 'iso_code', id_type: str = 'regex') -> pandas.DataFrame

   Filter a DataFrame to keep only African countries.

   :param df: the DataFrame to filter
   :param id_column: the name of the column to use for the id (default: "iso_code")
   :param id_type: the type of id to use (default: "regex")

   :returns: A filtered copy of the DataFrame.


.. py:function:: filter_latest_by(data: pandas.DataFrame, date_column: str, value_columns: str | list | None = None, group_by: str | list | None = None) -> pandas.DataFrame

   Get the latest value of one or more columns over a period of time.

   :param data: a DataFrame with a date column (datetime or int) and one or more numeric columns
   :param date_column: the name of the date (datetime or int) column
   :param value_columns: one or more columns for which to get the latest value
   :param group_by: Optionally, specify which columns to consider for the latest operation

   :returns: A DataFrame with the latest value of the specified columns
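
   A usage sketch, assuming ``bblocks`` is installed. The helper name
   ``latest_per_country`` and the column names ``year``, ``value`` and ``iso_code``
   are hypothetical and stand in for whatever the input DataFrame uses.

   .. code-block:: python

      def latest_per_country(df):
          """Return the latest 'value' for each 'iso_code' in df."""
          # Imported inside the helper so that defining it has no side effects.
          from bblocks import filter_latest_by

          return filter_latest_by(
              df,
              date_column="year",
              value_columns="value",
              group_by="iso_code",
          )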


.. py:function:: set_bblocks_data_path(path)