{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "0",
   "metadata": {},
   "source": [
    "# GETTSIM Personas: Example Households for Tax-Transfer Analysis\n",
    "\n",
    "[GETTSIM personas](https://github.com/ttsim-dev/gettsim-personas) provide example data\n",
    "to use with GETTSIM. The personas depict specific household structures and provide input\n",
    "data and tax-transfer targets for a given policy date.\n",
    "\n",
    "Personas are helpful if you are interested in exploring how a specific part of the\n",
    "tax-transfer system works (e.g. the income tax) using example data. As the input data\n",
    "provided by a persona can be overridden, you can easily vary GETTSIM's inputs and explore\n",
    "how this affects the results.\n",
    "\n",
    "Even if you already have some data at hand, personas are a great way to find out how to\n",
    "prepare it for using it with GETTSIM. Currently, there are almost 100 input columns\n",
    "necessary to compute current taxes and transfers covered by GETTSIM from the very basic\n",
    "state variables, so finding out which of them are important and which are not is crucial\n",
    "in any application using real data. If a persona exists that corresponds to your use\n",
    "case, it provides a minimal set of input data, overriding nodes of the tax-transfer\n",
    "system that are probably not relevant for your use case (e.g. the calculation of pension\n",
    "benefits when you're interested in the income tax).\n",
    "\n",
    "If no existing persona corresponds to your use case, feel free to open an\n",
    "[issue](https://github.com/ttsim-dev/gettsim-personas/issues) or\n",
    "[make a contribution](https://gettsim.readthedocs.io/en/stable/gettsim_developer/how-to-contribute.html)!\n",
    "\n",
    "## Installation\n",
    "\n",
    "[GETTSIM personas](https://github.com/ttsim-dev/gettsim-personas) is a separate package\n",
    "that is not included in the main GETTSIM package. To install it, run:\n",
    "\n",
    "```bash\n",
    "pip install git+https://github.com/ttsim-dev/gettsim-personas.git\n",
    "```\n",
    "\n",
    "Alternatively, clone the repository and install the package from the local directory:\n",
    "\n",
    "```bash\n",
    "git clone https://github.com/ttsim-dev/gettsim-personas.git\n",
    "cd gettsim-personas\n",
    "pip install -e .\n",
    "```\n",
    "\n",
    "## Basic example\n",
    "\n",
    "We first show the simplest case of loading a persona and running GETTSIM on it.\n",
    "\n",
    "Personas must be instantiated with a `policy_date`, as their content varies depending on\n",
    "the policy environment at that date.\n",
    "\n",
    "We start by importing the module with personas relevant to calculating net income for\n",
    "working-age people who do not take up (or qualify for) any means- or health-tested\n",
    "transfers:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "1",
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "from gettsim_personas import einkommensteuer_sozialabgaben\n",
    "\n",
    "example_persona = einkommensteuer_sozialabgaben.Couple1Child(\n",
    "    policy_date_str=\"2025-01-01\"\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2",
   "metadata": {},
   "source": [
    "`example_persona` is a `Persona` object with the following attributes, all of which can\n",
    "be passed directly to  GETTSIM's `main` function:\n",
    "\n",
    "- `description`: A description of the persona. Use this to check if the persona is\n",
    "  suitable for your use case. (A proper documentation of the available personas is not\n",
    "  yet implemented;\n",
    "  [contributions are welcome!](https://github.com/ttsim-dev/gettsim-personas/issues/9))\n",
    "- `policy_date`: The policy date of this persona, which is just the required input. The\n",
    "  policy date reflects the policy environment for which the persona was created.\n",
    "- `evaluation_date`: The evaluation date of this persona, i.e. the date at which taxes\n",
    "  and transfers should be computed. This is almost always the same as the policy date\n",
    "  and the two will coincide if you do not provide `evaluation_date`. However,\n",
    "  differentiating between the two can be useful if you need to calculate some of the\n",
    "  persona's input data columns endogenously, e.g. the age at retirement.\n",
    "- `input_data_tree`: The input data tree of this persona.\n",
    "- `tt_targets_tree`: The targets that can be computed for this persona.\n",
    "\n",
 Be careful">
    "> Be careful when using personas outside their intended context. Many personas overwrite\n",
    "> GETTSIM's policy functions, so review a persona's `description` field before using it\n",
    "> to ensure it fits your needs. For example, the personas in the\n",
    "> `einkommensteuer_sozialabgaben` module are not suited to computing the disposable\n",
    "> income of low-income households, because the persona inputs are set up such that all\n",
    "> means- and health-tested transfers are assumed not to be taken up (while in reality,\n",
    "> some of those transfers have very high take-up rates among low-income households).\n",
    "\n",
    "You can now compute taxes and transfers for the selected persona:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "3",
   "metadata": {},
   "outputs": [],
   "source": [
    "from gettsim import InputData, MainTarget, TTTargets, main\n",
    "\n",
    "result = main(\n",
    "    main_target=MainTarget.results.df_with_nested_columns,\n",
    "    policy_date=example_persona.policy_date,\n",
    "    input_data=InputData.tree(example_persona.input_data_tree),\n",
    "    tt_targets=TTTargets.tree(example_persona.tt_targets_tree),\n",
    "    include_warn_nodes=False,\n",
    ")\n",
    "\n",
    "result"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4",
   "metadata": {},
   "source": [
    "### Optional Arguments When Instantiating a Persona\n",
    "\n",
    "You can provide a grid of earnings levels to compute taxes and transfers across\n",
    "different earnings levels. For example, in our example persona, both parents have gross\n",
    "earnings of 3,000 Euro per month each. \n",
    "\n",
    "Now, we're interested in how the gross income of the secondary earner affects the\n",
    "household's disposable income. We assume both parents earn the same hourly wage\n",
    "regardless of their weekly working hours. At full-time work, this amounts to 4,000 Euro\n",
    "per month; we set the gross earnings of the primary earner to that amount. We now want\n",
    "to vary the secondary earner's earnings as he changes his weekly hours between 25% and\n",
    "75% of full-time work.\n",
    "\n",
    "To do this, create a `LinspaceGrid` object that specifies gross earnings (either as a\n",
    "constant or as a range) for each `p_id` in the original persona. Here, we set the income\n",
    "of the primary earner (`p_id=0` in the original persona) to 4,000 Euro and the income of\n",
    "the child (`p_id=2`) to 0. For the secondary earner (`p_id=1`), we create a\n",
    "`LinspaceRange` object that specifies the earnings range between 1,000 and 3,000 Euro\n",
    "per month."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "5",
   "metadata": {},
   "outputs": [],
   "source": [
    "from gettsim_personas import einkommensteuer_sozialabgaben\n",
    "\n",
    "persona_with_varying_income_of_secondary_earner = einkommensteuer_sozialabgaben.Couple1Child(\n",
    "    policy_date_str=\"2025-01-01\",\n",
    "    bruttolohn_m_linspace_grid=einkommensteuer_sozialabgaben.Couple1Child.LinspaceGrid(\n",
    "        p0=4000,\n",
    "        p1=einkommensteuer_sozialabgaben.Couple1Child.LinspaceRange(\n",
    "            bottom=1_000, top=3_000\n",
    "        ),\n",
    "        p2=0,\n",
    "        n_points=5,\n",
    "    ),\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6",
   "metadata": {},
   "source": [
    "Looking at the `(\"einnahmen\", \"bruttolohn_m\")` column, we see that the secondary\n",
    "earner's earnings are varied between 1,000 and 3,000 Euro per month in steps of 500\n",
    "Euro, while the primary earner's earnings are fixed at 4,000 Euro per month and the\n",
    "child's earnings are 0:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "7",
   "metadata": {},
   "outputs": [],
   "source": [
    "persona_with_varying_income_of_secondary_earner.input_data_tree[\"einnahmen\"][\n",
    "    \"bruttolohn_m\"\n",
    "]"
   ]
  },
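  {
   "cell_type": "markdown",
   "id": "7a",
   "metadata": {},
   "source": [
    "The spacing of such a grid can be reproduced in plain Python. The sketch below only\n",
    "illustrates the arithmetic and is not the implementation used by `gettsim-personas`;\n",
    "the helper `evenly_spaced` and the tuple layout are hypothetical:\n",
    "\n",
    "```python\n",
    "def evenly_spaced(bottom, top, n_points):\n",
    "    # Return n_points evenly spaced values from bottom to top, both included.\n",
    "    # Assumes n_points >= 2.\n",
    "    step = (top - bottom) / (n_points - 1)\n",
    "    return [bottom + i * step for i in range(n_points)]\n",
    "\n",
    "\n",
    "secondary_earner_grid = evenly_spaced(bottom=1_000, top=3_000, n_points=5)\n",
    "\n",
    "# One household per grid point: (primary earner, secondary earner, child).\n",
    "households = [(4_000, earnings, 0) for earnings in secondary_earner_grid]\n",
    "print(secondary_earner_grid)\n",
    "```\n",
    "\n",
    "With five points between 1,000 and 3,000 Euro, neighbouring grid points are 500 Euro\n",
    "apart."
   ]
  },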
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "8",
   "metadata": {},
   "outputs": [],
   "source": [
    "result_varying_income_of_secondary_earner = main(\n",
    "    main_target=MainTarget.results.df_with_nested_columns,\n",
    "    policy_date=persona_with_varying_income_of_secondary_earner.policy_date,\n",
    "    input_data=InputData.tree(\n",
    "        persona_with_varying_income_of_secondary_earner.input_data_tree\n",
    "    ),\n",
    "    tt_targets=TTTargets.tree(\n",
    "        persona_with_varying_income_of_secondary_earner.tt_targets_tree\n",
    "    ),\n",
    "    include_warn_nodes=False,\n",
    ")\n",
    "\n",
    "# Set a nice index to make the structure clear.\n",
    "tmp = result_varying_income_of_secondary_earner.reset_index().set_index(\n",
    "    [(\"hh_id\", pd.NA, pd.NA, pd.NA), \"p_id\"]\n",
    ")\n",
    "tmp.index.names = [\"hh_id\", \"p_id\"]\n",
    "tmp"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9",
   "metadata": {},
   "source": [
    "You can also specify LinSpaceRanges for multiple `p_id`s at once. If you want to vary\n",
    "anything else than `bruttolohn_m`, use the upserting mechanism described below. \n",
    "\n",
    "Personas support different evaluation dates. Personas fix some input columns to constant\n",
    "values (e.g. age at 30 years) and calculate the values of other input columns (e.g.\n",
    "birth year as evaluation date minus 30 years). In most cases, this will not matter much,\n",
    "but for many pension-related elements of taxes and transfers there often are birth\n",
    "year-based rules. If no evaluation date is provided, the policy date is used by default."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "10",
   "metadata": {},
   "outputs": [],
   "source": [
    "example_persona_with_evaluation_date = einkommensteuer_sozialabgaben.Couple1Child(\n",
    "    policy_date_str=\"2025-01-01\",\n",
    "    evaluation_date_str=\"2026-01-01\",\n",
    ")"
   ]
  },
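  {
   "cell_type": "markdown",
   "id": "10a",
   "metadata": {},
   "source": [
    "To illustrate why the evaluation date can matter, the following sketch mimics the rule\n",
    "described above (age fixed at 30 years, birth year derived from the evaluation date).\n",
    "The helper `implied_birth_year` is hypothetical and not part of `gettsim-personas`:\n",
    "\n",
    "```python\n",
    "from datetime import date\n",
    "\n",
    "\n",
    "def implied_birth_year(evaluation_date, age_years=30):\n",
    "    # Birth year implied by a fixed age at the evaluation date.\n",
    "    return evaluation_date.year - age_years\n",
    "\n",
    "\n",
    "birth_year_default = implied_birth_year(date(2025, 1, 1))\n",
    "birth_year_shifted = implied_birth_year(date(2026, 1, 1))\n",
    "print(birth_year_default, birth_year_shifted)\n",
    "```\n",
    "\n",
    "Moving the evaluation date by one year moves the implied birth year by one year, which\n",
    "can change the outcome of rules based on birth year."
   ]
  },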
  {
   "cell_type": "markdown",
   "id": "11",
   "metadata": {},
   "source": [
    "### Advanced Usage: Upserting Input Data\n",
    "\n",
    "You can also vary persona input data across dimensions other than earnings. `Persona`\n",
    "objects have a method called `upsert_input_data` that creates a new `Persona` and lets\n",
    "you modify any dimension of its input data, while preserving the household structure of\n",
    "the original persona.\n",
    "\n",
    "> Upserting input data is only possible when the length of the user-provided\n",
    "> data is a multiple of the length of the persona's input data.\n",
    "\n",
    "Suppose you are interested in households that receive basic subsistence benefits for the\n",
    "unemployed (Bürgergeld, formerly known as Arbeitslosengeld 2). You want to vary their\n",
    "benefit entitlement by changing their gross rent excluding dwelling costs (a GETTSIM\n",
    "input variable).\n",
    "\n",
    "First, instantiate the base persona:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "12",
   "metadata": {},
   "outputs": [],
   "source": [
    "from gettsim_personas import grundsicherung_für_erwerbsfähige\n",
    "\n",
    "basic_subsistence_benefit_persona = grundsicherung_für_erwerbsfähige.Couple1Child(\n",
    "    policy_date_str=\"2025-01-01\"\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "13",
   "metadata": {},
   "source": [
    "Next, define the input data to upsert:\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "14",
   "metadata": {},
   "outputs": [],
   "source": [
    "rent_to_upsert = {\n",
    "    \"wohnen\": {\"bruttokaltmiete_m_hh\": [600.0, 600.0, 600.0, 800.0, 800.0, 800.0]}\n",
    "}"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "15",
   "metadata": {},
   "source": [
    "Alternatively, you can generate a range of rent levels using `numpy.linspace`:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "16",
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "\n",
    "rent_to_upsert = {\n",
    "    \"wohnen\": {\n",
    "        \"bruttokaltmiete_m_hh\": np.array(\n",
    "            [x for i in np.linspace(300, 1800, 601) for x in [i, i, i]]\n",
    "        ),\n",
    "    }\n",
    "}"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "17",
   "metadata": {},
   "source": [
    "> **The order of input data matters!** GETTSIM uses pointers to `p_id`s in the input\n",
    "> data to depict household structures and relationships between individuals. In general,\n",
    "> `[0.0, 0.0, 4000.0, 0.0, 0.0, 6000.0]` will yield completely different results than\n",
    "> `[4000.0, 0.0, 0.0, 6000.0, 0.0, 0.0]`. Always check the persona's household structure\n",
    "> carefully before modifying input data. In the example above, inputs are on the\n",
    "> household level, so every household member should have the same value for\n",
    "> `bruttokaltmiete_m_hh`.\n",
    "\n",
    "Now, we create a new persona object based on the original one, but with the modified\n",
    "input data:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "18",
   "metadata": {},
   "outputs": [],
   "source": [
    "persona_with_varying_rent = basic_subsistence_benefit_persona.upsert_input_data(\n",
    "    input_data_to_upsert=rent_to_upsert,\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "19",
   "metadata": {},
   "source": [
    "The new persona can then be used to compute taxes and transfers:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "20",
   "metadata": {},
   "outputs": [],
   "source": [
    "result_varying_rent = main(\n",
    "    main_target=MainTarget.results.df_with_nested_columns,\n",
    "    policy_date=persona_with_varying_rent.policy_date,\n",
    "    input_data=InputData.tree(persona_with_varying_rent.input_data_tree),\n",
    "    tt_targets=TTTargets.tree(persona_with_varying_rent.tt_targets_tree),\n",
    "    include_warn_nodes=False,\n",
    ")\n",
    "\n",
    "# Set a nice index to make the structure clear.\n",
    "tmp = result_varying_rent.reset_index().set_index(\n",
    "    [(\"hh_id\", pd.NA, pd.NA, pd.NA), \"p_id\"]\n",
    ")\n",
    "tmp.index.names = [\"hh_id\", \"p_id\"]\n",
    "tmp"
   ]
  }
 ],
 "metadata": {
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}