GETTSIM Personas: Example Households for Tax-Transfer Analysis
GETTSIM personas provide example data to use with GETTSIM. The personas depict specific household structures and provide input data and tax-transfer targets for a given policy date.
Personas are helpful if you are interested in exploring how a specific part of the tax-transfer system works (e.g. the income tax) using example data. As the input data provided by a persona can be overridden, you can easily vary GETTSIM’s inputs and explore how this affects the results.
Even if you already have data at hand, personas are a great way to find out how to prepare it for use with GETTSIM. Currently, almost 100 input columns are needed to compute the current taxes and transfers covered by GETTSIM from the most basic state variables, so finding out which of them matter and which do not is crucial in any application that uses real data. If a persona exists that corresponds to your use case, it provides a minimal set of input data and overrides nodes of the tax-transfer system that are probably not relevant for your use case (e.g. the calculation of pension benefits when you’re interested in the income tax).
If no existing persona corresponds to your use case, feel free to open an issue or make a contribution!
Installation
GETTSIM personas is a separate package that is not included in the main GETTSIM package. To install it, run:
pip install git+https://github.com/ttsim-dev/gettsim-personas.git
Alternatively, clone the repository and install the package from the local directory:
git clone https://github.com/ttsim-dev/gettsim-personas.git
cd gettsim-personas
pip install -e .
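After installing, it can be useful to confirm that both packages are visible to the Python interpreter you intend to use. The following is a minimal sketch using only the standard library; the helper name `is_installed` is illustrative and not part of GETTSIM, and the import names `gettsim` and `gettsim_personas` are the ones used in the examples on this page.

```python
from importlib.util import find_spec


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be found on the import path."""
    return find_spec(module_name) is not None


if __name__ == "__main__":
    # Report whether each package is importable in the current environment.
    for name in ("gettsim", "gettsim_personas"):
        status = "found" if is_installed(name) else "NOT found"
        print(f"{name}: {status}")
```

If one of the packages is reported as not found, the most likely cause is that `pip` installed it into a different environment than the one running the check.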
Basic Example
We first show the simplest case of loading a persona and running GETTSIM on it.
Personas must be instantiated with a policy_date (passed as policy_date_str in the examples on this page), as their content varies depending on the policy environment at that date.
We start by importing the module with personas relevant to calculating net income for working-age people who do not take up (or qualify for) any means- or health-tested transfers:
[1]:
import pandas as pd
from gettsim_personas import einkommensteuer_sozialabgaben

example_persona = einkommensteuer_sozialabgaben.Couple1Child(
    policy_date_str="2025-01-01"
)
example_persona is a Persona object with the following attributes, all of which can be passed directly to GETTSIM’s main function:
description: A description of the persona. Use this to check whether the persona is suitable for your use case. (Proper documentation of the available personas does not exist yet; contributions are welcome!)
policy_date: The policy date of this persona. This is the only required input when instantiating a persona. It reflects the policy environment for which the persona was created.
evaluation_date: The evaluation date of this persona, i.e. the date at which taxes and transfers should be computed. This is almost always the same as the policy date, and the two coincide if you do not provide evaluation_date. However, differentiating between the two can be useful if you need to calculate some of the persona’s input data columns endogenously, e.g. the age at retirement.
input_data_tree: The input data tree of this persona.
tt_targets_tree: The targets that can be computed for this persona.
Be careful when using personas outside their intended context. Many personas override GETTSIM’s policy functions. Before using a persona, review its description field to check that it fits your use case. For example, the personas in the einkommensteuer_sozialabgaben module are not suited to computing the disposable income of low-income households, because the persona inputs are set up so that all means- and health-tested transfers are assumed not to be taken up (while in reality, some of those transfers have very high take-up rates among low-income households).
You can now compute taxes and transfers for the selected persona:
[2]:
from gettsim import InputData, MainTarget, TTTargets, main

result = main(
    main_target=MainTarget.results.df_with_nested_columns,
    policy_date=example_persona.policy_date,
    input_data=InputData.tree(example_persona.input_data_tree),
    tt_targets=TTTargets.tree(example_persona.tt_targets_tree),
    include_warn_nodes=False,
)
result
[2]:
p_id | ("einkommensteuer", "betrag_y_sn") | ("kindergeld", "betrag_y") | ("sozialversicherung", "arbeitslosen", "beitrag", "betrag_versicherter_y") | ("sozialversicherung", "kranken", "beitrag", "betrag_versicherter_y") | ("sozialversicherung", "pflege", "beitrag", "betrag_versicherter_y") | ("sozialversicherung", "rente", "beitrag", "betrag_versicherter_y") |
---|---|---|---|---|---|---|
0 | 7120.0 | 3060 | 468.0 | 3078.0 | 648.0 | 3348.0 |
1 | 7120.0 | 0 | 468.0 | 3078.0 | 648.0 | 3348.0 |
2 | 0.0 | 0 | 0.0 | 0.0 | 0.0 | 0.0 |
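As a quick plausibility check, the annual employee contributions in this table can be related to gross earnings. The sketch below assumes that each parent earns 3,000 Euro gross per month (the value this persona uses) and simply divides the reported amounts by annual gross earnings. The variable names are illustrative and not part of GETTSIM.

```python
# Annual employee contributions of one parent, copied from the result table.
contributions_y = {
    "arbeitslosen": 468.0,
    "kranken": 3078.0,
    "pflege": 648.0,
    "rente": 3348.0,
}

# Assumption: gross earnings of 3,000 Euro per month, i.e. 36,000 Euro per year.
gross_earnings_y = 3_000 * 12

# Implied employee contribution rate for each branch of social insurance.
implied_rates = {
    branch: amount / gross_earnings_y for branch, amount in contributions_y.items()
}

for branch, rate in implied_rates.items():
    print(f"{branch}: {rate:.2%}")
```

These are implied average rates for this particular earnings level only; they are not a statement about the statutory parameters GETTSIM uses internally.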
Optional Arguments When Instantiating a Persona
You can provide a grid of earnings levels to compute taxes and transfers at each of them. In our example persona, both parents have gross earnings of 3,000 Euro per month.
Now, we’re interested in how the gross income of the secondary earner affects the household’s disposable income. We assume both parents earn the same hourly wage regardless of their weekly working hours. At full-time work, this amounts to 4,000 Euro per month; we set the gross earnings of the primary earner to that amount. We now want to vary the secondary earner’s earnings as he changes his weekly hours between 25% and 75% of full-time work.
To do this, create a LinspaceGrid object that specifies gross earnings (either as a constant or as a range) for each p_id in the original persona. Here, we set the income of the primary earner (p_id=0 in the original persona) to 4,000 Euro and the income of the child (p_id=2) to 0. For the secondary earner (p_id=1), we create a LinspaceRange object that specifies the earnings range between 1,000 and 3,000 Euro per month.
[3]:
from gettsim_personas import einkommensteuer_sozialabgaben

persona_with_varying_income_of_secondary_earner = einkommensteuer_sozialabgaben.Couple1Child(
    policy_date_str="2025-01-01",
    bruttolohn_m_linspace_grid=einkommensteuer_sozialabgaben.Couple1Child.LinspaceGrid(
        p0=4000,
        p1=einkommensteuer_sozialabgaben.Couple1Child.LinspaceRange(
            bottom=1_000, top=3_000
        ),
        p2=0,
        n_points=5,
    ),
)
Looking at the ("einnahmen", "bruttolohn_m") column, we see that the secondary earner’s earnings are varied between 1,000 and 3,000 Euro per month in steps of 500 Euro, while the primary earner’s earnings are fixed at 4,000 Euro per month and the child’s earnings are 0:
[4]:
persona_with_varying_income_of_secondary_earner.input_data_tree["einnahmen"][
    "bruttolohn_m"
]
[4]:
array([4000., 1000., 0., 4000., 1500., 0., 4000., 2000., 0.,
4000., 2500., 0., 4000., 3000., 0.])
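To see where this layout comes from, the same array can be rebuilt by hand with NumPy. This is only a sketch of the resulting data layout, one block of three consecutive rows per household, and not necessarily how gettsim_personas constructs it internally. The variable names are illustrative.

```python
import numpy as np

n_points = 5

# One array per household member, one entry per grid point.
p0 = np.full(n_points, 4000.0)  # primary earner: constant 4,000 Euro
p1 = np.linspace(1000.0, 3000.0, n_points)  # secondary earner: 1,000 to 3,000 Euro
p2 = np.zeros(n_points)  # child: no earnings

# Stack the members as columns and flatten row by row, so that the three
# members of each household end up next to each other.
bruttolohn_m = np.column_stack([p0, p1, p2]).ravel()
print(bruttolohn_m)
```

Each grid point thus becomes its own household with three members, which is why five grid points produce fifteen rows.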
[5]:
result_varying_income_of_secondary_earner = main(
    main_target=MainTarget.results.df_with_nested_columns,
    policy_date=persona_with_varying_income_of_secondary_earner.policy_date,
    input_data=InputData.tree(
        persona_with_varying_income_of_secondary_earner.input_data_tree
    ),
    tt_targets=TTTargets.tree(
        persona_with_varying_income_of_secondary_earner.tt_targets_tree
    ),
    include_warn_nodes=False,
)

# Set a nice index to make the structure clear.
tmp = result_varying_income_of_secondary_earner.reset_index().set_index(
    [("hh_id", pd.NA, pd.NA, pd.NA), "p_id"]
)
tmp.index.names = ["hh_id", "p_id"]
tmp
[5]:
hh_id | p_id | ("einkommensteuer", "betrag_y_sn") | ("kindergeld", "betrag_y") | ("sozialversicherung", "arbeitslosen", "beitrag", "betrag_versicherter_y") | ("sozialversicherung", "kranken", "beitrag", "betrag_versicherter_y") | ("sozialversicherung", "pflege", "beitrag", "betrag_versicherter_y") | ("sozialversicherung", "rente", "beitrag", "betrag_versicherter_y") |
---|---|---|---|---|---|---|---|
0 | 0 | 4774.0 | 3060 | 624.000000 | 4104.000000 | 864.000000 | 4464.000000 |
0 | 1 | 4774.0 | 0 | 95.933518 | 630.947368 | 132.831025 | 686.293629 |
0 | 2 | 0.0 | 0 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
1 | 3 | 5930.0 | 3060 | 624.000000 | 4104.000000 | 864.000000 | 4464.000000 |
1 | 4 | 5930.0 | 0 | 203.966759 | 1341.473684 | 282.415512 | 1459.146814 |
1 | 5 | 0.0 | 0 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
2 | 6 | 7120.0 | 3060 | 624.000000 | 4104.000000 | 864.000000 | 4464.000000 |
2 | 7 | 7120.0 | 0 | 312.000000 | 2052.000000 | 432.000000 | 2232.000000 |
2 | 8 | 0.0 | 0 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
3 | 9 | 8470.0 | 3060 | 624.000000 | 4104.000000 | 864.000000 | 4464.000000 |
3 | 10 | 8470.0 | 0 | 390.000000 | 2565.000000 | 540.000000 | 2790.000000 |
3 | 11 | 0.0 | 0 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
4 | 12 | 9862.0 | 3060 | 624.000000 | 4104.000000 | 864.000000 | 4464.000000 |
4 | 13 | 9862.0 | 0 | 468.000000 | 3078.000000 | 648.000000 | 3348.000000 |
4 | 14 | 0.0 | 0 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
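One way to read this table is to ask how much additional income tax the household pays as the secondary earner’s earnings rise from one grid point to the next. The sketch below copies the betrag_y_sn values from the table and assumes that they represent the couple’s joint annual income tax (both spouses show the same value). It ignores social insurance contributions, so it is not the full marginal burden.

```python
# Annual income tax (betrag_y_sn) at each grid point, copied from the table.
income_tax_y = [4774.0, 5930.0, 7120.0, 8470.0, 9862.0]
# Monthly gross earnings of the secondary earner at each grid point.
secondary_gross_m = [1000.0, 1500.0, 2000.0, 2500.0, 3000.0]

# Average marginal income tax rate between neighbouring grid points:
# additional annual tax divided by additional annual gross earnings.
marginal_rates = []
for i in range(1, len(income_tax_y)):
    extra_tax_y = income_tax_y[i] - income_tax_y[i - 1]
    extra_gross_y = 12 * (secondary_gross_m[i] - secondary_gross_m[i - 1])
    marginal_rates.append(extra_tax_y / extra_gross_y)

for gross_m, rate in zip(secondary_gross_m[1:], marginal_rates):
    print(f"up to {gross_m:,.0f} Euro per month: {rate:.1%}")
```

A finer grid (a larger n_points) would give a smoother picture of how the marginal rate changes with earnings.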
You can also specify LinspaceRange objects for multiple p_ids at once. If you want to vary anything other than bruttolohn_m, use the upserting mechanism described below.
Personas support evaluation dates that differ from the policy date. Personas fix some input columns at constant values (e.g. age at 30 years) and calculate the values of other input columns from them (e.g. birth year as the evaluation date minus 30 years). In most cases this will not matter much, but many pension-related elements of the tax-transfer system have rules based on birth year. If no evaluation date is provided, the policy date is used by default.
[6]:
example_persona_with_evaluation_date = einkommensteuer_sozialabgaben.Couple1Child(
    policy_date_str="2025-01-01",
    evaluation_date_str="2026-01-01",
)
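The effect of the evaluation date on derived inputs can be illustrated with a small stand-alone function. This is a hypothetical helper, not part of gettsim_personas. It assumes a fixed age of 30 (the example age mentioned above) and works on calendar years only, whereas the actual personas may handle birth dates in more detail.

```python
from datetime import date


def derive_birth_year(policy_date, evaluation_date=None, age=30):
    """Hypothetical helper: birth year implied by a fixed age at the evaluation date."""
    # Without an explicit evaluation date, fall back to the policy date.
    if evaluation_date is None:
        evaluation_date = policy_date
    return evaluation_date.year - age


# Same policy date, evaluated once at the policy date and once a year later.
print(derive_birth_year(date(2025, 1, 1)))
print(derive_birth_year(date(2025, 1, 1), date(2026, 1, 1)))
```

Moving the evaluation date forward by one year shifts the derived birth year by one year as well, which can matter wherever rules depend on birth cohorts.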
Advanced Usage: Upserting Input Data
You can also vary persona input data across dimensions other than earnings. Persona objects have a method called upsert_input_data that creates a new Persona and lets you modify any dimension of its input data, while preserving the household structure of the original persona.
Upserting input data is only possible when the length of the user-provided data is a multiple of the length of the persona’s input data.
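The length rule can be written down as a simple check. The function below is a hypothetical illustration and not the validation code used by gettsim_personas. It also assumes that each block of rows matching the persona’s length corresponds to one copy of the household.

```python
def n_household_copies(n_persona_rows: int, n_upserted_rows: int) -> int:
    """Hypothetical check: number of household copies implied by the upserted data."""
    if n_upserted_rows % n_persona_rows != 0:
        raise ValueError(
            f"Got {n_upserted_rows} values, which is not a multiple of the "
            f"persona's {n_persona_rows} rows."
        )
    return n_upserted_rows // n_persona_rows


# A three-person persona with six upserted values implies two households.
print(n_household_copies(3, 6))
```

With a three-person persona, seven values would be rejected because they cannot be split evenly into households.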
Suppose you are interested in households that receive basic subsistence benefits for the unemployed (Bürgergeld, formerly known as Arbeitslosengeld 2). You want to vary their benefit entitlement by changing their gross cold rent, i.e. rent excluding heating costs (bruttokaltmiete_m_hh, a GETTSIM input variable).
First, instantiate the base persona:
[7]:
from gettsim_personas import grundsicherung_für_erwerbsfähige

basic_subsistence_benefit_persona = grundsicherung_für_erwerbsfähige.Couple1Child(
    policy_date_str="2025-01-01"
)
Next, define the input data to upsert:
[8]:
rent_to_upsert = {
    "wohnen": {"bruttokaltmiete_m_hh": [600.0, 600.0, 600.0, 800.0, 800.0, 800.0]}
}
Alternatively, you can generate a range of rent levels using numpy.linspace:
[9]:
import numpy as np

rent_to_upsert = {
    "wohnen": {
        "bruttokaltmiete_m_hh": np.array(
            [x for i in np.linspace(300, 1800, 601) for x in [i, i, i]]
        ),
    }
}
The order of input data matters! GETTSIM uses pointers to p_ids in the input data to depict household structures and relationships between individuals. In general, [0.0, 0.0, 4000.0, 0.0, 0.0, 6000.0] will yield completely different results than [4000.0, 0.0, 0.0, 6000.0, 0.0, 0.0]. Always check the persona’s household structure carefully before modifying input data. In the example above, inputs are on the household level, so every household member should have the same value for bruttokaltmiete_m_hh.
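For household-level inputs, the difference between a correct and an incorrect ordering can be made concrete with two NumPy functions. The sketch assumes three household members stored in consecutive rows, as in the Couple1Child personas used on this page. The variable names are illustrative.

```python
import numpy as np

rents = np.array([600.0, 800.0])  # one rent level per household
household_size = 3  # assumption: three members in consecutive rows

# Suitable for household-level inputs: each household's value is repeated
# once per member, so all members of a household share the same rent.
per_household = np.repeat(rents, household_size)

# Not suitable here: cycles through the rent levels, so members of the same
# household would end up with different rents.
cycled = np.tile(rents, household_size)

print(per_household)
print(cycled)
```

In the same way, `np.repeat(np.linspace(300, 1800, 601), 3)` produces the same array as the list comprehension in the rent example.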
Now, we create a new persona object based on the original one, but with the modified input data:
[10]:
persona_with_varying_rent = basic_subsistence_benefit_persona.upsert_input_data(
    input_data_to_upsert=rent_to_upsert,
)
The new persona can then be used to compute taxes and transfers:
[11]:
result_varying_rent = main(
    main_target=MainTarget.results.df_with_nested_columns,
    policy_date=persona_with_varying_rent.policy_date,
    input_data=InputData.tree(persona_with_varying_rent.input_data_tree),
    tt_targets=TTTargets.tree(persona_with_varying_rent.tt_targets_tree),
    include_warn_nodes=False,
)

# Set a nice index to make the structure clear.
tmp = result_varying_rent.reset_index().set_index(
    [("hh_id", pd.NA, pd.NA, pd.NA), "p_id"]
)
tmp.index.names = ["hh_id", "p_id"]
tmp
[11]:
hh_id | p_id | ("bürgergeld", "betrag_m_bg") | ("einkommensteuer", "betrag_m_sn") | ("kindergeld", "betrag_m") | ("kinderzuschlag", "betrag_m_bg") | ("sozialversicherung", "arbeitslosen", "beitrag", "betrag_versicherter_y") | ("sozialversicherung", "arbeitslosen", "betrag_m") | ("sozialversicherung", "kranken", "beitrag", "betrag_versicherter_y") | ("sozialversicherung", "pflege", "beitrag", "betrag_versicherter_y") | ("sozialversicherung", "rente", "beitrag", "betrag_versicherter_y") | ("wohngeld", "betrag_m_wthh") |
---|---|---|---|---|---|---|---|---|---|---|---|
0 | 0 | 978.833795 | 0.0 | 255 | 0.0 | 95.933518 | 0.0 | 630.947368 | 132.831025 | 686.293629 | 0.0 |
0 | 1 | 978.833795 | 0.0 | 0 | 0.0 | 0.000000 | 0.0 | 0.000000 | 0.000000 | 0.000000 | 0.0 |
0 | 2 | 978.833795 | 0.0 | 0 | 0.0 | 0.000000 | 0.0 | 0.000000 | 0.000000 | 0.000000 | 0.0 |
1 | 3 | 981.333795 | 0.0 | 255 | 0.0 | 95.933518 | 0.0 | 630.947368 | 132.831025 | 686.293629 | 0.0 |
1 | 4 | 981.333795 | 0.0 | 0 | 0.0 | 0.000000 | 0.0 | 0.000000 | 0.000000 | 0.000000 | 0.0 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
599 | 1798 | 2476.333795 | 0.0 | 0 | 0.0 | 0.000000 | 0.0 | 0.000000 | 0.000000 | 0.000000 | 0.0 |
599 | 1799 | 2476.333795 | 0.0 | 0 | 0.0 | 0.000000 | 0.0 | 0.000000 | 0.000000 | 0.000000 | 0.0 |
600 | 1800 | 2478.833795 | 0.0 | 255 | 0.0 | 95.933518 | 0.0 | 630.947368 | 132.831025 | 686.293629 | 0.0 |
600 | 1801 | 2478.833795 | 0.0 | 0 | 0.0 | 0.000000 | 0.0 | 0.000000 | 0.000000 | 0.000000 | 0.0 |
600 | 1802 | 2478.833795 | 0.0 | 0 | 0.0 | 0.000000 | 0.0 | 0.000000 | 0.000000 | 0.000000 | 0.0 |
1803 rows × 10 columns