GETTSIM Personas: Example Households for Tax-Transfer Analysis

GETTSIM personas provide example data to use with GETTSIM. The personas depict specific household structures and provide input data and tax-transfer targets for a given policy date.

Personas are helpful if you are interested in exploring how a specific part of the tax-transfer system works (e.g. the income tax) using example data. As the input data provided by a persona can be overridden, you can easily vary GETTSIM’s inputs and explore how this affects the results.

Even if you already have data at hand, personas are a good way to find out how to prepare it for use with GETTSIM. Currently, almost 100 input columns are necessary to compute the current taxes and transfers covered by GETTSIM from the most basic state variables, so working out which of them matter for your application is crucial when using real data. If a persona exists that corresponds to your use case, it provides a minimal set of input data and overrides nodes of the tax-transfer system that are probably not relevant for your use case (e.g. the calculation of pension benefits when you’re interested in the income tax).

If no existing persona corresponds to your use case, feel free to open an issue or make a contribution!

Installation

GETTSIM personas is a separate package that is not included in the main GETTSIM package. To install it, run:

pip install git+https://github.com/ttsim-dev/gettsim-personas.git

Alternatively, clone the repository and install the package from the local directory:

git clone https://github.com/ttsim-dev/gettsim-personas.git
cd gettsim-personas
pip install -e .

Basic example

We first show the simplest case of loading a persona and running GETTSIM on it.

Personas must be instantiated with a policy date (passed as policy_date_str), as their content varies depending on the policy environment at that date.

We start by importing the module with personas relevant to calculating net income for working-age people who do not take up (or qualify for) any means- or health-tested transfers:

[1]:

import pandas as pd
from gettsim_personas import einkommensteuer_sozialabgaben

example_persona = einkommensteuer_sozialabgaben.Couple1Child(
    policy_date_str="2025-01-01"
)

example_persona is a Persona object with the following attributes, all of which can be passed directly to GETTSIM’s main function:

description: A description of the persona. Use this to check if the persona is suitable for your use case. (A proper documentation of the available personas is not yet implemented; contributions are welcome!)
policy_date: The policy date of this persona, i.e. the date passed at instantiation. It reflects the policy environment for which the persona was created.
evaluation_date: The evaluation date of this persona, i.e. the date at which taxes and transfers should be computed. This is almost always the same as the policy date and the two will coincide if you do not provide evaluation_date. However, differentiating between the two can be useful if you need to calculate some of the persona’s input data columns endogenously, e.g. the age at retirement.
input_data_tree: The input data tree of this persona.
tt_targets_tree: The targets that can be computed for this persona.

Be careful when using personas outside their intended context. Many personas override GETTSIM’s policy functions, so always review a persona’s description field before using it to check that it fits your use case. For example, the personas in the einkommensteuer_sozialabgaben module are not suited to compute the disposable income of low-income households, because the persona inputs are set up so that all means- and health-tested transfers are assumed not to be taken up (while in reality, some of those transfers have very high take-up rates among low-income households).

You can now compute taxes and transfers for the selected persona:

[2]:

from gettsim import InputData, MainTarget, TTTargets, main

result = main(
    main_target=MainTarget.results.df_with_nested_columns,
    policy_date=example_persona.policy_date,
    input_data=InputData.tree(example_persona.input_data_tree),
    tt_targets=TTTargets.tree(example_persona.tt_targets_tree),
    include_warn_nodes=False,
)

result

[2]:

	einkommensteuer	kindergeld	sozialversicherung
	betrag_y_sn	betrag_y	arbeitslosen	kranken	pflege	rente
	NaN	NaN	beitrag	beitrag	beitrag	beitrag
	NaN	NaN	betrag_versicherter_y	betrag_versicherter_y	betrag_versicherter_y	betrag_versicherter_y
p_id
0	7120.0	3060	468.0	3078.0	648.0	3348.0
1	7120.0	0	468.0	3078.0	648.0	3348.0
2	0.0	0	0.0	0.0	0.0	0.0
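
As a quick plausibility check on the output above, the yearly contribution amounts can be converted to monthly amounts and related to gross earnings. The following is a minimal sketch that re-types the displayed numbers by hand and does not call GETTSIM. It relies on the monthly gross earnings of 3,000 Euro per parent stated in the next section; the variable names are illustrative, and the resulting shares are simply what the displayed numbers imply.

```python
# Values copied by hand from the result table for p_id 0 (yearly, in Euro).
yearly_contributions = {
    "arbeitslosen": 468.0,
    "kranken": 3078.0,
    "pflege": 648.0,
    "rente": 3348.0,
}
gross_earnings_m = 3000.0  # monthly gross earnings of each parent in this persona

# Convert yearly amounts to monthly amounts, then to a share of gross earnings.
implied_rates = {
    branch: amount_y / 12 / gross_earnings_m
    for branch, amount_y in yearly_contributions.items()
}

for branch, rate in implied_rates.items():
    print(f"{branch}: {rate:.2%} of gross earnings")
```

The same conversion applies to any target whose name ends in _y; targets ending in _m are already monthly.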

Optional Arguments When Instantiating a Persona

You can provide a grid of earnings levels to compute taxes and transfers across them. In the example persona above, both parents have gross earnings of 3,000 Euro per month.

Now, we’re interested in how the gross income of the secondary earner affects the household’s disposable income. We assume both parents earn the same hourly wage regardless of their weekly working hours. At full-time work, this amounts to 4,000 Euro per month; we set the gross earnings of the primary earner to that amount. We now want to vary the secondary earner’s earnings as he changes his weekly hours between 25% and 75% of full-time work.

To do this, create a LinspaceGrid object that specifies gross earnings (either as a constant or as a range) for each p_id in the original persona. Here, we set the income of the primary earner (p_id=0 in the original persona) to 4,000 Euro and the income of the child (p_id=2) to 0. For the secondary earner (p_id=1), we create a LinspaceRange object that specifies the earnings range between 1,000 and 3,000 Euro per month.

[3]:

from gettsim_personas import einkommensteuer_sozialabgaben

persona_with_varying_income_of_secondary_earner = einkommensteuer_sozialabgaben.Couple1Child(
    policy_date_str="2025-01-01",
    bruttolohn_m_linspace_grid=einkommensteuer_sozialabgaben.Couple1Child.LinspaceGrid(
        p0=4000,
        p1=einkommensteuer_sozialabgaben.Couple1Child.LinspaceRange(
            bottom=1_000, top=3_000
        ),
        p2=0,
        n_points=5,
    ),
)

Looking at the ("einnahmen", "bruttolohn_m") column, we see that the secondary earner’s earnings are varied between 1,000 and 3,000 Euro per month in steps of 500 Euro, while the primary earner’s earnings are fixed at 4,000 Euro per month and the child’s earnings are 0:

[4]:

persona_with_varying_income_of_secondary_earner.input_data_tree["einnahmen"][
    "bruttolohn_m"
]

[4]:

array([4000., 1000.,    0., 4000., 1500.,    0., 4000., 2000.,    0.,
       4000., 2500.,    0., 4000., 3000.,    0.])
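
The layout of this array can be reproduced by hand, which helps when checking which entry belongs to which person. The sketch below is an illustration only and is not the package's implementation: the linspace helper and all variable names are hypothetical. It derives the earnings bounds from the 25% and 75% hours shares and then writes one household copy per grid point in the order (p0, p1, p2).

```python
def linspace(bottom, top, n_points):
    """Return n_points evenly spaced values from bottom to top (inclusive).

    Hypothetical helper; requires n_points >= 2.
    """
    step = (top - bottom) / (n_points - 1)
    return [bottom + i * step for i in range(n_points)]


full_time_earnings_m = 4000.0

# The secondary earner works between 25% and 75% of full-time hours
# at the same hourly wage as the primary earner.
bottom = 0.25 * full_time_earnings_m
top = 0.75 * full_time_earnings_m
secondary_earner_grid = linspace(bottom, top, n_points=5)

# One household copy per grid point, ordered as (p0, p1, p2).
flat_earnings = []
for earnings_p1 in secondary_earner_grid:
    flat_earnings.extend([full_time_earnings_m, earnings_p1, 0.0])

print(flat_earnings)
```

Every third entry starting at position 1 belongs to the secondary earner, which matches the array shown above.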

[5]:

result_varying_income_of_secondary_earner = main(
    main_target=MainTarget.results.df_with_nested_columns,
    policy_date=persona_with_varying_income_of_secondary_earner.policy_date,
    input_data=InputData.tree(
        persona_with_varying_income_of_secondary_earner.input_data_tree
    ),
    tt_targets=TTTargets.tree(
        persona_with_varying_income_of_secondary_earner.tt_targets_tree
    ),
    include_warn_nodes=False,
)

# Set a nice index to make the structure clear.
tmp = result_varying_income_of_secondary_earner.reset_index().set_index(
    [("hh_id", pd.NA, pd.NA, pd.NA), "p_id"]
)
tmp.index.names = ["hh_id", "p_id"]
tmp

[5]:

		einkommensteuer	kindergeld	sozialversicherung
		betrag_y_sn	betrag_y	arbeitslosen	kranken	pflege	rente
		NaN	NaN	beitrag	beitrag	beitrag	beitrag
		NaN	NaN	betrag_versicherter_y	betrag_versicherter_y	betrag_versicherter_y	betrag_versicherter_y
hh_id	p_id
0	0	4774.0	3060	624.000000	4104.000000	864.000000	4464.000000
	1	4774.0	0	95.933518	630.947368	132.831025	686.293629
	2	0.0	0	0.000000	0.000000	0.000000	0.000000
1	3	5930.0	3060	624.000000	4104.000000	864.000000	4464.000000
	4	5930.0	0	203.966759	1341.473684	282.415512	1459.146814
	5	0.0	0	0.000000	0.000000	0.000000	0.000000
2	6	7120.0	3060	624.000000	4104.000000	864.000000	4464.000000
	7	7120.0	0	312.000000	2052.000000	432.000000	2232.000000
	8	0.0	0	0.000000	0.000000	0.000000	0.000000
3	9	8470.0	3060	624.000000	4104.000000	864.000000	4464.000000
	10	8470.0	0	390.000000	2565.000000	540.000000	2790.000000
	11	0.0	0	0.000000	0.000000	0.000000	0.000000
4	12	9862.0	3060	624.000000	4104.000000	864.000000	4464.000000
	13	9862.0	0	468.000000	3078.000000	648.000000	3348.000000
	14	0.0	0	0.000000	0.000000	0.000000	0.000000
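
One way to read this table is to ask how much additional income tax each 500 Euro step in the secondary earner's monthly earnings causes. The sketch below copies the displayed numbers by hand and does not call GETTSIM. It assumes that betrag_y_sn is the joint yearly amount for the couple's tax unit, as the identical values for p_id 0 and p_id 1 within each household suggest; the variable names are illustrative.

```python
# Copied by hand from the table above: one entry per household (hh_id 0 to 4).
secondary_earnings_m = [1000.0, 1500.0, 2000.0, 2500.0, 3000.0]
income_tax_y = [4774.0, 5930.0, 7120.0, 8470.0, 9862.0]

# For each step on the grid, compare the extra yearly tax with the extra
# yearly gross earnings of the secondary earner.
steps = []
grid_pairs = zip(secondary_earnings_m, secondary_earnings_m[1:])
tax_pairs = zip(income_tax_y, income_tax_y[1:])
for (gross_lo, gross_hi), (tax_lo, tax_hi) in zip(grid_pairs, tax_pairs):
    extra_gross_y = (gross_hi - gross_lo) * 12
    extra_tax_y = tax_hi - tax_lo
    steps.append((extra_tax_y, extra_tax_y / extra_gross_y))

for extra_tax_y, share in steps:
    print(f"extra tax: {extra_tax_y:.0f} Euro per year ({share:.1%} of extra gross)")
```

This covers the income tax only; social insurance contributions of the secondary earner change along the grid as well.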

You can also specify LinspaceRange objects for multiple p_ids at once. If you want to vary anything other than bruttolohn_m, use the upserting mechanism described below.

Personas support evaluation dates that differ from the policy date. Personas fix some input columns to constant values (e.g. age at 30 years) and calculate the values of other input columns from the evaluation date (e.g. birth year as evaluation date minus 30 years). In most cases this will not matter much, but many pension-related elements of the tax-transfer system have rules based on birth year. If no evaluation date is provided, the policy date is used by default.

[6]:

example_persona_with_evaluation_date = einkommensteuer_sozialabgaben.Couple1Child(
    policy_date_str="2025-01-01",
    evaluation_date_str="2026-01-01",
)
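
To illustrate why the evaluation date matters, the sketch below derives a birth year from an evaluation date and a fixed age, following the "evaluation date minus 30 years" example above. The function implied_birth_year is hypothetical and only mirrors that description; how the package actually computes birth dates is not shown in this document.

```python
from datetime import date


def implied_birth_year(evaluation_date_str, age_in_years=30):
    """Birth year implied by a fixed age at the evaluation date.

    Hypothetical helper; the default age of 30 is the example used in the text.
    """
    evaluation_date = date.fromisoformat(evaluation_date_str)
    return evaluation_date.year - age_in_years


# Same policy date, different evaluation dates: the implied birth year shifts.
print(implied_birth_year("2025-01-01"))
print(implied_birth_year("2026-01-01"))
```

Under this assumption, moving the evaluation date by one year moves the implied birth year by one year, which can matter for rules that depend on birth cohorts.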

Advanced Usage: Upserting Input Data

You can also vary persona input data across dimensions other than earnings. Persona objects have a method called upsert_input_data that creates a new Persona and lets you modify any dimension of its input data, while preserving the household structure of the original persona.

Upserting input data is only possible when the length of the user-provided data is a multiple of the length of the persona’s input data.
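
The length requirement can be checked before calling upsert_input_data. The following is a minimal sketch of such a check; n_household_copies is a hypothetical helper and not part of gettsim_personas, and the error raised by the package itself may look different.

```python
def n_household_copies(n_persona_rows, upserted_values):
    """Return how many copies of the persona the upserted values imply.

    Hypothetical helper: raises ValueError if the number of values is empty
    or not a multiple of the number of rows in the persona's input data.
    """
    n_values = len(upserted_values)
    if n_values == 0 or n_values % n_persona_rows != 0:
        raise ValueError(
            f"Got {n_values} values, expected a multiple of {n_persona_rows}."
        )
    return n_values // n_persona_rows


# A three-person persona with six values yields two household copies.
print(n_household_copies(3, [600.0, 600.0, 600.0, 800.0, 800.0, 800.0]))
```

Passing four values for a three-person persona would raise a ValueError in this sketch, because the values cannot be split evenly across household copies.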

Suppose you are interested in households that receive basic subsistence benefits for the unemployed (Bürgergeld, formerly known as Arbeitslosengeld 2). You want to vary their benefit entitlement by changing their gross cold rent, i.e. rent excluding heating costs (the GETTSIM input variable bruttokaltmiete_m_hh).

First, instantiate the base persona:

[7]:

from gettsim_personas import grundsicherung_für_erwerbsfähige

basic_subsistence_benefit_persona = grundsicherung_für_erwerbsfähige.Couple1Child(
    policy_date_str="2025-01-01"
)

Next, define the input data to upsert:

[8]:

rent_to_upsert = {
    "wohnen": {"bruttokaltmiete_m_hh": [600.0, 600.0, 600.0, 800.0, 800.0, 800.0]}
}

Alternatively, you can generate a range of rent levels using numpy.linspace:

[9]:

import numpy as np

rent_to_upsert = {
    "wohnen": {
        # 601 rent levels from 300 to 1,800 Euro, each repeated once per
        # household member (3 persons per household).
        "bruttokaltmiete_m_hh": np.repeat(np.linspace(300, 1800, 601), 3),
    }
}

The order of input data matters! GETTSIM uses pointers to p_ids in the input data to depict household structures and relationships between individuals. In general, [0.0, 0.0, 4000.0, 0.0, 0.0, 6000.0] will yield completely different results than [4000.0, 0.0, 0.0, 6000.0, 0.0, 0.0]. Always check the persona’s household structure carefully before modifying input data. In the example above, inputs are on the household level, so every household member should have the same value for bruttokaltmiete_m_hh.
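
The two orderings above can be built explicitly, which makes the position of each person visible. The sketch below uses two hypothetical helpers, household_level and person_level, that are not part of GETTSIM or gettsim_personas. It assumes that each household occupies a contiguous block of rows with a fixed member order, as in the personas shown in this document.

```python
def household_level(values_per_household, household_size):
    """Repeat each household's value once per household member."""
    return [v for v in values_per_household for _ in range(household_size)]


def person_level(values_per_household, position, household_size, fill=0.0):
    """Put each household's value at one member position, fill the rest."""
    flat = []
    for value in values_per_household:
        members = [fill] * household_size
        members[position] = value
        flat.extend(members)
    return flat


# Household-level input: every member of a household carries the same value.
rent = household_level([600.0, 800.0], household_size=3)

# Person-level input: the value belongs to exactly one member per household.
earnings_first_member = person_level([4000.0, 6000.0], position=0, household_size=3)
earnings_third_member = person_level([4000.0, 6000.0], position=2, household_size=3)

print(rent)
print(earnings_first_member)
print(earnings_third_member)
```

The last two lists contain the same numbers but assign them to different persons, which is why they generally lead to different results.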

Now, we create a new persona object based on the original one, but with the modified input data:

[10]:

persona_with_varying_rent = basic_subsistence_benefit_persona.upsert_input_data(
    input_data_to_upsert=rent_to_upsert,
)

The new persona can then be used to compute taxes and transfers:

[11]:

result_varying_rent = main(
    main_target=MainTarget.results.df_with_nested_columns,
    policy_date=persona_with_varying_rent.policy_date,
    input_data=InputData.tree(persona_with_varying_rent.input_data_tree),
    tt_targets=TTTargets.tree(persona_with_varying_rent.tt_targets_tree),
    include_warn_nodes=False,
)

# Set a nice index to make the structure clear.
tmp = result_varying_rent.reset_index().set_index(
    [("hh_id", pd.NA, pd.NA, pd.NA), "p_id"]
)
tmp.index.names = ["hh_id", "p_id"]
tmp

[11]:

		bürgergeld	einkommensteuer	kindergeld	kinderzuschlag	sozialversicherung					wohngeld
		betrag_m_bg	betrag_m_sn	betrag_m	betrag_m_bg	arbeitslosen		kranken	pflege	rente	betrag_m_wthh
		NaN	NaN	NaN	NaN	beitrag	betrag_m	beitrag	beitrag	beitrag	NaN
		NaN	NaN	NaN	NaN	betrag_versicherter_y	NaN	betrag_versicherter_y	betrag_versicherter_y	betrag_versicherter_y	NaN
hh_id	p_id
0	0	978.833795	0.0	255	0.0	95.933518	0.0	630.947368	132.831025	686.293629	0.0
	1	978.833795	0.0	0	0.0	0.000000	0.0	0.000000	0.000000	0.000000	0.0
	2	978.833795	0.0	0	0.0	0.000000	0.0	0.000000	0.000000	0.000000	0.0
1	3	981.333795	0.0	255	0.0	95.933518	0.0	630.947368	132.831025	686.293629	0.0
1	4	981.333795	0.0	0	0.0	0.000000	0.0	0.000000	0.000000	0.000000	0.0
...	...	...	...	...	...	...	...	...	...	...	...
599	1798	2476.333795	0.0	0	0.0	0.000000	0.0	0.000000	0.000000	0.000000	0.0
599	1799	2476.333795	0.0	0	0.0	0.000000	0.0	0.000000	0.000000	0.000000	0.0
600	1800	2478.833795	0.0	255	0.0	95.933518	0.0	630.947368	132.831025	686.293629	0.0
	1801	2478.833795	0.0	0	0.0	0.000000	0.0	0.000000	0.000000	0.000000	0.0
	1802	2478.833795	0.0	0	0.0	0.000000	0.0	0.000000	0.000000	0.000000	0.0

1803 rows × 10 columns
