ABM generation of a synthetic population

Activity-based models describe demand at the level of individual persons, not person groups as in conventional models. A data set of households and persons is therefore required that represents the entire population in the modeling region.

The input consists of a random-sample household survey with travel diaries, usually a national one, and various statistics on person and household attributes at different aggregation levels (User Manual: ABM population synthesizer), for example:

Number of persons per location
Age distribution per traffic zone
Number of cars per zip code area
Number of employees per municipality

As with aggregated demand models, the household survey does not have to originate from the planning area, because it is adjusted to the planning area with the help of that area's socio-economic data. The spatial resolution at which the socio-economic data is used is not predefined. It depends on the data situation, and different statistics may be available at different levels.

For each location, the procedure selects suitable households from the sample in such a way that all statistics are matched as closely as possible while the selected population deviates as little as possible from the original sample population. The selected households are "cloned" into the location together with their household members and, if applicable, their tours.
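The cloning step can be illustrated with a minimal Python sketch. It assumes a single constraint (number of persons per location) and a simple greedy selection; the names Household and clone_households_to_location are hypothetical, and the sketch is not the selection logic of the actual procedure, which balances several statistics at once.

```python
import copy
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Household:
    hh_id: int                      # ID of the household in the survey sample
    persons: int                    # number of household members
    location: Optional[str] = None  # irrelevant in the sample, set when cloning


def clone_households_to_location(sample: List[Household], location: str,
                                 target_persons: int) -> List[Household]:
    """Greedily clone sample households into `location` until the person
    target is reached or no sample household fits any more (assumed logic)."""
    clones: List[Household] = []
    remaining = target_persons
    # Cycle through the sample repeatedly so that all sample households
    # are used roughly equally often.
    while remaining > 0:
        progressed = False
        for hh in sample:
            if hh.persons <= remaining:
                clone = copy.copy(hh)     # the sample household stays unchanged
                clone.location = location
                clones.append(clone)
                remaining -= hh.persons
                progressed = True
                if remaining == 0:
                    break
        if not progressed:
            break  # target cannot be met exactly with this sample
    return clones
```

With a sample of one 2-person and one 3-person household and a target of 10 persons, the sketch clones each household twice, so the location receives exactly 10 persons.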

The procedure works its way from the highest level of aggregation to the most detailed level. In each stage, the population of the previous stage is distributed as appropriately as possible to the objects of the current stage. In the example mentioned, a total population is first generated for the planning area, which is then suitably distributed to the municipalities, then for each municipality to the zip code areas below, from there to the traffic zones, and finally to the locations. The base households from which the synthetic population is to be generated must exist in the model, including the associated persons and, if applicable, their tours. The activity location where these households are located is irrelevant, as the final location is only determined during the procedure.
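The top-down stages described above can be sketched as follows. This is a strongly simplified illustration that distributes person counts instead of households and assumes a proportional split with the largest-remainder method; the functions apportion and distribute and the dictionary layout of the hierarchy are hypothetical and not part of the procedure.

```python
def apportion(total, weights):
    """Split the integer `total` across the keys of `weights` in proportion
    to their values, using the largest-remainder method (an assumption)."""
    weight_sum = sum(weights.values())
    if weight_sum == 0:
        return {key: 0 for key in weights}
    exact = {key: total * w / weight_sum for key, w in weights.items()}
    result = {key: int(value) for key, value in exact.items()}
    leftover = total - sum(result.values())
    # Hand the remaining units to the keys with the largest fractional parts.
    by_fraction = sorted(exact, key=lambda key: exact[key] - result[key],
                         reverse=True)
    for key in by_fraction[:leftover]:
        result[key] += 1
    return result


def distribute(total, node):
    """Distribute `total` from `node` down to all objects below it.
    `node` is a dict: {"name": str, "target": int, "children": [node, ...]}."""
    allocation = {node["name"]: total}
    children = node.get("children", [])
    if children:
        shares = apportion(total, {c["name"]: c["target"] for c in children})
        for child in children:
            allocation.update(distribute(shares[child["name"]], child))
    return allocation
```

Each stage only sees the population handed down from the stage above it, so the totals of the objects on one level always add up to the total of their parent object.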

In the model, the statistical key data is kept at locations and surface objects, i.e. at (main) zones, territories, and/or surface POIs. These objects form a hierarchy. The allocation of an object to an object above it in the hierarchy is either defined in the data model - for example, zones have an allocation to their main zone - or it is calculated geometrically, such as the allocation of locations to a territory. Locations form the lowest level in the hierarchy. There must always be at least one constraint for them, usually the number of persons or number of households.

The constraints are sorted according to their importance. The location constraint is at the top and is therefore the most important one. The algorithm tries to adhere to all constraints as far as possible. The less important a constraint is, the more likely it is that a violation will be accepted in favor of more important constraints. With the exception of the first constraint, the constraints, and therefore their importance, can be arranged in any order. In particular, they do not have to follow the hierarchy of their geographic levels.
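One way to picture the effect of this ordering is a weighted sum of deviations in which the weight drops with each rank. The following sketch is only an illustration: the function weighted_deviation and the geometric weighting with the parameter base are assumptions, not the weighting used by the procedure.

```python
def weighted_deviation(deviations_by_priority, base=10.0):
    """Combine constraint deviations into one score (lower is better).

    `deviations_by_priority` lists the deviation of each constraint from its
    target value, ordered from the most to the least important constraint.
    Each rank weighs `base` times more than the rank below it (assumed).
    """
    n = len(deviations_by_priority)
    return sum(abs(dev) * base ** (n - 1 - rank)
               for rank, dev in enumerate(deviations_by_priority))
```

With two constraints, a deviation of 5 in the second constraint yields a score of 5, whereas a deviation of 1 in the first constraint yields a score of 10. A solution that violates the less important constraint is therefore preferred in this sketch.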

When defining the constraints, the target value, an attribute at the objects of a geographic level, is compared with a household attribute or a household formula. The resulting condition is: The sum of the household attribute across all households within a geographical object should (as far as possible) be identical to the target value, i.e. the attribute value of this object.
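The comparison described above can be sketched in a few lines of Python. The function constraint_deviation and the representation of households as dictionaries are hypothetical and only serve to illustrate the condition; object_of stands for the allocation of a household to an object of the geographic level, household_value for the household attribute or household formula.

```python
from collections import defaultdict


def constraint_deviation(households, targets, object_of, household_value):
    """Return, per geographic object, the sum of the household value minus
    the target value. A deviation of 0 means the constraint is met exactly."""
    sums = defaultdict(int)
    for hh in households:
        sums[object_of(hh)] += household_value(hh)
    return {obj: sums[obj] - target for obj, target in targets.items()}


# Example: number of cars per zone (all values are made up for illustration)
households = [
    {"zone": "Z1", "persons": 2, "cars": 1},
    {"zone": "Z1", "persons": 4, "cars": 2},
    {"zone": "Z2", "persons": 1, "cars": 0},
]
car_targets = {"Z1": 3, "Z2": 1}
car_deviation = constraint_deviation(
    households, car_targets,
    object_of=lambda hh: hh["zone"],
    household_value=lambda hh: hh["cars"],
)
```

In this example, car_deviation is {"Z1": 0, "Z2": -1}: zone Z1 meets its target exactly, while zone Z2 is one car short. A household formula can be passed in the same way, for example a function that returns 1 for households with more than two persons and 0 otherwise.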

Target and household values are treated as integers. Non-integer values are rounded before the calculation.