The Hidden Complexities in Fachlichkeit

I want to propose that there are two different kinds of Fachlichkeit (the German word for the subject-matter, or domain, content of a system); I encountered them in two recent projects. The two kinds differ with regard to where the complexity in the domain lies. I don’t yet have catchy names for the two categories, so if you have suggestions, please let me hear them :-)

Language 1: Medical Algorithms

The first project is the DSL we developed with Voluntis. It is described extensively in the SOSYM paper. The language is used by doctors and other healthcare professionals to describe diagnostics and medication algorithms. These algorithms are at the core of so-called companion apps that run on phones and help patients go through treatments by monitoring side-effects and recommending behaviors or medications.

The algorithms are complicated. They make decisions based on many inputs, they take into account the passage of time (“if this happens more than three times in 5 days …”) and deal with all kinds of exceptional cases. The language uses decision tables and decision trees as one means to represent the decisions in an understandable way.
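To make the flavor of such a rule concrete, here is a minimal Python sketch of a time-window condition ("more than three times in 5 days") feeding a small decision table. This is not the Voluntis DSL; the function names, the table contents and the recommendations are all hypothetical and purely illustrative.

```python
from datetime import date, timedelta

def occurred_more_than(events, threshold, window_days, today):
    """True if more than `threshold` events fall in the `window_days` days ending on `today`."""
    window_start = today - timedelta(days=window_days - 1)
    return sum(1 for d in events if window_start <= d <= today) > threshold

# Hypothetical decision table: (frequent symptom?, severe symptom?) -> recommendation.
DECISION_TABLE = {
    (True, True): "escalate to clinician",
    (True, False): "flag for review",
    (False, True): "escalate to clinician",
    (False, False): "continue monitoring",
}

def recommend(symptom_days, severe, today):
    """Look up the recommendation for the current inputs in the decision table."""
    frequent = occurred_more_than(symptom_days, threshold=3, window_days=5, today=today)
    return DECISION_TABLE[(frequent, severe)]

symptom_days = [date(2024, 1, 1), date(2024, 1, 2), date(2024, 1, 3), date(2024, 1, 5)]
# Four occurrences between Jan 1 and Jan 5: the "more than three in 5 days" rule fires.
print(recommend(symptom_days, severe=False, today=date(2024, 1, 5)))  # flag for review
# One day later the window is Jan 2 to Jan 6 and holds only three occurrences.
print(recommend(symptom_days, severe=False, today=date(2024, 1, 6)))  # continue monitoring
```

Even in this toy form, the table makes every combination of inputs explicit, which is a large part of why such notations are easier for domain experts to review than nested conditionals.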

Lots of effort goes into validating that the algorithms are functionally correct because people’s health is at risk if the algorithms are faulty. The overall execution infrastructure (generators, interpreters) is verified extensively to make sure that the functionally correct algorithm is executed faithfully.

Language 2: Salary/Tax Calculation

The second language lives in the space of salary and wage calculation: it is used to calculate things like income tax, deductions for employees who have a company car, or the Solidaritätszuschlag, a particular tax introduced as part of Germany’s reunification in the 90s. Each of the calculations can essentially be expressed in a few dozen lines of code and a few value lookups in tables. The details of the calculations and the values in the lookup tables are governed by law and other “external” regulations.

So is this a complex domain? I just said that each of the calculations can be expressed in a few dozen lines of code. Is it worth implementing a DSL?

Upon closer inspection of the domain we learn a few things. First, data changes over time. For example, your religious affiliation might change from one month to the next (or even within a month!), which of course affects the tax you pay. So the calculations become more complicated because they work on temporal data.

Second, the calculations themselves change over time, because they reflect evolving laws and regulations. And it is necessary to be able to recalculate “old” calculations for several years, concurrently with new calculations. So just checking out an old version from git and rebuilding, packaging and running it is not feasible.

The calculations are also often different for each of Germany’s 16 states, which introduces a second dimension of variability in addition to the evolution over time.

Ok, so how does a DSL help? In the system we are currently building, the DSL has a couple of features specifically designed to help with these non-apparent complexities:

The DSL has native temporal data. A temporal value is a list of {d, v}-pairs, expressing that from a particular date d onward, the value is v (until the next pair takes over). The language overloads the basic arithmetic and comparison operators to work with such temporal values, and has special operators to reduce temporal values to primitive values (for example, by averaging the daily wage within a month).
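The idea can be approximated in a general-purpose language. The following Python sketch is an assumption-laden illustration, not the DSL's implementation: the class name `Temporal`, its methods and the wage figures are invented here. It stores {d, v}-pairs, overloads `+`, `*` and `>`, and reduces a temporal value to a primitive one by averaging over a range of days.

```python
from bisect import bisect_right
from datetime import date, timedelta
import operator

class Temporal:
    """A value that changes over time: (d, v) pairs, v being valid from d onward."""

    def __init__(self, pairs):
        self.pairs = sorted(pairs, key=lambda p: p[0])

    def at(self, day):
        """Value in effect on `day`: the latest pair whose date is <= day."""
        i = bisect_right([d for d, _ in self.pairs], day)
        if i == 0:
            raise ValueError(f"no value defined on {day}")
        return self.pairs[i - 1][1]

    def _combine(self, other, op):
        if not isinstance(other, Temporal):  # temporal op primitive
            return Temporal([(d, op(v, other)) for d, v in self.pairs])
        # The result is only defined once both operands are defined.
        start = max(self.pairs[0][0], other.pairs[0][0])
        dates = sorted({d for d, _ in self.pairs + other.pairs if d >= start})
        return Temporal([(d, op(self.at(d), other.at(d))) for d in dates])

    def __add__(self, other):
        return self._combine(other, operator.add)

    def __mul__(self, other):
        return self._combine(other, operator.mul)

    def __gt__(self, other):
        return self._combine(other, operator.gt)  # yields a temporal boolean

    def average(self, first_day, last_day):
        """Reduce to a primitive value: the mean of the daily values, both ends inclusive."""
        days = (last_day - first_day).days + 1
        return sum(self.at(first_day + timedelta(days=n)) for n in range(days)) / days

# Hypothetical figures: the daily wage rises from 100 to 130 on January 16.
wage = Temporal([(date(2024, 1, 1), 100), (date(2024, 1, 16), 130)])
bonus = Temporal([(date(2024, 1, 10), 20)])

print((wage + bonus).at(date(2024, 1, 20)))               # 150
print((wage > 120).at(date(2024, 1, 5)))                  # False
print(wage.average(date(2024, 1, 1), date(2024, 1, 30)))  # 115.0
```

A library like this gets the arithmetic right, but errors still surface at runtime and in terms of Python objects; the point of building the concept into the language is that the type checker and the IDE know about it.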

The language has direct support for versions of data structures and calculations that represent their evolution over time. The visibility rules and the type checker ensure static correctness and report error messages in terms of these versions (and not general, low-level type-y stuff). A library-based solution using a general-purpose language cannot do that. The IDE also has dedicated support for versioning: if the user selects a data structure, the IDE shows all overrides of the calculations that compute this structure in all versions of the system.
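For contrast, here is roughly what a library-based approximation of versioned calculations might look like in Python. Everything in it is hypothetical: the `VersionedCalculation` class, the calculation name, the effective dates and the rates are invented for illustration and do not reflect real tax rules. The version is picked at runtime based on the period being calculated, so old and new periods can be computed side by side.

```python
from datetime import date

class VersionedCalculation:
    """A calculation with several versions, each valid from a given date onward."""

    def __init__(self, name):
        self.name = name
        self._versions = []  # (valid_from, function), kept sorted by date

    def version(self, valid_from):
        """Decorator that registers a function as the version effective from `valid_from`."""
        def register(fn):
            self._versions.append((valid_from, fn))
            self._versions.sort(key=lambda v: v[0])
            return fn
        return register

    def __call__(self, period, *args):
        """Run the version that was in effect for `period`."""
        applicable = [fn for valid_from, fn in self._versions if valid_from <= period]
        if not applicable:
            raise LookupError(f"{self.name}: no version in effect on {period}")
        return applicable[-1](*args)

car_benefit = VersionedCalculation("car_benefit")

@car_benefit.version(date(2020, 1, 1))
def car_benefit_v2020(list_price):
    return list_price // 100  # invented rule: 1% of the list price

@car_benefit.version(date(2023, 1, 1))
def car_benefit_v2023(list_price):
    return list_price // 200  # invented rule: 0.5% of the list price

# Recalculating an old period and a current one in the same running system.
print(car_benefit(date(2021, 6, 1), 40000))  # 400
print(car_benefit(date(2024, 6, 1), 40000))  # 200
```

Note what is missing: nothing here checks statically that a 2021 calculation only refers to data structures that existed in 2021, and a mistake shows up as a `LookupError` or a Python type error at runtime rather than as a message phrased in terms of versions.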

The second dimension of variability, the 16 German states (and in some cases, other geographic/governmental structures), is also supported first-class. Implementing this variability with conditionals in the calculation code quickly gets out of hand.
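A tiny Python sketch of the underlying idea, state-specific overrides on top of a default instead of conditionals scattered through the calculation, might look as follows. The function names are hypothetical, and the rates (church tax as a percentage of income tax, lower in Bavaria and Baden-Württemberg than elsewhere) are included for illustration only and should not be relied upon.

```python
# Default rule plus per-state overrides, keyed by state abbreviation.
DEFAULT_RATE_PERCENT = 9
STATE_OVERRIDES = {"BY": 8, "BW": 8}  # illustrative values only

def church_tax_rate_percent(state):
    """Rate for `state`, falling back to the default when there is no override."""
    return STATE_OVERRIDES.get(state, DEFAULT_RATE_PERCENT)

def church_tax_cents(income_tax_cents, state):
    """Church tax in cents (integer arithmetic, rounded down)."""
    return income_tax_cents * church_tax_rate_percent(state) // 100

print(church_tax_cents(100_000, "BY"))  # 8000
print(church_tax_cents(100_000, "NW"))  # 9000
```

With one rate this is trivial; it stops being trivial once whole calculations differ per state and the overrides have to be combined with the versioning over time.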

Finally, you can imagine that testing such systems isn’t easy, because, as part of a test case, you have to set up timelines of changing data. Which is why the DSL has special syntax for this, taking into account the special scoping and static type checking.

Two more concerns drive the use of a DSL, although these are really motivated technically. First, since the execution model is incremental (when data becomes available or changes, the dependent results have to be (re-)calculated), we have to ensure that all computations are expressed functionally with clear dependencies, at least at the top level. The language enforces this. Second, the system (or parts of it) should be able to run on different platforms. So expressing the Fachlichkeit independently of particular programming languages and technologies is crucial; this is one of the generic benefits of using DSLs.
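To illustrate why the incremental execution model wants functional computations with explicit dependencies, here is a deliberately naive Python sketch. It is not how our runtime works; the `IncrementalModel` class, the rule names and the figures are made up, and the sketch assumes that rules are defined in dependency order (inputs before the results that use them).

```python
class IncrementalModel:
    """Recomputes only the results that depend, directly or indirectly, on changed data."""

    def __init__(self):
        self.values = {}
        self.rules = {}       # result name -> (input names, pure function)
        self.recomputed = []  # log of recalculated results, for demonstration

    def define(self, name, inputs, fn):
        """Declare `name` as a pure function of `inputs`. Assumes dependency order."""
        self.rules[name] = (tuple(inputs), fn)

    def update(self, name, value):
        """Set an input and recalculate every affected result whose inputs are available."""
        self.values[name] = value
        dirty = {name}
        for target, (inputs, fn) in self.rules.items():
            if dirty.intersection(inputs) and all(i in self.values for i in inputs):
                self.values[target] = fn(*(self.values[i] for i in inputs))
                dirty.add(target)
                self.recomputed.append(target)

model = IncrementalModel()
model.define("gross", ["hours", "rate"], lambda hours, rate: hours * rate)
model.define("tax", ["gross"], lambda gross: gross * 20 // 100)  # invented flat rate
model.define("net", ["gross", "tax"], lambda gross, tax: gross - tax)
model.define("car_benefit", ["list_price"], lambda price: price // 100)

model.update("hours", 160)         # nothing computed yet: "rate" is still missing
model.update("rate", 25)           # gross, tax and net become available
model.update("list_price", 40000)  # only car_benefit is calculated
print(model.values["net"])         # 3200
print(model.recomputed)            # ['gross', 'tax', 'net', 'car_benefit']
```

The scheme only works because each result is a pure function of its declared inputs. If a calculation could read or modify data behind the model's back, there would be no way to know what to recalculate, which is why the language enforces this style rather than leaving it to convention.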

Take-Away Points

Here is the crucial observation: the complexity in the salary/tax domain does not lie in the calculations themselves. Instead it hides behind a set of “non-functional” requirements. I am putting “non-functional” in quotes because, unlike performance, scalability or data privacy, these requirements are not technically motivated. They are an inherent part of the domain, but they are not as apparent as the complex decisions in the medical companion apps.

The complexity may hide so well that the people in the domain aren’t necessarily aware of it. I know of a dev team that did a “Java prototype” that, in terms of Fachlichkeit, literally multiplied three numbers. To be fair, the prototype was about infrastructure (micro-services, Spring vs. JavaEE, web-technologies), but nonetheless, it failed to address the (hidden) herd of elephants in the room.

Two final remarks. Once again it turns out that the act of building a DSL that makes these hidden complexities apparent is a great way of discovering, understanding, discussing and trade-off-ing these complexities. It’s probably worth doing even if we threw away the language itself later!

And a word about language development effort. If the domain is so complex that you need special data types and temporal arithmetic and versioning and complicated visibility rules and domain-specific type checkers and incremental computation … isn’t it a bit of a stretch to think you can implement a DSL for the domain? After all, no general-purpose language has such “advanced” features. Well, of course, that is the point with DSLs: you can compromise in other areas of the language, making the overall effort acceptable. We spent only a few weeks on building the basic temporal/versioning infrastructure. And what’s the alternative? Relying on domain experts and programmers to use Word and Java to tackle this complexity without tool support? Hardly!