The Hidden Complexities in Fachlichkeit
I want to propose that there are two different kinds of Fachlichkeit (roughly: the domain or business logic of a system); I encountered them in two recent projects. The two kinds differ in where the complexity of the domain lies. I don’t yet have catchy names for the two categories, so if you have suggestions, please let me hear them :-)
Language 1: Medical Algorithms
The first project is the DSL we developed with Voluntis; it is described extensively in the SOSYM paper. The language is used by doctors and other healthcare professionals to describe diagnostics and medication algorithms. These algorithms are at the core of so-called companion apps that run on phones and help patients go through treatments by monitoring side-effects and recommending behaviors or medications.
The algorithms are complicated. They make decisions based on many inputs, take into account the passage of time (“if this happens more than three times in 5 days …”), and deal with all kinds of exceptional cases. The language uses decision tables and decision trees as one means to represent these decisions in an understandable way.
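To make the idea of a decision table with a temporal condition more concrete, here is a minimal Python sketch. It is not the Voluntis DSL, and everything in it is hypothetical: the `Event` type, the symptom name `nausea`, and the outcomes `ESCALATE`, `SUGGEST_MEDICATION` and `NO_ACTION` are made up for illustration and carry no medical meaning. It only shows how a rule like “more than three times in 5 days” can appear as one row of an ordered table.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Callable


@dataclass(frozen=True)
class Event:
    """A hypothetical patient-reported observation on a given day."""
    day: date
    kind: str


def count_in_window(events: list[Event], kind: str, today: date, days: int) -> int:
    """Count events of `kind` in the last `days` days, including today."""
    start = today - timedelta(days=days - 1)
    return sum(1 for e in events if e.kind == kind and start <= e.day <= today)


# A decision table as an ordered list of (condition, outcome) rows.
# Rows are checked top to bottom and the first matching row wins;
# the final catch-all row makes the table complete.
Condition = Callable[[list[Event], date], bool]
TABLE: list[tuple[Condition, str]] = [
    (lambda ev, t: count_in_window(ev, "nausea", t, 5) > 3, "ESCALATE"),
    (lambda ev, t: count_in_window(ev, "nausea", t, 5) >= 1, "SUGGEST_MEDICATION"),
    (lambda ev, t: True, "NO_ACTION"),
]


def decide(events: list[Event], today: date) -> str:
    for condition, outcome in TABLE:
        if condition(events, today):
            return outcome
    raise ValueError("decision table is incomplete")


today = date(2024, 3, 5)
events = [Event(date(2024, 3, d), "nausea") for d in (1, 2, 3, 4)]
print(decide(events, today))  # ESCALATE
```

The first-match-wins order with a catch-all row is just one simple way to keep such a table total; a language tool can additionally check completeness and overlaps of the rows statically, which this sketch does not attempt.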
Lots of effort goes into validating that the algorithms are functionally correct, because people’s health is at risk if the algorithms are faulty. The overall execution infrastructure (generators, interpreters) is verified extensively to make sure that the functionally correct algorithm is executed faithfully.
Language 2: Salary/Tax Calculation
The second language lives in the space of salary and wage calculation: it is used to calculate things like income tax, deductions for employees who have a company car, or the Solidaritätszuschlag, a surcharge on income tax introduced in the 1990s to help finance German reunification. Each of the calculations can essentially be expressed in a few dozen lines of code plus a few value lookups in tables. The details of the calculations and the values in the lookup tables are governed by law and other “external” regulations.
So is this a complex domain?
I just said that each of the calculations can be expressed in a few dozen lines of code. Is it really worth implementing a DSL?
Upon closer inspection of the domain, we learn a few things. First, data changes over time. For example, your religious affiliation might change from one month to the next (or even within a month!), which affects the tax you pay (in Germany, through church tax). So the calculations become more complicated because they work on temporal data.
Second, the calculations themselves change over time, because they reflect evolving laws and regulations. And it must be possible to recalculate “old” calculations going back several years, concurrently with new ones. So simply checking out an old version from git and rebuilding, packaging and running it is not feasible.
The calculations also often differ between Germany’s 16 states, which introduces a second dimension of variability in addition to the evolution over time.
Ok, so how does a DSL help?
In the system we are currently building, the DSL has a couple of features
specifically designed to help with these non-apparent complexities:
The DSL has native temporal data. A temporal value is a list of {d, v}-pairs, each expressing that from date d onwards the value is v. The language overloads the basic arithmetic and comparison operators to work on such temporal values, and it has special operators to reduce temporal values to primitive values (for example, by averaging the daily wage within a month).
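A minimal Python sketch of such temporal values, assuming a step-function interpretation in which each {d, v}-pair holds until the next change. The class `Temporal` and the function `monthly_average` are hypothetical names, not the DSL’s actual API; the sketch only shows pointwise lifting of operators and one reduction to a primitive value.

```python
from __future__ import annotations

import operator
from bisect import bisect_right
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Any, Callable


@dataclass(frozen=True)
class Temporal:
    """A temporal value: (d, v) pairs meaning 'from date d onwards, the value is v'."""
    changes: tuple[tuple[date, Any], ...]

    def __post_init__(self) -> None:
        # Keep the pairs sorted by date so lookups can use binary search.
        object.__setattr__(self, "changes", tuple(sorted(self.changes, key=lambda c: c[0])))

    def at(self, day: date) -> Any:
        """The value in effect on `day`."""
        i = bisect_right([d for d, _ in self.changes], day) - 1
        if i < 0:
            raise ValueError(f"no value defined on {day}")
        return self.changes[i][1]

    def _lift(self, other: Any, op: Callable[[Any, Any], Any]) -> Temporal:
        """Apply a binary operator pointwise; plain values count as constant over time."""
        if not isinstance(other, Temporal):
            return Temporal(tuple((d, op(v, other)) for d, v in self.changes))
        start = max(self.changes[0][0], other.changes[0][0])
        # The result can only change on dates where either operand changes.
        points = sorted({start} | {d for d, _ in self.changes + other.changes if d > start})
        return Temporal(tuple((d, op(self.at(d), other.at(d))) for d in points))

    def __add__(self, other: Any) -> Temporal:
        return self._lift(other, operator.add)

    def __sub__(self, other: Any) -> Temporal:
        return self._lift(other, operator.sub)

    def __mul__(self, other: Any) -> Temporal:
        return self._lift(other, operator.mul)

    def __gt__(self, other: Any) -> Temporal:
        # Comparisons also yield temporal values (of booleans).
        return self._lift(other, operator.gt)


def monthly_average(t: Temporal, year: int, month: int) -> float:
    """Reduce a temporal value to a number by averaging its daily values over one month."""
    first = date(year, month, 1)
    next_first = date(year + month // 12, month % 12 + 1, 1)
    days = (next_first - first).days
    return sum(t.at(first + timedelta(days=i)) for i in range(days)) / days


wage = Temporal(((date(2024, 1, 1), 100.0), (date(2024, 4, 16), 130.0)))
print(monthly_average(wage, 2024, 4))  # 115.0
print((wage > 120).at(date(2024, 4, 20)))  # True
```

Combining two temporal values only requires evaluating the dates on which either operand changes, so results stay in the same compact representation as their inputs.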
The language has direct support for versions of data structures and calculations that represent their evolution over time. The visibility rules and the type checker ensure static correctness and report errors in terms of these versions (and not in terms of general, low-level type-y stuff). A library-based solution in a general-purpose language cannot do that. The IDE also has dedicated support for versioning: if the user selects a data structure, the IDE shows all overrides of the calculations that compute this structure, across all versions of the system.
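The runtime side of versioning can be sketched in Python as a registry of calculation versions, each valid from a start date, so that an old period and a current period can be computed side by side in one running system. `VersionedCalculation`, `hypothetical_levy` and its rates are invented for illustration. As the text points out, a library like this cannot provide the static, version-aware checking and error messages of the DSL; it only shows dispatch by validity date.

```python
from bisect import bisect_right
from datetime import date
from typing import Callable

Calc = Callable[[float], float]


class VersionedCalculation:
    """All versions of one calculation; each is valid from its start date until the next one."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._versions: list[tuple[date, Calc]] = []

    def version(self, valid_from: date) -> Callable[[Calc], Calc]:
        """Decorator that registers a new version, valid from `valid_from`."""
        def register(fn: Calc) -> Calc:
            self._versions.append((valid_from, fn))
            self._versions.sort(key=lambda pair: pair[0])
            return fn
        return register

    def for_period(self, period: date) -> Calc:
        """Select the version in force for the given period."""
        i = bisect_right([d for d, _ in self._versions], period) - 1
        if i < 0:
            raise LookupError(f"{self.name}: no version valid on {period}")
        return self._versions[i][1]

    def __call__(self, period: date, gross: float) -> float:
        return self.for_period(period)(gross)


# Hypothetical calculation and rates, for illustration only.
levy = VersionedCalculation("hypothetical_levy")


@levy.version(date(2020, 1, 1))
def levy_2020(gross: float) -> float:
    return round(gross * 0.05, 2)


@levy.version(date(2023, 1, 1))
def levy_2023(gross: float) -> float:
    # Hypothetical change in the law: an exemption amount of 1000 is introduced.
    return round(max(0.0, gross - 1000.0) * 0.05, 2)


# Old and new periods are computed side by side in the same process.
print(levy(date(2022, 12, 1), 3000.0))  # 150.0
print(levy(date(2023, 6, 1), 3000.0))   # 100.0
```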
The second dimension of variability, the 16 German states (and in some cases other geographic or governmental structures), is also supported as a first-class concept. Implementing this variability with conditionals in the calculation code quickly gets out of hand.
Finally, as you can imagine, testing such systems isn’t easy, because as part of a test case you have to set up timelines of changing data. This is why the DSL has special syntax for this, which takes the special scoping and static type checking into account.
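The shape of such a test can be sketched in plain Python, assuming a hypothetical `Timeline` builder for test data. The liability rule in `liable_for_church_tax` (affiliated on the first day of the month) is made up purely for illustration and is not the actual legal rule. Unlike the DSL, nothing here checks the timeline’s scoping or types statically.

```python
from datetime import date


class Timeline:
    """Hypothetical test helper: fluently builds data that changes over time."""

    def __init__(self) -> None:
        self._changes: list[tuple[date, object]] = []

    def starting(self, day: date, value: object) -> "Timeline":
        self._changes.append((day, value))
        self._changes.sort(key=lambda c: c[0])
        return self

    def at(self, day: date) -> object:
        current = [v for d, v in self._changes if d <= day]
        if not current:
            raise ValueError(f"timeline has no value on {day}")
        return current[-1]


def liable_for_church_tax(affiliation: Timeline, year: int, month: int) -> bool:
    # Hypothetical rule for illustration only: liable if affiliated
    # with a church on the first day of the month.
    return affiliation.at(date(year, month, 1)) is not None


def test_leaving_church_mid_year() -> None:
    affiliation = (Timeline()
                   .starting(date(2024, 1, 1), "catholic")  # member from January
                   .starting(date(2024, 5, 15), None))      # leaves mid-May
    expected = {m: m <= 5 for m in range(1, 13)}
    actual = {m: liable_for_church_tax(affiliation, 2024, m) for m in range(1, 13)}
    assert actual == expected, actual


test_leaving_church_mid_year()
```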
Two more concerns drive the use of a DSL, although they are really technically motivated. First, since the execution model is incremental (when data becomes available or changes, the dependent results have to be (re-)calculated), we have to ensure that all computations are expressed functionally with clear dependencies, at least at the top level. The language enforces this. Second, the system (or parts of it) should be able to run on different platforms. So expressing the Fachlichkeit independently of particular programming languages and technologies is crucial; this is one of the generic benefits of using DSLs.
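The incremental execution model can be sketched in Python as a graph of pure functions with explicit dependencies: when an input changes, only the cached results that depend on it are discarded and then recomputed on demand. `IncrementalGraph` and the rules in the example are hypothetical; `car_benefit` loosely follows the German 1% rule for company cars, and `levy` is invented. Python cannot enforce that the functions are pure, which is exactly what the DSL’s enforcement adds.

```python
from typing import Callable


class IncrementalGraph:
    """Sketch of incremental evaluation over pure functions with explicit dependencies."""

    def __init__(self) -> None:
        self.inputs: dict[str, int] = {}
        self.rules: dict[str, tuple[tuple[str, ...], Callable[..., int]]] = {}
        self.cache: dict[str, int] = {}
        self.evaluations = 0  # counts rule evaluations, to make recomputation visible

    def define(self, name: str, deps: list[str], fn: Callable[..., int]) -> None:
        # `fn` should be pure: it only sees the values of `deps`, passed as arguments.
        self.rules[name] = (tuple(deps), fn)

    def set_input(self, name: str, value: int) -> None:
        self.inputs[name] = value
        self._invalidate(name)

    def _invalidate(self, name: str) -> None:
        # Drop cached results that depend (directly or transitively) on `name`.
        for other, (deps, _) in self.rules.items():
            if name in deps and other in self.cache:
                del self.cache[other]
                self._invalidate(other)

    def get(self, name: str) -> int:
        if name in self.inputs:
            return self.inputs[name]
        if name not in self.cache:
            deps, fn = self.rules[name]
            self.evaluations += 1
            self.cache[name] = fn(*(self.get(d) for d in deps))
        return self.cache[name]


g = IncrementalGraph()
g.define("car_benefit", ["car_list_price"], lambda price: price // 100)
g.define("taxable", ["gross", "car_benefit"], lambda gross, car: gross + car)
g.define("levy", ["taxable"], lambda taxable: taxable // 10)
g.set_input("gross", 3000)
g.set_input("car_list_price", 40000)
print(g.get("levy"), g.evaluations)  # 340 3
g.set_input("gross", 3500)
print(g.get("levy"), g.evaluations)  # 390 5 (car_benefit is not recomputed)
```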
Take-Away Points
Here is the crucial observation: the complexity in the salary/tax domain does not lie in the calculations themselves. Instead, it hides behind a set of “non-functional” requirements. I put “non-functional” in quotes because, unlike typical non-functional requirements such as performance, scalability or data privacy, these are not technically motivated. They are an inherent part of the domain, but they are not as apparent as the complex decisions in the medical companion apps.
The complexity can hide so well that even the people in the domain aren’t necessarily aware of it. I know of a dev team that built a “Java prototype” that, in terms of Fachlichkeit, literally multiplied three numbers. To be fair, the prototype was about infrastructure (micro-services, Spring vs. JavaEE, web technologies), but nonetheless, it failed to address the (hidden) herd of elephants in the room.
Two final remarks. Once again, it turns out that the act of building a DSL that makes these hidden complexities apparent is a great way of discovering, understanding and discussing these complexities, and of making trade-offs among them. It’s probably worth doing even if we were to throw away the language itself later!
And a word about language development effort. If the domain is so complex that you need special data types, temporal arithmetic, versioning, complicated visibility rules, domain-specific type checkers and incremental computation … isn’t it a bit of a stretch to think you can implement a DSL for the domain? After all, no general-purpose language has such “advanced” features. Well, of course, that is the point of DSLs: you can compromise in other areas of the language, making the overall effort acceptable. We spent only a few weeks building the basic temporal/versioning infrastructure. And what’s the alternative? Relying on domain experts and programmers to tackle this complexity with Word and Java, without tool support? Hardly!