Perspectives on DSL: Knowledge Management
Domain-specific languages are
a very particular approach to tackling a problem, and for many people it is
unclear what a DSL would do for them. To make this clearer, I have decided to
write a few posts that relate DSLs to various approaches known in computer
science and IT. I start in this post with knowledge management.
What is knowledge
What is knowledge, from the
perspective of an organization? Here’s how I would define it. The world is full
of information, data and stories about almost every imaginable domain. But if a
company wants to develop software in that domain, they have to make sense of
all of this stuff. They have to scope the domain (define what’s in and out for
their purposes), they have to structure it in some systematic way, they have to
make it accessible to the people in the organization, they have to evolve it
over time, and ultimately, they have to do something with it,
typically encode it into software as part of their products. So:
Knowledge is an
organization’s understanding of relevant information and data that is available
in the world or built proprietarily by the organization.
Let’s look at a few examples.
If you develop payroll software, you have to be aware of and understand all the
laws and regulations that concern payroll in your target markets. If you are
the government agency that calculates taxes for your citizens, you have to make
sense of the laws and regulations that represent the government’s decisions
regarding taxes, and probably lots of court rulings as well. If you’re a
healthcare company that develops digital therapeutics apps, you have to distill
the medical expertise that governs a certain therapy or medication into a
repeatable and deterministic algorithm.
In most cases, all the data
and information you need is out there. In some cases you have to build it
yourself. But the question is always: how do you capture this knowledge in your
company so that it becomes actionable, knowledge that you can package into
your products and sell to your customers?
Representing Knowledge as Text and Code
Of course, you will employ a
bunch of experts, people who have experience in the respective domain. Often
they are called analysts, and their job is to build up knowledge from data and
information, with the help of their own skills and domain expertise. But again:
what is their work product?
The traditional approach is
to write things down as prose, with a couple of explanatory diagrams, sometimes
formulas, and often lots of tables full of numbers. You will also find the
occasional decision table or decision tree, but mostly the relevant knowledge
is encoded as text. Text is useful because it can be consumed by anyone — at
least superficially, because you need experience in the domain to make sense of
the writing. But a prose representation does not let you do anything
with the knowledge. You cannot check it for completeness or consistency (which
is why many texts become a mess after they have been evolved and changed a
number of times), you cannot transform it into different representations, and
of course you cannot execute it. You can only display, print and read.
The other mainstream approach
is to encode the knowledge directly in (programming language) source code. Of
course this lets you execute it, and by writing tests, you have a way of
cross-checking the encoding. Tests also help with evolution because they
provide a safety net that makes changes less risky. But once you have encoded
knowledge in source code, execution is pretty much the only thing you
can do with it. It is very hard to reverse engineer the domain-level
semantics, which makes meaningful analysis hard. Source code is also not very
understandable to non-programmers such as your analysts and domain experts.
They might revert to writing text (now called requirements), which is,
hopefully, understood correctly by programmers so they can encode it. It also
requires a lot of discipline to not accidentally mix your
domain-knowledge-as-code with code for the technical concerns required to make
it run. And it is effectively hopeless to try to extract the knowledge from
program text and transform it into a different representation, such as source
code in a different programming language. Really, encoding knowledge in source
code effectively buries it there.
Encoding Knowledge in DSL Models
Domain-specific languages
combine the best of both worlds. Knowledge encoded as DSL models can be
executed through interpretation or code generation. It is also completely
independent of the actual execution technology, so porting to another
technology is easy. It is even possible to port the models to a language
implemented on a different language workbench. Well-designed DSLs support
analysis relevant to the domain. And while a DSL is not as trivially
approachable as prose (to the degree technical prose is), a good one can
very much be learned and used by non-programmers. Simulators and other ways of
bringing the knowledge to life directly in the DSL IDE also help a lot.
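To make this a bit more tangible, here is a minimal sketch in Python. It is not a real DSL: there is no concrete syntax or IDE, and a language workbench would give you far more. The tax brackets, the Bracket class and the functions check, interpret and generate_python are all invented for illustration. The point is only that the same model, kept as pure domain data, can be analyzed for gaps and overlaps, executed by an interpreter, and turned into source code by a generator.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Bracket:
    """One row of a hypothetical tax table: income in [lower, upper) is taxed at rate."""
    lower: int
    upper: Optional[int]  # None means "no upper limit"
    rate: float


# The model: pure domain knowledge, independent of any execution technology.
# The numbers are made up for this example.
MODEL = [
    Bracket(0, 10_000, 0.0),
    Bracket(10_000, 40_000, 0.2),
    Bracket(40_000, None, 0.4),
]


def check(model: List[Bracket]) -> List[str]:
    """Domain-level analysis: report gaps and overlaps between brackets."""
    problems = []
    rows = sorted(model, key=lambda b: b.lower)
    if rows and rows[0].lower != 0:
        problems.append("brackets do not start at 0")
    for prev, cur in zip(rows, rows[1:]):
        if prev.upper is None or prev.upper > cur.lower:
            problems.append(f"overlap at {cur.lower}")
        elif prev.upper < cur.lower:
            problems.append(f"gap between {prev.upper} and {cur.lower}")
    if rows and rows[-1].upper is not None:
        problems.append(f"no bracket above {rows[-1].upper}")
    return problems


def interpret(model: List[Bracket], income: float) -> float:
    """Execution by interpretation: sum the tax due in each bracket."""
    tax = 0.0
    for b in model:
        top = income if b.upper is None else min(income, b.upper)
        if top > b.lower:
            tax += (top - b.lower) * b.rate
    return tax


def generate_python(model: List[Bracket], name: str = "tax") -> str:
    """Execution by code generation: emit source code for the same calculation."""
    lines = [f"def {name}(income):", "    tax = 0.0"]
    for b in model:
        top = "income" if b.upper is None else f"min(income, {b.upper})"
        lines.append(f"    if {top} > {b.lower}:")
        lines.append(f"        tax += ({top} - {b.lower}) * {b.rate}")
    lines.append("    return tax")
    return "\n".join(lines)


if __name__ == "__main__":
    print(check(MODEL))              # consistency problems, if any
    print(interpret(MODEL, 50_000))  # run the model directly
    print(generate_python(MODEL))    # or generate code from it
```

A generator targeting another programming language would consume the very same model, which is the sense in which the knowledge stays independent of the execution technology.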
But, you might ask, do I
really have to build my own language? Can’t I just encode my domain knowledge
with an existing modeling language? Maybe you can. There is a long tradition of
“analysis modeling” (at least in books and academia; I haven’t seen too much of it in
the real world). But you absolutely need a well-defined language, otherwise the
semantics are unclear and you can’t execute. And using UML, for example, in a
way that is precise enough for execution, is cumbersome (but possible if you
try hard). Think about it: every domain has its own jargon, its own
conventions, and often its own notations. Math and chemistry come to mind. You
wouldn’t want to express them in jargon-free English. Similarly for models: you want
a language that fits the things you want to express.
In addition, the process of
building the language is itself extremely useful, because it helps you
understand the jargon, conventions and notations of the domain, and it forces
you to nail down a precise meaning. In some sense, the DSL definition is
meta-knowledge, knowledge that is relevant to the whole of your domain. In
fact, it can be seen as the authoritative definition of your
domain. Don’t risk that benefit by trying to shoehorn your knowledge into a
semi-formal and imprecise general-purpose modeling language.
Domain-specific languages,
both their use and their development, are a great way to define a domain and
capture relevant knowledge in an actionable way.