Perspectives on DSL: Knowledge Management

Domain-specific languages are a very particular approach to tackling a problem, and for many people it is unclear what a DSL would do for them. To make this clearer, I have decided to write a few posts that relate DSLs to various approaches known in computer science and IT. I start in this post with knowledge management.

What Is Knowledge?

What is knowledge, from the perspective of an organization? Here’s how I would define it. The world is full of information, data and stories about almost every imaginable domain. But if a company wants to develop software in that domain, they have to make sense of all of this stuff. They have to scope the domain (define what’s in and out for their purposes), they have to structure it in some systematic way, they have to make it accessible to the people in the organization, they have to evolve it over time, and ultimately, they have to do something with it, typically encode it into software as part of their products. So:

Knowledge is an organization’s understanding of relevant information and data that is available in the world or built proprietarily by the organization.

Let’s look at a few examples. If you develop payroll software, you have to be aware of and understand all the laws and regulations that concern payroll in your target markets. If you are the government agency that calculates taxes for your citizens, you have to make sense of the laws and regulations that represent the government’s decisions relative to taxes, and probably lots of court rulings as well. If you’re a healthcare company that develops digital therapeutics apps, you have to distill the medical expertise that governs a certain therapy or medication into a repeatable and deterministic algorithm.

In most cases, all the data and information you need is out there. In some cases you have to build it yourself. But the question is always the same: how do you capture this knowledge in your company in a way that makes it actionable, so that you can package it into your products and sell it to your customers?

Representing Knowledge as Text and Code

Of course, you will employ a bunch of experts, people who have experience in the respective domain. Often they are called analysts, and their job is to build up knowledge from data and information, with the help of their own skills and domain expertise. But again: what is their work product?

The traditional approach is to write things down as prose, with a couple of explanatory diagrams, sometimes formulas, and often lots of tables full of numbers. You will also find the occasional decision table or decision tree, but mostly the relevant knowledge is encoded as text. Text is useful because it can be consumed by anyone — at least superficially, because you need experience in the domain to make sense of the writing. But a prose representation does not let you do anything with the knowledge. You cannot check it for completeness or consistency (which is why many texts become a mess after they have been evolved and changed a number of times), you cannot transform it into different representations, and of course you cannot execute it. You can only display, print and read.

The other mainstream approach is to encode the knowledge directly in (programming language) source code. Of course this lets you execute it, and by writing tests, you have a way of cross-checking the encoding. Tests also help with evolution because they provide a safety net that makes changes less risky. But once you have encoded knowledge in source code, execution is pretty much the only thing you can do with it. It is very hard to reverse engineer the domain-level semantics, which makes meaningful analysis hard. Source code is also not very understandable to non-programmers such as your analysts and domain experts. They might revert to writing text (now called requirements), which is hopefully understood correctly by programmers so they can encode it. Keeping knowledge in code also requires a lot of discipline to avoid accidentally mixing your domain-knowledge-as-code with code for the technical concerns required to make it run. And it is effectively hopeless to try to extract the knowledge from program text and transform it into a different representation, such as source code in a different programming language. Really, encoding knowledge in source code effectively buries it there.

Encoding Knowledge in DSL Models

Domain-specific languages combine the best of both worlds. Knowledge encoded as DSL models can be executed through interpretation or code generation. It is also completely independent of the actual execution technology, so porting to another technology is easy. It is even possible to port the models to a language implemented on a different language workbench. Well-designed DSLs support analyses relevant to the domain. And while a DSL is not as trivially approachable as prose (to the degree technical prose is), a good one can very much be learned and used by non-programmers. Simulators and other ways of bringing the knowledge to life directly in the DSL IDE also help a lot.
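To make this a bit more tangible, here is a minimal sketch in Python. It is not a real DSL or language workbench, just plain data structures standing in for a model. Everything in it is an assumption made up for illustration: the tax-bracket domain, the numbers, and the names Bracket, MODEL, interpret, generate_source and check_coverage. The point is only that one and the same model can be interpreted, turned into source code, and analyzed at the domain level.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Bracket:
    lower: int            # inclusive lower bound of the income range
    upper: Optional[int]  # exclusive upper bound; None means "no upper limit"
    rate_percent: int     # rate applied to the income that falls in the range


# The "model": hypothetical numbers, not taken from any real tax law.
MODEL = [
    Bracket(0, 10_000, 0),
    Bracket(10_000, 40_000, 20),
    Bracket(40_000, None, 40),
]


def interpret(model: List[Bracket], income: int) -> float:
    """Execute the model directly (assumes the model has no gaps or overlaps)."""
    total = 0.0
    for b in model:
        if income <= b.lower:
            continue
        top = income if b.upper is None else min(income, b.upper)
        total += (top - b.lower) * b.rate_percent / 100
    return total


def generate_source(model: List[Bracket]) -> str:
    """Generate source code for the same model; Python is the target here,
    but a generator for another language would follow the same pattern."""
    lines = ["def tax(income):", "    total = 0.0"]
    for b in model:
        top = "income" if b.upper is None else f"min(income, {b.upper})"
        lines.append(f"    if income > {b.lower}:")
        lines.append(f"        total += ({top} - {b.lower}) * {b.rate_percent} / 100")
    lines.append("    return total")
    return "\n".join(lines)


def check_coverage(model: List[Bracket]) -> List[str]:
    """Domain-level analysis: report gaps and overlaps between brackets."""
    problems = []
    ordered = sorted(model, key=lambda b: b.lower)
    if ordered and ordered[0].lower != 0:
        problems.append(f"no bracket covers income below {ordered[0].lower}")
    for prev, nxt in zip(ordered, ordered[1:]):
        if prev.upper is None:
            problems.append(f"unbounded bracket at {prev.lower} overlaps bracket at {nxt.lower}")
        elif prev.upper < nxt.lower:
            problems.append(f"gap between {prev.upper} and {nxt.lower}")
        elif prev.upper > nxt.lower:
            problems.append(f"overlap between {nxt.lower} and {prev.upper}")
    if ordered and ordered[-1].upper is not None:
        problems.append(f"no bracket covers income from {ordered[-1].upper} upward")
    return problems


if __name__ == "__main__":
    print(interpret(MODEL, 50_000))
    print(generate_source(MODEL))
    print(check_coverage(MODEL))
```

The completeness check is the kind of thing that neither prose nor hand-written code gives you easily: it works because the model contains only domain concepts, with no execution details mixed in. A real DSL would add a proper notation, an editor and much richer analyses on top of this idea.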

But, you might ask, do I really have to build my own language? Can’t I just encode my domain knowledge with an existing modeling language? Maybe you can. There is a long tradition of “analysis modeling” (at least in books and academia; I haven’t seen much of it in the real world). But you absolutely need a well-defined language, otherwise the semantics are unclear and you can’t execute. And using UML, for example, in a way that is precise enough for execution is cumbersome (but possible if you try hard). Think about it: every domain has its own jargon, its own conventions, and often its own notations. Math and chemistry come to mind. You wouldn’t want to express them in jargon-free English. The same goes for models: you want a language that fits the things you want to express.

In addition, the process of building the language is itself extremely useful, because it helps you understand the jargon, conventions and notations of the domain, and it forces you to nail down a precise meaning. In some sense, the DSL definition is meta-knowledge, knowledge that is relevant to the whole of your domain. In fact, it can be seen as the authoritative definition of your domain. Don’t risk that benefit by trying to shoehorn your knowledge into a semi-formal and imprecise general-purpose modeling language.

Domain-specific languages, both their use and their development, are a great way to define a domain and capture relevant knowledge in an actionable way.