Why documents shouldn’t be the basis of a domain analysis

DSL design requires that you first understand the domain for which you want to build the language, so the analysis of how stuff works is a huge part of my work. And of course this “understanding” part is not limited to building DSLs; you have to do it to build any kind of software that is specific to a particular problem set.

In many cases, customers give me documents, existing software and other artifacts they have available to help me build an understanding: “Here, read this, and you’ll know what you have to build!” This is generally a bad idea, and in this article I will explain why.

A disclaimer: it is absolutely useful to look at documents and other artifacts to get a rough overview of a domain. After all, we all read books and presentations and stuff all the time to learn about things. But getting a rough overview of something is quite different from building a detailed, structured and formal understanding of how something should work. And books are usually written with the express purpose of teaching; they aren’t “random documents” people wrote for some reason or another.

This brings me to the first reason why domain analysis from documents is a bad idea: most documents (and yes, there are exceptions) are not written with a lot of love. Somebody had to write the documentation, and they tried to do that as fast as possible so they could get back to something that is more fun. This shows in the documents and makes them hard and unpleasant to read.

Second, company-internal documents are usually written for insiders. They use a lot of jargon. Now, jargon is fine, and it is often the basis of a DSL. But if you are an external language designer new to a domain or organization, you cannot just read a jargon-rich document as an introduction to the domain. You have to learn the jargon first, and usually not through such documents.

A more fundamental problem is that documents are often outdated and/or incomplete. Or there are multiple documents that disagree. It’s the usual problem with documentation: because it doesn’t “run”, there is not much incentive to keep it current. And so usually it isn’t. You’ll get a not-so-useful perspective on the domain from reading it.

There is an even more fundamental problem with documents: even if they are up-to-date and well written, they describe the current state of the domain, warts and all. When designing a new process/tool/language/DSL, you often want to clean things up: you want to refactor, optimize, and get rid of “historical accidents”. So even if you understood everything correctly from the documents, you’d just replicate the status quo in a new tool. That’s often not what you want to do.

Lastly, you cannot interact with a document. You cannot ask questions if you don’t understand something. You cannot judge the relative importance of things described in the documents. The aforementioned warts aren’t obvious. Different parts of a document won’t suddenly start talking to each other, disagreeing about some aspect of what is written there.

So what else should you do? It’s probably obvious from that last sentence: talk to people. Find experts in the domain and let them explain what they do. Build strawmen, mental models (and ultimately, prototype tools), challenge them and potentially tear them down again, replacing them with something better. Talk to different people and let them disagree. If things look fishy, challenge them. Often, this happens in the form of analysis workshops; I have written before about how to run those.

Documents can play a role in this context: you can use them as completeness checks, and as reminders of which things to talk about in a workshop. Going through some of them with the domain experts is sometimes a good exercise. But making documents the primary source — without access to people — doesn’t work.

The drawback? You gotta find those people. They do exist in all the organizations I have ever worked with, but there usually aren’t many people who really fully grok how a domain works in total and in detail. And often, because they are experts, these folks are busy. So it does make sense to organize the overall process in a way where these people are not unnecessarily burdened. But if you want to build a DSL that really captures the domain, you gotta get at the brains of these people. And that requires their time. There’s no way around it.