The Language Testing Triangle

I am often asked how to test a language, or, more specifically, a language implementation in a tool like MPS or Xtext. That’s a rich question, and it can’t be answered in a few lines here. I have written extensively on this, both specifically for MPS and for the case of safety-critical systems.

But there is an important detail to this discussion for cases where your DSL lets its users write tests for the models they create, something every self-respecting DSL targeted at subject matter experts should do.

For example, in a healthcare DSL, the subject matter experts describe the system essentially as a state machine, but they can also write scenario-based tests to verify that their state machine works correctly. Similarly for tax calculations: the tax experts describe the calculations as a tree and then write tests to verify that their calculation logic is correct. In both cases, the users do not write tests at the level of the generated code, using JUnit, Cucumber, or similar tools. Instead, the DSLs provide specific syntax for expressing tests at the abstraction level of the domain. Here’s a screenshot from the tax example:
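To make the idea of domain-level tests a bit more concrete, here is a minimal Python sketch. It is not the actual tax DSL (which is built in MPS), just an illustration under stated assumptions: a calculation tree encoded as nested tuples, a tiny interpreter standing in for the execution infrastructure, and tests that state only inputs and the expected tax. The node kinds, the `TaxTest` type, and the allowance and rate values are all hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical calculation-tree language; each node is a tuple (kind, *children):
#   ("const", value) | ("input", name) | ("sub", a, b) | ("mul", a, b) | ("max", a, b)

def evaluate(node: tuple, inputs: Dict[str, float]) -> float:
    """Interpreter for the tree language; stands in for the DSL's execution infrastructure."""
    kind = node[0]
    if kind == "const":
        return node[1]
    if kind == "input":
        return inputs[node[1]]
    a = evaluate(node[1], inputs)
    b = evaluate(node[2], inputs)
    if kind == "sub":
        return a - b
    if kind == "mul":
        return a * b
    if kind == "max":
        return max(a, b)
    raise ValueError(f"unknown node kind: {kind}")

@dataclass
class TaxTest:
    """A domain-level test: concrete inputs and the expected result, no generated code involved."""
    name: str
    inputs: Dict[str, float]
    expected: float

def run_tests(model: tuple, tests: List[TaxTest]) -> List[str]:
    """Runs every test against the model and returns one message per failing test."""
    failures = []
    for t in tests:
        actual = evaluate(model, t.inputs)
        if not math.isclose(actual, t.expected, abs_tol=1e-9):
            failures.append(f"{t.name}: expected {t.expected}, got {actual}")
    return failures

# Hypothetical model: tax = max(0, (income - 10000) * 0.2)
model = ("max", ("const", 0.0),
         ("mul", ("sub", ("input", "income"), ("const", 10000.0)), ("const", 0.2)))

tests = [
    TaxTest("above allowance", {"income": 30000.0}, 4000.0),
    TaxTest("below allowance", {"income": 5000.0}, 0.0),
]

print(run_tests(model, tests))  # prints []
```

The tests mention only incomes and amounts of tax, which is the point: the expert reasons about the domain, not about the code the interpreter or generator produces.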

The question is: as a language engineer, can you leverage this test infrastructure for testing the language itself?

Testing Models

Let’s investigate how subject matter experts see the world when they write and run tests as part of their day-to-day work. Their goal is to verify that the model is correct. They do this by writing tests, which they assume to be correct in the sense that the tests state the right expectations for the inputs they specify. This assumption is usually justified because tests are typically simpler than the system (or model) under test: they contain specific scenarios rather than the complete algorithm. Of course, a test can be faulty as well.

However, there’s another assumption here: that the language, with all its execution infrastructure, works correctly. The subject matter experts do not verify this; they simply trust it. Again, this is a valid assumption from their perspective. The subject matter experts are done with testing once the model (our system under test here) is covered sufficiently, where “sufficient” is a metric that must be defined for the particular situation and for the risks that materialize if faults go undetected.

Testing the Language

How does this picture look when the language engineer wants to test the language using that same infrastructure? Here’s the triangle:

In this case, the language engineer trusts that the model and the tests are correct; if a test fails, it is the language (in particular, its interpreter or generator) that is faulty. The engineer writes tests using the same testing syntax as the subject matter experts, but is only done when the language implementation reaches 100% coverage (or whatever your magic number is).
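The shift in perspective can be illustrated with another small Python sketch, again with hypothetical names throughout. The model and the expected value are taken as correct, and the interpreter contains a deliberately planted bug in its handling of `max`. Because the model and the test are trusted, the failing test points at the language implementation rather than at the model.

```python
from typing import Dict

def buggy_evaluate(node: tuple, inputs: Dict[str, float]) -> float:
    """Hypothetical interpreter with a planted fault: 'max' is implemented using min()."""
    kind = node[0]
    if kind == "const":
        return node[1]
    if kind == "input":
        return inputs[node[1]]
    a = buggy_evaluate(node[1], inputs)
    b = buggy_evaluate(node[2], inputs)
    if kind == "sub":
        return a - b
    if kind == "mul":
        return a * b
    if kind == "max":
        return min(a, b)  # planted bug: should be max(a, b)
    raise ValueError(f"unknown node kind: {kind}")

# Trusted model: tax = max(0, (income - 10000) * 0.2), and a trusted test case.
model = ("max", ("const", 0.0),
         ("mul", ("sub", ("input", "income"), ("const", 10000.0)), ("const", 0.2)))
inputs, expected = {"income": 30000.0}, 4000.0

actual = buggy_evaluate(model, inputs)
# From the language engineer's corner of the triangle, a failure implicates the interpreter.
verdict = "pass" if abs(actual - expected) < 1e-9 else "fail: suspect the interpreter, not the model"
print(verdict)  # prints "fail: suspect the interpreter, not the model"
```

The test itself is identical to what a subject matter expert would write; only the interpretation of a failure differs.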

So who writes tests?

Initially, while the language implementation is new and unproven, the subject matter experts’ assumption that the language is correct is not justified: if tests fail, it is likely the language that needs fixing. This is why, early in the project, it is the language engineer who writes tests. Doing so also forces them to put in place the language constructs needed to express those tests.

Later in the project, when the language has become more stable, the subject matter experts write more tests, using the testing syntax that the language engineer has by then developed and refined through their own test writing. Occasionally, when a test fails, the fault will still lie in the language implementation, and the subject matter expert will be puzzled. So the two have to talk and figure it out.

Even in the long run, the language engineer will continue to write tests to reach their coverage goal, exercising corner cases that the subject matter experts perhaps don’t encounter right away. There are also aspects of the language you cannot test this way, for example, the type system, or generators that produce non-executable artifacts such as documents. For those and other infrastructural aspects, the language engineer remains in charge of testing; see the paper mentioned earlier.

Conclusion

Here’s the important thing: all the tests written by the subject matter experts to test their models also count towards testing the language itself! Overall, this reduces the testing effort significantly. And because the language engineers use the same infrastructure (the test syntax), you don’t have to build a separate language testing infrastructure. All in all, the approach enables test-driven development for both the subject matter experts and the language engineers.
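Why the pooled tests save effort can be sketched in Python as well, assuming an interpreter that records which node kinds it executes. This is a crude, hypothetical stand-in for real coverage measurement over an interpreter or generator; the node kinds, models, and coverage measure are all illustrative. Every construct exercised by a subject matter expert’s test no longer needs a dedicated language test, so the engineer only has to add tests for the gaps.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical set of language constructs (node kinds) whose coverage we track.
ALL_KINDS = {"const", "input", "sub", "mul", "max"}

def evaluate(node: tuple, inputs: Dict[str, float], covered: Set[str]) -> float:
    """Interpreter that records every node kind it executes into `covered`."""
    kind = node[0]
    covered.add(kind)
    if kind == "const":
        return node[1]
    if kind == "input":
        return inputs[node[1]]
    a = evaluate(node[1], inputs, covered)
    b = evaluate(node[2], inputs, covered)
    if kind == "sub":
        return a - b
    if kind == "mul":
        return a * b
    if kind == "max":
        return max(a, b)
    raise ValueError(f"unknown node kind: {kind}")

def coverage(runs: List[Tuple[tuple, Dict[str, float]]]) -> float:
    """Fraction of language constructs executed by at least one test run."""
    covered: Set[str] = set()
    for model, inputs in runs:
        evaluate(model, inputs, covered)
    return len(covered & ALL_KINDS) / len(ALL_KINDS)

# The expert's model never uses "sub" or "max"; the engineer adds a corner case that does.
expert_runs = [(("mul", ("input", "income"), ("const", 0.2)), {"income": 1000.0})]
engineer_runs = [(("max", ("const", 0.0), ("sub", ("const", 1.0), ("const", 2.0))), {})]

print(coverage(expert_runs))                  # prints 0.6
print(coverage(engineer_runs))                # prints 0.6
print(coverage(expert_runs + engineer_runs))  # prints 1.0
```

Neither group reaches full construct coverage alone, but together they do, without the engineer duplicating what the expert’s tests already exercise.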