When is something a (domain-specific) language?
Customers often ask me: what
is a DSL? How is a language different from … and then they mention all kinds of
other terms. This is a great topic for discussions at academic workshops or
evenings over drinks (the latter probably more productive), so I could write
pages and pages here. Let me try something succinct, and you tell me if it
makes sense and where you disagree.
I will sort the ideas in
order of increasing languaginess and start with glossaries.
Glossaries
When trying to figure out a
domain, one usually starts with a glossary. This defines the most important
terms used (and agreed upon!) in the domain and explains them in prose. It’s a
list of words, with a few definitional sentences for each. It is very useful to
make verbal and whiteboard conversations about the domain more structured and
less ambiguous, for example, when you’re trying to understand the domain to
build a DSL.
Obvious, but I’ll say it: a
glossary is not a DSL!
Structured Glossary
When you want to be one step more precise, you’ll start defining the relationships between the glossary terms with a set of well-defined relationship types. These types are familiar to anyone with even a basic object-oriented background: contains, refers to and is-a. You can also annotate the contains and refers to arrows with sentence fragments that characterize the relationship. Often it makes sense to represent the relationship network in some graphical, UML-ish notation.
I call something like this a structured glossary. It starts to look suspiciously like a (meta)model, but the crucial difference is that it is intended for human consumption, not for processing with a tool. Again: useful, but not a DSL.
The term ubiquitous language from Domain-Driven Design (DDD) is IMHO somewhere between the glossary and the structured glossary.
Domain models and metamodels
Now let’s move on to the term domain model. It is more or less similar to the structured glossary, but it is intended for tools (in the widest sense). This means that the semantics conveyed by the sentence fragments in the structured glossary has to be moved into additional model constructs and/or program code. A domain model can be implemented as a bunch of classes in a programming language and then serve as the backbone for capturing data or building a UI.
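As an illustration, here is a minimal sketch of a domain model as a bunch of Python classes. The domain itself (Customer, Contract, Product, LifeInsurance) is hypothetical and invented only for this example. The three relationship types of the structured glossary map onto plain language features: contains becomes a list of owned objects, refers to becomes a reference to an object that lives elsewhere, and is-a becomes subclassing.

```python
from dataclasses import dataclass, field


@dataclass
class Product:
    """Hypothetical base concept; subclasses express the is-a relationship."""
    name: str


@dataclass
class LifeInsurance(Product):
    """LifeInsurance is-a Product."""
    insured_sum: int


@dataclass
class Contract:
    """A Contract refers to a Product that is owned elsewhere (the catalog)."""
    number: str
    product: Product


@dataclass
class Customer:
    """A Customer contains its Contracts: they are stored inside the customer."""
    name: str
    contracts: list[Contract] = field(default_factory=list)


# The product lives in a catalog; the contract only points to it.
catalog = [LifeInsurance(name="Basic Life", insured_sum=100_000)]
alice = Customer(name="Alice")
alice.contracts.append(Contract(number="C-1", product=catalog[0]))
```

Note what is gone: the prose fragments from the glossary. Whatever they said that matters must now be expressed through types, fields and code.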
If it is intended to be used in a modeling infrastructure, such a formal domain model is usually called a metamodel. A metamodel can be implemented in many kinds of technical formalisms: an XML schema, a JSON schema, an EMF Ecore file, MPS structure definitions and yes, also Java classes, although in that last case I usually call it a domain model rather than a metamodel.
Attention, pet peeve alarm: any model can be a metamodel. “Meta” characterizes a relationship (to another model) and is not an inherent property of the model (not true for some technical spaces, but I digress).
We are slowly moving into the domain of languages. A metamodel defines the abstract syntax of a language, i.e., the kinds of things you have available for defining sentences (or models). A metamodel is not a DSL, but it is one of the ingredients of a DSL.
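To make “abstract syntax” concrete, here is a hedged sketch in Python of a metamodel for a hypothetical tiny expression language. The concept names NumberLit, FunCall and PlusOp, and the Concept and Node classes, are assumptions for illustration and not the API of any modeling tool. Note that the metamodel is itself just a model (three Concept instances), which in turn conform to the Concept class acting as the meta-metamodel. That is the pet peeve above in action.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Concept:
    """Hypothetical meta-metamodel element: describes one kind of node (a metaclass)."""
    name: str
    properties: tuple[str, ...] = ()  # primitive-valued slots, e.g. a name
    children: tuple[str, ...] = ()    # slots that contain nested nodes


@dataclass
class Node:
    """A model element that instantiates a Concept from the metamodel."""
    concept: Concept
    props: dict[str, object] = field(default_factory=dict)
    kids: dict[str, list["Node"]] = field(default_factory=dict)


# Hypothetical metamodel (abstract syntax) of a tiny expression language.
# It is itself a model: three instances of Concept.
NUMBER = Concept("NumberLit", properties=("value",))
FUNCALL = Concept("FunCall", properties=("name",), children=("args",))
PLUS = Concept("PlusOp", children=("left", "right"))


def conforms(node: Node) -> bool:
    """True if the node (recursively) sets exactly the properties its concept
    declares and uses only child slots its concept declares."""
    concept = node.concept
    if set(node.props) != set(concept.properties):
        return False
    if not set(node.kids) <= set(concept.children):
        return False
    return all(conforms(k) for kids in node.kids.values() for k in kids)


# The tree for what one might write as f(1) + 2; no notation is involved.
expr = Node(PLUS, kids={
    "left": [Node(FUNCALL, props={"name": "f"},
                  kids={"args": [Node(NUMBER, props={"value": 1})]})],
    "right": [Node(NUMBER, props={"value": 2})],
})
print(conforms(expr))  # True
```

The check only looks at which slots a model uses. It says nothing about how the model is written down, which is why a metamodel alone is not a language.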
Even when the metamodel is used standalone (without the other ingredients of a language), it serves as a well-defined “truth” regarding the structure of the domain. It can be the basis for the definition of data exchange formats. It’s definitely useful.
Validations
However, only the most trivial domains can be defined through structure alone. You usually have to define validations on top of it. The simplest ones are cardinality constraints (which are kinda structural), but there are also validations that check name uniqueness, and rules of the kind “if the X contains a Y, then this A over there cannot have more than 2 children of type B.”
All validations on a model
instance must be true for a model to be a valid instance of the metamodel. Any
tool that writes or reads models must know and be able to process such
validations.
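A minimal sketch of such validations, assuming a hypothetical model of function declarations and calls (all class and function names are invented for illustration). Each rule only inspects structure and values: a name-uniqueness check, a cross-reference check, and an arity rule of the “if X, then Y must have N children” kind.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class FunctionDecl:
    name: str
    params: list[str]


@dataclass
class FunCall:
    name: str
    args: list[object] = field(default_factory=list)


@dataclass
class Module:
    functions: list[FunctionDecl]
    calls: list[FunCall]


def validate(m: Module) -> list[str]:
    """Return error messages; an empty list means the model is valid."""
    errors = []
    # Name uniqueness: no two functions may share a name.
    counts = Counter(f.name for f in m.functions)
    errors += [f"duplicate function '{n}'" for n, c in counts.items() if c > 1]
    # Cross-reference and arity rule: every call must target a declared
    # function and pass exactly as many arguments as it has parameters.
    by_name = {f.name: f for f in m.functions}
    for call in m.calls:
        decl = by_name.get(call.name)
        if decl is None:
            errors.append(f"call to unknown function '{call.name}'")
        elif len(call.args) != len(decl.params):
            errors.append(
                f"'{call.name}' expects {len(decl.params)} args, got {len(call.args)}"
            )
    return errors


print(validate(Module([FunctionDecl("f", ["a", "b"])], [FunCall("f", [1])])))
```

Any tool that reads or writes such models would have to run the same checks, which is why they belong to the definition of the model format and not to one particular tool.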
Some people call a metamodel with validations a language (or a DSL, if it is specific to a domain). I don’t. But this is not a value judgement: this thing is useful, and as I have said, it is a part of a DSL. But I want to be clear about terms, so no, it’s not a DSL.
Syntax
The next ingredient of a language is the syntax. This is a topic that is a bit hard to grasp. Is XML a syntax? If you encode your data in an XML document that is compliant with your metamodel expressed as an XML schema, are you using a syntax? Same question for JSON. Many people will argue yes, and technically they are probably right. However, that syntax is not metamodel-specific; it is meta-metamodel-specific. This means that everything defined in your metamodel is encoded the same way, based on rules defined for the meta-metamodel. To go back to XML: all XML elements you define with your schema are expressed in the well-known nested-angle-brackets-plus-attributes syntax.
I call such a meta-metamodel-defined syntax a serialization format. What makes a “real” syntax different is that you define a specific syntax for each of your metamodel elements (aka metaclasses), or for groups of them. It does not matter whether that syntax is textual, tabular, graphical, form-like or a mix of all of these, as we like to do with MPS.
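Here is a hedged sketch of what “a specific syntax per metaclass” means. It uses a hypothetical expression language with number literals, function calls and a binary plus, and the class names are assumptions for illustration. The printer has one dedicated rule per concept: parens-and-comma for calls, infix for plus. It only covers the output direction. A textual language would also need a parser, and tools such as MPS let you define a notation per concept instead of hand-coding a printer like this.

```python
from dataclasses import dataclass


@dataclass
class NumberLit:
    value: int


@dataclass
class FunCall:
    name: str
    args: list


@dataclass
class PlusOp:
    left: object
    right: object


def render(node) -> str:
    """Concrete syntax: each concept gets its own dedicated notation."""
    if isinstance(node, NumberLit):
        return str(node.value)
    if isinstance(node, FunCall):
        # parens-and-comma notation for function calls
        return f"{node.name}({', '.join(render(a) for a in node.args)})"
    if isinstance(node, PlusOp):
        # infix notation for plus
        return f"{render(node.left)} + {render(node.right)}"
    raise TypeError(f"no concrete syntax defined for {type(node).__name__}")


expr = PlusOp(FunCall("f", [NumberLit(1), NumberLit(2)]), NumberLit(3))
print(render(expr))  # f(1, 2) + 3
```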
Just to drive home this difference, let’s assume we have a language that contains function calls with arguments and plus operators with two arguments. With a real (metamodel-specific) syntax, you would perhaps write a call as f(a, b), an addition as a + b, and a combination of the two as f(a, b) + c. We use the familiar parens-and-comma syntax for function calls and an infix notation for plus; a syntax specific to each concept. If we were to use a meta-metamodel-specific syntax such as XML instead, every concept would use the same syntax, and two things become apparent. First, each concept uses the same approach for encoding (angle bracket, concept name, properties as attributes, children as nested XML); there is nothing specific (aka different) for PlusOp or FunCall. Second, this kind of syntax is useless for human consumption except for very simple configuration languages, and that does not change if you use a less verbose, less noisy notation than XML.
I repeat: a meta-model with a
serialization format is not a language. It’s just a metamodel with a
serialization syntax. It’s useful, but distinct from a language.
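For contrast, a minimal sketch of a serialization format, assuming hypothetical expression classes written as Python dataclasses. The serializer is written once against Python’s reflection (dataclasses.fields), so it never mentions NumberLit, FunCall or PlusOp. Every concept is encoded the same way, which is exactly what makes it meta-metamodel-specific rather than metamodel-specific.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, fields, is_dataclass


@dataclass
class NumberLit:
    value: int


@dataclass
class FunCall:
    name: str
    args: list


@dataclass
class PlusOp:
    left: object
    right: object


def to_xml(node) -> ET.Element:
    """Serialize ANY dataclass instance the same way: element named after the
    concept, primitive fields as attributes, nested nodes as child elements."""
    elem = ET.Element(type(node).__name__)
    for f in fields(node):
        value = getattr(node, f.name)
        if is_dataclass(value) or isinstance(value, list):
            slot = ET.SubElement(elem, f.name)  # one wrapper element per child slot
            for child in (value if isinstance(value, list) else [value]):
                slot.append(to_xml(child))
        else:
            elem.set(f.name, str(value))
    return elem


expr = PlusOp(FunCall("f", [NumberLit(1), NumberLit(2)]), NumberLit(3))
print(ET.tostring(to_xml(expr), encoding="unicode"))
# <PlusOp><left><FunCall name="f"><args><NumberLit value="1" /><NumberLit value="2" /></args></FunCall></left><right><NumberLit value="3" /></right></PlusOp>
```

Adding a new concept requires no change to to_xml at all. That is convenient for tools, but it is also why the result reads nothing like f(1, 2) + 3.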
Type Systems
There are two more things we need to discuss: type systems and semantics (bear with me for a more precise definition of the terms). Type systems first.
Are a bunch of validation rules a type system? Some people argue yes. And again, maybe they are technically right. For me, validation rules are just validation rules. So how is a type system different? My criterion for distinction is this: as soon as your validations are so complicated that you compute additional data structures (i.e., types) for your model elements and then perform computations on these data structures, you have a type system. Validations usually don’t do this; they just inspect the structure and values in the model.
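To illustrate the distinction, here is a hedged sketch of a small type checker for a hypothetical expression language. The concept names, the function signatures in SIGNATURES and the representation of types as plain strings are all assumptions made for this example. Unlike plain validations, it computes an additional piece of data, a type, for every expression and then computes with those types, for example by comparing argument types against parameter types.

```python
from dataclasses import dataclass


@dataclass
class NumberLit:
    value: int


@dataclass
class StringLit:
    value: str


@dataclass
class FunCall:
    name: str
    args: list


@dataclass
class PlusOp:
    left: object
    right: object


class TypeCheckError(Exception):
    """Raised when the computed types of model elements do not fit together."""


# Types are additional data computed for model elements; here simply strings.
INT, STRING = "int", "string"

# Hypothetical function signatures: name -> (parameter types, result type).
SIGNATURES = {"length": ([STRING], INT), "twice": ([INT], INT)}


def type_of(node) -> str:
    """Compute the type of an expression bottom-up, checking it on the way."""
    if isinstance(node, NumberLit):
        return INT
    if isinstance(node, StringLit):
        return STRING
    if isinstance(node, FunCall):
        if node.name not in SIGNATURES:
            raise TypeCheckError(f"unknown function '{node.name}'")
        params, result = SIGNATURES[node.name]
        arg_types = [type_of(a) for a in node.args]  # types of the arguments
        if arg_types != params:
            raise TypeCheckError(f"{node.name} expects {params}, got {arg_types}")
        return result
    if isinstance(node, PlusOp):
        left, right = type_of(node.left), type_of(node.right)
        if left != right:
            raise TypeCheckError(f"cannot add {left} and {right}")
        return left  # int + int is int, string + string is string
    raise TypeCheckError(f"no typing rule for {type(node).__name__}")


print(type_of(PlusOp(FunCall("length", [StringLit("abc")]), NumberLit(1))))  # int
```

The checker is compositional: the type of a PlusOp is derived from the computed types of its children, something a rule that only inspects the model’s structure and values cannot express.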
Are type systems needed for something to be a language? IMHO not. There are meaningful languages that don’t need a type system (and can make do with validations). But many (interesting, useful) languages do have a type system, because many (interesting, useful) languages require expressions. And expressions require type checking (unless you defer it to runtime type checking, which, for reasons I don’t want to go into here, is at odds with DSLs).
Semantics
Talking about runtime: the final ingredient of a language is the (formal) definition of its semantics. While “formal” sounds like Greek letters, deduction and proofs, in practice this is typically achieved by transforming (generating, compiling) your language to some other language or formalism whose semantics is known, or by writing an interpreter. Note that the goal of the semantics is not necessarily execution; it might also be some form of sophisticated analysis (which is why type checking is a form of semantics …). But many DSLs indeed have execution semantics, because you want to “run” the program (again, by generation or interpretation).
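Both routes to execution semantics can be sketched for a hypothetical expression language. The concept names and the BUILTINS table are assumptions for illustration. evaluate is an interpreter that executes the model directly. generate transforms the model into Python source text and thereby borrows Python’s known semantics. The two should agree on every program, which makes a handy test.

```python
from dataclasses import dataclass


@dataclass
class NumberLit:
    value: int


@dataclass
class FunCall:
    name: str
    args: list


@dataclass
class PlusOp:
    left: object
    right: object


# Hypothetical built-in functions; their names match Python built-ins on purpose,
# so the generator below can emit them verbatim.
BUILTINS = {"max": max, "abs": abs}


def evaluate(node) -> int:
    """Interpreter: gives the language execution semantics directly."""
    if isinstance(node, NumberLit):
        return node.value
    if isinstance(node, FunCall):
        return BUILTINS[node.name](*(evaluate(a) for a in node.args))
    if isinstance(node, PlusOp):
        return evaluate(node.left) + evaluate(node.right)
    raise ValueError(f"no semantics for {type(node).__name__}")


def generate(node) -> str:
    """Generator: maps each concept to Python source, whose semantics is known."""
    if isinstance(node, NumberLit):
        return repr(node.value)
    if isinstance(node, FunCall):
        return f"{node.name}({', '.join(generate(a) for a in node.args)})"
    if isinstance(node, PlusOp):
        return f"({generate(node.left)} + {generate(node.right)})"
    raise ValueError(f"no semantics for {type(node).__name__}")


expr = PlusOp(FunCall("max", [NumberLit(2), NumberLit(5)]), NumberLit(1))
print(evaluate(expr))  # 6
print(generate(expr))  # (max(2, 5) + 1)
```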
Wrap up
So where does this leave us?
When do we have a DSL?
· Glossary? No.
· Structured Glossary? No.
· Metamodel? No.
· Metamodel + Validations? No.
· Metamodel + Validations + Metamodel-specific Syntax? Yes!
· Metamodel + Type System + Metamodel-specific Syntax + Execution Semantics? Double yes!
There’s a bit of a caveat to my nice, incrementally building story: what about metamodel + type system + execution semantics, but no metamodel-specific syntax? Many people will argue that this is a language, because they consider a formal semantics (plus the necessary metamodel) the main ingredient. And indeed, many languages, especially those used internally in tools, e.g., as intermediate representations in compilers, don’t really need a “real” syntax because no human ever writes them; they are only processed by tools. I understand the perspective. However, I avoid the term language for those. I call them models, intermediate formats, whatever.
Essentially, I think a metamodel-specific syntax is the deciding ingredient that makes something a language, because it makes it useful for human consumption. And that’s the core thing: a language is a formalism that can be written, read and understood by humans and by computers, not only by computers or only by humans.