The minimum infrastructure for running languages and models
In my last post I wrote about what constitutes a language, and what you might
want to call the data structures that don’t. In this one I want to discuss what
minimum infrastructure is necessary to “implement” one or more languages.
Here is why I think this is a relevant question. If you have decided that you
want to solve some kind of problem with a language (metamodel + validation or
type checks + custom syntax + some form of execution), then the next question
is: how do I implement this? How do I actually do this?
Language Workbenches, or maybe not
Ideally, you have the
opportunity to use a language workbench like MPS (or others, but that’s not the
point here.) They come with all the tooling necessary to implement your
language. Just learn MPS, and you’re golden :)
But using MPS (or one of its
language workbench brethren) has drawbacks, too. One is the “learning” part.
Some of these tools are very powerful and therefore not so easy to learn. They
are also very opinionated about deployment. For example, MPS is a (relatively
fat) Java application. So is Eclipse/EMF/Xtext. The others aren’t much
“thinner”. These days you might want to run your language in the browser (or
potentially in a mobile app). A growing number of language workbenches
(or at least first steps towards them) are web native, but (a) they are
all still work in progress, and (b) they also come with quite a bit of
infrastructure. Examples include Modelix, ProjectIt, plus a whole bunch of academic prototypes.
Finally, you might already
have an existing software ecosystem into which you have to integrate the
“language stuff”.
A Robust M3 Layer
So where do you start? The
most important building block is an ability to uniformly process models. This
means that you have to implement a bunch of classes to
· represent models in memory,
· persist them somehow using a metamodel-specific serialization format (not a syntax, see my last post),
· provide an API to read, traverse and modify models,
· and support a rudimentary but generic way of editing them.
All of this must be
independent of your actual language, at least if you plan to develop multiple
languages. In other words, you have to implement your own M3 layer, your own
meta-metamodel.
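Since this M3 layer does not
know about any particular M2 (metamodel), the model access will be reflective
(as seen from the metamodel). Typical operations found in these APIs are
asking a node for its concept, getting and setting properties by name,
navigating to children and the parent, and following references.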
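As a rough illustration of what such a reflective API can look like, here is a minimal sketch. The class `Node` and all of its method names are made up for this example; they are not the API of EMF, MPS or Modelix, only the general shape that such APIs tend to share.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any, Iterator, Optional


@dataclass(eq=False)
class Node:
    """A language-agnostic model node (hypothetical); everything is addressed by name."""

    concept: str  # name of the M2 concept, e.g. "State"
    properties: dict[str, Any] = field(default_factory=dict)
    children: dict[str, list[Node]] = field(default_factory=dict)
    references: dict[str, Node] = field(default_factory=dict)
    parent: Optional[Node] = None

    def get_property(self, role: str) -> Any:
        return self.properties.get(role)

    def set_property(self, role: str, value: Any) -> None:
        self.properties[role] = value

    def add_child(self, role: str, child: Node) -> Node:
        child.parent = self
        self.children.setdefault(role, []).append(child)
        return child

    def get_children(self, role: str) -> list[Node]:
        return list(self.children.get(role, []))

    def set_reference(self, role: str, target: Node) -> None:
        self.references[role] = target

    def get_reference(self, role: str) -> Optional[Node]:
        return self.references.get(role)

    def descendants(self) -> Iterator[Node]:
        """Depth-first traversal of the containment tree, excluding self."""
        for kids in self.children.values():
            for kid in kids:
                yield kid
                yield from kid.descendants()


# Building a small state machine model purely reflectively.
machine = Node("StateMachine", {"name": "Door"})
opened = machine.add_child("states", Node("State", {"name": "Open"}))
closed = machine.add_child("states", Node("State", {"name": "Closed"}))
transition = opened.add_child("transitions", Node("Transition", {"event": "close"}))
transition.set_reference("target", closed)
```

Note that nothing in `Node` knows about state machines: concept names and roles are plain strings, which is exactly what makes the access reflective.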
To get a better understanding
you can check out EMF Ecore and its Java mapping or the MPS structure language
and the SModel API. A nice, simple and clean example is the API provided by Modelix. If you want to torture yourself,
you can alternatively read the OMG MOF standard :-)
In principle you now have all
you need: you can work with any model, expressed in any language, and then, for
example, implement type checkers or editing frameworks on top of it. However,
in practice, you will want to do one more thing: provide a convenient way to
define metamodels.
Definition of Metamodels
The problem with only
defining an M3 is that access to all models is necessarily reflective, because
there is no tool-processable definition of a particular language’s metamodel
(i.e., the structure of models expressed with that language).
Existing M3s like MPS or EMF
allow you to express metamodels declaratively (could be a JSON format in the
simplest case) and then generate typed APIs for the metamodel. Internally,
these typed APIs are implemented using the generic, reflective ones. So if we
use our typical state machine DSL as an example, you could have a
metamodel-specific API in which, say, a state machine exposes a typed list of
its states and a transition exposes its target state, instead of going through
string-keyed reflective calls.
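To make this concrete, here is a minimal sketch of a typed API for a state machine DSL, written by hand in the way a generator might emit it. All names (`Node`, `StateMachine`, `State`, `Transition` and their members) are assumptions for illustration, not the output of any real tool. The point is only that the typed classes are thin facades over the generic node.

```python
from __future__ import annotations

from typing import Any


class Node:
    """Minimal stand-in for the generic, reflective M3 node (hypothetical)."""

    def __init__(self, concept: str) -> None:
        self.concept = concept
        self.properties: dict[str, Any] = {}
        self.children: dict[str, list[Node]] = {}
        self.references: dict[str, Node] = {}


class Transition:
    """Typed facade for nodes of concept 'Transition'."""

    def __init__(self, node: Node) -> None:
        self.node = node

    @property
    def event(self) -> str:
        return self.node.properties["event"]

    @property
    def target(self) -> State:
        return State(self.node.references["target"])


class State:
    """Typed facade for nodes of concept 'State'."""

    def __init__(self, node: Node) -> None:
        self.node = node

    @property
    def name(self) -> str:
        return self.node.properties["name"]

    @property
    def transitions(self) -> list[Transition]:
        return [Transition(n) for n in self.node.children.get("transitions", [])]

    def add_transition(self, event: str, target: State) -> Transition:
        node = Node("Transition")
        node.properties["event"] = event
        node.references["target"] = target.node
        self.node.children.setdefault("transitions", []).append(node)
        return Transition(node)


class StateMachine:
    """Typed facade for nodes of concept 'StateMachine'."""

    def __init__(self, name: str) -> None:
        self.node = Node("StateMachine")
        self.node.properties["name"] = name

    @property
    def states(self) -> list[State]:
        return [State(n) for n in self.node.children.get("states", [])]

    def add_state(self, name: str) -> State:
        node = Node("State")
        node.properties["name"] = name
        self.node.children.setdefault("states", []).append(node)
        return State(node)


door = StateMachine("Door")
opened = door.add_state("Open")
closed = door.add_state("Closed")
opened.add_transition("close", closed)
print(opened.transitions[0].target.name)  # prints "Closed"
```

Because the facades hold no state of their own, generic tools (persistence, editors) can keep working on the underlying `Node` objects while language-specific code uses the typed view.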
To make references work
reasonably, your M2-definition approach should provide a way to define the
scope of references, i.e., which type-compatible nodes are valid reference
targets.
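One simple way to approach scoping, sketched here under assumptions, is a registry that maps a (concept, reference role) pair to a function computing the valid targets. The names `SCOPES`, `valid_targets` and `states_of_enclosing_machine` are invented for this example, as is the rule that a transition may only target states of its own state machine.

```python
from __future__ import annotations

from typing import Callable, Dict, List, Optional, Tuple


class Node:
    """Minimal generic node with containment only (hypothetical)."""

    def __init__(self, concept: str, name: str = "") -> None:
        self.concept = concept
        self.name = name
        self.parent: Optional[Node] = None
        self.children: List[Node] = []

    def add(self, child: Node) -> Node:
        child.parent = self
        self.children.append(child)
        return child

    def ancestor(self, concept: str) -> Optional[Node]:
        """Nearest enclosing node of the given concept, or None."""
        node = self.parent
        while node is not None and node.concept != concept:
            node = node.parent
        return node


# A scope maps the node that holds the reference to its valid targets.
Scope = Callable[[Node], List[Node]]


def states_of_enclosing_machine(transition: Node) -> List[Node]:
    machine = transition.ancestor("StateMachine")
    if machine is None:
        return []
    return [c for c in machine.children if c.concept == "State"]


# Registry keyed by (source concept, reference role); part of the M2 definition.
SCOPES: Dict[Tuple[str, str], Scope] = {
    ("Transition", "target"): states_of_enclosing_machine,
}


def valid_targets(source: Node, role: str) -> List[Node]:
    """Returns the scoped targets, or an empty list if no scope is registered."""
    scope = SCOPES.get((source.concept, role))
    return scope(source) if scope else []


door = Node("StateMachine", "Door")
opened = door.add(Node("State", "Open"))
closed = door.add(Node("State", "Closed"))
transition = opened.add(Node("Transition"))

lamp = Node("StateMachine", "Lamp")
lamp.add(Node("State", "On"))

print([s.name for s in valid_targets(transition, "target")])  # ['Open', 'Closed']
```

An editor can use the same function to populate code completion, and a checker can use it to flag references that point outside the scope.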
Scaling and Notifications
Except in the simplest use
cases, you have to make sure that the system scales to large and/or many
models. If you store your models as a node graph in a database, you have to
ensure that appropriate lazy loading and unloading of nodes is supported,
ideally transparently for the user of the M3 API. If you store in files (or JSON
blobs in a database), then your M3 layer must support a way to
define the granularity of these files or blobs, and likely some notion of
model-file-import is needed. Some of this can be non-trivial, and encapsulating
it in a well-defined M3 layer is a major reason for having one.
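To give a feel for the lazy loading and unloading part, here is a deliberately simplified sketch: node records are fetched on first access and the least recently used ones are dropped once a capacity is exceeded. `LazyNodeStore` and the record layout are assumptions made for this example, and a plain dictionary stands in for the database; a real implementation would also have to deal with dirty nodes, bulk loading and concurrent access.

```python
from collections import OrderedDict
from typing import Callable, List


class LazyNodeStore:
    """Loads node records on first access and evicts the least recently used."""

    def __init__(self, load: Callable[[str], dict], capacity: int = 1000) -> None:
        self._load = load          # fetches one record from the backing store
        self._capacity = capacity  # maximum number of records kept in memory
        self._cache: "OrderedDict[str, dict]" = OrderedDict()
        self.loads = 0             # number of backend round trips, for illustration

    def get(self, node_id: str) -> dict:
        if node_id in self._cache:
            self._cache.move_to_end(node_id)  # mark as recently used
            return self._cache[node_id]
        record = self._load(node_id)
        self.loads += 1
        self._cache[node_id] = record
        if len(self._cache) > self._capacity:
            self._cache.popitem(last=False)   # unload the least recently used
        return record

    def children(self, node_id: str) -> List[dict]:
        """Resolves child ids to records, loading them only when asked for."""
        return [self.get(child_id) for child_id in self.get(node_id)["children"]]


# A dict stands in for the database.
backend = {
    "m": {"name": "Door", "children": ["s1", "s2"]},
    "s1": {"name": "Open", "children": []},
    "s2": {"name": "Closed", "children": []},
}
store = LazyNodeStore(backend.__getitem__, capacity=2)
root = store.get("m")              # one load; the children are not touched yet
names = [c["name"] for c in store.children("m")]
```

The client code only ever calls `get` and `children`; whether a record comes from memory or from the backend is hidden behind the API, which is the transparency the text asks for.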
I would say that this
concludes the minimum reasonable infrastructure. You now have an API onto which
you can build — for example — type checkers, editors or interpreters. Sure, if
you build many of these, then you will certainly build additional frameworks
and libraries to let you implement those more efficiently, but such additional
infrastructure can be built iteratively, as needed.
One piece of infrastructure
that is worth mentioning explicitly here is notifications, where clients (type
checkers, editors, interpreters) can subscribe to changes of the model in order
to react to them. This is crucial if you want to make your model editors and
processing services integrate seamlessly with a modern web application where
users expect immediate synchronization between users and incremental update of
data derived from the model.
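A minimal sketch of such a notification mechanism, assuming a simple observer pattern, could look as follows. `ObservableModel`, `PropertyChanged` and `subscribe` are names invented for this example; real systems typically also report structural changes (children added, moved or removed) and batch events per transaction.

```python
from typing import Any, Callable, Dict, List, NamedTuple


class PropertyChanged(NamedTuple):
    """Change event delivered to subscribers."""

    node_id: str
    role: str
    old: Any
    new: Any


Listener = Callable[[PropertyChanged], None]


class ObservableModel:
    """Stores node properties and notifies subscribers about actual changes."""

    def __init__(self) -> None:
        self._properties: Dict[str, Dict[str, Any]] = {}
        self._listeners: List[Listener] = []

    def subscribe(self, listener: Listener) -> Callable[[], None]:
        """Registers a listener and returns a function that unsubscribes it."""
        self._listeners.append(listener)
        return lambda: self._listeners.remove(listener)

    def set_property(self, node_id: str, role: str, value: Any) -> None:
        props = self._properties.setdefault(node_id, {})
        old = props.get(role)
        if old == value:
            return  # nothing changed, so nobody needs to react
        props[role] = value
        event = PropertyChanged(node_id, role, old, value)
        for listener in list(self._listeners):  # copy: listeners may unsubscribe
            listener(event)


model = ObservableModel()
seen: List[PropertyChanged] = []
unsubscribe = model.subscribe(seen.append)

model.set_property("s1", "name", "Open")    # delivers an event
model.set_property("s1", "name", "Open")    # no change, no event
model.set_property("s1", "name", "Opened")  # delivers an event
unsubscribe()
model.set_property("s1", "name", "Closed")  # no longer observed
```

Because each event carries the affected node and role, a type checker or editor can recompute only what depends on that node rather than reprocessing the whole model.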
Wrap Up
What I describe here as the
minimum might sound like a lot. But in fact it is usually just a couple of thousand
lines that you can even adapt from the linked examples. It is therefore
not a coincidence that Modelix started
with exactly this. It is now considerably more than a few thousand lines
because it addresses many non-functional concerns, but what I describe here was
the rationale behind the development roadmap.
In contrast, if you don’t
start with this kind of infrastructure, you will not have any integration layer
across your models, and your language development endeavour will descend into
ad-hoc-ness and improvisation.
Acknowledgements
Thanks to Kolja Dummann and
Sascha Lisson for useful input on a previous version of the document. And to
some of my customers for making me think about the topic.