The Philosophy behind Language Engineering with MPS
Over the last few years, we have built a range of interesting DSLs with MPS. In this post, I explain the thinking behind language engineering with MPS, and why the languages one typically builds with it differ from languages built with other tools.
Sure, it is possible to use MPS to define programming languages that work like any other one: a relatively small set of language constructs designed for letting the user define their own abstractions plus a large standard library on which users can build. For example, MPS ships with an implementation of Java (called BaseLanguage) which is essentially unchanged from regular Java. The whole JDK is available for users to build on. While some extensions are available, users can do regular Java programming with MPS’ BaseLanguage.
However, when exploiting MPS’ unique characteristics, the resulting languages look very different.
Differences to “normal” Language Engineering
Because of MPS’ projectional editor, it is possible to use a wide range of notations (see figure below). Direct support exists for structured and unstructured text, tables, box-and-line diagrams and math. But it is also possible to define completely custom notations that do not fit any of these paradigms. The notations can also be mixed (nesting one in another, or using them next to each other in the same “file”). Since MPS is unique in this respect among industry-strength language workbenches, it is not uncommon that MPS is selected for a language specifically because of this feature. However, even a language that is fundamentally textual, like KernelF, exploits decision tables and trees, and has an extension for math syntax.
Language Modules instead of Libraries
In general-purpose programming languages, new abstractions are provided through libraries (and frameworks, which we consider a form of library in this article), developed in the language itself. This is possible because GPLs are built for defining abstractions. However, as a means of providing new abstractions to programmers, libraries are limited in the sense that they cannot extend the language’s syntax, type system and IDE support. (This is a slight over-generalization: depending on the language and its meta programming facilities, new abstractions can come with their own syntax, type system and IDE support. Generally, though, the statement holds.)
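To make this limitation concrete, consider a hypothetical money abstraction in plain Java (all names here are invented for illustration): the library can provide the abstraction, but the Java type checker knows nothing about currencies, so a mismatch can only surface at runtime.

```java
// Hypothetical money "library": the abstraction exists, but plain Java
// cannot give it dedicated syntax or currency-aware type checking.
public class MoneyDemo {
    record Money(long cents, String currency) {
        Money plus(Money other) {
            // The domain rule lives in runtime code, not in the type system.
            if (!currency.equals(other.currency)) {
                throw new IllegalArgumentException("currency mismatch");
            }
            return new Money(cents + other.cents, currency);
        }
    }

    public static void main(String[] args) {
        Money usd = new Money(1000, "USD");
        Money eur = new Money(500, "EUR");
        System.out.println(usd.plus(new Money(250, "USD")).cents()); // 1250
        try {
            usd.plus(eur); // compiles fine; fails only at runtime
        } catch (IllegalArgumentException e) {
            System.out.println("runtime error: " + e.getMessage());
        }
    }
}
```

A language extension could instead give `Money` literal syntax and make `USD + EUR` a static type error with a domain-specific message.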
In idiomatic use of MPS, additional abstractions are instead provided through language extensions, defined outside the language, using MPS’ language definition facilities. A language extension can be seen as a library plus syntax, plus type system and plus IDE support (and a semantics definition via an interpreter or generator). The structure definition of languages is object-oriented, and many of the design patterns relevant for libraries and frameworks can also be found in MPS languages (examples include the Adapter/Bridge/Strategy patterns or the separation of the construction of a data structure from its subsequent interpretation or execution). This approach fits extremely well with DSLs, which, because of their purpose and target audience, often do not come with sophisticated means of building custom abstractions.
One very nice feature of libraries is that, in general, they can be composed. For example, you can use the collections from the Java standard library together with the Joda Time library for date and time handling and the Spring framework for developing server-side applications. There is no need to explicitly combine the frameworks; the combination “just works”. While this composability does not hold for language composition in general (primarily because of syntactic ambiguities), it does hold with MPS: for all intents and purposes, language extensions can be composed modularly, just like libraries. The composition also has the same limitations: one cannot statically prove that it will work, and the set of libraries/language extensions might not fit well in terms of their style. However, if language extensions are developed in a coordinated, but still modular way, as a stack of extensions, these limitations do not apply. mbeddr is a very comprehensive example of this approach.
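The “just works” composition of libraries can be sketched in plain Java: the standard collections and java.time (the modern successor of Joda Time) compose without any explicit integration step.

```java
import java.time.LocalDate;
import java.util.List;

// Two independent libraries -- collections and java.time --
// compose without any explicit integration step.
public class ComposeDemo {
    public static void main(String[] args) {
        List<LocalDate> releases = List.of(
            LocalDate.of(2003, 2, 10),
            LocalDate.of(2017, 5, 30));
        // Collection operations work on date objects out of the box.
        long after2010 = releases.stream()
            .filter(d -> d.getYear() > 2010)
            .count();
        System.out.println(after2010); // 1
    }
}
```

The claim in the text is that composing MPS language extensions feels the same way: two independently developed extensions of a base language can be used side by side in one model.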
To illustrate the library vs. language extension point, I provide two examples. The first one concerns the collections in KernelF. Consider the following code:

// type inferred to list<int>
val l1 = list(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
// type inferred to list<real>; results in type error for l2
val l2: list<int> = l1.where(|it > 5|).select(|it / 2|)
As you can see, the collections are generic: the list type carries the type of its elements, either explicitly specified (l2) or inferred (l1). However, KernelF does not generally support generic types. For example, users cannot write

fun<type T1, type T2> typedPair(v1: T1, v2: T2): [T1, T2] = [v1, v2]
Generics are not generally necessary for DSLs. In fact, exposing them to the user will often be confusing, and it makes the job of the language extender harder, because they have to take generics into account for all extensions. However, for collections, an explicit specification of the element type is useful and intuitive. This is why the language extension for collections supports it. In the list example above you can also see the where and select operators; they are also language extensions, available on list types. These could have been implemented as extension functions in a standard library. However, because they have to work with the collections’ type parameters and because they use a particular kind of type inference for the it argument, not generally supported by KernelF, they are also built as language extensions.
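For comparison, in a general-purpose language with a sufficiently powerful type system the same operations can indeed be provided as a library. A sketch of the KernelF snippet above in plain Java, where filter and map play the roles of where and select; note that here the user is exposed to the generic type parameter, which KernelF deliberately avoids:

```java
import java.util.List;

public class CollectionsDemo {
    public static void main(String[] args) {
        // Counterpart of: val l1 = list(1, 2, ..., 10)
        List<Integer> l1 = List.of(1, 2, 3, 4, 5, 6, 7, 8, 9, 10);
        // Counterpart of: l1.where(|it > 5|).select(|it / 2|)
        // Dividing by 2.0 yields doubles, mirroring the list<real> inference.
        List<Double> l2 = l1.stream()
            .filter(it -> it > 5)
            .map(it -> it / 2.0)
            .toList();
        System.out.println(l2); // [3.0, 3.5, 4.0, 4.5, 5.0]
    }
}
```

The library route works here precisely because Java exposes generics everywhere; the KernelF design keeps that machinery internal to the collections extension.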
As a second example, take a look at the state machine example below. State machines come with a rich syntax, specific type checks, and dedicated IDE support. In the future, model checking will be available.
The second example is probably more convincing: it is hard to imagine how the state machines could be implemented as a library, even in a language with meaningful meta programming facilities. The collections and their operations, on the other hand, could be provided by a language with a more powerful type system as a library with the same end-user-visible features. However, as mentioned above, this would complicate the end-user experience in other places and make the language extender’s job harder. This is why they have been implemented as language extensions as well.
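To see why the library encoding of state machines is less convincing, here is a minimal state-machine-as-library sketch in Java (the API is hypothetical): the structure is expressible, but a domain rule such as “no transition into the start state” can only be checked at runtime, and the machine’s states and events remain invisible to the type checker and the IDE.

```java
import java.util.HashMap;
import java.util.Map;

// Minimal state machine as a library (hypothetical API): structure is
// expressible, but domain rules are enforced only by runtime checks.
public class StateMachineDemo {
    static class Machine {
        private final String start;
        private final Map<String, Map<String, String>> transitions = new HashMap<>();
        private String current;

        Machine(String start) { this.start = start; this.current = start; }

        Machine transition(String from, String event, String to) {
            if (to.equals(start)) {
                // In MPS this would be a static, domain-specific error message.
                throw new IllegalArgumentException(
                    "Start states cannot be used as the target of a transition");
            }
            transitions.computeIfAbsent(from, k -> new HashMap<>()).put(event, to);
            return this;
        }

        String fire(String event) {
            current = transitions.getOrDefault(current, Map.of())
                                 .getOrDefault(event, current);
            return current;
        }
    }

    public static void main(String[] args) {
        Machine m = new Machine("init")
            .transition("init", "go", "running")
            .transition("running", "stop", "done");
        System.out.println(m.fire("go")); // running
        try {
            m.transition("done", "reset", "init"); // only detected at runtime
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

A language extension inverts this: states become first-class concepts, so the same rule can be enforced statically, with an editor that knows about states, events and transitions.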
Because of the ease of developing languages in a modular way, we try to separate generally useful KernelF extensions from actual customer-specific extensions when we run projects; the generally useful parts become a customer-independent KernelF extension: the equivalent of a standard library, if you will, but in the form of languages.
The last point of comparison between libraries and language extensions is the effort to create them. For an experienced MPS developer, developing a language extension takes not significantly more effort than writing a comparable library. In addition, because language development and language use happen in the same environment in MPS, turnaround time is very quick, supporting iterative, example-driven language development, just as when developing a library together with representative examples of its use.
More First-class Concepts
As a consequence of the heavier reliance on language extensions, a (stack of) MPS language(s) will typically be more keyword-heavy than non-MPS languages. While this may offend some developers’ sense of style, it has two distinct advantages.
First, because more concepts are first-class, the IDE can know the semantics of those concepts and provide better support in terms of analyses. This, in turn, can be used to create meaningful error messages that align with the particular semantics of an extension. For example, in state machines, if the user creates a transition to the start state (assuming scoping allows this in the first place), an error message could read Start states cannot be used as the target of a transition, and in smaller font, below, Start states are pseudo states that are only used internally during startup of the machine. In a library-based solution, or one that relies on meta programming, this problem most likely cannot be detected statically at all and would lead to a runtime error. Alternatively, the error message would be much more generic, such as Type StartState is not a subtype of State, which is also not very helpful to the end user.
Second, the language is easier to explore, primarily because code completion has more sensible things to show. In a minimal language like Scheme, the completion menu essentially contains the basic syntactic forms, such as atoms, lists or functions, plus calls to existing functions. This makes it harder for users to explore what they can do with the language.
Focus on Evolution
Because languages and their extensions contain comparatively many first-class concepts, and many reflect a business domain that evolves, the languages we build with MPS also evolve quickly. Evolution in this context can mean one of two things. First, we may build additional languages on top of a core language, while keeping the core language stable; we grow the stack of languages into one or more domains.
The other notion of evolution is the actual invasive evolution of the language itself (to make it concrete: you’ll ship a new version of kernelf.jar, whereas in the extension case above you ship additional jars that rely on an unchanged kernelf.jar). If the new version is compatible with the previous version, this case is simple: just deploy the new version of the language, and users now have more features. If the new version is not backward compatible, then existing programs become invalid. For this case, MPS supports explicit language versioning. As the language developer makes a breaking change to a language, they increase the version counter and provide a migration script. When language users open existing models after the new version has been deployed into their IDE, the scripts run automatically, bringing the model up to date. If no algorithmic migration is feasible (because the user has to make a semantic decision not previously necessary), the recommended approach is to keep the old construct around, deprecate it, and output an error message that tells the user that they have to make a decision and migrate.
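The versioning mechanism can be illustrated schematically (this is not the actual MPS API, just a sketch of the idea): each breaking language change registers a migration script under its new version number, and opening an old model replays all scripts newer than the model’s recorded version, in order.

```java
import java.util.TreeMap;
import java.util.function.UnaryOperator;

// Schematic sketch of versioned model migration (not the actual MPS API):
// a model records the language version it was created with; pending
// migration scripts are replayed in version order to bring it up to date.
public class MigrationDemo {
    record Model(int version, String content) {}

    static final TreeMap<Integer, UnaryOperator<Model>> SCRIPTS = new TreeMap<>();
    static {
        // v2 renamed a keyword; v3 added a field with a computable default.
        SCRIPTS.put(2, m -> new Model(2, m.content().replace("oldKeyword", "newKeyword")));
        SCRIPTS.put(3, m -> new Model(3, m.content() + " /* migrated field */"));
    }

    static Model migrate(Model m) {
        // Run every script strictly newer than the model's version.
        for (UnaryOperator<Model> script : SCRIPTS.tailMap(m.version(), false).values()) {
            m = script.apply(m);
        }
        return m;
    }

    public static void main(String[] args) {
        Model old = new Model(1, "oldKeyword x");
        Model migrated = migrate(old);
        System.out.println(migrated.version());  // 3
        System.out.println(migrated.content());
    }
}
```

The essential property is that migrations are cumulative and ordered, so a model from any past version can be brought to the current one in a single pass.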
Note how this is a much more robust infrastructure for dealing with program migration than what is possible with libraries: an incompatible change prompts a generic error from the type checker or compiler, and automatic program migration is not available (outside of experimental systems). All in all, iterative development of languages is quite feasible, even when taking into account models that are “in the wild” with language users.
Recasting IDE Tools as Languages
Traditionally, programming systems consist of the language and libraries, the compiler and type checker, and an IDE. Many added-value services, for example those for program understanding, testing, and debugging, are part of the IDE; more specifically, they rely on tool windows and other service-specific UI elements (buttons, menus, etc.). Because of MPS’ flexibility in how editors can be defined, we use languages and language extensions for things that would be tool windows or other IDE add-ons in classical languages and IDEs. Examples include the REPL and its rendering of structured values (1 in the picture below), the overlay of variable values over the program code during debugging (3), test coverage and other assessment results, generated test vectors and their validity state (2), and the diffing of mutated programs vs. their originals in the context of mutation testing (4).
As a consequence, the notion of what constitutes a language is much broader in MPS than in the traditional understanding. A side-effect of this approach is that the chrome of the development environment (the set of windows, tabs, buttons, menus and so on) can be reduced, because “everything happens in the editor, through typing, code completion and intentions”. Since MPS’ cluttered tool UI is among the most common complaints from our users, we consider this side-effect an advantage.
More Reliance on the IDE
There is no standard for the implementation of languages, which means that, once a language is implemented with one particular language workbench, it cannot be ported to another language workbench (unless one implements it completely from scratch). This is all the more true for MPS, which, because of its particular style of language implementation, is unique among language workbenches. In particular, because of its projectional editor, MPS languages cannot be used outside of the MPS tool. While this can be seen as a drawback, the flip side is that one can assume the IDE to always be present, and the language can be designed assuming the IDE and its services. A few examples:
Different projection modes: Instead of making a design decision on which level of detail should be used for function signatures, the user can switch between them (see figure below). This is useful because users with different levels of proficiency prefer different styles: a newbie prefers the explicitly listed types, and once one gets more proficient, one appreciates the conciseness of alternative (C).
Read-only editor contents: In many DSLs we use read-only contents to create a more form-like editor experience, with non-editable labels. In mbeddr, when a component implements an operation defined in an interface, we use a read-only projection of the operation’s signature in the implementation.
Intentions: These little in-place transformations of the program are available from a drop-down menu activated with Alt-Enter (known as Quick Fixes in Eclipse). In some languages, especially non-textual ones, they are the only way to access certain constructs: you can’t just type them. Many examples of this can be found in the languages that recast traditional IDE services (see previous section). While relying on intentions might be unintuitive for text-focused programmers, we teach our users to consider the intentions menu an integral part of the editor experience.
Domain-specific languages in general, and our approach in particular, are a hybrid between modeling and software language engineering. From modeling we borrow declarativeness and high-level, domain-specific concepts; multiple integrated languages; meta modeling for defining the structure of languages (named properties and links, inheritance, actual references); and notational freedom, in particular diagrams. From the field of software language engineering we adopt a focus on behavior and the integration of fine-grained aspects, such as expressions; actual type checking and not just constraint checks; powerful, productivity-focused IDEs; and textual languages.
We like to think that the approach combines the best of these two worlds and leads to convincing outcomes.