The Philosophy behind Language Engineering with MPS
Over the last few years, we have built a range of interesting DSLs with MPS. In this post, I explain the thinking behind language engineering with MPS, and why the languages one typically builds with it differ from languages built with other tools.
Sure, it is possible to use MPS to define programming languages that work like any other one: a relatively small set of language constructs designed for letting the user define their own abstractions plus a large standard library on which users can build. For example, MPS ships with an implementation of Java (called BaseLanguage) which is essentially unchanged from regular Java. The whole JDK is available for users to build on. While some extensions are available, users can do regular Java programming with MPS’ BaseLanguage.
However, when exploiting MPS’ unique characteristics, the resulting languages look very different.
Differences to “normal” Language Engineering
Because of MPS’ projectional editor, it is possible to use a wide range of notations (see figure below). Direct support exists for structured and unstructured text, tables, box-and-line diagrams and math. But it is also possible to define completely custom notations that do not fit any of these paradigms. The notations can also be mixed (nesting one in another, or using them next to each other in the same “file”). Since MPS is unique in this respect among industry-strength language workbenches, it is not uncommon that MPS is selected for a language specifically because of this feature. However, even a language that is fundamentally textual, like KernelF, exploits decision tables and trees, and has an extension for math syntax.
Language Modules instead of Libraries
In general-purpose programming languages, new abstractions are provided through libraries (and frameworks, which we consider a form of library in this article), developed in the language itself. This is possible because GPLs are built for defining abstractions. However, as a means of providing new abstractions to programmers, libraries are limited in the sense that they cannot extend the language’s syntax, type system and IDE support. (This is a slight over-generalization: depending on the language and its meta programming facilities, new abstractions can come with their own syntax, type system and IDE support. Generally, though, the statement holds.)
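To make this limitation concrete, consider a hypothetical money abstraction in plain Java (all names here are invented for illustration): the library can provide the abstraction, but the Java type checker knows nothing about currencies, so a mismatch can only surface at runtime.

```java
// Hypothetical money "library": the abstraction exists, but plain Java
// cannot give it dedicated syntax or currency-aware type checking.
public class MoneyDemo {
    record Money(long cents, String currency) {
        Money plus(Money other) {
            // The domain rule lives in runtime code, not in the type system.
            if (!currency.equals(other.currency)) {
                throw new IllegalArgumentException("currency mismatch");
            }
            return new Money(cents + other.cents, currency);
        }
    }

    public static void main(String[] args) {
        Money usd = new Money(1000, "USD");
        Money eur = new Money(500, "EUR");
        System.out.println(usd.plus(new Money(250, "USD")).cents()); // 1250
        try {
            usd.plus(eur); // compiles fine; fails only at runtime
        } catch (IllegalArgumentException e) {
            System.out.println("runtime error: " + e.getMessage());
        }
    }
}
```

A language extension could instead give `Money` literal syntax and make `USD + EUR` a static type error with a domain-specific message.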
In idiomatic use of MPS, additional abstractions are instead provided through language extensions, defined outside the language, using MPS’ language definition facilities. A language extension can be seen as a library plus syntax, plus type system and plus IDE support (and a semantics definition via an interpreter or generator). The structure definition of languages is object-oriented, and many of the design patterns relevant for libraries and frameworks can also be found in MPS languages (examples include the Adapter/Bridge/Strategy patterns or the separation of the construction of a data structure from its subsequent interpretation or execution). This approach fits extremely well with DSLs, which, because of their purpose and target audience, often do not come with sophisticated means of building custom abstractions.
One very nice feature of libraries is that, in general, they can be composed. For example, you can use the collections from the Java standard library together with the Joda Time library for date and time handling and the Spring framework for developing server-side applications. There is no need to explicitly combine the frameworks; the combination “just works”. While this composability does not hold for language composition in general (primarily because of syntactic ambiguities), it does hold with MPS: for all intents and purposes, language extensions can be composed modularly, just like libraries. The composition also has the same limitations: one cannot statically prove that it will work, and the set of libraries/language extensions might not fit well in terms of their style. However, if language extensions are developed in a coordinated, but still modular way, as a stack of extensions, these limitations do not apply. mbeddr is a very comprehensive example of this approach.
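The “just works” composition of libraries can be sketched in plain Java: the standard collections and java.time (the modern successor of Joda Time) compose without any explicit integration step.

```java
import java.time.LocalDate;
import java.util.List;

// Two independent libraries -- collections and java.time --
// compose without any explicit integration step.
public class ComposeDemo {
    public static void main(String[] args) {
        List<LocalDate> releases = List.of(
            LocalDate.of(2003, 2, 10),
            LocalDate.of(2017, 5, 30));
        // Collection operations work on date objects out of the box.
        long after2010 = releases.stream()
            .filter(d -> d.getYear() > 2010)
            .count();
        System.out.println(after2010); // 1
    }
}
```

The claim in the text is that composing MPS language extensions feels the same way: two independently developed extensions of a base language can be used side by side in one model.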
To illustrate the library vs. language extension point, I provide two examples. The first one concerns the collections in KernelF. Consider the following code:

// type inferred to list<int>
val l1 = list(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
// type inferred to list<real>; results in type error for l2
val l2: list<int> = l1.where(|it > 5|).select(|it / 2|)
As you can see, the collections are generic: the list type carries the type of its elements, either explicitly specified (l2) or inferred (l1). However, KernelF does not generally support generic types. For example, users cannot write

fun<type T1, type T2> typedPair(v1: T1, v2: T2): [T1, T2] = [v1, v2]
Generics are not generally necessary for DSLs. In fact, exposing them to the user will often be confusing, and it makes the job of the language extender harder, because they have to take generics into account for all extensions. However, for collections, an explicit specification of the element type is useful and intuitive. This is why the language extension for collections supports it. In the list example above you can also see the where and select operators; they are also language extensions, available on list types. These could have been implemented as extension functions in a standard library. However, because they have to work with the collections’ type parameters and because they use a particular kind of type inference for the it argument, not generally supported by KernelF, they are also built as language extensions.
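For comparison, in a general-purpose language with a sufficiently powerful type system the same operations can indeed be provided as a library. A sketch of the KernelF snippet above in plain Java, where filter and map play the roles of where and select; note that here the user is exposed to the generic type parameter, which KernelF deliberately avoids:

```java
import java.util.List;

public class CollectionsDemo {
    public static void main(String[] args) {
        // Counterpart of: val l1 = list(1, 2, ..., 10)
        List<Integer> l1 = List.of(1, 2, 3, 4, 5, 6, 7, 8, 9, 10);
        // Counterpart of: l1.where(|it > 5|).select(|it / 2|)
        // Dividing by 2.0 yields doubles, mirroring the list<real> inference.
        List<Double> l2 = l1.stream()
            .filter(it -> it > 5)
            .map(it -> it / 2.0)
            .toList();
        System.out.println(l2); // [3.0, 3.5, 4.0, 4.5, 5.0]
    }
}
```

The library route works here precisely because Java exposes generics everywhere; the KernelF design keeps that machinery internal to the collections extension.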
As a second example, take a look at the state machine example below. State machines come with a rich syntax, specific type checks, and dedicated IDE support. In the future, model checking will be available.
The second example is probably more convincing: it is hard to imagine how the state machines could be implemented as a library, even in a language with meaningful meta programming facilities. The collections and their operations, on the other hand, could be provided by a language with a more powerful type system as a library with the same end-user-visible features. However, as mentioned above, this would complicate the end-user experience in other places and make the language extender’s job harder. This is why they have been implemented as language extensions as well.
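To see why the library encoding of state machines is less convincing, here is a minimal state-machine-as-library sketch in Java (the API is hypothetical): the structure is expressible, but a domain rule such as “no transition into the start state” can only be checked at runtime, and the machine’s states and events remain invisible to the type checker and the IDE.

```java
import java.util.HashMap;
import java.util.Map;

// Minimal state machine as a library (hypothetical API): structure is
// expressible, but domain rules are enforced only by runtime checks.
public class StateMachineDemo {
    static class Machine {
        private final String start;
        private final Map<String, Map<String, String>> transitions = new HashMap<>();
        private String current;

        Machine(String start) { this.start = start; this.current = start; }

        Machine transition(String from, String event, String to) {
            if (to.equals(start)) {
                // In MPS this would be a static, domain-specific error message.
                throw new IllegalArgumentException(
                    "Start states cannot be used as the target of a transition");
            }
            transitions.computeIfAbsent(from, k -> new HashMap<>()).put(event, to);
            return this;
        }

        String fire(String event) {
            current = transitions.getOrDefault(current, Map.of())
                                 .getOrDefault(event, current);
            return current;
        }
    }

    public static void main(String[] args) {
        Machine m = new Machine("init")
            .transition("init", "go", "running")
            .transition("running", "stop", "done");
        System.out.println(m.fire("go")); // running
        try {
            m.transition("done", "reset", "init"); // only detected at runtime
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

A language extension inverts this: states become first-class concepts, so the same rule can be enforced statically, with an editor that knows about states, events and transitions.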
Because of the ease of developing languages in a modular way, we try to separate generally useful KernelF extensions from actual customer-specific extensions when we run projects; the generally useful parts become a customer-independent KernelF extension: the equivalent of a standard library, if you will, but in the form of languages.
The last point of comparison between libraries and language extensions is the effort to create them. For an experienced MPS developer, developing a language extension takes not significantly more effort than writing a comparable library. In addition, because language development and language use happen in the same environment in MPS, turnaround time is very quick, supporting iterative, example-driven language development, just as when developing a library together with representative examples of its use.
More First-class Concepts
As a consequence of the heavier reliance on language extensions, a (stack of) MPS language(s) will typically be more keyword-heavy than non-MPS languages. While this may offend some developers’ sense of style, it has two distinct advantages.
First, because more concepts are first-class, the IDE can know the semantics of those concepts and provide better support in terms of analyses. This, in turn, can be used to create meaningful error messages that align with the particular semantics of an extension. For example, in state machines, if the user creates a transition to the start state (assuming scoping allows this in the first place), an error message could read Start states cannot be used as the target of a transition, and in smaller font, below, Start states are pseudo states that are only used internally during startup of the machine. In a library-based solution, or one that relies on meta programming, this problem most likely cannot be detected statically at all and would lead to a runtime error. Alternatively, the error message would be much more generic, such as Type StartState is not a subtype of State, which is also not very helpful to the end user.
Second, the language is easier to explore, primarily because code completion has more sensible things to show. In a minimal language like Scheme, the completion menu essentially contains the basic syntactic forms, such as atoms, lists or functions, plus calls to existing functions. This makes it harder for users to explore what they can do with the language.
Focus on Evolution
Because languages and their extensions contain comparatively many first-class concepts, and many reflect a business domain that evolves, the languages we build with MPS also evolve quickly. Evolution in this context can mean one of two things. First, we may build additional languages on top of a core language, while keeping the core language stable; we grow the stack of languages into one or more domains.
The other notion of evolution is the actual invasive evolution of the language itself (to make it concrete: you’ll ship a new version of kernelf.jar, whereas in the extension case above you ship additional jars that rely on an unchanged kernelf.jar). If the new version is compatible with the previous version, this case is simple: just deploy the new version of the language, and users now have more features. If the new version is not backward compatible, then existing programs become invalid. For this case, MPS supports explicit language versioning. As the language developer makes a breaking change to a language, they increase the version counter and provide a migration script. When language users open existing models after the new version has been deployed into their IDE, the scripts run automatically, bringing the model up to date. If no algorithmic migration is feasible (because the user has to make a semantic decision not previously necessary), the recommended approach is to keep the old construct around, deprecate it, and output an error message that tells the user that they have to make a decision and migrate.
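The versioning mechanism can be illustrated schematically (this is not the actual MPS API, just a sketch of the idea): each breaking language change registers a migration script under its new version number, and opening an old model replays all scripts newer than the model’s recorded version, in order.

```java
import java.util.TreeMap;
import java.util.function.UnaryOperator;

// Schematic sketch of versioned model migration (not the actual MPS API):
// a model records the language version it was created with; pending
// migration scripts are replayed in version order to bring it up to date.
public class MigrationDemo {
    record Model(int version, String content) {}

    static final TreeMap<Integer, UnaryOperator<Model>> SCRIPTS = new TreeMap<>();
    static {
        // v2 renamed a keyword; v3 added a field with a computable default.
        SCRIPTS.put(2, m -> new Model(2, m.content().replace("oldKeyword", "newKeyword")));
        SCRIPTS.put(3, m -> new Model(3, m.content() + " /* migrated field */"));
    }

    static Model migrate(Model m) {
        // Run every script strictly newer than the model's version.
        for (UnaryOperator<Model> script : SCRIPTS.tailMap(m.version(), false).values()) {
            m = script.apply(m);
        }
        return m;
    }

    public static void main(String[] args) {
        Model old = new Model(1, "oldKeyword x");
        Model migrated = migrate(old);
        System.out.println(migrated.version());  // 3
        System.out.println(migrated.content());
    }
}
```

The essential property is that migrations are cumulative and ordered, so a model from any past version can be brought to the current one in a single pass.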
Note how this is a much more robust infrastructure for dealing with program migration than what is possible with libraries: an incompatible change prompts a generic error from the type checker or compiler, and automatic program migration is not available (outside of experimental systems). All in all, iterative development of languages is quite feasible, even when taking into account models that are “in the wild” with language users.
Recasting IDE Tools as Languages
Traditionally, programming systems consist of the language and libraries, the compiler and type checker, and an IDE. Many added-value services, for example those for program understanding, testing, and debugging, are part of the IDE; more specifically, they rely on tool windows and other service-specific UI elements (buttons, menus, etc.). Because of MPS’ flexibility in how editors can be defined, we use languages and language extensions for things that would be tool windows or other IDE add-ons in classical languages and IDEs. Examples include the REPL and its rendering of structured values (1 in the picture below), the overlay of variable values over the program code during debugging (3), test coverage and other assessment results, generated test vectors and their validity state (2), and the diffing of mutated programs vs. their originals in the context of mutation testing (4).
As a consequence, the notion of what constitutes a language is much broader in MPS than in the traditional understanding. A side-effect of this approach is that the chrome of the development environment (the set of windows, tabs, buttons, menus and so on) can be reduced, because “everything happens in the editor, through typing, code completion and intentions”. Since MPS’ cluttered tool UI is among the most common complaints from our users, we consider this side-effect an advantage.
More Reliance on the IDE
There is no standard for the implementation of languages, which means that, once a language is implemented with one particular language workbench, it cannot be ported to another language workbench (unless one implements it completely from scratch). This is all the more true for MPS, which, because of its particular style of language implementation, is unique among language workbenches. In particular, because of its projectional editor, MPS languages cannot be used outside of the MPS tool. While this can be seen as a drawback, the flip side is that one can assume the IDE to always be present, and the language can be designed assuming the IDE and its services. A few examples:
Different projection modes: Instead of making a design decision on which level of detail should be used for function signatures, the user can switch between them (see figure below). This is useful because users with different levels of proficiency prefer different styles: a newbie prefers the explicitly listed types, and once one gets more proficient, one appreciates the conciseness of alternative (C).
Read-only editor contents: In many DSLs we use read-only contents to create a more form-like editor experience, with non-editable labels. In mbeddr, when a component implements an operation defined in an interface, we use a read-only projection of the operation’s signature in the implementation.
Intentions: These little in-place transformations of the program are available from a drop-down menu activated with Alt-Enter (known as Quick Fixes in Eclipse). In some languages, especially non-textual ones, they are the only way to access certain constructs: you can’t just type them. Many examples of this can be found in the languages that recast traditional IDE services (see previous section). While relying on intentions might be unintuitive for text-focused programmers, we teach our users to consider the intentions menu an integral part of the editor experience.
Domain-specific languages in general, and our approach in particular, are a hybrid between modeling and software language engineering. From modeling we borrow declarativeness and high-level, domain-specific concepts; multiple integrated languages; meta modeling for defining the structure of languages (named properties and links, inheritance, actual references); and notational freedom, in particular diagrams. From the field of software language engineering we adopt a focus on behavior and the integration of fine-grained aspects, such as expressions; actual type checking and not just constraint checks; powerful, productivity-focused IDEs; and textual languages.
We like to think that the approach combines the best of these two worlds and leads to convincing outcomes.