Funclerative Programming
Hello World! I have been
absent for a very long time because I was busy with another writing project that
should come to a close towards the end of this year. From then on I
will come back to writing more stuff here.
Today I want to discuss the
approach I’ve used to build the last couple of DSLs with my current customers.
It relies on the idea of “funclerative” programming, mixing functional and
declarative. It relies on KernelF, a functional base language we’ve developed
for MPS. KernelF is described in this rather
long paper.
First-class structures or expressions?
One of the main tradeoffs
when developing DSLs is to decide on the amount of first-class domain
abstractions to use. Representing lots of aspects first class is useful for
several reasons. First, the semantics of the program is easy to analyze because
the structures and relationships of language constructs are directly encoded in
the domain-specific AST of the language. Correspondingly, code completion is
very precise and error messages can be phrased in a way that is very close to
the domain abstractions. However, the approach also has an important
disadvantage: because the structure is so specific to the domain, every change
to that structure requires the migration of existing models. While MPS has
migration facilities, this nonetheless creates a lot of friction, especially in
the early stages of language development where the understanding of the domain
evolves minute by minute and the language changes accordingly.
Functional programming is the
opposite. Except for a few top-level declarations (structs, functions, constants,
enums) everything is an expression. This highly orthogonal structure means that
you can more or less nest everything under everything. The type checker might
complain, but from a structural perspective, “everything” is an expression. So
if you extend or change the language, it is very likely that no structural
changes — and hence, no migrations — are required. Very nice, very flexible!
There is a drawback, of course: the user experience, at least for domain
experts who are not professional programmers, becomes worse: code completion
always shows a lot of stuff (because everything is structurally an expression)
and, because domain semantics is harder to analyze from a structurally more
flexible program, error messages tend to become less meaningful.
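To make the code completion difference concrete, here is a minimal Python sketch. It is not MPS or KernelF code; the classes Expr, Num, Add, IfElse and PremiumRule as well as the function completion_candidates are hypothetical names made up for illustration. A slot typed with the generic Expr accepts every expression concept, while a slot in a first-class domain structure accepts only what the domain calls for.

```python
from dataclasses import dataclass


class Expr:
    """Generic base concept: structurally, everything is an expression."""


@dataclass
class Num(Expr):
    value: float


@dataclass
class Add(Expr):
    left: Expr   # any expression may nest here
    right: Expr


@dataclass
class IfElse(Expr):
    cond: Expr
    then: Expr
    otherwise: Expr


@dataclass
class PremiumRule:
    """Hypothetical first-class domain structure with narrowly typed slots."""
    base_rate: Num   # only a number literal fits here
    surcharge: Num


def completion_candidates(slot_type):
    """Names of all concrete concepts that structurally fit a slot of slot_type."""
    found = []
    stack = [slot_type]
    while stack:
        cls = stack.pop()
        if cls is not Expr:          # the abstract base itself cannot be entered
            found.append(cls.__name__)
        stack.extend(cls.__subclasses__())
    return sorted(found)


print(completion_candidates(Expr))  # ['Add', 'IfElse', 'Num']
print(completion_candidates(Num))   # ['Num']
```

Adding a new Expr subclass later changes no existing slot, which mirrors the "no migration" argument above, but it also makes the first menu longer.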
So which approach do you use?
Since the customer is king, or, phrased more seriously, it’s not useful if the
language structure is convenient for the language engineer but the UX sucks for
the domain user, the option of going functional is not realistic. So you’re
forced to go with option one, with all its drawbacks? Well, not really.
Adding domain-specific structures
I usually start with KernelF.
It is a full functional programming language. I briefly demonstrate this to the
developers and domain experts I collaborate with at my customer. I also show
them how we can directly run KernelF programs through interpreted test cases
and this way can get direct feedback on the correctness of our code.
At the time when we start
prototyping, we have usually already done a little bit of domain analysis, so
we agree on some of the main abstractions of the domain. These are often
structural: contracts, calculation trees, particular data structures or types.
So we add those as structural/declarative abstractions, often as KernelF
top-level elements (so you can write them right next to functions, structs or
enums). Inside those new structures we use of course the functional/expression
parts of KernelF: the built-in types, and all the arithmetic or comparative
expressions. I usually also immediately extend the testing framework, for
example with an expression to instantiate those domain-specific data structures
to “invoke” a calculation tree. After a couple of minutes we can execute
programs that involve the new domain-specific structures.
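The following Python sketch illustrates the shape of this setup under simplifying assumptions. It is not KernelF: the tiny expression classes, the CalculationTree declaration and its invoke method are all hypothetical stand-ins. The point is only that the new declarative structure holds ordinary expressions in its slots and that a test can "invoke" it directly.

```python
from dataclasses import dataclass
from typing import Dict


class Expr:
    """Stand-in for the generic expression concept."""
    def eval(self, env):
        raise NotImplementedError


@dataclass
class Num(Expr):
    value: float
    def eval(self, env):
        return self.value


@dataclass
class Ref(Expr):
    name: str
    def eval(self, env):
        return env[self.name]   # look up an input or an earlier step


@dataclass
class Mul(Expr):
    left: Expr
    right: Expr
    def eval(self, env):
        return self.left.eval(env) * self.right.eval(env)


@dataclass
class CalculationTree:
    """Hypothetical top-level domain element: named steps built from expressions."""
    name: str
    steps: Dict[str, Expr]   # evaluated in insertion order
    result: str              # name of the step that yields the final value

    def invoke(self, inputs):
        env = dict(inputs)
        for step_name, expr in self.steps.items():
            env[step_name] = expr.eval(env)
        return env[self.result]


# A "test case" that instantiates the structure and invokes it.
premium = CalculationTree(
    name="premium",
    steps={
        "risk": Mul(Ref("age"), Num(2)),
        "total": Mul(Ref("risk"), Ref("base")),
    },
    result="total",
)
outcome = premium.invoke({"age": 40, "base": 5})
assert outcome == 400
```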
Restricting KernelF
However, because we use the
rather generic Type and Expression concepts
from KernelF in many places, code completion is full of “weird” stuff in the
opinion of the domain user. Some of these things are genuinely unnecessary for
the domain. For example, several DSLs built on top of KernelF do not use option
types. Or the built-in error types. We then use MPS’ can be ancestor constraints to
prevent those from being visible to the users in code completion. This is
really important: because this constraint prevents users from entering these
concepts — by hiding them from the code completion menu — they are effectively
removed from the (user’s perception of the) language. Some of the “weird” stuff
is necessary for the domain, but the syntax is problematic. For example,
higher-order functions with their embedded lambda expressions are usually not
acceptable, even though the functionality to filter, transform and group
collections is required in many domains. What I do in this situation is to
constrain out the default higher-order stuff but then replace it with a more
friendly (and domain-adapted) version.
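The filtering idea can be sketched in a few lines of Python. This is an analogy, not the MPS constraint API: the concept names, the HIDDEN_UNDER table and both functions are hypothetical. The assumption modeled here is that an ancestor concept can veto descendants of certain concepts, and that the completion menu offers only concepts no ancestor has vetoed.

```python
# Hypothetical set of concepts that all fit structurally (types and expressions).
ALL_CONCEPTS = {
    "NumberLiteral",
    "PlusExpression",
    "OptionType",
    "ErrorType",
    "LambdaExpression",
    "FriendlyFilter",   # assumed domain-adapted replacement for lambda-based filtering
}

# Per ancestor concept: concepts that must not appear anywhere beneath it.
HIDDEN_UNDER = {
    "CalculationTree": {"OptionType", "ErrorType", "LambdaExpression"},
}


def can_be_ancestor(ancestor, descendant, hidden=HIDDEN_UNDER):
    """True if `ancestor` tolerates a node of concept `descendant` below it."""
    return descendant not in hidden.get(ancestor, set())


def completion_menu(ancestors, hidden=HIDDEN_UNDER):
    """Concepts offered at a position whose enclosing concepts are `ancestors`."""
    return sorted(
        concept
        for concept in ALL_CONCEPTS
        if all(can_be_ancestor(a, concept, hidden) for a in ancestors)
    )


# Inside the domain structure the menu is short ...
print(completion_menu(["CalculationTree"]))
# ['FriendlyFilter', 'NumberLiteral', 'PlusExpression']

# ... while inside a plain function nothing is hidden.
print(completion_menu(["Function"]))
```

Note that the set of concepts never changes; only the table of vetoes does.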
Importantly, I make all of
these changes to the (user’s perception of the) language without changing the
overall structure! Everything is still made from Type and Expression. This is the
crucial point, for several reasons, which we discuss next.
Extension and Composition
First, I still have the full
power of KernelF available outside the domain-specific structures that use the can be ancestor constraints
to restrict the language. For example, I can write helper functions that use
all of KernelF. At least in the short term this is often useful to get
something to run, even though the syntax and the notion of functions might not
survive to the final version of the DSL.
Second, as users become more
familiar with the notion of a DSL and see the actual need for more
expressiveness, their objections to some of the “weird” stuff go away. I can
then simply make the constraints more permissive to “reintroduce” concepts from
KernelF that I had previously constrained. No structural change though, no
migration necessary. Makes things very easy.
There’s a third reason why
this is cool: many of the existing KernelF extensions — for temporal types,
currencies, date and time or rational numbers — are basically just new kinds of Types and Expressions, plus the
occasional declaration. In many cases these can then be used “as is” in the
customer’s DSL. There’s nothing like replying “well, let’s see, we already have
something …” when the customer asks for some of these extensions.
Of course, sometimes a
particular DSL will need domain-specific abstractions on top of those
extensions. For example, at one current customer we have the notion of a {monthly} function.
It implicitly executes its body 12 times, once for each month of a
year. If you access a temporal value from within such a function, the temporal
value must be automatically reduced to a single value using a reduction
strategy defined in the data type. Sounds like gobbledygook without more
context, I know, but the point is that such special treatment can still be
built “around” the reused existing extensions.
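For readers who want a rough picture, here is a small Python sketch of the idea. It is a guess at the general shape, not the customer's language or the KernelF temporal extension: TemporalValue, its by_month layout, the strategies last and average, and the monthly helper are all hypothetical. The assumption is that a temporal value may carry several values within one month and that the type names the strategy used to collapse them.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


def last(values):
    """Reduction strategy: the most recent value in the month wins."""
    return values[-1]


def average(values):
    """Reduction strategy: arithmetic mean of the month's values."""
    return sum(values) / len(values)


@dataclass
class TemporalValue:
    """Hypothetical temporal value: per month, the values that applied in it."""
    by_month: Dict[int, List[float]]
    reduce: Callable[[List[float]], float]   # strategy defined with the type

    def at(self, month):
        # Accessing the value inside a monthly body collapses it to one number.
        return self.reduce(self.by_month[month])


def monthly(body):
    """Run `body` once per month (1..12) and collect the twelve results."""
    return [body(month) for month in range(1, 13)]


# A salary that changes in the middle of July.
salary = TemporalValue({m: [3000] for m in range(1, 13)}, reduce=last)
salary.by_month[7] = [3000, 3300]

results = monthly(lambda month: salary.at(month) / 10)
assert len(results) == 12
assert results[0] == 300.0   # January: a single value, nothing to reduce
assert results[6] == 330.0   # July: `last` picks 3300
```

Swapping reduce=last for reduce=average changes the July result without touching the body, which is the sense in which the special treatment sits "around" the reused type.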
Wrap Up
So, to summarize: use a
functional language as much as possible. Use constraints to “simplify” it for
your users instead of building separate structures because the constraining can
be undone as users become more proficient, and it allows the modular
composition of existing extensions.