When I introduce the notion of DSLs to people I often hear
the objection that they are too complicated to be used by non-programmers --
the intended target audience -- and that they'd prefer to use prose to tell the
computer what it should do. I am not totally convinced of the idea because even
if we could just write requirements in prose and then have the computer
"run" them, we'd still have to be able to express ourselves
precisely. With prose. And as we all know from human-to-human conversations as
well as prose requirements documents, that's not so easy. In fact, one big
advantage of DSLs is that subject matter experts are able to express themselves
precisely and not just write prose that is then misunderstood by developers
when they code it up. But I digress.
In any case, ChatGPT has certainly demonstrated the ability
to come up with reasonably correct programs based on prompts written in prose.
This works well for simple problems where ChatGPT can figure out what we mean
and we are able to write complete and correct "requirements" in
prose. Plus, since ChatGPT is stateful, we can supply additional requirements
when we see that the generated code is not exactly what we expect. We can even
run it to find out if it does what we intended.
Will this work for non-programmers? Well yes and no. They
might be able to express tax calculations or drug trial protocol specs or
milling machine control as prose, but this idea of then "looking at the
code" to see if it is correct won't fly. Source code is too technical, at the
wrong level of abstraction, inaccessible to non-programmers.
That is if we generate programming language source code.
Enter DSLs, again. Andreas
Mülder recently ran a very cool experiment (and then wrote
about it on LinkedIn) in which he taught ChatGPT a DSL through examples.
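To make the idea concrete, here is a minimal sketch of what "teaching a DSL through examples" might look like as a few-shot prompt. The mini rule DSL, its keywords (`rule`, `when`, `then`), and the example requirements are all invented for illustration; Andreas Mülder's actual DSL and prompts may look quite different.

```python
# Hypothetical few-shot prompt that teaches an LLM a small DSL by example.
# The DSL and its constructs ("rule", "when", "then") are made up here;
# the point is only the pairing of prose requirement and DSL code.

EXAMPLES = [
    ("If the temperature exceeds 30 degrees, turn on the fan.",
     "rule HighTemp\n  when temperature > 30\n  then fan := on"),
    ("If the door is open for more than 60 seconds, sound the alarm.",
     "rule DoorAjar\n  when door == open for 60s\n  then alarm := on"),
]

def build_prompt(request: str) -> str:
    """Assemble a few-shot prompt: each example pairs a prose requirement
    with its DSL translation; the user's new request is appended last so
    the model completes it in the same pattern."""
    parts = ["Translate the following requirements into the rule DSL.\n"]
    for prose, dsl in EXAMPLES:
        parts.append(f"Requirement: {prose}\nDSL:\n{dsl}\n")
    parts.append(f"Requirement: {request}\nDSL:\n")
    return "\n".join(parts)

prompt = build_prompt("If humidity drops below 20 percent, start the humidifier.")
```

The prompt would then be sent to the model as-is; the few examples stand in for the "hour of training" mentioned below.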
And then let a non-programmer -- his wife in this case -- write prose to have
ChatGPT "generate code". But the generated code was not source code
in Java or whatever, it was a program expressed in the DSL he previously taught
ChatGPT. His post prompted me to write this little article. Here are a few
observations about his experiment and where this could go.
First of all, it is pretty cool that ChatGPT is able to
learn a new language after maybe an hour of training. I was under the
impression that ChatGPT can only write about the stuff they trained it on
originally; but apparently, users can train ChatGPT by example. That is pretty
neat.
Second, generating DSL programs has a big advantage over
generating source code because it is more plausible for the (non-programmer)
user to look at the generated DSL code and see if it is correct, at least if the
DSL is closely aligned with the user's domain and uses a reasonable syntax.
These are of course both criteria for any good DSL. Reading something and
checking it for at least superficial correctness ("are all the things I've
written about at least mentioned in the code?") is much easier for
non-programmers than actually writing the DSL program. More generally, this
could also be a very good training aid for DSL users: initially, when they have
no experience with the DSL, they can just ask the AI to create examples.
Third, and most importantly, I suspect using a DSL instead
of source code as the target of AI-based-prose-programming is going to work
better than generating source code directly. Why is this? ChatGPT is a language
model. It works purely based on syntactic patterns and the statistics behind
that. A DSL is basically a reification of program semantics into syntactic
patterns. A DSL removes everything from the source code that is non-essential
to what the program should do. It is the "purest" formal
representation of some behavior. There is also much less syntactic variability
in a DSL than in programming languages. From the perspective of a language
model there is much less stuff to go wrong -- there's less accidental syntactic
complexity in the generated program. So I suspect that a language model can
generate larger and more complex DSL programs (compared to general-purpose language code).
It is also much easier to automatically check the generated
program for correctness using DSL-specific structure checkers, type checkers or
analysers and simulators. Maybe we can even pipe well-written error messages
back into ChatGPT for it to then correct the program, just as a user might do
after looking at it and identifying a problem.
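Such a check-and-feed-back loop could be sketched roughly as follows. The tiny structure checker, the rule DSL it checks, and the `llm` callable are all hypothetical stand-ins; a real setup would call an actual language model API and use the DSL's real parser and type checker.

```python
# Sketch of the loop: generate a DSL program, run a checker over it, and
# pipe any error messages back to the model until the program is clean.
# The DSL (rule/when/then) and the checker are invented for illustration.

def check_dsl(program: str) -> list[str]:
    """A minimal structure checker: every rule needs a 'when' condition
    and a 'then' action. A real DSL would use its grammar and type rules."""
    errors = []
    for i, block in enumerate(program.strip().split("rule ")[1:], start=1):
        if "when" not in block:
            errors.append(f"rule {i}: missing 'when' condition")
        if "then" not in block:
            errors.append(f"rule {i}: missing 'then' action")
    return errors

def generate_with_feedback(llm, request: str, max_rounds: int = 3) -> str:
    """Ask the model for a DSL program; feed checker errors back as a new
    prompt until the program passes or we give up."""
    program = llm(request)
    for _ in range(max_rounds):
        errors = check_dsl(program)
        if not errors:
            return program
        program = llm(request + "\nFix these errors:\n" + "\n".join(errors))
    return program
```

The same loop could incorporate the user's own feedback as just another source of error messages.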
So here is an approach to prose programming for
non-programmers that might actually be successful:
· analyse the domain of what you want to generate code about
· factor this into a textual DSL (plus generator or interpreter, of course)
· train an AI language model on this language
· let users program in prose
· users have a fighting chance to look at the code and give feedback
· users can interactively run the code and see if it works correctly, feeding back problems to ChatGPT to fix
· and we can even feed back error messages from type checkers and the like.
Here are a few caveats why this might not work, or at least
might not work better than if ChatGPT generated programming language source
code.
First, there will likely be far fewer training examples for
ChatGPT to learn from compared to scraping the whole internet for Java code. I
am ultimately not sure about the tradeoff between a simpler target language
with fewer opportunities to make mistakes and the much larger body of examples
from which to train.
The second issue is how this scales with complexity, in terms
of the DSL itself, the size and intricacy of the generated program, and the
user's ability to describe real-world problems as prose. Interactive stepwise
correction through user feedback probably helps, but I am still unsure about
this. Maybe, once the user has learned the DSL by looking at and correcting
AI-generated code, they go back to directly writing DSL programs?
The third one is the well-known problem with ChatGPT: making
shit up and then writing about it confidently :-) But again, that the
generated DSL program can be more easily reviewed by the user, must comply
with a formal grammar, and must pass the type checker might be a good way of
constraining the output, maybe even automatically.
EDIT: There's another thought that I think is worth
mentioning; the comment by Mike
Vogel below triggered this. It kinda gets back to the beginning where I
express my skepticism about whether prose programming is a worthwhile goal. The
thing is: since you're "programming" in prose, there's no IDE
that can give you code completion. You're on your own. And you probably have to
be quite precise and consistent to deterministically make the AI understand it
correctly. I can see users creating cheat sheets of how you have to phrase
things to get particular outcomes ... which reminds me of DSLs in the first
place :-)
Summing up, and despite my edit above, I actually think this could work. At the very least, we should try it out systematically. IMHO this is a very interesting field of research. Anybody want to run a research project on this? Or has somebody already started one? Let me know what you think!