Wednesday, October 22, 2025

Notes on My Little Lisp Golem

Almost 20 years ago, I wrote a very basic Lisp interpreter. It was basically Lex, Yacc, and the most basic eval. It kind of worked. However, I quickly ran into problems I didn't know how to solve without a lot of research and time. It always annoyed me that I never finished.

This past spring, I got inspired to write a basic instruction set and a stack-based virtual machine in C. It was fun thinking at that level. But I really wanted to write a basic Lisp-based machine. So I did that instead ... or at least I did the fun parts.

Writing a basic Lisp parser isn't very difficult. Designing the rest of the system is quite difficult, mostly because, like any new task, you have to make naive decisions that often have unintended or far-reaching consequences. I made a lot of messes for myself. I spent a lot of time cleaning them up. I spent a lot of time arguing and researching with ChatGPT.

Here's the result of my hobby project. It's basically just an AST-tree walker (scripting language) that drives a SECD (stack, environment, control, and dump) machine. I'm going to write some some things I learned in no particular order.

Make is a $@$<$^.

C is really hard, but I love it. There's no pleasure quite like debugging C code ... 

C++ is a very natural reaction to C. Implementing encapsulation in C is very hard. Changing anything global in C or even trying to encapsulate functionality or organize a medium size program in C is hard.

Flex and Bison are awesome, but the documentation is very hard to follow. Also, seemingly benign decisions, such as whether the grammar is left or right-recursive, ripple through the entire design. Right-recursive was the more intuitive approach, but resulted in a less efficient grammar and unnecessary list reversals during evaluation. 

Algorithms are essential rabbit holes. I spent way too much time implementing open address hashing, lists, stacks, balanced trees, reversing linked lists, etc.. I'd likely rely on GLib for the basics if I were going to do it again.

There's a vast, enormous divide between a basic enum-typed AST node/cell and a full-blown object system. I didn't go down this rabbit hole but staring down at it gave me a new appreciation for how magical modern object-based scripting languages really are compared to a basic type system. Also, I'd love to mess with GObject someday.

Object memory allocation and management strategies greatly influences the structure of your code. I went with a hierarchical, pool allocator and a mark and sweep garbage collector (mostly because I had always wanted to do a mark and sweep GC). The mark and sweep collector works well with the SECD machine design.

Variable scope is essential to a programming languages design. I think I intellectually knew that, but I got a new appreciation after implementing closures (in the VM), define, set, let, and lambda. Let and lambda are twins.

Map is awesome. 

Unrolling eval into a state machine was hard. I see why it makes sense to have a clean separation between the AST and the state machine or VM.

Last, the next level up from here is insanely impressive and humbling. The number of abstractions and implementations that have to go right to make it happen is mind bending. Additionally, the confidence that a solo creator must have to pursue the uncertain goal of designing a new programming language is inspiring, to say the least. I read a little bit about Rich Hickey's experience creating Clojure by himself and maybe got a small taste of his experience. That being said, the folks who design new programming languages are creative geniuses. 

No comments:

Post a Comment