Racket reader macros

Ronie Uliana
5 min read · Feb 14, 2021

Here’s how you create a regexp in Racket:

#px"\\w+"

That suffers from the “double backslash” curse that plagues every language where regular expressions are second-class citizens.

I’d like to fix that and bring them to the first class, like this:

/\w+/

For that, I need a reader macro, which controls how Racket parses source text. I’m learning how to write one and I’d like to share what I’ve learned so far.

First, let’s create a new package:

> raco pkg new regexp-reader

Then, let’s link the new project as a locally installed package:

> cd regexp-reader
> raco pkg install

Now we can use the package regexp-reader as if it were installed. Because it is linked rather than copied, every change shows up as soon as you save the file.

Now, how to build it?

First, let’s make-believe our package is ready and write how we plan to use it.

Create a file with the following line:

#lang regexp-reader

That says we are going to use the language regexp-reader for our (almost empty) test file. If you execute the file, the following error message appears:

standard-module-name-resolver: collection not found
for module path: regexp-reader/lang/reader
collection: "regexp-reader/lang"
in collection directories:
...

Racket is trying to find our “language” in the collection regexp-reader/lang shown in the message above. Let’s not disappoint it! Create the folder lang inside regexp-reader and put reader.rkt inside it:

regexp-reader/lang/reader.rkt

The content will be just this for now:

#lang racket

Rerun your test file, and the error message now is different:

dynamic-require: name is not provided
name: 'read-syntax
module: #<resolved-module-path:"/regexp-reader/lang/reader.rkt">

The message could be a little more explicit: it means our module doesn’t provide read-syntax. Let’s provide it:

regexp-reader/lang/reader.rkt

#lang racket
(provide read-syntax)

Run the test file again and let’s go for the next error:

Module Language: only a module expression is allowed, either
#lang <language-name>
or
(module <name> <language> ...)
in: #<eof>

Quite cryptic 🤔. It says it was expecting #lang something, but found the end of the file. Well, let’s edit our test file and add what it wants before #<eof>, like this:

#lang regexp-reader
#lang racket

And run it again… No errors this time 🙌

So far so good. The next step is to expand our test file to get more errors.

#lang regexp-reader
#lang racket
(regexp-match* /\w+/ "abc-def-ghi")

Running it now reports an unbound identifier. That’s fair. We haven’t written any code to tell Racket what to do when it finds the slash symbol, so it believes /\w+/ is just a variable’s name. We know it expects a custom read-syntax, so let’s see what that does.

The documentation for read-syntax tells us that there is more to read in other sections. Well, down the rabbit hole we go… but I’m going to spare you from tons of reading: we need to mess with something called a readtable.

What’s a readtable?

It’s a table that tells Racket which function to call for each char it reads in the program file. While reading the program, Racket takes a single char, looks it up in the table to find out what to do with it, then calls the matching function with the rest of the program as an argument. The function returns something Racket can understand, and reading continues with everything the function didn’t consume.

To be completely honest, for what we are building it only consults the readtable at the first char of a datum (or atom), that is, the first non-space char in a word.

That’s perfect for our purposes. If the first char is a /, then we read everything until we find another slash and convert that to a regular expression.

Ok, so we need a custom readtable in place while the program is being read. And the function that does the reading is… read-syntax, exactly the one we provided but haven’t modified yet. Perfect!

So… how to glue all that together?

I’ll tell you, it’s not as easy as I wanted. Let’s start:

In our regexp-reader/lang/reader.rkt, delete (provide read-syntax) and do this:

#lang racket

(provide (rename-out [my-read-syntax read-syntax]))

(define (my-read-syntax src in)
  (parameterize ([current-readtable my-readtable])
    (read-syntax src in)))

Forget rename-out for a moment and let’s focus on my-read-syntax. It seems complex, but it’s actually pretty dumb: we’ve created a custom read-syntax that calls the original read-syntax with exactly the same arguments we receive. The only difference is that the original runs with our custom readtable instead of the default one. The “magic” here is in parameterize, which is a pretty handy feature. In layman’s terms, it makes current-readtable return my-readtable while the original read-syntax runs, then restores the previous value.
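If parameterize is new to you, here is a minimal sketch of the same idea with a made-up parameter. The names greeting and greet are mine, purely for illustration; current-readtable behaves the same way, it just happens to be a parameter Racket defines for us.

```racket
#lang racket

;; A parameter is a function that returns its current value when called.
(define greeting (make-parameter "hello"))

(define (greet)
  (string-append (greeting) ", reader"))

;; Outside parameterize, the default value is used.
(define before (greet))

;; Inside parameterize, every call to (greeting) sees the new value,
;; even though greet itself was not changed.
(define during
  (parameterize ([greeting "goodbye"])
    (greet)))

;; Afterwards the previous value is back.
(define after (greet))

(list before during after)
```

The point is that greet never receives the new value as an argument, just like the original read-syntax never receives our readtable as an argument.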

Now we can go back to rename-out. What it does is simple: inside the module the function is called my-read-syntax, but other modules see it as read-syntax.

Now, we need to create our own custom readtable:
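A tiny sketch of rename-out on its own, using a submodule so it fits in one file. The names inner, my-add and add are hypothetical and have nothing to do with the reader.

```racket
#lang racket

;; Inside the submodule the function is called my-add...
(module inner racket
  (provide (rename-out [my-add add]))
  (define (my-add a b)
    (+ a b)))

;; ...but whoever requires it only sees the external name, add.
(require 'inner)

(define sum (add 1 2))
sum
```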

#lang racket

(provide (rename-out [my-read-syntax read-syntax]))

;; === New here: creating our custom readtable ===
(define my-readtable
  (make-readtable (current-readtable)
                  #\/ 'non-terminating-macro
                  my-reader))

(define (my-read-syntax src in)
  (parameterize ([current-readtable my-readtable])
    (read-syntax src in)))

We are not going to create it from scratch. The best way is just to extend the current table to do something different when it encounters /. For that, we use the function make-readtable, using (current-readtable) as the base, and tell it to use my-reader when finding a slash.

The only odd thing here is 'non-terminating-macro. It means / only triggers my-reader at the start of a datum, so it can still appear inside other identifiers. Read the doc for more info, it’s not that crazy, I promise 😬
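To see what the mode changes, here is a small experiment you can run in a scratch file. It is only a sketch: marker and read-with are names I made up, and marker returns a fixed symbol instead of parsing anything. The optional arguments are there so the same procedure works with plain read as well as read-syntax.

```racket
#lang racket

;; A stand-in reader macro: it consumes nothing and returns a fixed symbol.
(define (marker ch in [src #f] [line #f] [col #f] [pos #f])
  'slash-seen)

;; Read one datum from a string, with / mapped to marker in the given mode.
(define (read-with mode str)
  (parameterize ([current-readtable (make-readtable #f #\/ mode marker)])
    (read (open-input-string str))))

;; Non-terminating: a slash in the middle of a word stays part of the symbol.
(define middle (read-with 'non-terminating-macro "a/b"))   ; the symbol a/b

;; Non-terminating: a slash at the start of a word triggers the macro.
(define start (read-with 'non-terminating-macro "/b"))     ; slash-seen

;; Terminating: the slash also ends the word before it.
(define cut (read-with 'terminating-macro "a/b"))          ; the symbol a

(list middle start cut)
```

One consequence worth keeping in mind: a word that starts with a slash, including the division function / itself, now goes through the reader macro too.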

We are almost there! We just need to create my-reader to parse the string after the slash and convert it to a regular expression:

#lang racket

(provide (rename-out [my-read-syntax read-syntax]))

;; === New here: use a regexp to read the regexp ===
(define (my-reader ch in file line col pos)
  (define expr-match (regexp-match #px"^(.*?)\\/" in))
  (define my-regexp (bytes->string/utf-8 (second expr-match)))
  (datum->syntax #f `(pregexp ,my-regexp)))

(define my-readtable
  (make-readtable (current-readtable)
                  #\/ 'non-terminating-macro
                  my-reader))

(define (my-read-syntax src in)
  (parameterize ([current-readtable my-readtable])
    (read-syntax src in)))

Our function my-reader matches everything after the / and creates a pregexp with it. We pretty much ignore every parameter except in, which is the input port holding ALL the rest of the program after the slash (not including it). So we match everything up to the next slash, convert the matched bytes to a string using UTF-8, and return the pregexp call as a syntax object.
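If you want to poke at the reader without going through #lang, a sketch like the following reads from a string port instead of a file. It is a simplified variant, not the package code: slash-reader and slash-readtable are my names, the arguments are optional so it also works with plain read, and it returns a plain list rather than a syntax object.

```racket
#lang racket

;; Same matching logic as my-reader, returning a datum instead of syntax.
(define (slash-reader ch in [src #f] [line #f] [col #f] [pos #f])
  (define expr-match (regexp-match #px"^(.*?)/" in))
  `(pregexp ,(bytes->string/utf-8 (second expr-match))))

(define slash-readtable
  (make-readtable (current-readtable)
                  #\/ 'non-terminating-macro
                  slash-reader))

;; The program text is (regexp-match* /\w+/ "abc-def-ghi"); the doubled
;; backslash and \" below are only there because it lives in a string here.
(define datum
  (parameterize ([current-readtable slash-readtable])
    (read (open-input-string "(regexp-match* /\\w+/ \"abc-def-ghi\")"))))

datum
;; => '(regexp-match* (pregexp "\\w+") "abc-def-ghi")

;; Evaluating what was read gives the actual matches.
(define result (eval datum (make-base-namespace)))

result
;; => '("abc" "def" "ghi")
```

Printing datum makes it clear that the slashes are gone by the time Racket sees the program: all that is left is an ordinary call to pregexp.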

And we are done!

I wouldn’t call what we did “production-ready”, but it’s a good start. There are tons and tons of concepts to digest here. In general, Racket is easy and has very good documentation, but it also has a lot of concepts that are unique to it, and building that knowledge in our heads is not an easy task. It’s better to take small steps and improve from there.

To be honest, I’m still learning, so I’m probably doing several things wrong here. But step by step, I’ll get there.
