Jeremy W. Sherman

stay a while, and listen

sam's Structural Regular Expressions

This is the first post in a series of quotes from papers. These are the great turns of phrase, the intriguing ideas I’ve run into nowhere else, the starts of something that could have been great but probably fizzled.

First up: structural regular expressions, as introduced in the GUI text editor sam:

In other UNIX programs, regular expressions are used only for selection, as in the sam g command, never for extraction as in the x or y command. For example, patterns in awk are used to select lines to be operated on, but cannot be used to describe the format of the input text, or to handle newline-free text. The use of regular expressions to describe the structure of a piece of text rather than its contents, as in the x command, has been given a name: structural regular expressions. When they are composed, as in the above example, they are pleasantly expressive. Their use is discussed at greater length elsewhere.

x extracts every chunk of text matching the regex provided to it. Each chunk has the rest of the editing pipeline run on it. Want to change every n in a hunk of text to an m? Select it all in the window with button 1, focus the sam command window with button 1, and type in:

x/n/ c/m/

Hit return and this command pipeline runs on the (implicit range) dot, also known as “the current selection.” x grabs an n, c then changes it to an m. You can layer on more commands, including g (guard) as a by-the-way if statement. The command text stays in the command window in case you want to run it again.

Boom! Instant macro, no memorization of registers required. Take that, vim qX…q @X.
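If you don't have sam handy, here is a rough Python sketch of what that pipeline does. The helper names x, c, and g are hypothetical stand-ins for the sam commands, not anything sam provides, and Python's re syntax differs from sam's regular expressions in places. Treat it as an illustration of how the commands compose, not as a faithful emulation.

```python
import re

# Hypothetical stand-ins for sam's x, c, and g commands (illustration only).

def x(pattern, text, command):
    """Run `command` on every chunk matching `pattern`; splice the results back."""
    return re.sub(pattern, lambda m: command(m.group(0)), text)

def c(replacement):
    """Build a command that replaces the whole chunk with `replacement`."""
    return lambda chunk: replacement

def g(pattern, command):
    """Build a guarded command: run `command` only if the chunk contains a match."""
    return lambda chunk: command(chunk) if re.search(pattern, chunk) else chunk

# "dot" here is just a string standing in for the current selection.
dot = "banana bandana"
changed = x(r"n", dot, c("m"))               # analogue of: x/n/ c/m/

# Layering a guard: for each line, change it only if it mentions foo.
lines = "foo one\ntwo\nfoo three\n"
guarded = x(r".*\n", lines, g(r"foo", c("bar\n")))

print(changed)
print(guarded, end="")
```

The second call is roughly what x/.*\n/ g/foo/ c/bar\n/ would express in sam: x carves the selection into lines, g lets through only the lines containing foo, and c rewrites those.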

This search for symbiosis between mouse and keyboard is what led to sam. Most UNIX editors bolt mouse input onto an established keyboard-centric paradigm. Sam rethinks editing to make the mouse an integral part of it. (Acme would later take this mouse integration to new heights. We’ll get to acme in time.)

Back to structural regular expressions now. Pike has a whole paper on the topic that I will doubtless get to eventually. But just this little bit is tantalizing enough.

I mean, think about awk, think about how you use regular expressions there, or how you use them in your editor du jour. Are these tools really making the most of regular expressions? awk and friends just perform record splitting on a set of separator characters. Imagine how limited your regexes would be if all you got to do was specify what to stick between two character class brackets: [your characters here]{1,}. That’s all you get with this simple record separator construction.

And it’s not like we’ve made great strides: Search-and-replace in an IDE like Xcode or Eclipse gives you even less expressiveness.
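To make the contrast concrete, here is a small Python sketch that sets the separator-only model next to a structural one. The sample text and variable names are invented for illustration, and Python's re module is only standing in for what sam's x does: the first regex says nothing more than what sits between fields, while the second and third describe the shape of a record and then of a field inside it.

```python
import re

# Invented sample input: two records separated by a blank line.
text = "name: Ada\nlang: ML\n\nname: Rob\nlang: C\n"

# Separator-style: the only thing we get to say is which characters
# sit between fields. The record structure is lost in the flat list.
fields = re.split(r"[: \n]+", text.strip())

# Structure-style: first describe a record (a run of non-empty lines),
# then describe the field we want inside each record.
records = re.findall(r"(?:.+\n)+", text)
names = [
    m.group(1)
    for record in records
    for m in re.finditer(r"name: (.+)\n", record)
]

print(fields)
print(records)
print(names)
```

The nested extraction is the part the separator model cannot express: each pattern narrows the text handed to the next one, which is the same layering that stacked x commands give you in sam.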

I look forward to reading more about structural regular expressions in future. For more on sam, see: