Regular expressions scare some people. They’re really quite warm and cuddly, or at least, conceptually very neat and tidy. If you don’t feel that way, this post is for you! Here’s how I think about regexen, in a nutshell.
I use this conception on a regular basis; when it comes to writing regex,
I think about what I want to do in this model, then translate it into whatever
regex notation the system I’m using at the time gives me.
(I do the same thing with distributed version control and relational databases,
but let’s stick to regexen for now.)
Regex Is Tiny Machines!
Regular expressions are a compact description of a symbol-matching machine.
Like, “If you see an a, then maybe a b, and then one or more c, it’s
a match!” for ab?c+.
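If you want to poke at that little machine, here is a minimal sketch using Python's re module. The choice of Python and the sample strings are mine, purely for illustration.

```python
import re

# The machine from the text: an "a", then maybe a "b", then one or more "c".
machine = re.compile(r"ab?c+")

# fullmatch asks: does the machine accept this whole string?
samples = ["ac", "abc", "abccc", "ab", "abbc", "c"]
results = {s: bool(machine.fullmatch(s)) for s in samples}

for s, ok in results.items():
    print(f"{s!r}: {'match' if ok else 'no match'}")
```

The first three samples are accepted. The last three fall over at the step in the description where you would expect them to.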
But the machines can nest, so you can instead say stuff like, “If you see
one thing matched by this machine, then maybe one thing matched by that one,
followed by one or more things matched by that other, it’s a match!” So the
a, b, and c from the last bit could actually be bigger regular expressions.
But you have no variables in regex! So, instead, you plop the whole machine
descriptions in there in parentheses, like (…)(…)?(…)+
And repeat the description if you need the same machine twice.
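One way to see the "plop the whole description in there" move is to let a host language do the plopping. This is a sketch in Python. The three sub-machines (word, sep, digit) are made up for illustration. The point is only that the final pattern is the descriptions pasted into (…)(…)?(…)+.

```python
import re

# Three smaller machines, written as plain pattern strings (hypothetical examples).
word = r"[a-z]+"
sep = r"-"
digit = r"[0-9]"

# Regex itself has no variables, so the host language pastes each
# description into the (…)(…)?(…)+ shape from the text.
pattern = f"({word})({sep})?({digit})+"
nested = re.compile(pattern)

print(pattern)  # ([a-z]+)(-)?([0-9])+

with_sep = bool(nested.fullmatch("abc-123"))     # word, separator, digits
without_sep = bool(nested.fullmatch("abc123"))   # the "maybe" machine is skipped
no_digits = bool(nested.fullmatch("abc-"))       # "one or more" is not satisfied
```

If you needed the word machine twice, you would paste `word` in twice. The regex engine only ever sees the expanded text.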
Pitch in self-referentiality - “if you see exactly the same thing as you ended
up matched back there” - by using backrefs to parenthesized machines,
and you’re in our modern world of extended “regular” expressions.
At that point, what we’re talking about is no longer actually expressions
describing what’s technically known as
a regular language,
but they’re exceedingly useful extensions of the notation, so no-one cares. ;)
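Here is a small backref sketch in Python, assuming the usual \1 notation for "whatever group 1 ended up matching". The doubled-word pattern and the sample sentences are my own illustration.

```python
import re

# (\w+) is a parenthesized machine; \1 demands exactly the text it matched.
doubled = re.compile(r"\b(\w+) \1\b")

hit = doubled.search("it was the the best of times")
miss = doubled.search("it was the best of times")

print(hit.group(0) if hit else None)
print(miss)
```

The first search finds the accidental "the the". The second finds nothing, because no word is followed by an exact copy of itself.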
Compact Notation, Effective Expressive Power
What makes regular expressions so useful is:
Reach: A lot of stuff we want to match against can actually be described
by them, especially when you pitch in a lot of the extended power-ups.
Compactness: They’re a marvelously compact notation for what would
otherwise be a lot of very boring code! Instead of writing that code,
we dash off a regex, and we leave the translation into code to the
regular expression engine.
For the More Curious
If that’s whetted your appetite, Friedl’s Mastering Regular Expressions
is excellent. And, as a bonus, you can probably just read the first few chapters and emerge enlightened. :)
P.S. You can also look at regular expressions as definitions of regular
languages - as generators rather than consumers of text. Running them
backwards like this can be a good way to think about whether a regex you’re
writing captures exactly what you’re aiming at, or whether it might include
a bit more than you intended!
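A cheap way to try the "run it backwards" idea is to enumerate every short string over a tiny alphabet and keep the ones the regex accepts. This is a brute-force stand-in for real generation, sketched in Python. The helper name `language` and the deliberately sloppy pattern ab?c* are assumptions of mine for the example.

```python
import re
from itertools import product

def language(pattern, alphabet, max_len):
    """List every string over `alphabet`, up to `max_len` symbols,
    that `pattern` matches in full."""
    machine = re.compile(pattern)
    found = []
    for n in range(max_len + 1):
        for symbols in product(alphabet, repeat=n):
            candidate = "".join(symbols)
            if machine.fullmatch(candidate):
                found.append(candidate)
    return found

intended = language(r"ab?c+", "abc", 3)   # what we meant
sloppy = language(r"ab?c*", "abc", 3)     # star where we meant plus
extras = [s for s in sloppy if s not in intended]

print(intended)
print(extras)
```

Listing the extras shows exactly which strings the sloppy version lets in that the intended one does not.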
P.P.S. And if you think about them in terms of machines, it’s really easy to
start thinking about how to write fast regular expressions.
Despite adopting such countermeasures, aimed at reducing researchers' bias
[when replicating a prior, baseline study], we confirmed the
baseline results: TDD does not affect testing effort, software external
quality, and developers’ productivity.
All coding is debugging
Work in small steps
Stay grounded in observed outputs
Keep good notes (tests or REPL session logs)
TDD won’t slow you down at steady state
Changing how you code to be more intentional and iterative might
What will definitely slow you down:
Learning your tooling and the impact of that iterative approach on the
code you produce (expose those probe points for external testing! add
indicator LEDs via assertions!)
Now that we’ve got the conclusion out of the way, keep reading to see how I got there.
My family recently had some holiday photos taken.
The photographer was using Zenfolio to host their photos.
I loved the photos and wanted to archive the originals on my laptop (and NAS, and Amazon Photos, and Time Machine, and Carbon Copy Cloner clone, and…).
But every time I tried to download an original – of one photo, of all the
photos, makes no difference – the server always sent me an empty zipfile!
I emailed the photographer to let them know, but I wasn’t going to wait.
Rather than work around this manually by visiting each page and right-clicking
to Save As each photo – and I’m not sure that would show me the full-size image,
anyway! – I figured Zenfolio would have an API.
Sure enough, there’s a well-enough documented Zenfolio API. I was in business!
I was able to lash together some shell commands to grab my full photoset. To save you some fumbling, here’s how I did it.
The Swift standard library introduces some unfamiliar concepts if you’re coming
from Obj-C and Cocoa. map is one thing, but for some, flatMap seems
a bridge too far. It’s a question of taste, and of background, whether something
comes across as a well-chosen, expressive phrase or if it just seems like
status signaling, high-falutin' bullshit.
Well, I’m not going to sort that all out, but I did find myself rewriting an
expression using a mix of if let/else into a flatMap chain recently, so
I thought I’d share how I rewrote it and why.
If you’re mystified by Optional.flatMap, read on, and you should have a good
feel for what that does in a couple minutes.
I think my big ??? is that I don’t get how to test a functional pipeline.
It seems not to have any of the seams you’d usually rely on.
Testing FP Code
Separate out pure code from impure.
Use PBT for the pure code.
Use typeclasses or protocols or similar dynamic binding methods to swizzle
I guess, use acceptance testing to check that you got the wiring to impure
stuff correct? That issue seems mostly ignored in favor of the much happier
“pure functions are easy to test” story.
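To make the first two items concrete, here is a minimal Python sketch of a pure core checked with a hand-rolled property test. It does not use a real PBT library. `normalize`, `check_property`, and `random_words` are hypothetical names invented for this example, and the fixed seed is only there to keep runs repeatable.

```python
import random

def normalize(words):
    """Pure core: strip, lowercase, drop blanks, de-duplicate, sort."""
    return sorted({w.strip().lower() for w in words if w.strip()})

def check_property(prop, gen, runs=200, seed=0):
    """Tiny stand-in for a property-based testing library.
    Returns a counterexample, or None if every generated input passed."""
    rng = random.Random(seed)
    for _ in range(runs):
        value = gen(rng)
        if not prop(value):
            return value
    return None

def random_words(rng):
    """Generate a short list of short strings, some blank or oddly cased."""
    letters = "aAbB "
    return [
        "".join(rng.choice(letters) for _ in range(rng.randint(0, 4)))
        for _ in range(rng.randint(0, 6))
    ]

# Property: normalizing twice is the same as normalizing once.
idempotent = check_property(
    lambda ws: normalize(normalize(ws)) == normalize(ws), random_words
)
# Property: the output is sorted and free of duplicates.
sorted_unique = check_property(
    lambda ws: normalize(ws) == sorted(set(normalize(ws))), random_words
)

print(idempotent, sorted_unique)
```

Note what this does not cover: nothing here checks that the pure core is wired up correctly to whatever reads and writes the words, which is the gap the last item complains about.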
In practice, I think I’m now foundering on the mess that is object-functional
blending. You’d hope that the Scala folks might have something good to say on
that, but that’ll have to be a later round of The Internet Speaks.
Unit-testing needs seams, seams is where we prevent the execution of normal
code path and is how we achieve isolation of the class under test. seams work
through polymorphism, we override/implement class/interface and than wire
the class under test differently in order to take control of the execution
flow. With static methods there is nothing to override.
Recommends converting static methods to instance methods:
If your application has no global state than all of the input for your static
method must come from its arguments. Chances are very good that you can move
the method as an instance method to one of the method’s arguments. (As in
method(a,b) becomes a.method(b).) Once you move it you realized that that is
where the method should have been to begin with.
Says not to even consider leaf methods as OK as static, because they tend
not to remain leaves for long.
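The method(a,b) becomes a.method(b) move can be sketched in Python like so. `TaxPolicy`, `NoTax`, and `checkout` are hypothetical names of mine, not from the quoted post. Python also allows monkey-patching a module-level function, so the "nothing to override" problem bites harder in statically bound languages than it does here.

```python
# Before: a free-standing ("static") function that callers are hard-wired to.
def total_with_tax(policy, amount):
    return amount + amount * policy.rate

# After: the same logic lives on its first argument.
class TaxPolicy:
    def __init__(self, rate):
        self.rate = rate

    def total(self, amount):
        return amount + amount * self.rate

class NoTax(TaxPolicy):
    """Test double: overriding the instance method is the seam."""
    def __init__(self):
        super().__init__(0)

    def total(self, amount):
        return amount

def checkout(policy, amounts):
    """Code under test; it only knows it was handed something with .total()."""
    return sum(policy.total(a) for a in amounts)

real = checkout(TaxPolicy(0.5), [10, 20])
isolated = checkout(NoTax(), [10, 20])
print(real, isolated)
```

The test wires `checkout` to `NoTax` and never touches the real tax arithmetic, which is the isolation the quote is after.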
The problem manifests when we want to do the equivalent of injecting stubs
and mocks in higher-level functions: there are no seams where we can
substitute collaborator functions with stubbed ones, useful for testing. If
my function calls printf(), I cannot stub that out specifying a different
implementation (unless maybe I recompile everytime and play a lot with the
Outlines, in theory, what they would do, but have not done, for FP code:
Pass in functions to parameterize behavior:
So instead of injecting collaborators in the constructor we could provide
them as arguments, earning the ability to pass in fake functions in tests.
The upper layers can thus be insulated without problems (with this sort of
dependency injection) and there are no side effects that we have to take care
of in the tear down phase
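A minimal Python sketch of that idea, reusing the printf() complaint from above: the output function is an ordinary argument with a sensible default, so a test can hand in a fake. `greet` and `emit` are names I made up for the example.

```python
def greet(name, emit=print):
    """Higher-level function. `emit` is the collaborator that would
    otherwise be a hard-wired call to print()."""
    message = f"Hello, {name}!"
    emit(message)
    return message

# Production call: uses the real print().
greet("world")

# Test call: pass a fake that records what it was given.
recorded = []
greet("Ada", emit=recorded.append)
print(recorded)
```

The fake is just a list's append method, so there is no global state to restore afterwards and nothing to tear down.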
Omits stack and logic paradigms. No surprise there.
Sums up a conversation that happens across several blogs. Weirdly omits any
links to primary sources. Yuck.
OO is rife with seams that are easy to exploit, so Feathers likes it.
Where you need a seam is a design issue:
Another blogger, Andrew, highlights that if “code isn’t factored into methods
that align with the needs of your tests”, the implementation will need to be
changed to accommodate the test. Hence, he argues as well that “thoughts
about “seams” are really just getting at the underlying issue of design for
testability”, i.e. the proper placement of seams.
But not all systems are always so designed (putting it nicely), so
“recoverability” matters: being able to make something testable in spite of
According to Feathers, even though there are alternative modules to link
against in functional languages, “it’s clunky”, with the exception of Haskell
where “most of the code that you’d ever want to avoid in a test can be
sequestered in a monad”
Then there’s an argument that pushing the impurity to the edges makes
things testable. No-one addresses validating correct composition of
verified components, though. :(