Jeremy W. Sherman

stay a while, and listen

The Gist of Regex

Regular expressions scare some people. They’re really quite warm and cuddly, or at least, conceptually very neat and tidy. If you don’t feel that way, this post is for you! Here’s how I think about regexen, in a nutshell.

I use this conception on a regular basis; when it comes to writing regex, I think about what I want to do in this model, then translate it into whatever regex notation the system I’m using at the time gives me. (I do the same thing with distributed version control and relational databases, but let’s stick to regexen for now.)

Regex Is Tiny Machines!

Regular expressions are a compact description of a symbol-matching machine. Like, “If you see an a, then maybe a b, and then one or more c, it’s a match!” for ab?c+.
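To make that first machine concrete, here is a minimal sketch in Python (my pick for illustration; any regex engine would do). The sample strings are made up, and `fullmatch` is used so the whole string has to be accepted, not just a piece of it.

```python
import re

# "If you see an a, then maybe a b, and then one or more c, it's a match!"
machine = re.compile(r"ab?c+")

for text in ["ac", "abc", "abccc", "ab", "abbc"]:
    # fullmatch asks whether the machine accepts the entire string.
    verdict = "match" if machine.fullmatch(text) else "no match"
    print(f"{text!r}: {verdict}")
```

The first three strings are accepted; "ab" has no c, and "abbc" has one b too many.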

But the machines can nest, so you can instead say stuff like, “If you see one thing matched by this machine, then maybe one thing matched by that one, followed by one or more things matched by that other, it’s a match!” So the a, b, and c from the last bit could actually be bigger regular expressions themselves.

But you have no variables in regex! So, instead, you plop the whole machine descriptions in there in parentheses, like (…)(…)?(…)+, and you repeat the description if you need the same machine twice.
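One way to soften the no-variables problem is to borrow variables from the host language and paste the pieces together before handing the result to the engine. A small sketch in Python, where the three sub-machines (`greeting`, `comma`, `space`) are made-up examples:

```python
import re

# Three small machines, named in Python because regex itself has no variables.
greeting = r"hello|hi"  # either of two words
comma = r","            # a literal comma
space = r" "            # a single space

# Plop each description into its own parentheses: (…)(…)?(…)+
pattern = f"({greeting})({comma})?({space})+"
print(pattern)  # (hello|hi)(,)?( )+

print(bool(re.fullmatch(pattern, "hello,  ")))  # True
print(bool(re.fullmatch(pattern, "hi")))        # False: needs at least one space
```

The engine still only ever sees the pasted-together text; the names exist for the human writing it.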

Pitch in self-referentiality - “if you see exactly the same thing as you ended up matching back there” - by using backrefs to parenthesized machines, and you’re in our modern world of extended “regular” expressions. At that point, we’re no longer actually talking about expressions describing what’s technically known as a regular language, but backrefs are an exceedingly useful extension of the notation, so no-one cares. ;)
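A quick sketch of a backref at work, again in Python. The doubled-word pattern and the sample sentences are my own illustration: `(\w+)` is a parenthesized machine, and `\1` means "exactly what that group ended up matching".

```python
import re

# A word, a space, then the very same word again.
doubled = re.compile(r"\b(\w+) \1\b")

hit = doubled.search("it was the the best of times")
print(hit.group(1))                       # the
print(doubled.search("no repeats here"))  # None
```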

Compact Notation, Effective Expressive Power

What makes regular expressions so useful is:

  • Reach: A lot of stuff we want to match against can actually be described by them, especially when you pitch in a lot of the extended power-ups
  • Compactness: They’re a marvelously compact notation for what would otherwise be a lot of very boring code! Instead of writing that code, we dash off a regex, and we leave the translation into code to the regular expression engine.

For the More Curious

If that’s whetted your appetite, Friedl’s Mastering Regular Expressions is excellent. And, as a bonus, you can probably just read the first few chapters and emerge enlightened. :)

P.S. You can also look at regular expressions as definitions of regular languages - as generators rather than consumers of text. Running them backwards like this can be a good way to think about whether a regex you’re writing captures exactly what you’re aiming at, or whether it might include a bit more than you intended!
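A cheap way to try the generator view without any special tooling is to enumerate every short string over a small alphabet and keep the ones the regex accepts. This is a brute-force stand-in for truly running the machine backwards, and the `language` helper is a name I made up for the sketch:

```python
import re
from itertools import product

def language(pattern, alphabet, max_len):
    """List every string over `alphabet`, up to `max_len` long, that `pattern` accepts."""
    compiled = re.compile(pattern)
    accepted = []
    for length in range(max_len + 1):
        for letters in product(alphabet, repeat=length):
            candidate = "".join(letters)
            if compiled.fullmatch(candidate):
                accepted.append(candidate)
    return accepted

print(language(r"ab?c+", "abc", 4))
# ['ac', 'abc', 'acc', 'abcc', 'accc']
```

If a string turns up in that list that you didn't mean to allow, the regex includes a bit more than you intended.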

P.P.S. And if you think about them in terms of machines, it’s really easy to start thinking about how to write fast regular expressions.

P.P.P.S. Hat-tip to @bazbt3 over at App.Net. What is dead can never die!

Iterative Development

“At last, my current practice of writing no automated tests has the blessing of science! See, TDD doesn’t do anything!” That’s how Fucci et al.’s 2016 conference paper An External Replication on the Effects of Test-driven Development Using a Multi-site Blind Analysis Approach was introduced to me.

And, indeed, it concludes like so:

Despite adopting such countermeasures, aimed at reducing researchers' bias [when replicating a prior, baseline study], we confirmed the baseline results: TDD does not affect testing effort, software external quality, and developers’ productivity.

Takeaways:

  • All coding is debugging
    • Work in small steps
    • Stay grounded in observed outputs
    • Keep good notes (tests or REPL session logs)
  • TDD won’t slow you down at steady state
    • Changing how you code to be more intentional and iterative might slow you down to start.
    • What will definitely slow you down: Learning your tooling and the impact of that iterative approach on the code you produce (expose those probe points for external testing! add indicator LEDs via assertions!)

Now that we’ve got the conclusion out of the way, keep reading to see how I got there. :)

How to Work Around an Empty Zenfolio Zip File

My family recently had some holiday photos taken. The photographer was using Zenfolio to host their photos. I loved the photos and wanted to archive the originals on my laptop (and NAS, and Amazon Photos, and Time Machine, and Carbon Copy Cloner clone, and…). But every time I tried to download an original – of one photo, of all the photos, makes no difference – the server always sent me an empty zipfile!

I emailed the photographer to let them know, but I wasn’t going to wait.

Rather than work around this manually by visiting each page and right-clicking to Save As each photo – and I’m not sure that would show me the full-size image, anyway! – I figured Zenfolio would have an API.

Sure enough, there’s a well-enough documented Zenfolio API. I was in business!

I was able to lash together some shell commands to grab my full photoset. To save you some fumbling, here’s how I did it.

A Practical Example of FlatMap

The Swift standard library introduces some unfamiliar concepts if you’re coming from Obj-C and Cocoa. map is one thing, but for some, flatMap seems a bridge too far. It’s a question of taste, and of background, if something comes across as a well-chosen, expressive phrase or if it just seems like status signaling, high-falutin' bullshit.

Well, I’m not going to sort that all out, but I did find myself rewriting an expression using a mix of if let/else into a flatMap chain recently, so I thought I’d share how I rewrote it and why.

If you’re mystified by Optional.flatMap, read on, and you should have a good feel for what that does in a couple minutes.

The Internet Speaks: Testing FP Code

One problem I have writing Swift is that I’m not really sure how to tackle testing FP-ish code using XCTest.

I did some quick Internet research. If you read it on the Internet, it must be true. This is a distillation of those great Internet truths.

The Context: Data Persistence

But first, some context. Why did I care about this?

I ran into this in the context of sorting out how to persist and restore some app data at specific “app lifecycle” hooks.

Specifically:

  • When the app backgrounds, start a background task, then serialize and write to disk, then end the task.
    • Inputs: data store, serialization strategy, where to write to
    • Outputs: updated file on disk (side effect)
  • When the app launches, block the main thread till we’ve loaded the data from disk and unpacked it. This should be fast enough. Anything else will lead to folks seeing a not-yet-ready UI.
    • Inputs: serialization strategy, where we wrote to
    • Outputs: We can see the restored DataStore (side effect)

This is very much “app lifecycle” stuff, so we want the App Delegate to do it.

What’s the cleanest code we could imagine?

bracket startBackgroundTask endBackgroundTask $
    dataStore |> serialize |> write location

deserialize(location)
|> fromJust seedDataStore
|> set dataStoreOwner .dataStore

I think my big ??? is that I don’t get how to test a functional pipeline. It seems not to have any of the seams you’d usually rely on.

Testing FP Code

Summarizing:

  • Separate out pure code from impure.
  • Use PBT for the pure code.
  • Use typeclasses or protocols or similar dynamic binding methods to swizzle impure actions.
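As a rough illustration of the first two bullets, here is a sketch in Python rather than Swift. Everything in it is an assumption for the example: the JSON format, the function names, and the hand-rolled random generator standing in for a real property-based testing library (Hypothesis, in Python, would also shrink failing cases).

```python
import json
import random
import string

# Pure: data in, text out, no I/O.
def serialize(store: dict) -> str:
    return json.dumps(store, sort_keys=True)

def deserialize(text: str) -> dict:
    return json.loads(text)

# Impure: the only function here that touches the disk.
def save(store: dict, path: str) -> None:
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(serialize(store))

def random_store(rng: random.Random) -> dict:
    """Build a small random store: up to five string keys mapped to ints."""
    store = {}
    for _ in range(rng.randint(0, 5)):
        key = "".join(rng.choices(string.ascii_letters, k=rng.randint(1, 8)))
        store[key] = rng.randint(-1000, 1000)
    return store

def check_round_trip(trials: int = 200, seed: int = 0) -> None:
    """Property: deserializing a serialized store gives back the same store."""
    rng = random.Random(seed)
    for _ in range(trials):
        store = random_store(rng)
        assert deserialize(serialize(store)) == store

check_round_trip()
```

The property check never touches `save`, which is exactly the wiring question that the pure-functions story leaves open.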

I guess, use acceptance testing to check that you got the wiring to impure stuff correct? That issue seems mostly ignored in favor of the much happier “pure functions are easy to test” story.

In practice, I think I’m now foundering on the mess that is object-functional blending. You’d hope that the Scala folks might have something good to say on that, but that’ll have to be a later round of The Internet Speaks.

Static Methods Are Death to Testability

http://misko.hevery.com/2008/12/15/static-methods-are-death-to-testability/

Recapitulates the problem I identified:

Unit-testing needs seams, seams is where we prevent the execution of normal code path and is how we achieve isolation of the class under test. seams work through polymorphism, we override/implement class/interface and than wire the class under test differently in order to take control of the execution flow. With static methods there is nothing to override.

Recommends converting static methods to instance methods:

If your application has no global state than all of the input for your static method must come from its arguments. Chances are very good that you can move the method as an instance method to one of the method’s arguments. (As in method(a,b) becomes a.method(b).) Once you move it you realized that that is where the method should have been to begin with.

Says not to even consider leaf methods as OK as static, because they tend not to remain leaves for long.

Unit Testing and Programming Paradigms

http://www.giorgiosironi.com/2009/11/unit-testing-and-programming-paradigms.html

Identifies the same problem as you move away from leaf functions in the context of procedural programming:

The problem manifests when we want to do the equivalent of injecting stubs and mocks in higher-level functions: there are no seams where we can substitute collaborator functions with stubbed ones, useful for testing. If my function calls printf(), I cannot stub that out specifying a different implementation (unless maybe I recompile everytime and play a lot with the preprocessor).

Outlines, in theory, what they would do, but have not done, for FP code: Pass in functions to parameterize behavior:

So instead of injecting collaborators in the constructor we could provide them as arguments, earning the ability to pass in fake functions in tests. The upper layers can thus be insulated without problems (with this sort of dependency injection) and there are no side effects that we have to take care of in the tear down phase
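A minimal sketch of that idea in Python, using the printf example from the quote. The `report` function and its `emit` parameter are hypothetical names; the point is only that the collaborator arrives as an argument, so a test can hand in a fake.

```python
from typing import Callable

def report(total: int, emit: Callable[[str], None] = print) -> None:
    """Format one line and hand it to `emit`, which defaults to the real print."""
    emit(f"total: {total}")

# Production call: the line goes to standard output.
report(3)

# Test call: the fake appends to a list instead of printing.
captured = []
report(42, emit=captured.append)
assert captured == ["total: 42"]
```

There is nothing to tear down afterwards, since the fake never produced a side effect outside the list.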

Omits stack and logic paradigms. No surprise there.

Recoverability and Testing: OO vs FP

https://www.infoq.com/news/2008/03/revoerability-and-testing-oo-fp

Sums up a conversation that happens across several blogs. Weirdly omits any links to primary sources. Yuck.

OO is rife with seams that are easy to exploit, so Feathers likes it. Where you need a seam is a design issue:

Another blogger, Andrew, highlights that if “code isn’t factored into methods that align with the needs of your tests”, the implementation will need to be changed to accommodate the test. Hence, he argues as well that “thoughts about “seams” are really just getting at the underlying issue of design for testability”, i.e. the proper placement of seams.

But not all systems are always so designed (putting it nicely), so “recoverability” matters: being able to make something testable in spite of itself.

According to Feathers, even though there are alternative modules to link against in functional languages, “it’s clunky”, with the exception of Haskell where “most of the code that you’d ever want to avoid in a test can be sequestered in a monad”

Then there’s an argument that pushing the impurity to the edges makes things testable. No-one addresses validating correct composition of verified components, though. :(

SO: Testing in Functional Programming

https://stackoverflow.com/questions/28594186/testing-in-functional-programming

Answers point out:

  • Function composition builds units, in that you can test them quickly.
  • QuickCheck/SmallCheck dodge the combinatorial explosion of codepaths that you get by composing functions.
  • Coding against a typeclass that you can swizzle out for a test one lets you stub out IO-like functions. (Or just manually pass in a dictionary type.)
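The last answer can be approximated outside Haskell with a protocol. Here is a rough Python sketch using `typing.Protocol`; the `Storage`, `DiskStorage`, `FakeStorage`, and `persist` names are invented for the example and are not taken from any of the linked posts.

```python
from typing import Protocol

class Storage(Protocol):
    """The interface the code under test is written against."""
    def write(self, location: str, payload: str) -> None: ...

class DiskStorage:
    """Real implementation: writes the payload to a file."""
    def write(self, location: str, payload: str) -> None:
        with open(location, "w", encoding="utf-8") as handle:
            handle.write(payload)

class FakeStorage:
    """Test implementation: remembers writes in memory."""
    def __init__(self) -> None:
        self.writes = {}

    def write(self, location: str, payload: str) -> None:
        self.writes[location] = payload

def persist(store: dict, location: str, storage: Storage) -> None:
    # Pure formatting step, then a single call through the interface.
    storage.write(location, repr(sorted(store.items())))

fake = FakeStorage()
persist({"b": 2, "a": 1}, "store.txt", fake)
assert fake.writes == {"store.txt": "[('a', 1), ('b', 2)]"}
```

Passing `DiskStorage()` instead gives the real behavior, and `persist` itself does not change.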