Jeremy W. Sherman

Turning a PDF into a Coptic-bound book

Fri, 25 Mar 2022 15:54:31 +0000

It’s sometimes convenient to turn an ebook into a paper book. My running example will be Skirmish: Wallet Friendly Wargaming by Biscuit Fund Games, which I wanted to have on hand at the table.

But what’s a satisfying way to do this ebook to book conversion?

I wound up creating a Coptic-bound volume that lays flat using a couple free tools I hadn’t encountered before and some stuff I already had lying around the house. This post records my thoughts leading up to and notes during the process.

Options

Here are the options I considered:

Treat it like a short paper: Print, staple the corner, done.
- Pros:
  - Dead easy.
- Cons:
  - Can eat a lot of paper.
  - Not the most portable form-factor.
  - Could be hard to staple for larger works.
  - Not terribly durable.
Treat it like a final paper, and stick that sucker in a plastic cover with snap-on binding.
- Pros:
  - Also pretty easy.
  - Much more durable.
- Cons:
  - Those covers are surprisingly expensive for what they are.
  - I don’t have any on hand.
  - They don’t seem to actually hold all that many pages - 12-20 max.
  - Still a huge letter-size volume.
  - That snap-on / slide thing can pop off, and then your papers get all mixed up.
Get fancier and do some bookbinding
- Pros:
  - Doesn’t really require much in the way of materials: If you have a sewing kit on hand for mending, you’re probably good to go.
  - You can readily wind up with a smaller, half-letter–size volume.
  - It’s not gonna fall apart on you.
  - You get to learn something new.
- Cons:
  - Printing gets a lot more complicated.
  - Probably not spillproof.
  - Takes more time.

I chose the bookbinding option. I actually had everything I needed on-hand already, and the crafting sounded fun.

Bookbinding Flavors

There’s a lot of ways to get a book to stick together. The main tradeoffs are around:

Do you need to do any sewing?
- Staple-bound (think of most magazines) and perfect-bound books (line ‘em up, run glue down one side - most paperbacks and thicker magazines like Asimov’s Science Fiction) can do without any sewing.
Is stuff held together using glue?
- Perfect-bound books rely entirely on glue.
- Case-bound books use sewing and gluing to create a very solid binding. But they tend to assume you have a bookpress on hand and special ribbon and book cloth for the covers and such.
What size book does it work for?
- Staple binding runs into limits on stapler size and power, for example.
Does the book lie flat?
- Most bindings don’t pull this off. But Coptic binding, which has an open back and knots the signatures together, does.

A couple decades ago, I did some very simple Japanese stab book binding of a volume I just printed off full letter-size double-sided using some random yarn I had lying around and a drill press I found in my parents’ basement. The book had no cover, but it held up well. (I think I got the method from a Boy Scout magazine book at the time.)

But I really wanted the lie-flat behavior for this at-table reference work, and I wanted to do it using stuff I had lying around the house. So I wound up selecting Coptic binding.

Printing for Binding

The main complications of printing are generally:

Figuring out signature sizes
Rearranging the pages so they print 2-up in such a way that they end up in order when gathered into signatures. The outermost sheet has the first two pages and the last two pages of the signature on it. This might require adding extra blank pages to pad the end.
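As a rough sketch of that page shuffling: assuming a conventional layout (page 1 on a right-hand side) and a signature made of sheets folded once, the page numbers on each sheet can be worked out as below. The names `blanksNeeded` and `imposeSignature` are made up for illustration; this is not the Signature Optimizer's method or output.

```typescript
// Hypothetical helpers for illustration only.
type Side = { left: number; right: number };
type Sheet = { front: Side; back: Side };

/** Blank pages needed so pageCount fills whole folded sheets (4 pages per sheet). */
function blanksNeeded(pageCount: number): number {
  return (4 - (pageCount % 4)) % 4;
}

/**
 * Page numbers (1-based, counted within the signature) printed on each sheet,
 * listed from the outermost sheet to the innermost.
 */
function imposeSignature(sheetCount: number): Sheet[] {
  const lastPage = 4 * sheetCount;
  const sheets: Sheet[] = [];
  for (let i = 0; i < sheetCount; i++) {
    sheets.push({
      // Outer face of the folded sheet: a late page on the left, an early page on the right.
      front: { left: lastPage - 2 * i, right: 1 + 2 * i },
      // Inner face: the next early page on the left, the previous late page on the right.
      back: { left: 2 + 2 * i, right: lastPage - 1 - 2 * i },
    });
  }
  return sheets;
}

// A two-sheet signature holds pages 1 through 8.
const twoSheetSignature = imposeSignature(2);
```

For the two-sheet case, the outermost sheet carries pages 8 and 1 on one face and 2 and 7 on the other, matching the "first two and last two pages" rule.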

Carl McTague’s Signature Optimizer

Luckily, that is entirely handled for me in a very nice way for Coptic binding by Carl McTague’s Signature Optimizer.

If you weren’t doing Coptic binding, you could still use the LaTeX output as a pointer in the right direction to generate your signatures, but you’d probably go for regularly-sized signatures, rather than the variably-sized ones.

Briss

Unluckily, the source PDF for Skirmish is already 2-up, so I had to disassemble it before I could reassemble it for signature printing.

Fortunately, McTague mentions Briss in passing, and it proved to be just the thing to slice and dice the PDF back into 1-up for me. It was just a brew install briss away. The interface is a bit clunky; the key for me was realizing that you just click-and-sweep to add a new rect.

Reversed Pages?

I was worried after imposition using the LaTeX that the Signature Optimizer spit out that some pages were going to come out upside down. But a test printing of one signature showed that the usual long-edge flip printing worked just fine, so those fears proved unfounded.

I also was worried I’d messed something up when I found the odd-numbered pages were on the left, but that turned out to be the case in the source PDF as well, and I didn’t want to change it - there were a lot of well-designed two-page spreads that would only be a spread if you have the even numbers on the right-hand side. Unconventional, but not really a problem in the end.

LaTeX Errors?

I was also worried about a warning that was emitted over and over:

pdfTeX warning: pdflatex (file ./Skirmish-v1-1-1up-briss.pdf): PDF inclusion: m
ultiple pdfs with page group included in a single page
>] [4 <./Skirmish-v1-1-1up-briss.pdf> <./Skirmish-v1-1-1up-briss.pdf

It turned out to be an ignorable warning in most cases, including this one, as determined by reviewing the output. (It would be an issue if the page groups were configured differently, which they weren’t.)

Folding the Signatures

I saw some advice to fold each sheet individually and then nest them. That sounded like a great way for me to wind up with a bunch of subtly different sheets within the signature. I just stacked ‘em up, squared ‘em up, and then folded the whole lot.

I did not trim the signature to have an even end (the inner ones poke out more the deeper in you go, since they’re stacked atop the outer). I don’t regret this in the finished product: it’s not terribly noticeable, since my cover overhangs, and the signatures slide around a bit due to the way the binding is done anyway, so even if the pages were cut within the signature, the signatures would not line up into a perfect textblock most of the time anyway.

Making the Cover

I was divided on even adding a cover. It definitely took more fiddling and time (overnight drying!) than I liked at the time, even with me cutting corners (like eyeballing the size and shape of the cardboard I chopped out of a spare cardboard box). But the firm covers contribute a lot to making the book feel like a book to me, so I think it was probably worth it.

I unfolded a signature, held it onto the side of a box, marked some points around it to provide some overhang by eye, used a straightedge to rule between the lines, then went at it with a pair of scissors. It was not actually square, but it was close enough. I then chopped it in half.

I tweaked some images from the book to make a front and back cover, then printed them on some nice report/resume paper still sitting around the house from college thesis times.

I stuck the cardboard in the middle of the paper, then cut lines straight up and down to the edge at the corners, then diagonally back and up. This is to enable folding it over onto the cardboard to wrap it up. (This would normally be done with book cloth, but I didn’t have any, and I wasn’t making any, either.) I made the cuts with a boxcutter without marking the lines first.

I used some spare kid gluesticks (the purple ones that dry clear) to coat the paper, dropped on the cardboard, folded the edges down, then smoothed it all out. One side of the cardboard was more obviously ribbed than the other, but I noticed this too late, and due to my cardboard rectangles being more trapezoidal, I couldn’t really fix this after the cuts had been made, so I just sucked it up. Luckily it was the backside, and also, it’s not really that big a deal, either.

I found some green cardstock to cover up the inside part, cut it in half with a boxcutter, coated it in glue, and pasted it on over the edges of the outer wrapper. The paper tried to curl, which made lining it up a bit hard. I then left these to dry overnight in a stack of books.

Pricking

At this point, it was time to prep for sewing. I decided to do 3 groups of 2 holes each. I eyeballed it based on where I’d placed binder clips before, moved the binder clips to the outside edges of the stacked-up signatures, then ruled a line along each intended hole position straight across the binding side. (I found out later that this hadn’t hit one of the signatures very well at all, and that one of the hole lines for some reason had scarcely marked any of them. It slowed down actually making the holes.)

I then transferred the markings to the edge of a spare piece of paper to use as a jig to mark the cover holes. The cover holes are inset a bit from the edge of the board, though not much, since I needed to ensure some overlap. The 1/8-inch inset has proven to be enough.

After that, I flipped each signature open in turn, laid it inside-down so the markings were visible, and pricked it straight through with a scratch awl down the crease. I used the paper jig to put the holes through the covers. I also flipped each signature over and made the holes from the other side, since the depth of the signatures was not great, so only the smallest bit of the scratch awl went through - not a very big hole.

Sewing

I’d been convinced that coating the thread in beeswax would make things a lot easier. (It did - the stiffer thread didn’t try to knot up against itself as badly as it would otherwise, and that’s important when you’re working with a thread that’s long enough to sew the whole binding to start with.) So I melted some beeswax pastilles we had around in a silicone cupcake liner in the microwave and let it set before I started.

Then I measured enough sewing thread length (in a nice green color to match the book cover insides) as long as each signature, plus as long as each cover. Then I doubled that, because I wanted to work with doubled thread for strength. I held it against the beeswax with my thumb, then pulled the whole length through between my thumb and the beeswax to coat it.

Thread the needle, knot the end, and then I just followed the guide I’d found, Sharilyn Miller’s The Coptic Stitch: Instructions and Illustrations (PDF). The last signature plus cover was a bit odd, and I deviated a bit in a way that mirrored more the first signature, with a separate knot between the signature and the cover, except for the first and last holes, which I did as directed.

I found I needed to take some care not to let the thread knot up on itself at first - it was a lot of thread to start! This got easier as more thread had been used up, and once I knew to watch out for it, I only fouled it up a bit a couple times more.

I sometimes missed one or two pages in the middle of the signature when poking through; it was easy to notice this and correct by just poking it through the missed holes. This is a pretty forgiving binding technique.

Using a curved needle proved to be sound advice. It made looping around the earlier stitches much, much easier.

The sewing felt like it went fairly quickly. It’s pretty easy to get into a rhythm, and aside from the start and end, it’s all the same thing over and over, so very straightforward and easy to leave and come back to.

Notes for Next Time

Cover:

It is worth doing a cover.
It’s probably worth taking a bit more time on the cover and making sure it’s square.
If I had a white box (like the kind Hallmark send their stuff in), I would probably feel OK not covering it and just pasting a label on.

Overall technique:

The one-needle approach works great; I don’t know what Make Magazine’s Coptic binding notebook project was on about with the two-needles per pair of holes thing. (Or the wood covers. That seems like the opposite of portable!)
Binding would go pretty quick (just folding, pricking, and sewing) if you didn’t do a cover; I can see why the author of the Signature Optimizer gets a lot of mileage out of this binding technique.
I don’t see much need for a bone folder; my fingernail works as well as ever it did when playing at origami.

Key Resources

Improving a testing-library test

Thu, 04 Feb 2021 13:12:31 +0000

Test clarity helps in understanding the claims being made and the various ways the test might fail. With JavaScript/TypeScript, asynchrony can make this syntactically more confusing.

So I saw this code recently:

// Test for official email address
await waitFor(() =>
  expect(screen.getByTestId("officialEmail").getAttribute("href")).toBe(
    "mailto:OFFICIAL_any_string@email.com"
  )
);

This is using testing-library’s explicit polling waitFor to repeatedly test the predicate. If it keeps failing till a timeout, then it concludes the test failed.

Two problems:

it could do a better job of matching how users would see the content
it’s hard to read & slow

Matching how users would see the content

For this, I’ll point to Testing Library’s advice on which query to use, “About Queries: Priority”. The short version is “accessibility APIs, then visible stuff, then invisible stuff”. A test ID is firmly in the “invisible” category; this could be improved by instead searching by link, or by searching just for the text in question.

That’s not actually what I want to focus on here, though, and Testing Library covers that well enough.

Hard to read & slow

Yes, I’m counting this as one issue, because I’m blaming waitFor, and rewriting the test not to use it naturally leads to fixing both readability and slowness.

Nesting & Overhead

It’s hard to read because of the nesting. That’s a lot of syntax to do an attribute check.

Waiting for test success, not just element presence

It’s slow because it mixes up synchronizing with rendering (“is this thing here yet that we need to exist before the test makes sense?”) and passing the test (“ok but is that thing right?”). It should only be waiting for the element to appear. But by including the test expectation within the waitFor'd predicate, when the element has rendered but the test concerning the element fails, waitFor will keep polling long past the time the test outcome could have changed: It’ll run out the clock on its timeout.
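To make the run-out-the-clock behavior concrete, here is a simplified, synchronous stand-in for a polling helper. It is not testing-library's waitFor: `pollUntilPasses` and `makeFakeLink` are hypothetical names, and attempts are counted instead of waiting on a real timer.

```typescript
// Hypothetical polling helper: retries `check` until it stops throwing or attempts run out.
function pollUntilPasses(
  check: () => void,
  maxAttempts: number
): { passed: boolean; attempts: number } {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      check();
      return { passed: true, attempts: attempt };
    } catch {
      // Check failed; poll again.
    }
  }
  return { passed: false, attempts: maxAttempts };
}

// Fake element lookup: throws until the given poll, then returns the link's href.
function makeFakeLink(rendersOnPoll: number, href: string): () => string {
  let polls = 0;
  return () => {
    polls += 1;
    if (polls < rendersOnPoll) throw new Error("element not rendered yet");
    return href;
  };
}

const expectedHref = "mailto:right@example.com";

// Expectation inside the polled check: the wrong href keeps failing on every poll.
const mixedLink = makeFakeLink(3, "mailto:wrong@example.com");
const mixed = pollUntilPasses(() => {
  if (mixedLink() !== expectedHref) throw new Error("href mismatch");
}, 50);

// Poll only for presence, then check the href once, outside the loop.
const separateLink = makeFakeLink(3, "mailto:wrong@example.com");
let foundHref = "";
const separate = pollUntilPasses(() => {
  foundHref = separateLink();
}, 50);
const hrefMatches = foundHref === expectedHref;
```

In this sketch the mixed version burns all 50 attempts before reporting failure, while the separated version stops on the third poll and can report the wrong href right away.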

Switch to findBy

So, we separate syncing on the element’s presence from checking its content. The cleanest way to do this is to use one of the async findBy… queries, which handles the waiting on our behalf.

Syntactic gotcha WRT what to await

But there’s a syntactic gotcha; if you write:

/* DOES NOT COMPILE */
await waitFor(() =>
  expect(await screen.findByTestId('officialEmail').getAttribute('href')).toBe(
  'mailto:OFFICIAL_any_string@email.com'
  )
)
/* DOES NOT COMPILE */

then it won’t work. With TypeScript, it won’t even compile:

error TS2339: Property 'getAttribute' does not exist on type 'Promise<HTMLElement>'.

This error is informative, though:

findByTestId returns a Promise of some type. Promise does not have a getAttribute function. But the promised type does. So narrow the scope of what you’re awaiting to that particular expression using parentheses:

expect(
  (await screen.findByText("OFFICIAL_any_string@email.com")).getAttribute(
    "href"
  )
).toBe("mailto:OFFICIAL_any_string@email.com");

The more readable way to write this would be to extract the expression to a named variable:

const emailElement = await screen.findByText("OFFICIAL_any_string@email.com");
expect(emailElement.getAttribute("href")).toBe(
  "mailto:OFFICIAL_any_string@email.com"
);

Now it compiles, and we’ve fixed what we’re syncing on.

But we’re not done yet.

Getting the most out of test failure

The next step is to realize that, if this fails, it’s going to provide very poor context, because it’s just a string comparison.

TDD’s “watch it fail” step serves two purposes:

Make sure you’re actually sensing what you think you’re sensing, and that the test isn’t trivially passing.
Tune the test failure output so it’s obvious from the failure alone what went wrong, how, and where to fix it.

Those concerns apply whenever you’re writing tests; TDD just frontloads addressing them. In test-after coding, you need to ensure your automated test in fact catches what you were manually testing for before automating. With test-after, you watch it fail by breaking it on purpose:

Break the implementation and make sure your test detects it.
Trigger test failures so you can tune the failure output

I often find the test-failure tuning easier when corrupting the expectation in the test code rather than the actual implementation. It’s just easier to break all the links in the test chain that way, rather than doing it at a distance by breaking the implementation. If you started by breaking the implementation, you know the core claim of the test will be checked, so I feel it’s OK to just go straight to breaking the test itself when tuning output.

Use higher-level matchers to inject more context into the failure message

So back to the example. Currently, a failed expectation isn’t all that helpful:

expect(received).toBe(expected) // Object.is equality

Expected: "mailto:OFFICIAL_any_string!@email.com"
Received: "mailto:OFFICIAL_any_string@email.com"

This spawns some immediate questions:

Where did these strings come from?
Where in the app is the string wrong?
Why is the string wrong?

It’s a mailto: scheme, so you can think a bit and work out it’s probably an href attribute, but that takes thinking. You don’t want to spend time and effort inferring that. Push that context into the test!

Fix that by using a more contextual matcher:

expect(
  await screen.findByText("OFFICIAL_any_string@email.com")
).toHaveAttribute("href", "mailto:OFFICIAL_any_string@email.com");

And then it fails with enough information to start debugging just from the error alone:

expect(element).toHaveAttribute("href", "mailto:OFFICIAL_any_string!@email.com") // element.getAttribute("href") === "mailto:OFFICIAL_any_string!@email.com"
Expected the element to have attribute:
    href="mailto:OFFICIAL_any_string!@email.com"
Received:
    href="mailto:OFFICIAL_any_string@email.com"

You’ll note that I was triggering the failure case by intentionally corrupting the expected value, by injecting a ! into it.

Narrow the scope of element search so failed searches dump the DOM you care about

What if the element isn’t even there? How helpful is the test failure?

I corrupted the matcher and confirmed it provides some context to help, but probably I’d want to scope down to the specific component, so the HTML output is less likely to truncate before hitting the relevant part I’d want to see, and so the person debugging has less logspew to wade through.

Let’s say this was checking links in a “Contact Info” section on a “Profile” page.

In the context of a whole page, the clean way to focus the failure info would be to introduce a <section> and then pull it out using a search for the region role:

diff --git a/src/components/Profile/ProfileCard.tsx b/src/components/Profile/ProfileCard.tsx
index 9931d8d..19ab3d9 100644
--- a/src/components/Profile/ProfileCard.tsx
+++ b/src/components/Profile/ProfileCard.tsx
@@ -242,11 +242,15 @@ const ContactInfo: React.VoidFunctionComponent<{
   const addressLabel = formatAddress(address)

   return (
-    <div className={classes.contactInfo}>
+    <section
+      className={classes.contactInfo}
+      aria-labelledby="contactinfo-header"
+    >
       <Typography
         className={classes.contactInfoHeader}
         variant="h6"
         component="h3"
+        id="contactinfo-header"
       >
         {t('ProfileCard.ContactInfo.Header', 'Contact Information')}
       </Typography>
@@ -302,7 +306,7 @@ const ContactInfo: React.VoidFunctionComponent<{
           className={classes.listItem}
         />
       </List>
-    </div>
+    </section>
   )
 }

(That would probably benefit from using some flavor of unique ID generator in case multiple contact info sections got rendered, but let’s ignore that for now.)

(Another way to improve the error output would be to narrow the scope of the test: instead of testing the component as part of a whole page, test the component directly. Then, when screen.findBy barfs, the whole screen is precisely the info we want to see. You might want to do that if this component gets reused elsewhere, but for now, assume it’s an implementation detail.)

Now we can use a matcher scoped to just that region:

const contactInfo = within(
  await screen.findByRole("region", { name: /contact info/i })
);

expect(
  await contactInfo.findByText("OFFICIAL_any_string?@email.com")
).toHaveAttribute("href", "mailto:OFFICIAL_any_string!@email.com");

This uses within to narrow the queried region, which also narrows the “nothing found, here’s what is there” output usefully. Now, a failed element search dumps the entire Contact Info region to the test log, rather than the entire blessed page, so you can plainly see what’s what.

Conclusion

Be careful what you sync on, because this can needlessly slow your whole test suite.
- Syntactically, pay attention to what, precisely, you are awaiting.
Make sure the answers to your first-order debugging questions make it into your test failure messaging! (Tack stuff onto the failure exception message if worse comes to worst.)
- To do this, break your tests, observe the info they provide about how they broke, and then tune that output so all your immediate first questions are answered. Otherwise you’ll find yourself needing to dig up file and line and rummage around, or maybe you’ll be running a 15-minute build just to see new debug info because you can’t repro the issue locally. (You could also automate breaking your tests with mutation testing as with Stryker, but that’s a topic for some other day.)
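For the "tack stuff onto the failure exception message" fallback, one possible shape is a small wrapper that prefixes context onto whatever an assertion throws. This is a sketch only: `withContext` and `expectEqual` are hypothetical helpers, not part of Jest or Testing Library.

```typescript
// Hypothetical wrapper: rethrows an assertion failure with extra context on top.
function withContext(context: string, assertion: () => void): void {
  try {
    assertion();
  } catch (error) {
    const detail = error instanceof Error ? error.message : String(error);
    throw new Error(`${context}\n${detail}`);
  }
}

// Stand-in for a bare string comparison that says nothing about where the strings came from.
function expectEqual(received: string, expected: string): void {
  if (received !== expected) {
    throw new Error(`Expected: ${expected}\nReceived: ${received}`);
  }
}

// Break the expectation on purpose (note the injected "!") to see what the failure says.
let failureMessage = "";
try {
  withContext('href of the official email link in "Contact Information"', () =>
    expectEqual(
      "mailto:OFFICIAL_any_string@email.com",
      "mailto:OFFICIAL_any_string!@email.com"
    )
  );
} catch (error) {
  failureMessage = error instanceof Error ? error.message : String(error);
}
```

The captured message now starts with where the string came from, followed by the original Expected/Received lines. Rethrowing a new Error like this drops the original stack trace, so a purpose-built matcher is still the better tool when one exists.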

A Month of Terraform

Sat, 21 Nov 2020 16:55:43 +0000

I took Heroku for granted, and a month into setting up my own infra, I now know how much it bought me.

A lot of my past work has been infrastructure-adjacent. I often find myself filling in the Build & Integration role - the person that gets continuous integration off the ground and keeps it actually continuing rather than falling flat on its face. But often I’ve just been building one of a constellation of services, so the core infrastructure was already there, or I’ve been targeting something like Heroku, where you basically pick your poison, git push, and bob’s your uncle.

This time, I’m putting the pieces together using the AWS toolkit. And to smoosh them all together, I’m using Terraform, because heck if I’m going to be hand-writing YAML or JSON and praying it’s formatted right. Plus there’s more I want to orchestrate than just AWS, like, say, GitLab.

I don’t wanna talk about AWS just now. It reminds me of learning Foundation & Cocoa - you look at one piece, and it can do so much, and then you gotta put all those individually deep & complex pieces together to do more stuff. I figure if I put in the hours reading docs, learning what’s all there, and getting stabbed by the pointy bits, it’ll probably all come out fine in the end.

So, Terraform.

The Good

It mostly works!
When it doesn’t, it generally fails in a useful way, and then I can fix it and try again.
There are docs for most things.
Autoformatting works great.
Linting works pretty well.
Terraform: Up & Running is excellent, and Terragrunt makes it even easier. Huge thanks to their team for providing the duct tape we need. 🙌

The Not So Good

terraform-lsp is supposed to provide autocomplete, but it mostly doesn’t, in my experience. First it flipped its lid that I dared to have a repo with multiple root modules in it, so I just aimed VS Code at the folder with a single root module. Then the language server says it’s all hunky dory AFAICT, and yet it autocompletes nothing beyond bare language syntax. As a result, I’m manually referencing docs and writing stuff down and wasting tons of time that tools like autocomplete and integrated linting ought to be saving me from.
State files contain secrets in plaintext. (You might enjoy the six-year-old GitHub issue about the plaintext secrets problem.) You can mark outputs as sensitive, so they don’t get printed at the end of applying your infra spec, but run terraform show instead of terraform apply, and there they are, staring back at you. At least you can lock down and encrypt the S3 bucket holding the state.
- Pulumi’s secrets management is far more satisfying. But Pulumi is even more cutting-edge than v0.whatever Terraform, and I expect Hashicorp to keep TF running for a good while, while I’m not so confident in Pulumi, so I’m using TF. (Hashicorp of course would recommend Vault.)
Annoying asymmetries in the language about how you declare and reference things in slightly variant ways - I trip over these over and over as a beginner:
- You declare locals in a locals block, but you reference them as local.thing, not locals.thing.
- You declare a variable in a variable block, but you reference it as var.thing.
- You declare data sources as data "provider_thingy" "my_name_for_this_data", and then you have to access it as data.provider_thingy.my_name_for_this_data. (This is actually pretty darn consistent, at least. Though, like, why the quotes around the provider thingy?)
- You declare resources as resource "provider_thingy" "my_name". But you do NOT reference them as resource.provider_thingy.my_name. Nope, you just reference them as bare provider_thingy.my_name.
For that matter, there are other oddities as well. Pieces of syntax that seem like they should be orthogonal just aren’t. for_each stands out here:
- You can generate multiple resources by just dropping a for_each in the block: resource "provider_thing" "mine" {} becomes resource "provider_thing" "mine" { for_each = of_these }
- But nested argument blocks require conversion from like setting { namespace = "blah" } to dynamic "setting" { for_each = thingy; content { namespace = "blah" }}. Have fun looking that up a few times.
- And you can’t even use the for_each trick with module imports. It just isn’t supported. Sorry, sucks to be you.
Annoying gaps in the docs:
- Required vs optional parameters are not very clearly called out and are not at all segregated. So you get to play the game of “what is the minimal skeleton to declare this resource”. Actually running it a few times to see what you screwed up takes longer than just looking at the docs and puzzling it out, due to the lengthy iteration times in infra-land (see below).
- Types are not shown in the docs!!! All the outputs and arguments are typed. You have to declare those types. It’s right there in the code. But the docs don’t say what any of the types are. You just hit a type error at runtime. Fun fun!
- The HCL language is doc’d under the CLI tool, not in and of itself. It was really hard to actually find the docs since my first thought when I have syntax questions isn’t “let’s look at the docs for the tool.” It’d be like pulling up the manpage for GCC (carefully draw your triangle of art first) when you have a question about C syntax.
Annoying asymmetries in the AWS provider:
- Missing links: Sometimes you get into a “can’t get there from here” situation. Like trying to find the zone ID for an Elastic Beanstalk environment’s CNAME so you can aim a Route 53 alias at it. (Hint, you need a completely different resource, the aws_elastic_beanstalk_hosted_zone.)
- Irregular naming:
  - Sometimes something is zone_id, but other times it’s maybe just id.
  - Sometimes you can fish stuff out by arn, or maybe by id, or maybe it’s by name - good luck. Keep the docs close to hand.
  - (It’s totally possible this is inherited from the AWS APIs themselves, but the whole point of an abstraction layer is to make things better and more usable, dangit.)

The Different

Iteration times are way longer than with even mobile apps. Like, “you’re liable to task-switch while waiting to see plan output” longer.
Testing is a pain. I haven’t pulled in Terratest yet, because anyone maintaining this after me is unlikely to have Go experience, and my focus here isn’t building reusable infra anyway - it’s building this infra – so I’ve just been using bats and Bash shell scripts (with shellcheck, which is amazing) for some after-the-fact sanity checking using the AWS CLI. (Pro tip: Use the community-maintained fork bats-core rather than the no-longer-maintained sstephenson original.)
- Policy assertions feel like a different flavor of test, but the tooling here seems to be fairly immature, with perhaps the exception of if you’re targeting Kubernetes.

Summary

I expect I’ll get used to most of the rough edges of the syntax in another month. And Terraform is still v0, so hey, maybe some breaking changes will clear all this mess away. 🤞

I’m intentionally not getting sucked into hacking around the docs frustrations just now. Or even the very tempting open issue about silencing all the Terragrunt logspew.

I do plan to spend a bit of time trying to get autocomplete working for resource and data source types and their arguments/attributes from the language server, at least. That would be a huge help.

It still feels like magic to run a command and have infrastructure just…happen. You hit return, wait a bit, and suddenly servers are serving and domains are aliasing and a whole constellation of systems are interoperating. It kinda reminds me of the magic of home automation with blinkenlights, only without any of that messy “hardware” stuff to break on you.

Collected TILs

Sat, 10 Oct 2020 21:05:39 +0000

This post collects a number of “Today I Learned” messages I’d previously sent into a Slack workspace.

Git

TIL: Two ways to change git’s default comment handling so I can write Markdown headings:
- Use the --cleanup flag for git commit to set a different cleanup approach, like git commit --cleanup=scissors
- Change the comment character to something else, such as ;, with git config core.commentChar ';' https://stackoverflow.com/questions/2788092/start-a-git-commit-message-with-a-hashmark
TIL: You can ask git to show a file as of a specific version REVSPEC, and less to add line numbers and jump right to your target line, like so:
```
git show REVSPEC:PATH_RELATIVE_TO_GIT_ROOT \
| less -N +164g
```
TIL: how to see a merge’s conflicts & resolutions: if hash is the merge commit, run either git diff hash hash^1 hash^2 or git diff-tree --cc hash. (for me, former will paginate, latter will not.)
TIL: I could have just aimed core.hooksPath to a shared hook folder, rather than linking-in my prepare-commit-msg hook script to many folders. (Though I do wonder if Husky handles that bit of git configurability at all.)
TIL: git comes with a perl module, and git add --interactive (and so --patch) is a perl script.

why did i learn this? because i wondered how to sniff the comment char in use. even the git perl module punts and falls back to # when it’s auto, though.
TIL: git whatchanged is a thing. it’s like git log --name-status only less typing, and with file modes in your face.
TIL: VSCode’s GitLens has a “heat map” feature that shows Edit Wear in your editor.
TIL: to force an update with pull without it trying to do some derpy merge, i can git fetch then git reset --hard FETCH_HEAD, without needing to worry about what the actual branch name is that i’m on.
TIL: Mercurial has a nifty “ignore this reformatting” cue to its blame engine. throw # skip-blame reason in the commit message, and hg praise will ignore that commit.

a sub-issue of adding all the b prefixes and reformatting code is that it would break annotate/blame more than was tolerable. The latter issue was addressed by teaching Mercurial’s annotate/blame feature to skip revisions. The project now has a convention of annotating commit messages with # skip-blame <reason> so structural only changes can easily be ignored when performing an annotate/blame.

https://gregoryszorc.com/blog/2020/01/13/mercurial%27s-journey-to-and-reflections-on-python-3/
TIL: git blame since v2.23 can ignore a list of commits. pass the list one-by-one with --ignore-rev=REV, or en masse with --ignore-revs-file FILE, which admits comments. you can automate the process by configuring blame.ignoreRevsFile; a popular filename is .git-blame-ignore-revs. (i went looking for git functionality after the hg one and found this summary.)
TIL: git log at last applies the .mailmap rewrites previously only applied by shortlog. ah, a tidy history.
TIL: to list your branches ordered from most-recently committed to to least:
```
git branch --sort=-committerdate
```
(handy when you need to go 2 branches back, so can’t just use git checkout -)
TIL: how to word-wrap commit messages in Fork: Preferences > Show page guideline then Right click on commit description field -> Wrap Paragraph at Ruler https://github.com/ForkIssues/Tracker/issues/879

GitHub

TIL: you can use .gitattributes to fix up weird language use listings on GitHub https://github.com/github/linguist/blob/master/README.md#overrides
- fixes: x is language Y: someGlob linguist-language=Y
- ignores: treat x as vendored, generated, or docs: someGlob linguist-{vendored,generated,documentation}=true - or just skip detecting it for whatever reason: linguist-detectable=false
- missed: x should be considered detectable linguist-detectable=true
TIL: there are a lot of restrictions on code search on github, which is why its code search always seems so lousy next to cloning and searching locally https://help.github.com/articles/searching-code/#considerations-for-code-search
TIL: if you use AsciiDoc on GitHub, you can just write :toc: to get a table of contents. beats the pants off manual doctoc’ing!
TIL: github knows how to attribute commits to multiple authors via a \r\rCo-authored-by: NAME2 <EMAIL2>\rCo-authored-by: NAME3 <EMAIL3>\r… convention in the commit message body. https://help.github.com/en/articles/creating-a-commit-with-multiple-authors

via: https://tuple.app/pair-programming-guide/template#4-configure-git-to-share-credit
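As a rough sketch of that message shape, here is a small Python helper that appends the trailers. The function name with_coauthors and the sample names and addresses are made up for illustration, and it assumes ordinary newlines (a blank line, then one trailer per line) rather than literal \r characters.

```python
def with_coauthors(message: str, coauthors: list[tuple[str, str]]) -> str:
    """Append one Co-authored-by trailer per (name, email) pair.

    Hypothetical helper: the trailers go after a blank line at the end of
    the commit message body, one per line.
    """
    if not coauthors:
        # Nothing to credit, so leave the message untouched.
        return message
    trailers = "\n".join(
        f"Co-authored-by: {name} <{email}>" for name, email in coauthors
    )
    return f"{message.rstrip()}\n\n{trailers}\n"


# Made-up authors, purely for illustration.
commit_message = with_coauthors(
    "Fix flaky login test",
    [("Ada Example", "ada@example.com"), ("Grace Example", "grace@example.com")],
)
print(commit_message)
```

The result can be handed to git commit -F - on stdin, or pasted into the editor git opens for the commit message.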
TIL: github has pretty nice web preview for CSV files. renders as a table, bolds the first line, and lets you search within the table.
TIL: github now supports multiple issue & pr templates. but really, i’m just here to link you to a CYOA template builder: https://www.talater.com/open-source-templates/#/
TIL: if you fork a GitHub repo using large file storage, the parent repo gets to pay for all the storage & bandwidth.
- note to self: never GLFS in a public repo.
In forks, bandwidth and storage usage count against the root of the repository network. (https://help.github.com/en/articles/about-storage-and-bandwidth-usage)

Trivia

TIL: The origin story of Hamcrest (an anagram of “matchers”): Folks wanted to use jMock2’s Matchers in production code, but felt weird having to pull in a test-centric lib as part of their prod code. http://www.natpryce.com/articles/000662.html
TIL: Lewis Carroll found a workaround for burning the midnight oil. “Carroll invented Nyctography because he was often awakened during the night with thoughts that needed to be written down immediately, and didn’t want to go through the lengthy process of lighting a lamp just to have to extinguish it shortly thereafter.” (https://en.wikipedia.org/wiki/Nyctography)
TIL: the US actually minted some half-cent coins https://en.wikipedia.org/wiki/Half_cent_(United_States_coin)
TIL: JIRA is a shortened form of “Gojira” derived after riffing on Bugzilla https://confluence.atlassian.com/pages/viewpage.action?pageId=223219957
TIL: Poor ventilation in conf rooms can have cognitive impacts:

three people quietly sitting in a mid-size conference room produced CO2 levels that within 60 minutes, reached concentrations high enough to impair their ability to make the right decisions. In a corporate world where Directors regularly pile into closed-door conference rooms for hours on end, making the most important planning decisions for their companies, this was a disturbing realization. (http://blog.gigabase.org/en/contents/132)

TIL: inst, ult, and prox http://worldwidewords.org/weirdwords/ww-ult2.htm (this month, last month, next month). they were disturbingly common in 19th century business letters, and shorthand learning materials of the same period. “i am in receipt of your esteemed favor of 13th ult.”

TIL: you can get little bumpers for your keys to deafen the thunderous typing. everyone i video call with will benefit from this. >.<
TIL: “to be across” is Aussie biztalk for “know about, be informed of, understand” or similar https://english.stackexchange.com/questions/122379/usage-of-to-be-across
TIL: How to use Seyès paper: caps go up 3, loops go up 3, sticks go up 2, x-height is 1, stick descenders down 1, loop descenders down 2. https://www.crapaud-chameau.com/2017/04/apprendre-a-bien-ecrire-en-cursive.html (the vertical lines are tabstops, if you’re laying out tables)
TIL: sometimes a pug is a moth https://en.wikipedia.org/wiki/Foxglove_pug

ominous:

Soon after hatching it seals the mouth of the flower with silk and feeds on the reproductive parts of the flower.
TIL: “After many incidents of serious flooding throughout the 19th century, the owners of the two-, three-, and four-story buildings in Sacramento simply abandoned the ground floors and constructed raised sidewalks level with the first floor. Over the years the roadways were also raised to just below the new sidewalk level, effectively elevating the entire town.” https://www.atlasobscura.com/places/the-original-street-level-of-sacramento-sacramento-california

h/t David M. Deller
TIL: some new sales jargon:
- PBO: positive business outcome
- EB: economic buyer - the person who can release funding to pay for your services
- RC: required capabilities - skills/resources needed to reach a PBO
- proof point: a fact you can point to to support a claim about your capabilities and differentiators, such as a case study or customer reference
(most of these seem pretty self-explanatory to me, but that someone is motivated to abbreviate these specific things vs others makes me re-evaluate the importance of those terms, and my fluency with them, in sales thinking.)
TIL: https://en.wikipedia.org/wiki/Kin-Ball is a thing. 48-inch ball, three teams on the floor.
TIL: there’s a book of quotes roasting french authors, the “dictionary of literary insults” https://www.amazon.com/Dictionnaire-Injures-Litteraires-French-Chalmin/dp/2253162361/

“Mallarmé, untranslatable, even in French”
TIL: “ratioing”: when you get twice as many replies as retweets. https://www.merriam-webster.com/words-at-play/words-were-watching-ratio-ratioed-ratioing

(if anything ever said “twitter is not a platform for discussion”, considering replies as a negative may be that…)

CLI Fun

TIL: a bash for loop defaults to iterating over the positional arguments if you omit the in part. thanks, perplexing example from getopt(1)!

also, argparsing in bash is bad enough that https://argbash.io/ exists to generate all the boilerplate for you.

there’s also a bash-builtin getopts vs standalone getopt utility low-stakes holy war apparently. i’ll chuck that in the bin next to emacs vs vi and tabs vs spaces for a later date.
TIL: Bash will only load aliases in an interactive shell, not just a login shell:
- bash -lc "some-alias" gives “bash: some-alias: command not found”.
- bash -ilc "some-alias" works a treat.
TIL: the “run alpine/socat to bridge to an exposed but unpublished container port” Docker trick. lets you basically retroactively publish it without having to re-run the container with new config. See: https://stackoverflow.com/a/49903374/72508 (which may have my edits applied already)
TIL: the xnu kernel has 3 flavors of use-after-free detection in addition to its own guard malloc support. neat subsystem. :)
TIL: If all else fails for entering something into vim, you can use Ctrl-V u codepoint to enter it interactively, or if you’re composing a regex, you can encode it using something like \%u1234. (That last one is handy for copy-paste or scripted use.)

and you can re-learn the first one in :help unicode or more specifically :help utf-8-typing, and the second one in :help regex, specifically :help pattern-overview. (as i have done several times now. here’s hoping TILing it makes it stick. >.<)
TIL: ld (the linker) still has cool tricks up its sleeve. ld can just…wrap a binary into a .o with start and end pointers as its symbols: embedding binary objects in c
TIL: PowerShell aliases wget and curl to iwr aka Invoke-WebRequest. That plus help is enough to nudge you in the right direction.

Postgres

TIL: DBFiddle is a thing. gives you an online workbench/demo of querying using a specific DB engine, like jsfiddle, but for databases.
TIL: allballs is a special value that Postgres hardcodes to the UTC time of day 00:00:00 https://www.postgresql.org/docs/8.3/datatype-datetime.html#AEN5025
TIL: using psql variables (h/t Liv Vitale for prompting me to review this)

- \set variable [blah…] – create or update a variable. multiple args will be concatenated. no arg sets to empty string or, for control variables, is equivalent to \set VAR on.
- \echo :variable – view the value of that variable
- \set – view all variables and their values
- :{?variable} – returns TRUE if variable exists, FALSE if not
- :'variable' – quoted as a literal, such as a string or number
- :"variable" – quoted as an identifier, such as a table name
- \unset variable

NOTE: psql reserves variables comprising only uppercase alphanumerics. stick to lowercase and you’ll be fine.

Bulk insert trick:
```
\set content `cat my_file.txt`
INSERT INTO my_table VALUES (:'content');
```
References: https://www.postgresql.org/docs/11/app-psql.html#APP-PSQL-VARIABLES https://www.postgresql.org/docs/11/app-psql.html#APP-PSQL-INTERPOLATION

Webdev

TIL: TypeScript’s Record<KeyType, ValueType> can be used to statically require an enum lookup table be exhaustive. https://www.typescriptlang.org/docs/handbook/advanced-types.html#mapped-types

so this enum decl + incomplete map declared as a Record:
```
export enum EventCode {
  Click = 'email_click',
  Reply = 'email_reply',
}
export const Color: Record<EventCode, string> = {
  [EventCode.Click]: 'purple',
}
```
produces a compiler error due to the missing enum case:
```
EventCode.ts:24:14 - error TS2741: Property 'email_reply' is missing in type '{ [EventCode.Click]: string; }' but required in type 'Record<EventCode, string>'.

    24 export const Color: Record<EventCode, string> = {
                    ~~~~~
```
TIL: you can do all kinds of evil things with VSCode tasks (like rig ‘em up to ssh into a vagrant vm and run a docker exec command to trigger your tests)
TIL: android chrome, when typed into using google’s software keyboard, doesn’t provide any meaningful info on the key pressed to onkeydown and onkeyup, which sure seem like they ought to be getting some info about a key. the events are still sent, but not anything about the keys involved. further eroding the foundations on which my precarious frontend web reality is built. https://bugs.chromium.org/p/chromium/issues/detail?id=118639#c261
TIL: typescript can smartly narrow the type after type-checking the discriminator in a discriminated union, but only if the discriminator is a single type in each branch. it’s not smart enough to handle a union in one case.

so this plays nice:
```
@typedef {
  {flavor: 'date', value: Date}
| {flavor: 'time', value: Date}
| {flavor: 'text', value: string}
} Filter
```
and you get this desirable smartcast behavior: if (it.flavor === 'date') { /* now it.value is a Date here */ }

but this otherwise equivalent version does not:
```
@typedef {
  {flavor: 'date' | 'time', value: Date}
| {flavor: 'text', value: string}
} Filter
```
in this case, you get: if (it.flavor === 'date') {/* it.value is still 'date' | 'time' | 'text' :( */ }
TIL: OpenAPI specs can handle XML pretty well these days. you can rename something by adding a child {xml: {name: 'whatevs'}}, and flag a property as an attribute with {xml: {attribute: true}}. https://swagger.io/docs/specification/data-models/representing-xml/
TIL: how YAML’s many string syntaxes play out in practice through a convenient single-issue site: https://yaml-multiline.info/

(In the past, I’ve looked at https://camel.readthedocs.io/en/latest/yamlref.html. That’s still really valuable for YAML in general, but for string formatting, this new site is much more to the point.)

Slack

TIL: Slack now reminds you of the other party’s current time in a DM. (Not in a group DM, though, so ping the heck out of a combined Pacific and Mountain Time crowd at 0900 Eastern if you want.)
TIL: In Slack, /remind snooze DURATION is a thing. should be handy for my next pto.
TIL: slack will keep a specific whitelist of apps alive even after the member who wired them into a workspace leaves. this whitelist includes mission-critical apps like CI, performance alerting, communication tooling, and the one your workspace clearly cannot survive without for even a second: giphy. >.> https://slack.com/help/articles/360000446446-Manage-deactivated-members-apps-and-integrations-Manage-deactivated-members-apps-and-integrations
TIL: that Slack finally added actual hyperlinks. (at least with their new text editor - not with markdown.) the niftiest way is to select a region of the message you’re editing, then paste a URL to linkify it.
TIL: creating a scheduled time and date workflow in Slack is dead easy. (but you can’t use the fancy new markup in the messages you send from a workflow, so no pretty hyperlinks, alas. and unlike with reminders, mentions don’t work, so no @here to wake up the channel for a scheduled meeting.)
TIL: the secret slack shortcut to pick up a search where you left off is Cmd-g:

When you find yourself in a situation where you need to return to a recent search, a function you may find useful would be to use the keyboard shortcut: CMD+G. This shortcut opens the previous search results screen if there was a cached search within the last 5 minutes. If there is no recent search cached, the search window will be pulled up blank for a quick clean search.
TIL: Cmd-. opens/closes the right sidebar in Slack. (Not what I would have guessed, being used to that keycombo’s “cancel current operation” classic Mac mapping.)

(There’s no menu item, but at least this one is actually doc’d in the in-app Cmd-? list, unlike the Cmd-g one to reopen the search dialog with the contents of a recent search.)

Other Software

TIL: If you click on a label swatch in Trello, they expand to show the text for the label on the cards. So much more readable at a glance!
TIL: Exchange encrypted messaging is darn clever. “Non-Office 365 message recipients can authenticate and read protected messages using their consumer Google or Yahoo accounts, in addition to a One-Time Passcode and a Microsoft account.” (you click a link in the email, get bounced to a webpage, auth, and boom, there’s your message) https://products.office.com/en-us/exchange/office-365-message-encryption
TIL: how to boss jira around with commit messages like i do pivotal: https://confluence.atlassian.com/jirasoftwarecloud/processing-issues-with-smart-commits-788960027.html

(didn’t know you could do that with pivotal, either? check out https://www.pivotaltracker.com/help/articles/github_integration/#using-the-github-integration-commits )

want to automate that by copying it in from the branch name? look into the prepare-commit-msg git hook
TIL: Vivaldi has named tab session support https://help.vivaldi.com/article/session-management/ (resembles tab groups in firefox when using the simple tab groups extension)
TIL: i misunderstood SalesForce relative date filters - the relative times are ranges, not instants:
- mistaken instant interpretation: i thought “date < NEXT 3 WEEKS” meant “date is before 3 weeks from now”, which would include events happening during the next 3 weeks. :x:
- but actually “date < NEXT 3 WEEKS” means “date is before the time range beginning midnight of the first day of next week and stretching for the 3 weeks after”, so “< NEXT 3 WEEKS” actually means “does not happen during the next 3 weeks.” !
(= with these ranges acts like “falls in” or “is during”, so trading < out for <= fixed my reporting bug.)

Read more: https://developer.salesforce.com/docs/atlas.en-us.soql_sosl.meta/soql_sosl/sforce_api_calls_soql_select_dateformats.htm
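One way to picture the range reading is to model NEXT 3 WEEKS as a half-open interval and define each operator against it. This Python sketch is only a mental model that matches the behavior described above, not Salesforce's implementation: the helper names are invented, and it assumes weeks start on Sunday, which may not match your org's locale.

```python
from datetime import date, timedelta


def next_n_weeks(today: date, n: int) -> tuple[date, date]:
    """Half-open range [start, end) for the n weeks after the current week.

    Assumption for illustration: weeks start on Sunday.
    """
    # date.weekday(): Monday is 0 and Sunday is 6.
    days_until_sunday = (6 - today.weekday()) % 7 or 7
    start = today + timedelta(days=days_until_sunday)
    return start, start + timedelta(weeks=n)


def is_before(day: date, rng: tuple[date, date]) -> bool:
    """Models "date < NEXT N WEEKS": strictly before the range starts."""
    return day < rng[0]


def is_during(day: date, rng: tuple[date, date]) -> bool:
    """Models "date = NEXT N WEEKS": falls inside the range."""
    return rng[0] <= day < rng[1]


def is_before_or_during(day: date, rng: tuple[date, date]) -> bool:
    """Models "date <= NEXT N WEEKS": before the range or inside it."""
    return day < rng[1]


today = date(2020, 3, 4)         # a Wednesday
window = next_n_weeks(today, 3)  # 2020-03-08 up to, not including, 2020-03-29
event = date(2020, 3, 16)        # squarely inside the next 3 weeks

print(is_before(event, window))            # False: "<" skips the event
print(is_during(event, window))            # True
print(is_before_or_during(event, window))  # True: "<=" picks it up
```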
TIL: you can get focus follows mouse on macOS if you use voiceover. Open VoiceOver Utility, pick Navigation from the sidebar, then check “Synchronize keyboard focus and VoiceOver cursor” and pick “Mouse pointer: Moves VoiceOver cursor”.
TIL: zoom supports chatbots for zoom chat https://marketplace.zoom.us/docs/guides/chatbots/sending-messages
TIL: if you search for “followup:actionitems” in Drive, it’ll list all the docs with comments assigned to you for action.
TIL: Pro Mouse’s overlays survive screensharing, so if you regularly need to do callouts on your screen, that might be handy. h/t Jacob Bullock
TIL: how to submit a Cigna Vision claim online. “Find a Form” only gives you a paper PDF. instead:
- log into cigna.com
- select Coverage > Vision
- Visit Cigna Vision
- select tab Claims & Reimbursement
- expand the third accordion, “Customer Reimbursement Form”
- click “Continue”
if you’re lucky, this’ll just jump you straight to the start of the form: https://cigna.vsp.com/out-of-network1.html
there’s an attachment limit of 3 files, with a max of 5 MB per file, and some filename restrictions (they don’t like $). but it’s way easier than mailing a paper form.
TIL: Pivotal Tracker can be told about deployments. https://www.pivotaltracker.com/blog/2020-01-24-ci-cd-integrations (The setup tool will write Jenkins Pipeline or Bash snippets for you, then you just paste ‘em into the right places.)

Bad Jokes

TIL: https://meet.google.com/ia-cthulhu-phtagn 404s. alas, He slumbers still.

Google Room and The Definitive Guide to SQLite

Sat, 01 Dec 2018 23:27:39 +0000

Why read a book on SQLite?

The short answer: Because I needed to write a migration for a Google Room database.

Limited past exposure

I’ve mostly worked on Mac & iOS apps:

iOS: The data all lives server-side, and persisting it is someone else’s problem.
macOS: A database, even an embedded one like SQLite, just never was the right tool.

What do you mean, not the right tool?

Some folks reach for a database straight out of the gate. (Most backend frameworks, like Rails, sure seem pretty keen on baking in that architectural decision.)

But simple file serialization is often a great choice:

easy to work with (cat, ls, rg)
simple to manage (rm, cp, mv, ed)
easy to backup and restore
no DBA headaches
good enough surprisingly often

Most static blog generators seem to agree with this sentiment.

Even in cases where you think you might eventually need a database, if you don’t jump to conclusions, you may just never get pushed into that corner. One example is Robert Martin’s story of the acceptance testing tool FitNesse: they put off picking a DB for a couple years, then ultimately realized the project didn’t need one after all. It shipped as, and still is, a flat-file wiki.

Yes, SQLite would still have DBA headaches

You’d think it wouldn’t. You might think the same thing about a Berkeley DB key–value store. But the problems still sneak up on ya. Don’t take my word for it – consider MJD’s thoughts at the end of a bug hunt undertaken to get his blog generator populating the “subtopics” sidebar again:

I am sick of DB files. I am never using them again. I have been bitten too many times. From now on I am doing the smart thing, by which I mean the dumb thing, the worse-is-better thing: I will read a plain text file into memory, modify it, and write out the modified version whem I am done. It will be simple to debug the code and simple to modify the database.

What about CoreData?

Cocoa’s CoreData defaults to using SQLite in its quest to provide object graph persistence. But that’s an implementation detail: the file format is undocumented and subject to change at Apple’s whim. (As a bonus, you get platform data lock-in from this. I’ve had cases where CoreData would have made sense, except that the file format needed to work outside of Apple platforms. Ah, well.) And the file format issues are in addition to the fun with thread containment, though that’s gotten to be substantially less of a problem over the years. So I had no reason to get cozy with SQLite to date, both because CoreData abstracts it away entirely and because even CoreData wound up not being the right tool for me entirely too often (did I mention I like flat files and avoiding data lock-in?).

But Now, Room

But Google’s blessed persistence framework is Room. Unlike CoreData, Room is explicitly a SQLite wrapper. It aims to smooth some rough edges (writing DDL, un/marshaling data between rows and POJOs) and codify best practices around using SQLite on a (maybe pretty crummy) mobile device.

Some Rough Edges Smoothed

It’ll write DDL for you, based on your entity classes, so you don’t have to. (You can even crib from this when writing your migration.)
It’ll check your query syntax at compile-time, so you don’t have to wait till runtime for an attempted query to blow up in your face.
It’ll save off the schema description for you as a simple numbered JSON file, so you can look at it, and so the provided test tooling can help you easily test your migrations.
It’ll un/marshal data between database rows and POJOs (or POKOs, I suppose, with Kotlin).
It’ll vend a reactive stream for your query (by watching the table, rerunning the query, and pushing out the new value), so you can easily bind your UI to the DB. You get your choice of receiving your reactive stream as the very rich RxJava 2 data types or as Google’s simpler LiveData.

Some Best Practices Codified

You can’t hang yourself with an N+1 query, because it just won’t do them for you.
Room will throw an exception if you try to work with the DB on the UI thread. No magic debugging preference is required; it just does this, all the time. (Though I’ll grant that its exception doesn’t have anywhere near the panache of __Multithreading_Violation_AllThatIsLeftToUsIsHonor__.)
It makes sure you don’t accidentally change something and forget to bump your schema version and provide a migration.

Some Rough Edges Remain

It’ll generate the schema for you, but to move between schemas, you’ve gotta write the up-migration by hand. (You don’t have to write a down-migration. Room doesn’t support down-migration. Ever onward!)

So that’s what led me to read through a book on SQLite over a couple evenings. What follows are the notes I jotted down when I finished using LogsIt.

Aside: Room’s Design Is Swell

I really like Room’s design. It builds on top of a rock-solid and well-understood piece of tech (SQLite), there’s zero magic, you can readily dump the DB and poke around using the sqlite3 CLI tool to test and explore your queries, it codifies rather than prescribes best practice…

My favoritest thing, though, is that it vends POJOs. You don’t have to wed the heart of your app to a vendor framework just to get easy queries and streaming UI updates for free. You don’t have thread-bound crash-bombs lobbing through 99% of your app, or fight to keep it at arm’s length (as Dave DeLong advocates in point 8 of “The Laws of Core Data”) to avoid that. There’s no “will it fault? will it boom?” concern. They’re just objects. Plain, simple objects.

Anyway, on to the reading notes.

Reading Notes

Read a Book: November 14, 2018 at 21:57 Notes: Mike Owens, Grant Allen. The Definitive Guide to SQLite, Second Edition. Apress, November 2010. 9781430232254. Via Safari.

Read for background when writing migrations for Android Room.

Skipped in-depth coverage of C API, of other language bindings, and of the iOS & Android walkthroughs. Also kinda glazed over the shared cache stuff.

New to me:

Manifest typing and coercions. Odd middle ground between strict and duck typing that ends up as “will store anything you throw at it, but acts like strict if you do too”.
Blobs degrade to a linked list when they exceed the db pagesize.
Gory details of locking schemes and failure modes. You can deadlock yourself if you use multiple connections. Fun times!
Limited use of indexes. Only plays in with equality tests in order. Skipping an indexed column or testing a different relation will push the rest to linear scans.
You prolly want to “begin immediate” for write workloads.
.dump to backup a db and .read to restore.
Some SQL I wasn’t super aware of: case, is/not null, nullif() and coalesce(), create table as select, everywhere you can ab/use subqueries, and the overall “fixed function pipeline” (to borrow an idea from older OpenGL) for queries, esp wrt group-by and having and aggregate functions. And how the temp_store supports that.
The execSql wrapper can take a list of semicolon-separated statements.
Enabling WAL can fail: Maybe the fs lacks a required feature.
WAL vs rollback journal tradeoffs. WAL reduces contention to writer-writer, though a checkpoint can trip a longer than desired pause, and an old but still open read transaction can prevent flushing data to the actual DB (and elevate read times due to seeking through scads of pages).
Don’t access a SQLite DB over a network filesystem. Just don’t.
Bump your cache to match largest write workload to keep exclusive lock used only for flushing. Use the analyzer to see how many pages that is.
Maybe match the DB pagesize to the OS. It’s 1024 by default, or a quarter of a 4 KiB Darwin page.
You can easily throw in custom functions, aggregates, and collations. SQLite is even more flexible than I knew! (Pair with check constraints for superpowers.)
Android CursorWindow maxes out at 1 MB of data. You can crash if you try to work with too much data.
Harder to discover introspection features:
- the sqlite_master table (not listed in .tables)
- pragmas table_info, index_list, index_info, database_list
- hooks and especially sqlite3_trace()
Many things about bound params:
- Manually numbered bound params: ?1, ?2. Lets you reuse an arg without having to bind it multiple times.
- Named bound params. :name or @name. You have to resolve them to numbers with an API call, but that’s way more readable. And plays nice with dictionaries.
- TCL bound params. $name. Says “capture from scope”. May only be used by the TCL library?
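Both of the lists above are easy to try out from Python’s built-in sqlite3 module. This is a minimal sketch, not code from the book: the notes table, its columns, and the sample values are made up, and Python’s wrapper resolves named parameters to indexes for you, where the C API has you look them up yourself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE notes (author TEXT, editor TEXT, body TEXT)")
conn.execute("CREATE INDEX notes_author ON notes (author)")

# Numbered param: ?1 is bound once but fills two columns.
conn.execute(
    "INSERT INTO notes (author, editor, body) VALUES (?1, ?1, ?2)",
    ("jeremy", "first note"),
)

# Named param: :who is looked up by name in the dict.
row = conn.execute(
    "SELECT author, editor, body FROM notes WHERE author = :who",
    {"who": "jeremy"},
).fetchone()
print(row)  # ('jeremy', 'jeremy', 'first note')

# sqlite_master has one row per schema object, even though .tables hides it.
schema = conn.execute(
    "SELECT type, name FROM sqlite_master ORDER BY name"
).fetchall()
print(schema)  # [('table', 'notes'), ('index', 'notes_author')]

# The pragmas return result rows, so they can be read like any other query.
columns = [r[1] for r in conn.execute("PRAGMA table_info(notes)")]
indexes = [r[1] for r in conn.execute("PRAGMA index_list(notes)")]
index_columns = [r[2] for r in conn.execute("PRAGMA index_info(notes_author)")]
print(columns, indexes, index_columns)
```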

What could MPI's evolution say about when to use CSP-style channels?

Thu, 27 Sep 2018 14:17:11 +0000

The reasons MPI is moving from two-sided to one-sided communication may be interesting in light of programs that may use CSP-style channels for communication rather than synchronization/signaling:

The main communication paradigm for MPI point-to-point communication has been two-sided communication, where a send call at the source is matched by a receive call at the destination. This paradigm has weaknesses: The complex matching rules of sends to receives result in significant software overheads, especially for receive operations; overlap of communication and computation requires the presence of an asynchronous communication agent that can poll queues concurrently with ongoing computation; and send-receive communication either requires an extra copying of messages (eager protocol) or extra handshakes between sender and receiver (rendezvous protocol).

(Marc Snir in “Technical Perspective: The Future of MPI”, CACM Oct 2018)

Mixed-Language Testing: Fixing "fatal error: 'App-Swift.h' file not found"

Mon, 07 May 2018 18:05:21 +0000

If you have an Obj-C project that’s grown some Swift, you might find yourself with some Obj-C test code that needs to talk to some Swift code.

You’re likely to run into just one little problem. You can’t import the header that actually declares the Obj-C interfaces to all that Swift code.

More precisely, the compiler can’t find the header, even though you know it’s there, because Obj-C code in your main app is using it left and right.

You’re looking at an error like this:

In file included from
    /Users/jeremy/Downloads/MixedLanguageTesting/MixedLanguageTestingTests/ObjCTests.m:10:
/Users/jeremy/Downloads/MixedLanguageTesting/MixedLanguageTesting/ObjCClassUsingSwiftClass.h:15:9:
fatal error: 'MixedLanguageTesting-Swift.h' file not found
#import "MixedLanguageTesting-Swift.h"
        ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
        1 error generated.

No good, that. Luckily, the Usual Fix works.

The Usual Fix: Fix your header search path!

A file is there. A compiler cannot find the file. You need to tell the compiler where to look to find the file. That is what search paths do.

In particular, you probably want USER_HEADER_SEARCH_PATHS here, which tells the compiler what paths to hunt through when searching for user, rather than system, headers. (Those would be the ones you import like "user.h" versus <system.h>, hence the actual build flag’s name, -iquote.)

That path: The app target’s derived sources directory

You need your test target to have a look through the app target’s derived sources directory.

To do that, edit your project settings. (Or each and every test target’s settings. But that just sounds painful.) Search for the USER_HEADER_SEARCH_PATHS setting, open up the setting, and add this line:

"$(CONFIGURATION_TEMP_DIR)/MixedLanguageTesting.build/DerivedSources"

Only, rewrite that “MixedLanguageTesting” bit to match your app target’s name.

Behind the Scenes

If you want to see how I arrived at this solution, I happen to have checkpointed my debugging steps using git. You’ll find my trail over at GitHub at jeremy-w/MixedLanguageTesting.

Downtime & Recovery: Bludgeoning DigitalOcean Arch into Working Again

Tue, 01 May 2018 15:09:10 +0000

This blog had some downtime yesterday.

I have a DigitalOcean Droplet that’s an Arch x86-64 system, from back before they dropped Arch.

When I went to poke certbot into renewing my LetsEncrypt cert, I found the system was down. Power cycle, and…it’s still down.

Console Spew: Not Promising

What follows are my notes from getting things working again. I ran into a couple dark corners where I found no search hits, so maybe this’ll help the next poor sap.

Goal: Blog Accessible from Phone

The main thing the VPS runs is my blog. The other main thing is ZNC (an IRC bouncer). I really just care about getting the blog back up immediately, though. As far as I’m concerned, that means my phone connected to the cellular network can resolve the domain and see the latest version of the blog content.

Culprit: Systemd Is Hosed

My systemd got hosed somehow when the system tried to boot back up. I reach this conclusion from the fact that all the console errors are from systemd, and that a systemd that freezes execution means a system bootstrap that froze execution.

This looks to be the same issue described by a DigitalOcean Community user.

Bad news: No-one had a solution beyond “nuke & pave”. Hooray.

For added fun, I’m traveling and away from my backups. All I have is an ancient snapshot from 2014.

Maybe that’s all I need?

Nuke & Pave? Ancient Droplet Strikes Again

Bah, enough with an unsupported platform! I don’t get any of the new DO goodies with Arch.

What about just switching to FreeBSD?

I tried to flash with FreeBSD, but turns out you can’t rebuild it with a modern Droplet image unless it was seeded with an SSH key to begin with. You get a fun flash message at a weird place on the screen (or eventually, off the screen, if you keep trying long enough):

“Data image requires at least one SSH key”

I sure as heck have an SSH key on file with them, and I confirmed the (MD5? not SHA-256? Huh) fingerprint as matching the one I have on disk here still.

But that wasn’t quite what was meant. As DigitalOcean support says:

Sadly unless the Droplet was originally created using the SSH key in your Cloud panel, you won’t be able to rebuild it in place with the FreeBSD image. Instead, you’ll have to create a new Droplet.

Guess the key got baked in somewhere deep. Or it didn’t, in my case. Lame sauce.

So back to Arch for now, because I don’t want to have to repoint my DNS and do a bunch more clicky-clicky to rig up an entirely new host.

Recovery: Successfully Time-Traveled to 2014

Restore from ancient snapshot of 2014-03-19 using the DigitalOcean Web UI. (Yeah, probably should have taken a snapshot a bit more recently. Oh well.)

ssh in. I can’t install certbot.

Get Certbot Installed

Let’s try to update to a modern toolchain so I can follow those recommended steps.

System Update Fails: GPG Key Import Fails

Keys are out of date, so system update with pacman -Syyu fails after restoring from 4-year-old snapshot.

Let’s refresh those keys:

pacman-key --refresh-keys
pacman -S archlinux-keyring

Now try to update system again. Continues failing out when I tell it, “Sure, import the key.” I can’t find anything helpful searching the Web.

Yank pacman source and rip-grep for the error message and wind up staring at libalpm/signing.c:460:

/**
 * Import a key defined by a fingerprint into the local keyring.
 * @param handle the context handle
 * @param fpr the fingerprint key ID to import
 * @return 0 on success, -1 on error
 */
int _alpm_key_import(alpm_handle_t *handle, const char *fpr)
/* SNIP */
if(key_import(handle, &fetch_key) == 0) {
	ret = 0;
} else {
	_alpm_log(handle, ALPM_LOG_ERROR,
			_("key \"%s\" could not be imported\n"), fetch_key.uid);
}
/* SNIP */

Maybe it requires a newer version of GPG or something to be able to import the key that I need to install that newer version of GPG. Hard to say: The error message says nothing beyond “import failed”, with no info about why or how it failed.

Disable Signature Verification

That’s a dead-end: There’s no clue what’s wrong, so there’s no clue how to move past it.

Bypass the failing step entirely with:

vi /etc/pacman.conf
/SigLevel
f=
c$
= Never
^[
:wq
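
For anyone who would rather not replay vi keystrokes, the same change can be sketched with sed. This is an equivalent I did not run on the original box; it assumes a sed that accepts -E and -i with a backup suffix (GNU and BSD sed both do), and the scratch file with its sample SigLevel line is a stand-in for /etc/pacman.conf:

```shell
# Work on a scratch copy; on the real system the target would be /etc/pacman.conf.
conf=$(mktemp)
printf '[options]\nSigLevel    = Required DatabaseOptional\n' > "$conf"

# Rewrite the uncommented SigLevel line so pacman skips signature checks.
# -i.bak keeps the untouched original alongside as "$conf.bak".
sed -E -i.bak 's/^SigLevel[[:space:]]*=.*/SigLevel = Never/' "$conf"

cat "$conf"
```

The .bak copy makes it easy to put the original SigLevel line back once the update is done.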

System Update Fails: File Conflicts

Now we hit conflicting files:

error: failed to commit transaction (conflicting files)
ca-certificates-utils: /etc/ssl/certs/ca-certificates.crt exists in filesystem
lzo: /usr/include/lzo/lzo1.h exists in filesystem
lzo: /usr/include/lzo/lzo1a.h exists in filesystem
lzo: /usr/include/lzo/lzo1b.h exists in filesystem
lzo: /usr/include/lzo/lzo1c.h exists in filesystem
lzo: /usr/include/lzo/lzo1f.h exists in filesystem
lzo: /usr/include/lzo/lzo1x.h exists in filesystem
lzo: /usr/include/lzo/lzo1y.h exists in filesystem
lzo: /usr/include/lzo/lzo1z.h exists in filesystem
lzo: /usr/include/lzo/lzo2a.h exists in filesystem
lzo: /usr/include/lzo/lzo_asm.h exists in filesystem
lzo: /usr/include/lzo/lzoconf.h exists in filesystem
lzo: /usr/include/lzo/lzodefs.h exists in filesystem
lzo: /usr/include/lzo/lzoutil.h exists in filesystem
lzo: /usr/include/lzo/minilzo.h exists in filesystem
lzo: /usr/lib/liblzo2.so exists in filesystem
lzo: /usr/lib/liblzo2.so.2 exists in filesystem
lzo: /usr/lib/liblzo2.so.2.0.0 exists in filesystem
lzo: /usr/lib/libminilzo.so exists in filesystem
lzo: /usr/lib/libminilzo.so.0 exists in filesystem
lzo: /usr/share/doc/lzo/AUTHORS exists in filesystem
lzo: /usr/share/doc/lzo/COPYING exists in filesystem
lzo: /usr/share/doc/lzo/LZO.FAQ exists in filesystem
lzo: /usr/share/doc/lzo/LZO.TXT exists in filesystem
lzo: /usr/share/doc/lzo/LZOAPI.TXT exists in filesystem
lzo: /usr/share/doc/lzo/NEWS exists in filesystem
lzo: /usr/share/doc/lzo/THANKS exists in filesystem
Errors occurred, no packages were upgraded.

Let’s see what owns those:

[root@gateway-arch jeremy]# pacman -Qo /usr/include/lzo/lzo1.h
/usr/include/lzo/lzo1.h is owned by lzo2 2.06-3
[root@gateway-arch jeremy]# pacman -Qo /etc/ssl/certs/ca-certificates.crt
error: No package owns /etc/ssl/certs/ca-certificates.crt

But when I try to pacman -U lzo2, it acts like no such thing exists. Sigh.

OK, the cert one is a known issue. Gotta love rolling releases. The fix is to delete the conflicting file before upgrading.

For lzo, I ultimately did pacman -S --force core/lzo. It runs a risk of clobbering the wrong thing and hosing all the things, but it seemed a calculated risk, since it’s basically “clobber lzo as installed with an older package name with lzo as installed with the new package name”. The risk paid off, so.

OK, Now Really, Update

Then I could pacman -Su. Finally. And all the things updated.

Rebuilding the man-db database took a century. I worried something had gone wrong, but it hadn’t.

Install Certbot

[root@gateway-arch jeremy]# pacman -S certbot-nginx
error: failed to initialize alpm library
(database is incorrect version: /var/lib/pacman/)
try running pacman-db-upgrade
[root@gateway-arch jeremy]# pacman-db-upgrade
==> Pre-4.2 database format detected - upgrading...
[root@gateway-arch jeremy]# pacman -S certbot-nginx

OK, now we can follow the docs on LetsEncrypt.

Run Certbot

Unicode Issues: Set LC_ALL=en_US.utf_8

Except that Unicode is fun fun fun:

[root@gateway-arch jeremy]# sudo certbot --nginx
Saving debug log to /var/log/letsencrypt/letsencrypt.log
An unexpected error occurred:
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 10453: ordinal not in range(128)
Please see the logfiles in /var/log/letsencrypt for more details.

Matching issue: https://github.com/certbot/certbot/issues/5236

Looks like there are smart quotes in the default template for nginx.conf, perhaps? Not in my config that I can see.

And that byte is at a gibberish offset.

And I confirmed that Python is happy to read in my nginx.conf as ascii.

OK, whatever. Mucking around with PYTHONIOENCODING=utf8 didn’t help.

So instead, play with locale. locale reports we’re in C. locale -a gives us a utf-8 option. Set that.

Now it runs.
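
In shell terms, that amounts to something like the following. The helper name pick_utf8_locale is made up for this sketch, and the exact locale name differs from system to system, so it gets picked out of the locale -a listing rather than hard-coded:

```shell
# Hypothetical helper: read locale names on stdin, print the first UTF-8 one.
pick_utf8_locale() {
  grep -i -E '\.utf-?8$' | head -n 1
}

# Illustration with a canned listing; on a real system, pipe in `locale -a`.
printf 'C\nPOSIX\nen_US.utf8\n' | pick_utf8_locale

# Then set the locale for just the one command, e.g.:
#   LC_ALL=$(locale -a | pick_utf8_locale) certbot --nginx
```

Setting LC_ALL on the command line like this leaves the rest of the session in the C locale.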

HTTPS Certificate Trust Anchor: Missing

Can’t get an HTTPS cert because of HTTPS certs. Delicious.

[root@gateway-arch jeremy]# certbot --nginx
Saving debug log to /var/log/letsencrypt/letsencrypt.log
Plugins selected: Authenticator nginx, Installer nginx
Enter email address (used for urgent renewal and security notices) (Enter 'c' to
cancel): ********
An unexpected error occurred:
OSError: Could not find a suitable TLS CA certificate bundle, invalid path: /etc/ssl/certs/ca-certificates.crt
Please see the logfiles in /var/log/letsencrypt for more details.

That path definitely exists. It is a symlink, though, to be fair.

Spewed out when requests is trying to check the cert on the connection to the ACME backend, per that debug logfile:

2018-05-01 03:46:44,110:DEBUG:certbot.plugins.selection:Selected authenticator <certbot_nginx.configurator.NginxConfigurator object at 0x7f11271de5c0> and installer <certbot_nginx.configurator.NginxConfigurator object at 0x7f11271de5c0>
2018-05-01 03:46:44,110:INFO:certbot.plugins.selection:Plugins selected: Authenticator nginx, Installer nginx
2018-05-01 03:48:40,618:DEBUG:acme.client:Sending GET request to https://acme-v01.api.letsencrypt.org/directory.

Oh, huh. The file pointed to by the symlink? That one doesn’t exist.

And pacman -Ql ca-certificates, which should list all the files installed by that package, shows zip. Nada. Nothing: ca-certificates is an empty package.
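
A dangling symlink like this is easy to miss, because the link itself is right there when you list the directory. Here is a small sketch of a check for it; the function name and the scratch paths are invented for the example, standing in for /etc/ssl/certs/ca-certificates.crt:

```shell
# Succeeds when "$1" is a symlink whose target is missing.
is_dangling_symlink() {
  [ -L "$1" ] && [ ! -e "$1" ]
}

# Recreate the situation in a scratch directory.
dir=$(mktemp -d)
ln -s "$dir/missing-bundle.crt" "$dir/ca-certificates.crt"

if is_dangling_symlink "$dir/ca-certificates.crt"; then
  echo "dangling: $dir/ca-certificates.crt"
fi
```

The -L test looks at the link itself, while -e follows it, which is why the pair together singles out links that point nowhere.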

Install More Certificates

Luckily, it’s not the only ca-certificates package available:

pacman -S ca-certificates-cacert ca-certificates-mozilla
[root@gateway-arch jeremy]# pacman -Ss ca-cert
core/ca-certificates 20170307-1 [installed]
    Common CA certificates (default providers)
core/ca-certificates-cacert 20140824-4 [installed]
    CAcert.org root certificates
core/ca-certificates-mozilla 3.36.1-1 [installed]
    Mozilla's set of trusted CA certificates
core/ca-certificates-utils 20170307-1 [installed]
    Common CA certificates (utilities)

Now let’s try again.

Success!

Modulo the fact this is an old nginx.conf that wasn’t updated for http2. I’ll just fish those updates out of my backup later. (I don’t recall it being difficult at all to update from spdy to http2, but I’m out of steam at this point.)

[root@gateway-arch jeremy]# certbot --nginx
Saving debug log to /var/log/letsencrypt/letsencrypt.log
Plugins selected: Authenticator nginx, Installer nginx
Enter email address (used for urgent renewal and security notices) (Enter 'c' to
cancel): *******
/usr/lib/python3.6/site-packages/josepy/jwa.py:107: CryptographyDeprecationWarning: signer and verifier have been deprecated. Please use sign and verify instead.
  signer = key.signer(self.padding, self.hash)

-------------------------------------------------------------------------------
Please read the Terms of Service at
https://letsencrypt.org/documents/LE-SA-v1.2-November-15-2017.pdf. You must
agree in order to register with the ACME server at
https://acme-v01.api.letsencrypt.org/directory
-------------------------------------------------------------------------------
(A)gree/(C)ancel: a

-------------------------------------------------------------------------------
Would you be willing to share your email address with the Electronic Frontier
Foundation, a founding partner of the Let's Encrypt project and the non-profit
organization that develops Certbot? We'd like to send you email about EFF and
our work to encrypt the web, protect its users and defend digital rights.
-------------------------------------------------------------------------------
(Y)es/(N)o: n

Which names would you like to activate HTTPS for?
-------------------------------------------------------------------------------
1: jeremywsherman.com
2: www.jeremywsherman.com
-------------------------------------------------------------------------------
Select the appropriate numbers separated by commas and/or spaces, or leave input
blank to select all options shown (Enter 'c' to cancel):
Obtaining a new certificate
Performing the following challenges:
http-01 challenge for jeremywsherman.com
http-01 challenge for www.jeremywsherman.com
2018/05/01 03:59:59 [warn] 18919#18919: invalid parameter "spdy": ngx_http_spdy_module was superseded by ngx_http_v2_module in /etc/nginx/nginx.conf:65
2018/05/01 03:59:59 [warn] 18919#18919: could not build optimal types_hash, you should increase either types_hash_max_size: 1024 or types_hash_bucket_size: 64; ignoring types_hash_bucket_size
2018/05/01 03:59:59 [notice] 18919#18919: signal process started
Waiting for verification...
/usr/lib/python3.6/site-packages/josepy/jwa.py:107: CryptographyDeprecationWarning: signer and verifier have been deprecated. Please use sign and verify instead.
  signer = key.signer(self.padding, self.hash)
Cleaning up challenges
2018/05/01 04:00:06 [warn] 18922#18922: invalid parameter "spdy": ngx_http_spdy_module was superseded by ngx_http_v2_module in /etc/nginx/nginx.conf:55
2018/05/01 04:00:06 [warn] 18922#18922: could not build optimal types_hash, you should increase either types_hash_max_size: 1024 or types_hash_bucket_size: 64; ignoring types_hash_bucket_size
2018/05/01 04:00:06 [notice] 18922#18922: signal process started
/usr/lib/python3.6/site-packages/josepy/jwa.py:107: CryptographyDeprecationWarning: signer and verifier have been deprecated. Please use sign and verify instead.
  signer = key.signer(self.padding, self.hash)
Deploying Certificate to VirtualHost /etc/nginx/nginx.conf
Deploying Certificate to VirtualHost /etc/nginx/nginx.conf
2018/05/01 04:00:11 [warn] 18924#18924: invalid parameter "spdy": ngx_http_spdy_module was superseded by ngx_http_v2_module in /etc/nginx/nginx.conf:55
2018/05/01 04:00:11 [warn] 18924#18924: could not build optimal types_hash, you should increase either types_hash_max_size: 1024 or types_hash_bucket_size: 64; ignoring types_hash_bucket_size
2018/05/01 04:00:11 [notice] 18924#18924: signal process started

Please choose whether or not to redirect HTTP traffic to HTTPS, removing HTTP access.
-------------------------------------------------------------------------------
1: No redirect - Make no further changes to the webserver configuration.
2: Redirect - Make all requests redirect to secure HTTPS access. Choose this for
new sites, or if you're confident your site works on HTTPS. You can undo this
change by editing your web server's configuration.
-------------------------------------------------------------------------------
Select the appropriate number [1-2] then [enter] (press 'c' to cancel): 1

-------------------------------------------------------------------------------
Congratulations! You have successfully enabled https://jeremywsherman.com and
https://www.jeremywsherman.com

You should test your configuration at:
https://www.ssllabs.com/ssltest/analyze.html?d=jeremywsherman.com
https://www.ssllabs.com/ssltest/analyze.html?d=www.jeremywsherman.com
-------------------------------------------------------------------------------

IMPORTANT NOTES:
 - Congratulations! Your certificate and chain have been saved at:
   /etc/letsencrypt/live/jeremywsherman.com/fullchain.pem
   Your key file has been saved at:
   /etc/letsencrypt/live/jeremywsherman.com/privkey.pem
   Your cert will expire on 2018-07-30. To obtain a new or tweaked
   version of this certificate in the future, simply run certbot again
   with the "certonly" option. To non-interactively renew *all* of
   your certificates, run "certbot renew"
 - Your account credentials have been saved in your Certbot
   configuration directory at /etc/letsencrypt. You should make a
   secure backup of this folder now. This configuration directory will
   also contain certificates and private keys obtained by Certbot so
   making regular backups of this folder is ideal.
 - If you like Certbot, please consider supporting our work by:

   Donating to ISRG / Let's Encrypt:   https://letsencrypt.org/donate
   Donating to EFF:                    https://eff.org/donate-le

I declined the redirect because I already have that in place.

The new lines added to the server stanza are:

    ssl_certificate /etc/letsencrypt/live/jeremywsherman.com/fullchain.pem; # managed by Certbot
    ssl_certificate_key /etc/letsencrypt/live/jeremywsherman.com/privkey.pem; # managed by Certbot

Restart Nginx

systemctl restart nginx

And now my blog is up and running again.

Total time cost: 2 hours-ish. Guess that coulda been worse.

Addendum: Keeping Cron Running

I kept seeing cron failing out on me, and my LetsEncrypt certs not auto-renewing. (On the bright side, that’s how I found out everything was hosed this time.)

Let’s see if we can fix that.

Which Cron?

What cron are we running?

[root@gateway-arch jeremy]# pacman -Qs cron
local/cronie 1.5.1-1
    Daemon that runs specified programs at scheduled times and related tools

Cronie.

Who’s That to Systemd?

And how’s it wired into systemd?

[root@gateway-arch jeremy]# pacman -Ql cronie | grep systemd
cronie /usr/lib/systemd/
cronie /usr/lib/systemd/system/
cronie /usr/lib/systemd/system/cronie.service

As unit cronie.service.

Status Shows Errors

Let’s flip it on and check its status.

[root@gateway-arch jeremy]# systemctl enable cronie
[root@gateway-arch jeremy]# systemctl status cronie
* cronie.service - Periodic Command Scheduler
   Loaded: loaded (/usr/lib/systemd/system/cronie.service; enabled; vendor preset: disabled)
   Active: active (running) since Tue 2018-05-01 02:57:13 UTC; 11h ago
 Main PID: 169 (crond)
   CGroup: /system.slice/cronie.service
           `-169 /usr/bin/crond -n

May 01 13:01:01 gateway-arch crond[22224]: PAM unable to dlopen(/usr/lib/security/pam_unix.so): /usr/lib/libpam.so.0: version >
May 01 13:01:01 gateway-arch crond[22224]: PAM adding faulty module: /usr/lib/security/pam_unix.so
May 01 13:05:01 gateway-arch crond[22250]: PAM unable to dlopen(/usr/lib/security/pam_unix.so): /usr/lib/libpam.so.0: version >
May 01 13:05:01 gateway-arch crond[22250]: PAM adding faulty module: /usr/lib/security/pam_unix.so
May 01 14:01:01 gateway-arch crond[22590]: PAM unable to dlopen(/usr/lib/security/pam_unix.so): /usr/lib/libpam.so.0: version >
May 01 14:01:01 gateway-arch crond[22590]: PAM adding faulty module: /usr/lib/security/pam_unix.so
May 01 14:05:01 gateway-arch crond[22616]: PAM unable to dlopen(/usr/lib/security/pam_unix.so): /usr/lib/libpam.so.0: version >
May 01 14:05:01 gateway-arch crond[22616]: PAM adding faulty module: /usr/lib/security/pam_unix.so
May 01 14:51:01 gateway-arch crond[169]: (root) CAN'T OPEN (/etc/crontab): No such file or directory
May 01 14:51:01 gateway-arch crond[169]: (root) RELOAD (/var/spool/cron/root)

Point one, vendor preset is disabled. So that explains why I kept seeing crond fizzle out on me. I wanted to use cron but then never enabled it with systemd. (And I’m sure it’s disabled because I ought to be writing a systemd unit file. But I’m not going to just now.)

Point two, there are a couple errors.

The /etc/crontab error is weird - looking at the full -Ql output shows stuff in /etc/anacrontab, not /etc/crontab. I’m betting this is an older cron process that started before I updated all my packages from their 2014 vintage flavors.

The PAM error looks grungy and noisy. A quick search through man systemctl didn’t show me how to change the character width so I could see the rest of the PAM module error, but searching found a good hit for dlopen failure of pam_unix.so.

Consensus is: You had a glibc update. Now a process still using the older glibc is trying to open a shared object using a newer one.

Fix: systemctl restart cronie
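
As for seeing those cut-off PAM lines in full: the truncation comes from the pager and from systemctl shortening long lines, and both can be switched off. A sketch, wrapped in a made-up helper named show_unit_log so it is easy to reuse; I did not run this on the box at the time:

```shell
# Hypothetical helper: show a unit's status and recent log lines untruncated.
show_unit_log() {
  # --no-pager stops the pager from chopping long lines;
  # --full stops systemctl from ellipsizing them.
  systemctl --no-pager --full status "$1"
  # The same messages straight from the journal, last 20 lines.
  journalctl --no-pager -u "$1" -n 20
}

# Usage on a systemd machine:
#   show_unit_log cronie
```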

Restarting the Service Fixes All Errors

And indeed, that does it:

[root@gateway-arch jeremy]# systemctl status cronie
* cronie.service - Periodic Command Scheduler
   Loaded: loaded (/usr/lib/systemd/system/cronie.service; enabled; vendor preset: disabled)
   Active: active (running) since Tue 2018-05-01 14:54:35 UTC; 17s ago
 Main PID: 22985 (crond)
   Memory: 612.0K
   CGroup: /system.slice/cronie.service
           `-22985 /usr/bin/crond -n

May 01 14:54:35 gateway-arch systemd[1]: Stopped Periodic Command Scheduler.
May 01 14:54:35 gateway-arch systemd[1]: Started Periodic Command Scheduler.
May 01 14:54:35 gateway-arch crond[22985]: (CRON) INFO (RANDOM_DELAY will be scaled with factor 96% if used.)
May 01 14:54:35 gateway-arch crond[22985]: (CRON) INFO (running with inotify support)
May 01 14:54:35 gateway-arch crond[22985]: (CRON) INFO (@reboot jobs will be run at computer's startup.)

No more errors. Good to go!

Total time cost: 20 minutes.

Cards: Closing Out a Wikipedia Crawl

Sat, 20 May 2017 00:00:00 +0000

I don’t recall how it started, but I’m at the tail end of a several-day Wikipedia crawl through playing cards and card games. Read on for fun trivia, neat games you might not have heard of, and, if your playing card experience is primarily American, probably the destruction of a lot of your assumptions around what a “standard deck of cards” is.

Fun Trivia

These mostly reflect my interest in how we ended up with the 52-card, four-suit deck I grew up playing cards with.

The Tarot trumps started life as a dedicated trump suit for Italian playing cards. France imported them from Italy, some folks started using them for cartomancy in the XVIIIth Century, and then that got exported to England.
Those “exotic” suits of Wands, Cups, Pentacles/Coins, and Swords? Yeah, those turn out to just be the standard “Latin” suits in use in Italy, and still used some today in Italy, Spain, and thereabouts.
Corner indices - a small notation of the suit and rank at the corners of the cards, to let you easily read your hand with the cards fanned, without having to study the whole face over - are kinda recent. They seem most common with Anglo/French cards, and you can still find other flavors of decks without them.
Corner indices are what drove the adoption in English of “Jack” as the name for the lowest court card. Before that, “Jack” was kinda slightly scandalous/lower class, and everyone called them “Knaves”, even up into the XIXth Century it seems. But it turned out to be way too easy to confuse the “K” and the “Kn” when you had your cards fanned, so, out with the Knave, and in with the Jack!
French suits - hearts, diamonds, clubs, and spades - partly took off because they are such simple shapes that they make it easy to just stamp out the pip cards. This made them a lot cheaper to produce. No fancy crossed seven clubs art needed!
The English names for French suits reflect that English card playing started with the Latin suits. What we call “clubs” and “spades”, the French call “clovers” and “pikes”. The corresponding Latin suits actually are clubs and swords, and “spades” is just a name for a kind of sword.
The French suits actually evolved out of the German suits of Hearts, Bells, Acorns, and Leaves. (The Swiss swap Hearts for Roses and Leaves for Shields, because they’re cool like that.)
You can still buy German-suited decks today! And some of the German games get played with both systems, and there’s even a hybrid, compromise deck (used in tournaments for the German card game Skat) that uses French suit symbols with German suit colors.
Cards are used to play card games, and card games can be used for gambling, so people sometimes banned cards. So you’d sometimes see tiles pop up, or a renewed interest in Dominos, or, if you’re purebred American Puritan, you’ve got your XXth Century Rook deck, with 4 colored suits (just colors: black, red, yellow, and green) of 14 cards (which matches a tarot deck having four court cards per suit of jack, knight, queen, king, or similar), plus a blue Rook card that acts kinda like a trump. It’s mostly just used nowadays (and maybe ever?) to play a card game named after the deck.
The “red suits” (vs the “black suits”, or if you fancy the Latin decks, the “long suits” vs “round suits”) used to rank their pip cards the other way around, where 1 (ace) is high and 10 is low, rather than 10 high and 1 low. This old-school inverted ranking is still used when playing modern-day French Tarot!
Jokers started life as a “super trump” for Euchre. “The Imperial Bower” takes all comers. Then they spread from there, till you can find some games having six jokers in play. Because they’re such a recent card, their design is not standardized, but usually, in two-joker decks, there’s a “greater” and a “lesser” joker, where the lesser either is uncolored or has the deck’s guarantee printed on it or whatever. This lets you rank the two jokers against each other for cases where they’re at the top of the trumps.
The French court cards are considered as depicting specific historical/literary characters, rather than just being anonymous stand-ins. Decks sometimes have the character’s name written vertically along the edge.
“Stripped decks”, where you play with less than 52 cards, sometimes way less, by throwing out whole ranks across all the suits, are actually pretty darn common! You see this in some games played with an Anglo-French deck, like Euchre, but for some other-suited decks, you’d be hard pressed to find anyone selling a deck with all 52 cards, simply because the most popular games played with those decks (like Skat with the German deck) don’t need that many cards. This goes for, well, basically every deck flavor other than the Anglo-French – at least, that’s the impression I’ve retained after browsing around. That includes most Tarot card decks.

Interesting Games

Most of these stuck out for me based on national popularity. Like, “Whoah, there’s this game I never knew existed, and it’s the biggest card game ever in (some country/region)?”

The big German games are Skat, Doppelkopf, and Schafkopf. Skat especially is big, with an active playing community and standardized rules for competition.
This snippet of Wikipedia just tickled my fancy:

In Germany, Schafkopf is not deemed a gambling game and can therefore be legally played for money. Especially in Bavaria it is normally played for small amounts of money to make it more interesting and the players more focused. Normal rates are 10 Euro cents for normal and 50 for solo games.
You need like 5-7 riffle shuffles to properly randomize a deck. Using a machine saves your hands and gives you a seriously randomized deck, which is why you see them used so often in gambling games.
True randomness isn’t even really desirable in some card games! The Grand Skat Authority (not their actual name) actually ruled against using a shuffling machine, since it’d introduce too much randomness in the deck. On the far side of “eh, random shuffling, meh”, the rules of Belote straight up bar shuffling between hands: You just cut the deck before beginning dealing for the hand, instead.
The big French games look to be Belote and French Tarot these days. Formerly, Piquet was pretty big, and it had its hey-day in England, as well.
The big Eastern European game is Durak, or “Fool”, which is an interesting shedding-type game (your aim: end up holding no cards, as fast as you can). In characteristic Eastern European style, it seems the game has no winners, only a designated loser, the fool who’s the last one left holding cards at the end of the game.
I didn’t touch on East Asia in the card deck bit, but they’ve got their own thing going on, with games you mostly won’t find in the US, aside from a Hanafuda import popular in Hawaii called Koi-Koi. Japanese deck evolution was driven by the Tokugawa Shogunate’s anti-Western bent, though there are also games that derive from a kind of “finish the poem” matching game mold. I often found the descriptions on Wikipedia of these games kind of hard to follow, probably because they’ve received less attention - I was reading the English-language Wikipedia, after all, and it has its biases.
The play style I’m familiar with in Euchre is pretty similar to a class of games related to Whist. Hearts is a “negative game” where the goal is to not score points (“card golf”), while Spades seems like it might be treated as Bridge with Training Wheels?
Euchre has a less confusing recent variant called Bacon, which doesn’t have the suit-shifting fun times of the Left Bower.
The game Ombre (“I’m l’Hombre, err, the man!”) introduced auctions into card games.
Whist eventually evolved into Contract Bridge, which seems completely bonkers from the outside, especially once you run into the notion of Brown-Sticker and Yellow-Sticker bidding conventions/tendencies.
Scarto is a three-player Tarot game that’s supposed to be an easy starting point if you want to get into that class of game. Loser of the game buys the next round of beers. What’s not to love?
Cribbage is supposedly kinda big in England? I know of the game, but haven’t really ever played it, though I recall puzzling over a cribbage board my mom had once or twice as a kid.
Cribbage is also apparently kinda “The Game” for US submariners, and they keep a WWII-vintage cribbage board in the wardroom of the oldest submarine, and pass it on to the now-oldest whenever that one gets decommissioned.
Scopa is a major Italian game, also apparently kinda popular in Brazil and some parts of the US. It’s a kind of matching/capturing game.
I grew up playing “plain trick games”, where scoring is basically just who got how many tricks, with maybe some bonuses for a shut-out or playing alone, but nothing fancy. Loads of European games, though, are what are called “point-trick” games, where the scoring depends on the specific cards captured and their assigned point-values. Sounds like kind of a pain to track to me, but I’m sure it becomes easy-peasy after a few evenings, especially given their widespread popularity!
OK, in case point-trick games weren’t complicated enough, a ton of games also include “declarations”, where you can score points for holding certain combos of cards, at the price of revealing info about your hand to your opponents. And some games (OK, mostly just the German games!) have several different “game modes”, where the flavor of the game varies based on deal or sometimes choice, or sometimes it just cycles through the game flavors. Forcing cycling seems more common in competitive play, with stuff like, “everyone has to trigger this one game mode at least once” popping up across several different games.

Conclusions

There are some really cool looking decks with suits I knew nothing about out there. Many wouldn’t be too useful for playing the card games I’m used to, due to their starting life as a stripped deck for playing the games most popularly played in their region.
The big regional games are probably worth looking at if you’re looking for something new to play:
- Skat
- Scopa
- Belote
- French Tarot
- Durak
- Cribbage
Contract Bridge is, uh, shall we say, “baroque”. Think “C++ template metaprogramming”. It’s at the end of a long evolution begun with Whist, but it seems frozen as a result of competitive international play. If you really want to lose yourself in detail, though, this might be the game for you!

Microservices vs Distributed Objects

Wed, 08 Mar 2017 00:00:00 +0000

Distributed objects died out eventually; you can’t really hide the network layer without changing your system design to match. Here’s a Cocoa take. And here’s a Martin Fowler take found via the article below, with a sidebar suggesting a remote façade (to provide a coarse API as a remote endpoint) and data transfer objects (to provide coarse data transfer also as a way around slow remote communication times).

So, if DO sucks, why are microservices any different?

Enter Phil Calçado’s Microservices and the First Law of Distributed Objects:

Objects are not a good unit of distribution. They are too small and chatty to be good network citizens. Services, on the other hand, are meant to be more coarse-grained.

Group terms by affinity; grab out your connected components; now you have bounded contexts. Make those your services.

Early on, not worth the trouble - monoliths make sense. At huge scale, performance considerations (not going down) dwarf maintenance. In the middle, though, this rule of thumb ain’t bad.

Found via Devops Weekly.

The Gist of Regex

Sat, 14 Jan 2017 00:00:00 +0000

Regular expressions scare some people. They’re really quite warm and cuddly, or at least, conceptually very neat and tidy. If you don’t feel that way, this post is for you! Here’s how I think about regexen, in a nutshell.

I use this conception on a regular basis; when it comes to writing regex, I think about what I want to do in this model, then translate it into whatever regex notation the system I’m using at the time gives me. (I do the same thing with distributed version control and relational databases, but let’s stick to regexen for now.)

Regex Is Tiny Machines!

Regular expressions are a compact description of a symbol-matching machine. Like, “If you see an a, then maybe a b, and then one or more c, it’s a match!” for ab?c+.
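
To watch that little machine run, here is a sketch that feeds a few strings to grep. The helper name accepts is made up for the example; -E selects the extended notation used in this post, and -x asks for the whole string to match rather than just a piece of it:

```shell
# Succeeds when the whole string "$2" is accepted by extended regex "$1".
accepts() {
  printf '%s\n' "$2" | grep -E -x -q -e "$1"
}

for candidate in ac abc accc abbc ab; do
  if accepts 'ab?c+' "$candidate"; then
    echo "$candidate: match"
  else
    echo "$candidate: no match"
  fi
done
```

Here ac, abc, and accc are accepted; abbc has one b too many, and ab never reaches a c.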

But the machines can nest, so you can instead say stuff like, “If you see one thing matched by this machine, then maybe one thing matched by that one, followed by one or more things matched by that other, it’s a match!” So the a, b, and c from the last bit could actually be bigger regular expressions themselves.

But you have no variables in regex! So, instead, you plop the whole machine descriptions in there in parentheses, like (…)(…)?(…)+ And repeat the description if you need the same machine twice.

Pitch in self-referentiality - “if you see exactly the same thing as you ended up matching back there” - by using backrefs to parenthesized machines, and you’re in our modern world of extended “regular” expressions. At that point, what we’re talking about is no longer actually expressions describing what’s technically known as a regular language, but they’re exceedingly useful extensions of the notation, so no-one cares. ;)
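
The difference between repeating a machine’s description and referring back to what it matched shows up quickly in a small experiment. This sketch uses grep’s basic notation, where groups are written \(…\), because that is where POSIX guarantees backrefs; the helper name accepts_bre and the sample machine ab*c are just for illustration:

```shell
# Succeeds when the whole string "$2" is accepted by basic regex "$1".
accepts_bre() {
  printf '%s\n' "$2" | grep -x -q -e "$1"
}

repeated='\(ab*c\)-\(ab*c\)'   # the same description written out twice
backref='\(ab*c\)-\1'          # "exactly what the first machine matched"

for candidate in abc-abc abc-ac; do
  for pattern in "$repeated" "$backref"; do
    if accepts_bre "$pattern" "$candidate"; then
      echo "$candidate accepted by $pattern"
    else
      echo "$candidate rejected by $pattern"
    fi
  done
done
```

Both patterns take abc-abc, but only the repeated description takes abc-ac: the second machine is free to match something different, while the backref insists on a second abc.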

Compact Notation, Effective Expressive Power

What makes regular expressions so useful is:

Reach: A lot of stuff we want to match against can actually be described by them, especially when you pitch in a lot of the extended power-ups
Compactness: They’re a marvelously compact notation for what would otherwise be a lot of very boring code! Instead of writing that code, we dash off a regex, and we leave the translation into code to the regular expression engine.

For the More Curious

If that’s whetted your appetite, Friedl’s Mastering Regular Expressions is excellent. And, as a bonus, you can probably just read the first few chapters and emerge enlightened. :)

P.S. You can also look at regular expressions as definitions of regular languages - as generators rather than consumers of text. Running them backwards like this can be a good way to think about whether a regex you’re writing captures exactly what you’re aiming at, or whether it might include a bit more than you intended!
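One cheap way to approximate running a regex backwards is brute force: try every string over a small alphabet up to some length and keep the ones the regex accepts. This is only a sketch of the idea, not a true generator, and the generate helper is a name I made up.

```python
import re
from itertools import product

def generate(pattern, alphabet, max_length):
    """List every string over alphabet, up to max_length, that the regex accepts."""
    machine = re.compile(pattern)
    accepted = []
    for length in range(max_length + 1):
        for letters in product(alphabet, repeat=length):
            candidate = "".join(letters)
            if machine.fullmatch(candidate):
                accepted.append(candidate)
    return accepted

print(generate(r"ab?c+", "abc", 3))
```

Skimming the list for strings you did not mean to allow is a quick check on whether the regex includes a bit more than intended.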

P.P.S. And if you think about them in terms of machines, it’s really easy to start thinking about how to write fast regular expressions.

P.P.P.S. Hat-tip to @bazbt3 over at App.Net. What is dead can never die!

Iterative Development

Sat, 14 Jan 2017 00:00:00 +0000

“At last, my current practice of writing no automated tests has the blessing of science! See, TDD doesn’t do anything!” That’s how Fucci et al.‘s 2016 conference paper An External Replication on the Effects of Test-driven Development Using a Multi-site Blind Analysis Approach was introduced to me.

And, indeed, it concludes like so:

Despite adopting such countermeasures, aimed at reducing researchers’ bias [when replicating a prior, baseline study], we confirmed the baseline results: TDD does not affect testing effort, software external quality, and developers’ productivity.

Takeaways:

All coding is debugging
- Work in small steps
- Stay grounded in observed outputs
- Keep good notes (tests or REPL session logs)
TDD won’t slow you down at steady state
- Changing how you code to be more intentional and iterative might, to start.
- What will definitely slow you down: Learning your tooling and the impact of that iterative approach on the code you produce (expose those probe points for external testing! add indicator LEDs via assertions!)

Now that we’ve got the conclusion out of the way, keep reading to see how I got there. :)

Martin: It Tells Us Nothing: Keep on keeping on with TDD!

In the time since I’ve had “I should really write up my response to this” on my to-do list (a few months…), Bob Martin wrote up his take: TDD Doesn’t Work.

His article concludes the study made a distinction without difference and so naturally found no difference: Basically, that the folks involved were still practicing TDD, but where they actually wrote the tests – rather than conceived of and directed their coding efforts to them – was altered slightly.

Me: Iterativeness Is The Key!

But, check this:

Control treatment: the baseline experiment and its replication compared TDD to a really similar approach, labelled as TLD. Under this [sic] circumstances, we might be focusing on the incorrect part of development process (i.e., whether write tests first or not), and disregard the part of the process in which the a substantial effect might lie (i.e., the iterativeness of the process). Accordingly, the tasks used for both experiments were designed to fit the iterative nature of both treatments — i.e., isolate the process itself from the cognitive effort required to break down a complex problem into sub-problems. Pančur and Ciglarič [33] made a similar claim reporting the inconclusive results of a similar experiment. (bold emphasis added)

Fail Fast, Focus on Observed Outputs

The authors are on to something: I’ve seen people new to TDD stumble over learning to work in checkable baby steps. That iterative, fail-fast approach is where a lot of the time savings comes from; the other savings comes from being very intentional and focused about the concrete change you’re attempting to effect, or the specific knowledge you’re trying to elicit through experimentation. This same mindset also pays off in spades in debugging.

TDD or REPL, Just Use One!

We know you can learn this mindset via TDD, but a dev loop based around a REPL can work just as well. It ends up as TDD without the durable byproduct – once the session scrolls away, all those tests you wrote during bring-up are gone.

TDD Won’t Slow You Down

There’s another positive takeaway from the paper’s conclusion of no substantive difference:

TDD does not affect testing effort, software external quality, and developers’ productivity.

As long as you’re working iteratively and actually writing tests, you’re going to write working software and be as productive as you can under the circumstances.

If you don’t practice TDD already, fear not: TDD is not going to slow you down.

…But Learning to Use Test Frameworks Might, To Start

Getting the hang of working iteratively, and actually writing tests yourself, on the other hand – those will take a bit of time. And then save you far more over time.

Missing

The Long View: Maintenance Burden

It would be interesting to see experiments comparing TDD or ITLD (iterative, test-last development) and development where one of those two constraints is relaxed:

Either drop the iterative bit, or
drop the test-writing bit.

This was a small dev task, so I bet you’d see productivity go up as quality goes down. Put another way, I bet small tasks naturally lead people to ditch both of the things that make up TDD/ITLD.

This short-scale approach doesn’t assess two real-world challenges that we are concerned with as software maintainers:

Responding to changes over time.
Not breaking stuff on timescales longer than a single workday in codebases larger than what one person can turn out in a workday.

Style: Internal Code Quality

The study also did not assess internal quality (is it readable, navigable, maintainable code?) in any way. Out of scope for their purposes, rather important for those of many professional developers, as the wide spread of PR-based code review processes and the flourishing of adjuvants like SwiftLint and Danger reflect.

Conclusion

Reread the intro. (Or, as the grimly satirical Scarfolk Council would say: “For more information, please reread.”) I’m saving you having to read the abstract and then page to the conclusion this way. Go forth, and do the same for others. ;)

How to Work Around an Empty Zenfolio Zip File

Mon, 28 Nov 2016 00:00:00 +0000

My family recently had some holiday photos taken. The photographer was using Zenfolio to host their photos. I loved the photos and wanted to archive the originals on my laptop (and NAS, and Amazon Photos, and Time Machine, and Carbon Copy Cloner clone, and…). But every time I tried to download an original – of one photo, of all the photos, makes no difference – the server always sent me an empty zipfile!

I emailed the photographer to let them know, but I wasn’t going to wait.

Rather than work around this manually by visiting each page and right-clicking to Save As each photo – and I’m not sure that would show me the full-size image, anyway! – I figured Zenfolio would have an API.

Sure enough, there’s a well-enough documented Zenfolio API. I was in business!

I was able to lash together some shell commands to grab my full photoset. To save you some fumbling, here’s how I did it.

Walkthrough

Grab the Photo Details for the Photoset

Get the photoset ID. You can grab this from the URL you’re using to view the photos on the photographer’s website. If you view your photos at http://www.example.com/p544941453, then your photoset ID is 544941453.

Fetch the list of photos in that photoset using curl and save the JSON response to disk for the next step:

curl -v \
    -H'Content-Type: application/json' \
    api.zenfolio.com/api/1.8/zfapi.asmx \
    -d '{
      "method": "LoadPhotoSetPhotos",
      "params": [544941453, 0, 100],
      "id": 1
    }' \
    > photoset.json

This grabs the photos in photoset 544941453 starting from index 0 and returns at most 100 photos. Tweak those values to match your photoset and number of photos.

Also, I’m using fish as my shell. You might need to tweak that command line to make your shell happy, especially with the multiline string literal.

See: LoadPhotoSetPhotos method documentation
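If your shell fights you over that multiline literal, the same request can be sketched in Python instead. This is a rough equivalent rather than something I ran against the live service; the function names are mine, and the http:// scheme is an assumption (the curl command leaves the scheme off, and curl falls back to http).

```python
import json
import urllib.request

# Assumption: scheme added by hand; the curl command above omits it.
API_URL = "http://api.zenfolio.com/api/1.8/zfapi.asmx"

def build_payload(photoset_id, start_index, max_photos):
    """The same JSON body the curl command sends."""
    return {
        "method": "LoadPhotoSetPhotos",
        "params": [photoset_id, start_index, max_photos],
        "id": 1,
    }

def fetch_photoset(photoset_id, start_index=0, max_photos=100):
    """POST the payload and return the decoded JSON response (hits the network)."""
    body = json.dumps(build_payload(photoset_id, start_index, max_photos))
    request = urllib.request.Request(
        API_URL,
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)

# Printing the payload is safe to do offline; fetch_photoset is not called here.
print(json.dumps(build_payload(544941453, 0, 100), indent=2))
```

For a photoset with more than 100 photos, a second call with start_index=100 would ask for the next batch.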

Download Each OriginalUrl

Grab the OriginalUrl field from the photo objects in the photoset response using jq, the JSON multitool:

jq '.result[].OriginalUrl' photoset.json

Download each file at those URLs by feeding them to curl via xargs:

jq '.result[].OriginalUrl' photoset.json \
    | xargs -n 1 curl -O

(The -n 1 is there so that curl sees one -O for each file argument. Without it, xargs would run curl -O url1 url2 url3…. This causes curl to download only the first URL to a matching file on disk; the rest, it starts piping out to stdout. I couldn’t work out a good way to get xargs to repeat the -O per argument, so I just throttled it to calling curl -O justASingleURL repeatedly.)
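The jq and xargs steps can also be sketched in Python, one download per URL. The sample data below is made up to mirror the shape of photoset.json (real responses carry many more fields), and original_urls, local_name, and download_all are hypothetical helper names.

```python
import json
import posixpath
import urllib.request
from urllib.parse import urlparse

def original_urls(photoset):
    """Equivalent of jq's .result[].OriginalUrl on the decoded response."""
    return [photo["OriginalUrl"] for photo in photoset["result"]]

def local_name(url):
    """Last path segment of the URL, mirroring how curl -O names the file."""
    return posixpath.basename(urlparse(url).path)

def download_all(photoset):
    """Fetch each original, one request per URL (hits the network)."""
    for url in original_urls(photoset):
        urllib.request.urlretrieve(url, local_name(url))

# Made-up sample; to use the real thing, json.load the saved photoset.json.
sample = json.loads("""
{"result": [
  {"OriginalUrl": "http://www.example.com/img/p100-1.jpg"},
  {"OriginalUrl": "http://www.example.com/img/p100-2.jpg"}
]}
""")

for url in original_urls(sample):
    print(url, "->", local_name(url))
```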

Enjoy your photos!

Caveat: Assumes Public Photos

This walkthrough assumes no authentication is required to download your photos. I lucked out: All my photos had an AccessDescriptor.AccessType of Public.

If the originals are password-protected, you’ll find a walkthrough of the hoops to jump through in “Downloading Original Files”.

If things are more locked down, you might need to sort out the authentication flow before you can even grab the photoset details. I didn’t need to do any of that, so I can’t walk you through how. Sorry!

A Practical Example of FlatMap

Thu, 22 Sep 2016 00:00:00 +0000

The Swift standard library introduces some unfamiliar concepts if you’re coming from Obj-C and Cocoa. map is one thing, but for some, flatMap seems a bridge too far. It’s a question of taste, and of background, if something comes across as a well-chosen, expressive phrase or if it just seems like status signaling, high-falutin’ bullshit.

Well, I’m not going to sort that all out, but I did find myself rewriting an expression using a mix of if let/else into a flatMap chain recently, so I thought I’d share how I rewrote it and why.

If you’re mystified by Optional.flatMap, read on, and you should have a good feel for what that does in a couple minutes.

I’m not going to demystify everything: You still won’t know why it’s called flatMap.

But then, why do we use + for addition? And how do you implement it in terms of a fixed number of bits?

Just because you don’t know a symbol’s etymology or a function’s implementation, that doesn’t mean you can’t make it do useful work for you. If you treat flatMap as an operator written using Roman letters, you can get good value out of it!

Duck, Duck, Goose

Here’s what some deserialization code looked like to start:

init?(json: JsonApiObject) {
    guard let name = json.attributes["name"] as? String
        , let initials = json.attributes["initials"] as? String
        else { return nil }
    self.name = name
    self.initials = initials
    self.building = json.attributes["building"] as? String
    self.office = json.attributes["office"] as? String
    self.mailStop = json.attributes["mailStop"] as? String
    if let base64 = json.attributes["photoBase64"] as? String
    , let data = Data(base64Encoded: base64) {
        self.photo = UIImage(data: data)
    } else {
        self.photo = nil
    }
}

Notice how you’re trucking along reading, “OK, we set this field, set that field, set that other field, and WHAT THE HECK IS THAT.” The if let bit comes out of left field, breaks your ability to quickly skim the code, and takes some puzzling to sort out. It also leads to repeating the assignment in both branches.

Cleaning This Up

Extract Intention-Revealing Method

To start with, we can take the existing code as-is, yank it out into a helper method, and call that:

self.photo = image(fromBase64: json.attributes["photoBase64"] as? String)

This makes the call site in init? read fine, but we’ve just moved the ugly somewhere else.

Take Advantage of Guard

Shifting it into a method dedicated to returning an image does open up using guard let to make the unhappy path clear:

func image(fromBase64 string: String?) -> UIImage? {
    guard let base64 = string
    , let data = Data(base64Encoded: base64)
    , let photo = UIImage(data: data) else {
        return nil
    }
    return photo
}

Still Too Noisy!

But that’s no real improvement:

The return values just restate our return type. They’re noise.
The reader has to manually notice that we’re threading each let-bound name into the computation that’s supposed to produce the next one.
We’re forced to name totally uninteresting intermediate values just so we have a handle to them to feed into the next computation.

All told, that’s a lot of noise for something that’s conceptually simple and that should be eminently skimmable.

A Pipeline with Escape Hatch

The pipeline we have is:

feed in a string
transform it into data by decoding it as base64
transform that into an image by feeding it into UIImage
spit out the image

The trick is, if any of these steps fails – that is, if any step spits out a nil – we just want to bail out and send back a nil immediately. It’s like each step has an escape hatch that short circuits the rest of the pipeline.

Pipeline with Escape Hatch Is Just FlatMap

Well, that’s exactly the behavior that sequencing all these with Optional.flatMap would buy you! Have a look:

func image(fromBase64 string: String?) -> UIImage? {
    return string
           .flatMap { Data(base64Encoded: $0) }
           .flatMap { UIImage(data: $0) }
}

And if you inlined it, it’d still be eminently readable, because it puts the topic first (“hey, y’all, we’re going to set photo!”), which preserves the flow of the code and its skimmability, and you can quickly skim the pipeline to see how we get that value.
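If Swift isn't your daily driver, the escape-hatch behavior can be sketched in a few lines of Python, using None where Swift uses nil. This is an illustration of the idea, not Swift's implementation: flat_map is a helper I'm defining here, and decoding UTF-8 text stands in for building a UIImage.

```python
import base64

def flat_map(value, transform):
    """Escape hatch: skip transform when value is None, else return its result."""
    return None if value is None else transform(value)

def decode_base64(text):
    """Stand-in for Data(base64Encoded:): None on malformed input."""
    try:
        return base64.b64decode(text, validate=True)
    except ValueError:  # binascii.Error is a subclass of ValueError
        return None

def decode_text(data):
    """Stand-in for UIImage(data:): None when the bytes are not valid UTF-8."""
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        return None

def text_from_base64(string):
    """The pipeline: string -> bytes -> text, bailing out at the first None."""
    return flat_map(flat_map(string, decode_base64), decode_text)

for candidate in ["aGVsbG8=", None, "not base64!"]:
    print(repr(candidate), "->", repr(text_from_base64(candidate)))
```

Each step only ever sees a real value; the None check lives in one place instead of being repeated at every stage.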

Conclusion

FlatMap very clearly expresses a data transformation pipeline, without extraneous syntax and temporary variables.

We backed into using it in this example for reasons of readability, not for reasons of “I have a hammer! Everything is a nail!”

Sometimes, the new tool really is the right tool.

Appendix: Similar Rewrites

This “assign something depending on something/s else” situation happens a lot. And it can shake out a lot of different ways.

If the expression had been simpler, we could have rewritten it using ?: to eliminate the repeated assignment target. This often shows up with code like:

- if haveThing {
-     x = thing
- } else {
-     x = defaultThing
- }
+ x = haveThing ? thing! : defaultThing

Which, in that common “sub in a default” case, can be further simplified:

- x = haveThing ? thing! : defaultThing
+ x = thing ?? defaultThing

And if nil is an A-OK default, becomes the wonderfully concise:

- let defaultThing = nil
- x = thing ?? defaultThing
+ x = thing

There’s a similar transform that eliminates guard let stacks by using optional-chaining, but that deserves a bit more of an example, I think.

The Internet Speaks: Testing FP Code

Tue, 20 Sep 2016 00:00:00 +0000

One problem I have writing Swift is that I’m not really sure how to tackle testing FP-ish code using XCTest.

I did some quick Internet research. If you read it on the Internet, it must be true. This is a distillation of those great Internet truths.

The Context: Data Persistence

But first, some context. Why did I care about this?

I ran into this in the context of sorting out how to persist and restore some app data at specific “app lifecycle” hooks.

Specifically:

When the app backgrounds, start a background task, then serialize and write to disk, then end the task.
- Inputs: data store, serialization strategy, where to write to
- Outputs: updated file on disk (side effect)
When the app launches, block the main thread till we’ve loaded the data from disk and unpacked it. This should be fast enough. Anything else will lead to folks seeing a not-yet-ready UI.
- Inputs: serialization strategy, where we wrote to
- Outputs: We can see the restored DataStore (side effect)

This is very much “app lifecycle” stuff, so we want the App Delegate to do it.

What’s the cleanest code we could imagine?

bracket startBackgroundTask endBackgroundTask $
    dataStore |> serialize |> write location

deserialize(location)
|> fromJust seedDataStore
|> set dataStoreOwner .dataStore

I think my big ??? is that I don’t get how to test a functional pipeline. It seems to not have any of the seams you’d usually rely on.

Testing FP Code

Summarizing:

Separate out pure code from impure.
Use PBT for the pure code.
Use typeclasses or protocols or similar dynamic binding methods to swizzle impure actions.

I guess, use acceptance testing to check that you got the wiring to impure stuff correct? That issue seems mostly ignored in favor of the much happier “pure functions are easy to test” story.
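For the "PBT for the pure code" part, here is a hand-rolled sketch in Python of a round-trip property. It uses the standard library's random module as a stand-in for a real property-based testing library such as QuickCheck, and serialize, deserialize, and random_store are hypothetical names with JSON assumed as the serialization strategy.

```python
import json
import random

# Pure code: no disk, no clock, no globals.
def serialize(store):
    return json.dumps(store, sort_keys=True)

def deserialize(text):
    return json.loads(text)

def random_store(rng):
    """Generate a small random data store: string keys mapped to lists of ints."""
    return {
        f"key{rng.randint(0, 9)}": [rng.randint(-100, 100) for _ in range(rng.randint(0, 5))]
        for _ in range(rng.randint(0, 5))
    }

def check_round_trip(trials=200, seed=0):
    """Property: deserialize(serialize(x)) == x for every generated x."""
    rng = random.Random(seed)
    for _ in range(trials):
        store = random_store(rng)
        assert deserialize(serialize(store)) == store, store
    return trials

print("round trips checked:", check_round_trip())
```

A real PBT library would add shrinking of failing cases and smarter generators; the shape of the test stays the same.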

In practice, I think I’m now foundering on the mess that is object-functional blending. You’d hope that the Scala folks might have something good to say on that, but that’ll have to be a later round of The Internet Speaks.

Static Methods Are Death to Testability

http://misko.hevery.com/2008/12/15/static-methods-are-death-to-testability/

Recapitulates the problem I identified:

Unit-testing needs seams, seams is where we prevent the execution of normal code path and is how we achieve isolation of the class under test. seams work through polymorphism, we override/implement class/interface and than wire the class under test differently in order to take control of the execution flow. With static methods there is nothing to override.

Recommends converting static methods to instance methods:

If your application has no global state than all of the input for your static method must come from its arguments. Chances are very good that you can move the method as an instance method to one of the method’s arguments. (As in method(a,b) becomes a.method(b).) Once you move it you realized that that is where the method should have been to begin with.

Says not to even consider leaf methods as OK as static, because they tend not to remain leaves for long.

Unit Testing and Programming Paradigms

http://www.giorgiosironi.com/2009/11/unit-testing-and-programming-paradigms.html

Identifies the same problem as you move away from leaf functions in the context of procedural programming:

The problem manifests when we want to do the equivalent of injecting stubs and mocks in higher-level functions: there are no seams where we can substitute collaborator functions with stubbed ones, useful for testing. If my function calls printf(), I cannot stub that out specifying a different implementation (unless maybe I recompile everytime and play a lot with the preprocessor).

Outlines, in theory, what they would do, but have not done, for FP code: Pass in functions to parameterize behavior:

So instead of injecting collaborators in the constructor we could provide them as arguments, earning the ability to pass in fake functions in tests. The upper layers can thus be insulated without problems (with this sort of dependency injection) and there are no side effects that we have to take care of in the tear down phase
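Applied to the save-on-background pipeline from earlier in this post, passing collaborators as arguments might look like the Python sketch below. All names (persist, file_writer) are hypothetical; the point is only that the impure write step becomes a parameter a test can replace.

```python
def persist(store, serialize, write):
    """dataStore |> serialize |> write, with both steps passed in as functions."""
    write(serialize(store))

def file_writer(path):
    """Production collaborator: returns a write function bound to a file path."""
    def write(text):
        with open(path, "w", encoding="utf-8") as handle:
            handle.write(text)
    return write

# In a test, pass a fake write that just records what it was handed.
written = []
persist({"name": "demo"}, serialize=repr, write=written.append)
print(written)
```

This covers the seam, but it still leaves open the question of checking that production code wires in file_writer rather than something else.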

Omits stack and logic paradigms. No surprise there.

Recoverability and Testing: OO vs FP

https://www.infoq.com/news/2008/03/revoerability-and-testing-oo-fp

Sums up a conversation that happens across several blogs. Weirdly omits any links to primary sources. Yuck.

OO is rife with seams that are easy to exploit, so Feathers likes it. Where you need a seam is a design issue:

Another blogger, Andrew, highlights that if “code isn’t factored into methods that align with the needs of your tests”, the implementation will need to be changed to accommodate the test. Hence, he argues as well that “thoughts about “seams” are really just getting at the underlying issue of design for testability”, i.e. the proper placement of seams.

But not all systems are always so designed (putting it nicely), so “recoverability” matters: being able to make something testable in spite of itself.

According to Feathers, even though there are alternative modules to link against in functional languages, “it’s clunky”, with exception of Haskel where “most of the code that you’d ever want to avoid in a test can be sequestered in a monad”

Then there’s an argument that pushing the impurity to the edges makes things testable. No-one addresses validating correct composition of verified components, though. :(

SO: Testing in Functional Programming

https://stackoverflow.com/questions/28594186/testing-in-functional-programming

Answers point out:

Function composition builds units, in that you can test them quickly.
QuickCheck/SmallCheck dodge the combinatorial explosion of codepaths that you get by composing functions.
Coding against a typeclass that you can swizzle out for a test one lets you stub out IO-like functions. (Or just manually pass in a dictionary type.)

Why I'm Meh About JSON API

Sat, 23 Jul 2016 00:00:00 +0000

JSON API has been pretty successful at providing a framework for APIs that lets you focus on roughly the entity–relationship diagram of your data.

But I find it frustrating at some turns (too flexible!) and peculiar at others (why is it bound to just one content-type?).

My frustrations with JSON API are ultimately because it doesn’t solve the problems I have as an API consumer, and its aim of preserving flexibility results in API consumers paying the price of that in needing to deal with the foibles of a specific implementation and in manually tuning their API queries.

I find the approach taken by GraphQL more directly and usefully addresses my needs as a client developer while also necessarily, by design, minimizing requests made and data transmitted.

JSON API makes it possible to accomplish that, but it leaves the responsibility for doing so up to the client developer; GraphQL makes it possible to accomplish that, but it takes the perftuning responsibility upon itself, which makes my life as a client dev easier.

Introduction

I spent the last couple months working on an Ember app. The backend was running the cerebris/jsonapi-resources flavor of JSON API implementation. The frontend was using Ember Data’s JSON API adapter.

It worked, but I also kept running across ugly data requests like:

this.store.findAll(
  'work-order',
  { include:
    [ 'location'
    , 'shipping-address'
    , 'credit-card'
    , 'user'
    , 'shipments.shipment-items.order-item.inventory-item.part'
    , 'order-items.inventory-item'
    , 'inventory-items.part.part-kind'
    ].join(',')
  });

When I see something like that, all I can think is, Why am I listing all this out for the computer? It should figure it out! Maybe in a year or so, Ember Data will indeed do that, but you need to do that sort of thing today, unless you want template rendering to lead to this conversation between HTMLBars and Ember Data: “render, oh crud we need some data, fetching… rerender, oh crud more‽ ok, fetching… rerender… what, more! fetching…”

But if you’re hitting the API by hand – be it manual XMLHttpRequest preparation or curl – that leads to a bear of a URL. And parsing out the data once it arrives is also not so fun. I hope you enjoy writing JOIN logic client-side!
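To get a feel for that bear of a URL, here is a small Python sketch that assembles it from the include list in the snippet above. The host and path are made up (api.example.com), and work_orders_url is a hypothetical helper.

```python
from urllib.parse import urlencode

# Hypothetical endpoint; the relationship paths come from the Ember snippet.
BASE_URL = "https://api.example.com/work-orders"
INCLUDE = [
    "location",
    "shipping-address",
    "credit-card",
    "user",
    "shipments.shipment-items.order-item.inventory-item.part",
    "order-items.inventory-item",
    "inventory-items.part.part-kind",
]

def work_orders_url(base_url, include):
    """Join the relationship paths with commas into a single include parameter."""
    # safe="," keeps the separators readable instead of percent-encoding them.
    return base_url + "?" + urlencode({"include": ",".join(include)}, safe=",")

print(work_orders_url(BASE_URL, INCLUDE))
```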

And how do you even find out what you can toss in that include bit? I just popped over to the backend source and nosed around. That’s fine when you have access to the backend source code, but what if you don’t? What’s JSON API got to say to that?

Well. I’m not terribly happy with JSON API’s answers – and we’ll come to those in a bit – but let’s see if we can understand where JSON API is coming from: How did JSON API end up like this, and to what end?

JSON API: Bytecount Golfing with -ility Handicaps

JSON API’s primary purpose is to minimize request count and data transmitted. It attempts to balance this against concerns for readability, flexibility, and discoverability.

Readability: Not Too Shabby

JSON API is pretty readable. Hit a site using it (most anything Ember), and check out the API requests and responses in your browser debugging tools, and you can work out pretty quickly what’s going on.

The side-car style for included objects, where you have to bounce from an ID reference in the main response to a lookup table that got sent along with it, hurts a bit here for humans: you have to do manual joins client-side. But inlining them wouldn’t play nice with the “minimize transfer” focus, so it makes sense.
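That manual join is simple enough to sketch. The Python below indexes the included side-car by type and ID, then follows one relationship reference into it. The document is a made-up example in the JSON API compound-document shape, and index_included and related are hypothetical helpers that only handle to-one relationships.

```python
def index_included(document):
    """Build the side-car lookup table keyed by (type, id)."""
    return {(res["type"], res["id"]): res for res in document.get("included", [])}

def related(resource, name, lookup):
    """Follow a to-one relationship's ID reference into the lookup table."""
    linkage = resource["relationships"][name]["data"]
    return lookup.get((linkage["type"], linkage["id"]))

# Made-up compound document: one work order that references one location.
document = {
    "data": {
        "type": "work-orders",
        "id": "1",
        "attributes": {"status": "open"},
        "relationships": {"location": {"data": {"type": "locations", "id": "7"}}},
    },
    "included": [
        {"type": "locations", "id": "7", "attributes": {"name": "Warehouse"}},
    ],
}

lookup = index_included(document)
location = related(document["data"], "location", lookup)
print(location["attributes"]["name"])
```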

The URLs asking for those included objects get pretty gnarly, though.

Discoverability: Meh

I’d say its discoverability is pretty darn poor; this is partly a result of its flexibility, but mainly a result of its not providing much in the way of standardized introspection facilities.

The most frustrating lack for me when I hear “backend is using JSON API” is not being able to hit the root of the API and crawl from there to work out the whole of the API and what it supports. This is one of the most important attributes for the usability of a RESTful API from where I stand, but JSON API drops the ball, or heck, doesn’t even pick it up in the first place: Hypertext through-and-through it simply ain’t.

Where this often comes to a head is with include; there’s no standard way to signal that this is supported by a backend. You can give it a go and see if it yells at you, though. But if it does support it, it’s not clear what you are or aren’t allowed to include with something until you try.

And if the backend doesn’t support include, then it’s free to unilaterally include whatever alongside the data you asked for. If the backend API is sanely versioned – and JSON API does not specify how to manage that – you’re probably fine, but if it’s not, and your JSON API client library prefers to fail eagerly rather than being liberal in what it accepts, your backend can break your frontend pretty readily. Versioning aside, that’s more an implementation issue than a spec issue, though, so we can let that slide.

So we have our answer to the question from the intro: How do you even find out what you can toss in that include bit? You don’t, or you guess, or you look up the docs or source code for the backend, or you email support. Mmm, emailing support: Definitely something I like to include smack in the middle of my development cycle.

Flexibility: Hurts Discoverability and Limits Utility of Having a Spec

There are a lot of “servers may do this, or that, or maybe that…” bits in there too, which make finding out a server uses JSON API less of a “now I know everything about it” than it could be. (Search for “MAY” and “SHOULD” in the document.)

We saw this with include, but it also comes into play with requesting only certain bits of a record (sparse fieldsets), sorting, pagination, and filtering, the latter of which is specified in its entirety as: “The filter query parameter is reserved for filtering data. Servers and clients SHOULD use this key for filtering operations.” The limited specification of filtering and sparse fieldsets seems surprising in the face of a focus on reducing the amount of data transferred: This seems very much fair game for a spec with that aim in mind, but it handwaves and throws it in the flexibility bin, instead.

This really smarts for two reasons as a client dev:

There’s no standard way to communicate what implementation-defined choices a JSON API backend has made.
There’s no requirement to make those choices uniformly across all APIs.

This again means that learning an API is using the JSON API spec doesn’t buy you as much as it could; you still have to ask a lot of questions to sort out what that means in practice.

It also means that any client-side de/serializer for JSON API is limited in the support it can provide to you. The spec is very open to customization, which means that you will have to learn those customizations in force for your backend and teach your JSON API parser about them.

This reminds me a bit of how OAuth 2 moved from being a spec to a meta-“spec”, flexible to a fault, as described by the one-time lead author and editor of that spec:

One of the compromises was to rename it from a protocol to a framework, and another to add a disclaimer that warns that the specification is unlike to produce interoperable implementations. (“OAuth 2.0 and the Road to Hell”)

Peculiar: Why Only JSON?

JSON API seems weirdly bound to the content-type (it’s an API! in JSON!), which is kind of funny to me in light of the “A server MUST prepare responses, and a client MUST interpret responses, in accordance with HTTP semantics” language. This feels like following the letter rather than spirit of the law: There’s no notion of a resource that might go by various possible representations. Content transferred under JSON API’s auspices goes by a JSON-API–specific content-type.

JSON is not a terribly expressive data format; you’ve got the rudiments needed to cobble together more specific data types atop it. That also means there’s little reason you couldn’t translate the data in a JSON API response into another content type, be it BSON, XML, S-expressions, or something even more unique.

Maybe That’s Not JSON API’s Job?

Perhaps the JSON API homepage, rather than the spec, is more honest in its aims:

If you’ve ever argued with your team about the way your JSON responses should be formatted, JSON API can be your anti-bikeshedding tool.

By following shared conventions, you can increase productivity, take advantage of generalized tooling, and focus on what matters: your application.

Clients built around JSON API are able to take advantage of its features around efficiently caching responses, sometimes eliminating network requests entirely.

It takes for granted you’re building an API, and it’s only going to support JSON. Its pitch: Use JSON API so you don’t have to quibble about how you encode your data, and you get this already thought-through support for caching data and minimizing the requests needed for free!

Perhaps JSON API’s audience is specifically API producers, not consumers, and that’s why I don’t find it addressing my needs.

How Is GraphQL Better?

The more declarative approach of GraphQL (and, to a lesser degree, Falcor) fulfills the spec-stated goals better than JSON API does itself.

Heck, it also satisfies those of the homepage better, too!

GraphQL in a Nutshell

The rough idea is:

There is a typed spec for what data is available and how it’s related.
Components can request specific bits they need using a query language. Queries can be typechecked.
A query builder can aggregate component requests into a more general request, coalesce them, and then hit the backend/respond from cache intelligently, without the components needing to worry about this.
The system then vends back to the components precisely the info they requested, no more, no less.
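As a toy model of the coalesce-then-vend steps, here is a Python sketch in which a query is just a nested dict of field names. This is not GraphQL syntax or any real GraphQL library's API; merge_queries, select, and the sample record are all invented to illustrate the idea.

```python
def merge_queries(*queries):
    """Coalesce component requests into one request (nested dicts of field names)."""
    merged = {}
    for query in queries:
        for field, sub in query.items():
            if sub is None:
                merged.setdefault(field, None)
            else:
                existing = merged.get(field) or {}
                merged[field] = merge_queries(existing, sub)
    return merged

def select(data, query):
    """Vend back precisely the requested fields, no more, no less."""
    return {
        field: data[field] if sub is None else select(data[field], sub)
        for field, sub in query.items()
    }

# Two hypothetical components, each asking only for what it renders.
header_query = {"status": None, "user": {"name": None}}
footer_query = {"user": {"email": None}}

record = {
    "id": "1",
    "status": "open",
    "notes": "not requested by anyone",
    "user": {"name": "Ada", "email": "ada@example.com", "phone": "555-0100"},
}

combined = merge_queries(header_query, footer_query)
print(combined)
print(select(record, header_query))
```

The components never see the combined request; each gets back only the slice it asked for.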

You can see where that’d be handy in a React world of little components asking for this or that nugget of info, which was the background against which GraphQL arose.

No Intrinsic Content-Type

The actual wire protocol is kind of beside the point unless you’re the GraphQL engine implementor. Is it sent by JSON or BSON or MP3s modulated at 56kbps? Is it using HTTP over TCP? Audio blips over SCTP? Who cares! Here’s how the data is laid out when it reaches you, here’s the types of that data; ask for what you need.

No Manual Perftuning

The query optimization bit can get arbitrarily clever without impacting the components, which is excellent for future performance tuning: Upgrade your client-side GraphQL engine, and your app stands to suddenly get more performant, without any further work on your part.

And then I go to an Ember app using JSON API where they have these insane URLs where they’ve got &include=this,that,theotherthing,ohandnowthisthing and the diffs for that insanely long line are so fun to read, lemme tell you. (I prettied up the intro code snippet to use […].join() so it’d be readable at most widths without horizontal scrolling, but that was just one big ol’ string in the source. Ayup.)

It’s this rough, by-hand query optimization that I think it’s silly they need to worry about. They’re not even concerned with query optimization; they just need some data. And the optimization is limited by the size of the records, as well.

Conclusion

JSON API is a rather limited spec that I find flexible to a fault as a client developer. It seems wide open to bikeshedding still on the API producer’s side as well, due to that flexibility, so I’m not sure how well it meets either its spec-stated or marketing-stated aims.

GraphQL offers a declarative approach to directly expressing the data available – which addresses my desire to be able to pull that information without a lot of digging as a client dev – and the data requested – which addresses my desire to be able to pull the data I need without worrying about the details of how it’s going to get to me.

But JSON API slots neatly into an existing niche – an API! in JSON! Hey, I think I’ve used a few of those! While GraphQL is a different bird entirely. Consequently, I’ve yet to use GraphQL, while I have ended up working against a JSON API already, and expect I’ll find myself doing so again in future as well.

JSON API is an incremental improvement, serviceable and certainly no worse than even more thoroughly ad hoc API creations, and so I expect it to spread widely: I expect to run into a lot of JSON API backends, and may never have the chance to consume a GraphQL backend. I’m glad for what sanity JSON API does bring to the wild west of APIs out there.

Thanks to Chris Krycho for feedback on a draft of this article. He tripped over all the awkward transitions so you don’t have to. ;)

Father's Day: Happy Hurricane

Fri, 17 Jun 2016 00:00:00 +0000

This Sunday marks my second Father’s Day as a father. If you’re not yourself a parent, that won’t mean much to you. It certainly didn’t to me. If you’re en route to fatherhood, read on to learn what “fatherhood” actually means.

My experience was that preparation for new parents focused heavily on the birth experience. What I knew of what would follow focused primarily on early childhood development and dangers with a side helping of lactation and baby-wearing. These are good things to know, but they don’t do jack for helping you cope with what having a newborn in your house means for you.

Birth as Loss

As a newborn, your kid is entirely dependent on its parents for everything. You will be eating, breathing, and sleeping baby. Your schedule is its schedule.

Eventually, stuff might get a bit saner. You’ll get nursing sorted out, you’ll find a sleeping arrangement that works for your family, you’ll find some ways to care for your kid.

But there’s an especially strong and demanding pairbond between mother and child, and you might very well feel left out, or more strongly, crowded out: You might feel like you’ve lost your wife to this child. And babies are not terribly relatable creatures, but they are very demanding, and they know no patience. It can feel like a raw deal.

I turned a corner once my son was able to laugh. I could do something, and he could respond to it, and I could relate to that. I think that’s when my baby went from “it” to “he” for me.

Say Goodbye to Life as You Knew It

When they’re a baby, their schedule is yours.

Turns out, that doesn’t really change as they age into toddlerhood.

You’ve Yielded Autonomy

You’ve lost a lot of autonomy by assuming stewardship of an amateur human. Sure, you can stay up late; but if your kid wakes up at 6 am, someone has to be up with them. That someone is likely you. So either you go to bed, or you spend a day tired and cranky, and no good to nobody. When you wake up is no longer your choice, and if you know what’s best for you, then neither is when you go to sleep.

More than that: What you do during the day is restricted to what you can do while watching over your child. Maybe you’ll have a quiet kid who is fine sitting and playing with whatever for a while. This won’t be much of a burden to bear. Maybe you’ll have a very interactive and active child who very much wants to do something right now thank you very much and are you watching this because we’re going to do this together. You’ll find your options in that scenario are rather limited.

Young children don’t suffer fools, or delays, gladly. If doing something involves waiting around, especially quietly, you can probably cut it out. Like dinners out with a 45-minute wait spent standing around and ordering drinks at the bar and gabbing to pass the time? Yeah - that’s incompatible with wee ones. Movies out don’t really work either. Picking up donuts to go with a kid in tow can be touch and go if there’s a line. A lot of stuff you took for granted that you could do, you can’t, at least without a sonic tax, possibly with tears attached.

Everyday stuff you take for granted that needs to happen can also become a challenge. Traveling by car means a lot more prep work. Shopping requires half a mind on what exactly your kid is doing with that produce you thought you safely tucked in the cart. And is vacuuming worth a tussle over who gets to control the vacuum? How badly do you need something cleaned, and how clean is clean enough?

You’ve Also Lost Environmental Control

That’s a good segue from loss of personal autonomy into loss of control. As an adult, you have a lot of control over your environment. If you’ve got your own place, you can pretty much stick something wherever, and expect to find it still there. You can demolish a staircase and landing, cut a hole for a new door, and take your time fitting a new door and rebuilding the staircase. You know well enough not to try to leap a storey down onto unforgiving cement. Even your cat’s curiosity isn’t enough to overcome their caution in that case.

Your kid is another matter. Even if they knew well enough that falling down that far would be a bad idea, they’re just not terribly good at moving around and keeping track of the environment in their head. They can easily accidentally walk too far, or lose their balance near an edge, or forget to watch where they’re going because there’s a housefly or a patch of sunlight. So you’ll find yourself reshaping your entire environment to fit their needs and behaviors. And you’ll weigh convenience against how big of a mess they can make if you’re looking elsewhere for thirty seconds. (A salt cellar makes a great mini-sandbox in a pinch, don’t you know?)

It’s a big step from “master of my tiny pocket universe” to “adult graciously allowed to exist as my caregiver and diviner of my needs and desires”.

A Change in the Weather

I found this stifling and isolating. It’s a very peculiar experience to find you’ve more autonomy in your work life than in your home life.

But it has its upsides. And I’d do it again.

Recentering

All those losses are losses from a point of view where you’re at the center of your own universe.

In practice, they’re just side effects of shifting the center from yourself to your family. It’s no longer all about you. There’s no time for you to maintain that illusion any more. Welcome to adulthood.

Learning Patience & Humility

Children are elemental forces. You can’t reason with them for the first several years. You can empathize. You can distract with counter-proposals. But you can’t negotiate.

You mostly won’t get your way. You’ll find that it doesn’t even matter that you don’t get your way. You just wanted things to go your way because that’s what you were comfortable with.

You’re going to have to relax control and work with the situation as it presents itself. You’ll leave getting your way for when it matters – when there’s risk to health or safety, or there’s something important enough that it’s worth possibly distressing your kid, stressing everyone around you out, and maybe dealing with some shrieking and crying.

One concrete way this shows up is in learning to be patient. Yeah, I get it, you want to go right now. But your kid doesn’t. And you don’t really need to go right now. You just want to. Suck it up and wait a while. Set the expectation that you will be leaving soon. When the time comes, then you can leave.

Facing Humanity Head-On

Stuff will get broken. Things will go wrong. A lot of things will go wrong. Kids are clumsy, curious, and not bridled by concern for cleanliness, hygiene, or common sense. This is OK.

If you’re a perfectionist, you’re probably accustomed to everything going to plan, and ensuring that you have a plan and execute on it such that everything goes to plan. You won’t be able to exert that level of control, that clockwork execution, when it comes to large parts of your life any more.

You might have spent a couple decades driving out the human. It’s back now in spades, and you’ve no choice but to confront it head on.

This also teaches patience. It teaches you to expect people to stumble, to make mistakes, to err. You’ve probably kind of known that was the case in theory, but it wasn’t your experience before, and it was hard to cut someone a break because you’d worked out how to run things so you didn’t need anyone to cut you a break. Now it is your experience, and theory is practice, and boy, will you be getting a lot of practice. And you’ll probably be really glad when people cut you a break for acting in weird ways, running out of line to snatch up a kid about to get in trouble, walking all throughout the restaurant courtyard following an imp climbing up and down and around things and maybe bumping into people before bouncing off and away onto the next thing.

Having Fun

With kids, the lows can be low, but the highs can be so high. You have a license to be silly and a new set of eyes to experience the world through. You get to look at such commonplaces as trees, birds, and squirrels with fresh awareness and naked joy at their existence and activity. And if you’ve forgotten to play, you’ll learn that anew, too.

The Traumatic Hurricane of Fatherhood

So, Father’s Day. When you were born, you destroyed someone’s world and remade it around you. As a new father, you have to come to terms with the dramatic difference in responsibility, relationships, and rituals that come with this hurricane of a change. It’s sudden and total, but you can build a new and better life in its aftermath.

Here’s to hoping having a second kid is less hurricane and more tropical storm!

Types Complement Tests Complement Types

Thu, 05 May 2016 00:00:00 +0000

Types and tests are complementary. They might even be synergistic: The two together can accomplish what neither can alone. They definitely are not rivalrous goods, and if you’re picking only one, you’re doing yourself a disservice.

If You Have To Pick One, Though

There’s a ceiling to how far we can get with types. Most languages developers work in have rather limited type systems. Most developers lack the skill, practice, and simple exposure to past examples to make use of more powerful type systems. That’s not a slight: Generating those examples today can be a good way to get yourself at least a Master’s if not a PhD.

We can push automated testing really far regardless of type system. There’s an abundance of popular literature on the subject. If you want to get better, you don’t have to look far, and you can put what you learn into practice immediately.

If you had to pick between either building 100% TDD’d code in a unityped language or building code with no automated tests in a conventionally typed language, you’d be a fool not to pick the TDD’d codebase.

But You Don’t, So Use Both

You don’t have to choose one or the other. Reject the false dichotomy, chase off its acolytes on their hobby horses, and make the most of all the technologies available to you today to produce better software.

Beyond Type Wars: Types Can Be Tests Too

Thu, 05 May 2016 00:00:00 +0000

Types and tests are not at war. Choose both.

In fact, if we tilt our heads a bit, types are just another flavor of test.

Don’t use just one flavor of testing; use all the tools you have at your disposal to make the best software you can.

Type Wars

Robert C. Martin believes code TDD’d into existence, and so having 100% test coverage by construction, nullifies the value of types:

My own prediction is that TDD is the deciding factor. You don’t need static type checking if you have 100% unit test coverage. And, as we have repeatedly seen, unit test coverage close to 100% can, and is, being achieved. What’s more, the benefits of that achievement are enormous.

Therefore, I predict, that as TDD becomes ever more accepted as a necessary professional discipline, dynamic languages will become the preferred languages. The Smalltalkers will, eventually, win. (Robert C. Martin, “Type Wars”, 2016)

The further your own development practice is from TDD, the more ludicrous this will seem to you.

If you ignore Martin’s emphasis on TDD, and focus instead on the “100% unit test coverage” bit, you’re likely to reject it out of hand: Coverage is a fraught and limited measure. Even if you go “but that’s just line coverage!”, well, not even 100% branch coverage suffices to demonstrate freedom from fairly mechanical bugs, never mind more abstract errors in how you’ve implemented whatever half-imagined, unspecified system you’re aiming at.
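
A tiny, hypothetical Swift illustration of how little a coverage number proves: the single check below executes every line of `mean` (and every branch, there being only one path through it), so a coverage tool would happily report 100%, yet the function still falls over on an empty array.

```swift
// Integer mean of the values. Hypothetical example function.
func mean(_ values: [Int]) -> Int {
    return values.reduce(0, +) / values.count
}

// This one check exercises all of `mean`...
assert(mean([2, 4, 6]) == 4)

// ...but calling `mean([])` would still trap at runtime with a division by zero.
```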

The Compilation Test

I think he’s leaving a tool on the table, though. Not just any tool: A robust bevy of tests. And a tool that slots neatly into test-driven development.

The more powerful your type system, the more oomph you can get out of simply, “Does it compile?”

Even with Java, though, you can get rather far:

Defining types is very much like writing tests—the compiler continuously checks the types for consistency while we loop back and fix errors. Step 0[, define all the types,] is exactly like normal TDD, except we are making formal statements about the system that the compiler maintains. Could step 0 take a long time? Sure. Maybe with a sufficiently-advanced type system we never even leave step 0. With Java I’m going to hit a wall pretty fast, but not before avoiding many of the worst problems with the Money design. (Ken Fox, “More Typing, Less Testing: TDD with Static Types, Part 2”, 2014)

As that demonstrates, you can usefully incorporate types into test-driven development with thoroughly salutary effects.
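
For a taste of what such a compiler-maintained statement can look like, here’s a minimal sketch in current Swift syntax. It’s a toy, not Fox’s Java design: the names `Currency`, `USD`, `EUR`, and `Money` are invented for the example, which tags an amount with its currency in the type so that mixing currencies fails the “Does it compile?” test.

```swift
// Hypothetical names throughout; a sketch, not Fox's Money design.
protocol Currency { static var code: String { get } }
enum USD: Currency { static let code = "USD" }
enum EUR: Currency { static let code = "EUR" }

// The currency lives only in the type parameter, never in a stored property.
struct Money<C: Currency>: Equatable {
    let cents: Int

    // Addition is defined only between amounts in the same currency.
    static func + (lhs: Money, rhs: Money) -> Money {
        return Money(cents: lhs.cents + rhs.cents)
    }
}

let lunch = Money<USD>(cents: 1250)
let coffee = Money<USD>(cents: 350)
let total = lunch + coffee   // fine: both operands are Money<USD>

// let oops = lunch + Money<EUR>(cents: 100)
// Uncommenting the line above is a compile error: the operand types differ.
```

No unit test for “don’t add dollars to euros” ever needs writing; the compiler runs that check on every build.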

It Cramps My Style

It’s true that, once you’ve got a type system, you’re limited to writing code that fits within its constraints. Often you can ram through something that doesn’t, but it’s uncomfortable and tends to come with at least some syntactic overhead that makes it unpleasant to do.

TDD puts you under similar constraints, though: In order to achieve test isolation, you have to structure your software differently. You’ve narrowed your collection of possible programs from all those that can be represented in your language to only those that can be test-driven into existence, and so yield readily to automated testing.

Both typing and testing constrain what we can do with our code; we accept both limits because they free us to build with more confidence than we’d have without either.

Types AND Tests, Or Types ARE Tests

Whichever way you look at it, use ‘em both.

Your software will be better for it, and you’ll grow to be a better software engineer for the practice.

Beyond Our Ken

Thu, 05 May 2016 00:00:00 +0000

The more I poke around, the more convinced I become that actually knowing what a piece of software is supposed to do is truly rather rare and generally beyond mortal ken. Making it do what you think it should do is nearly beyond our grasp.

If we’re honest with ourselves, we need every tool we can get just to wrangle software into behaving. That means types, that means tests, and that means, yes, even: proofs.

And that also means that proofs need tests, too.

What drove this home was reading a couple papers related to combining proving and testing.

Types and Tests

I’m on record as arguing in favor of using both types and tests to their utmost in both Types Complement Tests Complement Types and Beyond Type Wars.

It’ll come as no surprise what I’m going to recommend here: Use proofs and tests. And also types.

(It’s even less surprising if you’ve run across the Curry-Howard isomorphism, which relates propositions to types and proofs to programs inhabiting those types – Propositions as Types – or, more broadly, the notion of computational Trinitarianism. There are some deep connections here, and we should wring them for every last ounce of help they can give us in crafting correct and elegant software.)

Use Proofs AND Tests

This time, it’s not gonna be me saying it, though.

Really, you should use both tests and proofs, not just one or just the other:

This also reinforces the general idea that testing and proving are synergistic activities, and gives us hope that a virtuous cycle between testing and proving can be achieved in a theorem prover. (Zoe Paraskevopoulou et al., “Foundational Property-Based Testing”, 2015)

If you don’t, you’re going to screw up. In small ways often, basically just tripping over your feet, but sometimes in big ways, where no-one can see how to bail you out:

Second, tests complement proofs. We encountered five papers in which explicitly claimed theorems are false as stated. […] In every case, though, rudimentary testing discovered errors missed with pencil-and-paper proofs.

Indeed, we claim that tests complement even machine-checked proofs. As one example, two of the POPLmark solutions that contain proofs of type soundness use call-by-name beta in violation of the specification (Crary and Gacek, personal communication). We believe unit testing would quickly reveal this error.

Even better, one can sometimes test propositions that cannot be validated via proof. […] Testing also removes another obstacle to proof, the requirement that we first state the proposition of interest. Due to its exploratory nature, testing can inadvertently falsify unstated but desired propositions, e.g., that threads block without busy waiting (section 4.4). This is especially true for system-level and randomized testing. To some degree, the same is true of proving, but testing seems to be more effective at covering a broad space of system behaviors. (Casey Klein et al., “Run Your Research: On the Effectiveness of Lightweight Mechanization”, 2012)
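
Rudimentary testing of the sort those papers describe needn’t be elaborate. As a rough sketch, assuming nothing beyond the Swift standard library, the hypothetical `firstCounterexample` helper below throws seeded random inputs at a claimed property and reports the first input that refutes it. The `SplitMix64` generator is only there to make runs repeatable; both names are my own, not taken from either paper.

```swift
// Small seeded generator so every run tries the same inputs.
struct SplitMix64: RandomNumberGenerator {
    var state: UInt64
    mutating func next() -> UInt64 {
        state &+= 0x9E3779B97F4A7C15
        var z = state
        z = (z ^ (z >> 30)) &* 0xBF58476D1CE4E5B9
        z = (z ^ (z >> 27)) &* 0x94D049BB133111EB
        return z ^ (z >> 31)
    }
}

// Try `trials` random pairs; return the first pair that falsifies `property`.
func firstCounterexample(trials: Int, seed: UInt64,
                         property: (Int, Int) -> Bool) -> (Int, Int)? {
    var rng = SplitMix64(state: seed)
    for _ in 0..<trials {
        let a = Int.random(in: -100...100, using: &rng)
        let b = Int.random(in: -100...100, using: &rng)
        if !property(a, b) { return (a, b) }
    }
    return nil
}

// A plausible-looking but false "theorem": subtraction commutes.
let refuted = firstCounterexample(trials: 100, seed: 42) { a, b in a - b == b - a }

// A true one: addition commutes. Testing finds no counterexample,
// which is evidence but, unlike a proof, never a guarantee.
let survived = firstCounterexample(trials: 100, seed: 42) { a, b in a + b == b + a }
```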

Use ALL The Tools

We don’t have to choose just tests.

We don’t have to choose just types.

We don’t have to choose just proofs.

We have an abundance of tools waiting for us to take them up and apply them to our problems. It’s simple and reassuring to reject a whole class of them out of hand; if we pick just one, perhaps we can convince ourselves of our expertise. And you can indeed get quite far with just one. But if you can stomach your own ignorance, you might find you can get even farther by striving to master all these many disciplines.

That Said, Tests Are a Really Mature Technology

Automated testing stopped being rocket science at least a decade ago. If you do nothing else, at least write some automated tests.

Humans suck at repeating mechanical tasks: we’re bad at documenting them, bad at following them, we get bored really easily, and we’re really slow. Be virtuously lazy and sic a computer on your testing, for everyone’s sake.

If you’re going to pick just one of these, pick automated testing, and work it for all it’s worth. (Quite a lot, honestly!)

Here's to iOS apps in F#

Wed, 06 Apr 2016 00:00:00 +0000

At Build 2016, Microsoft announced that Xamarin is free with all versions of Visual Studio, and the Xamarin SDK will be open-sourced.

My first thought was: iOS apps in F#? Lemme at it!

Why F#?

Swift is going through growing pains, and it’s still substantially a statement-oriented language. It’s supposed to be very comfortable if you come from a blocks-and-braces background, with seamless interop with C and Obj-C, and it’s executed on that wonderfully. If you were hoping for a more truly functional language, though, it’s kind of a downer; its gig at the bleeding edge seems more generic programming than functional programming (“C++ done right”).

F# benefits more directly from the long evolution of ML languages. It’s been public longer, and it’s got a good pedigree: Microsoft have done interesting things with all their languages over the past decade, and F# is no exception.

It might be “grass is greener”, but I’d like to take that for a spin and kick its tires, without having to up and move to a completely different target platform.

Interested?

F# for Fun and Profit is really great for learning about F# and why it’s good stuff. It’s organized less like a blog and more like a collection of instructional series.

Here’s their one-page summary of “why use F#”. Most of the bullets apply as well for Swift as for F#, but the core difference of expression-orientation rather than statement-orientation – not called out there – matters quite a bit in how easy it is to compose expressions and extract expressions as independent functions. (The workflow/computation expression sugar is quite nice, as well, and of course F# for Fun and Profit has a series teaching it in detail.)

If you prefer to listen rather than read, then check out:

Both of these also get into introducing F# in the workplace, if that’s something you’re motivated to tackle.

Why Xamarin?

Xamarin was $$$ before, but this drops the price of adoption for me (and anyone inheriting my codebase) significantly. Adoption costs matter!

As a bonus, if I can be like, “Hey, you’ll get an iOS app, and you’ll get a pile of platform-independent code you can point at Windows, Mac, or Android afterwards,” that seems like a win all around.

But really, I want a shot to use an expression-oriented language today as my main language.

Warning: Untested Speculation

I haven’t actually tried to do this yet. It might go down in flames in practice when I try to get everything lined up and working together; lots of things sound good in outline that fail in implementation.

If I throw an afternoon at it some time in the future, I’ll check back in with an experience report then.

XCTestExpectation Gotchas

Sat, 19 Mar 2016 00:00:00 +0000

XCTestExpectation simplifies testing callback-style code, but some of its design choices make tests using it fragile unless they’re mitigated:

It explodes if everything works right but later than you expected.
It explodes if everything works right more than once.

This article presents two concrete mitigations:

Use weak references to ensure the expectation dies before it can cause you trouble.
Use a different promise API to do your waiting.

A Quick Review

XCTestExpectation is the tool Apple’s unit testing framework XCTest provides for coping with asynchronous APIs.

It’s a promise/future with one purpose: to answer the question, “did it get filled in time?”

To use it, you ask the test case to create one or more:

let promise = expectationWithDescription("it'll happen, trust me")

wait a configurable amount of time for every outstanding expectation to get filled:

waitForExpectationsWithTimeout(maxWaitSeconds, handler: nil)

and log a test failure if time runs out before that happens:

Asynchronous wait failed: Exceeded timeout of 1 seconds, with unfulfilled expectations: “it’ll happen, trust me”.

It would have succeeded if it had been filled in time:

promise.fulfill()

Example: We’ll Call You

You can’t use the XCTest framework from a Playground (rdar://problem/17839045), so you’ll need to throw this in a full-blown project:

Get the code from GitHub

import XCTest

class LateCallback: XCTestCase {
    let callBackDelay: NSTimeInterval = 2

    func testNotWaitingLongEnough() {
        let promiseToCallBack = expectationWithDescription("calls back")
        // The callback fires only after the full delay...
        after(seconds: callBackDelay) { () -> Void in
            print("I knew you'd call!")
            promiseToCallBack.fulfill()
        }

        // ...but we wait just half that long, so the wait times out first.
        waitForExpectationsWithTimeout(callBackDelay / 2) { error in
            print("Aww, we timed out: \(error)")
        }
    }
}

Go ahead and run this. Everything works fine – for now:

Test Suite 'All tests' started at 2016-03-19 21:56:49.223
Test Suite 'Tests.xctest' started at 2016-03-19 21:56:49.225
Test Suite 'LateCallback' started at 2016-03-19 21:56:49.225
Test Case '-[Tests.LateCallback testNotWaitingLongEnough]' started.
Aww, we timed out: Optional(Error Domain=com.apple.XCTestErrorDomain Code=0 "The operation couldn’t be completed. (com.apple.XCTestErrorDomain error 0.)")
/Users/jeremy/Github/XCTestExpectationGotchas/Tests/LateCallback.swift:26: error: -[Tests.LateCallback testNotWaitingLongEnough] : Asynchronous wait failed: Exceeded timeout of 1 seconds, with unfulfilled expectations: "calls back".
Test Case '-[Tests.LateCallback testNotWaitingLongEnough]' failed (2.247 seconds).
Test Suite 'LateCallback' failed at 2016-03-19 21:56:51.473.
	 Executed 1 test, with 1 failure (0 unexpected) in 2.247 (2.248) seconds
Test Suite 'Tests.xctest' failed at 2016-03-19 21:56:51.474.
	 Executed 1 test, with 1 failure (0 unexpected) in 2.247 (2.249) seconds
Test Suite 'All tests' failed at 2016-03-19 21:56:51.474.
	 Executed 1 test, with 1 failure (0 unexpected) in 2.247 (2.251) seconds


Test session log:
	/var/folders/63/np5g0d5j54x1s0z12rf41wxm0000gp/T/com.apple.dt.XCTest-status/Session-2016-03-19_21:56:45-vfvzhb.log

Program ended with exit code: 1

Test suite kicks off, everything runs, the test fails due to a timeout while waiting for the expectation to be met, and the process exits. This is how XCTestExpectation is supposed to work.

Kaboom: Missing the Window

We only ran the one test, though. Let’s say you have more tests to run after this one.

We can fake this out by adding a new test method that runs the runloop for a bit before exiting, and whose name sorts alphabetically after testNotWaitingLongEnough.

Conveniently enough, XCTest happens to run tests in alphabetical order, so the test runner will run our first test, then run this second one, then exit.

Here’s our new test method:

    func testZzz() {
        print("Let's just wait a while…")
        let tillAfterCallBack = callBackDelay
        spin(forSeconds: tillAfterCallBack)
        print("Yawn, that was boring.")
    }
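
Both test methods lean on small helpers, `after(seconds:call:)` and `spin(forSeconds:)`, that aren’t shown in these listings; the real ones live in the GitHub project. What follows is only a guess at their shape, written in current Swift syntax rather than the Swift 2 of the listings, and inferred from the `finished waiting` log line plus the dispatch and runloop frames that show up in the crash log.

```swift
import Foundation

// Guess at the helper: invoke `call` on the main queue after a delay.
func after(seconds: TimeInterval, call: @escaping () -> Void) {
    DispatchQueue.main.asyncAfter(deadline: .now() + seconds) {
        print("\(seconds): finished waiting")
        call()
    }
}

// Guess at the helper: run the current runloop for roughly `seconds`.
// On the main thread this gives pending main-queue callbacks a chance to fire.
func spin(forSeconds seconds: TimeInterval) {
    RunLoop.current.run(until: Date(timeIntervalSinceNow: seconds))
}

// Schedule a callback, then spin long enough for it to arrive.
after(seconds: 0.1) { print("I knew you'd call!") }
spin(forSeconds: 0.5)
```

The important property is that the delayed block outlives whatever scheduled it: nothing ties it to the lifetime of a test method.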

Let’s see what happens (or you can skip to the summary):

Test Suite 'All tests' started at 2016-03-19 22:19:31.796
Test Suite 'Tests.xctest' started at 2016-03-19 22:19:31.798
Test Suite 'LateCallback' started at 2016-03-19 22:19:31.798
Test Case '-[Tests.LateCallback testNotWaitingLongEnough]' started.
Aww, we timed out: Optional(Error Domain=com.apple.XCTestErrorDomain Code=0 "The operation couldn’t be completed. (com.apple.XCTestErrorDomain error 0.)")
/Users/jeremy/Github/XCTestExpectationGotchas/Tests/LateCallback.swift:16: error: -[Tests.LateCallback testNotWaitingLongEnough] : Asynchronous wait failed: Exceeded timeout of 1 seconds, with unfulfilled expectations: "calls back".
Test Case '-[Tests.LateCallback testNotWaitingLongEnough]' failed (2.202 seconds).
Test Case '-[Tests.LateCallback testZzz]' started.
Let's just wait a while…
2.0: finished waiting
I knew you'd call!
2016-03-19 22:19:34.001 xctest[92369:96447173] *** Assertion failure in -[XCTestExpectation fulfill], /Library/Caches/com.apple.xbs/Sources/XCTest/XCTest-9530/XCTestFramework/Classes/XCTestCase+AsynchronousTesting.m:451
2016-03-19 22:19:34.002 xctest[92369:96447173] *** Terminating app due to uncaught exception 'NSInternalInconsistencyException', reason: 'API violation - called -[XCTestExpectation fulfill] after the wait context has ended for calls back.'
*** First throw call stack:
(
	0   CoreFoundation                      0x00007fff897ec03c __exceptionPreprocess + 172
	1   libobjc.A.dylib                     0x00007fff8674276e objc_exception_throw + 43
	2   CoreFoundation                      0x00007fff897ebe1a +[NSException raise:format:arguments:] + 106
	3   Foundation                          0x00007fff8b98b99b -[NSAssertionHandler handleFailureInMethod:object:file:lineNumber:description:] + 195
	4   XCTest                              0x000000010006f149 -[XCTestExpectation fulfill] + 302
	5   Tests                               0x00000001006858ab _TFFC5Tests12LateCallback24testNotWaitingLongEnoughFS0_FT_T_U_FT_T_ + 203
	6   Tests                               0x0000000100685c4f _TFF5Tests5afterFT7secondsSd4callFT_T__T_U_FT_T_ + 367
	7   Tests                               0x0000000100685de7 _TTRXFo__dT__XFdCb__dT__ + 39
	8   libdispatch.dylib                   0x00007fff8301f700 _dispatch_call_block_and_release + 12
	9   libdispatch.dylib                   0x00007fff8301be73 _dispatch_client_callout + 8
	10  libdispatch.dylib                   0x00007fff8302d6a0 _dispatch_after_timer_callback + 77
	11  libdispatch.dylib                   0x00007fff8301be73 _dispatch_client_callout + 8
	12  libdispatch.dylib                   0x00007fff830284e6 _dispatch_source_latch_and_call + 721
	13  libdispatch.dylib                   0x00007fff8302093b _dispatch_source_invoke + 412
	14  libdispatch.dylib                   0x00007fff8302c5aa _dispatch_main_queue_callback_4CF + 416
	15  CoreFoundation                      0x00007fff8973f3f9 __CFRUNLOOP_IS_SERVICING_THE_MAIN_DISPATCH_QUEUE__ + 9
	16  CoreFoundation                      0x00007fff896fa68f __CFRunLoopRun + 2159
	17  CoreFoundation                      0x00007fff896f9bd8 CFRunLoopRunSpecific + 296
	18  Foundation                          0x00007fff8b953b29 -[NSRunLoop(NSRunLoop) runMode:beforeDate:] + 278
	19  Foundation                          0x00007fff8b971d9e -[NSRunLoop(NSRunLoop) runUntilDate:] + 108
	20  Tests                               0x0000000100685262 _TF5Tests4spinFT10forSecondsSd_T_ + 162
	21  Tests                               0x000000010068510f _TFC5Tests12LateCallback7testZzzfS0_FT_T_ + 207
	22  Tests                               0x00000001006852a2 _TToFC5Tests12LateCallback7testZzzfS0_FT_T_ + 34
	23  CoreFoundation                      0x00007fff896c37bc __invoking___ + 140
	24  CoreFoundation                      0x00007fff896c3612 -[NSInvocation invoke] + 290
	25  XCTest                              0x0000000100022598 __24-[XCTestCase invokeTest]_block_invoke_2 + 159
	26  XCTest                              0x000000010005602e -[XCTestContext performInScope:] + 184
	27  XCTest                              0x00000001000224e8 -[XCTestCase invokeTest] + 169
	28  XCTest                              0x0000000100022983 -[XCTestCase performTest:] + 443
	29  XCTest                              0x0000000100020654 -[XCTestSuite performTest:] + 377
	30  XCTest                              0x0000000100020654 -[XCTestSuite performTest:] + 377
	31  XCTest                              0x0000000100020654 -[XCTestSuite performTest:] + 377
	32  XCTest                              0x000000010000e892 __25-[XCTestDriver _runSuite]_block_invoke + 51
	33  XCTest                              0x0000000100033a1b -[XCTestObservationCenter _observeTestExecutionForBlock:] + 611
	34  XCTest                              0x000000010000e7db -[XCTestDriver _runSuite] + 408
	35  XCTest                              0x000000010000f38a -[XCTestDriver _checkForTestManager] + 696
	36  XCTest                              0x000000010005729f _XCTestMain + 628
	37  xctest                              0x0000000100001dca xctest + 7626
	38  libdyld.dylib                       0x00007fff8b25f5c9 start + 1
)
libc++abi.dylib: terminating with uncaught exception of type NSException
(lldb)

And now we’re sitting at the debugger. Oof, that smarts.

Take a look at what’s going on in that backtrace:

Our Zzz test is hanging out running the runloop.
The after(seconds:call:) finishes waiting and calls its callback.
The callback fulfills an expectation belonging to the (already finished, already failed) first test.
This trips a “you’re holding it wrong” assertion in the test framework:

Terminating app due to uncaught exception ‘NSInternalInconsistencyException’, reason: ‘API violation - called -[XCTestExpectation fulfill] after the wait context has ended for calls back.’

You might run up against this in practice when writing integration tests against a live, but not always quick to respond, backend service.

Kaboom: Calling Twice

That’s not the only way things can go wrong.

What happens if our callback has at-least-once rather than exactly-once behavior, and happens to call back twice?

class DoubleCallback: XCTestCase {
    func testDoubleTheFulfillment() {
        let promiseToCallBack = expectationWithDescription("calls back")
        let callBackDelay: NSTimeInterval = 1

        twiceAfter(seconds: callBackDelay) {
            print("i hear you calling me")
            promiseToCallBack.fulfill()
        }

        let afterCallBack = 2 * callBackDelay
        waitForExpectationsWithTimeout(afterCallBack, handler: nil)
    }
}

This is what happens (or skip to the summary)

Test Suite 'Selected tests' started at 2016-03-19 22:38:09.451
Test Suite 'DoubleCallback' started at 2016-03-19 22:38:09.452
Test Case '-[Tests.DoubleCallback testDoubleTheFulfillment]' started.
1.0: finished waiting
now once
i hear you calling me
now twice
i hear you calling me
2016-03-19 22:38:10.567 xctest[93147:96490281] *** Assertion failure in -[XCTestExpectation fulfill], /Library/Caches/com.apple.xbs/Sources/XCTest/XCTest-9530/XCTestFramework/Classes/XCTestCase+AsynchronousTesting.m:450
2016-03-19 22:38:10.568 xctest[93147:96490281] *** Terminating app due to uncaught exception 'NSInternalInconsistencyException', reason: 'API violation - multiple calls made to -[XCTestExpectation fulfill] for calls back.'
*** First throw call stack:
(
	0   CoreFoundation                      0x00007fff897ec03c __exceptionPreprocess + 172
	1   libobjc.A.dylib                     0x00007fff8674276e objc_exception_throw + 43
	2   CoreFoundation                      0x00007fff897ebe1a +[NSException raise:format:arguments:] + 106
	3   Foundation                          0x00007fff8b98b99b -[NSAssertionHandler handleFailureInMethod:object:file:lineNumber:description:] + 195
	4   XCTest                              0x000000010006f0bb -[XCTestExpectation fulfill] + 160
	5   Tests                               0x0000000100795c6b _TFFC5Tests14DoubleCallback24testDoubleTheFulfillmentFS0_FT_T_U_FT_T_ + 203
	6   Tests                               0x0000000100795e05 _TFF5Tests10twiceAfterFT7secondsSd4callFT_T__T_U_FT_T_ + 389
	7   Tests                               0x0000000100794eff _TFF5Tests5afterFT7secondsSd4callFT_T__T_U_FT_T_ + 367
	8   Tests                               0x0000000100795097 _TTRXFo__dT__XFdCb__dT__ + 39
	9   libdispatch.dylib                   0x00007fff8301f700 _dispatch_call_block_and_release + 12
	10  libdispatch.dylib                   0x00007fff8301be73 _dispatch_client_callout + 8
	11  libdispatch.dylib                   0x00007fff8302d6a0 _dispatch_after_timer_callback + 77
	12  libdispatch.dylib                   0x00007fff8301be73 _dispatch_client_callout + 8
	13  libdispatch.dylib                   0x00007fff830284e6 _dispatch_source_latch_and_call + 721
	14  libdispatch.dylib                   0x00007fff8302093b _dispatch_source_invoke + 412
	15  libdispatch.dylib                   0x00007fff8302c5aa _dispatch_main_queue_callback_4CF + 416
	16  CoreFoundation                      0x00007fff8973f3f9 __CFRUNLOOP_IS_SERVICING_THE_MAIN_DISPATCH_QUEUE__ + 9
	17  CoreFoundation                      0x00007fff896fa68f __CFRunLoopRun + 2159
	18  CoreFoundation                      0x00007fff896f9bd8 CFRunLoopRunSpecific + 296
	19  Foundation                          0x00007fff8b953b29 -[NSRunLoop(NSRunLoop) runMode:beforeDate:] + 278
	20  XCTest                              0x000000010006e6e8 -[XCTestCase(AsynchronousTesting) waitForExpectationsWithTimeout:handler:] + 1083
	21  Tests                               0x00000001007954d6 _TFC5Tests14DoubleCallback24testDoubleTheFulfillmentfS0_FT_T_ + 614
	22  Tests                               0x0000000100795722 _TToFC5Tests14DoubleCallback24testDoubleTheFulfillmentfS0_FT_T_ + 34
	23  CoreFoundation                      0x00007fff896c37bc __invoking___ + 140
	24  CoreFoundation                      0x00007fff896c3612 -[NSInvocation invoke] + 290
	25  XCTest                              0x0000000100022598 __24-[XCTestCase invokeTest]_block_invoke_2 + 159
	26  XCTest                              0x000000010005602e -[XCTestContext performInScope:] + 184
	27  XCTest                              0x00000001000224e8 -[XCTestCase invokeTest] + 169
	28  XCTest                              0x0000000100022983 -[XCTestCase performTest:] + 443
	29  XCTest                              0x0000000100020654 -[XCTestSuite performTest:] + 377
	30  XCTest                              0x0000000100020654 -[XCTestSuite performTest:] + 377
	31  XCTest                              0x000000010000e892 __25-[XCTestDriver _runSuite]_block_invoke + 51
	32  XCTest                              0x0000000100033a1b -[XCTestObservationCenter _observeTestExecutionForBlock:] + 611
	33  XCTest                              0x000000010000e7db -[XCTestDriver _runSuite] + 408
	34  XCTest                              0x000000010000f38a -[XCTestDriver _checkForTestManager] + 696
	35  XCTest                              0x000000010005729f _XCTestMain + 628
	36  xctest                              0x0000000100001dca xctest + 7626
	37  libdyld.dylib                       0x00007fff8b25f5c9 start + 1
)
libc++abi.dylib: terminating with uncaught exception of type NSException
(lldb)

We trip yet another assertion in XCTest:

Terminating app due to uncaught exception ‘NSInternalInconsistencyException’, reason: ‘API violation - multiple calls made to -[XCTestExpectation fulfill] for calls back.’

This probably does indicate an actual error in the code calling the callback much of the time, but if it doesn’t, you’ll want to know about and be able to dodge this assertion, too.

What’s Wrong?

This double-callback scenario calls back twice in succession. But if there were a delay between the first and second call back, and the test runner happened to exit during that delay, you’d get a successful test run rather than crashing every time.

With a delay between callbacks, you’d only trip the assertion when other tests kept the test runner process running long enough.

This situation parallels that of the too-late callback: no problems appear till something else runs out the clock.

This is tricky:

You won’t ever trip these assertions while you’re banging away at whatever test you’re currently working on, because a test runner running just that async test will exit as soon as the wait-timer runs out, before the too-late/second callback can occur.
You might not even trip them when you first run your whole test suite, because the offending test might be the last one in the run, or the tests that follow might not run for long enough.

This is also obnoxious to run into: When an assertion trips, it bombs the entire test process. (Unwrapping an implicitly unwrapped optional to find a nil has the same effect.)

These assertions aren’t test failures that would allow testing to continue; instead, XCTest treats both of the following as programmer error:

Fulfilling a promise after its test has already finished
Fulfilling an already-fulfilled promise

To be fair, these cases are called out in the documentation for XCTestExpectation.fulfill():

Call -fulfill to mark an expectation as having been met. It’s an error to call -fulfill on an expectation that has already been fulfilled or when the test case that vended the expectation has already completed.

but the documentation isn’t explicit that “it’s an error” translates to “and it will bomb your whole test process”.

Avoiding These Assertions

In both cases, the problem is that we’re calling fulfill when we shouldn’t. So let’s not do that.

Let the Expectation Die With the Test

XCTest actually hangs on to the expectations it creates so it can collect them during the wait call.

Our test method doesn’t need yet another strong reference to the expectation; if we instead work with a weak reference in our callback closure, the expectation will die with our test, rather than lingering for us to trip over after the test has completed, and we’ll have turned our callback into a no-op.

First, neuter the time-bombed testNotWaitingLongEnough by prefixing its name with an x so it won’t get picked up by the test runner any more:

 class LateCallback: XCTestCase {
     let callBackDelay: NSTimeInterval = 2


-    func testNotWaitingLongEnough() {
+    func xtestNotWaitingLongEnough() {
         let promiseToCallBack = expectationWithDescription("calls back")
         after(seconds: callBackDelay) { () -> Void in

Now clone it, but this time, use a weak reference to the expectation:

    func testPreparedForNotWaitingLongEnough() {
        weak var promiseToCallBack = expectationWithDescription("calls back")
        after(seconds: callBackDelay) { () -> Void in
            guard let promise = promiseToCallBack else {
                print("too late, buckaroo")
                return
            }

            print("I knew you'd call!")
            promise.fulfill()
        }

        waitForExpectationsWithTimeout(callBackDelay / 2) { error in
            print("Aww, we timed out: \(error)")
        }
    }

Run the LateCallback suite again, and the logs now look like (or skip to the summary):

Test Suite 'Selected tests' started at 2016-03-19 23:19:19.980
Test Suite 'LateCallback' started at 2016-03-19 23:19:19.981
Test Case '-[Tests.LateCallback testPreparedForNotWaitingLongEnough]' started.
Aww, we timed out: Optional(Error Domain=com.apple.XCTestErrorDomain Code=0 "The operation couldn’t be completed. (com.apple.XCTestErrorDomain error 0.)")
/Users/jeremy/Github/XCTestExpectationGotchas/Tests/LateCallback.swift:34: error: -[Tests.LateCallback testPreparedForNotWaitingLongEnough] : Asynchronous wait failed: Exceeded timeout of 1 seconds, with unfulfilled expectations: "calls back".
Test Case '-[Tests.LateCallback testPreparedForNotWaitingLongEnough]' failed (1.945 seconds).
Test Case '-[Tests.LateCallback testZzz]' started.
Let's just wait a while…
2.0: finished waiting
too late, buckaroo
2.0: all done here
Yawn, that was boring.
Test Case '-[Tests.LateCallback testZzz]' passed (2.004 seconds).
Test Suite 'LateCallback' failed at 2016-03-19 23:19:23.932.
	 Executed 2 tests, with 1 failure (0 unexpected) in 3.950 (3.951) seconds


Test session log:
	/var/folders/63/np5g0d5j54x1s0z12rf41wxm0000gp/T/com.apple.dt.XCTest-status/Session-2016-03-19_23:19:16-QZf0lq.log

Test Suite 'Selected tests' failed at 2016-03-19 23:19:23.933.
	 Executed 2 tests, with 1 failure (0 unexpected) in 3.950 (3.953) seconds
Program ended with exit code: 1

Our testZzz runs to completion and passes, and the test process exits on its own terms reporting the one failure.

The late callback still happened, but by that time, promiseToCallBack had been zeroed, so we never called fulfill().
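
The zeroing behavior can be seen in isolation, without XCTest. This is only a sketch in current Swift syntax (not the Swift 2 of the listings here): `StandInExpectation`, `frameworkStorage`, and `callbackOutcome` are hypothetical names, and the array merely stands in for whatever strong reference XCTest keeps internally while the test runs.

```swift
/// Hypothetical stand-in for XCTestExpectation; not part of XCTest.
final class StandInExpectation {
    private(set) var isFulfilled = false
    func fulfill() { isFulfilled = true }
}

/// Simulates a callback that arrives either before or after the "test" ends.
func callbackOutcome(arrivesAfterTestEnds late: Bool) -> String {
    // Stands in for the framework's own strong reference to the expectation.
    var frameworkStorage = [StandInExpectation()]
    // The test body only ever holds a weak reference.
    weak var promise = frameworkStorage.first

    let callback: () -> String = {
        guard let promise = promise else { return "too late, buckaroo" }
        promise.fulfill()
        return "I knew you'd call!"
    }

    if !late {
        // The expectation is still alive, so the callback can fulfill it.
        return callback()
    }

    // The "test" ends: the only strong reference goes away,
    // so the weak reference is zeroed before the callback runs.
    frameworkStorage.removeAll()
    return callback()
}
```

The late callback still runs, but it finds nothing to fulfill and returns early.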

Assertion: Dodged!

Kill the Expectation Proactively

What about the double-callback case? We can use the same trick, only this time, we’ll want to proactively annihilate the expectation:

    func testSafelyDoubleTheFulfillment() {
        weak var promiseToCallBack = expectationWithDescription("calls back")
        let callBackDelay: NSTimeInterval = 1

        twiceAfter(seconds: callBackDelay) {
            guard let promise = promiseToCallBack else {
                print("once was enough, thanks!")
                return
            }

            promise.fulfill()
            promiseToCallBack = nil
        }

        let afterCallBack = 2 * callBackDelay
        waitForExpectationsWithTimeout(afterCallBack, handler: nil)
    }

With the unsafe test neutered via the prefix-x trick, running the test class gives (or skip to the summary):

Test Suite 'Selected tests' started at 2016-03-19 23:22:56.356
Test Suite 'DoubleCallback' started at 2016-03-19 23:22:56.357
Test Case '-[Tests.DoubleCallback testSafelyDoubleTheFulfillment]' started.
1.0: finished waiting


Test session log:
	/var/folders/63/np5g0d5j54x1s0z12rf41wxm0000gp/T/com.apple.dt.XCTest-status/Session-2016-03-19_23:22:51-14ywpS.log

now once
i hear you calling me
now twice
once was enough, thanks!
wasn't that nice?
1.0: all done here
Test Case '-[Tests.DoubleCallback testSafelyDoubleTheFulfillment]' passed (1.099 seconds).
Test Suite 'DoubleCallback' passed at 2016-03-19 23:22:57.457.
	 Executed 1 test, with 0 failures (0 unexpected) in 1.099 (1.100) seconds
Test Suite 'Selected tests' passed at 2016-03-19 23:22:57.458.
	 Executed 1 test, with 0 failures (0 unexpected) in 1.099 (1.102) seconds
Program ended with exit code: 0

Since we explicitly set the promise to nil, we only end up fulfilling it once. No harm, no foul.

Use a Different Promise API

If you’ve got an API written in terms of a promise/future library already, such as Deferred, then there’s no need to use XCTest’s promises:

class BringYourOwnPromises: XCTestCase {
    let anyDelay: NSTimeInterval = 1


    func testGettingAPony() {
        let futurePony = giveMeAPony(after: anyDelay)

        let longEnough = anyDelay + 1
        guard let pony = futurePony.wait(.Interval(longEnough)) else {
            XCTFail("no pony ;_;")
            return
        }

        print("we got a pony! \(pony)")
    }


    func testWhenImpatientNoPonyForYou() {
        let futurePony = giveMeAPony(after: anyDelay)

        guard let pony = futurePony.wait(.Now) else {
            print("no patience, no pony")
            return
        }

        XCTFail("we got a pony???! \(pony)")
    }


    func testZzzDoesNotCrash() {
        spin(forSeconds: 2 * anyDelay)
    }
}

Summary

Always assign your expectations to a weak reference, and then bail in your callback if it’s nil.
In the rare case where you expect your callback to be triggered more than once, you can avoid fulfilling twice by annihilating your weak reference after fulfilling it and then ignoring future calls.
- More likely, you know how many times you should be called, and you’ll want to fulfill the promise only on the last call. But the workaround is there if you need it.
If you’re already working with a promise-based API, you can skip XCTestExpectation and use whatever wait-and-see API is provided by that promise instead of XCTest’s own.
- This has the added advantage of linearizing your test code by eliminating the need to handle the delivered value in the closure (or manually shuttle it out to assert against after the XCTest wait has finished).
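
For the "fulfill only on the last call" case, one possible shape is a small counter that runs its closure when the expected number of calls has arrived. This is a sketch in current Swift syntax, not something XCTest provides: `CallCounter`, `expectedCalls`, `onLastCall`, and `recordCall` are hypothetical names. It is not thread-safe, and assumes every callback arrives on the same queue.

```swift
/// Hypothetical helper: runs `onLastCall` exactly once, when `recordCall()`
/// has been invoked `expectedCalls` times. Extra calls are ignored.
final class CallCounter {
    private var remaining: Int
    private let onLastCall: () -> Void

    init(expectedCalls: Int, onLastCall: @escaping () -> Void) {
        self.remaining = expectedCalls
        self.onLastCall = onLastCall
    }

    func recordCall() {
        // Once the count has run out, further calls are no-ops.
        guard remaining > 0 else { return }
        remaining -= 1
        if remaining == 0 { onLastCall() }
    }
}
```

In a test, `onLastCall` is where the (weakly referenced) expectation would be fulfilled, and the callback under test would just call `recordCall()`.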

Embedded Content Contains Swift

Sun, 06 Mar 2016 00:00:00 +0000

If you’re developing a QuickLook plugin using Swift, make sure you flip on the EMBEDDED_CONTENT_CONTAINS_SWIFT build setting for the target, otherwise bundle loading will fail in a spectacularly unhelpful way.

Creating a Mixed-Language QuickLook Plugin

Recently I decided to add a QuickLook plugin to my ImageSlicer utility app.

The default QuickLook plugin template stamps out an entirely C plugin. Changing the thumbnail/preview template files to have a .m suffix put us back in Obj-C land, but getting to Swift land takes a couple more steps.

Not to worry: Add a new Swift file to the target, and Xcode will offer to make bridging easy-peasy for you. Give it the go-ahead, and you should be good to go, right?

I add the main model and view classes from my app project to the QuickLook target, wire stuff up to load the document and render the view, and everything compiles and links all happy-like. Let’s test this thing!

Gatekeeper?

I fire up qlmanage, point it at my generator and a .slicedimage document, and I see That Error:

The bundle “QuickLookSlicedImage” couldn’t be loaded because it is damaged or missing necessary resources.

I’ve seen this error way too many times when I grab an older app bundle off the Internet. Every time before, “damaged or missing necessary resources” has been code for “no-one signed this app bundle”.

I’m asking the system to execute code, so, sure, that kind of makes sense?

I hare off looking at using spctl to whitelist my bundle, successfully whitelist it with spctl --add --label JWSDev path/to/QuickLookSlicedImage.qlgenerator, and spctl --assess is OK with it.

Let’s try again.

Not Gatekeeper

I see the same error. Hrm. What if it really is missing something? Now I want to see the smoking gun.

After sufficient rooting around, I eventually work through to where it loads the bundle, then the plugin, then finally to where the real business happens: dlopen.

After the call to dlopen, the CFBundle machinery checked for success with dlerror, and that gave me an actually informative error message (which I’ve abbreviated and hard-wrapped for readability):

(lldb) x/s $rax
0x100576819: "dlopen(LONG_PATH/QuickLookSlicedImage, 262):
Library not loaded: @rpath/libswiftAppKit.dylib\n
  Referenced from: LONG_PATH/QuickLookSlicedImage\n
  Reason: image not found"

Yup, missing Swift dylibs.
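
If you would rather not go spelunking in the debugger, the same message can be had by calling dlopen yourself and asking dlerror what went wrong. A minimal sketch in current Swift syntax; `loadFailureReason(forImageAt:)` is a hypothetical helper name, and the path you pass is assumed to be the executable inside your built plugin bundle. Note that successfully loading an image runs its initializers in your process.

```swift
#if canImport(Darwin)
import Darwin
#else
import Glibc
#endif

/// Hypothetical helper: returns nil if the image at `path` loads,
/// or the dynamic loader's own error message if it does not.
func loadFailureReason(forImageAt path: String) -> String? {
    if let handle = dlopen(path, RTLD_NOW) {
        // It loaded fine; release our handle and report no error.
        dlclose(handle)
        return nil
    }
    // dlerror() describes the most recent dlopen failure on this thread.
    guard let message = dlerror() else {
        return "dlopen failed without an error message"
    }
    return String(cString: message)
}
```

Printing the returned string for the plugin binary should surface the "Library not loaded" detail that the CFBundle error hides.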

EMBEDDED_CONTENT_CONTAINS_SWIFT

The fix is to tell Xcode to copy all the Swift dylibs the built product needs into its bundle using the build setting EMBEDDED_CONTENT_CONTAINS_SWIFT=YES.

(The other fix is to ensure qlmanage is actually running the generator you’re building now, not the generator embedded in the copy of your app you built an hour or two ago that still has the missing-dylib issue. Oops.)

Take-Away

The take-away is this:

When Xcode offers to add a Swift–Obj-C bridging header for you,
Then that means the target was not previously configured for Swift,
And you should probably ensure that EMBEDDED_CONTENT_CONTAINS_SWIFT=YES gets set for the target.

The “probably” is there because, if you’re baking it into an app bundle that’s already embedding the Swift dylibs, you could probably mess with the rpath to get it to share those rather than having Yet Another Copy of the Swift support dylibs in your app bundle.

But that’ll be a pain, and disk space is cheap, so you’ll probably still want to just flip on EMBEDDED_CONTENT_CONTAINS_SWIFT=YES.

Review: SE-0026: Abstract classes and methods

Mon, 29 Feb 2016 00:00:00 +0000

This is a review of SE-0026 “Abstract classes and methods”.

I am against the acceptance of this proposal:

It lacks a clear problem.
The leap from a nebulous problem to abstract classes as the solution is a non sequitur.
Its arguments are insufficient to justify the complication it would add to Swift, which is contrary to the simplification and clarification aims of the Swift community.

The contrast is sharpened by comparison to the Python Enhancement Proposal that accompanied the introduction of abstract base classes into Python. The present proposal fails to provide a correspondingly thoughtful rationale.

No Clear Problem

The proposal itself does little to define a practical problem, and less to explain how abstract classes solve this problem better than alternatives. It feels like a solution in want of a problem, which is the opposite of a considered addition to the language.

As best I can determine, the primary problem introduced is that of wanting to have abstract properties. The example given is better resolved by providing the url as a constructor argument, as noted by Stephen Celis. Further, the immediate solution appears to be to argue in favor of uniform access as found in Self and Eiffel, not abstract base classes, which compound non-uniform access by a further serving of complexity.
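
To make the constructor-argument alternative concrete, here is a minimal sketch in current Swift syntax. The class and member names (`RESTClient`, `FooServiceClient`, `endpoint`) and the URL are hypothetical illustrations, not code taken from the proposal.

```swift
/// The base class demands the URL up front instead of declaring
/// an abstract property for subclasses to override.
class RESTClient {
    let url: String

    init(url: String) {
        self.url = url
    }

    /// Shared behavior that depends on the subclass-supplied value.
    func endpoint(_ path: String) -> String {
        return url + "/" + path
    }
}

/// A subclass cannot be created without supplying the URL,
/// so the compiler enforces what `abstract` was meant to enforce.
final class FooServiceClient: RESTClient {
    init() {
        super.init(url: "https://example.com/api")
    }
}
```

Forgetting to provide the value is a compile-time error here too, with no new language feature required.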

Another problem mentioned is lack of easy delegation of implementation in the context of protocols; providing a simple way to proxy calls to another object would present a promising and useful avenue for resolving this problem that would also compose more generally with the rest of the language. NSProxy has always been somewhat awkward in this regard; perhaps we can do better in Swift?

No Clear Significance

Without a clear problem to address, it becomes difficult to evaluate the significance of the problem.

Ultimately, it’s unclear precisely what the problem under consideration is, unless the problem is stated simply as, “Swift doesn’t have abstract base classes.” If that is the true problem the proposal means to address, then it seems especially insignificant; Swift also lacks good support for relational programming à la miniKanren, but a difference does not a problem make.

If we focus on “no abstract classes” as the problem, then the problem appears insignificant: Smalltalk and Objective-C have both made do without formal support for abstract classes. Objective-C went so far as to remove subclassResponsibility from the common language vocabulary, which eliminated all inbuilt support for abstract classes. Never have I heard either Smalltalker or Obj-C hacker end up despondent and cursing over the lack of built-in abstract class support in these languages.

Compared to Python’s Rationale for Adding Abstract Classes

It is interesting to consider the motivation for adding abstract base class support to Python as explained in PEP 3119.

In Python’s case, the decision was motivated by the desire for a reliable means to test particularly for some shared quality of a group of objects - basically, a reliable respondsToSelector: or isKindOfClass: that allows detecting this quality without incidental risk of false positives or negatives (“Rationale”).

As a result, Python adopted abstract base classes as an alternative to interfaces (“ABCs vs. Interfaces”). But Swift already has interfaces in the form of protocols; this answers the need that motivated the addition of abstract base classes to Python.

Because we cannot borrow the rationale used for adding abstract base classes to Python, and the document before us spends its effort explaining abstract base classes rather than the problem they would solve, it remains for those arguing for the added formal complexity of abstract base classes to motivate their addition in the context of Swift. The current proposal is manifestly lacking in this regard.

Out of Alignment with Swift

Adding abstract class support to Swift seems unprincipled. I cannot see what problem would be solved, and Swift is working towards considered language growth, and even better, language contraction, at this point in time. Adding abstract base classes would feel like nodding to feature agglutination by cargo cult, not the careful evolution we aspire to.

Effort

I read the article and then looked at the arguments in favor of supporting abstract base classes in Python for comparison. I would love to see a rationale as tailored to Swift and to real problems as PEP 3119 was to Python and its programmers’ problems! In Python’s case, “[m]uch of the thinking that went into the proposal [was] not about the specific mechanism of ABCs, as contrasted with Interfaces or Generic Functions (GFs), but about clarifying philosophical issues[…].” This sort of laborious semantic work is a necessary accompaniment to any significant proposed changes to an object system, and that thought is unfortunately not apparent in this proposal.

This article was originally posted to swift-evolution on 28 February 2016.

Go Versions and the Open-Closed Principle

Wed, 24 Feb 2016 00:00:00 +0000

People aren’t happy about Go’s approach to managing software versions:

aren’t different API versions supposed to live at different import paths in Go? This works great if you have a proprietary codebase, are using a monorepo, and don’t support the sharing culture of open source. And, it doesn’t address the issue of minor or patch versions.

Hello, Open-Closed Principle

The funny thing is that Go’s official version management approach is effectively a strict reading of the open-closed principle as applied to libraries rather than classes.

The “fork it and rename it” approach was actually the way the principle was originally introduced for classes.

You want to change how a class works?

Fine, subclass it and make your changes.

Dependents can adopt MyVeryOwnFooV35 at their convenience, rather than you just stomping on the one and only MyVeryOwnFoo class in the project.
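
As a sketch of what that looks like for classes (current Swift syntax; the `greeting` method and its strings are hypothetical, only the two class names come from the text above):

```swift
/// The original class stays closed to modification.
class MyVeryOwnFoo {
    func greeting() -> String {
        return "hello"
    }
}

/// The changed behavior ships under a new name, as an extension of the old class.
class MyVeryOwnFooV35: MyVeryOwnFoo {
    override func greeting() -> String {
        return super.greeting() + ", world"
    }
}

/// Existing dependents keep compiling against the old type,
/// and accept the new one whenever their callers choose to pass it.
func describe(_ foo: MyVeryOwnFoo) -> String {
    return foo.greeting()
}
```

Nothing that depended on `MyVeryOwnFoo` changes behavior until someone deliberately hands it a `MyVeryOwnFooV35`.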

But That’s Crazy Talk!

Yeah, it didn’t much catch on in object-oriented programming, either, in spite of being enshrined in the SOLID acronym.

Apparently Gophers think it’s equally crazy for libraries (ibid):

Can you imagine that every time a library needs to increment a major version it needs to create a new repo on GitHub? Yeah, no one does that. The path for major API version is a Go thing. It’s not intuitive. Someone had to tell me. And, many Go developers just don’t do it. If they did there would be no reason for gopkg.in.

People Actually Do That

I can imagine it, and people actually do it. Check out the Creating Stable Releases section of the Collective Code Construction Contract. This is the social contract that governs development of ZeroMQ, amongst a few other projects.

Every time they want to make a stable version, they shard off a new repo for that version, with its own steward.

Thus, every time ZeroMQ needs to increment a major, or minor, or patch version, they need to fork a new repo. Mainline development continues on the main repo, and the stable release gets its own repo, its own maintenance patches, and its own name in the form of the repo URL.

Why Don’t We Do That in OOP?

I think we don’t do this for OOP precisely because we find ourselves in the monorepo scenario that let Google avoid introducing package management into Go. Most object-oriented projects live in one repo, so we can readily coordinate changes across the codebase: we don’t need to fork a new subclass, because we can just update all callers to play ball with the new version.

Housekeeping

Thu, 07 Jan 2016 00:00:00 +0000

This post automagically appeared on the site thanks to a post-receive hook. Every prior post was written, compiled, and rsynced from my laptop. No more!

Now: I can post from my phone using Working Copy.

Later: I’ll work out handling for microposts, so I can send those here and sync to ADN after.

Later still: Figuring out a good workflow for link blogging from my phone. For ADN, I’ve got a very slick workflow using @dasdom’s wonderful Jupp sharing extension, and I’ve seen how any hiccups in that workflow significantly reduce how much I share my reading with ADN.

Do more of less

Sat, 21 Nov 2015 00:00:00 +0000

The most valuable lesson of Kanban is to limit work in progress. At the personal level, this jibes with studies showing that humans suck at multitasking.

This is a hard lesson for me: My life is littered with the detritus of works begun, works planned, resources squirreled away against a future that rarely comes back to them.

A messy desk or hard-drive becomes an oppressive labyrinth: one sits down for a purpose, only to have all one’s energies dispersed for nothing down the forking hallways of might-have-beens.

Facing this honestly is terrifying: It means admitting one might never pursue that avenue, never chase that morning’s dream. It means confronting the brief spark that is human life; no, that’s not what frightens: rather, the dark that follows, as persistence of vision gives way to vanishing memory, and one’s name and deeds fade forgotten.

***

Please forgive my messy desk; the dark is waiting, and I would but close my eyes a while longer.

Agile

Fri, 13 Nov 2015 00:00:00 +0000

I take agile as rejecting the notion of estimation as having value. In the event you have a deadline, the best you can hope for is to deliver as much working software as you can before that deadline. Time spent dithering over what’s going to fall on which side of the deadline is better spent delivering a feature and winnowing out the low-value crap that came along with the high-value bits of your original ideas, so you don’t waste time implementing the dross.

If I do need to estimate, I reject the silly notion of giving a ludicrously precise single value, and instead give a more honest pair of (estimate, complexity), where complexity is rated on a scale from “I’ve done this a hundred times” to “nobody in the world has ever done this” (http://lizkeogh.com/2013/07/21/estimating-complexity/). And potentially give a range instead of a single value, though some sort of highly-skewed normal distribution might be better.

I similarly reject the notion of “backlog” as wasting time counting your chickens before they’ve hatched (http://ronjeffries.com/articles/015-10/the-backlog/article.html). The only project artifact that matters is the running code you have at the end of a sprint. Everything else is BS; use whatever support tools you need, but don’t confuse your list of dreams with what you have in hand now. If you can’t ship what you have now? You’re probably setting yourself up for serious pain and suffering when the budget suddenly runs out, or your main dev gets moved to another project, or quits, or…

I joke that the product champion should keep a stack of might-wants in a Trello board with “a pony” at the bottom. No-one ever gets everything they want implemented in software; most of those inchoate wants are a mix of some good and valuable ideas and a bunch of lossy cruft that would be a waste of time to do anyway. (Also: diminishing returns.) We forget human finity at our, and our projects’, peril.

Many project management artifacts and behaviors seem smoke and mirrors rituals attempted in the vain hope of preventing the dread manifestation of Learning and consequent Change. Unfortunately, no matter how many ways we invent to scream, “COME NOT IN THAT FORM!” into the unknown, we remain saddled with imperfect knowledge, or alternatively, blessed with the joy of learning ever more and new things about our domain of interest. Dispensing with these distractors – from the primitive state of both today’s tools and the discipline of software development as a whole, and more generally from our own cloud of unknowing – is terrifying, but addressing reality head-on frees you to make the best use of the precious time and limited tools you have.

Updating Plex on Synology NAS

Sun, 08 Nov 2015 00:00:00 +0000

My family has been using ChromeCast to send YouTube videos to the TV. While flipping through the ChromeCast app on my phone, I noticed Plex integrates with ChromeCast. Funny enough, Synology also ships a Plex server package. How hard could this be?

ChromeCast: Easy Come, Easy Stow

I have a very curious toddler.

The ChromeCast is easy to hook up and break down as needed when you want to use it, and there’s not much to break. It was a simple and immediate solution to make the TV usable again without needing to run any cables or install a shelf outside toddler reach.

Before this, we spent several months with the TV completely unplugged after we dismantled the entertainment center and mounted the TV to the wall as part of making our living room child-resistant.

(Child-proof vs child-resistant is like waterproof/water-resistant: Nature finds a way, and all we can do is try to hold out – in this case, till an adult notices a curious and ill-omened silence.)

Plex Client Is Picky; Synology Plex Is Old

Setting up the app on my phone was fairly easy. I needed to create a new login, which, yawn, but 1Password is with me.

Installing the package on my NAS was also one-click.

Getting them talking to each other was a bloody mess. The Plex client is very aggressive about not working with older versions of Plex server, which meant that, at the time, it didn’t work at all with the version packaged by Synology.

Installing a Manual Package

Luckily, Plex packages Plex Server for Synology (and several other flavors of NAS) themselves.

Their instructions only cover a small part of the install process, though. What papered over the gap for me is this article. (That article has pictures, unlike this one.)

Here are the steps I followed:

Check your processor type in the Synology Control Panel
Download the package for that processor type from Plex Downloads
Download the Plex package signing key linked from here
Verify the md5sum they give you for what comfort that might give. (md5sum? Really?)
Open the Synology package center and hit the Settings button:
- On the General pane, widen your trust from just Synology to Synology plus trusted publishers.
- On the Certificate pane, upload the PlexSign.key you just grabbed.
  - Now Plex is a trusted publisher.
Exit the Settings modal and hit the Manual Install button.
- Select the Plex package you downloaded.
- Wait for it to upload, then OK the install.

Unfortunately, Plex doesn’t seem to publish a stream of updates, just individual packages, so when the client yells at you again about the server being too old, you get to repeat most of this dance.

Gotchas

Potential: Synology vs Plex package differences

I read some tales of issues with switching between the two packages. I know that installing the Synology version first and then the Plex version worked fine for me. Your mileage may vary. If it breaks, you might need to pop the hood and ssh in to see what’s gone wrong.

I didn’t encounter any issues myself, so I wouldn’t worry about this unless you run into it.

Derp: Manual packages must be uploaded from the client

If you’re like me, you might think like this:

I will need to install this package to the NAS.
The package file needs to end up on the NAS eventually.
Downloading the package file directly to the NAS using Download Station will save transfer time.

You’re right in theory, and wrong in practice, because the manual install flow only lets you select a local file to upload. Let me say this again: There is no way to point the manual install wizard at a package that’s already downloaded to the NAS. You have to upload it from your local machine directly to the manual package installer.

The fun end result of this is that, if you downloaded the file to the NAS to begin with, you now get to download the file from your NAS so you can upload it back for the manual package install flow.

The steps I listed above skip this time-wasting cleverness.

Conclusion

Plex would be a lot easier to use if they’d do a better job of preserving client–server compatibility across versions.

If you’ve been looking for an excuse to wander into manual Synology package installation, though, you’ve come to the right product.

An Idris metaprogramming "hello world"

Sat, 10 Oct 2015 00:00:00 +0000

Idris is a programming language with dependent types. Like any civilized language, it has metaprogramming support; its REPL even supports interactively crafting metaprograms by employing tactic functions.

Let’s walk through a small example that covers the bases of interactive metaprogramming.

*This article is an elaboration of a gist I posted a few days ago.*

A Simple Function

Here’s a simple program that takes a list of strings and returns either the first string in the list or the empty string when the list is empty:

module Head

firstStringOrEmpty : List String -> String
firstStringOrEmpty strings =
    case strings of
        Nil =>
            ""

        string :: _ =>
            string

This function pattern matches on the strings argument to see if it is the empty list Nil or a list with at least one item:

In the Nil case, it evaluates to the empty string "".
In the non-Nil case, the first item is bound to the new name string. It ignores the rest of the list by binding it to the nameless name _. It then evaluates to that first item, string.

Here’s what it looks like in practice when you fire up the REPL:

> idris Head.idr
     ____    __     _
    /  _/___/ /____(_)____
    / // __  / ___/ / ___/     Version 0.9.19
  _/ // /_/ / /  / (__  )      http://www.idris-lang.org/
 /___/\__,_/_/  /_/____/       Type :? for help

Idris is free software with ABSOLUTELY NO WARRANTY.
For details type :warranty.
*Head> firstStringOrEmpty []
"" : String
*Head> firstStringOrEmpty ["a"]
"a" : String
*Head> firstStringOrEmpty ["a", "b", "c"]
"a" : String

(In future examples, I’ll be omitting the banner with --nobanner, but I figure it’s useful to see what version I was working with.)

Poking a Hole in It

Now that we know what we want the function to look like, let’s poke a hole in it!

Idris represents holes as names with a leading question mark, like ?hole. Let’s replace the Nil case with a hole:

module HoleyHead

firstStringOrEmpty : List String -> String
firstStringOrEmpty strings =
    case strings of
        Nil =>
            ?holeThatWasTheEmptyString

        string :: _ =>
            string

Now let’s see what the REPL has to say:

> idris --nobanner HoleyHead.idr
Type checking ./HoleyHead.idr
Holes: HoleyHead.holeThatWasTheEmptyString

Would you look at that! It found our hole.

Filling the Hole with Elaborator Reflection

We know what we want this to look like, since we started with a working program and then poked a hole in it, but let’s humor this example and pretend we need Idris’s help to figure out how to fill it in.

Import Elaboration Scripts

Start by pulling in the elaboration scripts provided by the system in the module Language.Reflection.Elab:

*HoleyHead> :module Language.Reflection.Elab
Holes: HoleyHead.holeThatWasTheEmptyString
*HoleyHead *Language/Reflection/Elab>

Notice how the prompt changed to reflect the added module.

Survey Your Tools

The good stuff in Language.Reflection.Elab lives in the Tactics module. Take a look at the fun stuff in there:

*HoleyHead *Language/Reflection/Elab> :browse Language.Reflection.Elab.Tactics
Namespaces:

Names:
  addInstance : TTName -> TTName -> Elab ()
  apply : Raw -> List (Bool, Int) -> Elab (List (TTName, TTName))
  attack : Elab ()
  check : Raw -> Elab (TT, TT)
  claim : TTName -> Raw -> Elab ()
… [list abbreviated to save you scrolling] …
  solve : Elab ()
  sourceLocation : Elab ()
  unfocus : TTName -> Elab ()
  whnf : TT -> Elab TT

These are the functions we’ll be using to fill the hole. Each of these has a docstring with more info about it. For example, here’s one we’ll use in a bit:

*HoleyHead *Language/Reflection/Elab> :doc intro'
Language.Reflection.Elab.Tactics.intro' : Elab ()
    Introduce a lambda binding around the current hole and focus on
    the body, using the name provided by the type of the hole.

    The function is Total
Holes: HoleyHead.holeThatWasTheEmptyString

Translated into plainer English, this says that intro' transforms a hole like ?hole into a function \_ => ?newHole and points you at filling in the new hole.

(That line at the end of the docstring about “The function is Total” represents the judgment of Idris’s totality checker. A total function is one that terminates for all its inputs, which means it eventually returns a value for each and every one and cannot run forever.)

Gotcha

We’re taking a peek now, because unfortunately, we won’t be able to run :browse while we’re working on filling the hole, short of firing up a new REPL. We will be able to run :doc still, so we can just scroll back to this list as needed.

Enter the Elaborator Shell

Enter the interactive elaborator shell by issuing the :elab <HOLE> command:

*HoleyHead *Language/Reflection/Elab> :elab holeThatWasTheEmptyString


----------                 Goal:                  ----------
{hole0} : List String -> String
-HoleyHead.holeThatWasTheEmptyString>

When you first enter the elaboration shell, it prints the current elaboration state, with the goal – the term you’re trying to fill using elaboration scripts – shown below the line, and any other remaining goals shown above the line. You can review this state at any time by issuing the :state command. To get a list of available commands, run :help.

The elaborator shell has its own prompt, which tells you the hole you’re working on filling.

Reach Your Goals

Our current goal is hole0, and it has type List String -> String. This reflects that we currently have in scope – and available to our hole-filling efforts – a List String, in the form of the strings argument to our function, and that the type of whatever we compute using this context must end up as String.

In light of this List String -> String phrasing, what we had filling this space before was a constant function \strings => "" that ignored its argument and always returned the empty string.

Since we don’t actually care about the name of the function argument, let’s use that intro' tactic function we saw earlier:

-HoleyHead.holeThatWasTheEmptyString> :state


----------                 Goal:                  ----------
{hole0} : List String -> String
-HoleyHead.holeThatWasTheEmptyString> intro'

----------              Assumptions:              ----------
----------                 Goal:                  ----------
{hole0} : String
-HoleyHead.holeThatWasTheEmptyString>

Filling a Hole

Cool, now all we need to do is provide a String to fill the hole with. What could we use to do that? Well…

-HoleyHead.holeThatWasTheEmptyString> :doc fill
Language.Reflection.Elab.Tactics.fill : Raw -> Elab ()
    Place a term into a hole, unifying its type

    The function is Total

That fill function looks like exactly what we need if we’re to put "" back in that hole. Maybe we can just fill "" directly?

-HoleyHead.holeThatWasTheEmptyString> fill ""
(input):1:6:When checking an application of function
Language.Reflection.Elab.Tactics.fill:
        Type mismatch between
                String (Type of "")
        and
                Raw (Expected type)

Welp, that didn’t quite do it.

Chasing Down Constructors

We need a way to turn the string "" into a Raw value. What’s a Raw?

-HoleyHead.holeThatWasTheEmptyString> :type Raw
FFI_C.Raw : Type -> Type
Language.Reflection.Raw : Type

Looks like there are a couple kinds of Raw around, but one is a Raw someType and the other is just a Raw. Plus, we’re not doing FFI, so let’s bet on the Language.Reflection.Raw. Let’s pull the docs on that:

-HoleyHead.holeThatWasTheEmptyString> :doc Language.Reflection.Raw
Data type Language.Reflection.Raw : Type
    Raw terms without types

Constructors:
    Var : TTName -> Raw
        Variables, global or local

    RBind : TTName -> Binder Raw -> Raw -> Raw
        Bind a variable

    RApp : Raw -> Raw -> Raw
        Application

    RType : Raw
        The type of types

    RUType : Universe -> Raw


    RForce : Raw -> Raw


    RConstant : Const -> Raw
        Embed a constant

"" is a string constant, so RConstant sounds like what we want.

In order to create an RConstant, we need to pass it a Const. What’s a Const?

-HoleyHead.holeThatWasTheEmptyString> :doc Const
Data type Language.Reflection.Const : Type
    Primitive constants

Constructors:
    I : Int -> Const


    BI : Integer -> Const


    Fl : Double -> Const


    Ch : Char -> Const


    Str : String -> Const


    B8 : Bits8 -> Const


    B16 : Bits16 -> Const


    B32 : Bits32 -> Const


    B64 : Bits64 -> Const


    AType : ArithTy -> Const


    StrType : Const


    VoidType : Const


    Forgot : Const


    WorldType : Const


    TheWorld : Const

Bingo! Str "" will give us a Const, and then we can pass that to RConstant to get a Raw, and then we can pass that to fill:

-HoleyHead.holeThatWasTheEmptyString> fill (RConstant (Str ""))

----------              Assumptions:              ----------
----------                 Goal:                  ----------
{hole0} : String =?= ""

Solve et Coagulum

We’re sitting with one last goal, to show that our guess "" really is a String. If you take a peek at the docs, you’ll see that solve does just this:

-HoleyHead.holeThatWasTheEmptyString> :doc solve
Language.Reflection.Elab.Tactics.solve : Elab ()
    Substitute a guess into a hole.

    The function is Total

(You could also find solve by running :search Elab () – all our tactics return a value in Elab, we have no inputs to provide, and we just want to trigger the side effect of finishing this up – and looking through the few lines of output to find a likely candidate. But peeping at the docs is faster.)

So let’s use it:

-HoleyHead.holeThatWasTheEmptyString> solve
holeThatWasTheEmptyString: No more goals.

And victoriously exit the elaborator shell:

-HoleyHead.holeThatWasTheEmptyString> :qed
Proof completed!
HoleyHead.holeThatWasTheEmptyString = %runElab (do intro'
                                                   fill (RConstant (Str ""))
                                                   solve)
*HoleyHead *Language/Reflection/Elab>

Let’s give it a go:

*HoleyHead *Language/Reflection/Elab> firstStringOrEmpty []
"" : String

It worked!

Paste In the Script

When we left the shell with :qed, it wrote out a function definition:

-HoleyHead.holeThatWasTheEmptyString> :qed
Proof completed!
HoleyHead.holeThatWasTheEmptyString = %runElab (do intro'
                                                   fill (RConstant (Str ""))
                                                   solve)

Copy and paste this into the code, and it will fill the hole when the code is compiled, just like we did interactively at the REPL:

module HoleyHead

firstStringOrEmpty : List String -> String
firstStringOrEmpty strings =
    case strings of
        Nil =>
            ?holeThatWasTheEmptyString

        string :: _ =>
            string

HoleyHead.holeThatWasTheEmptyString = %runElab (do intro'
                                                   fill (RConstant (Str ""))
                                                   solve)

Fire this up in the REPL, and notice that there aren’t any unfilled holes:

> idris --nobanner HoleyHead.idr
Type checking ./HoleyHead.idr
HoleyHead.idr:12:37:
When checking right hand side of HoleyHead.holeThatWasTheEmptyString:
No such variable Language.Reflection.Elab.Elab
Holes: HoleyHead.holeThatWasTheEmptyString
*HoleyHead>

Oh, snap. Remember how we ran :module Language.Reflection.Elab before entering the elaboration shell? We need to bring those names into scope in our file, too:

module HoleyHead

import Language.Reflection.Elab

firstStringOrEmpty : List String -> String
firstStringOrEmpty strings =
    case strings of
        Nil =>
            ?holeThatWasTheEmptyString

        string :: _ =>
            string

HoleyHead.holeThatWasTheEmptyString = %runElab (do intro'
                                                   fill (RConstant (Str ""))
                                                   solve)

You can either run the :edit command at the REPL and add the import line, or edit it in your editor and ask the REPL to :reload your current file, or just exit, edit, and then run idris --nobanner <FILE> once more.

Either way, you’ve just completed your hello world!

Summary

Idris allows you to interactively build a script that generates source code at compile time by solving for a goal represented by a hole in your source code.

You’ve seen how to:

Represent a hole in source code using a question-mark–prefixed name like ?hole.
Import Language.Reflection.Elab to make the system-provided tactics available to you.
Enter the elaborator shell using :elab <HOLE>.
Get documentation on tactics using :doc <NAME>.
Use intro' to focus on the body of a function without naming the function’s argument.
Use fill to provide a guess for a hole’s value.
Use solve to instantiate the guess.
Conclude the session with :qed to get a proof script you can paste into your file to fill the hole in future.

Challenges

Some challenges, in order of increasing difficulty:

Fill the hole with the string "bananas" instead of "".
Name the function argument (check out intro rather than intro').
Replace the string result in the other case with a hole ?holeThatWasString, and interactively create an elaborator script to fill that hole.

For the More Curious: Quotation

Figuring out that whole RConstant (Str "") bit was a pain. If you felt like Idris should have been able to do that work for you, you were right! Idris supports quoting syntax, which will seem awfully familiar if you’ve written any Lisp macros:

> :doc quote
Language.Reflection.quote : Quotable a t => a -> t
    Quote a particular element of a.

    Each equation should look something like
    quote (Foo x y) = `(Foo ~(quote x) ~(quote y))

    The function is Total

Instead of laboriously hunting down the various data constructors and then writing out fill (RConstant (Str "")), we can use quoting to do the hard work for us with:

fill (quote "")

Even sweeter, we can use Lisp-like quotation syntax to write this with even less typing as:

fill `("")

When is a proof not a proof?

Fri, 02 Oct 2015 15:19:59 +0000

When is a proof not a proof? When you think you’ve proven one thing, but actually, you’ve proven something else.

In the last post, I addressed a good question raised in the first half of David Owens II’s article “Dependent Types: I’m Missing Something”, where he addresses my discussion of dependent types.

This post looks at the latter half, which talks about an example drawn from early in the chapter “Introducing Inductive Types” of Adam Chlipala’s Certified Programming with Dependent Types.

This example is interesting in part because it is actually not an example of a dependent type, so it’s something we can talk about almost entirely in terms of Swift, absent the computer-assisted theorem proving that is Coq’s special sauce. Tackling theorems in Swift is a perfect fit for a generative testing framework.

Putting the example in context

The book has introduced a bool type as an example of defining an enumeration type. It then defines a function negb that performs Boolean negation to demonstrate function definition using pattern matching. The book then proceeds to prove a couple theorems about the negb function as part of teaching the reader how to do proofs using Coq.

Rewriting the example in Swift

bool is not a dependent type, so Swift and SwiftCheck are up to the task.

The book doesn’t get to dependent types till 5 chapters later. bool is an enumeration type, and negb relies on non-dependent pattern matching, all of which Swift can do. This means we can pretty directly translate the example definitions and theorem from Gallina into Swift:

enum Bool_ {
  case True
  case False
}

func negb(b: Bool_) -> Bool_ {
  switch b {
  case .True: return .False
  case .False: return .True
  }
}

/// Generative testing of the `negb_inverse` theorem's claim.
func testTheoremNegbInverse() {
  property("negb is its own inverse") <- forAll { (b: Bool_) in
    return negb(negb(b)) == b
  }
}

extension Bool_: Arbitrary {
  static var arbitrary : Gen<Bool_> {
    return Gen.oneOf([Gen.pure(.True), Gen.pure(.False)])
  }
}

I’ve translated the negb_inverse theorem into a SwiftCheck test and provided a generator of arbitrary Bool_ values for the test generator to use.

Running the test as-is gives:

Test Case '-[NegbTests.NegbTests testTheoremNegbInverse]' started.
*** Passed 100 tests
.

You can run this and see for yourself by cloning my Negb project repository.

Mistaken proof

David’s concern is that, “It’s up to the programmer to realize that we have actually not created all of the proofs required to prove correctness.” This concern arises from his thinking that the proof for negb_inverse goes through even if you alter negb so it always returns true.

The version of negb he defines that always returns true actually already fails at the negb_inverse theorem, without need to proceed to trying to prove negb_ineq.

You can see this in practice by pasting the bogus negb definition into coqtop and then trying to complete the proof of the theorem, but you can probably convince yourself without needing to do that simply by considering the false case.

The theorem’s claim is that negb(negb(b)) == b. In the false case, with the always-true negb, we get:

negb(negb(b)) == b  // the negb_inverse claim
--> consider the case where b = false
negb(negb(false)) == false
--> evaluate the inner negb application
negb(true) == false
--> evaluate the remaining negb application
true == false
--> contradiction! <--

If you flip the case .True: return .False to return .True in the Swift version of negb, you’ll see it quickly find this failing case using the test, as well:

Test Case '-[NegbTests.NegbTests testTheoremNegbInverse]' started.
*** Failed! Proposition: negb is its own inverse
Falsifiable (after 1 test):
False
*** Passed 0 tests
.
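Because Bool_ has only two inhabitants, you don’t strictly need a generative testing framework to find that counterexample: checking every case is enough. Here’s a minimal sketch of that idea, written in current Swift syntax rather than the Swift of the listing above. The names alwaysTrueNegb and counterexamples(to:) are made up for illustration and are not part of the Negb project:

```swift
enum Bool_: CaseIterable {
  case True
  case False
}

/// The correct negation from earlier in the post.
func negb(_ b: Bool_) -> Bool_ {
  switch b {
  case .True: return .False
  case .False: return .True
  }
}

/// The bogus variant: ignores its input and always answers `.True`.
func alwaysTrueNegb(_ b: Bool_) -> Bool_ {
  return .True
}

/// Collects every `b` for which the claim `f(f(b)) == b` fails.
func counterexamples(to f: (Bool_) -> Bool_) -> [Bool_] {
  return Bool_.allCases.filter { b in f(f(b)) != b }
}

// The real negb has no counterexamples; the bogus one fails at `.False`.
let forRealNegb = counterexamples(to: negb)
let forBogusNegb = counterexamples(to: alwaysTrueNegb)
```

Checking every case only works because the type is tiny. For anything with more inhabitants than you can enumerate, you’re back to sampling with a generator or to an actual proof.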

If you go the coqtop route, you’ll be able to discharge the true case with simpl. trivial., but using the simpl. tactic in the false case leaves you at the end step of our reasoning above, trying to prove that true = false.

If you’re in coqtop, you’ll also quickly notice that the proof David copied in for negb_inverse was incomplete. destruct. leaves you with two subgoals to prove, corresponding to a version of the theorem with b instantiated at each case of the enum. You can complete the proof via reflexivity. Qed., which the book gets to a half-page later after some discussion.

Proving the right thing and human misapprehension

A sound system won’t draw bogus conclusions, but that doesn’t mean we can’t misinterpret our own claims, whether they’ve been proven true or not.

David’s core concern about knowing you’ve proved what you want to prove is a real problem.

The problem doesn’t appear where he thought it did, in a theorem that he knew shouldn’t be provable but he thought still was.

The problem shows up in the same place that it did with the dependently typed flipped Boolean discussion, where the human is giving names and external meaning to the mathematical expression. In a sound system, the proofs we provide will be correct, but we can still misunderstand what the theorem we have just proved really means.

If we’re building on top of that proof, that misunderstanding will undoubtedly come to light as we try to use it to prove new claims. We just won’t be able to make a chain of reason from what we want to prove to what we did prove, because our new goal will depend on what we thought we proved, which was something different.

If what we thought we proved was the final achievement of our development, though, then we might not run into trouble till we try to actually use it to do something. Then we’ll see it break down at runtime, when it does what it in fact tells us on its face it does, which just so happens not to be what we read it as doing.

What if you get your dependent type backwards?

Fri, 02 Oct 2015 00:00:00 +0000

David Owens II read my notes on “Why Dependent Types Matter” and asked a good question:

This is the part I don’t get: we allow for incorrect code to be written in the non-dependent-typed case, but we assume that we can’t do the same with dependently-typed code? Why? What’s preventing me from swapping left and right in the Order type that is returned?

Review: Backwards Booleans

If we’re not using dependent types, it’s not hard to flip the sense of a test and end up with a result that means the opposite of what we intended due to a hiccup in mentally simulating our code.

The running example is something like is(left: UInt, lessThanOrEqualTo right: UInt) -> Bool.

A Bool is just a bit that can be manufactured any which way, so this simple test fails to provide evidence – by which I mean constructive proof – for its claim.

If you switch to compare(left: UInt, to right: UInt) -> NSComparisonResult, you get more than a bit of info back, but you still can mess up. (If anything, it’s easier, since you now have to get 2 tests right to discriminate between the 3 cases rather than the 1 needed to decide between true/false in the Bool version.)
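To make the Bool version of the problem concrete, here’s a small sketch in real, current Swift. The function names are hypothetical stand-ins for the running example (is is a reserved word in Swift, so I’ve avoided it). The point is only that the correct test and the flipped one have exactly the same type:

```swift
/// Intended behavior: true exactly when `left <= right`.
func isLessOrEqual(_ left: UInt, to right: UInt) -> Bool {
  return left <= right
}

/// Same signature, flipped test. The compiler has no objection.
func isLessOrEqualFlipped(_ left: UInt, to right: UInt) -> Bool {
  return right <= left
}

// Both functions have the type (UInt, UInt) -> Bool, so they are
// interchangeable as far as type checking goes.
let candidates: [(UInt, UInt) -> Bool] = [isLessOrEqual, isLessOrEqualFlipped]

// Only running them on a case where the order matters tells them apart.
let answers = candidates.map { check in check(1, 2) }
```

Nothing in the returned Bool records which comparison produced it, so the only way to tell the two apart is to run them.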

How do dependent types prevent backwards Booleans?

Now let’s say we’re using the dependent type Order(left: UInt, right: UInt):

enum Order(left: UInt, right: UInt) -> Type {  /* #1 */
    case lessThanOrEqual(because: LEQ(x: left, y: right)):  /* #2 */
        Order(left, right)  /* #3 */

    case greaterThanOrEqual(because: LEQ(x: right, y: left)):  /* #4 */
        Order(left, right)  /* #5 */
}

We have 5 instances of using left/right together, labeled in comments at the end of each line.

What’s preventing me from swapping left and right in the Order type that is returned?

Well, let’s see! David mentions swapping the left/right in the Order type that’s returned, which corresponds to #3 and #5, so we’ll start there.

Dropping the Evidence

Let’s drop the evidence (the because bits) and swap left/right #3 and #5:

enum OrderNoEvidence(left: UInt, right: UInt) -> Type {  /* #1 */
    case lessThanOrEqual:  /* #2 - eliminated */
        OrderNoEvidence(right, left)  /* #3 - flipped */

    case greaterThanOrEqual:  /* #4 - eliminated */
        OrderNoEvidence(right, left)  /* #5 - flipped */
}

In this case, nothing would catch the inversion. left and right are both UInt, so there’s no harm done as far as the compiler is concerned.

There’s also no actual distinction in shape between the two cases; we can freely substitute one for the other anywhere in our code without the type system griping, just as with Bool or NSComparisonResult.

Unlike the non-dependent type, an instance of this type carries around information at the type level about the values whose comparison it represents, so you can recover at least which values were supposedly compared through pattern matching.*

But this doesn’t solve our flipped-Boolean problem.

Restoring the Evidence

Now let’s restore our evidence in its un-flipped form, while leaving the resulting type still flipped in #3 and #5:

enum OrderFlippedInstances(left: UInt, right: UInt) -> Type {  /* #1 */
    case lessThanOrEqual(because: LEQ(x: left, y: right)):  /* #2 */
        OrderFlippedInstances(right, left)  /* #3 - flipped */

    case greaterThanOrEqual(because: LEQ(x: right, y: left)):  /* #4 */
        OrderFlippedInstances(right, left)  /* #5 - flipped */
}

We’ve flipped #3 and #5 still, but the rest are back and not flipped.

Compared to the evidenceless version of the type, you can no longer gin up an instance of Order out of thin air. You once again have to provide an instance of LEQ:

enum LEQ(x: UInt, y: UInt) -> Type {
    case zeroLEQEverything:
        LEQ(0, y)

    case stepLEQ(LEQ(x, y)):
        LEQ(x + 1, y + 1)
}
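Real Swift can’t index a type by values, but you can get a feel for how LEQ evidence is built with a runtime approximation. This is a sketch under that assumption: the claim x <= y is recomputed from the evidence (the claim property) instead of living in the type, and evidence(that:isLEQ:) is a helper name I’ve invented for illustration:

```swift
/// Runtime stand-in for the `LEQ` evidence type. The indices `x` and `y`
/// cannot appear in the type in real Swift, so they are derived from the
/// shape of the evidence instead.
indirect enum LEQ {
  case zeroLEQEverything(y: UInt)
  case stepLEQ(LEQ)

  /// The pair `(x, y)` that this evidence shows satisfies `x <= y`.
  var claim: (x: UInt, y: UInt) {
    switch self {
    case .zeroLEQEverything(let y):
      return (x: 0, y: y)
    case .stepLEQ(let smaller):
      let (x, y) = smaller.claim
      return (x: x + 1, y: y + 1)
    }
  }
}

/// Builds evidence that `x <= y`, or returns nil when there is none.
func evidence(that x: UInt, isLEQ y: UInt) -> LEQ? {
  guard x <= y else { return nil }
  // Start from 0 <= (y - x), then step both sides up x times.
  var proof = LEQ.zeroLEQEverything(y: y - x)
  for _ in 0..<x {
    proof = .stepLEQ(proof)
  }
  return proof
}
```

Every value of this enum describes a true x <= y fact, because the only ways to build one are the base case and the step case. What the approximation loses is exactly the dependent part: the compiler can’t tell evidence for 2 <= 5 apart from evidence for 0 <= 1, so a mismatch only shows up when you inspect claim at runtime.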

Alas, even with LEQ completely correct, we can still define this version of Order. It will compile just fine.

This might confuse the user of this API, especially if the documentation says that Order(left, right) tells how left compares to right rather than how right compares to left.

Unlike in the evidence-less case, though, consumers of instances of this type can work out that it’s the wrong way around based on the because evidence. An instance like lessThanOrEqual(zeroLEQEverything: LEQ(0, 1)): OrderFlippedInstances(1, 0) hands the consumer a proof of LEQ(0, 1). If they pattern-match that proof out and use it, as they likely would while producing evidence for the correctness of whatever they’re building atop this data, then it’s merely frustrating that our documentation is backwards.

This “solves” the flipped Boolean problem, but no tool can solve the problem of misleading names. Misleading names provide bad input into our informal reasoning processes, and we’re likely to write bogus code as a result. If we’re programming with evidence, as dependent types let us do, we’ll catch this while interacting with the compiler; if we’re trusting the names, and ignoring the evidence, as dependent types also let us do (and non-dependent types force us to do), we likely won’t, absent testing.

Flipping All Pairs

If we flip all pairs, we get something identical to our original Order(left: UInt, right: UInt) type:

enum OrderMirrored(right: UInt, left: UInt) -> Type {  /* #1 */
    case lessThanOrEqual(because: LEQ(x: right, y: left)):  /* #2 */
        OrderMirrored(right, left)  /* #3 */

    case greaterThanOrEqual(because: LEQ(x: left, y: right)):  /* #4 */
        OrderMirrored(right, left)  /* #5 */
}

Aside from the backwards parameter names if you look at the definition or documentation, this works the same as the original Order. Even the case names are the right way around.

Providing the Wrong Evidence

If we accept the wrong evidence, by flipping around just #4, then we finally get real breakage:

enum OrderWrongEvidence(left: UInt, right: UInt) -> Type {  /* #1 */
    case lessThanOrEqual(because: LEQ(x: left, y: right)):  /* #2 */
        OrderWrongEvidence(left, right)  /* #3 */

    case greaterThanOrEqual(because: LEQ(x: left, y: right)):  /* #4 - flipped */
        OrderWrongEvidence(left, right)  /* #5 */
}

By flipping #4, we’ve ended up with two cases with different names that just say the same thing. This is clear because they have the same shape: each takes one argument of the same type, and each constructs an instance of the same type.

Again, if you’re leaning on that evidence, this will work itself out while you’re writing code, because you’ll reach for the expected y <= x proof when handling the greaterThanOrEqual case only to find it’s not there.

If the client code is more trusting and ignores the evidence, then you’ll get broken behavior, and you won’t find it till you test the code by running it.

Dependent types allow us to provide and demand evidence

This is the trade-off you’re able to make when using dependent types:

trust and test,
or demand proof.

Without dependent types, you don’t have any proof – you’re left with trusting and testing.

*Dependent pattern matching can bind values not just out of the instance, as normal pattern matching does with case let, but also out of the type signature.

Read: Why dependent types matter

Wed, 26 Aug 2015 00:00:00 +0000

Altenkirch, Thorsten, Conor McBride, and James McKinna. Why Dependent Types Matter. 2005. 19 pages (+2 more of references). Accessed 2015-08-26.

What is a dependent type?

A dependent type is a type that depends, not just on other types, but on terms. This means you can have values – plain old data – in your types. A common example is that of length-indexed lists:

/// A list parameterized by `ElementType` and indexed over its length.
/// This means that `Vector(length: 3, ElementType: Int)` is a different
/// *type* than `Vector(length: 4, ElementType: Int)`.
///
/// You can read a type `Vector(n, T)` as "the type of vectors containing
/// precisely *n* Ts".
enum Vector(length: UInt, ElementType: Type) -> Type {
    /// Returns an empty list of inferred type `type`.
    case vnil():
        Vector(length: 0, ElementType: type)

    /// Prefixes an `element` whose type matches `vector`'s ElementType.
    ///
    /// NOTE: The type of the result vector differs from the type of the input
    /// vector in that its length is one greater than the input's.
    case vcons(element, vector: Vector(length: n, ElementType: type))
      where element is type:
        Vector(length: n + 1, ElementType: type)
}

You likely caught on that that code is Swift-ish but not actually Swift. That’s DTSwift*, where I pretend that Swift is dependently typed.

Types are terms are types

Notice that I’m writing the type constructor using normal function application with parentheses () rather than “generic type application” with angle brackets <>:

enum Vector(length: UInt, ElementType: Type) -> Type {
// NOT: enum Vector<length, ElementType> {
// The <> assume that everything in them is a type, which just ain't so
// in DTSwift.

As far as DTSwift is concerned, a type constructor is just a function from some terms/types to a type. We no longer need separate, parallel languages for working with types (like Int or String) vs working with terms in those types (like 3 or "perish the thought").

Why do we care?

Provably correct code where you want it, loosey-goosey where you don’t

Dependent types make it possible to be very specific about the terms in a type. We don’t have to be, but now we have the option of cranking up the specificity from “eh, whatever, gimme a list” to “this list has to be non-empty” to “this list has to have a prime number of elements”.

This means that dependent types give us pay-as-you-go provable correctness. You can require proof of the most critical properties in your program in the form of very specific types that ensure those invariants, while handwaving about components that are either less important or just more readily shown correct by test or inspection.
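The first notch on that dial can be approximated even without dependent types. Here's a minimal sketch in current Swift syntax; `NonEmpty` and `largest` are names I made up for illustration, not standard library API. Emptiness gets checked once, at construction, and everything downstream can lean on the type:

```swift
/// A list that is non-empty by construction.
/// (`NonEmpty` is a hypothetical type for illustration only.)
struct NonEmpty<Element> {
    let head: Element
    let tail: [Element]

    /// Returns nil for an empty array; this is the only emptiness check.
    init?(_ elements: [Element]) {
        guard let first = elements.first else { return nil }
        head = first
        tail = Array(elements.dropFirst())
    }
}

/// Needs no optional result and no "what if it's empty?" branch,
/// because a `NonEmpty` always has a head to start from.
func largest(_ values: NonEmpty<Int>) -> Int {
    return values.tail.reduce(values.head) { max($0, $1) }
}
```

That's about as far as plain Swift goes: "non-empty" fits in a hand-rolled wrapper, but "a prime number of elements" needs the length to show up in the type.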

We can treat different data in different ways at compile-time

No need to wait to crash at runtime or during a test; you can push “make invalid states unrepresentable” to the max if you want.

Where Swift’s types let you down

Have you ever noticed how, if you write let order = x <= y ? .OrderedAscending : .OrderedDescending, you could swap x and y and still end up with a valid program? It’s like we learned nothing from that test we performed!

As far as normal Swift’s type system is concerned, any T is as good as any other; there’s “no means to express the way that different data mean different things, and should be treated accordingly in different ways” (p. 11). We can always swap the two branches of a conditional expression (?:), and the compiler will happily accept our backwards logic.

While we can write:

func compare(x: UInt, y: UInt) -> NSComparisonResult {
    return (x <= y) ? .OrderedAscending : .OrderedDescending
}

This would be just as correct as far as the compiler is concerned as either of these definitions:

func compare(x: UInt, y: UInt) -> NSComparisonResult {
    // BACKWARDS! IT LIES!
    // (And how many times have you accidentally flipped these?)
    return (x <= y) ? .OrderedDescending : .OrderedAscending
}

func compare(x: UInt, y: UInt) -> NSComparisonResult {
    // I hope you weren't actually planning on relying on this function.
    return .OrderedAscending
}

Aside: Yes, this is a toy example

This is a toy example. I’m sure you can quickly figure out when an if-then-else has the branches swapped. Refining your code to use more specific types grows more valuable as the complexity of its behavior grows, and this is not a terribly complex example!

The effort/reward calculus changes when it comes to library code, though. Specific types in library code can save a lot of collective confusion and wasted time. Think of some Apple API you’ve looked at and gone, “Gee, I wonder what this method does when it runs into this corner case?” If the types clearly said what the method did, you wouldn’t be left scratching your head.

As a side benefit, specific types constrain your code to where the compiler can provide non-trivial help in writing the code. There’s a section on this in the paper, or just give a dependently-typed language a spin.

If you need to see some less toy examples to be convinced, check out Oury and Swierstra’s The Power of Pi (2008) and Fowler and Brady’s Dependent Types for Safe and Secure Web Programming (2013).

How dependent types fix this

Dependent types let you replace a Boolean that throws away the knowledge gained from a test by a type that represents the outcome of that test.

In DTSwift, we have the option of shifting from operator <=(left: UInt, _ right: UInt) -> Bool to a LEQ type that must provide evidence for how left and right in particular are related:

/// A type expressing that `(x <= y)` via proof by induction.
enum LEQ(x: UInt, y: UInt) -> Type {
    /// Base case: 0 <= any other UInt.
    case zeroLEQEverything:
        LEQ(0, y)

    /// Induction: If we know x <= y, then we also know x+1 <= y+1.
    case stepLEQ(LEQ(x, y)):
        LEQ(x + 1, y + 1)
}

We can then move from a “blind” NSComparisonResult to an Order type that expresses how two values are related and includes proof that this is indeed the case:

/// A type expressing the relationship between `left` and `right`.
/// Each case is associated with a term witnessing to the relationship
/// between `left` and `right`.
///
/// See: `LEQ(left, right)` defined just before this.
enum Order(left: UInt, right: UInt) -> Type {
    /// Represents that `(left <= right)`.
    /// Witnessed by a proof that `left <= right`.
    case lessThanOrEqual(because: LEQ(x: left, y: right)):
        Order(left, right)

    /// Represents that `(left >= right)`.
    /// Witnessed by a proof that `right <= left`.
    case greaterThanOrEqual(because: LEQ(x: right, y: left)):
        Order(left, right)
}

This burden of proof would rule out both of our bogus definitions of compare above:

In the flipped case, we would find we couldn’t prove an untruth.
In the constant case, we couldn’t generate any proof by flat-out ignoring our inputs. We have to examine them and work with them to arrive at a proof of their relationship, and the type ensures it is indeed the relationship between those specific values.

We can now write a type-correct version with these new types, one whose return type captures the relationship between the two input terms:

func compare(left: UInt, _ right: UInt) -> Order(left, right) {
    switch left {
    case 0:
        return lessThanOrEqual(because: zeroLEQEverything)

    case let xMinus1 = left - 1:
        switch right {
        case 0:
            return greaterThanOrEqual(because: zeroLEQEverything)
        case let yMinus1 = right - 1:

            switch compare(xMinus1, yMinus1) {
            case lessThanOrEqual(because: evidence):
                // We now know that (left - 1) <= (right - 1), so by adding 1
                // to each side we can prove that left <= right.
                return lessThanOrEqual(because: stepLEQ(evidence))

            case greaterThanOrEqual(because: evidence):
                // Ditto, but the evidence shows (right - 1) <= (left - 1),
                // so we can prove right <= left, AKA left >= right.
                return greaterThanOrEqual(because: stepLEQ(evidence))

            }  // switch on recursive call
        }  // switch on right
    }  // switch on left
}

Summary of the Example

We started with a compare function that did the right thing, but it didn’t say so – it just said “I’ll give you back an NSComparisonResult, one’s as good as another, right?” (paraphrasing p. 12).

Because our starting version of compare didn’t say what it did via its types, it could also maybe not do the right thing; you could test it, but for all you know, some perverse developer decided to return the wrong answer for exactly one combination of unsigned integers.

We ended up with a compare that had no choice but to do the right thing, because if it didn’t, it would no longer typecheck.

So Much More

I encourage you to read this paper. It’s a great introduction to programming with dependent types, and it covers a number of other topics along the way, illustrating each by progressively refining an implementation of merge-sort.

You’ll meet:

Totality, and why you should care
How you can do real work without general recursion, or, how Turing completeness has caused you generally avoidable pain
- I’d write about these now, but they’ll come up in more detail when I write up my notes on Turner’s “Total Functional Programming”.
Interactive type-directed editing, where you cooperate with the compiler to write your program, since very specific types lead to very specific shapes of programs working with those types
- Faking up a DTSwift example of this would be a blog post in itself.
Approximations to dependent types that have shown up already in some not quite (not yet?) dependently-typed languages
- Spoiler: Swift isn’t one of them.

…what we have tried to demonstrate here is that the distinctions term/type, dynamic/static, explicit/inferred are no longer naturally aligned to each other in a type system which recognizes the relationships between values. We have decoupled these dichotomies and found a language which enables us to explore the continuum of pragmatism and precision and find new sweet spots within it. Of course this continuum also contains opportunities for remarkable ugliness and convolution – one can never legislate against bad design – but that is no reason to toss away its opportunities. Often, by bringing out the ideas which lie behind good designs, by expressing the things which matter, dependent types make data and programs fit better. (p. 21)

*Since I don’t have a formal spec or implementation for this DTSwift fantasy language, it risks confusing things more than using an actual dependently-typed language, but I’m willing to take that risk to make this topic more approachable for people coming from Swiftland (population: 1 Gulliver).

Read: A tutorial on the universality and expressiveness of fold

Tue, 18 Aug 2015 00:00:00 +0000

Hutton, Graham. A tutorial on the universality and expressiveness of fold. 1999. 16 pages (+ 2 more of references). Accessed 2015-08-18.

Fold (reduce, inject, cata) lets you take a list and swap out its constructors for a combining function and base value of your own devising, like swizzling in + for cons/: and 0 for nil/[]:

[1 ,  2 ,  3]
== EXPAND THE SUGAR ==>
(1 : (2 : (3 : [])))
== FOLD (+) 0 ==>
(1 + (2 + (3 + 0 )))

Notice how all the : were rewritten into the provided function ((+)), and [] was rewritten into the provided value (0). Also notice how everything groups to the right; this fold operator is consequently sometimes called foldr, since it folds to the right.

Universality

You can write fold as a recursive function:

fold f v [] = v
fold f v (x : xs) = x `f` (fold f v xs)

or:

let g = fold f v in
  (* Equation 1: *) g [] = v
  (* Equation 2: *) g (x : xs) = f x (g xs)

If you can mash a function g into being defined by those two equations, then you’ve solved for f and v such that g = fold f v, and you can write g without needing to explicitly recurse.
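To make the "solve for f and v" step concrete, here's a sketch in current Swift. The `foldr` helper is my own transliteration of the recursive definition (Swift's built-in `reduce` folds from the left instead), and `length` is just a convenient guinea pig:

```swift
/// Right fold: swaps each "cons" for `f` and the final "nil" for `v`.
func foldr<A, B>(_ f: (A, B) -> B, _ v: B, _ xs: [A]) -> B {
    guard let x = xs.first else { return v }          // fold f v [] = v
    return f(x, foldr(f, v, Array(xs.dropFirst())))   // fold f v (x : xs) = f x (fold f v xs)
}

/// `length`, written with explicit recursion.
func length(_ xs: [String]) -> Int {
    guard !xs.isEmpty else { return 0 }               // Equation 1, so v = 0
    return 1 + length(Array(xs.dropFirst()))          // Equation 2, so f x n = 1 + n
}

/// The same function with the recursion handed off to `foldr`,
/// using the f and v read off from the two equations above.
func lengthAsFold(_ xs: [String]) -> Int {
    return foldr({ _, n in 1 + n }, 0, xs)
}
```

Both versions agree on every finite list; the second one just no longer mentions itself.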

The Universal Property of Fold

The universal property of fold is that you get a bi-implication between the two equations and g = fold f v:

substitute in fold f v for g, and you’ve defined fold;
prove the two equations, and you’ve got the makings of a proof by induction that your function is simply a fold.

“Taken as a whole, the universal property states that for finite lists the function fold f v is not just a solution to its defining equations, but in fact the unique solution” (p. 358 in the journal, which is p. 4 in the PDF).

Substitution: An aide-mémoire

The substitution bit makes it easy to remember the universal property: Write the obvious recursive definition of fold, then substitute away the fold f v.

Proofs and Definitions

The other bit is the part you’ll actually use. It lets you prove that a function is equivalent to a fold. To do so, you check that your function gives the same result in both the base case (equation 1) and the step case (equation 2). (For examples, check the section “Universality as a proof principle”.)

One case where this is handy is when it’s obvious what the fold does and less obvious what the other function does.

An arguably more useful case is when you have 2 different functions you want to prove equal. Rather than writing out an induction proof longhand, you can rewrite them both as folds; if the two folds are identical, then the two functions are equal.

It’s also useful when you want to rewrite a function you’ve written using explicit recursion to instead be a fold. In this case, you end up setting up an equality between some function you hope is a fold and those equations, then solving for v and then f using equational reasoning. (For examples, check the section “Universality as a definition principle”.)

Fusion: Pushing function composition inside the fold

You can also use the universal property to derive the fusion property of fold, which gives the requirements to be met to push a function applied to a fold into the fold itself: h . fold g w => fold fusedFunction fusedValue. (There’s a subsection on that, too.)
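A small worked instance, sketched in current Swift with a hand-written `foldr` (my transliteration of the recursive definition): take h = (+ 1) applied after a sum. The two side conditions for fusion are h w = v and h (g x y) = f x (h y); here they pin down v = 1 and f = (+).

```swift
/// Right fold: swaps each "cons" for `f` and the final "nil" for `v`.
func foldr<A, B>(_ f: (A, B) -> B, _ v: B, _ xs: [A]) -> B {
    guard let x = xs.first else { return v }
    return f(x, foldr(f, v, Array(xs.dropFirst())))
}

/// h . fold g w, with h = (+ 1), g = (+), w = 0: sum first, then add one.
func sumThenIncrement(_ xs: [Int]) -> Int {
    return foldr({ x, total in x + total }, 0, xs) + 1
}

/// fold f v, with f = (+) and v = 1: the "+ 1" has moved inside the fold.
/// Both side conditions hold:
///   h w = v               : 0 + 1 == 1
///   h (g x y) = f x (h y) : (x + y) + 1 == x + (y + 1)
func fusedSumIncrement(_ xs: [Int]) -> Int {
    return foldr({ x, total in x + total }, 1, xs)
}
```

For [1, 2, 3] both return 7, and both return 1 for the empty list.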

Fusion as optimization

Fusion plays a small role in this paper, but it’s very useful for practical reasons of program optimization. There’s a good chunk of machinery in some functional language compilers dedicated to automatically fusing (“deforesting”) functions, since this reduces the overhead of communicating between parts of a program via intermediate values: with fusion, the values get consumed as they’re produced, rather than accumulated into a data structure and then handed off for the next step of processing.

Expressiveness

Tuples give you primitive recursion

Generating tuples lets you pull a fun trick where you rebuild the entire input in the second part of a 2-tuple and then throw it away. This is useful because it makes the original input available to the function you’re folding.

This lets you define dropWhile p xs as a fold.
This lets you define a generic primitive recursive function as a fold.
- Primitive recursion: fold is as powerful as any program you can write that has a bounded number of repetitions in any loop.
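Here's roughly what the tuple trick looks like for dropWhile, sketched in current Swift. `foldr` is my transliteration of the recursive definition and `dropWhileViaFold` is a made-up name. The second tuple component rebuilds the input as the fold goes, so the step function can fall back on "everything from here on" whenever the predicate fails:

```swift
/// Right fold: swaps each "cons" for `f` and the final "nil" for `v`.
func foldr<A, B>(_ f: (A, B) -> B, _ v: B, _ xs: [A]) -> B {
    guard let x = xs.first else { return v }
    return f(x, foldr(f, v, Array(xs.dropFirst())))
}

func dropWhileViaFold<A>(_ p: (A) -> Bool, _ input: [A]) -> [A] {
    // .0 is the answer for the suffix seen so far;
    // .1 is that same suffix, rebuilt untouched.
    let seed: ([A], [A]) = ([], [])
    let result = foldr({ (x: A, pair: ([A], [A])) -> ([A], [A]) in
        let (dropped, original) = pair
        let rebuilt = [x] + original
        // Keep dropping while p holds; otherwise the answer is x plus
        // everything after it, which is exactly the rebuilt input.
        return (p(x) ? dropped : rebuilt, rebuilt)
    }, seed, input)
    return result.0   // throw the rebuilt copy away
}
```

So `dropWhileViaFold({ $0 < 3 }, [1, 2, 3, 1])` gives [3, 1]: the trailing 1 survives because the drop stopped at 3.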

First-class functions give you general recursion

Generating functions pushes the expressiveness from primitive recursion to general recursion, as demonstrated by writing the Ackermann function (evidently the standard example of a non-primitive recursive function) as a fold (using lists of length n to stand in for the natural number n).

General recursion: fold is as powerful as any program you can write. It can perform unbounded iteration. Enjoy your halting problem.

First-class functions give you foldl via foldr

Generating functions also lets you write foldl – the function that turns [1, 2, 3] into (((0 + 1) + 2) + 3), i.e., which folds everything down to the left – using fold. Turns out the order fold processes the list in comes down to the function it’s provided.
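A sketch of that construction in current Swift, with `foldr` being my transliteration of the recursive definition and `foldlViaFoldr` a made-up name. The fold builds up a function from accumulator to result, starting from the identity function, and only then gets handed the base value:

```swift
/// Right fold: swaps each "cons" for `f` and the final "nil" for `v`.
func foldr<A, B>(_ f: (A, B) -> B, _ v: B, _ xs: [A]) -> B {
    guard let x = xs.first else { return v }
    return f(x, foldr(f, v, Array(xs.dropFirst())))
}

func foldlViaFoldr<A, B>(_ f: @escaping (B, A) -> B, _ v: B, _ xs: [A]) -> B {
    let identity: (B) -> B = { $0 }

    // Each step wraps the rest of the computation: combine the accumulator
    // with x first, then pass the result along to `next`.
    func step(_ x: A, _ next: @escaping (B) -> B) -> (B) -> B {
        return { acc in next(f(acc, x)) }
    }

    // The fold yields a function; applying it to v runs the left fold.
    return foldr(step, identity, xs)(v)
}
```

Subtraction makes the grouping visible: `foldlViaFoldr({ $0 - $1 }, 0, [1, 2, 3])` is ((0 - 1) - 2) - 3 = -6, while `foldr({ $0 - $1 }, 0, [1, 2, 3])` is 1 - (2 - (3 - 0)) = 2.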

Foldr is more powerful than foldl because strictness

You can’t write fold in terms of foldl because foldl is strict in the tail of its list. What that means in practice is that foldr (\a b -> a) 0 [0..] evaluates to 0 lickety-split, but foldl (\b a -> a) 0 [0..] runs forever (“diverges”).

Questions I hesitate to call “frequently asked”

I don’t know that these are frequently asked, but they’re definitely questions I asked myself at some point in the past.

Why is it called “fold”?

I explain calling it a “fold” or “reduction” to myself by thinking of the example that started this post, which collapses (folds up) a list of values to a single value. In that case, the list [1, 2, 3] collapses down to the single value 6.

This explanation makes less sense when you’re building a structured value, like say reversing a list, or duplicating each element so that [1, 2, 3] goes to [1, 1, 2, 2, 3, 3], but let’s just ignore that unfortunate fact.

What’s up with the argument order for f in foldr vs foldl?

The arguments to the function provided to a fold reflect the fold’s left/right bias.

Say you want to fold a list of strings down to the sum of each string’s length.

If you use foldr, then it has a right-to-left bias, so you need the function \ string sum -> (length string) + sum. With foldr, the base case value appears on the right, and the iteration step value on the left.

foldl's bias is for left-to-right, so its function has the base case value appearing on the left and the iteration step value on the right. With foldl, you’d need the function \ sum string -> (length string) + sum.

This has the fun result that the order of arguments to the function flips depending on whether you’re dealing with a foldl or a foldr, as you can see with these side-by-side:

sumLengthsRTL :: [String] -> Int
sumLengthsRTL = foldr (\ string sum -> (length string) + sum) 0

sumLengthsLTR :: [String] -> Int
sumLengthsLTR = foldl (\ sum string -> (length string) + sum) 0

Read: Towards native higher-order remote procedure calls

Mon, 10 Aug 2015 00:00:00 +0000

I do a lot of reading on my iPhone 6+. It’s amazing what you can accomplish reading only a couple pages a day. I write notes in either Day One or Logsit, depending on how long the work is. They don’t do anyone much good buried in there, so I’m sharing my notes.

Fredriksson, Olle, Dan R. Ghica, and Bertram Wheen. “Towards native higher-order remote procedure calls.” IFL 2014. 12 pages.

Introduces Floskel, a Haskell-like call-by-value language with ADTs and location-aware expressions (expr @ nodeN). Compiles down to abstract machine byte code. Authors start with a CES machine (SECD without the dump) and then step gradually through adding a heap (CESH) then adding distributed communication (DCESH), first in a trivial single-node case and then in the general multiple-node, actually distributed case. They finish up with an async single-threaded network of machines with fail-stop fault tolerance via backups.

Each step from one machine to the next advances by a bisimulation proof formalized in Agda. It’s interesting to note that the details of the compilation from the surface syntax down to the bytecode are not formalized; in fact, they’re considered so well-known and commonplace that they’re entirely glossed over.

The compilation assumes a fixed number of nodes. All code is also known in advance. Since all nodes have the same code, each can call a remote function on another node just by sending a pointer to the function. A node also knows its node number. (They implemented this by compiling to C and using MPI.)

Benchmarks show it does well next to OCaml for the single-node case, and RPC overhead doesn’t hurt much.

New things I learned:

The two-level operational semantics for the network was new to me, though apparently it’s bog standard in the literature:

Both kinds of networks [synchronous and asynchronous] are modelled by two-level transition systems, which is common in operational semantics for concurrent and parallel languages. A global level describes the transitions of the system as a whole, and a local level the local transitions of the nodes in the system. (p. 6, “3.3 Network models”)

This makes sense - it lets you glue together a bunch of independent machines with some over-arching system behavior, and that’s roughly how you’d expect to naïvely approach describing how a network of independent machines acts. But that’s not something I’d ever tried to do formally, so, new to me!
They modeled the sync case via rendezvous, where both nodes must be ready to send/receive before the communication happens.
They modeled the async case as having a pool of in-flight messages; receiving a message corresponded to fishing one out of the “message soup”.
Lambda lifting has some concrete meaning to me now.

For compilation, we require that the terms t in all location specification sub-terms t @ i are closed. Terms where this does not hold are transformed automatically using lambda lifting [25] (transform every sub-term t @ i to tʹ = ((λ fv. t) @ i) (fv t)). (p. 7 “3.5 DCESH: The distributed CESH machine”)

Put into words, if you have an expression with free variables, lambda lifting is when you convert the expression to a function where all the formerly free variables are now arguments to the function. The function no longer has free variables, and implicit binding has been replaced by explicit binding by way of function application.
I also learned what a bisimulation is: think “isomorphism for labelled transition systems”.
- A subtle point made on Wikipedia is that a bisimulation is different than just being able to establish different simulations in both directions (x can sim y and y can sim x).
- I cribbed from Wikipedia for an intro to transition systems, too. A labelled transition system itself is a generalization of a finite state machine where the states aren’t necessarily countable, the transitions aren’t necessarily countable, and there’s no notion of initial or accepting states.
  
  You can view it as an abstract rewriting system (like how you’d graph out reduction of a lambda calculus expression), only the focus is on the transition labels (interpreted as actions/events) rather than the objects at either end of the transitions (like the classic “do we reach a normal form” question).
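To pin down the lambda lifting idea from the list above, here's a toy sketch in Swift. The function names are made up, and it has nothing to do with how Floskel actually compiles anything. The "term" is 10 * scale + offset, with scale and offset as its free variables:

```swift
/// Before lifting: the closure body mentions `scale` and `offset` freely,
/// binding them implicitly by capturing the enclosing scope.
func workCapturing(scale: Int, offset: Int) -> Int {
    let term = { 10 * scale + offset }
    return term()
}

/// After lifting: the term is a closed, top-level function.
/// Every formerly free variable is now an explicit parameter.
func liftedTerm(_ scale: Int, _ offset: Int) -> Int {
    return 10 * scale + offset
}

/// The call site supplies the free variables' values by plain function
/// application, mirroring ((λ fv. t) @ i) (fv t).
func workLifted(scale: Int, offset: Int) -> Int {
    return liftedTerm(scale, offset)
}
```

Since `liftedTerm` is closed, it needs nothing from its surroundings except its arguments, which is what makes it safe to ship to another node.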

Unfortunately, I’d forgotten what some of those things referred to. iBooks’ delightful sync erased this paper from my phone’s library, along with my bookmarks, at some point. Lucky thing this is such a short paper that I was able to locate the context easily, but I take it I should be quicker to write up my notes in future!

Listened: SE Radio: Joe Armstrong

Sun, 02 Aug 2015 00:00:00 +0000

I listened today to an interview with Joe Armstrong recorded in late 2007. Joe Armstrong created Erlang. This interview took place right as interest in Erlang was rising due to the growth in generally available multicore machines.

In case you’d like to give it a listen as well, this was Software Engineering Radio episode 89.

My takeaways:

Concurrency and fault tolerance are the heart of Erlang; the two forces together drove it to the actor model. Its functional nature is an accident of its birth as a modified Prolog.
OTP provides a wealth of tools and ready-to-go patterns for building fault-tolerant systems.
Erlang happened, not while looking to gin up a new programming language, but while trying to solve a concrete problem with interesting constraints: How can we make it easier to write telephone switching software that can run for ages and never go down?

From memory, here are my more comprehensive (and correspondingly more rambling) notes:

Reddit gave the impression at the time that Haskell and Erlang were fighting for the future. Eight years later, and the future still isn’t quite here, or Reddit was as poor a predictor of general behavior then as now. ;)
Erlang arose almost accidentally out of solving problems inherent in plain old telephone service (POTS) switching:
- You have a ton of different people connected at once, all doing their own thing. Having lightweight processes makes this easy to model.
- You can’t take the system down. Ever. So you need hot-swapping.
- You can’t let the system go down. Ever. So you need fault-tolerance.
Erlang came out of modifying Prolog to have lightweight processes. This was alongside a number of other researchers implementing POTS switching logic in whatever languages they could get their hands on to run on the random VAX UNIX machine they had in the office: Ada, Concurrent Euclid, Smalltalk, ML, C, and Armstrong with Prolog.
The driving forces behind Erlang were concurrency and fault tolerance. Its actor approach to concurrency was intentional; the functional nature of sequential Erlang, though, is an accident of growing up inside a Prolog (where you can’t unify a term first to one thing and then to something unequal) rather than inside C. Nothing in the actor model would prohibit an actor performing local mutation of bindings.
- Armstrong points out that assign-once variables are very handy for debugging, though: If you find a bogus value in a variable, then there’s exactly one place in your program where it was bound to that bogus value, vs one of several places where it might have been updated if multiple assignments to the same name were allowed.
The “let it crash” ethos is a response to specifications’ refusal to spec anything but the happy path. Instead of ad-libbing some rubbish error handling that dwarfs the code you need to write, and likely introduces some errors of its own, you just don’t even bother handling junk input. The worker process dies, its supervisor respawns it, and a report of its abend is written to an error log.
- You can later review the error log to decide if you want to introduce logic to handle an error case that keeps cropping up, or if it’s so rare that abending and being reborn is still the right approach.
The ability to actually distribute Erlang across machines came very late. Everyone involved had the sense it would be pretty easy based on how the rest of the system had been implemented, so no-one actually bothered getting around to doing it for a long time.
Erlang uses its host OS to manage files and sockets and fork over some memory for the Erlang runtime to manage.
- Erlang’s runtime system is basically an application container or operating system in and of itself.
- OTP is a framework (like Rails), but rather than being a framework for writing web sites, it’s a framework for writing fault-tolerant systems.
  - Notice how “fault tolerance” keeps coming to the fore!
Armstrong contrasts shared memory concurrency (a headache) to message passing concurrency (not so bad). He sees Erlang’s greatest selling point as being easy, painless concurrency.
- Software transactional memory doesn’t enter into this discussion. (Was that even a thing in late 2007? Yeah: in Haskell-land, Harris et al. introduced “Composable Memory Transactions” in PPoPP’05, with a couple of other papers following in 2006, per the Haskell wiki.)
- The message passing approach also arises out of the fault tolerance requirements. If machine A crashes, machine B has to have a copy of all needed data to continue, so A needs to send over a copy rather than just a pointer to its own memory.
Armstrong reckons it unlikely that Akka can match Erlang’s performance: it can’t alter the abstract machine it runs on to turn the core actor-model operations into genuinely fast primitives, so it has to emulate them on an ill-suited abstract machine:
- Switch context from one actor to another
- Send a message
- Spawn a new process
Armstrong’s last remarks were on Erlang’s bit-matching DSL. He also asks a good question: Why do regex always work at the byte, rather than bit, level? Why are there no good bit-level regex libraries?
- I suppose the answer is likely that, unless you are doing systems programming, you don’t much need them, and systems programmers raised on C are unlikely to turn to a regex library to do bit-smashing.

Using rbenv with fish

Tue, 28 Jul 2015 00:00:00 +0000

I switched from zsh to fish shell a month or so ago. I lost bang-history (no more !?gi) and gained a shell small enough to understand and write scripts for without fearing I’m going to step into some gotcha from the 1970s. No more shell-as-quirks-mode!

There’s a downside to shifting to a non-POSIX shell, though: scripts intended to modify the shell environment itself no longer Just Work.

This tripped me up in one case: rbenv, the Ruby environment and version manager.

rbenv expects you to run the output of rbenv init in your shell. This fixes up your PATH, rebuilds rbenv's sense of the world, and lastly redefines rbenv as a dispatching function. rbenv provides a few different flavors of script, but none is for fish.

No problem! Let’s rewrite this script for fish.

When you run rbenv init, it dumps out a call to eval:

# Load rbenv automatically by adding
# the following to your profile:

eval "$(rbenv init -)"

That eval runs whatever rbenv init - prints. Run rbenv init - on its own and you see something like:

export PATH="/Users/jeremy/.rbenv/shims:${PATH}"
rbenv rehash 2>/dev/null
rbenv() {
  typeset command
  command="$1"
  if [ "$#" -gt 0 ]; then
    shift
  fi

  case "$command" in
  rehash|shell)
    eval `rbenv "sh-$command" "$@"`;;
  *)
    command rbenv "$command" "$@";;
  esac
}

Translating this to fish is a good introduction to scripting fish. Pop open fish help in a browser tab, and lean on functions to look at how the functions provided with the shell are coded.

With a bit of that, I ended up with:

set PATH "$HOME/.rbenv/shims" $PATH
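# Newer fish releases no longer treat ^ as a stderr redirect;
# 2> works in old and new versions alike.
rbenv rehash 2>/dev/null
function rbenv
    set -l command $argv[1]
    # Drop the subcommand from argv, mirroring "shift" in the sh version.
    if test (count $argv) -gt 0
        set -e argv[1]
    end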

    switch "$command"
        case rehash shell
            eval (rbenv "sh-$command" $argv)
        case '*'
            command rbenv "$command" $argv
    end
end

I bet there’s a fishier way to do this, but it’s working fine for me. If you’ve been considering adopting fish as your shell but ran into rbenv as a blocker, this should get you past that. Enjoy!

Dodging State

Wed, 15 Jul 2015 00:00:00 +0000

Soroush Khanlou gives some concrete tips for reducing the amount of state in your code. I’m going to rephrase his advice into terms that connect better with other reading I’ve been doing:

When you recognize cohesive groups of properties, bud off a new object.
Prefer domain types to primitive types.
Take advantage of computed properties to express derived state; reintroduce caching only when unavoidable.

Budding Off

Sometimes you notice you’re manipulating the same bits of state together, say properties dollars and cents of a Product. Well, viewed another way, maybe you’re actually just inlining the methods of a latent type; dollars + cents might be the answer to the total message of a not yet existent Price class.

You don’t have to end up here by an act of deep vision; you can work your way up to it by doing some extract method refactorings in extant classes, then noting that you have a coherent group of methods and state, and then just bundling those up and schlepping them off to a new class. Budding complete!
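Here's how that bud might look once it's separated, as a sketch in current Swift. The type and property names are mine, and I'm assuming "total" means the whole price in cents:

```swift
/// Before: dollars and cents ride along inside Product, and every bit of
/// code that wants a total has to remember how to combine them.
struct ProductBefore {
    var name: String
    var dollars: Int
    var cents: Int
}

/// After: the latent type gets a name and takes the `total` logic with it.
struct Price {
    var dollars: Int
    var cents: Int

    /// The whole price expressed in cents (an assumed unit for this sketch).
    var total: Int { return dollars * 100 + cents }
}

struct Product {
    var name: String
    var price: Price
}
```

Product shrinks to two properties, and anything else that has a price can now reuse Price instead of growing its own dollars and cents pair.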

Domain Types

Whatever your code is doing, it’s very likely not natural to talk about at the level of bools and ints. You might instead have something that’s on or off, or red or green; something that’s big, medium, or small, or maybe you have some money rather than just a decimal value.

Using types native to the domain of discourse that you’re modeling and automating raises the level of abstraction of your code. It also raises the profile of these ideas in your language. If you fail to name these types, you risk ending up with bits of functionality relating to them scattered across the code that needs to work with them.

Naming a primitive – even if the new type starts out as little more than a wrapper around a single primitive value, like struct Name { let fullName: String } – makes the meaning of the primitive value explicit and provides a foothold from which you can drag further functionality into existence in an appropriate context.
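As a sketch of that foothold in current Swift: the `initials` property is a made-up example of functionality that now has an obvious home. Without Name, it would likely end up as a free-floating string helper somewhere.

```swift
/// A thin wrapper: little more than a String, but with a meaning attached.
struct Name {
    let fullName: String
}

extension Name {
    /// First letter of each space-separated part of the name.
    /// (Hypothetical helper, shown only to illustrate where logic can live.)
    var initials: String {
        let firstLetters = fullName.split(separator: " ").compactMap { $0.first }
        return String(firstLetters)
    }
}

/// A signature that asks for a Name can't be handed a street address
/// or a product title by accident, even though all three are strings.
func greeting(for name: Name) -> String {
    return "Hello, \(name.fullName)!"
}
```

Passing a bare String to `greeting(for:)` is a compile error, which is the whole point.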

Derived State

You can split program state into two classes:

Essential state: Information that you have to have.
Derived state: Information that can be computed from other information you have.

In theory, you should be able to get by holding onto only the essential state and computing the rest as needed via functions/methods/computed properties. In practice, you might need to throw a caching layer (like a stored property) in there more often than you’d like, but let your profiler be your guide.
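A minimal sketch of that split in current Swift, using a made-up Cart type: the list of prices is the only thing stored, and everything else is recomputed on demand.

```swift
struct Cart {
    // Essential state: there's no way to recover this from anything else.
    var unitPricesInCents: [Int]

    // Derived state: computed each time it's asked for, so it can never
    // drift out of sync with the prices.
    var totalInCents: Int { return unitPricesInCents.reduce(0, +) }
    var isEmpty: Bool { return unitPricesInCents.isEmpty }
}
```

Append a price and `totalInCents` is already right; there's no cached total that someone has to remember to update. If the profiler ever complains, that's the moment to consider trading this for a stored property.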

(What about the stuff in your program that’s not state? For a further teardown, you should check out Moseley & Marks’ “Out of the Tar Pit”, or at least skim Adrian Colyer’s survey of it.)

Conclusion

Soroush draws these conclusions as a result of approaching code from a functional programming standpoint, but that’s an accidental path dependency. Object-oriented design principles, as elaborated for example in Practical Object-Oriented Design in Ruby, provide plenty of motivation to reduce state and better name and organize what’s left of it.

Now, go check out Soroush’s article, particularly the cool bits about state machines – then wonder how you might use discriminated unions (AKA enums) in Swift to tackle the same problem.
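One way that exercise might start, as a sketch only: the LoadState and LoadEvent names and the transitions are invented for illustration and aren't taken from Soroush's article. The enum stands in for what might otherwise be a loose trio of properties (a loading flag, an optional body, an optional error) that can contradict each other.

```swift
/// Exactly one of these holds at a time; "loading with a stale error"
/// simply can't be written down.
enum LoadState: Equatable {
    case idle
    case loading
    case loaded(String)
    case failed(String)
}

enum LoadEvent {
    case start
    case succeed(String)
    case fail(String)
}

/// Pure transition function: the old state and an event give the new state.
func next(_ state: LoadState, _ event: LoadEvent) -> LoadState {
    switch (state, event) {
    case (.idle, .start), (.failed, .start):
        return .loading
    case (.loading, .succeed(let body)):
        return .loaded(body)
    case (.loading, .fail(let reason)):
        return .failed(reason)
    default:
        return state   // events that make no sense in this state are ignored
    }
}
```

Starting from idle, a start followed by a succeed("ok") lands in loaded("ok"), while a stray succeed in the idle state changes nothing.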

Should DRY entail call-by-need evaluation?

Tue, 14 Jul 2015 00:00:00 +0000

Swift brings Cocoa devs a standard library that includes map, filter, and reduce. But in a strict language such as Swift, you’ll likely find yourself hand-fusing a composition of these operations at some point as a result of profiling-directed performance optimization.

This isn’t new to Swift. Lennart Augustsson observed in 2011 in More points for lazy evaluation:

Strict evaluation is fundamentally flawed for function reuse.

That’s because:

With strict evaluation you can no longer with a straight face tell people: don’t use recursion, reuse the recursion patterns in map, filter, foldr, etc. It simply doesn’t work (in general).

The strict evaluation order means a naïve combination of functions will do more work than needed to produce a value. Take any, which should short-circuit at the first match: written in terms of reduce, it must still traverse its entire input.

Using macros doesn’t really save us this time, because of the recursive definitions. I don’t really know of any way to fix this problem short of making all (most?) functions lazy, because the problem is pervasive. I.e., in the example it would not be enough to fix foldr [AKA reduce]; all the functions involved need to be lazy to get the desired semantics.

Strict evaluation gets us a simple space-usage model, but it leaves us holding the bag when it comes to function composition.
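To make the extra work concrete, here is a minimal sketch in current Swift syntax (not the Swift of this post's era). The visit counters are my own instrumentation, added only to show how many elements each approach inspects: an any built from reduce walks the whole array, while a lazy pipeline stops at the first match.

```swift
let numbers = Array(1...1_000)

// `any` built from reduce: the closure runs for every element,
// even after the answer is already known to be true.
var strictVisits = 0
let anyViaReduce = numbers.reduce(false) { found, n in
    strictVisits += 1
    return found || n > 3
}

// Lazy pipeline: the map is deferred, and contains(where:) stops
// at the first match, so only the needed prefix is visited.
var lazyVisits = 0
let anyViaLazy = numbers.lazy
    .map { n -> Int in
        lazyVisits += 1
        return n
    }
    .contains { $0 > 3 }
```

Both give the same answer; only the lazy version gets away with looking at a handful of elements. Note that the laziness has to be opted into at every stage, which is Augustsson's point about the problem being pervasive.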

CRC Cards

Fri, 10 Jul 2015 00:00:00 +0000

File under “old technique, still useful”: CRC cards

The core elements of an object system are an object, its responsibilities, and its collaborators. With just that info, you can start walking through scenarios.

You’ll note this doesn’t say anything about inheritance or prototypal delegation or whatever; that’s intentional:

The cards are being used as props to aid the telling of a story of computation. The cards allow its telling without recourse to programming language syntax or idiom.

Being able to pick up and hold a card, and thereby signal to yourself that you’ve become that actor, helps with getting in the object thinking groove:

We were surprised at the value of physically moving the cards around. When learners pick up an object they seem to more readily identify with it, and are prepared to deal with the remainder of the design from its perspective. It is the value of this physical interaction that has led us to resist a computerization of the cards.

In the modern HeaderDocful world, it’s useful to write the responsibilities and collaborators bits right into the class-level doc comment.

The “collaborators” bit is especially valuable for hinting at dynamic runtime context that can be hard to infer from static source text. The same info is handy in method docs (“called by X when Y happens”).

Using Swift Throws with Completion Callbacks

Wed, 17 Jun 2015 00:00:00 +0000

Swift 2 introduced the notion of throwing and propagating NSError values.

It works pretty well in a linear, synchronous workflow, but at first glance, it doesn’t appear to address the common case of completion callbacks.

Consider NSURLSession.dataTaskWithURL(_:completionHandler:). Swift 2 bridges this in like so:

func dataTaskWithURL(url: NSURL,
    completionHandler: (NSData?, NSURLResponse?, NSError?) -> Void)
    -> NSURLSessionDataTask?

Note how, in the completion handler closure, you still have to do Ye Olde Check Data Then Check Error dance. Yawn.

There’s a straightforward way to transform this into throws-land, though. Just think: What sort of thing can throw? A function call.

So, let’s use our functions, and rewrite this to:

typealias DataTaskResult = () throws -> (NSData, NSURLResponse)
func dataTaskWithURL(url: NSURL,
    completionHandler: DataTaskResult -> Void)
    -> NSURLSessionDataTask?

The completion handler is not marked as rethrows, so it has to handle any error itself. Extracting the result or error is then done in the completion handler like so:

{ (result: DataTaskResult) in
    do {
        let (data, response) = try result()
        /* work with data and response */
    } catch {
        /* you got yourself an error! */
    }
}

This straightforward transformation preserves Swift 2’s directing attitude towards error-handling, while freeing users from having to remember the protocol for working with NSErrors.

It’s unfortunate we can’t ourselves apply this to Apple’s code. We’ll just have to continue to type through the error-prone, manual procedure for working with NSErrors when working with their APIs.

We needn’t continue to do so with our own, though: if you’re going to adopt throws, go whole-hog, and throwify your entire API, both synchronous and asynchronous.
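Here is a small, self-contained sketch of that advice applied to an API of your own, written in current Swift syntax. Everything in it (FetchError, FetchResult, fetch, and the in-memory store) is made up for illustration; it also calls the completion handler synchronously, so a genuinely asynchronous version would need to mark the handler @escaping.

```swift
// Hypothetical error and result types, for illustration only.
enum FetchError: Error {
    case notFound
}
typealias FetchResult = () throws -> String

// A made-up callback API: instead of (value?, error?), the completion
// handler receives a function that either returns the value or throws.
func fetch(_ key: String, completion: (FetchResult) -> Void) {
    let store = ["greeting": "hello"]
    if let value = store[key] {
        completion { value }
    } else {
        completion { throw FetchError.notFound }
    }
}

var outcomes: [String] = []
for key in ["greeting", "missing"] {
    fetch(key) { result in
        do {
            let value = try result()
            outcomes.append("ok: \(value)")
        } catch {
            outcomes.append("failed: \(error)")
        }
    }
}
```

The caller never sees an optional value or an optional error, so there is no way to check them in the wrong order or forget one of them.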

Functor & Friends: Protocol + Tests

Sun, 08 Mar 2015 00:00:00 +0000

I’ve read articles that try to reduce the academic flim-flammery of functors, monads, and similar to concrete syntax by just presenting them as a series of interfaces, or protocols, that must be implemented.

This is reassuring: It turns something unfamiliar into something familiar, if not downright mundane. Unfortunately, reducing these abstractions to protocols alone oversimplifies them and reduces their practical utility tremendously.

What gives functional programming abstractions their oomph is the properties satisfied by the abstraction, not the specific API. These properties provide essential guarantees about the implementation of that API. Using these properties lets us reason about our code without getting bogged down in details of the implementation: The dream of meaningful abstraction lives on in FP.

Protocols alone are not powerful enough to specify this. Fortunately, there is a way for object-oriented programmers to return the foreign concept of “mathematical abstraction” to a comfortable familiarity without losing reasoning power. This is by recasting the abstraction in terms of protocols AND tests. The tests specify properties that implementations of those protocols must satisfy in order to truly conform to the protocol.

Together, protocols and tests capture the essence of functional abstractions in a way that an OO programmer can immediately be productive with.

(It’s an interesting observation that, as in TDD, these tests have more value than the protocols themselves.)

Example: Functor

You might see Functor explained like this:

Functor is a parameterized type with a map function, where
The map function takes an instance of the type and applies a function from its parameterized type to another instance with a potentially different type parameter.

Or, more concisely, in pseudo-Swift:

map(container: Container<Type>, function: Type -> PotentiallyOtherType)
  -> Container<PotentiallyOtherType>

The protocol fails us: permuting map

A protocol captures only the superficial elements of Functorhood. For example, this implementation of map on an array satisfies the protocol:

func map<T, U>(a: [T], f: T -> U) -> [U] {
    let indexes = allIndexes(a)
    let permutedIndexes = permute(indexes)
    var b: [U] = []
    for i in permutedIndexes {
        let element = a[i]
        let mappedValue = f(element)
        b.append(mappedValue)
    }
    return b
}

But it fails to be a Functor.

Try it for yourself: Mapping the identity function repeatedly (map([1, 2, 3, 4], { $0 })) can give results that differ from the input array. (For the identity function, those results will all be permutations of the input array.)

That makes for one frustrating faux-functor!

Tests to the rescue!

This is why it is not enough to have a function with a certain type signature. The heart of the beast is the set of sanity-preserving properties it’s required to conform to.

These properties go by the name of the functor laws; you’ll find equivalent laws for monads and similar abstractions. It’s these laws that make the abstraction meaningful and tractable. Preserving these laws makes the interface a true abstraction, rather than (in the case of functor) simply a generalization of a common imperative programming pattern.

Specifically, a functor must preserve identity and composition:

Identity Preservation: XCTAssertEqual(id(x), x.map(id))
Composition Preservation: XCTAssertEqual(x.map(f).map(g), x.map({ g(f($0)) }))

(Unlike the protocol above, here I’m using method syntax, because that composes a bit more readably.)

The XCTest macros aren’t quite up to the task of proving general properties. With something like SwiftCheck, we can get close, though.
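As a rough sketch of what such tests look like, here are the two laws checked by hand in current Swift syntax. The sample arrays stand in for the generated inputs a property-testing library would supply, and reversingMap is a deterministic stand-in I made up for the permuting map above, so this is a spot check under those assumptions rather than a proof.

```swift
// A faux-functor in the spirit of the permuting map:
// right type signature, wrong behavior (it reverses the order).
func reversingMap<T, U>(_ a: [T], _ f: (T) -> U) -> [U] {
    return a.reversed().map(f)
}

let identity: (Int) -> Int = { $0 }
let f: (Int) -> Int = { $0 + 1 }
let g: (Int) -> String = { "n\($0)" }

// Hand-picked samples in place of randomly generated ones.
let samples: [[Int]] = [[], [7], [1, 2, 3, 4], [3, 3, -1]]

// Identity preservation for the standard library's map.
let stdlibKeepsIdentity = samples.allSatisfy { $0.map(identity) == $0 }

// Composition preservation for the standard library's map.
let stdlibKeepsComposition = samples.allSatisfy { xs in
    xs.map(f).map(g) == xs.map { g(f($0)) }
}

// The faux-functor type-checks but breaks the identity law.
let fauxKeepsIdentity = samples.allSatisfy { reversingMap($0, identity) == $0 }
```

The first two checks pass and the third fails, which is exactly the distinction the protocol alone cannot express.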

Conclusion

To reiterate:

It’s not the protocol that makes the functor;
It’s not “you can write an [f]map function”;
Instead, it is specific properties exhibited by the compound entity of type + functions.
Concretely: It is protocol + tests.

No Single Swift Style

Fri, 24 Oct 2014 00:00:00 +0000

“Swift is too young for us to say what good Swift style is, or to have developed a sense for idiomatic Swift.”

Talks, blogs, and books repeat this sentiment or variations.

This statement assumes that there will eventually develop a single, canonical Swift style.

This is a mistake.

I expect no single style will develop; instead, several different styles will flourish. Perhaps most developers will be conversant with multiple styles. More likely, these styles will develop into several mutually unintelligible dialects.

What might these styles be?

Obj-C with different syntax: Existing frameworks exert a strong pull in this direction. Frankly, it feels awkward to me in Swift, but I expect a lot of code to be written this way.
OOP: A Swift-flavored OOP dialect will likely emerge from the ashes of Obj-C. Many are comfortable with OOP, and they’ll stick with it, using Swift as their new, less brackety vehicle.
Generic Programming: Swift’s small standard library exemplifies this, though it compromises in places by having both a generic function and an instance method, rather than picking a single one. I expect the latter is often there to assist discoverability.
Functional Programming: There has been a lot of interest in the dataflow aspects of functional programming across computing lately. Swift makes no exception.

In addition, Swift’s type system makes it possible to borrow at least some of the approaches pioneered by the ML family of languages, though its lack of support for higher-kinded types and incomplete support for variably-sized enum types can make this awkward to express. With a solid Prelude library to provide basic tools missing from Swift’s standard library, FP in Swift will be a very real possibility. Higher-level FP programming will likely be no less foreign to the other dialects in a couple years than it is today; reliance on FP-specific libraries will not help.

Swift’s styles are being built today by the snippets and libraries that will constitute their core vocabulary.

Some idioms will be common to several styles; some will be unique to a style; some of those will be inapplicable, even inexpressible, in other styles.

Swift isn’t too young to have a style: It’s just too big to be confined to having a single style.

Radar tip: Shell one-liners to dump configuration info

Fri, 25 Jul 2014 00:00:00 +0000

With the current Xcode and Swift betas, I find myself needing to paste in the versions of my OS, Xcode, and Swift every time I file a new Radar.

So I use these handy aliases:

alias rdrconf='{ xcodebuild -version; echo; sw_vers; }'
alias swfconf='{ xcodebuild -version; echo; sw_vers; echo; xcrun swift --version; }'

The echo bits are there to put a blank line between the different lists of versions. The output looks like this:

% swfconf
Xcode 6.0
Build version 6A267n

ProductName:    Mac OS X
ProductVersion: 10.9.4
BuildVersion:   13E28

Swift version 1.0 (swift-600.0.41.2.2)
Target: x86_64-apple-darwin13.3.0

To copy that into my browser, I do:

% swfconf | pbcopy

followed by a Cmd-v into the form field.

Pervasive use of Optional in Swift is penance for nil

Wed, 09 Jul 2014 00:00:00 +0000

If you’ve looked to do anything significant with Swift, you’ve likely had to fall back on our old friend, Foundation, and likely also some newer friends in the form of other core Apple frameworks.

One thing you cannot miss with these legacy APIs is the pervasive use of optional types.

Thanks to them, we still get to angst about the billion-dollar mistake of nil, only now we get to pay a steady syntax tax throughout our codebase.

It is in light of this that I read a snatch of a comment by Dr. Robert Harper with some interest:

Your emphasis on whether the nth argument to that function/method is an option or not does not do justice to the real issue, which is what Yaron Minsky calls making undefined states unrepresentable (or words to that effect).

[…]

It is of course possible to do C-like or “pythonic” programming in ML using options instead of the “null pointer”, but that’s not the way to write good code […]. What you want is

types to express the correlations between components, these are called sums, and

pattern matching to explicitly match the cases that are legal and allow the exhaustiveness checker to warn you when you’ve missed a case, either by mistake or by design or as a consequence of evolution of the code.

I am afraid we know for a fact that legacy Objective-C APIs are not up to this challenge. Surprise exceptions and undocumented behavior on nil input will be with us for some time.

It is an open question whether Swift-native APIs will be written to support and lead us by example to that latter style of programming.

The way the Swift book punts in the face of a data source protocol with two options for providing data is not encouraging to this end:

NOTE: Strictly speaking, you can write a custom class that conforms to CounterDataSource without implementing either protocol requirement. They are both optional, after all. Although technically allowed, this wouldn’t make for a very good data source. (“Optional Protocol Requirements”)

Such an aside would have been the perfect time, not to shrug out, “oh, they are both optional, too sad, but what are we to do?”, but instead to explain how an enumeration could be used to admit a delegate implementing precisely one or the other but not none and not both.
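Something along these lines is what I have in mind. It is a sketch in current Swift syntax, and IncrementSource and Counter are names I invented for it rather than anything from the Swift book: the enumeration has one case per way of supplying the data, so "neither" and "both" simply cannot be written down.

```swift
// Hypothetical replacement for a data source with two optional
// requirements: exactly one way of supplying the increment is chosen.
enum IncrementSource {
    case fixed(Int)
    case computed((Int) -> Int)
}

struct Counter {
    var count = 0
    let source: IncrementSource

    mutating func increment() {
        // Exhaustive switch: there is no "neither" or "both" case to forget.
        switch source {
        case .fixed(let amount):
            count += amount
        case .computed(let amountForCount):
            count += amountForCount(count)
        }
    }
}

var byThrees = Counter(source: .fixed(3))
byThrees.increment()
byThrees.increment()

var towardsZero = Counter(count: -4, source: .computed({ $0 < 0 ? 1 : 0 }))
for _ in 1...5 {
    towardsZero.increment()
}
```

If a third way of supplying increments shows up later, the exhaustiveness checker will point at every switch that needs updating.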

Intuition behind the Swift external/local parameter system

Thu, 05 Jun 2014 00:00:00 +0000

David Bryant Copeland picks out Swift’s external/local parameter system as something never before seen:

The notion of giving named parameters different names for the caller than are used in the implementation is not something I’ve seen before, and it’s kinda genius.

But further reflection convinced me that allowing different external and local parameter names is simply the Swift version of a common Objective-C practice.

External and Local Parameters in Obj-C

Consider these parallel Obj-C and Swift method declarations:

- (void)insertPerson:(Person *const)p atIndex:(const NSUInteger)i;

func insert(person p: Person, index i: Int)

The Obj-C version demonstrates “external” parameter names in the form of a verbose selector. In Swift, the selector components move into the parens as external parameter names.

The Obj-C formal parameter names are analogous to the local parameter names in Swift. The Swift external-then-local declaration order perfectly follows the Obj-C selector-chunk-then-argument order: Swift person p: Person vs. Obj-C Person:(Person *const)p.
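To see both sets of names at work, here is a minimal sketch in current Swift syntax. Person, Roster, and the atIndex label are placeholders of mine; the point is only that callers write the external names while the body uses the local ones.

```swift
// Hypothetical types, for illustration only.
struct Person {
    let name: String
}

struct Roster {
    var people: [Person] = []

    // `person` and `atIndex` are the external names callers write;
    // `p` and `i` are the local names used inside the body.
    mutating func insert(person p: Person, atIndex i: Int) {
        people.insert(p, at: i)
    }
}

var roster = Roster()
roster.insert(person: Person(name: "Ada"), atIndex: 0)
roster.insert(person: Person(name: "Grace"), atIndex: 0)
let names = roster.people.map { $0.name }
```

The call site reads much like the Obj-C message send, and the short local names never leak out of the implementation.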

Reaching back beyond even Obj-C, to C, this has been possible by exploiting the difference between a function prototype, commonly publicized in the header, and its implementation, commonly in a private implementation.

External and Local Parameters in C

In C, the only thing about arguments that the compiler cares about in a function prototype is the argument type; the names are purely documentary.

In the function implementation, so long as the types don’t change, you can name the formal parameters whatever you want.

So the C equivalent of the above would be:

/* prototype, in header */
void Insert(Person *const person, const NSUInteger index);

/* implementation, in .c file */
void Insert(Person *const p, const NSUInteger i) { /* ... */ }

About That Const

Take another look at the Obj-C and Swift versions of the function declaration:

- (void)insertPerson:(Person *const)p atIndex:(const NSUInteger)i;

func insert(person p: Person, index i: Int)

It’s uncommon to see const qualifiers on arguments in Obj-C. In this case, I was trying to remain faithful to the Swift default of const formal arguments.

You see, a Swift function declared like so:

func insert(person p: Person, index i: Int)

accepts an implicit let declaration of its parameters:

func insert(let person p: Person, let index i: Int)

The mutability-faithful version of the more common Obj-C declaration:

- (void)insertPerson:(Person *)p atIndex:(NSUInteger)i;

would similarly have mutable parameters in Swift:

func insert(var person p: Person, var index i: Int)

Swift parameters are const by default, and it’s great: it’s high time that was doable without stuttering const all over your codebase.

Summary

Swift’s external-local parameter declarations are a continuation of Obj-C selector chunk then parameter declarations: Obj-C insertPerson:(Person *const)p becomes Swift insert(person p: Person).
Swift function parameters are let-declared (const) by default. Qualify a parameter with var if you absolutely must have it mutable: insert(var person p: Person, var index i: Int).

VDM & The “Agile Spec” Problem

Wed, 28 May 2014 00:00:00 +0000

At the end of Distributed Programming & CALM, I mentioned how organically growing software often fails to produce a system with clear semantics.

So I count myself lucky that Mark Fernandes recently mentioned the Vienna Development Method (VDM). It’s like if design by contract, abstract data types, first-order logic, and a small imperative/OO language with a collection library had a lovechild.

Developing an application using VDM starts like this:

State some overall semantics claims, introducing state/objects, types, and functions as needed to flesh things out.
Use preconditions and postconditions (but preferably not any implementation) to define the functions.
Use invariants to clarify the meaning of your types.
Get it all to preserve the desired semantics.

At this point, you’re several removes away from a runnable application.

You’ve got some types and some operations on those types that have been implicitly defined using assertions.

Progressive refinement is how you work your way to more concrete data types and more explicitly (operationally) defined functions, step by step.

In concrete terms, you actually start writing some code in between the bunch of NSParameterAssert() and NSAssert() statements that currently comprise the entirety of your program’s functions.

The first part of refinement is coming up with a lower-level (closer to the implementation language) representation of the same system.

You might convert a set into an array with a “no duplicates” invariant, or use strings to represent a certain enumeration.

You’ll update functions or add new ones as needed, and rephrase pre/postconditions in terms of the new data model, and flesh out the actual implementation of the functions in terms of your new data types.

Then we get to the real heart of the matter: Prove this new model is truly a refinement of the last one. To do so, you must prove that your new version of the system preserves all the properties of the original system.

…so I’ve just been cribbing from the Wikipedia page I linked above, and it’s pretty cryptic about that refinement step. (See the end of this article for my best guess at how to interpret that section of the Wikipedia VDM article.)

Looking elsewhere, it appears that in addition to straight-up proof, model-checking and simulation are also used to validate the model. This makes sense given that, even before we begin refining our system, we need to convince ourselves that we’ve actually modeled what we intended to model. If we haven’t succeeded at that, all we’ll be refining is high-level garbage into lower-level garbage.

The Agile Spec Problem

Key to the Vienna Development Method is starting from a known-good high-level model, before the first line of implementation code is written.

After the first step of formally stating the specification of the application, you function as a manual compiler. At each step, you apply a program transformation, then prove it preserves your application’s semantics.

You start with a high-level spec, and you push it down the abstraction hierarchy till you finally end up with an actual, executable implementation in your target language.

But that’s not terribly agile.

If you know of any work bridging the very different worlds of “specs are great, we can lower them down into working software” and “stakeholder-driven organic growth is necessary for software to create value” – or how to iteratively, agilely develop a spec, with non-expert stakeholder validation – shoot me an email or reach out to me on App.net, where I’m @jws.

Refinement in VDM: This bit is a mess in the Wikipedia article. Here’s my understanding of it:

A mapping must exist from your new data representation to the old for all instances possible in the old representation. The mapping function is called the “retrieval” function, since it retrieves the original/old version of the state.

Once you have such a mapping, you must show that, for all your new data, and for all your new functions, the old invariants hold:
- Post-conditions from a valid starting point that hold in the new version should also hold with the retrieval of the input and output values in the old model.
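Here is how I picture that in code, using the earlier example of refining a set into an array with a "no duplicates" invariant. It is a sketch in current Swift syntax with made-up names (AbstractRoster, ConcreteRoster, retrieved), and it only spot-checks one sample run; VDM proper would demand a proof that the property holds for every reachable state.

```swift
// Abstract model: membership is just a set of IDs.
struct AbstractRoster: Equatable {
    var members: Set<Int> = []

    mutating func add(_ id: Int) {
        members.insert(id)
    }
}

// Refined model: an array carrying a "no duplicates" invariant.
struct ConcreteRoster {
    var members: [Int] = []

    var invariantHolds: Bool {
        return Set(members).count == members.count
    }

    mutating func add(_ id: Int) {
        if !members.contains(id) {
            members.append(id)
        }
    }

    // Retrieval function: recover the abstract state from the concrete one.
    var retrieved: AbstractRoster {
        return AbstractRoster(members: Set(members))
    }
}

// Spot check (not a proof): after every step, the invariant holds and
// retrieving the concrete state gives the abstract state.
var spec = AbstractRoster()
var impl = ConcreteRoster()
var refinementHeld = true
for id in [3, 1, 3, 2, 1] {
    spec.add(id)
    impl.add(id)
    refinementHeld = refinementHeld
        && impl.invariantHolds
        && impl.retrieved == spec
}
```

Model-checking and simulation amount to running this kind of check over many more inputs than one hand-picked sequence.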

Distributed Programming & CALM

Sun, 25 May 2014 00:00:00 +0000

Distributed programming doesn’t get much talk in those terms in Cocoaland.

If you’re writing an iPhone app with a server, guess what: You’re writing a distributed system. For your users’ sake, I hope it’s also an offlineable system.

And we can view a multithreaded program as distributed programming, only with the distribution being far more local. Ordering issues rear their head when you start pushing data in chunks through concurrent queues, and the notion of producer-consumer punctuations (see below, Consistency without Borders) is practically useful if for no other reason than, “oh yeah, you can hide that activity spinner now, no more search results for ‘z*’ are coming”.

The End of the API

Some recent delvings on my part started when I read an article by cemerick, “Distributed Systems and the End of the API”. He brought up CRDTs and CALM. I’d heard of CRDTs before (thanks Patrick!), but not CALM.

I looked up CALM and found a good summary on the Bloom lang page and its intro: http://www.bloom-lang.net/calm/

BLOOMing CALM

From there, I read a blog post introducing the idea of CALM. I found it most useful for its many links. I started with the keynote slides mentioned therein. Those didn’t make more than 80% sense till I read the companion paper. After that, they’d got my attention good, so then I read the CIDR 2010 paper for more background on Bloom.

Bloom is a language implemented in/over Ruby that boils down to the Dedalus flavor/extension of Datalog, which makes time explicit in each relation and avoids the mess you find in Prolog where there’s this execution algorithm outside the system you need to worry about and play with. The entire thing is declarative, but the real point is that it’s straightforward to visualize the dataflow and analyze a Bloom program for points where the code computes a non-monotonic result.

“Non-monotonic” means it might need to change its mind about the output as new results arrive, which means you need some degree of coordination to ensure that non-monotonic computation actually got all the data needed to render a final judgment. And coordination has costs, especially if it’s between datacenters, or with a non-responding peer whose hard-drive just ate it.
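A toy illustration of the difference, in current Swift syntax with invented data: as batches of results arrive, the answer to "have we seen a z?" can only ever move from false to true, while the answer to "are there no z results?" may have to be retracted.

```swift
// Batches of results arriving over time (made-up data).
let arrivals = [["a"], ["b", "c"], ["z"]]

var seen: Set<String> = []
var sawZ: [Bool] = []  // monotonic: once true, it stays true
var noZ: [Bool] = []   // non-monotonic: an early "true" can be withdrawn

for batch in arrivals {
    seen.formUnion(batch)
    sawZ.append(seen.contains("z"))
    noZ.append(!seen.contains("z"))
}
```

Anyone who acted on the early "no z" answer acted too soon. The only safe time to report it is after learning that no more batches are coming, and learning that is the coordination.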

Programs are made of memories, guesses, and apologies

From there I chased a reference to Building on Quicksand, which introduced the notion of programs being structured around:

memories (of what has happened),
guesses (about what might be true),
and apologies (for when those guesses turn out wrong).

This is even more obviously true in distributed programs, where you can’t keep every actor on the same page. It also points out that sometimes the right response is for a program to throw up its hands, email a human, and say, “Something is wrong. Figure it out, and apologize to user 1312347 for this weirdness.”

A Tale of N Consistencies

And then back to the Databeta blog, where I found “Consistency Without Borders” and accompanying paper.

This is a call for more research into assisting developers to grapple with consistency between two extremes. The first extreme is “let’s establish consistency only at the database layer in terms of reads and writes”, which is generally too conservative and expensive and too hard to safely and faithfully “compile” your program’s operations into. The second is, “let’s just handle all the consistency in our app”, which is also easy to get wrong, expensive, and not at all reusable.

Consistency without Borders looks at 3 different middle-grounds:

object-level consistency: think CRDT; build app around known-good objects, developed once and reused; can be hard, and can lead to you ending up with structuring the entire app as a single CRDT, which is not reusable; fails to capture properties of composition of objects
- Also, CRDTs can only converge to a deterministic state that’s invariant under duplication and reordering. No good if you have non-deterministic but well-defined behavior, like “purchase request returns OK if non-zero inv, FAIL otherwise”, which depends on order of message processing.
flow-level consistency: look at flow of data between modules, processes, and services; key ideas are confluence (insensitivity to message delivery order), which is convergence at the dataflow level basically, and a neat trick is “data sealing” via producer-consumer punctuations (“yes, you have seen all results as of now”)
- Confluence analysis has the same problem with handling non-deterministic behavior.
language-level consistency: encode all knowledge needed for dependency analysis and finding non-monotonicity; very convenient, but also requires completely changing how you write code
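For a feel of what "invariant under duplication and reordering" means at the object level, here is a minimal grow-only counter, one of the textbook CRDTs, sketched in current Swift syntax. GCounter and its replica names are my own illustration, not code from the paper.

```swift
// Grow-only counter: each replica counts its own increments,
// and merging takes the per-replica maximum.
struct GCounter: Equatable {
    var counts: [String: Int] = [:]

    mutating func increment(replica: String) {
        counts[replica, default: 0] += 1
    }

    mutating func merge(_ other: GCounter) {
        counts.merge(other.counts) { mine, theirs in max(mine, theirs) }
    }

    var value: Int {
        return counts.values.reduce(0, +)
    }
}

var replicaA = GCounter()
replicaA.increment(replica: "a")
replicaA.increment(replica: "a")

var replicaB = GCounter()
replicaB.increment(replica: "b")

// Deliver the same updates in a different order, with duplicates.
var inOrder = GCounter()
inOrder.merge(replicaA)
inOrder.merge(replicaB)

var shuffled = GCounter()
shuffled.merge(replicaB)
shuffled.merge(replicaA)
shuffled.merge(replicaA)
shuffled.merge(replicaB)
```

Both observers converge on the same state. The purchase-request example is the kind of thing this cannot model, because its result depends on which message was processed first.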

The paper also highlights the lack of data to assist in choosing between the many flavors of consistency. What’s the cost in making a trade-off? Can we afford more/less consistency in this case?

(I have yet to read their intriguing reference to the LADIS ’08 write-up “Towards a Cloud-Computing Research Agenda” about the extreme expense and danger of full consistency in an industrial context.)

Then I moved on to Peter Alvaro’s Blazes (slide deck). This is the flow-level analyzer mentioned in Consistency Without Borders.

Blazes looks for non-monotonic operations that aren’t protected by coordination based on annotations of code. This is a beginning towards assisting with debugging the issues encountered in distributed systems, vs. those you can readily debug with gdb or lldb. Once your code is all correct, there’s that small matter of, “Oh yeah, and that thing it’s correctly doing, is that semantically correct?”

Where It All Goes Wrong

But you still have to have correctly annotated everything you’re using. Good luck with the balls of closed-source mud you get to work with in GUI programs.

A major issue in organically grown software projects is even stating semantic properties, never mind ensuring they’re preserved across the application. We get tied up in matters like “did I just create a strong reference cycle” and futz about with that, do some refactoring, extract some things, whatever, and can continue in this vein for a good while till we have a serious mess in light of what the actual purpose of the application is. Leastwise, that’s what I seem to see too often. More on that later.

Use jsonlint to debug bogus JSON data

Wed, 02 Apr 2014 00:00:00 +0000

NSJSONSerialization delights in an opaque “lolnope character 12341234” error message that provides zero context. It doesn’t even bother to take advantage of line numbers to help you out, even if the JSON data has linebreaks. This is a royal pain, especially if you’re working with hand-written stub data for a web service.

I got fed up with this, went “there has to be a linter!”, and landed on jsonlint by Zach Carter.

Install it via npm install jsonlint -g, and you’ll be able to find and fix syntax errors in JSON far faster than you would puzzling over NSJSONSerialization’s error message.

Comparing Parser Error Messages

Sample Data

Bogus JSON:

{
    "something": "bogus",
    "this way": [{
        "comes"
    ],
    "don't you": "know",
    "ayup": "you do",
    "or you will": "soon"
}

NSJSONSerialization

The wonderfully useless NSJSONSerialization error:

% nush
Nu Shell.
% (NSJSONSerialization JSONObjectWithData:
    (NSData dataWithContentsOfFile:"bogus.json")
  options:0 error:(set e (NuReference new)))
()
% ((e value) description)
"Error Domain=NSCocoaErrorDomain Code=3840
\"The data couldn\u2019t be read because it isn\u2019t in the correct format.\"
(No value for key in object around character 67.)
UserInfo=0x7fbf2860e8b0 {NSDebugDescription=No value for key in object
around character 67.}"
% (exit)

Python

The equally useless Python json error:

% python
Python 2.7.6 (default, Dec 28 2013, 00:41:57)
[GCC 4.2.1 Compatible Apple LLVM 5.0 (clang-500.2.79)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import json
>>> json.load(file("bogus.json"))
Traceback (most recent call last):
    # 8< snipped 8<
ValueError: Expecting : delimiter: line 5 column 5 (char 67)
>>>

Hey, at least Python takes advantage of the fact the file has line numbers.
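If all you have is the bare character offset, you can at least recover the line and column yourself. Here is a sketch in Swift; lineAndColumn is a helper I made up, and it assumes the reported offset is 0-based and counts characters, which holds for ASCII data like this sample but may not for other input.

```swift
let bogusJSON = """
{
    "something": "bogus",
    "this way": [{
        "comes"
    ],
    "don't you": "know",
    "ayup": "you do",
    "or you will": "soon"
}
"""

// Convert a 0-based character offset into a 1-based (line, column) pair.
func lineAndColumn(ofOffset offset: Int, in text: String) -> (line: Int, column: Int) {
    var line = 1
    var column = 1
    // Walk every character before the offset, tracking line breaks.
    for character in text.prefix(offset) {
        if character == "\n" {
            line += 1
            column = 1
        } else {
            column += 1
        }
    }
    return (line, column)
}

let position = lineAndColumn(ofOffset: 67, in: bogusJSON)
```

For offset 67 this lands on line 5, column 5, matching what Python reports for the same file.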

jsonlint

The jsonlint output:

% jsonlint bogus.json
[Error: Parse error on line 4:
...        "comes"    ],    "don't you":
----------------------^
Expecting ':', got ']']

Now, isn’t that nice? No need to cross-reference to the file, since there’s context right there, and you know exactly what it’s expecting, rather than just some opaque “delimiter”.

(The Cocoa error in this example isn’t entirely useless, since it does call out that a key is missing its value. I’ve seen more useless ones where it faults the very end of the file for something that went wrong way earlier.)

Installing jsonlint

I’ve installed it locally using npm install jsonlint -g.

Apparently, you could also use http://jsonlint.com/, but:

I’m wary of posting potentially sensitive JSON data to a public web service
I like being able to work even without network access
command-line tools compose into pipelines far better than most websites

So, I naturally recommend you just install it locally.

OptionBits and BOOL gonna bite you one day

Fri, 14 Mar 2014 00:00:00 +0000

I got to talking with a coworker about some code that tested bitmasks:

BOOL isFoo = flags & FLAG_FOO;

Don’t do this; you are inviting pain, suffering, and head-scratching debugging.

I wrote about the wonderland of joy and kittens that is C arithmetic earlier, but only in the abstract. NSUInteger and BOOL provide concrete examples that hit where it hurts.

The Trouble

Apple now recommend you use NSUInteger for your option bits. But we keep holding on for dear life to BOOL, which is a signed char. That means our bitmasks and our booleans differ in both signedness and width.

Have a look-see: Given this seemingly innocuous arrangement:

typedef NS_OPTIONS(NSUInteger, Flags) {
    FLAG_A_BIT_TOO_BIG_FOR_BOOL = 0x100
};
NSUInteger flags = FLAG_A_BIT_TOO_BIG_FOR_BOOL;

This naïve flag test assigns zero:

/* DON'T DO THIS! */
BOOL is_flag_set = flags & FLAG_A_BIT_TOO_BIG_FOR_BOOL;

while this works just fine, and gives a non-zero result, as expected:

bool is_flag_set = flags & FLAG_A_BIT_TOO_BIG_FOR_BOOL;

Why `_Bool` Is So Swell

The reason is that assignments to _Bool (which bool expands to when you include <stdbool.h>) are effectively run through a double-bang, as if you’d written this:

bool is_flag_set = !!(flags & FLAG_A_BIT_TOO_BIG_FOR_BOOL);

Aside: Arm64’s `BOOL` is `_Bool`

Added after initial publication. Thanks, Mark!

Running iOS on arm64? Is today ever your lucky day!

Unlike every other Apple platform, arm64 iOS (and the 64-bit simulator) typedefs BOOL to be bool. You get sanity for free.

Just don’t forget to test on a non-arm64 platform if you plan to release to a non-arm64 platform, because it’s still the wild west out there.

Introducing Bang-Bang

The double-bang trick coerces the value to be either 0 or 1:

The first bang inverts its logical value, so if it was non-zero (true) it’s now 0 (false), and if it was zero (false), it’s now 1 (true).
The second bang reverses that, and NOT NOT TRUE is just TRUE, so we’re logically back where we started, only now with a tidy, known arithmetic value representing that logical value.

If you apply this trick, then the assignment to BOOL plays out as you’d hope:

BOOL is_flag_set = !!(flags & FLAG_A_BIT_TOO_BIG_FOR_BOOL);

You might also see this written like so:

BOOL is_flag_set = ((flags & FLAG_A_BIT_TOO_BIG_FOR_BOOL)
                    == FLAG_A_BIT_TOO_BIG_FOR_BOOL);

Since the result of the bit-and is either 0 or FLAG_A_BIT_TOO_BIG_FOR_BOOL, the == test results in either 0 or 1.
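
One caveat worth a sketch: the two spellings only agree for single-bit flags. With a multi-bit mask, `!!` asks whether any bit is set, while the `==` comparison asks whether all of them are. The `FLAG_READ`, `FLAG_WRITE`, and `FLAG_RW` names below are invented for illustration, as are the helper functions.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flags, not from the original post. */
enum {
    FLAG_READ  = 0x100,
    FLAG_WRITE = 0x200,
    FLAG_RW    = FLAG_READ | FLAG_WRITE
};

/* True when at least one bit of mask is set in flags. */
static bool any_set(unsigned long flags, unsigned long mask)
{
    return !!(flags & mask);
}

/* True only when every bit of mask is set in flags. */
static bool all_set(unsigned long flags, unsigned long mask)
{
    assert(mask != 0);  /* an empty mask would make this trivially true */
    return (flags & mask) == mask;
}
```

With only `FLAG_READ` set, `any_set(flags, FLAG_RW)` is true but `all_set(flags, FLAG_RW)` is false, so pick the spelling that matches the question you mean to ask.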

There’s Always a Moral

The moral is:

Use bool or use !!, and beware the wicked type conversions.

Get the Gist

You’ll find some ready-to-compile, comment-full sample code demonstrating these issues over in this gist.

*Note: While BOOL might be signed char, YES and NO themselves are a bit more than just that now, to support integer literals. That’s neither here nor there.*

EDIT: Fixed 0 for 1 typo graciously pointed out by Mike Cohen.

TestFlight TOS & Privacy Changes

Fri, 21 Feb 2014 00:00:00 +0000

TestFlight changed their terms of service and privacy policy on 18 Feb 2014.

As is depressingly routine in this scenario, they conveniently neglected to provide the changes in a readily reviewable format. You just get a link to the current policy; no past policy, no diff.

Thanks to Archive.org and opendiff, I was able to review the changes side-by-side.

Terms of Service

Aside from some wording changes, the big changes are:

New section with “we can shut this thing down at any time, suckers”.
Shift in court venue from LA to Northern District of California.
Removal of arbitration option, previously offered for amounts less than $10,000.

Oh, and FlightPath goes bye-bye.

Privacy Policy

Privacy changes are innocuous. Mostly rewording of existing stuff.

Merged contact URL from separate ones for testflight and flightpath to just legal@burstly.com.
Addressed Do Not Track, which they ignore, because they don’t have any 3rd-party trackers on their page.

Though they did manage to work “national security” in there, just because it’s the legal buzzword of the year.

Installing and using Euterpea under OS X 10.9

Sun, 09 Feb 2014 00:00:00 +0000

The BNR Book Club (JOIN US!) is working through Hudak’s The Haskell School of Music: From Signals to Symphonies.

The tricky part of working with pre-release books and unreleased software is getting both to run on your system.

The site itself warns you of the difficulties of using Euterpea under OS X, but if you’re pig-stubborn and Mac-happy like me, you might benefit from my setup instructions.

The Scene

I figured Chapter 2 would start by explaining how to set up Euterpea so you can actually run the code we’re encountering, but no dice.

There are some lengthy installation instructions available from the author that attempt to cover Windows, Linux, and OS X.

The short of it is, Windows is very well supported, Linux so-so, and Mac, you’re on your own.

Anecdotal reports indicate that if you have a 32-bit version of GHC installed, things will pretty much just work on your Mac. Otherwise, you’ll hit problems.

I have the 64-bit version and was able to get things mostly working, to the point that I can write notes to midi files on disk. The trick is knowing what doesn’t work so you avoid it.

Installing

Adjust GHC’s C Compiler Command

This patch-up is necessary as of the 2013.2.0.0 Haskell Platform release. It likely won’t be necessary after the next release.

Before you do anything else, if you’re on Mavericks, follow the instructions under “Xcode 5 & OS X 10.9 (Mavericks)” from the Haskell Platform Mac page. You will be adjusting the C compiler setting for GHC to go through a shim that fixes up some differences between how gcc and clang do things.

Clone and Cabal Install Euterpea

Then you can get on to Euterpea. Follow the basic checkout and build instructions:

git clone https://github.com/Euterpea/Euterpea
cd Euterpea
cabal update
cabal install

Gotchas

Once you have it, here’s what not to do:

Don’t try to pull the top-level Euterpea in from GHC or GHCI. You’ll bomb out linking in GLFW due to symbol relocation issues.
- Instead, pull in the specific submodules you need, like Euterpea.Music.Note.Music or whatever.
play will not work. The FFI call into PortAudio hits some enum range issue I haven’t spent time debugging. test, which writes a midi file test.mid, does work.
I think even there, I would sometimes run into issues with ghci (or when using runghc) that didn’t arise when just compiling and then running a program. It stumbles into the GLFW issue, even though you’re intentionally trying to avoid it.

Verifying Your Install

Here is what you should do:

Write a demo.hs program that calls the test function, compile it with ghc demo.hs, then run it as ./demo to dump your midi file. Once that’s done, you can play the midi file just fine.

How should you play the midi file? I just used timidity, per the install document. I cut out almost all the optional libraries to quicken build time, so my brew install line went like this:

brew install timidity --without-libogg --without-speex \
  --without-flac --without-libvorbis

I’ve attached a demo program you can use to check you have this all working. In the end, you should be able to do this and hear an F# Major chord:

ghc demo.hs && ./demo && timidity test.mid

Enjoy!

Now you should be able to hobble your way through the rest of the text. I hope. I’m still working my way through, as well.

This post first appeared as a somewhat less-structured post to the BNR Book Club Google Group.

Self-initiating into Smalltalk

Sun, 19 Jan 2014 00:00:00 +0000

If you’re an Obj-C dev, you should consider picking up some Smalltalk.

See, if Obj-C is Romanian – that quirky little Romance language – then Smalltalk is Latin. Sometimes it’s good to know your roots.

Based on my far less linear path, I recommend you:

First, read a few chapters of Squeak by Example.
Then, work through the first couple sections of Stephan B. Wessels’ Laser Game example.

Then, you should pitch in on Graham Lee’s ClassBrowser project, which aims to bring the Smalltalk style of immediate development to Obj-C.

Clarified CQRS - Reading Notes

Fri, 20 Dec 2013 00:00:00 +0000

On 19 Dec 2013, I read the article Clarified CQRS published by Udi Dahan on 9 Dec 2009, so four years ago.

(This reading was for the BNR Book Club: It’s open to all, and you should join the group!)

In it, Dahan elaborates their interpretation of CQRS.

Dahan’s new ideas:

Data inevitably stales: Exploit this instead of fighting it.
Each command can, and should, be processed autonomously from the others.
DRY to the max by jettisoning code and data store complexity wherever the command–query system allows it, which is far more than you might think at first blush.

Section-by-section notes follow.

Two appendices include notes from additional articles by other authors on CQRS, to provide context to the discussion:

Martin Fowler provides a cogent summary and links to references.
Greg Young originated CQRS and distills it to its essence in a single example.

Clarified CQRS

Why CQRS

Driven by:
- Collaboration: Mutable state shared by >1 actor
- Staleness: Data read by an actor can be invalidated by a subsequent write by another actor.
  - Exacerbated by caching.
  - Leads users inevitably to act based on obsolete data.

Queries

Data is going to be stale. Give in and skip DB hits.
Cache data in view-model format to avoid unnecessary marshaling.
- No need for the cache to be a RDBMS - ViewModels don’t require any joins, they’re already denormalized.
Views hit the cache rather than the DB for their display data.

Scaling

Add multiple caches.
Don’t worry about keeping them in sync across each other. Users just encounter different vintages of stale data in different caches.

Data Modification

Optimistic concurrency conflicts.
Validation is a pain.
End up rejecting a whole chunk of modifications because 1 is off, then users must redo their work based on the new data.
More users, bigger entities => more frequent and annoying conflicts.
Solved by commands:

If only there was some way for our users to provide us with the right level of granularity and intent when modifying data. That’s what commands are all about.

Commands

“Using an Excel-like UI for data changes doesn’t capture intent, as we saw above.”
- Submit commands instead of a “write these fields” request. (Task-based UI)
- Capture intention - can even process asynchronously, report progress and failures in UI, let user investigate why failed.
“Note that the client sends commands to the server – it doesn’t publish them. Publishing is reserved for events which state a fact – that something has happened, and that the publisher has no concern about what receivers of that event do with it.”

Commands & Validation

Validation is different from business rules in that it states a context-independent fact about a command. Either a command is valid, or it isn’t. Business rules on the other hand are context dependent.

Returns to the example of the delinquency update arriving before the preferred status application, causing the latter to be rejected; reverse the order, and we would have accepted both changes.

This is basically just pointing out that “a valid command is one that has all necessary, valid data” rather than “a valid command is one that won’t be rejected”.

Rethinking UIs and commands

Can use query store to speed up updates - autocomplete from query store, update sends the ID we already have for the selected value rather than text. Again, less marshaling.

Reasons valid commands fail

The delinquent vs. preferred race is just bad design. Should have same business outcome regardless of which arrives first.

Outcome: Notify the user (email).

No rejection errors are ever returned to the agent submitting changes. They can do nothing but notify the user, anyway.

No need even to show pending commands: Instead, notify users as needed asynchronously out of band.

Commands and Autonomy

Command processor should be autonomous.
Queue commands for processing, process them at leisure, rollback and retry as needed (DB down frex).
Serving commands and queries from separate stores prevents cache thrashing.

Autonomous Components

Acronym: AC = Autonomous Component

Command processor is an AC with its own queue.

Can go even further than that: Can have each command processed by its own AC.

This lets you get detailed queue and processing time metrics, and can scale up ACs on a per-command basis.

Service Layers

Per-command AC means each processor is independent. This is a stark contrast to the rat’s nest at each layer of many layered architectures.

Domain Model

Domain model is no longer used to service queries.

Not really necessary for commands either.

Scarcely need relationships – just precompute (denormalize) for queries, and have commands sent with needed IDs.

Persistence for Command Processing

No need for fancy DB queries.

Commands come in with IDs anyway.

So ORM not strictly necessary; can do key-value, optionally splitting out properties that benefit from uniqueness constraint into their own columns.

Key point here: “How you process the commands is an implementation detail of CQRS.”

Keeping the Query Store in Sync

Apply command and broadcast event in transaction.
Per-command events - DoBlah broadcasts DidBlah on success.
AC does Event -> Query Store (cache) updates.
- Can readily do one AC per ViewModel (aka table).

Bounded Contexts

“CQRS if used is employed within a bounded context (DDD) or a business component (SOA) – a cohesive piece of the problem domain. The events published by one BC are subscribed to by other BCs, each updating their query and command data stores as needed.”

Mash-up into a single UI as needed.

Summary

CQRS is about coming up with an appropriate architecture for multi-user collaborative applications. It explicitly takes into account factors like data staleness and volatility and exploits those characteristics for creating simpler and more scalable constructs.

One cannot truly enjoy the benefits of CQRS without considering the user-interface, making it capture user intent explicitly. When taking into account client-side validation, command structures may be somewhat adjusted. Thinking through the order in which commands and events are processed can lead to notification patterns which make returning errors unnecessary.

Appendix A: Martin Fowler on CQRS

Never expanded anywhere in the article is the acronym “CQRS”:

CQRS stands for Command Query Responsibility Segregation. It’s a pattern that I first heard described by Greg Young. At its heart is a simple notion that you can use a different model to update information than the model you use to read information. This simple notion leads to some profound consequences for the design of information systems.

[…]

The change that CQRS introduces is to split that conceptual model [integrating various views of the underlying data] into separate models for update and display, which it refers to as Command and Query respectively following the vocabulary of CommandQuerySeparation. The rationale is that for many problems, particularly in more complicated domains, having the same conceptual model for commands and queries leads to a more complex model that does neither well. (Martin Fowler)

Fowler introduces more terms, like Reporting Database and Eager Read Derivation, which can be used independently of CQRS but feature in it as well.

Points out that where CRUD fits, you should likely use it. CQRS should also be deployed on a per-“bounded context” basis - it’s effectively a domain modeling decision.

Not clear that commands and queries are often really separate enough that it’s worth having two entirely separate models.

CQRS is nice for high-load apps – you can scale reads and writes independently. But still can handle this in CRUD by splitting out the really high reads into a ReportingDatabase used to serve just those queries.

Appendix B: Greg Young on CQRS

Greg Young originated CQRS per Fowler.

Fowler links to this summary by Greg Young:

Split CustomerService into CustomerReadService and CustomerWriteService. Boom: CQRS.
“[No biggie, eh? But!] This separation however enables us to do many interesting things architecturally, the largest is that it forces a break of the mental retardation that because the two use the same data they should also use the same data model.”
“There is however one thing that does really require a task based UI… That is Domain Driven Design.”
“The Application Service Layer in Domain Driven Design represents the tasks the system can perform. It does not just copy data to domain objects and save them… It should be dealing with behaviors on the objects”
- Don’t use DDD for areas where CRUD really is the “ubiquitous language”.
Conclusion:

Going through all of these we can see that CQRS itself is actually a fairly trivial pattern. What is interesting around CQRS is not CQRS itself but the architectural properties in the integration of the two services. In other words the interesting stuff is not really the CQRS pattern itself but in the architectural decisions that can be made around it. Don’t get me wrong there are a lot of interesting decisions that can be made around a system that has had CQRS applied … just don’t confuse all of those architectural decisions with CQRS itself.

Arithmetic Will Bite You One Day

Thu, 26 Sep 2013 00:00:00 +0000

Int: A Young Love

Early C has an innocent air. Take for example this bounds-checking function, which converts a file descriptor to a pointer, after checking that the file descriptor is a valid index:

getf(f)  /* Unix 6th edition: unix/fio.c:6619 */
{
    register *fp, rf;

    rf = f;
    if(rf<0 || rf>=NOFILE)
            goto bad;
    fp = u.u_ofile[rf];
    if(fp != NULL)
            return(fp);
bad:
    u.u_error = EBADF;
    return(NULL);
}

Want a value in a register? Use register. It does what it says on the tin. (At least it did then.)

Types? Those are a syntactic convenience. Go with the flow and use the native word size: the default type is int, and that’s nearly all you’ll need. It’s the (default, so not explicitly declared) type of both fp and rf in this function. And what do you think the function’s return type is, eh?

The Love That Wouldn’t Die

Modern C shows its continuing love for int in subtle ways that will one day corrupt your code.

It’s subtle, because most of the time, C arithmetic just works, to the point where you can remain unaware of what’s actually going on when you perform some innocent-looking arithmetic.

What’s (0xCAFF << 24)? Let’s see:

uint64_t mask = (0xCAFF << 24);
uint64_t expected = 0xCAFF000000;
printf("%" PRIx64 " == %" PRIx64 "? %d\n", mask, expected, mask == expected);
/* ffffffffff000000 == caff000000? 0 */

Well, that ain’t right.

And if you’ve got warnings turned on, your compiler might even be so kind as to warn you that you’re doing something boneheaded:

mask.c:7:25: warning: signed shift result (0xCAFF000000) requires 41 bits to
represent, but 'int' only has 32 bits [-Wshift-overflow]
    uint64_t mask = (0xCAFF << 24);
                     ~~~~~~ ^  ~~
1 warning generated.

Int, what int? I ordered a uint64_t, thank you muchly.

But “‘int’ only has 32 bits”. And the compiler ran out of bits. And then, once it got done shifting the int around, it widened it to take up a full 64 bits, and the sign bit came with it:

We wanted
```
0000_00ca_ff00_0000
```
But our int only had room for
```
ff00_0000
```
Which widens to the uint64_t
```
ffff_ffff_ff00_0000
```
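
The post stops at the diagnosis, so here is a sketch of one common fix: make the left operand 64 bits wide before shifting, so the shift never happens in `int`. `UINT64_C` comes from `<stdint.h>`; the function names are mine, chosen only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Widen first, then shift: the arithmetic is done in a 64-bit type. */
static uint64_t make_mask(void)
{
    return UINT64_C(0xCAFF) << 24;
}

/* Same idea for a runtime value: cast before shifting, not after. */
static uint64_t shift_into_place(uint16_t value, unsigned shift)
{
    assert(shift < 64);  /* shifting by the full width or more is undefined */
    return (uint64_t)value << shift;
}
```

Casting the result instead, as in `(uint64_t)(value << shift)`, would be too late: the damage is already done by the time the cast runs.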

Integer Promotions and Arithmetic Conversions

So you see, int is still very much the preferred type for integral literals and arithmetic.

This preference is embedded in the core rules underlying C arithmetic:

The integer promotions, which describe how integral types smaller than int get promoted to int or unsigned int
The usual arithmetic conversions, which describe how to pick a common type for the arguments to an arithmetic operation, and what the final result type should be.

Both rules make extensive use of the integer conversion rank of the various integral types, which we can roughly summarize as:

Bigger integral types have a higher rank.
Unsigned and signed types of the same size have the same rank.
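
A few concrete cases may make these rules less abstract. This is only a sketch; the function names are invented, and the asserts in `check_conversions` spell out what each expression evaluates to.

```c
#include <assert.h>
#include <stdint.h>

/* Integer promotion: both uint8_t operands become int before the add,
 * so the sum is computed in int and does not wrap at 255. */
static int sum_of_bytes(uint8_t a, uint8_t b)
{
    return a + b;
}

/* Converting that int result back to uint8_t reduces it modulo 256. */
static uint8_t truncated_sum(uint8_t a, uint8_t b)
{
    return (uint8_t)(a + b);
}

/* Usual arithmetic conversions: int and unsigned int share a rank, so the
 * signed operand is converted to unsigned before the comparison, and a
 * negative value turns into a huge positive one. */
static int signed_less_than_unsigned(int s, unsigned int u)
{
    return s < u;
}

/* Spells out what the conversions above produce. */
static void check_conversions(void)
{
    assert(sum_of_bytes(200, 100) == 300);          /* no wrap at 255 */
    assert(truncated_sum(200, 100) == 44);          /* 300 mod 256 */
    assert(signed_less_than_unsigned(-1, 1) == 0);  /* -1 became UINT_MAX */
}
```

The last case is the one compilers flag with a sign-compare warning, which is one more reason to leave warnings on.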

Put all this together, and you can draw up a big spreadsheet of what value converts how with what other type of value, and how the arithmetic operation’s result type gets picked.

And that’s not even to speak of the fun we can have with the limited precision provided by a fixed-width integral type, namely overflow (INT_MAX + 1) and underflow (INT_MIN - 1).

What do you do?

Crank up compiler warnings (-Weverything -Wextra -Werror)
Be suspicious of arithmetic:
- Are the types right, even after promotion?
- Can this overflow?
- Can this underflow?
Be doubly suspicious of external data, whether from files, the network, or even your own web service.
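
For the “can this overflow?” question, one portable approach is to test against the limits before doing the arithmetic, since the overflowing addition itself is undefined for signed types. A minimal sketch follows; `checked_add` is a hypothetical helper, not a standard function.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Stores a + b in *out and returns true, or returns false (leaving *out
 * untouched) when the sum would fall outside the range of int. */
static bool checked_add(int a, int b, int *out)
{
    assert(out != NULL);
    if (b > 0 && a > INT_MAX - b) return false;  /* would overflow */
    if (b < 0 && a < INT_MIN - b) return false;  /* would underflow */
    *out = a + b;
    return true;
}
```

GCC and Clang also offer `__builtin_add_overflow` as a compiler extension, which does the same job without the hand-written range checks.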

Why a 64-bit iPhone?

Thu, 12 Sep 2013 00:00:00 +0000

Apple announced a couple days ago an iPhone with a 64-bit processor.

Some people worried because their code was unclean. Those people should read Compatibility Basics and follow it up with a strong draught of Apple’s 64-Bit Transition Guide from the last time Apple transitioned a platform from 32- to 64-bit.

Some people wondered why bother. “It’s just about the RAM, innit? Who’s gonna put that much blasted RAM in a mobile device?”

Well, someone will, and sooner rather than later. But regardless of that, there are other reasons to go 64-bit.

We hear and remember “we’re going 64-bit”. But it’s not just a word width change: it’s a whole shift in processor, ABI, and, in general, a chance to make a clean break from the past.

A prime motivation behind moving from x86 to x86_64 was how tremendously register-starved x86 was.

ARM isn’t exactly register-starved, but it’s got some rough edges. Nowhere near as rough as the Intel lineage, but rough nonetheless.

Transitioning to 64-bit gives ARM that same opportunity. Only they weren’t forced into it so soon, so they got to watch how everyone else tried (and failed, and retried) to do it.

I’m no great guru of ARM, so I will instead direct you to the August 2012 write-up by Mr. David Kanter over at Real World Technologies, ARM Goes 64-bit for details.

The key bits for me are:

More registers.
Bigger registers.
Yup, even the vector registers get bigger and more numerous.
Simplifications that make it easier for those of us coming from Intel-land:
- The end of conditional suffixes on all the instructions.
- Load/store multiple replaced by load/store pair.
- An exception-level system that more closely resembles the ring-system and replaces the far more complicated eight (!) processor modes of ARMv7.
A memory model and corresponding instructions that jibe well with the memory model and atomic operation support in the most recent revisions of C++ and C.
And yes, we can now directly address more RAM with a simple index. Nothing like PAE or manual DOSseriffic segment-indirection needed.

Though it’s less RAM than you might naïvely think:

Currently, AArch64 features two 48-bit virtual address spaces, one for the kernel and one for applications. Application addressing starts at 0 and grows upwards, while kernel space grows down from 2^64; any references to unmapped addresses in between will trigger a fault. Pointers are sign extended to 64-bits, and can optionally be configured to use the upper 8-bits for tagging pointers with additional information. (“Virtual Address Space”)
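
The sign-extension rule in that quote can be made concrete. Below is a sketch assuming exactly the 48-bit layout described and ignoring the optional top-byte tagging; `sign_extend48` and `is_canonical48` are made-up names, not anything from ARM's documentation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Sign-extends a 48-bit address to 64 bits by copying bit 47 upward. */
static uint64_t sign_extend48(uint64_t low48)
{
    assert((low48 >> 48) == 0);  /* caller passes only the low 48 bits */
    if (low48 & (UINT64_C(1) << 47))
        return low48 | UINT64_C(0xFFFF000000000000);
    return low48;
}

/* A pointer is well-formed under this scheme when bits 63..48 all
 * equal bit 47, i.e. bits 63..47 are all zeros or all ones. */
static bool is_canonical48(uint64_t addr)
{
    uint64_t top = addr >> 47;          /* bits 63..47, 17 bits */
    return top == 0 || top == 0x1FFFF;
}
```

Addresses with bit 47 clear land in the low (application) range, addresses with it set land in the high (kernel) range, and everything in between fails the check.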

(I’d point you at the horse’s mouth, but as of this time, almost all the ARMv8 docs are only available for those cosy with ARM, not us plebs. An exception is the ARMv8 64-bit architecture overview, which provides a concise one-pager of a description of the changes from ARMv7.)

Compatibility Basics

Mon, 26 Aug 2013 00:00:00 +0000

Supporting multiple platforms – “platform” being the ramshackle combination of languages, libraries, OS, hardware, etc., that your code depends on to run – is a burden.

But you’ve no choice once you push data out of your application and off into the wide wide world. Any data you write to disk or network is potentially going to end up being read back on another platform. It could be something wildly different – data sent from an iPhone 5 to a little-endian MIPS machine on someone’s desktop halfway across the world – or it could be something less drastic, like an old file written to disk by version 1 of your application and then read back several OS and application upgrades later.

There are techniques you can adopt to mitigate problems, but many applications will founder on far simpler issues well before they reach that level of sophistication.

You likely take for granted:

How large an integer is.
What happens when you add 1 to the largest representable signed integer on your platform.
Whether the bytes in a word are stored big-end or little-end first.
That C strings are encoded using UTF-8.
Struct padding is always the same, right?

But all these assumptions are a trap:

Integer size varies from machine to machine.

A user ditches their 32-bit machine and restores all their data to a 64-bit machine. Can your app still load its saved files from before the migration?

OS X developers of some years past are well-acquainted with this problem, and Apple wrote the 64-Bit Transition Guide to aid them – and now you – in coping with this issue.
Signed integer overflow is undefined behavior in C. Unchecked overflow also often presents a security risk. When a C program is compiled for an architecture using two’s-complement representation for integers (read: pretty much everything you’ll likely ever compile for, but not everything someone else might want to compile your code for), often integers will wrap around from their largest to their smallest representable value on overflow.

But not always, and sometimes the behavior changes as you change compiler optimizations, so don’t count on it!
Endianness bites most people when they first do networking.

They soon find ntohs/ntohl and htons/htonl, which are used to convert port numbers (s, short, 16-bit) and IPv4 addresses in numeric (l, long, 32-bit) format between net (i.e., big) and host-endian representations.

There’s no great magic hidden in these functions, except that they bake in knowledge of whether the host platform stores bytes in network byte order natively or requires byte swapping (or more complicated hijinks) to convert to and from network byte order.

A simple byte-swap requires no great magic:
- Initialize an output value to all zeroes.
- Repeatedly:
  - Shift the byte you’re working on now to be rightmost.
  - Mask it off using a bit-and.
  - Shift it into the correct (mirrored about the middle) position.
  - Bit-or it into the output value.

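#include <limits.h>  /* CHAR_BIT */
#include <stddef.h>  /* size_t */
#include <stdint.h>  /* uint16_t */

/** Swaps bytes, and so swaps big/little endianness.
 *  (Hope your platform doesn't use packed binary-coded decimal!) */
uint16_t byteswap16(uint16_t in)
{
    uint16_t out = 0;
    for (size_t i = 0, e = sizeof(in); i < e; ++i) {
        /* Bring byte i down to the rightmost position, then isolate it. */
        uint16_t shifted_right = (in >> (i * CHAR_BIT));
        uint16_t byte_i = shifted_right & 0xFF;
        /* Move it to the mirrored position and merge it into the result. */
        uint16_t mirrored = (byte_i << ((e - (i + 1)) * CHAR_BIT));
        out |= mirrored;
    }
    return out;
}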

But endianness is also an issue for binary data files, which includes many applications’ document formats. If you just spit raw bytes to disk, in whatever order you find them in-memory, then you’ll run into trouble when an app in a different endian environment slurps that file in.

C strings can use whatever encoding. If you’re doing data interchange, you need to make sure the encoding is specified, and take the appropriate steps to convert strings to and fro.
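
For the binary-file case above, one way to sidestep host endianness entirely is to never write multi-byte integers as raw memory: pick a byte order for the format and assemble values byte by byte with shifts. The sketch below picks big-endian arbitrarily, and `put_be32` and `get_be32` are hypothetical helper names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Writes value into buf[0..3], most significant byte first,
 * regardless of how the host stores integers in memory. */
static void put_be32(uint8_t *buf, uint32_t value)
{
    assert(buf != NULL);
    buf[0] = (uint8_t)(value >> 24);
    buf[1] = (uint8_t)(value >> 16);
    buf[2] = (uint8_t)(value >> 8);
    buf[3] = (uint8_t)value;
}

/* Reads the four bytes back into a host-order integer. */
static uint32_t get_be32(const uint8_t *buf)
{
    assert(buf != NULL);
    return ((uint32_t)buf[0] << 24) | ((uint32_t)buf[1] << 16)
         | ((uint32_t)buf[2] << 8)  |  (uint32_t)buf[3];
}
```

Because the shifts operate on values rather than on memory layout, the same code produces the same bytes on big- and little-endian hosts, and the fixed-width types keep the field size from drifting between 32- and 64-bit builds.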

All of these traps are straightforward to avoid once you’re aware of them, but are a royal pain to redress after you’ve built a mountain of code atop platform-naïve foundations.

Unless you’ve got these right, don’t even bother worrying about the more obvious (and #ifdef-multiplying) differences between platforms.

Retain Still Matters

Wed, 07 Aug 2013 00:00:00 +0000

I recently received a question about this particular bit of code:

//cc -Wall -Wextra -Weverything -g -framework Foundation -fobjc-arc \
//nsstring-weak.m -o nsstring-weak
#import <Foundation/Foundation.h>

int
main(void)
{
/* nsstring-weak.m:7:22:
 * warning: assigning retained object to weak variable; object will be released
 * after assignment [-Warc-unsafe-retained-assign] */
    NSString *__weak stringInit = [[NSString alloc] initWithFormat:@"Lewis"];
    NSLog(@"stringInit: %@", stringInit);  // stringInit logs as (null)

    NSString *__weak stringLiteral = @"String";
    NSLog(@"stringLiteral: %@", stringLiteral);

    NSString *__weak stringFormat = [NSString stringWithFormat:@"SomeString"];
    NSLog(@"stringFormat: %@", stringFormat);
}

As the comment points out, clang warns about the first assignment, the one to stringInit, but not to any of the others.

You might read this as, “Assigning to weak blah blah blah released after assignment.” OK, well, now that you point it out clang, duh. I’ll fix that.

But then you look at the other two assignments: Why is clang OK with these?

Here’s the warning again:

nsstring-weak.m:7:22: warning: assigning retained object to weak variable; object will be released after assignment [-Warc-unsafe-retained-assign]

It turns out the real keyword in that message is the one you very well might have glossed over the first time: retained.

That’s the difference between the three object references:

stringInit refers to an object returned by alloc–init. This was returned retained, and will be lovingly put to death ASAP by ARC.
stringLiteral refers to a string literal. This will outlast even the run of the program; it’s chilling in static memory.
stringFormat's referent was returned by a message parallel to that of stringInit's, with one difference: the return value is autoreleased, not retained.

So you can still run into differences between objects returned from an alloc–init pair and those returned autoreleased under ARC, even in a bit of wotsit-wotsit code as minimal as this.

If you’d like to learn more, the nearly definitive reference on ARC lives in the Clang docs.

The actually definitive reference, of course, is clang’s source code coupled with the Obj-C runtime code.

MissingM: Ansible and Salt: A detailed comparison

Tue, 06 Aug 2013 00:00:00 +0000

http://missingm.co/2013/06/ansible-and-salt-a-detailed-comparison/

If you haven’t heard of them before, Ansible and Salt are frameworks that let you automate various system tasks. The biggest advantage that they have relative to other solutions like Chef and Puppet is that they are capable of handling not only the initial setup and provisioning of a server, but also application deployment, and command execution. This means you don’t need to augment them with other tools like Capistrano, Fabric or Func.

[…]

As an experiment, I decided to write a collection of Ansible Roles and Salt States to perform the same set of tasks and configure a brand new Ubuntu 12.04.2 LTS server [to run a Sinatra webapp under Nginx + Passenger with a Falcon-patched Ruby].

After my recent hosting adventures, I decided to investigate configuration management, and picked up O’Reilly’s Test-Driven Infrastructure with Chef.

I haven’t gotten to the good parts, but let me just gently caution that you should not try to work the exercises under OS X, and that running them in a VM is also advisable.

I’ve already seen enough, though, to hazard a guess that Ansible might be more my speed than Chef.

The post above by Joshua Lund provides a good flavor of Ansible and its spirit-brother, Salt. (This solution-space is getting pretty crowded, eh?)

Go Reflection Codex

Mon, 05 Aug 2013 00:00:00 +0000

http://jimmyfrasche.github.io/go-reflection-codex/

Each article contains some snippets of Go code followed by the equivalent code using the reflect package.

If you were foggy about how to work with reflection in Go, this should provide more than enough examples and tips.

Share and enjoy!

/var/log/bhs: golang impressions

Mon, 05 Aug 2013 00:00:00 +0000

http://blog.bensigelman.org/post/56158760736/golang-impressions

At this point, Go became a viable contender. However, when I was doing my initial research about it, I had trouble finding high-level impressions from recent converts that (a) sounded like they were written by an author with substantial experiences building systems in other languages, and (b) didn’t seem zealous. This post is my attempt to provide the sort of overview I would have valued reading before I started my own investigation into the language.

Surfing with ctags

Thu, 25 Jul 2013 00:00:00 +0000

Big code, little time, and you’ve work to do.

Where’s that function at? What’s in that struct?

There’s an app for that: ctags.

In Brief

The short version:

Install exuberant ctags.
ctags -R . in the project’s top-level directory to generate the tags file.
vim $file from within the same directory as your tags file.
- Alternatively: vim -t $tagname to jump straight to that tag.
CTRL-] while cursor in a tag (roughly fn/var/other name) pushes your current location then jumps to definition of the tag you were on.
CTRL-t pops back up the tag stack, so you can continue where you left off.
:help tags for more, but that’s about all you need.

(The astute vimmer will now note that the ^]/^t pair is the same one you use to jump around in vim help files. Handy, that.)

If you forget this, just man ctags, /HOW TO USE, and read that very brief section.

Ctags!

You don’t want just any ctags. You want the exuberant kind. Don’t ask why, just trust me: brew install ctags and get yourself some exuberant ctags.

Caramel Apple Tarball

To demonstrate, nab yourself delicious Apple-flavored libc. This particular variety is the one included with 10.8.3. tar xzf Libc-825.26.tar.gz, cd Libc-825.26 into the resulting directory, and wow that’s a good chunk of stuff.

Surfing Ctags

Now it’s time for ctags. First, we generate the tags file by walking the entire directory hierarchy we’re sitting at the root of. Don’t worry, it’s not so bad as all that, just a -Recurse flag and a . directory to kick things off:

Libc-825.26% ctags -R .
Libc-825.26% wc -l tags 
   12832 tags

Thassa lotta tags.
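Each of those lines is a tab-separated record: tag name, file, and an ex command that locates the definition, optionally followed by ;" and some extension fields. Lines starting with !_TAG_ are metadata about the tags file itself. Here is a minimal Python sketch of picking one line apart. The parse_tag_line helper is hypothetical, the sample path is invented rather than copied from the real Libc tags file, and the sketch assumes the search pattern itself contains no ;" sequence.

```python
def parse_tag_line(line):
    """Split one tags-file line into (name, file, address, extension fields)."""
    if line.startswith("!_TAG_"):
        return None  # pseudo-tag: metadata about the tags file itself
    name, path, rest = line.rstrip("\n").split("\t", 2)
    # The ex command that locates the tag ends at ;" and extension fields follow.
    address, _, fields = rest.partition(';"')
    return name, path, address, [f for f in fields.split("\t") if f]


# Invented sample line; the path is an assumption, not the real Libc layout.
sample = ('pthread_cond_signal\tpthreads/pthread_cond.c\t'
          '/^pthread_cond_signal(pthread_cond_t *cond)$/;"\tf')
print(parse_tag_line(sample))
```

The address is usually a search pattern rather than a line number, which is why a tags file keeps working after small edits to the source.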

Let’s jump into the thick of it. Fire up vim at the definition of pthread_cond_signal:

Libc-825.26% vi -t pthread_cond_signal

We find ourselves with cursor blinking right on the p at the start of pthread_cond_signal:

/*
 * Signal a condition variable, waking only one thread.
 */
int
pthread_cond_signal(pthread_cond_t *cond)
{
	return pthread_cond_signal_thread_np(cond, NULL);
}

This is where things would start to suck if we didn’t have ctags. The entire function is just a call to yet another function. The _np non-portable suffix makes me think maybe it will be somewhere else. Maybe they exiled all the non-portable functions to a different file? Who knows. Even better: we don’t care.

/_np, RET, and hit CTRL-] to jump right to that function:

/*
 * Signal a condition variable, waking a specified thread.
 */

int
pthread_cond_signal_thread_np(pthread_cond_t *ocond, pthread_t thread)
{
	npthread_cond_t * cond = (npthread_cond_t *)ocond;
	int sig = cond->sig;

Ooh, fun, we transition from an old-cond to a new-style struct. (Maybe that _np suffix was new pthreads this time.) What’s the difference between old and new? Let’s take a look-see and find out.

/np, n, RET, CTRL-]. Looks like the definition of npthread_cond_t lives right below the original pthread_cond_t struct definition, so comparing the two is easy. The new version looks to have swapped out most of the old guts:

/*
 * Condition variables
 */
#define _PTHREAD_COND_T
typedef struct _pthread_cond
{
	long	       sig;	     /* Unique signature for this structure */
	pthread_lock_t lock;	     /* Used for internal mutex on structure */
	uint32_t	waiters:15,	/* Number of threads waiting */
		   sigspending:15,	/* Number of outstanding signals */
			pshared:2;
	struct _pthread_cond *next, *prev;  /* List of condition variables using mutex */
	struct _pthread_mutex *busy; /* mutex associated with variable */
	semaphore_t    sem;	     /* Kernel semaphore */
} pthread_cond_t;

for free space and a different approach to getting things done:

typedef struct _npthread_cond
{
	long	       sig;	     /* Unique signature for this structure */
	pthread_lock_t lock;	     /* Used for internal mutex on structure */
	uint32_t	rfu:29,		/* not in use*/
			misalign: 1,	/* structure is not aligned to 8 byte boundary */
			pshared:2;
	struct _npthread_mutex *busy; /* mutex associated with variable */
	uint32_t 	c_seq[3];
#if defined(__LP64__)
	uint32_t	reserved[3];
#endif /* __LP64__ */
} npthread_cond_t;

There’s also LP64 support and some alignment games. Fun times.

Well, that’s enough struct-gazing, we can pop back on up to where we were with CTRL-T and resume looking at the implementation of pthread_cond_signal_thread_np, right where we left off.

int
pthread_cond_signal_thread_np(pthread_cond_t *ocond, pthread_t thread)

I bet it’d be interesting to see what happens when that thread argument is NULL, as it is in a standard pthread_cond_signal call.

And when might it be called with a non-NULL value?

Curious and curiouser.

I leave you to it, dear reader, with exuberant ctags frolicking and wagging tail at your side.

Added 2013-07-26T15:03:08-0400: This post is an introductory walkthrough. Once you’ve got the hang of the basics, check out the discussion at /r/vim for further pointers.

Intruder Alert

Sun, 21 Jul 2013 00:00:00 +0000

The Apple developer center has been down since Thursday. The only information available from Apple was a “we’ll be back soon, and we’re not yanking any apps” reassurance. Thought maybe to check the dev forums? Those were down too, with roughly the same message.

Finally, Sunday evening, Apple revealed, by email and by updating the holding page at the iOS Dev Center, that they yanked their entire system down after

an intruder attempted to secure personal information of our registered developers from our developer website.

Well, that doesn’t sound good, does it? It gets better!

Sensitive personal information was encrypted and cannot be accessed –

Whoa there, Apple, I’ma let you finish, but cannot be accessed? Then how were you accessing it? Plainly it can be.

How hard it is for a cracker to access will depend on how it was encrypted. Apple have seen fit to reassure us that it was “encrypted”, as if all should be forgiven at the merest mention of “encryption”. Wave the magic encryption wand, and presto-changeo, insta-secure!

So, don’t worry, security fairies are at work. Just a slight slip-up:

we have not been able to rule out the possibility that some developers’ names, mailing addresses, and/or email addresses may have been accessed.

Wow, so a good chunk of what you’d need to open conversations with people’s banks, or credit card companies, or talk your way through most any phone tree. This is great info for social engineering and phishing. Someone could use this info to impersonate you and convince one of your friends or relatives to reveal some information they’d rather not share with strangers.

In the spirit of transparency, we want to inform you of the issue. We took the site down immediately on Thursday and have been working around the clock since then.

Waiting three days to tell everyone is a real class move. Phishers and impersonators have a three-day head-start on you. Enjoy!

Now, why would they need to stop the world, yank everything down, and beaver away for three days? It sounds like it took them about this long to figure out what even happened. Who knows how long the intrusion had been going on before they noticed it on Thursday.

In order to prevent a security threat like this from happening again, we’re completely overhauling our developer systems, updating our server software, and rebuilding our entire database

I’m not clear how they can make a threat vanish by any means. Apple can make changes to better defend against a threat, but there will still be someone out there who wants developers’ credit card info, billing info, home address, phone, email – whatever personally identifiable information they can get their hands on, they’ll take.

So it’s great they’re fixing things. Somehow. Maybe. We still have no idea what actually happened, for all Apple rushed to shower us with transparency.

But it sounds like they got caught with their pants down, and are now trying to fix years of neglecting the server-side of their developer-facing services. Technical debt bites hard.

Keep an eye on your credit card and bank statements, and put your friends and family on guard for any funny business from someone claiming to be you. I hope we learn more details of this debacle soon, and I wish good luck to the poor folk at Apple who have been rousted from their beds and whip-cracked through the weekend.

Screenshot of the intrusion maintenance message

Moving House

Fri, 19 Jul 2013 00:00:00 +0000

Arch Linux logo

For the last two years, this blog lived in shared hosting over at DreamHost.

This worked great. I had never had my own domain or server-in-the-sky. There’s a learning curve to this “now your stuff lives out there” experience.

But I learned, and shared hosting grew constricting.

I wanted to run ZNC to keep up with the office chitchat. This is rightly verboten under DreamHost’s shared hosting plan.

VPS

Why not move to a VPS? Because $$$. Every plan I found cost as much as my home Internet connection does per month, if not more. I can’t justify that for hosting a mere blog.

But a discussion on App.Net introduced me to Digital Ocean, and boy was I happy to meet them.

$5/mo VPS. A sweet little server all to myself, at half the price I was paying for shared hosting. My fingers itched to make the jump, and the first weekend after I learnt of it, I started moving in.

Gateway Arch

I dove headfirst in. All options are Linux. I’m not wedded to anything; I’ve used some Ubuntu on and off for the last several years, because that seems to be everyone’s favorite distro for VMs. But I have the teensiest host Digital Ocean offers, and Ubuntu is far from my notion of svelte. I want a change: Arch it is.

If you’re a Machead like me, you might not be acquainted with Arch. Here is Arch in a nutshell:

Simplicity: Provide a lightweight base which an individual user can shape to meet their needs.
Correctness: Aim for compact, simple implementations of distro services.
User-Centered: Hand over complete control and responsibility. Lightly opposed to “user-friendly”.
Open: Select or build simple, open-source tools.
Freedom: Allow users to choose everything about their system.

As someone who has flirted on and off with TinyCore Linux – seriously, why don’t people base VMs they expect others to download off something that’s only 12 MB, and only that much if you need a GUI? – this appealed to me. Hard.

I got to skip the install process and jump straight into a VM with some networking tools already installed and some details of integration with the hosting platform already worked out.

Migrating from DreamHost to Digital Ocean

So now I had a working machine to call my own, it was time to migrate everything over from shared hosting.

I only had four services to move over:

The blog, a simple blob of static pages produced by Octopress.
Git, which I use to provide myself with private project hosting.
Piwik, which I use as an alternative to Google Analytics.
Tiny Tiny RSS, about which see Farewell, Google Reader

These required a few other details:

A web server.
MySQL, which both Piwik and Tiny Tiny RSS rely on for external storage.
PHP. (I’m sorry. It makes me sad too.)

The plan was to get everything up and running, then cut over DNS, then shut down the DreamHost shared hosting plan.

Arch turned out to have packages for everything I wanted to run except Piwik. Installing that is little more than unzipping an archive in the right place and rigging up your web server.

About that.

Nginx

Key to all this is the web server.

Instead of Apache, I went with Nginx - it seems lighter, and I didn’t want to mess with Apache’s configuration files.

I’d never used Nginx before, but it turns out that enough web searching can vanquish most any ignorance these days.

Most everything went fine once I found the alphabetical list of directives, but debugging what stanza handles what can become a pain. Fortunately, it’s a pain with a handy solution, as detailed by Justin Carmony in Debugging Nginx Configuration Trick:

Want to see some output to verify you’re tickling the place you thought you were when visiting a URL?

Redirect to google.com/?q=(your debug message here):

rewrite ^ http://www.google.com/?q=HELLO! last; break;

To Be Continued

I reckon that’s enough for today. I plan to go over some more of the gory details in future, because they might help someone else the same way a smattering of random blog posts from all over the ‘net helped me.

Share and enjoy!

Farewell, Google Reader

Wed, 19 Jun 2013 00:00:00 +0000

Google Reader is shutting down. This is an excellent opportunity to move your blog-reading from the company store over to some fresh property you control.

If all you want is a Reader-alike, you have your pick of several options.

But I had something else – something better – in mind.

Reader-Schmeader

First, let’s get the Reader-alikes out of the way:

Stringer, a Rails app that’s easy to set up running free on Heroku, for which see Eric Dejonckheere’s excellent and well-illustrated walkthrough (in French).
Tiny Tiny RSS, a PHP conglomeration that’s easy to set up running wherever you have hosting, for which see Alan Henry’s Lifehacker article.

If that’s all you’re looking for, have at it.

A River Runs Through It

What I had in mind was somewhat different.

Past RSS reader avoidance had taught me to run away from the gotta-read-them-all whack-a-mole of mailbox-style RSS readers. I’d had a better experience with Twitter and similar social streams, even as my follow-list ballooned to be far busier than my RSS feeds ever were.

It was Dave Winer’s River2 that showed me I could have it all for RSS. The only problem: It runs under Windows. I don’t have a Windows web host, nor do I have any interest in maintaining a Windows box, so I had to look elsewhere.

So what I wanted instead of The Next (Yet Another) Reader was a solution that would:

run fine on a *nix box
present feeds as a River of News (stream-style) rather than as a mailbox
have a server component with API, so I could access my feeds however
support multiple users so I could share with family or friends

Nothing Doing

After some looking, the closest things I’d found to the River-spirit were the Planet aggregators, like Venus. Planet feeds are really aggregate-feeds that repackage several feeds into one large feed. You can see an example of a planet-river at Planet IF.

A planet really only hits the ’nix and river bullet points, which is a real let-down as far as a personal solution goes. It won’t track where I am in the stream, and it only supports multiple users inasmuch as it has no notion of users. The existing solutions also had a very Rube Goldberg sense to them; not something I could place much confidence in. And subscribing to more feeds would require editing config files.

So I looked for a good long time. In the end, I came up with this list of not-quite solutions:

Local only:
- Newsbeuter
- Canto
Server-side:
- Venus
- River (Windows only)
- Stringer
- RSS to Email

Nothing was quite what I wanted. I thought of making my own solution. Then I thought better, and instead gave Newsbeuter a chance.

Newsbeuter

Newsbeuter claims to be “the Mutt of RSS Readers”. Zed Shaw first used this title, and Newsbeuter has since embraced it.

I don’t use mutt, so I can’t speak to the title’s appropriateness, but Newsbeuter works pretty well. It mostly stays out of the way. It’s less a mailbox and more a hierarchical menu. It’s actually surprisingly iPhone-flavored in how its view stacks work.

So what of my requirements?

You could scarce dream up a more ’nix reader if you tried. (OK, rsstail might win Unix-philosophy-wise, but Newsbeuter nails the feel and is more practical.)
As far as the River of News experience goes, Newsbeuter still tracks read/unread status per article, but it’s not obnoxious about something being OMG unread. It’s configurable enough that you could almost entirely hide read status if you so wished.
Newsbeuter is local only. So much for the server-side with API dream. Well, I was about to give up on finding a reader solution entirely, so a half-measure was better than none.
And supporting multiple users doesn’t mean much without a server component.

So I had something that would sort of work. It didn’t satisfy all my dreams, but it worked. That’s a good start, I figured.

Newsbeuter viewing my Last 24 Hours query feed

RTFM

I read through Newsbeuter’s manual. The manual is more comprehensive than most.

The manual provides! It turns out I can shape Newsbeuter into a river of news reader. You just need to set up a query feed that shows you recent articles, ignore everything else, and be generous with the “mark all as read” command.

And provides again! It turns out Newsbeuter can sync with a server, so long as that server is Google Reader or Tiny Tiny RSS. Well, Google Reader was not an option, but Tiny Tiny RSS certainly was.

But it doesn’t answer all questions.

Could I use query feeds, my solution for creating a semi-river, alongside tt-rss, my solution for keeping feeds subscribed to synced across several devices? I couldn’t tell from the manual alone.

So I looked up Newsbeuter’s source on my phone. I expected to give up in frustration after a few taps and wait till I got back to my laptop to git clone and git grep. But I found the answer in no time:

ttrss_urlreader first loads all query feeds from the urls file,
then it hits the tt-rss API for the list of feed URLs

So I can totally rig up all the query feeds I want alongside using tt-rss as the main feed URL source.

The manual doesn’t answer all questions, but it answers most of them. For the rest, there’s the source. Newsbeuter is written in clean, well-organized C++. It’s one rare project that won’t make you cry to read the source. Call me impressed.

Tiny Tiny RSS

Tiny Tiny RSS, you say? Well, that will let me sync my feeds across devices. I pull up the tt-rss project’s docs (oh the pain of slow slow Trac); I read the Lifehacker article I linked up above; it looks workable.

I set Tiny Tiny RSS up in no time on DreamHost, and soon I am viewing RSS feeds in my browser. I’m not too keen on the UI; the Actions menu and Preferences sections feel like a sin bin of options, and early 2000s console-style “Loading” screens aren’t really my thing; but, it works!

Loading. Please wait…

Once configured, the UI doesn’t matter anyway, because tt-rss is just a sync point, not my reader. Its whole purpose in life is editing the feeds I subscribe to and servicing API requests from my actual readers.

But there was a hitch: I couldn’t get Newsbeuter to auth with my tt-rss install no matter what I did. So I turned to my trusty mitmproxy.

Reviewing the server’s responses quickly revealed the problem: API access wasn’t turned on for my user! You have to go under settings for your user in Tiny Tiny RSS and turn on the “external API” support manually.

Surprise! Without that, everything third-party will fail to work with Tiny Tiny RSS. And it’s disabled by default. Of course.

Enable API: Default is no!

Victory?

So in the end, I got what I wanted, after a fashion:

I run Newsbeuter locally and tt-rss remotely. Both happily hum along atop ’nix.
tt-rss doesn’t present a river, but I only use it to edit my feed list. Instead, I use Newsbeuter, which I have transformed into a River-style reader through careful configuration.
tt-rss rescues the server-side dream. A very PHP rescue in need of a UX and design team, but I’ll take “it works” over “it doesn’t even exist” any day.
And tt-rss has multiple user support.

If you’re looking for an RSS reader solution, and are interested in trying something off the beaten path, give Newsbeuter and Tiny Tiny RSS a shot. They’re more than the sum of their parts.

Big Data

Sun, 16 Jun 2013 00:00:00 +0000

One Finger Moon-Pointed

Everyone seems enamored of the accidental scaffolding we find ourselves forced to erect around our growing tower of data exhaust by our increasing cupidity for data data DATA and the treasure we hope we’ll one day find therein.

We have to develop machinery to deal with data at the latest greatest scale, but the machines are just nuts and bolts. We needed a garage to park our data-Hummer in. We had a crew build one by hand using the poor Bronze Age tools available. The final result is eccentric and baroque; it requires constant maintenance just to avoid collapsing; but it solves the problem.

The part no-one outside well-insulated research centers seems to be getting all moon-eyed over is the part that matters long-term: you have what was till recently an unthinkable amount of data. How do you extract meaning from it?

This gets us into statistics, machine learning, curve fitting, segmentation, clustering, sentiment analysis, etc. That’s the part everyone should be getting excited about.

I want to read breathless blog posts about how you have advanced what we currently know how to do statistically. Because I can’t write those posts; I can poke machines and set them on their feet and point them one way or another and dust them off when they fall down, but I know not nearly enough about statistics and numerical analysis.

So much of today’s tech news is just oohing and ahhing and cooing at the latest gilded set of Napier’s bones. We’re an industry in love with its slide rule, polishing and primping and extolling its many virtues. We forget these Rube Goldberg contraption abstractions are tools for the real pursuit of knowledge.

We’ll use Bayes’ law for the next several centuries. We’ll use today’s wonder-systems for the next several decades, and we’ll curse them as legacy rubbish long before that time is through.

See also: Bryan O’Sullivan’s “Big Fucking Deal”

One Serf Disjointed

There’s another angle on this: Our continuous tech-wow is an obscene celebration of the enclosing of the commons. You’ve worked out how to pen up a bunch of users and soak them for all they’re worth. You’ve built the airiest friendliest most inescapable prison-hotel. To do so required innovations legal political and technical the like of which we could only have dreamt of a century ago. But this is not innovation in the service of anything noble; it’s the sad extraction of value from a benighted underclass of peon users.

See also: Hammerbacher’s comments in “This Tech Bubble Is Different”; Schneier with “More on Feudal Security”.

Clean Code Is Not Enough

Sun, 09 Jun 2013 00:00:00 +0000

So much effort is expended on making code look fine at the micro-level: indentation, naming, method size, etc. So little attention is paid to the macro-level: How does everything come together to effect the desired result? Why is it there?

Developers world-wide waste countless hours debugging, not because of anything they have done, but instead because of library authors’ failure to express preconditions, effects, and expected uses. You can’t do anything to guard against this save choose libraries carefully. And all too often you have no choice.

Literate programming tried to address whole-program understanding. Its core idea: Construct a program via a linear narrative that proceeds by progressive refinement to elaborate the entire program. Code chunks can be incorporated by reference into other chunks. The source code is written for human consumption, not machine. The input to the compiler is ultimately produced by weaving chunks together. For a small example of the pretty-printed human-oriented product, see Shane Celis’ Minimal Emacsy Example Program.
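The step that turns chunks into compiler input (Knuth's tools call it "tangling") boils down to recursive substitution. Here is a minimal Python sketch of that idea. The chunk names and contents are invented, the <<name>> reference syntax is borrowed from noweb, and the sketch does no cycle detection.

```python
import re

# Hypothetical chunks; the <<name>> reference syntax follows noweb's convention.
chunks = {
    "main": "<<imports>>\n\ndef main():\n    <<greet>>\n",
    "imports": "import sys",
    "greet": 'sys.stdout.write("hello\\n")',
}

# A reference is a line holding nothing but optional indentation and <<name>>.
REF = re.compile(r"^(?P<indent>[ \t]*)<<(?P<name>[^>]+)>>[ \t]*$", re.MULTILINE)


def tangle(name, chunks):
    """Expand chunk `name`, recursively replacing each <<ref>> line."""
    def expand(match):
        body = tangle(match.group("name"), chunks)
        indent = match.group("indent")
        # Re-indent the referenced chunk to match the reference's position.
        return "\n".join(indent + line if line else line
                         for line in body.split("\n"))
    return REF.sub(expand, chunks[name])


print(tangle("main", chunks))
```

The narrative can introduce chunks in whatever order reads best; the expansion is what puts them back in the order the compiler needs.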

But literate programming fizzled. The highest level of documentation you’ll find in many projects today is class-level. That’s not enough to fully comprehend, never mind maintain, a system. Post-hoc analyzers that visualize the rat’s nest of dependencies serve only as adjuvants in sussing out the Escher’s rat’s nest we’ve crafted ourselves.

In the hands of a capable author, a literate program explains the wherefores of the code to its recipients. As test-driven development guarantees all developed code can be tested, literate programming guarantees all developed code can be coherently explained from start to finish.

But that doesn’t mean the code is correct. And that doesn’t mean the code has been coherently specified. Humans are very limited in their ability to cross-check the validity of a complex, interconnected web of logical statements.

This is where specification languages, like Z (pronounced “zed” in this context), come into play. Alloy applies model-checking in a limited universe to the specification. But bridging the gap between specification and implementation remains in the hands of human – all too human! – implementors.

So we come to the notion of extracting programs from a verified proof. If you go at it right, you only arrive at a proof that everything does what you specified it does by way of producing an existence proof. The proof demonstrates by example that your claim, that your app does the umpteen things you say it does, must be true. Once proven, you can extract this latent program, and, ideally, run it.

But let us set aside the opium pipe. We’ve come full circle: Unless all the libraries and all the frameworks you are using in developing your program have been developed in such a way that you can incorporate their specifications directly (and what world would that happen in?), or at least documented such that you can make meaningful and tractable claims you can adopt as axioms, you’ll still find yourself falling through the sandcastles that are all we have to use as the latest app’s foundation.

We can’t, any solitary one of us, fix this alone. Until everyone starts taking responsibility for building their programs correct from the get-go and documenting them so they can be used in line with the assumptions inherent in the built product, we will all continue to suffer from our continued haze of confusion, no matter how sparkling clean and transcendently clear our own code might be.

Notes on QuickCheck

Mon, 27 May 2013 00:00:00 +0000

QuickCheck is a nifty library embedded in Haskell for specification-based random testing of programs.

Instead of hand-writing your tests as you would with one of the TestUnit frameworks, you specify computable properties about your program functions. The specification includes a description of how to generate inputs to the property equation. The test framework evaluates the property against a fixed number of inputs produced by the spec’s generator and reports whether the results were correct or not.

This compresses the amount of test code you have to write. It effectively raises the level of discourse from hand-testing input/output pairs or execution traces to making general assertions you could use to reason about your functions.

Random testing is uniquely suited to revealing mistakes in the programmer’s conception of the program, since it can produce inputs you wouldn’t dream of on your own. (Composing generators to produce a big mess of a data structure is pretty nifty, too; hand-writing those inputs is just about the most boring thing ever.)
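To make the generate-and-check loop concrete outside Haskell, here is a minimal Python sketch. It is not QuickCheck's API: check, gen_int_list, the two properties, and the default of 100 trials are all assumptions for illustration.

```python
import random


def gen_int_list(rng, max_len=20):
    """Generator: a random list of small integers."""
    return [rng.randint(-100, 100) for _ in range(rng.randint(0, max_len))]


def check(prop, gen, trials=100, seed=0):
    """Evaluate prop on `trials` generated inputs.

    Returns None if every case passed, else the first counterexample found.
    """
    rng = random.Random(seed)  # seeded so a failure can be reproduced
    for _ in range(trials):
        value = gen(rng)
        if not prop(value):
            return value
    return None


def prop_reverse_twice(xs):
    """Reversing a list twice gives back the original list."""
    return list(reversed(list(reversed(xs)))) == xs


def prop_reverse_is_identity(xs):
    """A deliberately false property: reversing changes nothing."""
    return list(reversed(xs)) == xs


print(check(prop_reverse_twice, gen_int_list))        # no counterexample found
print(check(prop_reverse_is_identity, gen_int_list))  # a list that is not a palindrome
```

Passing every trial is evidence, not proof: the second property would also pass if the generator only ever produced empty or one-element lists.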

Reference

Koen Claessen and John Hughes. 2000. QuickCheck: a lightweight tool for random testing of Haskell programs. In Proceedings of the fifth ACM SIGPLAN international conference on Functional programming (ICFP ‘00). ACM, New York, NY, USA, 268-279. DOI=10.1145/351240.351266 http://doi.acm.org/10.1145/351240.351266

Notes

“Random testing is especially suitable for functional programs because properties can be stated at a fine grain.”

“domain-specific language of testable specifications”

“test data generation language”

“unless specifically stated otherwise, we always quantify over completely defined finite values.”

“the programmer must specify a fixed type at which the law is to be tested”

“+ is associative for the type Int, but not for Double!”
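The same split shows up in Python, whose floats are IEEE 754 doubles and whose ints are arbitrary precision. A small sketch, with values picked by me rather than taken from the paper:

```python
a, b, c = 0.1, 0.2, 0.3

# Doubles round after every operation, so grouping changes the result.
left = (a + b) + c
right = a + (b + c)
print(left == right)  # False

# Python ints are arbitrary precision, so + is exactly associative here.
x, y, z = 10**20, -(10**20), 1
print((x + y) + z == x + (y + z))  # True
```

This is why the law has to be tested at a fixed, concrete type: the property is true or false depending on which + you mean.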

“In some cases, we can use parametricity [17] to argue that a property holds polymorphically.” ([17] is Wadler’s “Theorems for free!” paper.)

“implication combinator” for conditional laws

“the result type of the property is changed from Bool to Property” - handles “failed precondition” state in addition to pass/fail

“The classify combinator does not change the meaning of a law, but it classifies some of the test cases”

“collect will gather all values that are passed to it, and print out a histogram”

“risk of [bias towards trivial test cases] every time we use conditional laws, so it is always important to investigate the proportion of trivial cases among those actually tested.”

solution: “replace the condition with a custom data generator” (emphasis added)
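The bias is easy to see by counting. In this Python sketch (my own names throughout, nothing from the QuickCheck API), the precondition is "the list is ordered". Filtering random lists mostly keeps very short ones, while a custom generator that builds ordered lists directly keeps every case and covers every length. The histogram plays the role of collect.

```python
import random
from collections import Counter


def random_list(rng):
    """A random list of 0 to 8 digits."""
    return [rng.randint(0, 9) for _ in range(rng.randint(0, 8))]


def is_ordered(xs):
    return all(a <= b for a, b in zip(xs, xs[1:]))


def ordered_list(rng):
    """Custom generator: build ordered lists directly instead of filtering."""
    return sorted(random_list(rng))


def length_histogram(gen, precondition, trials=1000, seed=1):
    """Count, by length, the generated cases that satisfy the precondition."""
    rng = random.Random(seed)
    lengths = Counter()
    for _ in range(trials):
        xs = gen(rng)
        if precondition(xs):
            lengths[len(xs)] += 1
    return lengths


filtered = length_histogram(random_list, is_ordered)
direct = length_histogram(ordered_list, is_ordered)
print(sum(filtered.values()), sorted(filtered.items()))
print(sum(direct.values()), sorted(direct.items()))
```

With filtering, most of the surviving cases have two elements or fewer, so a property about ordered lists ends up tested almost entirely on trivial inputs.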

“two infinite lists are equal if all finite initial segments are equal”
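In practice a test can only compare finitely many finite prefixes, so it can refute equality but never prove it. A rough Python sketch using generators as stand-ins for lazy lists; the stream definitions and helper name are mine:

```python
from itertools import count, islice


def evens_by_stepping():
    """0, 2, 4, ... by counting in steps of two."""
    return count(0, 2)


def evens_by_doubling():
    """0, 2, 4, ... by doubling each natural number."""
    return (2 * n for n in count())


def agree_on_prefix(make_a, make_b, n):
    """Compare only the first n elements of two (possibly infinite) streams."""
    return list(islice(make_a(), n)) == list(islice(make_b(), n))


print(all(agree_on_prefix(evens_by_stepping, evens_by_doubling, n)
          for n in (0, 1, 10, 1000)))  # True
```

The helper takes functions that build the streams, not the streams themselves, because a Python generator is consumed as it is read.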

“in general it is not clear how to formulate and execute properties about structures containing bottom”

“we must rely on the user to provide instances [of the type class Arbitrary] for user-defined types”

“the size bound is simply an extra, global parameter which every test data generator may access; every use of sized sees the same bound”

“use [promote] to produce a generator for a function type, provided we can construct a generator for the result type which somehow depends on the argument value”

“think of coarbitrary as producing a generator transformer from its first argument”

“variant n g constructs a generator which transforms the random number seed it is passed in a way depending on n before passing it to g”

“mapping each constructor to an independent transformer, and composing these with transformers from each component. Other recursive datatypes can be treated in the same way.”

“it is more convenient to let test data be automatically generated using arbitrary so one is encouraged to make distinctions explicit in types”

“using QuickCheck changes the balance of convenience in the question of introducing new types in programs”

“The most serious pitfall we uncovered with this experiment was the false sense of security that can be engendered when one’s program passes a large number of tests in trivial cases.”

“a typical development cycle is to write down the specification of the circuit first, then make an implementation, QuickCheck it for obvious bugs, and lastly call the external theorem prover for verifying the correctness.”

“we can test more properties than we can formally verify!” because testing is not restricted to first-order logic, which the Lava system’s external theorem prover was restricted to

“A drawback is that we have to fix the types of these circuits” This is the “we have to test with concrete data, y’all” requirement again.

Section 5.4 Pretty Printing describes an interesting way of reaching a working Java implementation: write once functionally, write once using state + exceptions in a functional language, check equivalence, then port the state + exceptions model to an imperative language, generate test inputs, and check the final implementation.

“since the specification refers to the implementation, then the specification module must import the implementation currently under test” which is annoying for different implementations of the same abstract interface (and solved by ML functors)

“‘By taking 20% more points in a random test, any advantage a partition test might have had is wiped out.’” From [11]. Explains why we should bother using random testing rather than carefully selected tests or automatically selected tests that meet some coverage criteria.

“often one can check that a programs output is correct much more efficiently than one can compute the output” - see [4] on “result-checking”; QuickCheck is simultaneously more general (can check associative property rather than output correctness) and more directed (exclude testing bad/irrelevant inputs)
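Sorting is a handy illustration of result-checking: confirming that a claimed result is a sorted rearrangement of the input takes one linear pass and a multiset comparison, with no sorting on the checker's side. A Python sketch with a hypothetical is_valid_sort helper:

```python
from collections import Counter


def is_valid_sort(original, result):
    """Check a claimed sort in linear time, without sorting anything ourselves."""
    ordered = all(a <= b for a, b in zip(result, result[1:]))
    same_items = Counter(original) == Counter(result)
    return ordered and same_items


data = [5, 3, 8, 3, 1]
print(is_valid_sort(data, sorted(data)))       # True
print(is_valid_sort(data, sorted(set(data))))  # False: the duplicate 3 was dropped
```

Both halves of the check matter: an ordered result that lost or invented elements is as wrong as an unordered one.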

grammar-directed testing - “context-free grammars cannot express all the desired properties of test data” (frex def-before-use constraint) - also one ugly thing to learn, and divorced from your primary language

“the Gen monad which QuickCheck is based on is not a monad at all!” - seed use/order matters - “morally” the same wrt distributions, but even still, observable differences in the output - “There is some interesting semantic theory to be done here.”

“What we cannot do is observe non-termination in a test result.” This is in part a limit of the compiler, which reports errors to humans that it won’t to programs.

“one of the major advantages of using QuickCheck is that it encourages us to formulate formal specifications, thus improving our understanding of our programs” And QuickCheck provides a practical reason to write specifications where there wasn’t one before.

Errors in practice divide roughly evenly between:

bad data generators (useless to find, but necessary for testing to proceed)
bad program code (this is why we’re testing!)
bad specifications (reveal misconceptions about the program)

Spec writers need libs they wouldn’t for program code, frex finite set theory. Different requirements on the libraries: can use less performant but more concise functions in spec than in main program.

“The major limitation of QuickCheck is that there is no measurement of test coverage: it is up to the user to investigate the distribution of test data and decide whether sufficiently many tests have been run.” (bold emphasis mine)

HTTP Library Survey

Sat, 25 May 2013 00:00:00 +0000

Totentanz by Michael Wolgemut, 1493

Working with the Web seems to be all apps do these days. The stock Foundation URL loading system requires endless ceremony and a lot of work to try to patch it up to anything resembling watch-batteries-included.

It doesn’t have to be this way. Perhaps you’ve seen Kenneth Reitz’s quick and easy Requests library for Python.

Well, you’re not the first person to think that it doesn’t have to be that way in Cocoaland, either.

Better Dead than Red?

A quick survey of search results for “http” over at Cocoapods yields:

HTTPRiot
- Inspired by Ruby’s httparty.
- README’s documentation URL 404s.
- Last updated 2 years ago.
- “Get/post/delete with options” is the name of the game. See Tweet.m for how it gets woven into a model object and HRResponseDelegate for the optional (last resort) delegate methods.
ObjectiveResource
- A 4-year dead port of Ruby’s ActiveResource to Cocoa.
- Possibly also a great name for an Objectivist weekly newsletter.
LRResty
- Inspired by the Ruby RestClient library.
- README still treats ARC as new and iOS 4 as something to care about. No recent updates.
- The website is pretty though.

Perhaps you’re starting to see a pattern:

Take Ruby library
Make Obj-C library

Tacking For the Horizon

The next few take a different tack. They also seem to be actively maintained.

RequestUtils
- Hews close to the standard Foundation conventions of separating URLs from strings, but makes it easier to build a more complex request without having to repeatedly mess with a mutable URL request.
- Simplifies form and URL encoding and decoding, with options for how to handle repeated specifications of the same query key.
- Recently updated.
BBHTTP
- ARC-only.
- Uses before/success/failure/finally-style continuation blocks.
- Avoids manual string->URL coercions.
- Convenient support for downloading and working with JSON, image, and file content.
AFIncrementalStore
- Bridges Core Data to a REST backend.
- Works by your telling it how to un/pickle a resource and map objects and relationships onto paths and IDs.
- Convenient if you’re using Core Data.
- One of those Hell-cobblestones if you’re also hooking into iCloud. (And who doesn’t want to be cloud-high these days?)
AFNetworking
- Continuation-based with a reactor.
- Minor class-splosion for handling various response Content-Types, but it’s a reasonable way to bake in support for common response types.
- By far the most popular option of all of these (judging by Github stars, as you do).
- Under active development.
- I mean, you knew we’d end up here eventually.

Conclusion

The road to the Web from an iOS app is littered with corpses. We’re stuck with the Foundation mummy, but you might as well pick a fresher partner for your own Totentanz. AFNetworking certainly has some life left.

There’s also room for a Cocoa Toolbox site along the lines of the Clojure Toolbox. It’s great we’ve something of a dependency management system now, but without a well-curated guide, we’ve more of a run-down museum than a lively workshop.

The image above is a 15th Century print by Michael Wolgemut, found on Wikipedia. Dude’s been dead long enough we can at least use his work freely. Unless someone holds a submarine patent on display of over-half-a-millennium–old prints on the Internet. I mean, shopping carts on the Internet were a major brainwave, right? No-one would ever have thought of that.

Key Reordering: MongoDB's Achilles' Heel

Tue, 23 Apr 2013 00:00:00 +0000

MongoDB relies on key order but doesn’t guarantee it will preserve document key order. Steer clear.

MongoDB document keys are ordered. Subdocument queries fail to retrieve results whose keys are in a different order. Indexing only works with keys in the same order.

This uglies up the client application interface. Languages regularly provide literal syntax for unordered associative arrays (aka hashes, maps, dictionaries). To guarantee all similar documents have the same key order, you either forgo the literal syntax provided by your language to use an order-preserving data structure, or you throw a sorting layer in between the application and MongoDB.
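To make the order sensitivity concrete, here is a toy C model. It is not MongoDB’s actual BSON comparison; `struct field`, `docs_equal_ordered`, and `canonicalize` are hypothetical names. It shows two documents with the same fields failing an order-sensitive comparison until a sorting layer puts both into one canonical key order.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* One key/value pair of a toy document. */
struct field { const char *key; int value; };

/* Order-sensitive comparison: fields must match position by position. */
static bool docs_equal_ordered(const struct field *a,
                               const struct field *b, size_t n) {
    for (size_t i = 0; i < n; i++)
        if (strcmp(a[i].key, b[i].key) != 0 || a[i].value != b[i].value)
            return false;
    return true;
}

static int cmp_field(const void *a, const void *b) {
    return strcmp(((const struct field *)a)->key,
                  ((const struct field *)b)->key);
}

/* The "sorting layer": force one canonical key order before the
 * document is stored or used in a query. */
static void canonicalize(struct field *doc, size_t n) {
    assert(doc != NULL || n == 0);
    qsort(doc, n, sizeof doc[0], cmp_field);
}
```

Comparing `{a: 1, b: 2}` against `{b: 2, a: 1}` fails before `canonicalize` runs on both and succeeds afterwards.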

That’s bad, but you can work around it.

What you can’t work around is MongoDB rearranging keys behind your back:

When performing update operations that increase the document size beyond the allocated space for that document, the update operation relocates the document on disk and may reorder the document fields depending on the type of update. (Update - MongoDB Manual 2.4.2)

Let’s read that again: “[T]he update operation […] may reorder the document fields.”

Just dandy.

Using MongoDB requires a fixed key order for reliable operation, but MongoDB fails to preserve key order. As convenient and simple as MongoDB is, this is enough for me to advise anyone: Stay well away from MongoDB.

Clojure/West 2013

Wed, 10 Apr 2013 00:00:00 +0000

I stumbled from the red-eye into the airport shortly after dawn. Bleary-eyed and back on the ground in Atlanta at last, Clojure/West, and its talk of Lisp and browsers and JVMs, seemed already a distant memory. My laptop hadn’t left its TSA-friendly bag for the past week. My notebook was full of scribbles and mindmaps. What was a Cocoa programmer to make of it all?

Continue reading at the Big Nerd Ranch Blog →

Leak-Free Recursive Blocks

Wed, 27 Feb 2013 00:00:00 +0000

Sometimes, you want a block to be able to call itself. That means it needs a reference to itself. And that means you have a wonderful opportunity to create a strong reference cycle that will endure till the end of time, or at least till your app exits.

The Solution

__weak __block block_t recurse;
block_t block;
recurse = block = ^(id val) {
    …
    recurse(subval);
    …
};

Getting There

The prototypical recursive function example is the factorial function:

uintmax_t
factorial(uintmax_t n)
{
    NSParameterAssert(n >= 0);
    if (n == 0) return 1;
    return n * factorial(n - 1);
}

So that’s a function. Now we want to make it a block.

As a block, that factorial self-reference is tricky. A block only gets a name when you assign it to a variable. That variable assignment won’t happen till after the block is created. So you need a __block variable:

uintmax_t (^__block factorial)(uintmax_t n) = ^uintmax_t (uintmax_t n) {
    NSParameterAssert(n >= 0);
    if (n == 0) return 1;
    return n * factorial(n - 1);
};

Sadly, that’s a __strong __block reference by default. Oops. If you don’t want to leak, you either need to take care to break the strong reference manually by NULLing out the factorial variable – good luck – or you need a weak reference.

So you retag the block variable as weak:

uintmax_t (^__block __weak factorial)(uintmax_t n) = …

Only now there’s never a strong reference to your block, so your block is eligible for deallocation right after it’s returned.

So you need both a strong and a weak reference to your block. And the block needs to be stored in the strong reference first, so you anchor it to this world. So maybe you do this:

uintmax_t (^__block __weak weakFactorial)(uintmax_t n);
uintmax_t (^factorial)(uintmax_t n) = …
weakFactorial = factorial;

That fixes it. It works. And it’s nice that we could drop the __block from the main factorial reference, since that stays stable; it’s only the weakFactorial self-reference that gets updated after the block first captures it.

But having to do this follow-on assignment is kind of ugly, and it’s pretty forgettable way down there at the end of your block, and a few revisions later they’ll get separated, and then maybe you’ll accidentally delete the weak assignment, and then you’ll have to track down a bug. Ugh.

So try this instead:

uintmax_t (^__block __weak weakFactorial)(uintmax_t n);
uintmax_t (^factorial)(uintmax_t n);
weakFactorial = factorial = …

You’re not going to forget that assignment. It’s still ugly, but that’s what happens at the edges of ARC, and it’s at least not too fragile. You could even snippetize it, if you’re into that sort of thing.

Caveat: Asynchronous Recursion

Justin Spahr-Summers rightly notes that this approach works only for synchronous recursion:

Once your strong reference goes out of scope, wouldn’t it just stop recursing because all that’s left is a weak reference? (Twitter)

It’s actually worse than that: you’ll get a segfault, because you try to invoke a NULL block as a function. If you have some sort of concurrent or asynchronous recursion – maybe invoking the block kicks off a gradual countdown – then you’ll need to handle the case where the block is dead and has been zeroed but doesn’t know it yet.

Use the standard trick of trying to obtain a strong reference and then testing whether that reference is nil or not to decide how to proceed:

block_t recurse = weakRef;
bool zeroed = !recurse;
if (zeroed) /* bail out */;
/* use |recurse| to reference yourself, so that you don't segfault */

If you want the recursion to run to completion rather than fizzling out, I imagine you can work out the appropriate juggling act.

This isn’t a problem for the common case of synchronous block handlers, where you set the handler and don’t change it, and the object owning the strong reference is guaranteed to outlive the handler block’s recursion, but it is something to watch out for as you get cleverer or otherwise start destroying references while the block might be running.

Dollars, Not Donuts

Sat, 16 Feb 2013 00:00:00 +0000

Internationalization is hard. If you’ve always lived where you have to travel to another continent to encounter something other than your local monoculture (hello, many fellow Americans), it’s even harder.

Perhaps it should be no surprise then that everyone gets it wrong. Over and over, I see the same amateur mistakes. Two seconds using your app in some locale other than the one you developed in would make you go, “Oh. Crap.”

Home Depot accepts Swedish kronor now?

Today’s mistake is “localizing” currency amounts in your application.

Take a look at this prime shot from the Home Depot website this morning:

Home Depot product - price listed as Swedish kronor

I’m sending sv-SE as my preferred language, so I get back the site’s closest possible approximation to Swedish in Sweden. Oh, how cool, that includes converting currencies to Swedish kronor!

Oh wait, no it doesn’t. They just rigged up their currency formatter wrong. The formatter thinks the amount they specified – an amount actually in US dollars – must be Swedish kronor (SEK), because they didn’t tell it anything more specific, and so the formatter printed the amount as SEK.

I ran into this same thing when I went to renew my car insurance. A whole table of US dollar amounts formatted as Swedish kronor.

This happens so often, across both websites and apps, that when I encounter something that actually does support displaying actual honest-to-God SEK amounts, I still assume it’s USD misrepresented as SEK until I realize the prices are wildly different from what the USD amount would be. This is so rare that it’s only happened once: thank you, Kayak, for getting this right!

If the exchange rate weren’t so far from 1 between USD and SEK (currently about 6 SEK to 1 USD), I don’t know how I’d be able to tell whether an app got it wrong or got it right.

This amateur mistake, repeated across vendors, applications, and platforms, makes for a terrible user experience for anyone working in a different region, even if they speak the same language as you.

It’s easy to get it wrong

Let’s see, how would you format a currency amount, say, $1,234.59 USD? How’s about this:

id amount = [NSDecimalNumber
             decimalNumberWithMantissa:123459
             exponent:-2 isNegative:NO];
NSString *display = [NSNumberFormatter
                     localizedStringFromNumber:amount
                     numberStyle:NSNumberFormatterCurrencyStyle];
NSLog(@"%@", display);

Very easy, right? And very wrong. That prints, “1 234,59 kr”.

Well, let’s try rigging up the currency formatter ourselves, eh?

NSNumberFormatter *fmt = [NSNumberFormatter new];
[fmt setNumberStyle:NSNumberFormatterCurrencyStyle];
[fmt setGeneratesDecimalNumbers:YES];

NSString *formattedAmount = [fmt stringFromNumber:amount];
NSLog(@"%@", formattedAmount);

Nope, still prints “1 234,59 kr”. This is likely exactly what the “convenience” converter did for us, minus the configuration setting to convert strings to decimal numbers. So that’s a lot of hoopla for no gain.

But not hard to get it right

But it just takes one more line to get it right. You need to tell the formatter what exactly it’s formatting. Raw numbers aren’t enough: it needs to know the currency you’re giving it. Like so:

NSNumberFormatter *fmt = [NSNumberFormatter new];
[fmt setNumberStyle:NSNumberFormatterCurrencyStyle];
[fmt setGeneratesDecimalNumbers:YES];
[fmt setCurrencyCode:@"USD"];

NSString *formattedAmount = [fmt stringFromNumber:amount];

Yup, it just takes a setCurrencyCode: message, and all is right with the world. This version here prints “1 234,59 US$” when using the Swedish region. The currency symbol follows the amount, as is usual in that region, and the separators for thousands and decimal are localized, as well.
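The split between “how to present a number” and “which currency the number is in” can be modeled outside Cocoa. The C sketch below is a toy, not how NSNumberFormatter is implemented: `struct region`, `REGION_US`, `REGION_SE`, `symbol_for`, and `format_currency` are all hypothetical, the symbol table covers only two currencies, and it uses plain ASCII spaces where a real formatter may use non-breaking ones.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Toy region conventions; real formatters read these from locale data. */
struct region {
    char group_sep;             /* thousands separator */
    char decimal_sep;           /* decimal separator */
    int symbol_first;           /* nonzero: symbol precedes the amount */
    const char *home_currency;  /* assumed when the caller names none */
};

static const struct region REGION_US = { ',', '.', 1, "USD" };
static const struct region REGION_SE = { ' ', ',', 0, "SEK" };

/* Symbol shown for a currency code within a region (tiny made-up table). */
static const char *symbol_for(const char *code, const struct region *r) {
    if (strcmp(code, "SEK") == 0) return "kr";
    if (strcmp(code, "USD") == 0)
        return strcmp(r->home_currency, "USD") == 0 ? "$" : "US$";
    return code;
}

/* Format `cents` using region `r`. Passing NULL for `code` reproduces
 * the mistake: the region's own currency is silently assumed. */
static void format_currency(char *out, size_t cap, long cents,
                            const char *code, const struct region *r) {
    char digits[32], grouped[48];
    size_t len, g = 0;
    assert(cents >= 0);
    if (!code) code = r->home_currency;
    snprintf(digits, sizeof digits, "%ld", cents / 100);
    len = strlen(digits);
    for (size_t i = 0; i < len; i++) {
        /* Insert a separator before each remaining group of three. */
        if (i > 0 && (len - i) % 3 == 0) grouped[g++] = r->group_sep;
        grouped[g++] = digits[i];
    }
    grouped[g] = '\0';
    if (r->symbol_first)
        snprintf(out, cap, "%s%s%c%02ld", symbol_for(code, r), grouped,
                 r->decimal_sep, cents % 100);
    else
        snprintf(out, cap, "%s%c%02ld %s", grouped, r->decimal_sep,
                 cents % 100, symbol_for(code, r));
}
```

With 123459 cents and `REGION_SE`, a NULL code yields “1 234,59 kr” while "USD" yields “1 234,59 US$”. The region decides separators and symbol placement; only the caller knows the currency.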

Testing

Testing this is dead easy. You don’t even have to abandon your beloved mother tongue! Just change your region settings to something different. Swedish in Sweden has triggered enough localization mistakes for me, but I’m sure many other locales get messed up just as readily.

Changing languages is awkward – iOS does this semi-reboot thing, and you have to restart Mac apps to get them to pick up the change – but region settings can be changed in a jif and changed back just as easily, without forcing you to wander through menus in a language you might not understand. So switching regions is the least you can do to test how your localization – even inadvertent localization – might be wrong.

Change your region setting like so:

iOS: Settings > General > International > Region Format > Swedish > Sweden
Mac: Settings > Language and Text > Region > Swedish > Sweden

On the Mac, you’ll probably need to tick the “Show all regions” box before you can choose a region that doesn’t use your top language choice.

Sum-Up

Distinguish between localizing only the presentation of an amount in a unit and localizing both the presentation and the unit.
- Compare 1 mile versus 1.6 kilometers: does using “1 km” instead of “1 mi” in a metric-using region really represent what you meant to communicate?
Declare the currency code to your currency-style number formatters.
```
[numberFormatter setCurrencyCode:@"USD"];
```
Test your app in different locales.
- Just because you’re not localizing doesn’t mean system components you’re using aren’t doing it wrong for you.

Toll-Free __bridging

Tue, 29 Jan 2013 00:00:00 +0000

Starting with OS X 10.6, you can use the __attribute__ keyword to specify that a Core Foundation property should be treated like an Objective-C object for memory management:

@property(retain) __attribute__((NSObject)) CTFrameRef frame;

This is an easy attribute to miss. It’s also one you can go a long time without finding, because it’s not hard to work around.

(Do note that you can use the NSObject attribute with anything where you’d use CFRetain/CFRelease, not just actual toll-free bridged objects. The toll you’re dodging with __attribute__((NSObject)) is purely syntactic.)

Work Arounds

You can get surprisingly far by just pretending that a CTFrameRef is an id:

@interface MyClass
@property(strong, nonatomic) id frame;  /* CTFrameRef */
@end

__bridge

You just have to sprinkle casts in the appropriate places:

CTFrameRef frame = CTFrameCreate…
self.frame = (__bridge id)frame;
CFRelease(frame);

__bridge_transfer

You can even use the casts to save you a line of code here and there:

CTFrameRef frame = CTFrameCreate…
self.frame = (__bridge_transfer id)frame;
/* ARC now has ownership of |frame|, so it is responsible for releasing it. */

CFBridgingRelease

Or perhaps use one of the less underscore-y Core Foundation wrappers:

CTFrameRef frame = CTFrameCreate…
self.frame = CFBridgingRelease(frame);
/* Now your Create-rule-trained brain can rest easy, because there’s a balancing Release. */

Macros

But I find the casts clutter up my code, and CF memory management is not bad in small doses, so I used to use macros:

#define $ID (__bridge id)
#define $CF (__bridge void *)

The $CF macro exploits C’s willingness to coerce void * the way $ID expresses id’s willingness to be coerced. That breaks down under Obj-C++, because C++ is not so willing to coerce, so you end up doing something like this instead:

#define $CF(var, obj) ((__bridge __typeof__((var)))(obj))

This ends up working OK, because you tend to assign to a variable of the Core Foundation type, make the cast once there, and then use that CF-typed var throughout the next bit of code:

CTFrameRef frame = $CF(frame, self.frame);
/* do something with |frame| */

Just Use __attribute__((NSObject)) Already!

But I could have saved myself all that mess had I just used __attribute__((NSObject)). Aren’t attributes a wonderful thing?

//cc -g -c -Weverything -Wno-objc-missing-property-synthesis attribute_nsobject.m
/* @file attribute_nsobject.m
 * @author Jeremy W. Sherman
 * @date 2013-01-29
 *
 * Demonstrates the wondrous simplicity of `__attribute__((NSObject))`.
 */
#import <Foundation/Foundation.h>
#import <CoreText/CoreText.h>

@interface MyClass : NSObject
@property(strong, nonatomic) __attribute__((NSObject)) CTFrameRef frame;
@end

/* Look ma, no casts! */
@implementation MyClass
- (void)storeFrame
{
    CTFrameRef frame = NULL;
    self.frame = frame;
}

- (void)loadFrame
{
    CTFrameRef frame __unused = self.frame;
}
@end

ETA: Version Concerns

Justin Spahr-Summers points out via Twitter that this story used not to have such a happy ending. Some member of the clang/objc/ARC juggling act used to fail to retain nonatomic properties. The short tale is documented in a Stack Overflow thread and has been reported as rdar://problem/11040306.

Good news: As of Xcode 4.6 and OS X 10.8.2 (which are what I have on hand to test with), the issue seems to be fixed. The compiler generates a call to objc_setProperty_nonatomic which will objc_retain the new value as expected.

The _nonatomic variant doesn’t seem to exist in my copy of 10.7.1’s objc4-493.9, so from where I’m sitting, this looks to have been fixed in part by an SPI change.

This appears to be an undocumented change affecting only Apple clang as of this time. It also seems that the fix will only work for this property-focused usage pattern; if you need a generic instruction to the compiler to use full ARC semantics for pointers of a certain type, you’ll still have to create a typedef to attach the type info to.

The ARC reference documentation continues to specify that only typedefs can be annotated to create a retainable object pointer type, and the open-source version of clang (as of r173899) still tests for this, and, per the implementation in lib/AST/Type.cpp of Type::isObjCNSObjectType(), this still seems to be the case:

bool Type::isObjCNSObjectType() const {
  if (const TypedefType *typedefType = dyn_cast<TypedefType>(this))
    return typedefType->getDecl()->hasAttr<ObjCNSObjectAttr>();
  return false;
}
bool Type::isObjCRetainableType() const {
  return isObjCObjectPointerType() ||
         isBlockPointerType() ||
         isObjCNSObjectType();
}

The Otherwise Operator

Mon, 21 Jan 2013 00:00:00 +0000

GNU C adds a binary operator ?:. Use it to fall back to a default value when a nil check fails:

id target = [self.delegate target] ?: [self.class defaultTarget];

The GCC docs present the binary ?: operator as eliding a repeated first term when using the ternary conditional operator, so

x ? x : y

can now be written

x ?   : y

and have the same effect as the full form, save that the first term, x, is only evaluated once.

From this point of view, the binary ?: exists to avoid unwanted side effects:

int z = (x++) ? (x++) : y;  // bad news
int w = (x++) ?       : y;  // OK!

But thinking of ?: as a special-purpose variant of the ternary operator misses its true calling: cleaning up nil and NULL checks. It compacts several lines of code:

id target = [self.delegate target];
if (!target) {
    target = [self.class defaultTarget];
}

down to a one-liner:

id target = [self.delegate target] ?: [self.class defaultTarget];

So: The “otherwise” – or “if nil then” – operator: ?:. Use it.
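The same idiom works in plain GNU C with pointers. A minimal sketch, assuming a compiler that accepts the extension (GCC and clang both do); `lookup_name`, `name_or_default`, and the `lookup_calls` counter are hypothetical and exist only to show that the first operand is evaluated once.

```c
#include <assert.h>
#include <stddef.h>

static int lookup_calls = 0;   /* how many times the lookup has run */

/* Hypothetical lookup that may come back empty (NULL). */
static const char *lookup_name(const char *configured) {
    lookup_calls++;
    return configured;
}

/* GNU C extension: `a ?: b` yields a when a is non-null, otherwise b,
 * and evaluates a exactly once. */
static const char *name_or_default(const char *configured) {
    const char *name = lookup_name(configured) ?: "anonymous";
    assert(name != NULL);   /* the fallback guarantees a usable value */
    return name;
}
```

Each call to `name_or_default` bumps `lookup_calls` by exactly one, whether or not the fallback is taken.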

Queue-Specific Data

Sat, 19 Jan 2013 00:00:00 +0000

GCD has a queue-specific storage API accessed using dispatch_queue_{set,get}_specific. This replaces the thread-specific storage provided by pthread_{set,get}specific that you cannot use with GCD blocks:

static void *sQueueKey_Client = "client";
struct my_client *client = calloc(1, sizeof(*client));
*client = (struct my_client){ .val = 1 };
/* use the unique static address as the key,
 * not the address of the string itself */
dispatch_queue_set_specific(q, &sQueueKey_Client, client, free);
dispatch_async(q, ^{
    struct my_client *client = dispatch_queue_get_specific(q, &sQueueKey_Client);
    DoStuffWith(client);
});

Only there’s one new addition to the family: dispatch_get_specific looks up the value in the current context defined by the current queue. This context is broader than the single queue that dispatch_queue_get_specific will search. If a key is not set on the current queue, it will check that queue’s target queue. If it’s not found on that queue, it will move down the line to that queue’s target queue:

dispatch_queue_t io_q = dispatch_queue_create("client_io_queue", 0);
dispatch_set_target_queue(io_q, q);
dispatch_async(io_q, ^{
    /* This will check the current queue (io_q), fail to find
     * the key, then check the target queue (q) and find it. */
    struct my_client *client = dispatch_get_specific(&sQueueKey_Client);
    SendMessage(client);
});

Queue-specific value lookup sounds a lot like chasing the prototype chain in a prototypal object system like JavaScript. In Obj-C, it echoes how method implementation search runs up the inheritance chain to find an implementation for a given message.
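The chain-walking lookup is easy to model in plain C. This is a toy stand-in, not libdispatch’s implementation: `struct toy_queue`, `toy_queue_get_specific`, and `toy_get_specific` are hypothetical, and each toy queue holds just one key/value slot.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for a dispatch queue: one key/value slot plus a target. */
struct toy_queue {
    const void *key;            /* NULL means the slot is empty */
    void *value;
    struct toy_queue *target;   /* next queue to consult, or NULL */
};

/* Modeled on dispatch_queue_get_specific: consult this queue only. */
static void *toy_queue_get_specific(const struct toy_queue *q,
                                    const void *key) {
    assert(key != NULL);
    return (q->key == key) ? q->value : NULL;
}

/* Modeled on dispatch_get_specific: start at the current queue and
 * follow target links until the key turns up or the chain ends. */
static void *toy_get_specific(const struct toy_queue *current,
                              const void *key) {
    assert(key != NULL);
    for (const struct toy_queue *q = current; q != NULL; q = q->target)
        if (q->key == key) return q->value;
    return NULL;
}
```

Set a key on `q`, point `io_q.target` at `q`, and the single-queue lookup on `io_q` misses while the chain lookup finds the value, much like a property found on a prototype.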

It turns out you can abuse this to transform dispatch queue value lookup into the heart of a prototypal object system embedded within Objective-C – where it’s not terribly useful, because Obj-C already has its own object system – or C, where it could be an improvement over hand-writing OOP in C.

I wrote a small, ugly demo of this. It’s available from GitHub as jeremy-w/demo-draft. As it stands, it’s certainly not an improvement over hand-written C OOP, but it did prove an interesting exercise.

Using debugDescription with GCD and XPC objects

Tue, 08 Jan 2013 00:00:00 +0000

dispatch_debug and xpc_copy_description are inconvenient, particularly during impromptu debugging.

Mountain Lion’s Obj-C-ification of GCD and XPC objects lets you use your comfortable Obj-C tools:

NSLog with %@,
the debugDescription method, and
po obj while in the debugger.

dispatch_debug

If you first learned GCD back before Mountain Lion, you might have played around with the dispatch_debug function:

void
dispatch_debug(dispatch_object_t object, const char *message, ...);

This function is the NSLog of Grand Central Dispatch land. If you need to pin down what exactly is going on with a complex network of dispatch objects, this can be a useful tool, especially since you can use the libdispatch source code to illuminate the more cryptic debug info.

But it’s also kind of annoying: unless you remembered to set LIBDISPATCH_LOG=stderr in the environment before starting your process, you’ll have to watch the system log for your dispatch_debug output; it won’t show up in Xcode’s debug console.

Changing the value of the environment variable after startup also doesn’t seem to affect dispatch_debug's behavior, so by the time you realize you’ve forgotten to set this environment variable, it’s already too late.

xpc_copy_description

If you wanted to log information about an XPC object without leaking, you used to have to xpc_copy_description, log the string, then free the returned pointer when you’re done with it:

char *desc = xpc_copy_description(obj);
NSLog(@"%s: xpc obj %p %s", __func__, obj, desc);
free(desc);

debugDescription

Well, good news: As of Mountain Lion, GCD and XPC objects are all also NSObjects, so you can use them as the target for the %@ format specifier and as the target for the -debugDescription instance method. The latter dumps all the information you used to get from dispatch_debug.

As an example:

2013-01-08 23:53:13.451 debug[80089:707] dispatch queue: description:
<OS_dispatch_queue: com.jeremywsherman.demo[0x7f8980c07f80]>
2013-01-08 23:53:13.453 debug[80089:707] dispatch queue: debugDescription:
<OS_dispatch_queue: com.jeremywsherman.demo[0x7f8980c07f80] = {
    xrefcnt = 0x2, refcnt = 0x1, suspend_cnt = 0x0, locked = 0,
    target = com.apple.root.default-priority[0x7fff72c47d00],
    width = 0x7fffffff, running = 0x0, barrier = 0 }>

XPC objects are pretty verbose even with description, but you get a bit – sometimes quite a bit – more info if you send debugDescription:

2013-01-08 23:53:13.453 debug[80089:707] xpc connection: description:
<OS_xpc_connection: <connection: 0x7f8980e017f0> {
    name = com.jeremywsherman.conn, listener = false,
    PID = 0, EUID = 4294967295,
    EGID = 4294967295, ASID = 4294967295 }>
2013-01-08 23:53:13.454 debug[80089:707] xpc connection: debugDescription:
<OS_xpc_connection: connection[0x7f8980e017f0]: {
    refcnt = 1, xrefcnt = 2,
    name = com.jeremywsherman.conn, type = named, state = new,
    queue = 0x7f8980e00420->0x0, error = 0x0, mach = false,
    privileged = false, bssendp = 0x0, recvp = 0x0, sendp = 0x0,
    pid/euid/egid/asid = 0/4294967295/4294967295/4294967295 }
    <connection: 0x7f8980e017f0> {
    name = com.jeremywsherman.conn, listener = false,
    PID = 0, EUID = 4294967295, EGID = 4294967295, ASID = 4294967295 }>

2013-01-08 23:53:13.454 debug[80089:707] xpc bool: description:
<OS_xpc_bool: <bool: 0x7fff7244d320>: true>
2013-01-08 23:53:13.455 debug[80089:707] xpc bool: debugDescription:
<OS_xpc_bool: bool[0x7fff7244d320]: {
    refcnt = 80000000, xrefcnt = 80000000, value = true }
    <bool: 0x7fff7244d320>: true>

The odd trailer to the XPC objects’ debug descriptions is not a typo – the XPC objects really do include their regular description as a component of their debug description.

print-object (po)

debugDescription also happens to be what gets printed when you print-object (or po for short) an object while debugging.

Treating a GCD/XPC object as a regular Objective-C object is particularly handy during impromptu debugging, since you no longer need to futz about with dispatch_debug and xpc_copy_description.

Instead, just use po obj when debugging:

% lldb ./debug
(lldb) Current executable set to './debug' (x86_64).
b debug.m:34
breakpoint set --file 'debug.m' --line 34
Breakpoint created: 1: file ='debug.m', line = 34, locations = 1
(lldb) r
Process 80337 launched: '/Users/jeremy/Documents/Blog/GCDTips/debug' (x86_64)
Process 80337 stopped
* thread #1: tid = 0x1c03, 0x0000000100000db7 debug`main + 135
  at debug.m:34, stop reason = breakpoint 1.1
    frame #0: 0x0000000100000db7 debug`main + 135 at debug.m:34
   31           Log(@"xpc connection", conn);
   32
   33           xpc_object_t pred = xpc_bool_create(true);
-> 34           Log(@"xpc bool", pred);
   35       }
   36       return 0;
   37   }
(lldb) fr var
(dispatch_queue_t) q = 0x0000000100107fa0
(xpc_connection_t) conn = 0x0000000100400830
(xpc_object_t) pred = 0x00007fff7244d320
(lldb) po q
(dispatch_queue_t) $0 = 0x0000000100107fa0 <OS_dispatch_queue:
com.jeremywsherman.demo[0x100107fa0] = { xrefcnt = 0x1, refcnt = 0x2,
suspend_cnt = 0x0, locked = 0, target =
com.apple.root.default-priority[0x7fff72c47d00], width = 0x7fffffff,
running = 0x0, barrier = 0 }>
(lldb) po conn
(xpc_connection_t) $1 = 0x0000000100400830 <OS_xpc_connection:
connection[0x100400830]: { refcnt = 1, xrefcnt = 1,
name = com.jeremywsherman.conn, type = named, state = new,
queue = 0x100400530->0x0, error = 0x0, mach = false, privileged = false,
bssendp = 0x0, recvp = 0x0, sendp = 0x0,
pid/euid/egid/asid = 0/4294967295/4294967295/4294967295 }
<connection: 0x100400830> { name = com.jeremywsherman.conn,
listener = false, PID = 0, EUID = 4294967295, EGID = 4294967295,
ASID = 4294967295 }>
(lldb) po pred
(xpc_object_t) $2 = 0x00007fff7244d320 <OS_xpc_bool: bool[0x7fff7244d320]:
{ refcnt = 80000000, xrefcnt = 80000000, value = true } <bool:
0x7fff7244d320>: true>

Introducing OS Object

Tue, 01 Jan 2013 00:00:00 +0000

GCD came along with 10.6 and made concurrent programming easy. ARC came along with 10.7 and let us mostly forget about this whole refcounting business.

But 10.7’s GCD was left behind in manual retain-release land. (XPC was too, but GCD is our hero this time.) 10.8 fixed that oversight via a clever hack hidden away in <os/object.h>.

Behold, I bring you an object!

The magic happens in the interaction between two macros, OS_OBJECT_DECL and OS_OBJECT_DECL_SUBCLASS.

OS_OBJECT_DECL is used to declare the base object type of your refcounted C library. It conceptually creates a new root class:

OS_OBJECT_DECL(dispatch_object);

Once you’ve declared a root class using OS_OBJECT_DECL, you use OS_OBJECT_DECL_SUBCLASS to declare new subclasses:

OS_OBJECT_DECL_SUBCLASS(dispatch_queue, dispatch_object);
OS_OBJECT_DECL_SUBCLASS(dispatch_source, dispatch_object);

And magically, you now have types dispatch_object_t, dispatch_queue_t, and dispatch_source_t.

Presto-Changeo

As far as casts are concerned, these new types behave just like NSObject, NSString, and NSNumber. If you declare variables like so:

NSObject *o;
NSNumber *n;
NSString *s;

The compiler will allow you to implicitly upcast without complaint:

/* hunky dory */
o = n;
o = s;

but not down or crosswise:

dispatch_cast.m:13:4: warning: incompatible pointer types
assigning to 'NSNumber *__strong' from 'NSObject *__strong'
    [-Wincompatible-pointer-types]
        n = o;
          ^ ~
dispatch_cast.m:14:4: warning: incompatible pointer types
assigning to 'NSNumber *__strong' from 'NSString *__strong'
    [-Wincompatible-pointer-types]
        n = s;
          ^ ~

Similarly, with these declarations:

dispatch_object_t o;
dispatch_queue_t q;
dispatch_source_t s;

This is fine:

/* hunky dory */
o = q;
o = s;

But this is not:

q = o;
q = s;

The error messages hint at how this is implemented:

dispatch_cast.m:27:4: warning: incompatible pointer types
assigning to '__strong dispatch_queue_t'
(aka 'NSObject<OS_dispatch_queue> *__strong')
from '__strong dispatch_object_t'
(aka 'NSObject<OS_dispatch_object> *__strong')
    [-Wincompatible-pointer-types]
        q = o;
          ^ ~
dispatch_cast.m:28:4: warning: incompatible pointer types
assigning to '__strong dispatch_queue_t'
(aka 'NSObject<OS_dispatch_queue> *__strong')
from '__strong dispatch_source_t'
(aka 'NSObject<OS_dispatch_source> *__strong')
    [-Wincompatible-pointer-types]
        q = s;
          ^ ~

OSObject Is Protocols!

And that’s the trick, you see. There aren’t any classes, just protocols. Because protocols can be declared as conforming to other protocols, we have a protocol hierarchy parallel to our class hierarchy. By using a protocol-qualified type – NSObject<OS_dispatch_queue> * meaning, “Any NSObject, so long as it conforms to OS_dispatch_queue” – we can make our hierarchy concrete in terms of which OS objects can be pointed at by which pointers.

Why NSObject and not id? Because ARC needs to be able to use retain/release/autorelease, and NSObject provides a convenient declaration of those and other methods.

Or Is It?

Of course, there would have to be more to this OSObject thing than just protocols for ARC to work: whatever type-level hackery you might perpetrate, the message send [pointer retain] is only going to work if the whole Objective-C message send machinery can use what’s at *pointer as an Obj-C object.

Consequently, things look a lot different from inside libdispatch. There are covert class interfaces and corresponding implementations that go along with the public protocols.

A shame you can’t just sprinkle a few macros over a C library that uses refcounting and have it work automagically with ARC. Now, there’s a thought…

Reconsidering +new

Sat, 08 Sep 2012 00:00:00 +0000

Received wisdom teaches that Objective-C’s alloc-init two-step is important for both clarity and extensibility. And even if those two reasons don’t sway you, it’s both childish and déclassé to continue using +new past a certain programmer-age.

Or is it?

Yes, creating a new object requires allocating its storage and then initializing it. But they’re not really distinct any more. It’s not like we do:

id obj = [Foo alloc];
if (!obj) error("allocation failed");
obj = [obj init];
if (!obj) error("init failed");

And zones are dead, so separating alloc and init so you can do:

id obj = [[Foo allocWithZone:fooZone] init];

doesn’t really matter any more, either.

And it breaks down even further when you look at Core Foundation analogs. There is no CFAlloc() followed by CFArrayInit(). Core Foundation just has Create functions that take an allocator to handle the “different zones” concern. Normally, you just pass NULL or kCFAllocatorDefault for the allocator argument, but either allocators have better support than zones at this time, or Apple just doesn’t care enough to write “don’t use allocators any more” anywhere.

Since these are equivalent:

CFMutableArrayRef array = CFArrayCreateMutable(
    NULL, 0, &kCFTypeArrayCallbacks);
NSMutableArray *array = [[NSMutableArray alloc] init];

I see no reason not to just do a “single call” alloc-init in Foundation-land, too:

NSMutableArray *array = [NSMutableArray new];

This is for the common case. When you need to pass args in during construction, back to alloc-initWith… it is!

Aside: As a practical motivation for +new, when I’m throwing together a quick commandline program to see whether something behaves one way or another, [Blah new] types a lot faster than [[Blah alloc] init], particularly if I forget the double-bracket at the start and have to back up and fix it.

Aside 2: Many of the Foundation types let you get away with a compromise, like [NSMutableArray array]. In ARC-land, this is effectively no different than writing [NSMutableArray new] (what if you later need an arrayWithObjects:! what if you later need to allocate it in a different zone!), but I never see anyone inveighing against +array, or +string, or +dictionary. So.

Migrating to Obj-C Literals

Mon, 27 Aug 2012 00:00:00 +0000

Obj-C literals make your code cleaner and more compact, but hand-updating a large codebase to take advantage of Obj-C literals would be a bore, and all too easy to mess up during a distracted moment.

This is what automated refactoring tools were designed for. And Apple has provided us with an oft-overlooked arrow in our devtools quiver that’s just what we need here: tops.

Check out man tops. The tool has a decent understanding of Obj-C syntax and accepts scripts that let you rewrite code to use new method calls, new functions, and what-have-you. The examples make it look like this tool was invented to ease the transition from NeXT-style Obj-C to Cocoa, like this gem:

replace "NXGetNamedObject(<b args>)" with same
    error "ApplicationConversion: NXGetNamedObject() is obsolete.
           Replace with nib file outlets."

That should take some of you way back.

Anyway, with this tool, modernizing your code can be as simple as:

tops -semiverbose -scriptfile literals.tops **/*.(h|m|hpp|mm)

Want to check that it will do the right thing? Throw -dont into the args.

Want to watch over its shoulder as it rewrites your code? Replace -semiverbose with straight-up -verbose.

Now, for that magical script file:

And here’s an Obj-C file to test it against:

Prefix Notation

Sat, 04 Aug 2012 00:00:00 +0000

I don’t believe your spoken language syntax order has any bearing on whether you find Lisp prefix notation hard.

Consider thesis 2 of Tim Bray’s “Eleven Theses on Clojure”:

In school, we all learn 3 + 4 = 7 and then sin(π/2) = 1 and then many of us speak languages with infix verbs. So Lisp is fighting uphill.

I call bunkum.

I don’t want to single out Tim Bray here. I’ve seen this in other places before. It’s a popular folk explanation. But his is the straw that broke the camel’s back.

Folks often reach for natural language or arithmetic notation to explain why Lisp prefix notation is golly gee so hard. The argument goes like this:

I speak a language with syntactic order subject-verb-object, or SVO for short, which is like infix notation.
I write arithmetic using infix notation.
Lisp uses prefix notation, which is not infix notation.
Therefore, Lisp is hard to read.

The heart of the argument is that a mismatch between spoken-language sentence order and Lisp syntactic form order makes reading Lisp hard.

But if we remove the mismatch, does Lisp get any easier? Let’s see:

I speak a language with syntactic order verb-subject-object, which is like prefix notation.
Lisp uses prefix notation, which is the same as my language’s!
Therefore, Lisp is:
- Hard?
- Easy?
- Error: Invalid Deduction?

Does speaking Gaelic condemn you to unassuageable puzzlement at infix notation?

Does speaking German grant a supernatural facility with postfix notation?

Are Finns left out in the cold, waiting for an appropriately agglutinative programming language?

Are French speakers flocking to Linotte?

I think not.

Visualizing Consequences

Sun, 29 Jul 2012 00:00:00 +0000

The ability to visualize the consequences of the actions under consideration is crucial to becoming an expert programmer, just as it is in any synthetic, creative activity. (SICP 1.2)

There’s a larval Big Nerd Ranch reading group, and it has me reading through Ye Olde Wizard Book, Structure and Interpretation of Computer Programs. I’m pretty early yet in the text, and just happened upon the quote you find up top there starting this post.

You could blow by this pretty fast on your way to some deep wizardry.

Don’t.

This ability to stare deep into and through a line of code and watch the clockwork wheels spin is key to mastering the craft of programming. It’s what separates those who understand what their code is doing from those who continue to view the operations of their compiler or interpreter as a mystery concealed behind a veil impenetrable by mortal eyne.

Let me give it to you straight: There ain’t no deep black magic here. There is in fact nothing more quotidian than the process that takes a line of code and translates it through layer upon layer of lengthy and tedious documentation into something that ultimately can be executed by the materia technologica sitting there upon your desk. Or your lap. Or held in the palm of your hand. Form factor changes; number and names of layers change; ultimate lack of magic does not.

Don’t lie to yourself that all that happens between make and ./a.out is impenetrable. It’s all there, waiting for you. It’s a long and well-trod path. Don’t turn away from it: put one foot in front of the other, work your way down one more layer of abstraction, and start to see how the sausage is made.

Behavioral Programming

Mon, 23 Jul 2012 00:00:00 +0000

I recently read an exciting article by Harel et al. on behavioral programming.

The basic idea of behavioral programming is to compose a bunch of simultaneously executing state machines. Each machine represents a behavior.

But you don’t use the plain event in/event out state chart to define these machines. Instead, you add modal operators to specify what must/may/mustn’t happen next:

must: Please carry out this action.
may: Please let me know when this happens.
mustn’t: No matter who tells you to do this, don’t; if it does happen, abend.

These operators represent a synchronization point between behaviors. Once all behavioral threads, or “bthreads” for short, have blocked expressing a modal preference, the executive picks a must-event that’s not blocked by a mustn’t operator, carries out the event, and notifies any bthread that had that event listed as a must or may event. Execution then continues till every bthread blocks again by specifying a modal operator.
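To make that synchronization loop concrete, here is a minimal sketch of an executive. It is my own toy, not the article's implementation: it swaps pthreads for Python generators, each bthread yields a dict with hypothetical "must", "may", and "mustnt" keys, and a blocked event is simply never selected rather than causing an abend. The hot/cold tap scenario and every name in it are illustrative assumptions.

```python
def run(bthreads):
    """Toy executive: returns the trace of events it carried out."""
    live = []
    for bt in bthreads:
        try:
            live.append((bt, next(bt)))  # run each bthread to its first sync point
        except StopIteration:
            pass  # this bthread had nothing to say
    trace = []
    while True:
        # Gather every event that some bthread currently forbids.
        blocked = set()
        for _, stmt in live:
            blocked |= set(stmt.get("mustnt", ()))
        # Pick the first must-event nobody forbids (list order acts as priority).
        chosen = None
        for _, stmt in live:
            for event in stmt.get("must", ()):
                if event not in blocked:
                    chosen = event
                    break
            if chosen is not None:
                break
        if chosen is None:
            return trace  # nothing selectable: we are done
        trace.append(chosen)
        # Wake only the bthreads that listed the event as must or may.
        next_live = []
        for bt, stmt in live:
            if chosen in stmt.get("must", ()) or chosen in stmt.get("may", ()):
                try:
                    next_live.append((bt, next(bt)))
                except StopIteration:
                    pass  # bthread finished
            else:
                next_live.append((bt, stmt))  # still blocked at the same point
        live = next_live


def add_hot():
    for _ in range(3):
        yield {"must": ("hot",)}


def add_cold():
    for _ in range(3):
        yield {"must": ("cold",)}


def interleave():
    # Forces strict alternation without touching the other two behaviors.
    while True:
        yield {"may": ("hot",), "mustnt": ("cold",)}
        yield {"may": ("cold",), "mustnt": ("hot",)}


print(run([add_hot(), add_cold(), interleave()]))
# prints ['hot', 'cold', 'hot', 'cold', 'hot', 'cold']
```

Note how the alternation rule lives entirely in its own bthread: neither tap behavior knows the other exists, which is the compositional payoff the model is after.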

I like this model because it’s a thin but powerful layer over existing models. You can implement it with pthreads and synchronous pub/sub, which could be as simple as a group of pthread conditions.

The development approach you end up with differs markedly from the model you build your bthread framework in.

Here’s the powerful part of the bthread layer: Decomposing your problem into bthreads frees you up to pick a set of coordinating events, then start coding scenarios around those events. Fire up your program, see how it does, and then fix any bugs that testing/simulation/model checking shows up and go again.

The state-machine-ness also lets you react to entire event traces in order to handle things like a “win rather than defend” strategy in tic-tac-toe, which is one of the basic examples given in the article.

There’s room for plenty of cleverness in how the executive selects the next event and how to test and check bthread programs, but the core idea is elegant and exciting. The full article is worth a read.

David Harel, Assaf Marron, Gera Weiss. Behavioral Programming. Communications of the ACM, Vol. 55 No. 7, Pages 90-100. doi 10.1145/2209249.2209270. Retrieved 2012-07-23.

What's this about @import?

Wed, 29 Feb 2012 00:00:00 +0000

So today in the company IRC channel my illustrious colleague Mark Dalrymple (of Advanced Mac OS X Programming fame) mentioned this new-fangled @import compiler directive. News of this compiler directive appears to be spreading through the Objective-C developer community mostly by way of Twitter-pigeon.

As it happened, I had not heard of @import. But then the inimitable Mikey Ward (alias: Wookiee) asked me about it. Two persons independently inquiring? Now I had to look into it.

It appears modules are filtering into Objective-C by way of C++, the same way Objective-C is rumored to be inheriting you-pick-the-base-type enums from C++TNG. Only this time the feature isn’t part of any standard.

I get this idea from an exchange on the cfe-dev mailing list in late December, which I have condensed into a single apocryphal message:

If you check recent (the last 6 months or so) commits to clang by Doug Gregor, you’ll find some work to implement C++ modules is already underway.

I’m not sure how much it’s based on any specific proposal.

To misquote Doug [Gregor] (can’t find the email, I think it might’ve been on IRC): “The semantics are obvious enough, so I’m implementing those. After that we can haggle over the syntax”

(In case it’s driving you crazy, “cfe” is short for “c/clang front-end”, which is all the clang tool you use from the commandline is: a driver for a whole mess of surprisingly unmessy library code.)

At this point, Doug was kind enough to chime in:

Most of the work I’m doing is in three places. The Serialization module, which takes care of serializing/deserializing an already-parsed AST, is the hardest part: it’s the infrastructure that allows one to compile a module on its own, storing the serialized AST to disk, and then load that module into another translation unit later on. This part is likely to be the same regardless of how modules behave. [Clang will produce and cache module AST files on the fly. Authors and build systems will remain ignorant of these AST files.]

The module map part of the Lex module handles the mapping between headers and modules. It’s mainly a transitional a little sub-language that allows one to describe the relationships between headers (which are used everywhere today) and modules.

The easy part is the parsing of module imports, labeling what is exported/hidden, and name-lookup semantics. It’s also the part that people will want to discuss endlessly, so for now the various keywords are uglified so that we don’t commit to any one syntax.

Once you chase on down through the code, you find yourself at the abstract syntax tree level staring at the ImportDecl class. What does it do? Well,

[it] describes a module import declaration, which makes the contents of the named module visible in the current translation unit. An import declaration imports the named module (or submodule). For example: @import std.vector; Import declarations can also be implicitly generated from #include/#import directives.

That’s right: #include/#import are going to become legacy syntax for this nifty new modules system. Then, instead of playing “find the header that includes the symbol you want to use”, you will be able to basically just import the functionality directly.

The actual mapping from module name to file is handled by the ModuleLoader. Judging by the tests, there’s going to be a way to explicitly manage this mapping using module stanzas in a ModuleMap:

module category_left {
  header "category_right.h"
  export category_top
}

You also get another way to manage symbol visibility via export directives in the module stanzas, as you can see there.

Possibly the awesomest future visibility control is that over those pesky preprocessor macros. That’s right: the proposed syntax is

#define MODULE_H_MACRO 1
#define MODULE_H_PUBMACRO 2
#__private_macro MODULE_H_MACRO
#__public_macro MODULE_H_PUBMACRO

The private/public macro preprocessor directives update the visibility of the named macro. If you have multiple macros, you have to issue multiple directives – there’s no support for privatizing multiple macros with something like:

#__private_macro XYZZY PLUGH PLOVER  /* <-- THIS DOES NOT WORK */

I haven’t the faintest notion when we’ll see this live and slaving away under Xcode, but I am looking forward to the coming sleeker, faster import process.

References galore, from top to bottom:

Homepage of the fellow behind much of this: Doug Gregor.
Commit switching from __import_module__ to @import syntax.
Declaration of the modules language option as a C (not C++) extension.
Preprocessor lexer support. Search for LexAfterModuleImport.
Parsing support for the syntax. Search for ParseModuleImport.
Semantic implementation. Search for ActOnModuleImport.
ModuleLoader declaration.
Tests exercising the upcoming modules extension.

Memory Allocation in `sam`

Tue, 28 Feb 2012 00:00:00 +0000

Bad news, bucko. Malloc is merely “adequate.” And it’s only adequate if you’re writing simple programs. Real programmers write their own memory manager. It’s the first thing they do after they ditch their shaving kit and start growing their Samson neckbeard.

Don’t believe me? Listen up:

The C language has no memory allocation primitives, although a standard library routine, malloc, provides adequate service for simple programs. For specific uses, however, it can be better to write a custom allocator.

Rob Pike, The Text Editor sam (1987)

Sam was just some text editor from the 80s, and its memory management was way more rocking than your Twitter client’s will ever be.

Sam memory management was so rocking that it filled two arenas. That’s right: two arenas. Your memory management needs are insignificant, puny, and plebeian, serviced adequately by the C standard library. Sam got true rockstar treatment: two arenas; two custom allocators; high maintenance, premium memory management.

The first arena holds staid structs of fixed length. It’s filled first-fit. Nothing magic there.

The second arena holds variably sized objects like strings. In an editor, strings are always changing, growing, splitting, combining. A regular bunch of problem children. So it’s managed by a garbage-compacting allocator.

These arenas are erected side-by-side in memory, with the second arena getting the higher addresses. When the first-fit arena needs more space, it just bumps the compacting arena up in memory.

The real magic is how these two arenas are used together. Take for example a variable-length array. Sam handles this by creating a struct with a length and a pointer. The struct is allocated in the struct arena of course, but its pointer points into the compacting arena. The allocator knew to go back and rewrite the struct pointer whenever it moved its memory during compaction, and the programmer knew (or learned really fast the hard way) to always use the struct’s pointer field directly each time rather than caching it away somewhere.
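The handle trick is easier to see in miniature. What follows is a rough Python stand-in, not sam's C code: the Handle and CompactingArena classes are hypothetical names of mine, a bytearray plays the compacting arena, and ordinary Python objects play the fixed-size structs. It shows only the one idea that matters, which is that compaction rewrites the pointer stored in each struct, so a cached copy of that pointer goes stale.

```python
class Handle:
    """Stand-in for the fixed-size struct: a length plus a pointer (here, an offset)."""

    def __init__(self, offset, length):
        self.offset = offset
        self.length = length


class CompactingArena:
    def __init__(self):
        self.memory = bytearray()
        self.handles = []  # every live handle, so compaction can rewrite them

    def alloc(self, data):
        handle = Handle(len(self.memory), len(data))
        self.memory += data
        self.handles.append(handle)
        return handle

    def free(self, handle):
        self.handles.remove(handle)  # leaves a hole until the next compaction

    def read(self, handle):
        # Always go through the handle's current offset; never cache it.
        return bytes(self.memory[handle.offset:handle.offset + handle.length])

    def compact(self):
        packed = bytearray()
        for handle in sorted(self.handles, key=lambda h: h.offset):
            block = self.memory[handle.offset:handle.offset + handle.length]
            handle.offset = len(packed)  # rewrite the "pointer" to the new home
            packed += block
        self.memory = packed


arena = CompactingArena()
first = arena.alloc(b"hello")
second = arena.alloc(b"sam")
stale = second.offset      # caching the pointer: the thing you must not do
arena.free(first)
arena.compact()
print(stale, second.offset, arena.read(second))
# prints 5 0 b'sam'
```

The cached offset still says 5, but the bytes now live at 0. Reading through the handle keeps working; reading through the cached value would not.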

Now that’s some pretty boss hacking: elegant, but at what many today might consider an advanced, “don’t go there without a friend” low level.

P.S.: I would encourage you to check out sam's source to see how it’s done, but yesterday’s arenas are no more. The current source just calls malloc once for each allocation.

`sam`'s Structural Regular Expressions

Mon, 27 Feb 2012 00:00:00 +0000

This is the first post in a series of quotes from papers. These are the great turns of phrase, the intriguing idea I’ve run into nowhere else, the start of something that could have been great but probably fizzled.

First up: structural regular expressions, as introduced in the GUI text editor sam:

In other UNIX programs, regular expressions are used only for selection, as in the sam g command, never for extraction as in the x or y command. For example, patterns in awk are used to select lines to be operated on, but cannot be used to describe the format of the input text, or to handle newline-free text. The use of regular expressions to describe the structure of a piece of text rather than its contents, as in the x command, has been given a name: structural regular expressions. When they are composed, as in the above example, they are pleasantly expressive. Their use is discussed at greater length elsewhere.

Rob Pike, The Text Editor sam (1987)

x extracts every chunk of text matching the regex provided to it. Each chunk has the rest of the editing pipeline run on it. Want to change every n in a hunk of text to an m? Select it all in the window with button 1, focus the sam command window with button 1, and type in:

x/n/ c/m/

Hit return and this command pipeline runs on the (implicit range) dot, also known as “the current selection.” x grabs an n, c then changes it to an m. You can layer on more commands, including g (guard) as a by-the-way if statement. The command text stays in the command window in case you want to run it again.
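If it helps to see the composition spelled out, here is a loose Python analogy for how x, c, and g chain together. This is only a sketch under my own assumptions: the functions x, c, and g below are hypothetical helpers named after the sam commands, they use Python's re dialect rather than sam's, and they model the selection as a plain string instead of an address range in a file.

```python
import re


def x(pattern, command):
    """Like sam's x: run `command` on each match of `pattern`, splicing results back in."""
    regex = re.compile(pattern)

    def run(text):
        return regex.sub(lambda match: command(match.group(0)), text)

    return run


def c(replacement):
    """Like sam's c: replace the current selection wholesale."""
    return lambda selection: replacement


def g(pattern, command):
    """Like sam's g: run `command` only if the selection contains a match."""
    regex = re.compile(pattern)
    return lambda selection: command(selection) if regex.search(selection) else selection


# x/n/ c/m/
print(x("n", c("m"))("banana"))
# prints bamama

# Extract each line, guard on TODO, change the whole line.
print(repr(x(r"[^\n]+", g("TODO", c("done")))("TODO one\nkeep\n")))
# prints 'done\nkeep\n'
```

The outer x describes the structure (every n, or every line), and the inner commands only ever see one extracted chunk at a time.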

Boom! Instant macro, no memorization of registers required. Take that, vim qX…q @X.

This search for symbiosis between mouse and keyboard is what led to sam. Most UNIX editors bolt mouse input onto an established keyboard-centric paradigm. Sam rethinks editing to make the mouse an integral part of it. (Acme would later take this mouse integration to new heights. We’ll get to acme in time.)

Back to structural regular expressions now. Pike has a whole paper on the topic that I will doubtless get to eventually. But just this little bit is tantalizing enough.

I mean, think about awk, think about how you use regular expressions there, or how you use them in your editor du jour. Are these tools really making the most of regular expressions? awk and friends just perform record splitting on a set of separator characters. Imagine how limited your regexes would be if all you got to do was specify what to stick between two character class braces: [your characters here]{1,}. That’s all you get with this simple record separator construction.

And it’s not like we’ve made great strides: Search-and-replace in an IDE like Xcode or Eclipse gives you even less expressiveness.

I look forward to reading more about structural regular expressions in future. For more on sam, see:

Rob Pike’s paper introducing the editor. (Also available in PDF.)
sam at cat-v

Pasting HTML into Markdown

Wed, 08 Feb 2012 00:00:00 +0000

So I was writing a comment on Reddit today, and the easiest way to answer was to quote a list of search results, links and all. One problem: Reddit’s post interface uses Markdown, not HTML. That’s swell when you’re writing your comment fresh, but if you want to paste something in from a webpage, it’s no good.

I’d run into this once or twice before, but I always took the simple way out: just rewrite the one or two links in the text by hand. No big deal.

But these search results were just a list of links. And as a programmer, I am vocationally virtuously lazy.

That’s when I remembered Pandoc.

Pandoc is a tool for converting between markup languages. I grabbed it as a Swiss army knife alternative to the more questionable Markdown formatters out there. (Markdown’s reference implementation is in Perl. I have trouble regarding any Perl as anything but a fragile hack.) I actually used it the first time as part of avoiding wordprocessors: instead of emailing a PDF/Word doc/Pages doc (in order of increasing uckiness), I just write up a Markdown doc, format it into a standalone webpage, slap in some CSS, and email it off.

Veering back on course, I recalled it could be used not only from Markdown to HTML but from HTML to Markdown. And how to get pasted text from the browser into HTML? I didn’t want to muck with View Source, so I just opened up TextEdit and let its erstwhile annoying habit of preserving pasted formatting work to my benefit. Copy from Aurora, paste into TextEdit, save as HTML, then pandoc -f html -t markdown foo.html | pbcopy, back to Aurora, and paste, and beautiful Markdown appears.

Long story short: Use pandoc to convert HTML into Markdown for your Reddit or Stack Overflow or blogging needs.

The Artful Edit

Mon, 06 Feb 2012 00:00:00 +0000

Susan Bell’s The Artful Edit offers a brief but thorough introduction to editing your own writing.

The book is structured around three steps: gaining perspective, macro-editing, and micro-editing. Macro-editing addresses the structure of the work. It requires elucidating then shaping that structure and the characters and themes that build it. Micro-editing examines word choice, continuity, and other concerns at the level of the individual paragraph, sentence, and even word.

Each practical chapter ends with a bulleted summary and exercises. The summary frees you to focus on reading the book. Without it, you’d regularly interrupt your reading to scratch down notes. The exercises give concrete practices to improve your editing.

After each chapter comes an interlude wherein various authors reflect on writing and editing. These leaven the book’s didactic tone, but all are forgettable save the last, Michael Ondaatje’s “One Doesn’t Just Write a Book, One Makes a Book.”

The chapter on gaining perspective covers the usual approaches – bury the work to revisit later – and some unusual approaches – string your work across your study, step back till it’s just squiggles, and examine its topography.

The macro-editing chapter seamlessly blends literary criticism with instruction in the structural elements of writing. Before-and-after passages from The Great Gatsby demonstrate each element, while excerpts from letters between F. Scott Fitzgerald and his editor Max Perkins illustrate the editing process.

The micro-editing chapter tries to maintain the style of the macro-editing chapter but fails. It drags, and I was glad to move on.

The last two chapters turn from the mechanisms of editing to its variety and historical background.

Second to last is a chapter of interviews with authors and artists about their editing process. The story of Walter Murch editing the film The Conversation stands out. The director demanded a refrain repeat exactly the same throughout the film. He threw out one take because the actor’s accentuation differed. Murch decided against the director’s instructions to cut this take back in over the last seconds of the film. The different word stress recontextualized the refrain and so the film. The other interviews reinforce the variety of approaches to writing and editing, but none stays with you the way Murch’s will.

The last chapter recaps the role of the editor since ancient Rome. It ends with Robin Robertson editing Adam Thorpe’s Ulverton. The history entertained; the story behind Ulverton grabbed me. The men developed unconventional ways of editing this intricate work, including extensive color-coded diagrams tracking leitmotif, themes, and lineages across fictional centuries. Their cooperation parallels Fitzgerald and Perkins’, bringing the book back to where it started: the necessary pleasure of editing.

Enjoy a simpler ifMUD experience with this tintin++ script

Sat, 28 Jan 2012 00:00:00 +0000

I have posted a tintin++ script to simplify connecting to and interacting with ifMUD at https://gist.github.com/1692854. To use it, just chmod u+x tt-ifmud and execute the file from a terminal.

Script features:

Highlights dis/connects, exit directions, and start of item/player/exit list.
Highlights AFK and new message announcements.
Simplifies going AFK: just afk msg or away msg, and it will @away and zone you.
Defines the standard i and inv abbreviations for inventory.

Why tintin++? Process of elimination!

tinyfugue’s website had no useful information and was even more outdated than tintin++'s.
Savitar was shareware, and it’s looking long in the tooth these days.
The Atlantis interface was awkward, and I had trouble wrangling its flexible but underdocumented scripting interface to do what I wanted.
tintin++’s interface is simple enough that you can figure out how to do pretty much what you want by brute force.

Getting tintin++:

Mac OS X: brew install tintin (Fink and MacPorts users, you’re on your own.)
Other platforms: Likewise on your own.

I hope this saves some other ifMUD newbie some time. Those of you who haven’t checked out ifMUD, check it out!

Amazon-enhancing your library

Sun, 30 Oct 2011 16:00:17 +0000

You have a great Amazon wishlist. But Amazon books cost money, and you can’t read them till someone delivers them. Lame. I demand free and instant gratification!

Enter the library! But if your library website is anything like mine, the online browsing experience is nowhere near as awesome as Amazon’s. And who wants to laboriously search each and every title they’ve wished for on Amazon? There’s gotta be a better way to figure out which of your Amazon books you can get for free.

Enter Ruby! With a little prep work, you can fetch all the titles from Amazon, run them through your library’s search interface, and get some useful information:

Fetching Amazon wishlist… 39 titles found.
Searching library for 39 titles…
Found 9 titles with results:
- The Bicycling Guide to Complete Bicycle Maintenance & Repair: For Road & Mountain Bikes (2 results)
- The Weed Tree (2 results)
- Broken Bells (9 results)
- Capacity (24 results)
- Civilian (27 results)
- Cults (38 results)
- High Society (54 results)
- A History of Reading (89 results)
- Mirror (373 results)

I started with a wishlist of 39 items and winnowed it down to just 9 that the library might have. The frustrating part has been done for me: checking book after book only to find the library has absolutely nothing for 30 of them. Sure, I still have to check whether Enon’s High Society was actually among the 54 results for that search, but at least I know there are results. More hacking time could undoubtedly improve the false positive rate and add a “request this for me” feature, but as-is, this quick hack will pay off over and over.

Grab the source to “dekazon” (Dekalb county library district + Amazon) and use it as a starting point to hack something similar up for your library/college/home. Bringing datasets together is a beautiful thing.

Absolute and relative paths

Mon, 26 Sep 2011 06:00:38 +0000

Summary:

A path is a list of components (directory or file names) separated by a slash (/).
An absolute path starts from the root directory and works its way down: /A/B/file.
A relative path starts from some contextually-determined parent directory and works its way down from there: A/B/file.
In a terminal context, the default parent directory for a relative path is the present working directory, which you can print using the pwd command.

A path is how you refer to a file or directory. A path like /A/B/file is a concise description of how to find a file:

start at the system root directory /,
move to the A directory,
next move to the B directory,
and end with file.

As you can see, a path is just a list of components (A, B, file) separated by slashes (/).

/A/B/file is an absolute path. An absolute path is in truth relative to the filesystem root directory /. No matter the context, no matter what terminal you paste this path into, it will always specify the same path and so the same file.

A/B/file is a relative path. It tells you to move down two directories and end with the file named “file”. It doesn’t tell you how to get to two directories above file: a relative path relies on context to provide its starting point. When you pair it with that starting point, you resolve it into an absolute path.

The terminal understands relative paths as relative to its present working directory, just as a Finder window provides a context to situate the filenames displayed in it. When you double-click the Downloads folder in a window showing the contents of the /Users/Me folder, the Finder understands that you want to view the contents of the /Users/Me/Downloads folder. Likewise, if your working directory is /Users/Me/, then a command to ls Downloads/ will be understood by starting with the absolute path to your present working directory /Users/Me/ and appending the relative path Downloads/ to end up with /Users/Me/Downloads/. The command is carried out as if you had typed ls /Users/Me/Downloads/, but you just saved yourself some typing by relying on the context provided by the present working directory.
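That resolution step is small enough to write down. Here is a minimal sketch using Python's standard posixpath module; the resolve function is a hypothetical helper of mine, and it is purely textual: it never touches the disk, so it knows nothing about symlinks or whether the directories exist.

```python
import posixpath


def resolve(working_directory, path):
    """Turn `path` into an absolute path, using `working_directory` as context."""
    if posixpath.isabs(path):
        # An absolute path ignores the context entirely.
        return posixpath.normpath(path)
    # A relative path is appended to the working directory.
    return posixpath.normpath(posixpath.join(working_directory, path))


print(resolve("/Users/Me/", "Downloads/"))
# prints /Users/Me/Downloads

print(resolve("/tmp", "A/B/file"))
# prints /tmp/A/B/file

print(resolve("/tmp", "/A/B/file"))
# prints /A/B/file
```

The same relative path lands in two different places depending on the working directory, while the absolute path comes out the same no matter what context it is given.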

Use an absolute path when you want to refer to the same file or directory in the same location regardless of the current working directory. This is useful when you want to reference the same path from multiple contexts. The meaning of an absolute path does not change with the circumstances: it is fixed no matter which user is logged in, no matter what your present working directory is, no matter what computer you are working with.

Use a relative path when you want to refer to a file relative to some folder. You’ll see this a lot in step-by-step instructions that rely on the Terminal: step 1 will have you change to some directory whose absolute path is unknown to the author (“After unzipping the file, change to the unpacked directory…”), and step 2 will have you execute some command relative to that path (“…and make the awesome-script file executable by running chmod u+x awesome-script”).

Browsing files from the commandline

Fri, 09 Sep 2011 22:01:18 +0000

Open a folder in Finder. Double-click on it, and you get a window. Look at that window’s title bar. In the middle of the title bar, it says maybe, “Documents”. Where’s that? Maybe you switch to column view and arrow left for a while, or maybe you hit Cmd-Up to go up a few folders, or command-click the file icon in the title bar and look at the drop-down. After clicking and tapping and staring for a bit, you finally figure it out: I’m in the Documents folder in my user directory on the hard drive named My Mac.

That just took way too long.

Go open Terminal.app. You’ll see a mostly empty tab with some text and a cursor, something like:

Documents$ _

That text is called a prompt, because it’s giving you a bit of information (where you’re at) and then asking you, “What next?” Pretend you are that terminal, that cursor. You are standing in the middle of the filesystem. Where are you? You could do like you did in Finder and poke around for a while and figure out the answer. But you don’t need to do that; this is the Terminal.

Instead, you can just ask directly, “Where am I?” Type in pwd and hit return to send the command, and you’ll see an answer come back with something like:

/Users/Me/Documents

Or, in words, the Documents folder in the Me folder in the Users folder on your main hard drive (which is not named, here). Or, in the usual words, so you can read left-to-right, “slash Users slash Me slash Documents”. You asked, it answered. “pwd” is actually short for “present working directory”. “Present working directory” is a more impersonal name for where you (via the terminal) are standing right now.

What’s in /Users/Me/Documents? Take a look around with ls, short for “list”. It will print out a list of files and folders (directories) in that directory. You can work with those files, or you can go into any folder you can see in there using cd (“change directory”).
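If you want to try pwd and ls without touching your real files, here is a small sketch. It builds a throwaway tree that mimics /Users/Me/Documents; the mktemp scratch directory and the names Taxes and notes.txt are made up for the example.

```shell
# Throwaway tree standing in for /Users/Me/Documents (hypothetical contents).
top=$(mktemp -d)
mkdir -p "$top/Users/Me/Documents/Taxes"
touch "$top/Users/Me/Documents/notes.txt"

cd "$top/Users/Me/Documents"
pwd   # "Where am I?" Prints the absolute path of the directory you are in.
ls    # "What's here?" Lists Taxes (a folder) and notes.txt (a file).
```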

Two folders are always present in each folder, and so ls omits them from its list by default. The folders have short names: “.” (read “dot”) means “the current directory (whatever it happens to be now)”. “..” (read “dot-dot” or “double-dot”) means “the parent of the current directory”; if you’re in /Users/Me/Documents, then “.” refers to Documents, and “..” refers to /Users/Me.
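You can coax ls into showing those two folders with its -a (“all”) flag. A quick sketch, using an empty scratch directory from mktemp (an assumption; any directory behaves the same way):

```shell
# Empty scratch directory to look around in.
here=$(mktemp -d)
cd "$here"

ls      # prints nothing: there are no ordinary files or folders here
ls -a   # -a includes the hidden entries, so "." and ".." show up
```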

Let’s move to “..”. Type in cd .. (“change directory to ..”) and you will see the prompt that comes back now reports that you are in the parent of your old directory. Use pwd to confirm this. If you keep going up, eventually you’ll hit the top, and cd .. will take you right back to that same place (or do nothing, depending on how you look at it). This place is called the root directory, and it’s named / (forward slash; slash or solidus to its friends). From the root directory, cd . and cd .. mean the same thing, and neither actually takes you anywhere different from where you started in /.
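To watch yourself hit the top, try something like the following. It starts in /usr/bin, chosen only because that directory should exist on both a Mac and a typical Linux box; any starting directory would do.

```shell
cd /usr/bin
cd ..
pwd      # prints /usr
cd ..
pwd      # prints /
cd ..    # the root is its own parent, so this goes nowhere
pwd      # prints /
cd .
pwd      # prints /
```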

You can move back down the directory tree using cd name, where name is the name of a folder. From /, cd Users changes your current directory to /Users. From /Users, cd Me changes your current directory to /Users/Me. Move back up a level with cd .. and you’re back at /Users. Got it? Good.
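The same walk, sketched so it runs even on a machine with no /Users folder. The scratch directory from mktemp stands in for /, and the Users and Me folders inside it are created just for the demonstration.

```shell
# Scratch directory standing in for / (assumption for this demo).
top=$(mktemp -d)
mkdir -p "$top/Users/Me"

cd "$top"
cd Users   # down one level, into Users
cd Me      # down again, into Users/Me
cd ..      # back up one level
pwd        # prints a path ending in /Users
```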

Maybe you’re wondering about that “directory tree” phrase that I just slipped in. Why “tree”? It’s a tree because it has a root (“/”) that then branches out. Each branch is a folder or file. Files can’t have branches; they are dead-end leaves. Folders can have branches, either more folders or files.

You might also hear terminology taken from genealogical trees instead: a folder can be a parent to several children, and it has a parent directory of its own. And if you hear talk of “moving up”, that means towards the root directory (cd ..); and “moving down” means away from the root directory, by cding into another directory (one other than . or ..).

So now you can ask, “Where am I?” with pwd. You can take a look around your present working directory using ls, which lists the files and folders in that directory. And you can change directories to go both up and down the directory tree using cd. And now browsing your files is one less thing you’ll need Finder for.

Get more done with Terminal.app

Fri, 05 Aug 2011 03:03:17 +0000

You use a Mac. You are comfortable with your Mac. But the terminal remains a foreign land. When forced into Terminal.app, you type what you’re told to, hit enter, and cross your fingers that you didn’t break anything.

You once felt the same way about using the Finder. With exposure and guidance, you will become as comfortable typing commands into the terminal as clicking around the Finder.

Why go through the pain of learning the terminal when you’re already comfortable with the Finder?

- Do you want to be a power user? If you’re not using the terminal, you’re not making the most of your computer. Want to rename all your documents to start with the date they were created? Not hard from the terminal; slow as can be using the Finder.
- Do you want to be able to choose the best tool? Without the terminal, what you can do is limited to what there are graphical tools to do. There are great tools that don’t – or even can’t – have a graphical interface. You want to be able to use those tools.
- Do you want to be able to use your computer from anywhere? You can use the terminal when you can’t use the Finder. You can get terminal access to your Mac from your iPhone, from your office, from your parents’ ancient Win95 machine.
- Do you want to be comfortable using any computer? Once you can use a Mac terminal, you can use a Linux terminal. Even a Windows terminal won’t be all that strange. You can bring your commandline knowledge to bear on any platform.

Power, choice, availability, and portability. That’s why you should learn the terminal.

In the coming weeks, we’re going to throw out Finder, Aqua, and the Apple Human Interface Guidelines. We’ll hear tales of the days of tele-typewriters, time-sharing, and bearded wizards. We’ll learn their secrets, we’ll learn their language, we’ll learn their tools: We’ll learn to be wizards. (Beard optional.)

How to give Lion a new voice

Tue, 02 Aug 2011 01:10:36 +0000

Tired of your Mac’s voice? Lion introduces some great new voices.

My favorite is Virginie. It pronounces English well but gives your Mac an exotic feel. Each time I hit Build, my Xcode purrs, “Build started,” in a suave French accent.

To change your Mac’s voice:

Open System Preferences.
Click on the Speech preference pane, the one with the microphone icon. It should be on the right-hand side of the System section.
Click on the Text to Speech tab.
Click the drop-down next to System Voice and pick “Customize….”

Now, try out all the voices and pick your favorites. Once Software Update finishes downloading them, head back to the preference pane to mark your favorite as Lion’s new voice.

Once your new voices are installed, you face a tough decision: Which voice should be the Voice of Lion? To make your final decision, test each voice with a phrase like, “Hello, how are you?” Pick the voice from the drop-down, then switch to Terminal and use the say command to have your Mac speak the phrase.
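A sketch of that audition from the Terminal. The -v flag asks say for a specific voice instead of the one picked in the drop-down. Alex is included only as a second example voice, and both voices must already be installed for their lines to work. The command -v check is an addition so the snippet does nothing harmful on a machine without say.

```shell
phrase="Hello, how are you?"

if command -v say >/dev/null 2>&1; then
  say "$phrase"                  # speaks with the current System Voice
  for voice in Virginie Alex; do # example voices; swap in your own candidates
    say -v "$voice" "$phrase"
  done
else
  echo "say is a macOS command; nothing spoken here"
fi
```

Running say -v '?' prints the names of the voices that are installed, which is handy for filling in that loop.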

This is how I ended up with a Lion that speaks French. Miaou.