Distributed Programming & CALM

By: Jeremy W. Sherman. Published: . Categories: distributed-programming gcd.

Distributed programming doesn’t get much talk in those terms in Cocoaland.

If you’re writing an iPhone app with a server, guess what: You’re writing a distributed system. For your users’ sake, I hope it’s also an offlineable system.

And we can view a multithreaded program as distributed programming, only with the distribution being far more local. Ordering issues rear their heads when you start pushing data in chunks through concurrent queues, and the notion of producer-consumer punctuations (see below, Consistency without Borders) is practically useful if for no other reason than, “oh yeah, you can hide that activity spinner now, no more search results for ‘z*’ are coming”.
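
To make the punctuation idea concrete, here’s a rough GCD-flavored sketch; the SearchEvent type and its cases are mine, invented for illustration, not any real API:

```swift
import Dispatch

// A producer interleaves ordinary results with "punctuations": markers
// promising that no more results matching some prefix will ever arrive.
enum SearchEvent {
    case result(String)
    case punctuation(prefix: String)   // "no more results for this prefix"
}

let consumer = DispatchQueue(label: "search.consumer")   // serial queue

func handle(_ event: SearchEvent) {
    consumer.async {
        switch event {
        case .result(let title):
            print("show row: \(title)")
        case .punctuation(let prefix):
            // Safe to hide the spinner for this prefix: the producer has
            // promised this slice of the stream is complete.
            print("hide spinner for '\(prefix)*'")
        }
    }
}

// Producer side, possibly running on another queue entirely:
handle(.result("zebra"))
handle(.result("zygote"))
handle(.punctuation(prefix: "z"))
consumer.sync { }   // flush the serial queue before the demo exits
```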

The End of the API

Some recent delvings on my part started when I read an article by cemerick, “Distributed Systems and the End of the API”. He brought up CRDTs and CALM. I’d heard of CRDTs before (thanks Patrick!), but not CALM.

I looked up CALM and found a good summary on the Bloom lang page and its intro: http://www.bloom-lang.net/calm/

BLOOMing CALM

From there I read a blog post introducing the idea of CALM. I found it most useful for its many links. I started with the keynote slides mentioned therein. Those didn’t make more than 80% sense till I read the companion paper. After that, they’d got my attention good, so then I read the CIDR 2010 paper for more background on Bloom.

Bloom is a language implemented in/over Ruby that boils down to the Dedalus flavor/extension of Datalog, which makes time explicit in each relation and avoids the mess you find in Prolog, where there’s an execution algorithm outside the language that you need to worry about and play with. The entire thing is declarative, but the real point is that it’s straightforward to visualize the dataflow and to analyze a Bloom program for the points where the code computes a non-monotonic result.

“Non-monotonic” means a computation might need to change its mind about its output as new inputs arrive, which means you need some degree of coordination to ensure the non-monotonic computation actually has all the data it needs to render a final judgment. And coordination has costs, especially if it’s between datacenters, or with a non-responding peer whose hard drive just ate it.
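
Here’s a toy Swift sketch of the distinction (nothing to do with Bloom’s syntax; all names are invented): the monotonic computation only ever grows its answer, so emitting partial results is safe, while the non-monotonic one may have to retract its answer when late data arrives, so it needs some signal that the input is complete.

```swift
// Monotonic: the set of users we've ever seen only grows, so new inputs
// never invalidate an earlier answer, and partial output is safe to emit.
var seenUsers: Set<String> = []

func record(user: String) -> Set<String> {
    seenUsers.insert(user)
    return seenUsers          // always a superset of every earlier answer
}

// Non-monotonic: "who's in the lead?" can change its mind as data arrives,
// so a final answer needs coordination -- some signal the input is sealed.
var scores: [String: Int] = [:]

func record(score: Int, for user: String) {
    scores[user, default: 0] += score
}

func leader(inputSealed: Bool) -> String? {
    guard inputSealed else { return nil }   // don't guess; wait for the seal
    return scores.max { $0.value < $1.value }?.key
}
```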

Programs are made of memories, guesses, and apologies

From there I chased a reference to Building on Quicksand, which introduced the notion of programs being structured around:

- memories: the durable state a participant has actually recorded locally,
- guesses: actions taken on the basis of that possibly stale local knowledge, and
- apologies: the compensating work you do when a guess turns out to have been wrong.

This is even more obviously true in distributed programs, where you can’t keep every actor on the same page. The paper also points out that sometimes the right response is for a program to throw up its hands, email a human, and say, “Something is wrong. Figure it out, and apologize to user 1312347 for this weirdness.”
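
Here’s a loose Swift sketch of that shape, with entirely made-up names: act on local memory (a guess), remember what you promised, and compensate (apologize) when the authoritative answer eventually disagrees.

```swift
// Memories: durable local state. Guesses: actions taken on possibly stale
// local knowledge. Apologies: compensation when a guess turns out wrong.
struct Order { let id: Int; let item: String }

var localInventory = ["widget": 1]   // memory, and possibly a stale one
var promises: [Order] = []           // remember the guesses we've made

func placeOrder(_ order: Order) {
    // Guess: accept the order based on local memory, without coordinating.
    if localInventory[order.item, default: 0] > 0 {
        localInventory[order.item, default: 0] -= 1
        promises.append(order)
    }
}

func reconcile(withActualStock actual: [String: Int]) {
    // The authoritative state arrives later. Any over-promised order needs
    // an apology: a refund, an email to a human, something.
    for order in promises where actual[order.item, default: 0] <= 0 {
        print("Apologize for order \(order.id): out of \(order.item)")
    }
}
```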

A Tale of N Consistencies

And then it was back to the Databeta blog, where I found “Consistency Without Borders” and the accompanying paper.

This is a call for more research into helping developers grapple with consistency somewhere between two extremes. The first extreme is “let’s establish consistency only at the database layer, in terms of reads and writes”, which is generally too conservative, too expensive, and too hard to safely and faithfully “compile” your program’s operations into. The second is “let’s just handle all the consistency in our app”, which is also easy to get wrong, expensive, and not at all reusable.

Consistency without Borders looks at three different middle grounds:

- object-level consistency, where reusable data types such as CRDTs encapsulate the reasoning (a grow-only counter is sketched below);
- flow-level consistency, where an analysis of how data flows between components decides where coordination is actually needed (this is where Blazes, below, fits in); and
- language-level consistency, where the whole program is written in an analyzable language like Bloom.
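
To give a taste of the object-level end, here’s a rough Swift sketch of the textbook grow-only counter CRDT (my example, not lifted from the paper): each replica only increments its own slot, and merging takes a pointwise max, so replicas converge no matter how updates are ordered or how often they’re re-delivered.

```swift
// Grow-only counter (G-Counter): one slot per replica, merge = pointwise max.
// Increments commute and merges are idempotent, so replicas converge without
// coordinating on delivery order.
struct GCounter {
    let replicaID: String
    var slots: [String: Int] = [:]

    mutating func increment() {
        slots[replicaID, default: 0] += 1
    }

    mutating func merge(_ other: GCounter) {
        for (id, count) in other.slots {
            slots[id] = max(slots[id, default: 0], count)
        }
    }

    var value: Int { slots.values.reduce(0, +) }
}

var a = GCounter(replicaID: "a")
var b = GCounter(replicaID: "b")
a.increment(); a.increment()
b.increment()
a.merge(b); b.merge(a)
// Both replicas now agree: a.value == b.value == 3
```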

The paper also highlights the lack of data to assist in choosing between the many flavors of consistency. What’s the cost of making a given trade-off? Can we afford more (or less) consistency in this case?

(I have yet to read their intriguing reference to the LADIS ’08 write-up “Towards a Cloud-Computing Research Agenda” about the extreme expense and danger of full consistency in an industrial context.)

Then I moved on to looking at Peter Alvaro’s Blazes (slide deck). This is the flow-level analyzer mentioned in Consistency Without Borders.

Based on annotations of the code, Blazes looks for non-monotonic operations that aren’t protected by coordination. This is a beginning towards assisting with debugging the issues you encounter in distributed systems, versus those you can readily debug with gdb or lldb. Once your code is all correct, there’s that small matter of, “Oh yeah, and that thing it’s correctly doing, is that semantically correct?”
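
Blazes has its own annotation vocabulary; purely as an illustration of the idea (every name below is invented), the analysis boils down to flagging any order-sensitive operator whose input isn’t protected by some coordination point:

```swift
// Hypothetical annotations, loosely in the spirit of flow-level analysis:
// a component is either confluent (output independent of input order) or
// order-sensitive; an order-sensitive component with no upstream seal or
// barrier is a potential consistency bug.
struct Component {
    let name: String
    let confluent: Bool     // true: monotonic / order-insensitive
    let inputSealed: Bool   // true: some upstream coordination exists
}

func hazards(in pipeline: [Component]) -> [String] {
    pipeline
        .filter { !$0.confluent && !$0.inputSealed }
        .map { "\($0.name): non-monotonic, and no coordination upstream" }
}

let pipeline = [
    Component(name: "filterResults", confluent: true,  inputSealed: false),
    Component(name: "countResults",  confluent: false, inputSealed: false),
    Component(name: "topScore",      confluent: false, inputSealed: true),
]
hazards(in: pipeline).forEach { print($0) }
// -> countResults: non-monotonic, and no coordination upstream
```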

Where It All Goes Wrong

But you still have to have correctly annotated everything you’re using. Good luck with the balls of closed-source mud you get to work with in GUI programs.

A major issue in organically grown software projects is even stating semantic properties, never mind ensuring they’re preserved across the application. We get tied up in matters like “did I just create a strong reference cycle?”, futz about with that, do some refactoring, extract some things, whatever, and we can carry on in this vein for a good while, until we have a serious mess in light of what the actual purpose of the application is. Leastwise, that’s what I seem to see too often. More on that later.