Google Room and The Definitive Guide to SQLite

By: Jeremy W. Sherman. Published: 1 Dec 2018. Categories: reading-notes. Tags: sqlite sql android.

Why read a book on SQLite?

The short answer: Because I needed to write a migration for a Google Room database.

Limited past exposure

I’ve mostly worked on Mac & iOS apps:

iOS: The data all lives server-side, and persisting it is someone else’s problem.
macOS: A database, even an embedded one like SQLite, just never was the right tool.

What do you mean, not the right tool?

Some folks reach for a database straight out of the gate. (Most backend frameworks, like Rails, sure seem pretty keen on baking in that architectural decision.)

But simple file serialization is often a great choice:

easy to work with (cat, ls, rg)
simple to manage (rm, cp, mv, ed)
easy to backup and restore
no DBA headaches
good enough surprisingly often

Most static blog generators seem to agree with this sentiment.

Even in cases where you think you might eventually need a database, if you don’t jump to conclusions, you may just never get pushed into that corner. One example is as Robert Martin tells of the acceptance testing tool FitNesse: they put off picking a DB for a couple years, then ultimately realized the project didn’t need one after all. It shipped as, and still is, a flat-file wiki.

Yes, SQLite would still have DBA headaches

You’d think it wouldn’t. You might think the same thing about a Berkeley DB key–value store. But the problems still sneak up on ya. Don’t take my word for it – consider MJD’s thoughts at the end of a bug hunt undertaken to get his blog generator populating the “subtopics” sidebar again:

I am sick of DB files. I am never using them again. I have been bitten too many times. From now on I am doing the smart thing, by which I mean the dumb thing, the worse-is-better thing: I will read a plain text file into memory, modify it, and write out the modified version whem I am done. It will be simple to debug the code and simple to modify the database.

What about CoreData?

Cocoa’s CoreData defaults to using SQLite as an implementation detail in its quest to simultaneously provide object graph persistence. But that’s an implementation detail: the file format is undocumented and subject to change at Apple’s whim. (As a bonus, you get platform data-lock from this. I’ve had cases where CoreData would have made sense, except that the file format needed to work outside of Apple platforms. Ah, well.) And the file format issues are in addition to the fun with thread containment, though that’s gotten to be substantially less of a problem over the years. So I had no reason to get cozy with SQLite to date, due to CoreData abstracting it entirely, and due to even CoreData winding up not being the right tool for me entirely too often (did I mention I like flat files and avoiding data lock-in?).

But Now, Room

But Google’s blessed persistence framework is Room. Unlike CoreData, Room is explicitly a SQLite wrapper. It aims to smooth some rough edges (writing DDL, un/marshaling data between rows and POJOs) and codify best practices around using SQLite on a (maybe pretty crummy) mobile device.

Some Rough Edges Smoothed

It’ll write DDL for you, based on your entity classes, so you don’t have to. (You can even crib from this when writing your migration.)
It’ll check your query syntax at compile-time, so you don’t have to wait till runtime for an attempted query to blow up in your face.
It’ll save off the schema description for you as a simple numbered JSON file, so you can look at it, and so the provided test tooling can help you easily test your migrations.
It’ll un/marshal data between database rows and POJOs (or POKOs, I suppose, with Kotlin).
It’ll vend a reactive stream for your query (by watching the table, rerunning the query, and pushing out the new value), so you can easily bind your UI to the DB. You got your choice of receiving your reactive stream as the very rich RxJava 2 data types or as Google’s simpler LiveData.

Some Best Practices Codified

You can’t hang yourself with an N+1 query, because it just won’t do them for you.
Room will throw an exception if you try to work with the DB on the UI thread. No magic debugging preference is required; it just does this, all the time. (Though I’ll grant that its exception doesn’t have anywhere near the panache of __Multithreading_Violation_AllThatIsLeftToUsIsHonor__.)
Making sure you don’t accidentally change something and forget to bump your schema version and provide a migration

Some Rough Edges Remain

It’ll generate the schema for you, but to move between schemas, you’ve gotta write the up-migration by hand. (You don’t have to write a down-migration. Room doesn’t support down-migration. Ever onward!)

So that’s what led me to read through a book on SQLite over a couple evenings. What follows are the notes I jotted down when I finished using LogsIt.

Aside: Room’s Design Is Swell

I really like Room’s design. It builds on top of a rock-solid and well-understood piece of tech (SQLite), there’s zero magic, you can readily dump the DB and poke around using the sqlite3 CLI tool to test and explore your queries, it codifies rather than prescribes best practice…

My favoritest thing, though, is that it vends POJOs. You don’t have to wed the heart of your app to a vendor framework just to get easy queries and streaming UI updates for free. You don’t have thread-bound crash-bombs lobbing through 99% of your app, or fight to keep it at arm’s length (as Dave DeLong advocates in point 8 of “The Laws of Core Data”) to avoid that. There’s no “will it fault? will it boom?” concern. They’re just objects. Plain, simple objects.

Anyway, on to the reading notes.

Reading Notes

Read a Book: November 14, 2018 at 21:57 Notes: Mike Owens, Grant Allen. The Definitive Guide to SQLite, Second Edition. Apress, November 2010. 9781430232254. Via Safari.

Read for background when writing migrations for Android Room.

Skipped in-depth coverage of C API, of other language bindings, and of the iOS & Android walkthroughs. Also kinda glazed over the shared cache stuff.

New to me:

Manifest typing and coercions. Odd middle ground between strict and duck typing that ends up as “will store anything you throw at it, but acts like strict if you do too”.
Blobs degrade to a linked list when they exceed the db pagesize.
Gory details of locking schemes and failure modes. You can deadlock yourself if you use multiple connections. Fun times!
Limited use of indexes. Only plays in with equality tests in order. Skipping an indexed column or testing a different relation will push the rest to linear scans.
You prolly want to “begin immediate” for write workloads.
.dump to backup a db and .read to restore.
Some SQL I wasn’t super aware of: case, is/not null, nullif() and coalesce(), create table as select, everywhere you can ab/use subqueries, and the overall “fixed function pipeline” (to borrow an idea from older OpenGL) for queries, esp wrt group-by and having and aggregate functions. And how the temp_store supports that.
The execSql wrapper can take a list of semicolon-separated statements.
Enabling WAL can fail: Maybe the fs lacks a required feature.
WAL vs rollback journal tradeoffs. WAL reduces contention to writer-writer, though a checkpoint can trip a longer than desired pause, and an old but still open read transaction can prevent flushing data to the actual DB (and elevate read times due to seeking through scads of pages).
Don’t access a SQLite DB over a network filesystem. Just don’t.
Bump your cache to match largest write workload to keep exclusive lock used only for flushing. Use the analyzer to see how many pages that is.
Maybe match the DB pagesize to OS. It’s 1024 by default, or half a Darwin page.
You can easily throw in custom functions, aggregates, and collations. SQLite is even more flexible than I knew! (Pair with check constraints for superpowers.)
Android CursorWindow maxes out at 1 MB of data. You can crash if you try to work with too much data.
Harder to discover introspection features:
- sqlite_master_table (not listed in .tables)
- pragmas table_info, index_list, index_info, database_list
- hooks and especially sqlite_trace()
Many things about bound params:
- Manually numbered bound params: ?1, ?2. Lets you reuse an arg without having to bind it multiple times.
- Named bound params. :name or @name. You have to resolve them to numbers with an API call, but that’s way more readable. And plays nice with dictionaries.
- TCL bound params. $name. Says “capture from scope”. May only be used by the TCL library?