MongoDB document keys are ordered. Subdocument queries fail to retrieve results whose keys are in a different order. Indexing only works with keys in the same order.
This uglies up the client application interface. Languages regularly provide literal syntax for unordered associative arrays (aka hashes, maps, dictionaries). To guarantee all similar documents have the same key order, you either forgo the literal syntax provided by your language to use an order-preserving data structure, or you throw a sorting layer in between the application and MongoDB.
That’s bad, but you can work around it.
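One way to picture that sorting layer, as a minimal sketch in plain C. The `struct field` type and both function names are made up for illustration, and a real document would be nested BSON going through a real driver rather than a flat list of strings. The only idea on show is that you sort keys before anything leaves the application, so two documents with the same fields always go out in the same order.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical flat document field: one key, one string value. */
struct field {
    const char *key;
    const char *value;
};

/* qsort comparator: order fields by key, byte-wise. */
static int compare_fields_by_key(const void *a, const void *b)
{
    const struct field *fa = a;
    const struct field *fb = b;
    return strcmp(fa->key, fb->key);
}

/* The "sorting layer": put fields into one canonical key order. */
void canonicalize_fields(struct field *fields, size_t count)
{
    if (count == 0) return;
    assert(fields != NULL);
    qsort(fields, count, sizeof(*fields), compare_fields_by_key);
}

/* Stand-in for handing the document to the database: writes "key=value;"
 * pairs into buf in field order and stops at the first pair that does not
 * fit. Returns the number of bytes of complete pairs written. */
size_t serialize_fields(const struct field *fields, size_t count,
                        char *buf, size_t bufsize)
{
    size_t used = 0;
    assert(buf != NULL && bufsize > 0);
    buf[0] = '\0';
    for (size_t i = 0; i < count; i++) {
        int n = snprintf(buf + used, bufsize - used, "%s=%s;",
                         fields[i].key, fields[i].value);
        if (n < 0 || (size_t)n >= bufsize - used) {
            buf[used] = '\0';   /* drop the partial pair */
            break;
        }
        used += (size_t)n;
    }
    return used;
}
```

Two field arrays built in different orders serialize identically once both have passed through `canonicalize_fields`, which is all the workaround needs.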
What you can’t work around is MongoDB rearranging keys behind your back:
When performing update operations that increase the document size beyond the allocated space for that document, the update operation relocates the document on disk and may reorder the document fields depending on the type of update. (Update - MongoDB Manual 2.4.2)
Let’s read that again: “[T]he update operation […] may reorder the document fields.”
Just dandy.
Using MongoDB requires a fixed key order for reliable operation, but MongoDB fails to preserve key order. As convenient and simple as MongoDB is, this is enough for me to advise anyone: Stay well away from MongoDB.
__weak __block block_t recurse;
block_t block;
recurse = block = ^(id val) {
…
recurse(subval);
…
}
The prototypical recursive function example is the factorial function:
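uintmax_t
factorial(uintmax_t n)
{
    /* n is unsigned, so this always holds; it stays as documentation.
     * NSCParameterAssert is the variant usable in plain C functions. */
    NSCParameterAssert(n >= 0);
    if (n == 0) return 1;
    return n * factorial(n - 1);
}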
So that’s a function. Now we want to make it a block.
As a block, that factorial self-reference is tricky.
A block only gets a name when you assign it to a variable.
That variable assignment won’t happen till after the block is created.
So you need a __block variable:
uintmax_t (^__block factorial)(uintmax_t n) = ^uintmax_t (uintmax_t n) {
    NSCParameterAssert(n >= 0);
    if (n == 0) return 1;
    return n * factorial(n - 1);
};
Sadly, that’s a __strong __block reference by default. Oops.
If you don’t want to leak, you either need to take care to break
the strong reference manually by NULLing out the factorial variable
– good luck – or you need a weak reference.
So you retag the block variable as weak:
uintmax_t (^__block __weak factorial)(uintmax_t n) = …
Only now there’s never a strong reference to your block, so your block is eligible for deallocation right after it’s returned.
So you need both a strong and a weak reference to your block. And the block needs to be stored in the strong reference first, so you anchor it to this world. So maybe you do this:
uintmax_t (^__block __weak weakFactorial)(uintmax_t n);
uintmax_t (^factorial)(uintmax_t n) = …
weakFactorial = factorial;
That fixes it. It works. And it’s nice that we could drop the __block
from the main factorial reference, since that stays stable; it’s only the
weakFactorial self-reference that gets updated after the block first
captures it.
But having to do this follow-on assignment is kind of ugly, and it’s pretty forgettable way down there at the end of your block, and a few revisions later they’ll get separated, and then maybe you’ll accidentally delete the weak assignment, and then you’ll have to track down a bug. Ugh.
So try this instead:
uintmax_t (^__block __weak weakFactorial)(uintmax_t n);
uintmax_t (^factorial)(uintmax_t n);
weakFactorial = factorial = …
You’re not going to forget that assignment. It’s still ugly, but that’s what happens at the edges of ARC, and it’s at least not too fragile. You could even snippetize it, if you’re into that sort of thing.
Justin Spahr-Summers rightly notes that this approach works only for synchronous recursion:
Once your strong reference goes out of scope, wouldn’t it just stop recursing because all that’s left is a weak reference? (Twitter)
It’s actually worse than that: you’ll get a segfault, because you try to invoke a NULL block as a function. If you have some sort of concurrent or asynchronous recursion – maybe invoking the block kicks off a gradual countdown – then you’ll need to handle the case where the block is dead and has been zeroed but doesn’t know it yet.
Use the standard trick of
trying to obtain a strong reference and then testing whether that reference
is nil or not to decide how to proceed:
block_t recurse = weakRef;
bool zeroed = !recurse;
if (zeroed) /* bail out */;
/* use |recurse| to reference yourself, so that you don't segfault */
If you want the recursion to run to completion rather than fizzling out, I imagine you can work out the appropriate juggling act.
This isn’t a problem for the common case of synchronous block handlers, where you set the handler and don’t change it, and the object owning the strong reference is guaranteed to outlive the handler block’s recursion, but it is something to watch out for as you get cleverer or otherwise start destroying references while the block might be running.
Perhaps it should be no surprise then that everyone gets it wrong. Over and over, I see the same amateur mistakes. Two seconds using your app in some locale other than the one you developed in would make you go, “Oh. Crap.”
Today’s mistake is “localizing” currency amounts in your application.
Take a look at this prime shot from the Home Depot website this morning:

I’m sending sv-SE as my preferred language, so I get back the site’s closest possible approximation to Swedish in Sweden. Oh, how cool, that includes converting currencies to Swedish kronor!
Oh wait, no it doesn’t. They just rigged up their currency formatter wrong. The formatter thinks the amount they specified – an amount actually in US dollars – must be Swedish kronor (SEK), because they didn’t tell it anything more specific, and so the formatter printed the amount as SEK.
I ran into this same thing when I went to renew my car insurance. A whole table of US dollar amounts formatted as Swedish kronor.

This happens so often, across both websites and apps, that when I encounter something that actually does support displaying actual honest-to-God SEK amounts, I still assume it’s USD misrepresented as SEK until I realize the prices are wildly different from what the USD amount would be. This is so rare that it’s only happened once: thank you, Kayak, for getting this right!
If the exchange rate weren’t so far from 1 between USD and SEK (currently about 6 SEK to 1 USD), I don’t know how I’d be able to tell whether an app got it wrong or got it right.
This amateur mistake, repeated across vendors, applications, and platforms, makes for a terrible user experience for anyone working in a different region, even if they speak the same language as you.
Let’s see, how would you format a currency amount, say, $1,234.59 USD? How’s about this:
id amount = [NSDecimalNumber
decimalNumberWithMantissa:123459
exponent:-2 isNegative:NO];
NSString *display = [NSNumberFormatter
localizedStringFromNumber:amount
numberStyle:NSNumberFormatterCurrencyStyle];
NSLog(@"%@", display);
Very easy, right? And very wrong. That prints, “1 234,59 kr”.
Well, let’s try rigging up the currency formatter ourselves, eh?
NSNumberFormatter *fmt = [NSNumberFormatter new];
[fmt setNumberStyle:NSNumberFormatterCurrencyStyle];
[fmt setGeneratesDecimalNumbers:YES];
NSString *formattedAmount = [fmt stringFromNumber:amount];
NSLog(@"%@", formattedAmount);
Nope, still prints “1 234,59 kr”. This is likely exactly what the “convenience” converter did for us, minus the configuration setting to convert strings to decimal numbers. So that’s a lot of hoopla for no gain.
But it just takes one more line to get it right. You need to tell the formatter what exactly it’s formatting. Raw numbers aren’t enough: it needs to know the currency you’re giving it. Like so:
NSNumberFormatter *fmt = [NSNumberFormatter new];
[fmt setNumberStyle:NSNumberFormatterCurrencyStyle];
[fmt setGeneratesDecimalNumbers:YES];
[fmt setCurrencyCode:@"USD"];
NSString *formattedAmount = [fmt stringFromNumber:amount];
Yup, it just takes a setCurrencyCode: message, and all is right with the
world. This version here prints “1 234,59 US$” when using the Swedish region.
The currency symbol follows the amount, as is usual in that region,
and the separators for thousands and decimal are localized, as well.
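The same division of labor can be sketched outside Cocoa, in plain C: the currency travels with the amount, and the region only decides separators and symbol placement. Everything named here (`struct region_format`, `format_money`, the label strings) is an assumption for illustration, not how NSNumberFormatter is implemented, and the caller supplies the currency label where a real formatter would look it up per locale.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical per-region presentation rules. */
struct region_format {
    const char *group_sep;    /* between groups of three digits */
    const char *decimal_sep;  /* between whole and fractional part */
    int symbol_follows;       /* nonzero: currency label goes after the amount */
};

/* Formats a non-negative amount given in minor units (cents) with an
 * explicit currency label. Returns what snprintf returns. */
int format_money(long long minor_units, const char *currency_label,
                 const struct region_format *region,
                 char *buf, size_t bufsize)
{
    char digits[32];
    char grouped[64];
    size_t g = 0;

    assert(minor_units >= 0 && currency_label != NULL);
    assert(region != NULL && buf != NULL);
    assert(strlen(region->group_sep) <= 4);  /* keeps grouped[] in bounds */

    int ndigits = snprintf(digits, sizeof digits, "%lld", minor_units / 100);
    for (int i = 0; i < ndigits; i++) {
        /* A separator goes wherever a multiple of three digits remains. */
        if (i > 0 && (ndigits - i) % 3 == 0) {
            for (const char *s = region->group_sep; *s != '\0'; s++)
                grouped[g++] = *s;
        }
        grouped[g++] = digits[i];
    }
    grouped[g] = '\0';

    int cents = (int)(minor_units % 100);
    if (region->symbol_follows)
        return snprintf(buf, bufsize, "%s%s%02d %s",
                        grouped, region->decimal_sep, cents, currency_label);
    return snprintf(buf, bufsize, "%s%s%s%02d",
                    currency_label, grouped, region->decimal_sep, cents);
}
```

Changing the region changes how the number is punctuated. It never changes which currency the number is in, because that was never the region's to decide.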
Testing this is dead easy. You don’t even have to abandon your beloved mother tongue! Just change your region settings to something different. Swedish in Sweden has triggered enough localization mistakes for me, but I’m sure many other locales get messed up just as readily.
Changing languages is awkward – iOS does this semi-reboot thing, and you have to restart Mac apps to get them to pick up the change – but region settings can be changed in a jiffy and changed back just as easily, without forcing you to wander through menus in a language you might not understand. So switching regions is the least you can do to test how your localization – even inadvertent localization – might be wrong.
Change your region setting like so:
On the Mac, you’ll probably need to tick the “Show all regions” box before you can choose a region that doesn’t use your top language choice.
Declare the currency code to your currency-style number formatters.
[numberFormatter setCurrencyCode:@"USD"];
Test your app in different locales.
__attribute__ keyword to specify
that a Core Foundation property should be treated like an Objective-C object
for memory management:
@property(retain) __attribute__((NSObject)) CTFrameRef frame;
This is an easy attribute to miss. It’s also one you can go a long time without finding, because it’s not hard to work around.
(Do note that you can use the NSObject attribute with anything where you’d
use CFRetain/CFRelease, not just actual toll-free bridged objects. The toll
you’re dodging with __attribute__((NSObject)) is purely syntactic.)
You can get surprisingly far by just pretending that a CTFrameRef is an id:
@interface MyClass : NSObject
@property(strong, nonatomic) id frame; /* CTFrameRef */
@end
You just have to sprinkle casts in the appropriate places:
CTFrameRef frame = CTFrameCreate…
self.frame = (__bridge id)frame;
CFRelease(frame);
You can even use the casts to save you a line of code here and there:
CTFrameRef frame = CTFrameCreate…
self.frame = (__bridge_transfer id)frame;
/* ARC now has ownership of |frame|, so it is responsible for releasing it. */
Or perhaps use one of the less underscore-y Core Foundation wrappers:
CTFrameRef frame = CTFrameCreate…
self.frame = CFBridgingRelease(frame);
/* Now your Create-rule-trained brain can rest easy, because there’s a balancing Release. */
But I find the casts clutter up my code, and CF memory management is not bad in small doses, so I used to use macros:
#define $ID (__bridge id)
#define $CF (__bridge void *)
The $CF macro exploits C’s willingness to coerce void * the way $ID
expresses id’s willingness to be coerced. That breaks down under Obj-C++,
because C++ is not so willing to coerce, so you end up doing something like
this instead:
#define $CF(var, obj) ((__bridge __typeof__((var)))(obj))
This ends up working OK, because you tend to assign to a variable of the Core Foundation type, make the cast once there, and then use that CF-typed var throughout the next bit of code:
CTFrameRef frame = $CF(frame, self.frame);
/* do something with |frame| */
But I could have saved myself all that mess had I just used
__attribute__((NSObject)). Aren’t attributes a wonderful thing?
//cc -g -c -Weverything -Wno-objc-missing-property-synthesis attribute_nsobject.m
/* @file attribute_nsobject.m
* @author Jeremy W. Sherman
* @date 2013-01-29
*
* Demonstrates the wondrous simplicity of `__attribute__((NSObject))`.
*/
#import <Foundation/Foundation.h>
#import <CoreText/CoreText.h>
@interface MyClass : NSObject
@property(strong, nonatomic) __attribute__((NSObject)) CTFrameRef frame;
@end
/* Look ma, no casts! */
@implementation MyClass
- (void)storeFrame
{
CTFrameRef frame = NULL;
self.frame = frame;
}
- (void)loadFrame
{
CTFrameRef frame __unused = self.frame;
}
@end
Justin Spahr-Summers points out via Twitter that this story used not to have such a happy ending. Some member of the clang/objc/ARC juggling act used to fail to retain nonatomic properties. The short tale is documented in a Stack Overflow thread and has been reported as rdar://problem/11040306.
Good news: As of Xcode 4.6 and OS X 10.8.2 (which are what I have on hand
to test with), the issue seems to be
fixed. The compiler generates a call to objc_setProperty_nonatomic which
will objc_retain the new value as expected.
The _nonatomic variant doesn’t seem to exist in my copy of 10.7.1’s
objc4-493.9, so from where I’m sitting, this looks to have been fixed in part
by an SPI change.
This appears to be an undocumented change affecting only Apple clang as of this time. It also seems that the fix will only work for this property-focused usage pattern; if you need a generic instruction to the compiler to use full ARC semantics for pointers of a certain type, you’ll still have to create a typedef to attach the type info to.
The ARC reference documentation continues to specify that only typedefs can be
annotated to create a retainable object pointer type.
The open-source version of clang (as of r173899) still tests for
this, and the implementation of Type::isObjCNSObjectType() in
lib/AST/Type.cpp suggests it is still the case:
bool Type::isObjCNSObjectType() const {
if (const TypedefType *typedefType = dyn_cast<TypedefType>(this))
return typedefType->getDecl()->hasAttr<ObjCNSObjectAttr>();
return false;
}
bool Type::isObjCRetainableType() const {
return isObjCObjectPointerType() ||
isBlockPointerType() ||
isObjCNSObjectType();
}
?:. Use it to fall back to a default value when
a nil check fails:
id target = [self.delegate target] ?: [self.class defaultTarget];
The GCC docs present the binary ?: operator as eliding a repeated
first term when using the ternary conditional operator, so
x ? x : y
can now be written
x ? : y
and have the same effect as the full form, save that the first term, x, is
only evaluated once.
From this point of view, the binary ?: exists to avoid unwanted side effects:
int z = (x++) ? (x++) : y; // bad news
int w = (x++) ? : y; // OK!
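To watch the single evaluation happen, here is a small sketch that counts calls. The helper names (`next_value`, `ternary_default`, `otherwise_default`) are invented for the demonstration. The binary form is a GNU C extension, so this assumes GCC or Clang without strict pedantic errors.

```c
#include <assert.h>

/* How many times next_value() has run since the last reset. */
static int call_count = 0;

/* Hypothetical side-effecting term: bumps the counter, returns v. */
static int next_value(int v)
{
    call_count++;
    return v;
}

/* Full ternary: the first term is written twice, so it runs twice
 * whenever it is nonzero. */
int ternary_default(int v, int fallback)
{
    return next_value(v) ? next_value(v) : fallback;
}

/* Binary ?: evaluates the first term once and reuses its value. */
int otherwise_default(int v, int fallback)
{
    return next_value(v) ?: fallback;
}

int calls_so_far(void) { return call_count; }
void reset_calls(void) { call_count = 0; }

/* Self-check: returns normally only if both forms behave as described. */
void check_evaluation_counts(void)
{
    reset_calls();
    assert(ternary_default(7, 9) == 7);
    assert(calls_so_far() == 2);

    reset_calls();
    assert(otherwise_default(7, 9) == 7);
    assert(calls_so_far() == 1);
}
```

Both forms return the same value. Only the number of times the first term runs differs, which is exactly the side-effect argument.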
But thinking of ?: as a special-purpose variant of the ternary operator
misses its true calling: cleaning up nil and NULL checks. It compacts
several lines of code:
id target = [self.delegate target];
if (!target) {
target = [self.class defaultTarget];
}
down to a one-liner:
id target = [self.delegate target] ?: [self.class defaultTarget];
So: The ”otherwise” – or ”if nil then” – operator: ?:. Use it.
dispatch_queue_{set,get}_specific. This replaces the thread-specific storage
provided by pthread_{set,get}specific that you cannot use with GCD blocks:
static void *sQueueKey_Client = "client";
struct my_client *client = calloc(1, sizeof(*client));
*client = (struct my_client){ .val = 1 };
/* use the unique static address as the key,
* *not* the address of the string itself */
dispatch_queue_set_specific(q, &sQueueKey_Client, client, free);
dispatch_async(q, ^{
struct my_client *client =
dispatch_queue_get_specific(q, &sQueueKey_Client);
DoStuffWith(client);
});
Only there’s one new addition to the family: dispatch_get_specific looks up
the value in the current context defined by the current queue. This context is
broader than the single queue that dispatch_queue_get_specific will search.
If a key is not set on the current queue, it will check that queue’s target
queue. If it’s not found on that queue, it will move down the line to that
queue’s target queue:
dispatch_queue_t io_q = dispatch_queue_create("client_io_queue", 0);
dispatch_set_target_queue(io_q, q);
dispatch_async(io_q, ^{
/* This will check the current queue (io_q), fail to find
* the key, then check the target queue (q) and find it. */
struct my_client *client = dispatch_get_specific(&sQueueKey_Client);
SendMessage(client);
});
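The lookup described above can be modeled in a few lines of plain C. This is a toy, not libdispatch's real data layout: `struct toy_queue` and the `toy_` functions are invented here, and the walk follows the behavior described in this post rather than the library source. Keys are compared by address, just like the static key above.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_SPECIFICS 4

/* Toy stand-in for a dispatch queue. */
struct toy_queue {
    const struct toy_queue *target;    /* next queue to consult, or NULL */
    const void *keys[MAX_SPECIFICS];
    void *values[MAX_SPECIFICS];
    size_t count;
};

/* Models dispatch_queue_set_specific: store the pair on this queue only.
 * Returns 1 on success, 0 if the toy table is full. */
int toy_queue_set_specific(struct toy_queue *q, const void *key, void *value)
{
    assert(q != NULL && key != NULL);
    for (size_t i = 0; i < q->count; i++) {
        if (q->keys[i] == key) {
            q->values[i] = value;
            return 1;
        }
    }
    if (q->count == MAX_SPECIFICS) return 0;
    q->keys[q->count] = key;
    q->values[q->count] = value;
    q->count++;
    return 1;
}

/* Models dispatch_queue_get_specific: look at this one queue only. */
void *toy_queue_get_specific(const struct toy_queue *q, const void *key)
{
    assert(q != NULL);
    for (size_t i = 0; i < q->count; i++) {
        if (q->keys[i] == key) return q->values[i];
    }
    return NULL;
}

/* Models dispatch_get_specific: start at the current queue and keep
 * following target queues until the key turns up or the chain ends. */
void *toy_get_specific(const struct toy_queue *current, const void *key)
{
    for (const struct toy_queue *q = current; q != NULL; q = q->target) {
        void *value = toy_queue_get_specific(q, key);
        if (value != NULL) return value;
    }
    return NULL;
}
```

With `io_q.target = &q` and the key set only on `q`, `toy_queue_get_specific(&io_q, key)` comes back NULL while `toy_get_specific(&io_q, key)` finds the value one hop down the chain.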
Queue-specific value lookup sounds a lot like chasing the prototype chain in a prototypal object system like JavaScript. In Obj-C, it echoes how method implementation search runs up the inheritance chain to find an implementation for a given message.
It turns out you can abuse this to transform dispatch queue value lookup into the heart of a prototypal object system embedded within Objective-C – where it’s not terribly useful, because Obj-C already has its own object system – or C, where it could be an improvement over hand-writing OOP in C.
I wrote a small, ugly demo of this. It’s available from GitHub as jeremy-w/demo-draft. As it stands, it’s certainly not an improvement over hand-written C OOP, but it did prove an interesting exercise.
dispatch_debug and xpc_copy_description
are inconvenient,
particularly during impromptu debugging.
Mountain Lion’s Obj-C-ification of GCD and XPC objects lets you use your comfortable Obj-C tools:
NSLog with %@,
the debugDescription method, and
po obj while in the debugger.
If you first learned GCD back before Mountain Lion,
you might have played around with the dispatch_debug
function:
void
dispatch_debug(dispatch_object_t object, const char *message, ...);
This function is the NSLog of Grand Central Dispatch land.
If you need to pin down what exactly is going on with a complex
network of dispatch objects, this can be a useful tool,
especially since you can use the libdispatch source code
to illuminate the more cryptic debug info.
But it’s also kind of annoying: unless you remembered to set
LIBDISPATCH_LOG=stderr in the environment before starting your
process, you’ll have to watch the system log for your
dispatch_debug output;
it won’t show up in Xcode’s debug console.
Changing the
value of the environment variable after startup
also doesn’t seem to affect dispatch_debug’s behavior,
so by the time you realize you’ve forgotten to set this
environment variable,
it’s already too late.
If you wanted to log information about an XPC object without leaking,
you used to have to xpc_copy_description,
log the string,
then free the returned pointer when you’re done with it:
char *desc = xpc_copy_description(obj);
NSLog(@"%s: xpc obj %p %s", __func__, obj, desc);
free(desc);
Well, good news: As of Mountain Lion,
GCD and XPC objects are all also
NSObjects, so you can use them
as the target for the %@ format specifier
and as the target for the -debugDescription
instance method. The latter dumps
all the information you used to get from
dispatch_debug.
As an example:
2013-01-08 23:53:13.451 debug[80089:707] dispatch queue: description:
<OS_dispatch_queue: com.jeremywsherman.demo[0x7f8980c07f80]>
2013-01-08 23:53:13.453 debug[80089:707] dispatch queue: debugDescription:
<OS_dispatch_queue: com.jeremywsherman.demo[0x7f8980c07f80] = {
xrefcnt = 0x2, refcnt = 0x1, suspend_cnt = 0x0, locked = 0,
target = com.apple.root.default-priority[0x7fff72c47d00],
width = 0x7fffffff, running = 0x0, barrier = 0 }>
XPC objects are pretty verbose even with description, but you get
a bit – sometimes quite a bit – more info if you send debugDescription:
2013-01-08 23:53:13.453 debug[80089:707] xpc connection: description:
<OS_xpc_connection: <connection: 0x7f8980e017f0> {
name = com.jeremywsherman.conn, listener = false,
PID = 0, EUID = 4294967295,
EGID = 4294967295, ASID = 4294967295 }>
2013-01-08 23:53:13.454 debug[80089:707] xpc connection: debugDescription:
<OS_xpc_connection: connection[0x7f8980e017f0]: {
refcnt = 1, xrefcnt = 2,
name = com.jeremywsherman.conn, type = named, state = new,
queue = 0x7f8980e00420->0x0, error = 0x0, mach = false,
privileged = false, bssendp = 0x0, recvp = 0x0, sendp = 0x0,
pid/euid/egid/asid = 0/4294967295/4294967295/4294967295 }
<connection: 0x7f8980e017f0> {
name = com.jeremywsherman.conn, listener = false,
PID = 0, EUID = 4294967295, EGID = 4294967295, ASID = 4294967295 }>
2013-01-08 23:53:13.454 debug[80089:707] xpc bool: description:
<OS_xpc_bool: <bool: 0x7fff7244d320>: true>
2013-01-08 23:53:13.455 debug[80089:707] xpc bool: debugDescription:
<OS_xpc_bool: bool[0x7fff7244d320]: {
refcnt = 80000000, xrefcnt = 80000000, value = true }
<bool: 0x7fff7244d320>: true>
The odd trailer to the XPC objects’ debug descriptions is not a typo – the XPC objects really do include their regular description as a component of their debug description.
debugDescription also happens to be what
gets printed when you print-object
(or po for short) an object
while debugging.
Treating a GCD/XPC object as a regular
Objective-C object is particularly
handy during impromptu debugging,
since you no longer need to futz about
with dispatch_debug and xpc_copy_description.
Instead, just use po obj when debugging:
% lldb ./debug
(lldb) Current executable set to './debug' (x86_64).
b debug.m:34
breakpoint set --file 'debug.m' --line 34
Breakpoint created: 1: file ='debug.m', line = 34, locations = 1
(lldb) r
Process 80337 launched: '/Users/jeremy/Documents/Blog/GCDTips/debug' (x86_64)
Process 80337 stopped
* thread #1: tid = 0x1c03, 0x0000000100000db7 debug`main + 135
at debug.m:34, stop reason = breakpoint 1.1
frame #0: 0x0000000100000db7 debug`main + 135 at debug.m:34
31 Log(@"xpc connection", conn);
32
33 xpc_object_t pred = xpc_bool_create(true);
-> 34 Log(@"xpc bool", pred);
35 }
36 return 0;
37 }
(lldb) fr var
(dispatch_queue_t) q = 0x0000000100107fa0
(xpc_connection_t) conn = 0x0000000100400830
(xpc_object_t) pred = 0x00007fff7244d320
(lldb) po q
(dispatch_queue_t) $0 = 0x0000000100107fa0 <OS_dispatch_queue:
com.jeremywsherman.demo[0x100107fa0] = { xrefcnt = 0x1, refcnt = 0x2,
suspend_cnt = 0x0, locked = 0, target =
com.apple.root.default-priority[0x7fff72c47d00], width = 0x7fffffff,
running = 0x0, barrier = 0 }>
(lldb) po conn
(xpc_connection_t) $1 = 0x0000000100400830 <OS_xpc_connection:
connection[0x100400830]: { refcnt = 1, xrefcnt = 1,
name = com.jeremywsherman.conn, type = named, state = new,
queue = 0x100400530->0x0, error = 0x0, mach = false, privileged = false,
bssendp = 0x0, recvp = 0x0, sendp = 0x0,
pid/euid/egid/asid = 0/4294967295/4294967295/4294967295 }
<connection: 0x100400830> { name = com.jeremywsherman.conn,
listener = false, PID = 0, EUID = 4294967295, EGID = 4294967295,
ASID = 4294967295 }>
(lldb) po pred
(xpc_object_t) $2 = 0x00007fff7244d320 <OS_xpc_bool: bool[0x7fff7244d320]:
{ refcnt = 80000000, xrefcnt = 80000000, value = true } <bool:
0x7fff7244d320>: true>
But 10.7’s GCD was left behind in manual retain-release land. (XPC was too, but
GCD is our hero this time.) 10.8 fixed that oversight via a clever hack hidden
away in <os/object.h>.
The magic happens in the interaction between two macros, OS_OBJECT_DECL and
OS_OBJECT_DECL_SUBCLASS.
OS_OBJECT_DECL is used to declare the base object type of your refcounted C
library. It conceptually creates a new root class:
OS_OBJECT_DECL(dispatch_object);
Once you’ve declared a root class using OS_OBJECT_DECL, you use
OS_OBJECT_DECL_SUBCLASS to declare new subclasses:
OS_OBJECT_DECL_SUBCLASS(dispatch_queue, dispatch_object);
OS_OBJECT_DECL_SUBCLASS(dispatch_source, dispatch_object);
And magically, you now have types dispatch_object_t, dispatch_queue_t, and
dispatch_source_t.
As far as casts are concerned, these new types behave just like NSObject,
NSString, and NSNumber. If you declare variables like so:
NSObject *o;
NSNumber *n;
NSString *s;
The compiler will allow you to implicitly upcast without complaint:
/* hunky dory */
o = n;
o = s;
but not down or crosswise:
dispatch_cast.m:13:4: warning: incompatible pointer types
assigning to 'NSNumber *__strong' from 'NSObject *__strong'
[-Wincompatible-pointer-types]
n = o;
^ ~
dispatch_cast.m:14:4: warning: incompatible pointer types
assigning to 'NSNumber *__strong' from 'NSString *__strong'
[-Wincompatible-pointer-types]
n = s;
^ ~
Similarly, with these declarations:
dispatch_object_t o;
dispatch_queue_t q;
dispatch_source_t s;
This is fine:
/* hunky dory */
o = q;
o = s;
But this is not:
q = o;
q = s;
The warning messages hint at how this is implemented:
dispatch_cast.m:27:4: warning: incompatible pointer types
assigning to '__strong dispatch_queue_t'
(aka 'NSObject<OS_dispatch_queue> *__strong')
from '__strong dispatch_object_t'
(aka 'NSObject<OS_dispatch_object> *__strong')
[-Wincompatible-pointer-types]
q = o;
^ ~
dispatch_cast.m:28:4: warning: incompatible pointer types
assigning to '__strong dispatch_queue_t'
(aka 'NSObject<OS_dispatch_queue> *__strong')
from '__strong dispatch_source_t'
(aka 'NSObject<OS_dispatch_source> *__strong')
[-Wincompatible-pointer-types]
q = s;
^ ~
And that’s the trick, you see. There aren’t any classes, just protocols.
Because protocols can be declared as conforming to other protocols, we have a
protocol hierarchy parallel to our class hierarchy. By using a
protocol-qualified type – NSObject<OS_dispatch_queue> * meaning, “Any
NSObject, so long as it conforms to OS_dispatch_queue” – we can make our
hierarchy concrete in terms of which OS objects can be pointed at by which
pointers.
Why NSObject and not id? Because ARC needs to be able to use
retain/release/autorelease, and NSObject provides a convenient declaration of
those and other methods.
Of course, there would have to be more to this OSObject thing than just
protocols for ARC to work: whatever type-level hackery you might perpetrate,
the message send [pointer retain] is only going to work if the whole
Objective-C message send machinery can use what’s at *pointer as an Obj-C
object.
Consequently, things look a lot different from inside libdispatch. There are covert class interfaces and corresponding implementations that go along with the public protocols.
A shame you can’t just sprinkle a few macros over a C library that uses refcounting and have it work automagically with ARC. Now, there’s a thought…
+new past a certain
programmer-age.
Or is it?
Yes, creating a new object requires allocating its storage and then initializing it. But they’re not really distinct any more. It’s not like we do:
id obj = [Foo alloc];
if (!obj) error("allocation failed");
obj = [obj init];
if (!obj) error("init failed");
And zones are dead, so separating alloc and init so you can do:
id obj = [[Foo allocWithZone:fooZone] init];
doesn’t really matter any more, either.
And it breaks down even further when you look at Core Foundation analogs. There
is no CFAlloc() followed by CFArrayInit(). Core Foundation just has Create
functions that take an allocator to handle the “different zones” concern.
Normally, you just pass NULL or kCFAllocatorDefault for the allocator
argument, but either allocators have better support than zones at this time, or
Apple just doesn’t care enough to write “don’t use allocators any more”
anywhere.
Since these are equivalent:
CFMutableArrayRef array = CFArrayCreateMutable(
NULL, 0, &kCFTypeArrayCallbacks);
NSMutableArray *array = [[NSMutableArray alloc] init];
I see no reason not to just do a “single call” alloc-init in Foundation-land, too:
NSMutableArray *array = [NSMutableArray new];
This is for the common case. When you need to pass args in during construction,
back to alloc-initWith… it is!
Aside: As a practical motivation for +new, when I’m throwing together a quick
commandline program to see whether something behaves one way or another,
[Blah new] types a lot faster than [[Blah alloc] init], particularly if I
forget the double-bracket at the start and have to back up and fix it.
Aside 2: Many of the Foundation types let you get away with a compromise, like
[NSMutableArray array]. In ARC-land, this is effectively no different than
writing [NSMutableArray new], and the same objections apply (what if you later
need an arrayWithObjects:! what if you later need to allocate it in a different
zone!), but I never see anyone inveighing against +array, or +string, or
+dictionary. So.
This is what automated refactoring tools were designed for. And Apple has
provided us with an oft-overlooked arrow in our devtools quiver that’s just
what we need here: tops.
Check out man tops. The tool has a decent understanding of Obj-C
syntax and accepts scripts that let you rewrite code to use new method calls,
new functions, and what-have-you. The examples make it look like this tool was
invented to ease the transition from NeXT-style Obj-C to Cocoa, like this gem:
replace "NXGetNamedObject(<b args>)" with same
error "ApplicationConversion: NXGetNamedObject() is obsolete.
Replace with nib file outlets."
That should take some of you way back.
Anyway, with this tool, modernizing your code can be as simple as:
tops -semiverbose -scriptfile literals.tops **/*.(h|m|hpp|mm)
Want to check that it will do the right thing? Throw -dont into the args.
Want to watch over its shoulders as it rewrites your code? Replace
-semiverbose with straight-up -verbose.
Now, for that magical script file:
And here’s an Obj-C file to test it against:
Consider thesis 2 of Tim Bray’s “Eleven Theses on Clojure”:
In school, we all learn 3 + 4 = 7 and then sin(π/2) = 1 and then many of us speak languages with infix verbs. So Lisp is fighting uphill.
I call bunkum.
I don’t want to single out Tim Bray here. I’ve seen this in other places before. It’s a popular folk explanation. But his is the straw that broke the camel’s back.
Folks often reach for natural language or arithmetic notation to explain why Lisp prefix notation is golly gee so hard. The argument goes like this:
The heart of the argument is that a mismatch between spoken-language sentence order and Lisp syntactic form order makes reading Lisp hard.
But if we remove the mismatch, does Lisp get any easier? Let’s see:
Therefore, Lisp is:
Does speaking Gaelic condemn you to unassuageable puzzlement at infix notation?
Does speaking German grant a supernatural facility with postfix notation?
Are Finns left out in the cold, waiting for an appropriately agglutinative programming language?
Are French speakers flocking to Linotte?
I think not.
The ability to visualize the consequences of the actions under consideration is crucial to becoming an expert programmer, just as it is in any synthetic, creative activity. (SICP 1.2)
There’s a larval Big Nerd Ranch reading group, and it has me reading through Ye Olde Wizard Book, Structure and Interpretation of Computer Programs. I’m pretty early yet in the text, and just happened upon the quote you find up top there starting this post.
You could blow by this pretty fast on your way to some deep wizardry.
Don’t.
This ability to stare deep into and through a line of code and watch the clockwork wheels spin is key to mastering the craft of programming. It’s what separates those who understand what their code is doing from those who continue to view the operations of their compiler or interpreter as a mystery concealed behind a veil impenetrable by mortal eyne.
Let me give it to you straight: There ain’t no deep black magic here. There is in fact nothing more quotidian than the process that takes a line of code and translates it through layer upon layer of lengthy and tedious documentation into something that ultimately can be executed by the materia technologica sitting there upon your desk. Or your lap. Or held in the palm of your hand. Form factor changes; number and names of layers change; ultimate lack of magic does not.
Don’t lie to yourself that all that happens between make and ./a.out is
impenetrable. It’s all there, waiting for you. It’s a long and well-trod path.
Don’t turn away from it: put one foot in front of the other, work your way down
one more layer of abstraction, and start to see how the sausage is made.
The basic idea of behavior programming is to compose a bunch of simultaneously executing state machines. Each machine represents a behavior.
But you don’t use the plain event in/event out state chart to define these machines. Instead, you add modal operators to specify what must/may/mustn’t happen next:
These operators represent a synchronization point between behaviors. Once all behavioral threads, or “bthreads” for short, have blocked expressing a modal preference, the executive picks a must-event that’s not blocked by a mustn’t operator, carries out the event, and notifies any bthread that had that event listed as a must or may event. Execution then continues till every bthread blocks again by specifying a modal operator.
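The executive's selection step is small enough to sketch in C. This is my own reading of the description above, not code from the article: events are bits in a mask, `struct sync_point` and both function names are hypothetical, and "lowest-numbered enabled event wins" is an assumed policy standing in for whatever cleverness a real executive applies.

```c
#include <assert.h>
#include <stddef.h>

/* What one bthread declares when it blocks at a synchronization point.
 * Each event is one bit. */
struct sync_point {
    unsigned must;    /* events this bthread asks to have happen */
    unsigned may;     /* events it wants to hear about but does not ask for */
    unsigned mustnt;  /* events it forbids */
};

/* Picks a must-event that no bthread forbids. Returns the event's bit,
 * or 0 when every requested event is blocked. */
unsigned select_event(const struct sync_point *points, size_t count)
{
    unsigned requested = 0;
    unsigned blocked = 0;

    assert(points != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        requested |= points[i].must;
        blocked |= points[i].mustnt;
    }

    unsigned enabled = requested & ~blocked;
    /* Assumed policy: isolate the lowest set bit. */
    return enabled & (~enabled + 1u);
}

/* Nonzero when the bthread listed the event as a must or may event,
 * so it should be notified and resumed. */
int should_wake(const struct sync_point *point, unsigned event)
{
    assert(point != NULL);
    return ((point->must | point->may) & event) != 0;
}
```

A run loop around this would call `select_event` once all bthreads have blocked, resume every bthread for which `should_wake` is true, and wait for them to block again.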
I like this model because it’s a thin but powerful layer over existing models. You can implement it with pthreads and synchronous pub/sub, which could be as simple as a group of pthread conditions.
The development approach you end up with differs markedly from the model you build your bthread framework in.
Here’s the powerful part of the bthread layer: Decomposing your problem into bthreads frees you up to pick a set of coordinating events, then start coding scenarios around those events. Fire up your program, see how it does, and then fix any bugs that testing/simulation/model checking shows up and go again.
The state-machine-ness also lets you react to entire event traces in order to handle things like a “win rather than defend” strategy in tic-tac-toe, which is one of the basic examples given in the article.
There’s room for plenty of cleverness in how the executive selects the next event and how to test and check bthread programs, but the core idea is elegant and exciting. The full article is worth a read.
David Harel, Assaf Marron, Gera Weiss. Behavioral Programming. Communications of the ACM, Vol. 55 No. 7, Pages 90-100. doi 10.1145/2209249.2209270. http://cacm.acm.org/magazines/2012/7/151241-behavioral-programming/fulltext. Retrieved 2012-07-23.
@import compiler directive. News of this compiler directive
appears to be spreading through the
Objective-C developer community mostly by way of Twitter-pigeon.
As it happened, I had not heard of @import.
But then the inimitable Mikey Ward (alias: Wookiee) asked me about
it. Two persons independently inquiring? Now I had to look into it.
It appears modules are filtering into Objective-C by way of C++, the same way Objective-C is rumored to be inheriting you-pick-the-base-type enums from C++TNG. Only this time the feature isn’t part of any standard.
I get this idea from an exchange on the cfe-dev mailing list in late December, which I have condensed into a single apocryphal message:
If you check recent (the last 6 months or so) commits to clang by Doug Gregor, you’ll find some work to implement C++ modules is already underway.
I’m not sure how much it’s based on any specific proposal.
To misquote Doug [Gregor] (can’t find the email, I think it might’ve been on IRC): “The semantics are obvious enough, so I’m implementing those. After that we can haggle over the syntax”
(In case it’s driving you crazy, “cfe” is short for “c/clang front-end”, which
is all the clang tool you use from the command line is: a driver for a whole
mess of surprisingly unmessy library code.)
At this point, Doug was kind enough to chime in:
Most of the work I’m doing is in three places. The Serialization module, which takes care of serializing/deserializing an already-parsed AST, is the hardest part: it’s the infrastructure that allows one to compile a module on its own, storing the serialized AST to disk, and then load that module into another translation unit later on. This part is likely to be the same regardless of how modules behave. [Clang will produce and cache module AST files on the fly. Authors and build systems will remain ignorant of these AST files.]
The module map part of the Lex module handles the mapping between headers and modules. It’s mainly a transitional little sub-language that allows one to describe the relationships between headers (which are used everywhere today) and modules.
The easy part is the parsing of module imports, labeling what is exported/hidden, and name-lookup semantics. It’s also the part that people will want to discuss endlessly, so for now the various keywords are uglified so that we don’t commit to any one syntax.
Once you chase on down through the code, you find yourself at the abstract
syntax tree level staring at the ImportDecl class. What does it do? Well,
[it] describes a module import declaration, which makes the contents of the named module visible in the current translation unit. An import declaration imports the named module (or submodule). For example:
@import std.vector;
Import declarations can also be implicitly generated from #include/#import directives.
That’s right: #include/#import are going to become legacy syntax for
this nifty new modules system. Then, instead of playing “find the header that
includes the symbol you want to use”, you will be able to basically just import
the functionality directly.
The actual mapping from module name to file is handled by the ModuleLoader.
Judging by the tests, there’s going to be a way to explicitly manage this
mapping using module stanzas in a ModuleMap:
module category_left {
  header "category_right.h"
  export category_top
}
You also get another way to manage symbol visibility via export
directives in the module stanzas, as you can see there.
Possibly the awesomest future visibility control is that over those pesky preprocessor macros. That’s right: the proposed syntax is
#define MODULE_H_MACRO 1
#define MODULE_H_PUBMACRO 2
#__private_macro MODULE_H_MACRO
#__public_macro MODULE_H_PUBMACRO
The private/public macro preprocessor directives update the visibility of the named macro. If you have multiple macros, you have to issue multiple directives – there’s no support for privatizing multiple macros with something like:
#__private_macro XYZZY PLUGH PLOVER /* <-- THIS DOES NOT WORK */
I haven’t the faintest notion when we’ll see this live and slaving away under Xcode, but I am looking forward to the coming sleeker, faster import process.
References galore, from top to bottom:
__import_module__ to @import syntax.
LexAfterModuleImport.
ParseModuleImport.
ActOnModuleImport.
Malloc is merely “adequate.” And it’s only adequate if
you’re writing simple programs. Real programmers write their own memory
manager. It’s the first thing they do after they ditch their shaving kit and
start growing their Samson neckbeard.
Don’t believe me? Listen up:
The C language has no memory allocation primitives, although a standard library routine, malloc, provides adequate service for simple programs. For specific uses, however, it can be better to write a custom allocator.
Sam was just some text editor from the 80s, and its memory management was way
more rocking than your Twitter client’s will ever be.
Sam memory management was so rocking that it filled two arenas. That’s
right: two arenas. Your memory management needs are insignificant, puny, and
plebeian, serviced adequately by the C standard library. Sam got true rockstar
treatment: two arenas; two custom allocators; high maintenance, premium memory
management.
The first arena holds staid structs of fixed length. It’s filled first-fit. Nothing magic there.
The second arena holds variably sized objects like strings. In an editor, strings are always changing, growing, splitting, combining. A regular bunch of problem children. So it’s managed by a garbage-compacting allocator.
These arenas are erected side-by-side in memory, with the second arena getting the higher addresses. When the first-fit arena needs more space, it just bumps the compacting arena up in memory.
The real magic is how these two arenas are used together. Take for example a
variable-length array. Sam handles this by creating a struct with a length
and a pointer. The struct is allocated in the struct arena of course, but
its pointer points into the compacting arena. The allocator knew to go back
and rewrite the struct pointer whenever it moved its memory during compaction,
and the programmer knew (or learned really fast the hard way) to always use the
struct’s pointer field directly each time rather than caching it away somewhere.
Now that’s some pretty boss hacking: elegant, but at what many today might consider an advanced, “don’t go there without a friend” low level.
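To make the pointer-rewriting trick concrete, here is a small sketch. It is not sam's code. It assumes each block in the compacting arena carries a header with a back-pointer to the handle that owns it, which is one plausible way for the allocator to know which pointer to rewrite. It models only the compacting arena, with invented names (`handle`, `block`, `handle_alloc`, `arena_compact`); the handles themselves are ordinary variables standing in for structs in the first-fit arena.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Fixed-size handle: a length plus a pointer into the compacting arena. */
struct handle {
    size_t len;   /* bytes in use */
    char  *data;  /* may be rewritten whenever the arena compacts */
};

/* Header stored in front of every block in the compacting arena. */
struct block {
    struct handle *owner;  /* back-pointer; NULL once the block is freed */
    size_t         units;  /* payload size in sizeof(struct block) units */
};

#define ARENA_UNITS 256
static struct block arena[ARENA_UNITS];  /* headers and payloads share this */
static size_t arena_used;                /* units handed out so far */

/* Bump-allocate `len` bytes for `h`. Returns 0 on success, -1 if full. */
int handle_alloc(struct handle *h, size_t len)
{
    size_t units = (len + sizeof(struct block) - 1) / sizeof(struct block);
    if (arena_used + 1 + units > ARENA_UNITS)
        return -1;
    struct block *b = &arena[arena_used];
    b->owner = h;
    b->units = units;
    h->len = len;
    h->data = (char *)(b + 1);           /* payload starts after the header */
    arena_used += 1 + units;
    return 0;
}

/* Mark the block dead; its space is reclaimed at the next compaction. */
void handle_free(struct handle *h)
{
    assert(h->data != NULL);
    struct block *b = (struct block *)(void *)h->data - 1;
    b->owner = NULL;
    h->data = NULL;
    h->len = 0;
}

/* Slide live blocks down over dead ones and fix up each owner's pointer. */
void arena_compact(void)
{
    size_t from = 0, to = 0;
    while (from < arena_used) {
        struct block *b = &arena[from];
        size_t span = 1 + b->units;      /* header plus payload */
        if (b->owner != NULL) {
            if (to != from) {
                memmove(&arena[to], b, span * sizeof(struct block));
                arena[to].owner->data = (char *)(&arena[to] + 1);
            }
            to += span;
        }
        from += span;
    }
    arena_used = to;
}
```

Any `char *` copied out of a handle before `arena_compact` runs may be stale afterward, which is exactly why the programmer had to go through the struct's pointer field every time.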
P.S.: I would encourage you to check out sam’s source to see how it’s done,
but yesterday’s arenas are no more. The current source just calls
malloc once for each allocation.
First up: structural regular expressions, as introduced in the GUI text
editor sam:
In other UNIX programs, regular expressions are used only for selection, as in the sam g command, never for extraction as in the x or y command. For example, patterns in awk are used to select lines to be operated on, but cannot be used to describe the format of the input text, or to handle newline-free text. The use of regular expressions to describe the structure of a piece of text rather than its contents, as in the x command, has been given a name: structural regular expressions. When they are composed, as in the above example, they are pleasantly expressive. Their use is discussed at greater length elsewhere.
x extracts every chunk of text matching the regex provided to it. Each chunk
has the rest of the editing pipeline run on it. Want to change every n in a
hunk of text to an m? Select it all in the window with button 1, focus the
sam command window with button 1, and type in:
x/n/ c/m/
Hit return and this command pipeline runs on the (implicit range) dot, also
known as “the current selection.” x grabs an n, c then changes it to an
m. You can layer on more commands, including g (guard) as a by-the-way if
statement. The command text stays in the command window in case you want to
run it again.
Boom! Instant macro, no memorization of registers required. Take that,
vim qX…q @X.
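For a feel of what that extract-then-change loop amounts to, here is a rough C analogue. It uses POSIX `regex.h`, so it speaks POSIX extended regex syntax rather than sam's, and it works on a C string rather than on dot. The function name `x_change` is invented for this sketch, and the empty-match handling is my own choice, not sam's.

```c
#include <assert.h>
#include <regex.h>
#include <stdlib.h>
#include <string.h>

/* Rough analogue of `x/pattern/ c/replacement/` run over `text`:
 * extract every match of `pattern` and change it to `replacement`.
 * Returns a malloc'd string the caller must free, or NULL on error. */
char *x_change(const char *text, const char *pattern, const char *replacement)
{
    assert(text != NULL && pattern != NULL && replacement != NULL);

    regex_t re;
    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return NULL;

    size_t text_len = strlen(text), rep_len = strlen(replacement);
    /* Worst case: a match before every byte plus one at the very end. */
    char *out = malloc(text_len + (text_len + 1) * rep_len + 1);
    if (out == NULL) {
        regfree(&re);
        return NULL;
    }

    size_t n = 0;                 /* bytes written to out so far */
    const char *p = text;         /* start of the unscanned remainder */
    regmatch_t m;
    while (regexec(&re, p, 1, &m, p == text ? 0 : REG_NOTBOL) == 0) {
        memcpy(out + n, p, (size_t)m.rm_so);    /* text before the match */
        n += (size_t)m.rm_so;
        memcpy(out + n, replacement, rep_len);  /* the changed chunk */
        n += rep_len;
        p += m.rm_eo;
        if (m.rm_eo == m.rm_so) {               /* empty match: step forward */
            if (*p == '\0')
                break;
            out[n++] = *p++;
        }
    }
    memcpy(out + n, p, strlen(p) + 1);          /* tail plus terminating NUL */
    regfree(&re);
    return out;
}
```

Calling `x_change(text, "n", "m")` plays the role of `x/n/ c/m/`. In sam the command after `x` can be any editing command, or a whole chain of them, which is where the composability comes from; this sketch hard-wires `c`.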
This search for symbiosis between mouse and keyboard is what led to sam. Most
UNIX editors bolt mouse input onto an established keyboard-centric paradigm.
Sam rethinks editing to make the mouse an integral part of it. (Acme would
later take this mouse integration to new heights. We’ll get to acme in time.)
Back to structural regular expressions now. Pike has a whole paper on the topic that I will doubtless get to eventually. But just this little bit is tantalizing enough.
I mean, think about awk, think about how you use regular
expressions there, or how you use them in your editor du jour.
Are these tools really making the most of regular expressions?
awk and friends just perform record splitting on a set of separator
characters. Imagine how limited your regexes would be if all you got to do was
specify what to stick between two character class braces: [your characters
here]{1,}. That’s all you get with this simple record separator
construction.
And it’s not like we’ve made great strides: Search-and-replace in an IDE like Xcode or Eclipse gives you even less expressiveness.
I look forward to reading more about structural regular expressions in future.
For more on sam, see:
sam at cat-v
I’d run into this once or twice before, but I always took the simple way out: just rewrite the one or two links in the text by hand. No big deal.
But these search results were just a list of links. And as a programmer, I am vocationally virtuously lazy.
That’s when I remembered Pandoc. Pandoc is a tool for converting between markup languages. I grabbed it as a Swiss army knife alternative to the more questionable Markdown formatters out there. (Markdown’s reference implementation is in Perl. I have trouble regarding any Perl as anything but a fragile hack.) I actually used it the first time as part of avoiding word processors: instead of emailing a PDF/Word doc/Pages doc (in order of increasing uckiness), I just write up a Markdown doc, format it into a standalone webpage, slap in some CSS, and email it off.
Veering back on course, I recalled it could be used not only from Markdown to
HTML but from HTML to Markdown. And how to get pasted text from the browser
into HTML? I didn’t want to muck with View Source, so I just opened up TextEdit
and let its erstwhile annoying habit of preserving pasted formatting work to my
benefit. Copy from Aurora, paste into TextEdit, save as HTML, then pandoc -f
html -t markdown foo.html | pbcopy, back to Aurora, and paste, and beautiful
Markdown appears.
Long story short: Use pandoc to convert HTML into Markdown for your Reddit or
Stack Overflow or blogging needs.
The book is structured around three steps: gaining perspective, macro-editing, and micro-editing. Macro-editing addresses the structure of the work. It requires elucidating then shaping that structure and the characters and themes that build it. Micro-editing examines word choice, continuity, and other concerns at the level of the individual paragraph, sentence, and even word.
Each practical chapter ends with a bulleted summary and exercises. The summary frees you to focus on reading the book. Without it, you’d regularly interrupt your reading to scratch down notes. The exercises give concrete practices to improve your editing.
After each chapter comes an interlude wherein various authors reflect on writing and editing. These leaven the book’s didactic tone, but all are forgettable save the last, Michael Ondaatje’s “One Doesn’t Just Write a Book, One Makes a Book.”
The chapter on gaining perspective covers the usual approaches – bury the work to revisit later – and some unusual approaches – string your work across your study, step back till it’s just squiggles, and examine its topography.
The macro-editing chapter seamlessly blends literary criticism with instruction in the structural elements of writing. Before-and-after passages from The Great Gatsby demonstrate each element, while excerpts from letters between F. Scott Fitzgerald and his editor Max Perkins illustrate the editing process.
The micro-editing chapter tries to maintain the style of the macro-editing chapter but fails. It drags, and I was glad to move on.
The last two chapters turn from the mechanisms of editing to its variety and historical background.
Second to last is a chapter of interviews with authors and artists about their editing process. The story of Walter Murch editing the film The Conversation stands out. The director demanded a refrain repeat exactly the same throughout the film. He threw out one take because the actor’s accentuation differed. Murch decided against the director’s instructions to cut this take back in over the last seconds of the film. The different word stress recontextualized the refrain and so the film. The other interviews reinforce the variety of approaches to writing and editing, but none stays with you the way Murch’s will.
The last chapter recaps the role of the editor since ancient Rome. It ends with Robin Robertson editing Adam Thorpe’s Ulverton. The history entertained; the story behind Ulverton grabbed me. The men developed unconventional ways of editing this intricate work, including extensive color-coded diagrams tracking leitmotifs, themes, and lineages across fictional centuries. Their cooperation parallels Fitzgerald and Perkins’, bringing the book back to where it started: the necessary pleasure of editing.
chmod u+x tt-ifmud and execute the file from a terminal.
Script features:
afk msg or away msg, and it will @away and zone you.
Why tintin++? Process of elimination!
Getting tintin++:
brew install tintin (Fink and MacPorts users, you’re on your own.)
I hope this saves some other ifMUD newbie some time. Those of you who haven’t checked out ifMUD, check it out!