Saturday, February 21, 2009

...and then two come along at once (a.k.a. Core Data travails)

Per previous posts, I really like what the fine people at Apple have done with Core Data, in general.

One small thorn in the side though is working with Core Data in a multi-threaded environment.
Now, the docs make it quite clear that there are some limitations and choices to be made. Managed Object Contexts (the in-memory state of your Core Data data set) are not thread-safe. The documentation highlights a number of options basically involving having a sort of apartment model for MOCs. However, it then goes on to say that, although discouraged, careful locking can be used to allow a MOC's managed objects to be passed between threads and used. The 'careful' locking in this case would include not just locking the MOC instance when you are executing queries (fetch requests), but also any time you explore a managed object's properties (because this can cause faulting on the object and automatic fetching of more detail).
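To make the 'careful' locking concrete, here is a minimal sketch of the pattern described above: the context is locked around both the fetch and the later property reads. The function name, the entity name and the attribute key are all hypothetical, and the code assumes pre-ARC Objective-C (manual retain/release or garbage collection). It illustrates the discouraged approach, not a recommendation.

```objc
#import <CoreData/CoreData.h>

// Hypothetical helper: fetch every object of an entity and read one attribute,
// holding the context's lock for the whole time any managed object is touched.
static NSArray *FetchValuesLocked(NSManagedObjectContext *moc,
                                  NSString *entityName,
                                  NSString *key)
{
    NSMutableArray *values = [NSMutableArray array];

    [moc lock];   // NSManagedObjectContext conforms to NSLocking
    @try {
        NSFetchRequest *request = [[NSFetchRequest alloc] init];
        [request setEntity:[NSEntityDescription entityForName:entityName
                                       inManagedObjectContext:moc]];

        NSError *error = nil;
        NSArray *results = [moc executeFetchRequest:request error:&error];
        [request release];   // pre-ARC; a no-op under garbage collection

        for (NSManagedObject *object in results) {
            // Reading a property can fire a fault and trigger another fetch,
            // which is why the lock has to cover this loop as well.
            id value = [object valueForKey:key];
            if (value != nil) {
                [values addObject:value];
            }
        }
    }
    @finally {
        [moc unlock];
    }
    return values;
}
```

Every other thread that touches the same context, including the main thread when it draws, has to follow the same discipline for this to hold together.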

So, given the discussion on locking, which I felt I understood even if it had to be applied diligently and completely, I had decided to take a 'shared MOC with extensive locking' approach. The approach seemed to be working out (despite the bracketing locks dotted all through the code to protect any access to managed object state - even just reading). However, a nasty problem emerged - a hang while updating a table in the UI. Yesterday, investigation revealed that a background thread had acquired the MOC lock in order to perform a query, and that the main thread was drawing the table and was waiting to acquire the lock on the MOC in order to read a string attribute on a managed object. A deadlock was occurring because the act of querying the MOC in the background thread caused something in Core Data to write data to the table, causing it to want to update, but an internal lock on changes to the table view was not obtainable because the table was repainting in the main thread. Urg.

So what was causing the MOC to send a data update to the table (via its controller)? Apparently, this is what it does when the table is bound (Cocoa bindings) to model data via a controller object. In other words, apparently you cannot use MOC locking to facilitate multi-threaded use of Core Data if you are using bindings at all. At least I think that's the practical upshot of all of this (i.e. it may be possible to find hooks in the updating/drawing code in all UI objects bound to Core Data such that further locks could be inserted to prevent such a deadlock - but clearly this approach is coming apart at the seams). This is presumably one of the reasons Apple says locking is "strongly discouraged" (though AFAICS they don't elaborate on the nature of the beast that lurks if you try it).

Once the full realisation had hit, naturally the question remained as to what to do about it. The apartment model is one approach, but we require near instantaneous updating of our UI as the model changes in a thread's MOC, and it is (currently) unknown what the effect would be (nor how best to set up a MOC-per-thread: where to store references to the MOCs and whether any manual committing/synchronisation is required). For the meantime I elected to back out to the simplest approach of having only the main thread actually make ANY use of the MOC. However, at this stage we have several bits of code running in background threads that were locking the MOC and proceeding to do fetches and use managed objects. What was the easiest way of having that code execute on the main thread, while the code around it continued to run on a background thread?

To punt some kinds of method invocation to the main thread, Cocoa provides -performSelectorOnMainThread:withObject:... style methods. Like the other methods in the "performSelector" family, these are fine for one-way message sends with only one or two arguments. However, the main-thread variants cannot hand back a return value, even though needing one is a common case, and they are limited to object parameters (e.g. no scalar values). These limitations got me wondering what it would take to build something that would marshal potentially any message to another thread, and deal with the return value correctly. Of course, it turns out that this already exists in the form of Cocoa Distributed Objects - but then the challenge was to come up with something that worked effectively in the case of punting an instance method message send to the same object on the main thread.
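To show what that limitation looks like in practice, here is a minimal sketch of the usual workaround for getting a result back through -performSelectorOnMainThread:withObject:waitUntilDone:. A mutable container is passed as the single object argument and the main-thread method fills it in. The class name, method names and the @"title" key are hypothetical, and the string is a stand-in for real main-thread-only work.

```objc
#import <Foundation/Foundation.h>

// Hypothetical class, for illustration only.
@interface TitleReader : NSObject
- (void)readTitleInto:(NSMutableDictionary *)box;  // runs on the main thread
- (NSString *)titleFromBackgroundThread;           // call from a background thread
@end

@implementation TitleReader

- (void)readTitleInto:(NSMutableDictionary *)box
{
    // Stand-in for work that must happen on the main thread,
    // such as reading an attribute of a managed object.
    [box setObject:@"Some title" forKey:@"title"];
}

- (NSString *)titleFromBackgroundThread
{
    // The method cannot return anything to us directly, so the "return value"
    // travels back inside the mutable dictionary passed as the one argument.
    NSMutableDictionary *box = [NSMutableDictionary dictionary];
    [self performSelectorOnMainThread:@selector(readTitleInto:)
                           withObject:box
                        waitUntilDone:YES];
    return [box objectForKey:@"title"];
}

@end
```

This works, but every such call needs its own wrapper method and boxing of any scalar arguments, which is the tedium that motivates the more general approach.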

Here's what I came up with:
#define MISELF \
NSConnection *miself_service; \
id miself;

// Initialise miself ivars on main thread, wait to ensure complete
#define MISELF_INIT \
[self performSelectorOnMainThread:@selector(miselfServeInstanceFromMainThread) withObject:nil waitUntilDone:YES];

// Initialise miself service and set ivars to connection and proxy
#define MISELF_METHOD \
- (void)miselfServeInstanceFromMainThread { \
NSString *serviceName = [self description]; \
miself_service = [NSConnection serviceConnectionWithName:serviceName rootObject:self]; \
miself = [NSConnection rootProxyForConnectionWithRegisteredName:serviceName host:nil]; \
}

These three macros define a little toolkit for converting any class to provide inter-thread thunking for its instance methods.
To set up a class, you put the MISELF macro in the ivars declaration (in the class interface), the MISELF_INIT macro into the -init method or methods, as required to have it always called once on any object initialisation, and the MISELF_METHOD goes anywhere you would start a method definition in the implementation block.
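As a rough illustration of where each macro lands, here is a sketch of a class wired up this way. The class name Greeter and the -workInBackground method are hypothetical, and the three macros are assumed to be in scope already. The sketch also assumes garbage collection is enabled: the macros assign the connection and proxy to ivars without retaining them, so under manual retain/release they would need explicit retains.

```objc
#import <Foundation/Foundation.h>
// Assumes the MISELF, MISELF_INIT and MISELF_METHOD macros are defined above.

// Hypothetical class, for illustration only.
@interface Greeter : NSObject {
    MISELF                      // adds the miself_service and miself ivars
}
- (NSString *)doSomethingWith:(NSString *)name andThisNumber:(int)number;
- (void)workInBackground;       // intended to run on a background thread
@end

@implementation Greeter

MISELF_METHOD                   // defines -miselfServeInstanceFromMainThread

- (id)init
{
    self = [super init];
    if (self != nil) {
        MISELF_INIT             // sets up the service on the main thread and waits
    }
    return self;
}

- (NSString *)doSomethingWith:(NSString *)name andThisNumber:(int)number
{
    // Stand-in for work that must only happen on the main thread.
    return [NSString stringWithFormat:@"%@ %d", name, number];
}

- (void)workInBackground
{
    // Sent via the proxy, so the call is marshalled by Distributed Objects,
    // scalar argument and return value included.
    NSString *result = [miself doSomethingWith:@"Marvin" andThisNumber:42];
    NSLog(@"%@", result);
}

@end
```

One thing to keep in mind with this arrangement: the service name comes from -description, so a class that overrides -description would want to make sure the result is still unique per instance.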

With these in place, the power of Cocoa Distributed Objects is available to punt method invocations (from within the class) to the main thread by using this form:
[miself doSomethingWith:@"Marvin" andThisNumber:42]
...instead of:
[self doSomethingWith:@"Marvin" andThisNumber:42]

At the very least, this has made some of the remedial work, of getting code that runs in a background thread to do its Core Data business on the main thread, rather easy. I still have to learn whether the MOC-per-thread approach would be the right way to go ultimately, but at least I've just ridden out of Deadlocksville. Hopefully I won't be back for a while.

P.S. Apple developer documentation ranges from really excellent to the, well... less than really excellent. Overall, I'd say the Core Data docs are better than average, but they could still do with explaining multithreaded options much more clearly, particularly when it comes to how exactly one would typically create the MOC-per-thread scenarios. Also, while the locking option is certainly "strongly discouraged", there is no elaboration as to what this means, and in fact the conversation winds on to talking about how locking can be implemented. Given the importance of binding in Cocoa these days, there really should be some specific mention of the (effective) incompatibility of binding with locking (in an attempt to deal with multiple threads - the only reason for locking in the first place).
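For reference, the MOC-per-thread arrangement is commonly described along the following lines; it was not tried for this post, so treat it as a sketch under assumptions rather than a tested recipe. Each background thread creates its own context on the shared persistent store coordinator, and its saves are merged into the main context on the main thread. The class name BackgroundImporter and its method names are hypothetical, and the code assumes pre-ARC Objective-C.

```objc
#import <CoreData/CoreData.h>

// Hypothetical worker; the coordinator and main-thread context are assumed
// to be supplied by the application.
@interface BackgroundImporter : NSObject {
    NSPersistentStoreCoordinator *coordinator;
    NSManagedObjectContext *mainContext;
}
- (id)initWithCoordinator:(NSPersistentStoreCoordinator *)aCoordinator
              mainContext:(NSManagedObjectContext *)aContext;
- (void)runInBackground;   // call this on a background thread
@end

@implementation BackgroundImporter

- (id)initWithCoordinator:(NSPersistentStoreCoordinator *)aCoordinator
              mainContext:(NSManagedObjectContext *)aContext
{
    self = [super init];
    if (self != nil) {
        coordinator = [aCoordinator retain];
        mainContext = [aContext retain];
    }
    return self;
}

- (void)dealloc
{
    [coordinator release];
    [mainContext release];
    [super dealloc];
}

- (void)backgroundContextDidSave:(NSNotification *)note
{
    // Delivered on the saving (background) thread; hop to the main thread
    // before touching the main context.
    [mainContext performSelectorOnMainThread:@selector(mergeChangesFromContextDidSaveNotification:)
                                  withObject:note
                               waitUntilDone:YES];
}

- (void)runInBackground
{
    NSAutoreleasePool *pool = [[NSAutoreleasePool alloc] init];

    // A private context for this thread, sharing the same store coordinator.
    NSManagedObjectContext *context = [[NSManagedObjectContext alloc] init];
    [context setPersistentStoreCoordinator:coordinator];

    NSNotificationCenter *center = [NSNotificationCenter defaultCenter];
    [center addObserver:self
               selector:@selector(backgroundContextDidSave:)
                   name:NSManagedObjectContextDidSaveNotification
                 object:context];

    // ... fetch and modify managed objects using `context` only ...

    NSError *error = nil;
    if (![context save:&error]) {
        NSLog(@"Background save failed: %@", error);
    }

    [center removeObserver:self
                      name:NSManagedObjectContextDidSaveNotification
                    object:context];
    [context release];
    [pool drain];
}

@end
```

In this shape managed objects themselves never cross threads; if a particular object needs to be handed over, its NSManagedObjectID is passed instead and the receiving thread looks it up in its own context.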

No Toot for a long time

Well, I should probably pick this blog up again. The hiatus has been filled with further learning and a measure of head scratching as I've continued to spend time getting to grips with Apple's Cocoa. It's been over a year since I started to play with it, and as might be expected I've come a long way, but feel like I have so much further to go. I don't remember ever feeling like I had so much to learn when faced with other frameworks (such as Java JFC, AWT and Swing). This is surely because Cocoa is such a rich framework and much higher-level than most others - or perhaps I'm just getting a little bit older and have killed too many brain cells with beer? Anyway, maybe a little retrospective "A year in the Mocha" could be forthcoming to capture some of the highs and lows in my Cocoa journey this last year.