Wednesday, January 25, 2012

Threads vs. Processes

I did a consulting job in Oregon a few years back where the manager was impressed that my solution for a sample piece of code involved multiple threads.  He explained that he generally disliked use of threads, because they share too much information.  This is very true, and it is both a strength and a weakness of threads: you can share information (so each thread can see what the others are doing) but also, you can share information (so each thread can interfere with what the others are doing).

The antisocial software that I am trying to rehabilitate has one of those threading difficulties--and it took a while to figure it out.  The user interface has a Save button; you click it, and it saves data to the database, then checks to see if the offender's home address is the same as that of any other offender in the system.  If there are matches, it throws up a popup window that displays the other offenders at the same home address.

Of course, that popup window is done through a separate thread--but the abusive software parent responsible for this piece of code did not think about the fact that both threads were sharing the same database connection object.  If the thread retrieving matching addresses finished its SQL operation before the save thread closed the connection, everything worked just fine.  But as the number of matching addresses increased much above ten, the odds were excellent that the retrieve thread would still be retrieving data when the save thread closed the connection.

The database connection is now closed, but the retrieve thread is still retrieving data.  The results were highly unpredictable, with at least four different error messages that might appear, depending on timing.  Only occasionally did the SQL Exception "Connection already closed" come up, which was the tipoff that something was not right.
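
Here is a minimal sketch of that failure mode and one possible fix, in Python rather than the language of the real system. Everything in it is hypothetical: FakeConnection is a stand-in for a database connection (not a real driver), retrieve_matches plays the popup thread, and the main thread plays the save thread. The two Event objects exist only to force the unlucky timing every time, so the bug shows up deterministically instead of "depending on timing."

```python
import threading


class FakeConnection:
    """Hypothetical stand-in for a database connection (not a real driver)."""

    def __init__(self):
        self.closed = False

    def fetch_row(self, i):
        if self.closed:
            raise RuntimeError("Connection already closed")
        return f"offender-{i}"

    def close(self):
        self.closed = True


def retrieve_matches(conn, count, started, proceed, out):
    """Popup thread: fetch `count` rows, pausing after the first one."""
    try:
        for i in range(count):
            out.append(conn.fetch_row(i))
            if i == 0:
                started.set()   # signal that retrieval is under way
                proceed.wait()  # wait until the save side has closed up
    except RuntimeError as exc:
        out.append(f"ERROR: {exc}")


def run(shared):
    """Run one save-plus-popup cycle; `shared` selects the buggy design."""
    save_conn = FakeConnection()
    # Buggy design: the popup thread reuses the save thread's connection.
    # Safer design: the popup thread gets a connection of its own.
    popup_conn = save_conn if shared else FakeConnection()

    started, proceed = threading.Event(), threading.Event()
    rows = []
    popup = threading.Thread(
        target=retrieve_matches,
        args=(popup_conn, 3, started, proceed, rows),
    )
    popup.start()

    started.wait()      # the popup thread is now mid-retrieval
    save_conn.close()   # the save side finishes and closes its connection
    proceed.set()
    popup.join()
    popup_conn.close()
    return rows


buggy = run(shared=True)
fixed = run(shared=False)
print(buggy)
print(fixed)
```

With the shared connection, the popup thread gets one row and then the error; with its own connection, it gets all three rows. Another option, not shown here, would be to keep one connection but have the save side wait for the popup thread to finish before closing it.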

Sad to say, there are more than a thousand popup windows in this system, and trying to figure out which of those are this sort of multithreaded monstrosity will keep me busy for decades.  Unlike Sisyphus, at least I get to retire in a few years.


w said...

Job security.... :-(

w said...

I take it this address issue is because many offenders often come from multi-family dwellings or release facilities, etc????

Clayton said...

Sometimes it is because parole needs to know if someone is moving into a residence with another convicted felon, or taking a job where another convicted felon works. In many cases, ex-felons are moving into halfway houses.

It turns out that convicted felons often have a very hard time finding a place to rent, complicating attempts to rehabilitate them and re-integrate them into civilized society.

w said...

Yeah, I know and that has been the claim of some of those at the Occupy movement as to at least some of the tent city "tenants"---mostly sex offenders I believe.

When I first came to Idaho in the '80s I stayed in a well-known community living facility in West Boise that was full of parolees (I was there because of the cheap rent until I got on my feet, as I was tired of being homeless in southern CA).

Then there have been the single-family homes that were being used to move sex offenders into neighborhoods full of children. I lived in a house not too far from one of those, and the neighbors were not told this was being done. The outcries forced the operation to be shut down. Unfortunately, too many people let their little kids run around with no supervision, and this house was a stone's throw from a very busy school bus stop!

Rob K said...

In principle I really dislike threads. Unix created a way to do multi-programming very simply by forking, which protected the memory space of the different processes from each other. But on Windows process creation is expensive, so threads came to be popular, since threading was cheap.

Those who fail to learn from Unix are damned to re-implement it, poorly.

hga said...

Rob K: Learn the wonders of Functional Programming and you'll find threading much easier to deal with. Whereas adding threading to conventional OO is an unmitigated disaster (well, without exercising iron discipline), since behavior (code) and data are both seriously intertwingled and smeared all over the place.

If you must do OO, especially with threads, use a serious design process like OOSE or RUP. (Which are also great approaches to mitigating risk; of course, all this assumes greenfields projects, whereas far too much toxic waste has been dumped into Clayton's code base as well as far too many I've worked with and no doubt you.)

Side note on UNIX(TM): one might say that those who fail to learn Multics are damned to re-implement it, poorly (dynamic linking is a great example). Forking and some of the virtues that come with it are an artifact of squeezing Multics and large address space concepts down into very small minicomputers, where to get many things done you had to divide and conquer via multiple processes. Things are so different today ... well, your current Sandy Bridge CPU has I and D Level 1 caches that in total are 8K bigger than the available address space on the PDP-11/20, a machine in the middle of UNIX(TM)'s evolution.

Anyway, that point is relevant to threading in that effective use of the latter is a key to getting good performance out of today's SMP/ccNUMA chips since we're running out of pure speed increases from either clock cycles or the width of individual CPUs.

Rob K said...

hga, I never said that I had problems with multithreading. I've done functional programming, and a whole lot of multithreading in C++ on Win32, and more than a bit of multiprogramming (i.e. forking child processes to do tasks) on Solaris and Linux. I like functional programming. I use a lot of concepts from it. But it's not magic.

My point was that multithreading was a cruddy work-around for Windows' expensive process creation.

Multithreading is really no different than multiprogramming, except that when you use multiple processes, instead of one process with multiple threads inside it sharing the same memory space, you get protection of the memory space of each thread of execution, giving you better decoupling and a cleaner interface. Multithreading is a return to the bad old days of sharing the same memory space and tromping on each other's memory. So then we get TLS. Hey look, protected memory space for each thread of execution! Didn't we have that already? bleh. So whether I fork a sub-process to do some work, or I spin off a thread, or even do RPC to a remote server on an entirely different computer, it's still another thread of execution, and the interaction of those threads of execution still has to be managed through the exact same synchronization abstractions. It's nothing new. We've been doing it for years, and the same problems apply.
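
The memory-space difference described above can be seen in a few lines. This is only a sketch, in Python, assuming a Unix-like system where os.fork is available; the names counter and bump are made up for the illustration. A thread changes the parent's data directly, while a forked child changes only its own copy and has to hand its result back through an explicit channel (a pipe here).

```python
import os
import threading

counter = {"value": 0}  # hypothetical shared state


def bump():
    counter["value"] += 1


# A thread runs in the same memory space, so its change is visible here.
t = threading.Thread(target=bump)
t.start()
t.join()
after_thread = counter["value"]

# A forked child works on its own copy of memory. Its change stays in the
# child unless it is sent back through an explicit channel such as a pipe.
r, w = os.pipe()
pid = os.fork()
if pid == 0:
    # Child process.
    try:
        os.close(r)
        bump()
        os.write(w, str(counter["value"]).encode())
    finally:
        os._exit(0)  # leave without running the parent's cleanup code

# Parent process.
os.close(w)
child_value = int(os.read(r, 16))
os.close(r)
os.waitpid(pid, 0)
after_fork = counter["value"]

print(after_thread, child_value, after_fork)
```

The parent's counter is 1 after the thread and still 1 after the fork, even though the child counted up to 2 in its own copy. The pipe is the "cleaner interface" in miniature: nothing crosses between the two unless someone writes it down and sends it.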