I Said "Parallelise" Not "Paralyse" Part 2 - Classification
MartinPacker
I hope you don't get the idea I'm overly into rigour, talking about Classification. But I think it has to be done - to provide terminology for this series of posts.
This is the second of four posts on Batch Parallelism, following on from Motivation.
If I think about how parallelism works in batch, it broadly falls into two camps: heterogeneous and homogeneous.
(If you look these two terms up in Wikipedia (possibly for the spelling) you get to see under a rather tasty graphic the words "Clam chowder, a heterogeneous material".)
Let me explain what I mean by these two, in terms of batch classification.
Taking heterogeneous first: Almost all customers run more than one batch job at a time. Personally, I've never seen anyone feeding through a single job at a time.
But a lot of the time it's separate suites (or applications, if you prefer). Or certainly it's running dissimilar jobs alongside each other.
You can further divide this case - in a way which actually makes it less abstract [1]:
By "linked" I'm mainly talking about data flows, though it could be operational cohesiveness.
The homogeneous case is the one where work is very strongly related. There are two subcases:
Which Do YOU Do?
I think most customers do "heterogeneous" to a very considerable degree. That's because it comes naturally and is the way the business has grown and driven things.
Less common (and I was recently pressed to give a view on how common) is "homogeneous". That's because it takes real effort.
The answer I gave was something along the lines of "I don't know for certain but I guess about 30% of customers do homogeneous" [2]. The reason I gave that answer is that I suspect homogeneous parallelism gets added to applications to make them perform.
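To make the distinction a little more concrete, here is a minimal sketch in Python. It is only an illustration under loud assumptions: the job functions, the record layout and the key-based split are all hypothetical, and real batch jobs would be scheduled by a job scheduler rather than a thread pool. It shows only the shape of the two camps: dissimilar jobs running alongside each other, versus clones of one job each working on its own slice of the data.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for batch jobs; none of these names come from a real suite.
def billing_job(records):
    """One 'job': total the amounts in the records it is given."""
    return sum(amount for _key, amount in records)

def report_job(records):
    """A dissimilar 'job': count the records it is given."""
    return len(records)

def split_by_key(records, clones):
    """Partition records so each clone gets a disjoint share of the keys."""
    partitions = [[] for _ in range(clones)]
    for key, amount in records:
        partitions[key % clones].append((key, amount))
    return partitions

records = [(key, key * 10) for key in range(1, 9)]  # made-up data

with ThreadPoolExecutor(max_workers=4) as pool:
    # Heterogeneous: dissimilar jobs run alongside each other.
    billing = pool.submit(billing_job, records)
    report = pool.submit(report_job, records)
    heterogeneous = (billing.result(), report.result())

    # Homogeneous: clones of the same job, one per slice of the data,
    # followed by a merge step the original single job never needed.
    partial_totals = list(pool.map(billing_job, split_by_key(records, clones=4)))
    homogeneous_total = sum(partial_totals)
```

Even in this toy the homogeneous version needs two things the heterogeneous one doesn't: a way of splitting the data and a way of merging the results. That is a small-scale picture of why it takes real effort.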
It's my view that applications are going to have to become more homogeneously parallel in the future - because of the dynamic I described in Part 1: Over time the speed-up required of individual actors (typically batch jobs) is likely to outstrip that delivered by technology.
To become more homogeneously parallel we're going to have to understand the batch applications much better. (Actually, that's also true of efforts for more heterogeneous parallelism.) Parts 3 and 4 of this series will address some of this understanding - and provide some guidance on what's going to need to be understood. And hopefully they will make this classification seem less dry and more helpful.
[1] There's probably a rule that says the leaf nodes of classification schemes yield a higher proportion of concrete examples.
[2] The "I don't know for certain" part of it is because I recognise I see a "self-selecting group" or "biased sample" of customer situations: Those that are particularly thorny or exceptionally critical.