Friday, 1 May 2015

stream of abuse

I know it's a "cool new feature" and all, but really guys, think before you put everything in a stream.

I wont link to the article but i came across one that included some code that used streams everywhere for some examples. Apparently it's "obvious" how using streams simplifies the code and so on.

For a start the code isn't really any simpler; it avoids needing to calculate the output size but that isn't a very complicated calculation. Unless all you knew was streams it certainly isn't any more expressive or concise either.

I converted one function directly to arrays and it was shorter and runs at twice the speed for a given test case. It also uses about 1/6th of the memory and roughly 30 000 fewer memory allocations (oh my poor breaking gc, still, java alloc is rather fast isn't it?).

Apart from that ... oh boy it does some bad bad things to a poor innocent atomic counter.

My first thought was "very poor and unscalable use of atomic counter". Then I looked closer.

    // names changed to protect the innocent
    AtomicInteger count=new AtomicInteger();
    int bob[] =
            Thing t=list.get(count.getAndIncrement());
            int p0=f.a; int p1=f.b; int p2=f.c;
            int t0=t.a; int t1=t.b; int t2=t.c;
            return IntStream.of(p0, t0, p1, t1, p2, t2);

So what this is attempting to do is iterate through two and merge pairs at matching indices into a flattened array of integers.

Except one of those arrays has been forced into a stream (for no reason other than it can be done it seems) - thus losing the position information. So to "recover" this information so that it can be correctly indexed into the other array an atomic counter has been added. But this solution is both dangerous and confusing. It's confusing because it looks like it should be concurrent code - that's exactly what atomic counters are for and also a common desired side-effect of using streams, but this loop is not being invoked in a concurrent context. It's probably there because it was left-over from such an attempt. It's dangerous because it is vanishingly unlikely to actually work if it actually was invoked on a parallel stream because atomic counters by definition just don't provide the required constraints and thus there is no guarantee of obtaining matching pairs. At best it looks like a thoroughly cheeky and worst-practice approach to work around the final rules for lambdas.

Whatever it is, it's sick. Don't ever do this (in any language).

In the source example this is part of a larger function where the previous two loops generate one each of these lists independently and in-order and then after this merge they are simply discarded. i.e. there is not any practical reason for any of this intermediate garbage creation apart from saving a little arithmetic in pre-calculating the required size of the array required. The two loops could be retained and just write interleaved results and without losing the expressiveness of the implementation.

But for arguments sake if you really did want to merge two array lists of a known length to an interleaved integer array, here's one approach that has worked for a few decades and is still about as good as it's going to get:

 int[] bob = new int[foo.size() * 6];
 for (int i=0, j=0; i < foo.size(); i++) {
   Point2D f = foo.get(i), t=list.get(i);
   bob[j++] = f.a; bob[j++] = t.a;
   bob[j++] = f.b; bob[j++] = t.b;
   bob[j++] = f.c; bob[j++] = t.c;   

Lets also say for arguments sake that the stream example did actually work in parallel, and you would gain anything from it (hint: you wont, see the end), so you got that for free right? What about that ancient and daggy old for loop?

 int[] bob = new int[foosize()*6];
 IntStream.range(0, foo.size()).parallel().forEach((i)-> {
   int j = i*6;
   Point2D f = foo.get(i), t=list.get(i);
   bob[j++] = f.a; bob[j++] = t.a;
   bob[j++] = f.b; bob[j++] = t.b;
   bob[j++] = f.c; bob[j++] = t.c;   

Code re-use

One argument for using streams is code-reuse - which falls over once you have side-effects and so the initial example is no better than the last in that respect.

If you really absolutely positively must use streams for this because of whatever reason (there are some legitimate ones): write a proper spliterator and use to turn it into a stream. It will have to take the two lists in it's constructor and iterate over a container object which holds the matching pair (like Entry<,>).

One will note however that for example the linked list spliterator just breaks the iterable into batches of arrays which are then spliterated over. Whilst this conceptually may seem that it could be a win due to locality of reference and doing the work piece-meal: in reality the spliterators are run to their limit before anything starts for scheduling purposes. So all it's really doing is breaking a single allocation and copy loop into many (sqrt(n)) smaller ones; which is always guaranteed to run slower and use less memory. i.e. you could just flatten both lists to arrays and then use the per-item or array methods above and get the same result - more efficiently - and with less programmer effort.

So whilst streams can save a lot of effort in writing concurrent code; it still many of the same gotchas that can introduce performance side-effects for the ignorant. For example if you're using thread synchronisation primitives at all in any stream processing chain including custom spliterators or collectors then you're literally "doing it wrong" as:

the entire point of the parallel part of the stream framework is to exterminate this unscalable approach

(I thought it needed a bit more emphasis than em/strong/underline could provide :)


I couldn't leave it there so I tried implementing an "Entry" spliterator: one that takes two streams and passes each one into the stream as an Entry.

Because you really don't want to run this on a linked list i forced it to take arraylists. But there are few cases (random deletes or deletes from the head) where you would want to use linked lists and the container approach to linked lists creates pretty shit lists since you lose the ability to delete randomly in O(1). So even when they might be a win for the name of the data structure they often aren't due to the implementation details. But I digress.

So using a spliterator which is under 10 lines of significant code:

    bob = SpliteratorEntry<>(
        (ArrayList<Thing>) foo,
        (ArrayList<Thing>) list), false)
    .map(e -> {
        Thing f = e.getKey();
        Thing t = e.getValue();
        int p0 = f.a; int p1 = f.b; int p2 = f.c;
        int t0 = t.a; int t1 = t.b; int t2 = t.c;
        return IntStream.of(p0, t0, p1, t1, p2, t2);
}).flatMapToInt(i -> i).toArray();

It doesn't make any appreciable difference to the running time or to those underwhelming allocation overheads; but at least it now allows for a reusable function and it isn't just plain "wrong".

FWIW making this concurrent just makes it a bit slower while using more cpu cycles and energy. But I could have guessed "not worth it" beforehand since it just isn't doing enough work to compensate for all the overheads on such a small problem (a few thousand items).

No comments: