Dealing with higher order operations coupled with primitives

From: "Aaron W. Hsu" <arcfide@sacrideo.us>
Newsgroups: comp.lang.java.programmer
Date: Thu, 21 Jun 2012 22:08:37 -0500
Message-ID: <6s2dnZ1-8r4ofH7SnZ2dnUVZ_s6dnZ2d@giganews.com>
My inexperience with Java has got me into a bit of a funk, both in terms
of its static type system and in terms of using Objects for higher-order
programming.

To give a simplified example of what I am dealing with, I want a class
that stores data in arrays of primitives (since I do not want to incur
the boxing overhead of Objects), but I also want to support multiple
types. I seem to have two options: I can subclass the main object to
specialize it to a type, or I can use an internal type field that
indicates the type of the object. I have tried it both ways, and both
are annoying. It does not seem that I can use Generics with primitive
types, so that option is out as well. Is there a better way?
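
To be concrete, the type-field variant looks roughly like the sketch
below. The names are made up, but it shows the shape of the thing:

public class Data {

  // Tag recording which of the primitive arrays below is the live one.
  public enum Type { INT, DOUBLE, BOOLEAN }

  private Type type;
  private int[] ints;
  private double[] doubles;
  private boolean[] booleans;

  public Type getType() { return type; }

  // Every accessor has to check the tag before touching an array.
  public int getInt(int i) {
    if (type != Type.INT) throw new IllegalStateException("not ints");
    return ints[i];
  }
}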

Either way, I am finding it annoying to program because I want to have an
abstraction for functions that operate over this structure. So I might
have a class as follows:

public abstract class Function {

public abstract void apply(Data result, Data argument);

}

Now, this works pretty well in the general case. I have a few constraints:

1) I want to make sure the interface has as low an overhead as possible;
2) I want to make sure that the interface allows me to reduce allocations
as much as possible.

I am using the result argument instead of returning a newly allocated
Data instance because of #1. Specifically, I might have something like:

public void apply(Data res, Data arg) {
  res.fun1(arg);
  res.fun2();
  res.fun3();
}

Here I can reuse the already allocated, potentially very large internal
primitive data structure in res multiple times, including across function
applications, instead of having to copy that large structure (these
structures could, for instance, be gigabyte-sized arrays).
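
To make the reuse concrete, here is roughly what I mean, sticking to the
int case and ignoring the type field for a moment; fun1, fun2, and fun3
are just placeholders for operations that overwrite res's existing
storage in place:

public class Data {

  private int[] values;   // allocated once, possibly gigabytes

  // Hypothetical in-place operations: each one overwrites values
  // rather than allocating and returning a fresh array.
  public void fun1(Data arg) {
    for (int i = 0; i < values.length; i++) values[i] = arg.values[i] + 1;
  }

  public void fun2() {
    for (int i = 0; i < values.length; i++) values[i] *= 2;
  }

  public void fun3() {
    for (int i = 0; i < values.length; i++) values[i] -= 3;
  }
}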

Of course, the reason that I am using Function is that I want to be able
to define certain methods that take functions. For example, a form of
map, say:

void map(Function fun, Data arg)

Here, if arg has many parts, I may call fun.apply(tmp, arg.get(...))
many, many times. In my benchmarks, this seems to be much slower than
what I could get without the abstraction. I am trying to reduce this
overhead as much as possible.
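
For reference, the map I have in mind is roughly the loop below, written
as if it lives on the result Data; length(), get(), and set() are
hypothetical element accessors, and tmp is a scratch element that would
ideally be allocated only once:

public void map(Function fun, Data arg) {
  Data tmp = new Data();                // scratch result, reused each pass
  for (int i = 0; i < arg.length(); i++) {
    fun.apply(tmp, arg.get(i));         // one virtual dispatch per element
    this.set(i, tmp);                   // copy the scratch back into this
  }
}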

One attempt I have made at this is a FunctionPrimitive interface. This
interface provides apply methods whose arguments are primitives rather
than Data elements. This means that I do not have to incur the overhead
of intermediate or temporary Data elements that I have to get rid of
again sometime later; it's the boxing and unboxing issue again. Doing
this actually nets me some pretty big wins, but it's still too much
overhead.
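
What I mean by FunctionPrimitive is roughly an interface along these
lines, with one overload per primitive argument type I care about; how
the result should come back out is exactly what I go back and forth on
below:

public interface FunctionPrimitive {

  // The argument arrives as a raw primitive instead of a Data element,
  // so there is no boxing and no temporary Data on the argument side.
  void apply(Data result, int arg);
  void apply(Data result, double arg);
  void apply(Data result, boolean arg);
}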

Part of it is the dynamic checking that I have to do with each function
to ensure that I have the right type all of the time, but I am not sure
what other things may be happening. For example, I do not want my
FunctionPrimitive interface to have something like:

Data apply(int arg)

Because this requires creating an intermediate Data element that I will
just destroy later on. This is slightly better:

void apply(Data result, int arg)

But it still requires creating at least one temporary and then copying
out of it each time I invoke apply(). It also means that I cannot use
the subclassing-of-Data approach that I mentioned above, because I
cannot know the concrete result type ahead of time and therefore cannot
preallocate the result Data object before calling apply. This almost
seems to necessitate a private type field in Data. Then I can have
setters such as:

void set(int)
void set(boolean)

and so forth, which do the right thing based on their input and which
the apply() method uses to set the elements in the result object.

But I want to do better than that. If there is some sort of
information, maybe an index or key, that I can provide:

void apply(Data result, Key idx, int arg)

Then I can pass the Data object that will eventually be used to the
apply method and just have it put the right data in place.
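
In other words, something like the sketch below. Key and
KeyedFunctionPrimitive are names I am making up here, and the setters
above would grow a Key parameter so apply() can write into the right
slot:

// Key is hypothetical: whatever addressing information the result
// needs, here just a flat element index.
final class Key {
  final int index;
  Key(int index) { this.index = index; }
}

interface KeyedFunctionPrimitive {
  // Writes f(arg) straight into result at position idx, using result's
  // tag-dispatching setters; no temporary Data element is created.
  void apply(Data result, Key idx, int arg);
}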

Even with all of this, though, in a simple benchmark this abstraction
still makes me about five times slower (2 seconds versus just under 10).
I really want to cut this overhead down.

Can someone suggest how I might improve my design? As a simple real life
instance, say I want to apply a function:

f(0) = 0
f(n) = n + f(n-1)

I want to apply this function over the numbers in the range [0,n) and
then print the last one, just for verification.

The simple, naive Java might look like this, I suppose:

public class IotaSumReduce {

  public static int sum(int n) {
    int res = 0;
    for (int i = 0; i < n; i++) res += i;
    return res;
  }

  public static void main(String[] args) {
    int count = 100000;
    int[] res = new int[count];
    
    for (int i = 0; i < count; i++) res[i] = sum(i);
    System.out.println(res[count - 1]);
  }
}

Now, let's say that I want to map this to something a little crazier,
say, a functional style. A not so efficient version:

(car (reverse (map (lambda (n) (fold-left + 0 (iota n))) (iota 100000))))

But we can go crazier: let's do APL, which is nearly a copy of the above
in approach:

⊃⌽{+/⍳⍵}¨⍳100000

Now, in my benchmarks, the APL version above actually runs reasonably
quickly for being interpreted: about four seconds, give or take. The
Java version runs in about 1.7 seconds, and the C version at -O2 in
about 3 seconds. So, all of these are reasonably close at this point.
However, let's say that I want to take that functional, APL-style
program and implement it with my approach above.

public class AplIotaSumReduce {
  
  private static class Sum extends Function {
    public void apply(AplArray res, AplArray arg) {
      Function plus = new AplArray.PlusFunction();
      res.iota(arg);
      res.reduce(plus, res);
    }
  }

  public static void main(String[] args) {
    AplArray res = new AplArray();
    Function sum = new Sum();

    res.iota(100000);
    res.each(sum, res);
    System.out.println(res.getInt(99999));
  }
}

This program runs much, much slower for me. If Function plus is used,
then we are talking about minutes, not seconds. If I use the
FunctionPrimitive optimization I talked about above, I can get that down
to around 9-10 seconds. That's much better, but it's still not good
enough; I really need to get into the range of the naive C, APL, or
Java programs. What re-engineering might allow me to get there? I would
appreciate any pointers that you all may have. And sorry for the other
languages, like Scheme and APL; they are just fun examples to use, as
they illustrate the higher-order paradigm I am trying to achieve and
they map succinctly onto the model I am playing with.

--
Aaron W. Hsu | arcfide@sacrideo.us | http://www.sacrideo.us
Programming is just another word for the lost art of thinking.
