Re: Obfuscators

From: Joshua Cranmer <Pidgeot18@verizon.invalid>
Newsgroups: comp.lang.java.programmer
Date: Fri, 19 Sep 2008 23:18:44 -0400
Message-ID: <gb1q2k$i0f$1@news-int2.gatech.edu>

bbound@gmail.com wrote:

On Sep 19, 9:35 pm, Joshua Cranmer <Pidgeo...@verizon.invalid> wrote:

Now, I'm as big a supporter of open source as most people here--I
regularly contribute code to Thunderbird--but even I realize that there
are numerous faults in open source. Its primary fault is poor
usability

Was. That is improving rapidly these days.

The antecedent of "it" in the second sentence was open source as a
collective term, not Thunderbird (which I find quite usable, although I
do have some bias there).

Coders hired by Red Hat are no less interested in usability than
coders hired by Microsoft.

And coders working in their own free time for no money, who make up the
majority of open source coders, have much less interest in usability
than either.

Which isn't useful. It's negative-sum activity -- it consumes your
time and resources at the same time as reducing those available to
others. Negative-sum activity is never Pareto-optimal. Translation:
it's bad.

Running an obfuscator is a single tool invocation that takes no more
time than a compile step. I tested using the proguard source: it took
4 seconds to compile and about 4.5 seconds to obfuscate using proguard.

According to the time command, jad spent about 10 seconds decompiling
both versions, but keep in mind that proguard doesn't do any flow
obfuscation.

Let's see the effect of name obfuscation:
The original code (modified indenting to fit width here):
private VariableStringMatcher createAnyTypeMatcher(
   StringMatcher nextMatcher)
{
   return new VariableStringMatcher(new char[] {
     ClassConstants.INTERNAL_TYPE_ARRAY }, null, 0, 255,
     new OrMatcher(
       new VariableStringMatcher(INTERNAL_PRIMITIVE_TYPES,
         null, 1, 1, nextMatcher),
       new VariableStringMatcher(new char[] {
         ClassConstants.INTERNAL_TYPE_CLASS_START }, null,
         1, 1,
         new VariableStringMatcher(null, new char[] {
           ClassConstants.INTERNAL_TYPE_CLASS_END }, 0,
           Integer.MAX_VALUE,
           new VariableStringMatcher(new char[] {
             ClassConstants.INTERNAL_TYPE_CLASS_END },
             null, 1, 1, nextMatcher)))));
}

Decompiled code:
private VariableStringMatcher createAnyTypeMatcher(StringMatcher
   stringmatcher)
{
   return new VariableStringMatcher(new char[] { '[' }, null, 0, 255,
     new OrMatcher(
       new VariableStringMatcher(INTERNAL_PRIMITIVE_TYPES, null, 1, 1,
         stringmatcher),
       new VariableStringMatcher(new char[] { 'L' }, null, 1, 1,
         new VariableStringMatcher(null, new char[] { ';' }, 0,
           0x7fffffff,
           new VariableStringMatcher(new char[] { ';' }, null, 1, 1,
             stringmatcher)))));
}

Arguably a bit more readable. And now, the obfuscated code:
private static do a(en en)
{
   return new do(new char[] { '[' }, null, 0, 255,
     new aE(
       new do(a, null, 1, 1, en),
       new do(new char[] { 'L' }, null, 1, 1,
         new do(null, new char[] { ';' }, 0, 0x7fffffff,
         new do(new char[] { ';' }, null, 1, 1, en)))));
}

My first thought on seeing this (jad's actual indentation is
confusing) was "What the hell do a lot of nested new statements do?" I
literally picked the file at random to read; the only way I could get
back to the original source was to note that another variable in the
same file contained primitive type values and guess that it had to do
with class name parsing.
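
For context on where names like do, aE, and en come from: a name
obfuscator replaces each identifier with a short generated one. The
sketch below is a hypothetical illustration of that idea and is not
proguard's actual algorithm; ShortNameSketch, shortName, and
buildMapping are made-up names for this example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of name obfuscation; not how proguard really does it.
class ShortNameSketch {
    // Returns the index-th name in the sequence a, b, ..., z, aa, ab, ...
    static String shortName(int index) {
        StringBuilder sb = new StringBuilder();
        int n = index;
        do {
            sb.append((char) ('a' + n % 26));
            n = n / 26 - 1;
        } while (n >= 0);
        return sb.reverse().toString();
    }

    // Assigns each distinct original name the next unused short name.
    static Map<String, String> buildMapping(String... originalNames) {
        Map<String, String> mapping = new LinkedHashMap<>();
        for (String name : originalNames) {
            mapping.putIfAbsent(name, shortName(mapping.size()));
        }
        return mapping;
    }

    public static void main(String[] args) {
        Map<String, String> mapping = buildMapping(
            "VariableStringMatcher", "OrMatcher", "StringMatcher");
        for (Map.Entry<String, String> e : mapping.entrySet()) {
            System.out.println(e.getKey() + " -> " + e.getValue());
        }
    }
}
```

In this sketch shortName(118) yields do, which is a Java keyword. The
class-file format accepts such names even though the Java compiler
rejects them in source, which is one reason decompiled output like the
above cannot simply be recompiled.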

Looking at the first source code, I can guess that this constructs a
regex similar to the following: \[*([VZBCSIJFD]|L.+;) without reading
any other code. In the last example, I have no clue what's going on.
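
To make that regex concrete, here is a minimal sketch that runs it
against a few JVM type descriptors. It uses the approximate regex from
the paragraph above, not the exact matcher the proguard code builds
(which, for instance, caps the number of leading '[' at 255); the class
and method names are made up for this example.

```java
import java.util.regex.Pattern;

// Illustrative only: checks strings against the approximate regex above.
class DescriptorRegexDemo {
    // The regex from the post, escaped as a Java string literal.
    static final Pattern ANY_TYPE =
        Pattern.compile("\\[*([VZBCSIJFD]|L.+;)");

    // True if the whole string matches the pattern.
    static boolean isTypeDescriptor(String s) {
        return ANY_TYPE.matcher(s).matches();
    }

    public static void main(String[] args) {
        String[] samples = {
            "I",                    // primitive int
            "[[I",                  // two-dimensional int array
            "Ljava/lang/String;",   // class type
            "[Ljava/lang/Object;",  // array of a class type
            "X",                    // not a known type character
            "Ljava/lang/String"     // missing the terminating ';'
        };
        for (String s : samples) {
            System.out.println(s + " -> " + isTypeDescriptor(s));
        }
    }
}
```

The first four samples match and the last two do not, which is the
behavior the readable source lets you predict at a glance.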

So the comprehension step explodes in terms of time it takes me. And if
we had flow obfuscation and needed to decompile from javap information?
My estimate is that would take me about 3-ish minutes, well up from our
original time measured in seconds (for the entire codebase!)

In summary: obfuscating cost the author about 4 seconds--doubling the
build time--and in return increased the decompilation time by well over
120 seconds, or roughly 20 times what it would otherwise have taken me.
If you're counting the entire application, that time is far higher, so
the percent increase is on the order of 1,000,000 percent! It's a /very/
easy win, as far as I'm concerned.

My arguments against it are manifold:
* Limited effectiveness

Obfuscation is sufficiently effective

No, it is not.

See above.

* Often, obfuscation is not done (solely) to prevent copyright
infringement anyway, but for even more evil purposes:

So you admit that there are valid reasons for obfuscating?

No. I just indicated that there are three reasons, one evil and two
even more evil. That's hardly the endorsement you seem to be implying.

So you're saying that trying to prevent copyright infringement is evil,
if I'm understanding right. Why do you think so?

It's like BitTorrent: a predominant use is for illegal purposes

Civil disobedience, while technically illegal, is not immoral.

I could say so much here, but there's really no point since I already
know the future of the argument, having gone through this once before.

--
Beware of bugs in the above code; I have only proved it correct, not
tried it. -- Donald E. Knuth