Re: Threads design question
Luc Van Bogaert wrote On 02/12/07 15:12,:
Hi,
I'm wondering if anyone can provide some help or suggestions....
I have a method that generates an HTML report file for any given number
of Project or Task objects. This involves some I/O activity (creating
files, reading & writing to streams, deleting temporary files, etc.)
and some calculations (mostly String methods). Creating a typical
report takes about a second or two, but this could increase as the
data builds up, so I'd like to generate the reports using a separate
thread.
I'm wondering what would be the best approach...
I could just use one thread and have it sequentially generate all the
reports. Or I could create several threads, one for each report to be
generated. As each report requires access to the file system, I'm
thinking the second approach doesn't make much sense anyway.
Could anyone please comment on what option I should take?
There's another approach that's intermediate between
"N reports on 1 thread" and "1 report on each of N threads,"
and that's "N reports on M threads." Typically, your program
would do something like this:
    Create a Queue or similar data structure.

    Create and start M threads. Each thread checks
    the queue, finds it empty, and calls wait() to
    await developments.

    When you want to create a report, add a "job ticket"
    with the necessary information to the Queue and call
    notify() to alert one of the M threads that there's
    work to do.

    A thread awakens, checks the Queue and finds it
    non-empty, removes a "job ticket," and goes to work
    generating the appropriate report. When it's done,
    it cycles back to check the Queue and see if there's
    more work.

The nice thing about this structure is that it lets you
adjust M to make best use of your system, independently of N.
On a small system you might use M=1; on a large system you
might choose M=10. Choose any (positive) value for M and
just start putting job tickets on the Queue; the worker(s)
will eventually get the work done.
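
The queue-and-workers structure described above can be sketched
roughly as follows. This is only an illustration, not a finished
design: the class name JobQueue, the use of a Runnable as the
"job ticket," and the shutdownAndWait() method are all assumptions
made for the example, and the lambda syntax assumes Java 8 or later.
Note that wait() is called in a loop while holding the lock, so a
worker re-checks the Queue every time it wakes up.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Illustrative sketch: M worker threads draining one shared queue.
final class JobQueue {
    private final Queue<Runnable> tickets = new ArrayDeque<>();
    private final Thread[] workers;
    private boolean closed = false; // guarded by "this"

    JobQueue(int m) {
        workers = new Thread[m];
        for (int i = 0; i < m; i++) {
            workers[i] = new Thread(this::workLoop, "report-worker-" + i);
            workers[i].start();
        }
    }

    // Add a job ticket and wake one waiting worker.
    synchronized void submit(Runnable ticket) {
        if (closed) {
            throw new IllegalStateException("queue is closed");
        }
        tickets.add(ticket);
        notify();
    }

    // Let the workers finish the queued tickets, then wait for them to exit.
    void shutdownAndWait() {
        synchronized (this) {
            closed = true;
            notifyAll();
        }
        for (Thread t : workers) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
        }
    }

    private void workLoop() {
        while (true) {
            Runnable ticket;
            synchronized (this) {
                // Always wait in a loop: re-check the condition after waking.
                while (tickets.isEmpty() && !closed) {
                    try {
                        wait();
                    } catch (InterruptedException e) {
                        return;
                    }
                }
                if (tickets.isEmpty()) {
                    return; // closed and fully drained
                }
                ticket = tickets.remove();
            }
            // Run the job outside the lock so other workers can proceed.
            try {
                ticket.run();
            } catch (RuntimeException e) {
                e.printStackTrace();
            }
        }
    }

    public static void main(String[] args) {
        int m = 2; // tune M for the machine, independently of N
        JobQueue queue = new JobQueue(m);
        for (int i = 1; i <= 5; i++) {
            final int n = i;
            queue.submit(() -> System.out.println(
                Thread.currentThread().getName() + " generated report " + n));
        }
        queue.shutdownAndWait();
    }
}
```

Changing M is a one-line edit in main(); the code that submits job
tickets does not need to know how many workers exist.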
As for the larger question of whether to do this work on
independent threads or just call your method in-line, there
are trade-offs:
    - Writing a threaded program is more difficult. It's a
      good deal easier in Java than in many other languages,
      but it must be admitted that a threaded structure is
      more involved and offers more chances to make errors.
      That whole Queue business would be unnecessary if you
      did things in-line; you can go wrong by forgetting to
      synchronize access to it, or by misusing wait() and
      notify(), and so on.

    + Using threads can reduce latency. If somebody clicks
      a button and your event handler generates the report
      in-line, the GUI is essentially frozen for as long as
      the report generation takes. If instead you delegate
      the job to a separate thread, you can go back to keeping
      the GUI "alive" right away.

    + Using threads can increase throughput. If several
      reports are being generated simultaneously by different
      threads, the time that thread T1 spends waiting for I/O
      need not be idle time because thread T2 can put it to
      use to do some computing. Even if all of T1, T2, ...
      T_M are I/O-bound, they can at least give the file system
      M things to work on at a time instead of just one; this
      usually enables the file system to increase its own level
      of parallelism. Result: More total work per unit time.
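
One way to get the benefits above while avoiding some of the
hand-written wait()/notify() risk is to let the java.util.concurrent
classes (in the JDK since Java 5) manage the queue and the M threads.
The sketch below is an assumption-laden illustration, not your actual
code: PooledReports, generateReport() and generateAll() are made-up
names, the report body is a placeholder string, and the lambda syntax
assumes Java 8 or later.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class PooledReports {
    // Hypothetical stand-in for the real report generation method.
    static String generateReport(String name) {
        return "<html><body>" + name + "</body></html>";
    }

    // Generate N reports on a pool of M threads; results keep input order.
    static List<String> generateAll(List<String> names, int m) {
        ExecutorService pool = Executors.newFixedThreadPool(m);
        try {
            List<Future<String>> pending = new ArrayList<>();
            for (String name : names) {
                Callable<String> task = () -> generateReport(name);
                pending.add(pool.submit(task)); // returns immediately
            }
            List<String> results = new ArrayList<>();
            for (Future<String> f : pending) {
                results.add(f.get()); // blocks until that report is done
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("report generation failed", e.getCause());
        } finally {
            pool.shutdown(); // stop accepting work; let the workers exit
        }
    }

    public static void main(String[] args) {
        List<String> names = new ArrayList<>();
        names.add("Project A");
        names.add("Task 17");
        for (String html : generateAll(names, 2)) {
            System.out.println(html);
        }
    }
}
```

In a GUI event handler you would call only submit() and return at
once, rather than blocking on get() as generateAll() does here; the
blocking version is shown just to keep the example self-contained.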
You'll need to make your own decision about what's important
to your situation.
--
Eric.Sosman@sun.com