Re: How to get file count under a directory?

From:
James Kanze <james.kanze@gmail.com>
Newsgroups:
comp.lang.c++
Date:
Tue, 29 Sep 2009 00:45:15 -0700 (PDT)
Message-ID:
<cd1adf86-8b53-4875-aabd-62f743e70ac7@a21g2000yqc.googlegroups.com>
On Sep 28, 9:18 pm, Marcel Müller <news.5.ma...@spamgourmet.com>
wrote:

rockdale wrote:

I have an application which writes log files out. If the log
file size is greater than, let's say, 1 MB, the application
will create a new log file with a sequence number. The log
file names are like mylogfile_mmddyy_1.txt,
mylogfile_mmddyy_2.txt, ... without an upper limit.


Don't do that.

Use a time stamp, and a naming convention that follows a
canonical sort order, e.g. mylogfile_yyyy-mm-dd_hh-mm-ss.txt.
The guys who have to service your application will appreciate
it greatly. Furthermore, you should prefer UTC time stamps for
logging, to avoid confusion with daylight saving time.


That sounds like a good idea. I'm used to putting the date in
the logfile name, and using a sequential number (with a fixed
number of digits, so a straight sort will put them in order),
but using the time does sound better.
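
Something like the following generates such names from a UTC
time stamp. It's just an untested sketch; the "mylogfile"
prefix is simply the name from the original posting.

#include <ctime>
#include <string>

std::string makeLogFileName(std::time_t when)
{
    // UTC, to avoid any confusion with daylight saving time.
    std::tm utc = *std::gmtime(&when);
    char buffer[64];
    std::strftime(buffer, sizeof buffer,
                  "mylogfile_%Y-%m-%d_%H-%M-%S.txt", &utc);
    return buffer;
}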

Now the problem is that if my application gets restarted, I
need to know the largest sequence number of my log files.


Either always create a new log when the application gets
restarted, or drop the size limit and use a time limit
instead. I would recommend the latter. If your application is
under heavy load, the files grow larger. What's bad about
that?


Files that are too large are hard to read and to manipulate.
Depending on the application, a time limit might either result
in an occasional file which is awkwardly large, or a lot of very
small files.

That doesn't mean that you should forego using time completely.
If there are particular moments when the application is largely
quiescent, those are good times to rotate the log; it reduces
the probability of a sequence which interests someone spanning
two different files. (Ideally, of course, the files should be
small enough so that the reader can easily concatenate two of
them, in cases where what interests him spans a rotation.)

From the service point of view, it is a big advantage to have
a deterministic relation between the file name (in fact,
something like a primary key) and the content. And it is even
better if the canonical file name ordering corresponds to the
logical order of the files.

I am thinking of a loop from 1 to, like, 100000, checking if
the file exists; if it does not, then I have the max sequence
number I need.


From that you can see how bad the idea is. Everyone who
searches for a certain entry has to do the same loop,
regardless of whether it is a program or a human. In fact, you
have absolutely no advantage over putting all of a day's logs
into a single file in this case.


The readers can do a binary search. For that matter, so could
the program. (But again depending on the application, there may
be so few files that it isn't worth it.)
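
For what it's worth, such a binary search is only a few lines.
The following is an untested sketch: the name pattern (and the
date part in it) is hard coded purely for the example, the
upper bound is anything known to be larger than any plausible
sequence number, and it assumes the files are numbered 1..n
without gaps.

#include <fstream>
#include <sstream>
#include <string>

bool logFileExists(int seq)
{
    std::ostringstream name;
    name << "mylogfile_092809_" << seq << ".txt";
    std::ifstream in(name.str().c_str());
    return in.is_open();
}

int maxSequenceNumber(int upperBound)
{
    int lo = 0;                 // 0 means "no log file yet"
    int hi = upperBound;
    while (lo < hi) {
        int mid = lo + (hi - lo + 1) / 2;
        if (logFileExists(mid))
            lo = mid;           // answer is >= mid
        else
            hi = mid - 1;       // answer is < mid
    }
    return lo;
}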

But this method looks very awkward. Is there another way to
do this (get the max number for a series of similar files)?


No. And since most file systems do not maintain a defined
sort ordering, there is no cheaper solution in general. You
could scan the entire directory content, but that is in the
same order of complexity.
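
Under Windows (where the original poster is), such a scan
might look something like this. Again only an untested sketch,
with the pattern (including the date part) hard coded for the
example.

#include <windows.h>
#include <cstdlib>

int maxSequenceByScan()
{
    int const prefixLength = sizeof "mylogfile_092809_" - 1;
    int best = 0;
    WIN32_FIND_DATAA data;
    HANDLE handle = FindFirstFileA("mylogfile_092809_*.txt", &data);
    if (handle != INVALID_HANDLE_VALUE) {
        do {
            // Keep the largest sequence number found.
            int seq = std::atoi(data.cFileName + prefixLength);
            if (seq > best)
                best = seq;
        } while (FindNextFileA(handle, &data));
        FindClose(handle);
    }
    return best;                // 0 if no log file was found
}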

My application is running on the Windows platform, but does
not use MFC functions very much.


That makes no difference here.

Using rotating logs with a fixed time slice is
straightforward to implement, even in the case of application
restarts. You could use a simple and fast hash function on the
time stamp that controls log file switches.


You don't even need that. On program start-up, it's easy to
calculate the last rotation time from current time; just open
that file for append. There is some argument, however, for
always opening a new log file on program start-up.
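
For example (untested sketch; hourly rotation and the name
pattern are assumptions for the illustration, and it supposes
that time_t counts seconds, which is the case on the usual
platforms):

#include <ctime>
#include <string>

std::string currentLogFileName()
{
    std::time_t now = std::time(0);
    std::time_t slotStart = now - now % 3600;   // round down to the hour
    std::tm utc = *std::gmtime(&slotStart);
    char buffer[64];
    std::strftime(buffer, sizeof buffer,
                  "mylogfile_%Y-%m-%d_%H-%M-%S.txt", &utc);
    return buffer;
}

// On start-up, open that file in append mode, so that a restart
// in the middle of a slot simply continues the existing file:
//     std::ofstream log(currentLogFileName().c_str(), std::ios::app);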

Every time the hash changes, a virtual method that switches
the log could be invoked. Only this method implements the full
rendering of the file name scheme. This makes it very easy to
implement different cycle times with good performance, e.g.
once per week, once per day, or once per hour.
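
A sketch of what such a class might look like (untested, and
all of the names are invented for the example):

#include <ctime>
#include <string>

class RotatingLog
{
public:
    explicit RotatingLog(long periodInSeconds)
        : myPeriod(periodInSeconds)
        , myCurrentSlot(-1)
    {
    }
    virtual ~RotatingLog() {}

    void log(std::string const& message)
    {
        std::time_t now = std::time(0);
        // The "hash": which rotation slot does now fall into?
        long slot = static_cast<long>(now / myPeriod);
        if (slot != myCurrentSlot) {
            switchLog(now);     // close the old file, open the new one
            myCurrentSlot = slot;
        }
        write(message);         // append to the currently open file
    }

protected:
    // Only this method knows the file name scheme.
    virtual void switchLog(std::time_t slotTime) = 0;
    virtual void write(std::string const& message) = 0;

private:
    long myPeriod;              // 3600: hourly, 86400: daily, ...
    long myCurrentSlot;
};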

And if you are even smarter, you could add functionality that
cleans up old logs automatically once they exceed a configured
age. This prevents the common issue of full volumes.


This is usually done by means of a cronjob (or whatever it is
called under Windows---it surely exists), using a fairly simple
script. Typically, the log files will go through a stage where
they are compressed, before being completely deleted. (E.g.
compress anything older than a day, and delete anything older
than a week.)

--
James Kanze
