Re: Analysis of Java application logs
On 23 May, 09:50, Ulrich Scholz <d...@thispla.net> wrote:
> I'm looking for an approach to the problem of analyzing application
> log files.
>
> I need to analyse Java log files from applications (i.e., not logs of
> web servers). These logs contain Java exceptions, thread dumps, and
> free-form log4j messages issued by log statements that programmers
> inserted during development. Right now, these hand-written log entries
> do not follow any specific format.
>
> What I'm looking for is a tool and/or strategy that supports
> lexing/parsing, tagging, and analysing the log entries. Because there
> is little defined syntax and grammar (and because you might not know
> what you are looking for), the task requires quickly issuing queries
> against the log database. Some sort of visualization would be nice,
> too.
>
> Pointers to existing tools and approaches as well as appropriate
> tools/algorithms to develop the required system would be welcome.

I once did a project for our Ruby Best Practices blog. The code is
over at GitHub:

https://github.com/rklemme/muppet-laboratories

Explanations can be found in the blog. This is the first posting of
the series:

http://blog.rubybestpractices.com/posts/rklemme/005_Enter_the_Muppet_Laboratories.html

This works differently from what you want: log files are read and
their entries are written out to smaller log files according to
particular criteria. But you could reuse the parsing part (including
detection of multi-line log statements) and write what you found into
a relational database.
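
To make the parsing step with multi-line detection a bit more concrete,
here is a minimal Ruby sketch. It is not the code from the
muppet-laboratories repository; it assumes a hypothetical log4j
ConversionPattern of "%d{yyyy-MM-dd HH:mm:ss,SSS} [%t] %-5p %c - %m%n",
and the names LogEntry, ENTRY_START and each_entry are made up for
illustration. The idea: a line that starts with a timestamp begins a
new entry, and every other line (stack trace frames, "Caused by:"
lines, thread dump lines) belongs to the entry before it.

```ruby
# Sketch: group raw log4j lines into logical entries and split the first
# line of each entry into fields.
#
# ASSUMPTION: the layout is "%d{yyyy-MM-dd HH:mm:ss,SSS} [%t] %-5p %c - %m%n".
# ENTRY_START must be adapted to the ConversionPattern actually in use.

LogEntry = Struct.new(:timestamp, :thread, :level, :logger, :message)

# A line matching this (hypothetical) pattern starts a new log entry.
ENTRY_START = /\A(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) \[([^\]]*)\] (\w+)\s+(\S+) - (.*)\z/

# Yields one LogEntry per log statement. Non-matching lines (stack frames,
# "Caused by:" lines, thread dump lines, wrapped messages) are appended to
# the message of the entry currently being built. Lines that appear before
# the first matching line have no entry to attach to and are skipped.
def each_entry(lines)
  current = nil
  lines.each do |raw|
    line = raw.chomp
    if (m = ENTRY_START.match(line))
      yield current if current
      current = LogEntry.new(m[1], m[2], m[3], m[4], m[5])
    elsif current
      current.message << "\n" << line
    end
  end
  yield current if current
end

# Example input: two entries, the second one with a stack trace attached.
sample = [
  "2011-05-23 09:50:01,123 [main] INFO  com.example.App - starting\n",
  "2011-05-23 09:50:02,456 [worker-1] ERROR com.example.Job - job failed\n",
  "java.lang.IllegalStateException: boom\n",
  "\tat com.example.Job.run(Job.java:42)\n"
]

entries = []
each_entry(sample) { |e| entries << e }
entries.each do |e|
  puts "#{e.timestamp} #{e.level} #{e.logger}: #{e.message.lines.first.chomp}"
end
```

With a real file, File.foreach(path) can be passed instead of the array
so the log is read line by line rather than loaded at once; inserting
each entry's fields into a table would take the place of the puts.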

Once the data is in the DB you can query at least by timestamp, log
level and message content, and probably also by thread id and class.
If you want to do custom tagging, you could do that once the data is
in the database.

Since we do not know what goal your analysis has and how many
different questions you want to ask of the data, it's not entirely
clear whether that would be the optimal approach for your problem.

One variant of the above would be to provide the parsing process with
a number of regular expressions, each with a label attached, and to
label all log entries during insertion into the database. But since
modern relational databases usually also support full-text indexing
and regular expression matching, that might also be solved with a
view. If your data volume is large, you additionally need to make sure
this remains efficient.
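
The regular-expression labelling variant could look roughly like the
following Ruby sketch. The labels and patterns are invented examples
(a real set would follow from the questions you want to ask), and
labels_for is a hypothetical helper name. Each message gets every label
whose pattern matches; the labels could then be stored alongside the
entry, for instance in a separate table keyed by the entry's id.

```ruby
# Sketch: tag log messages with labels derived from regular expressions.
# The labels and patterns are hypothetical examples, not a fixed taxonomy.
LABELS = {
  # Fully qualified or simple exception/error class names.
  "exception"   => /\b[\w$.]+(?:Exception|Error)\b/,
  # HotSpot thread dumps typically contain "java.lang.Thread.State:" lines.
  "thread_dump" => /java\.lang\.Thread\.State:/,
  # "timeout", "time out" or "timed out", case-insensitive.
  "timeout"     => /\btimed? ?out\b/i
}.freeze

# Returns the labels whose pattern matches the message, in LABELS order.
def labels_for(message, labels = LABELS)
  labels.select { |_label, pattern| pattern.match?(message) }.keys
end

messages = [
  "job failed\njava.lang.IllegalStateException: boom",
  "Connection timed out after 30s",
  "\"main\" #1 prio=5\n   java.lang.Thread.State: RUNNABLE",
  "all good"
]

messages.each do |msg|
  puts "#{labels_for(msg).inspect} <- #{msg.lines.first.chomp}"
end
```

Doing the same with a view instead keeps the label definitions out of
the parser, at the cost of evaluating the patterns at query time.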
Kind regards
robert