Re: Write a program that reads a Java source-code file and displays
all the comments.
anon36@yahoo.com wrote:
I am trying to do exercise 17 on page 546 of Bruce Eckel's Thinking In
Java (4th edition):
"Write a program that reads a Java source-code file (you provide the
file name on the command line) and displays all the comments."
This is at the end of a section about regular expressions. We have
just learnt how to use appendReplacement().
Does anyone have a solution to this exercise, or any hints?
Joshua Cranmer wrote:
The way I would do it would be to create a Reader on the file, read each
character and perform simple lexical analysis there, like so:
boolean inEOLComment = false, inCComment = false, inString = false;
for each character in stream:
    if inEOLComment:
        print character
        if character is newline, inEOLComment = false
    else if inCComment:
        if character is * and next is /, inCComment = false and skip next character
        else print character
    else if inString:
        if character is \, skip next character
        else if character is ", inString = false
    else if character is /:
        if next character is /, inEOLComment = true and skip next character
        else if next character is *, inCComment = true and skip next character
    else if character is ", inString = true
    else, do nothing
(writing the actual Java code is left as an exercise to the reader)
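For what it's worth, here is a minimal Java sketch of the pseudocode above. The class name CommentScanner and the method extractComments are made up for illustration. Two things go beyond the pseudocode and are assumptions on my part: character literals are tracked as well as strings (so that '"' does not open a string), and a newline is emitted after each block comment to keep the output readable.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.io.Reader;

class CommentScanner {
    /** Returns the text of every comment read from 'in', without the delimiters. */
    static String extractComments(Reader in) throws IOException {
        StringBuilder out = new StringBuilder();
        boolean inEolComment = false, inCComment = false;
        int quote = 0;                      // 0 = not in a literal, else the closing quote char
        int c = in.read();
        while (c != -1) {
            int next = in.read();           // one character of lookahead
            boolean skipNext = false;
            if (inEolComment) {
                out.append((char) c);
                if (c == '\n') inEolComment = false;
            } else if (inCComment) {
                if (c == '*' && next == '/') {
                    inCComment = false;
                    skipNext = true;
                    out.append('\n');       // assumption: separate block comments by a newline
                } else {
                    out.append((char) c);
                }
            } else if (quote != 0) {
                if (c == '\\') skipNext = true;       // escaped character, e.g. \" or \\
                else if (c == quote) quote = 0;
            } else if (c == '/') {
                if (next == '/') { inEolComment = true; skipNext = true; }
                else if (next == '*') { inCComment = true; skipNext = true; }
            } else if (c == '"' || c == '\'') {
                quote = c;                  // assumption: char literals handled like strings
            }
            c = skipNext ? in.read() : next;
        }
        return out.toString();
    }

    public static void main(String[] args) throws IOException {
        try (Reader in = new BufferedReader(new FileReader(args[0]))) {
            System.out.print(extractComments(in));
        }
    }
}
```

Run it as "java CommentScanner Foo.java". It does not know about unicode escapes or Java text blocks, so treat it as a starting point rather than a complete solution.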
Another approach is to borrow from LEX / YACC, and have a "lexer"
extract tokens from the input, along with a token-type enum identifying each as
"String literal", "identifier/keyword", "punctuation", etc. The output of the
lexer becomes the input to the parser, which examines each token and its
type, and operates a state machine with, say, states of NOT_IN_COMMENT,
IN_SINGLE_LINE_COMMENT and IN_MULTI_LINE_COMMENT.
You run the parser in a loop, with the interpretation of each token depending
on the current state. So, for example, if you hit IN_SINGLE_LINE_COMMENT
state, you ignore each token up until the LINE_END token. If you reach
IN_MULTI_LINE_COMMENT state, you ignore each token until you reach the
END_COMMENT token ("*/"). A "*/" embedded in a string does not trigger a false
END_COMMENT because you've already lexed such strings into tokens. The parser
will not see a string-embedded "*/"; it will see only a STRING_LITERAL token.
--
Lew
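Since the exercise follows a chapter on regular expressions, here is a rough Java sketch of the lexer idea built on java.util.regex. It simplifies the scheme described above: the lexer recognizes each comment as one whole token (LINE_COMMENT or BLOCK_COMMENT) instead of emitting separate start and end tokens, so the consumer only filters by token type and needs no IN_COMMENT state. Identifiers and punctuation are simply skipped. The names CommentLexer, Token, TokenType, lex and comments are all hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CommentLexer {
    enum TokenType { STRING_LITERAL, CHAR_LITERAL, LINE_COMMENT, BLOCK_COMMENT }

    static final class Token {
        final TokenType type;
        final String text;
        Token(TokenType type, String text) { this.type = type; this.text = text; }
    }

    // One named group per token type. find() scans left to right and, at each
    // position, tries the alternatives in order, so a literal that starts
    // before a "//" or "/*" swallows it.
    private static final Pattern TOKEN = Pattern.compile(
          "(?<str>\"(?:\\\\.|[^\"\\\\\\n])*\")"   // "..." with backslash escapes
        + "|(?<chr>'(?:\\\\.|[^'\\\\\\n])*')"     // '...' with backslash escapes
        + "|(?<line>//[^\\n]*)"                   // // up to end of line
        + "|(?<block>/\\*.*?\\*/)",               // /* ... */ (non-greedy)
        Pattern.DOTALL);

    /** Splits the source into literal and comment tokens; everything else is skipped. */
    static List<Token> lex(String source) {
        List<Token> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(source);
        while (m.find()) {
            TokenType type = m.group("str") != null ? TokenType.STRING_LITERAL
                           : m.group("chr") != null ? TokenType.CHAR_LITERAL
                           : m.group("line") != null ? TokenType.LINE_COMMENT
                           : TokenType.BLOCK_COMMENT;
            tokens.add(new Token(type, m.group()));
        }
        return tokens;
    }

    /** Keeps only the comment tokens, delimiters included. */
    static List<String> comments(String source) {
        List<String> result = new ArrayList<>();
        for (Token t : lex(source)) {
            if (t.type == TokenType.LINE_COMMENT || t.type == TokenType.BLOCK_COMMENT) {
                result.add(t.text);
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        String source = new String(Files.readAllBytes(Paths.get(args[0])));
        for (String comment : comments(source)) System.out.println(comment);
    }
}
```

Letting the lexer swallow whole comments also avoids a trap with the start/end-token scheme: an unbalanced quote inside a comment (as in "don't") can never be mistaken for the start of a literal. Unterminated block comments and Java text blocks are not handled here.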