Re: Enhancement request
On Mon, 8 Sep 2008, Martin Gregorie wrote:
On Sun, 07 Sep 2008 18:57:55 +0100, Tom Anderson wrote:
The problem is that it's traditionally considered impossible to
implement readLine without a buffer. At least, if you want to be able
to handle multiple forms of line ending - CR, LF and CRLF.
That is a nice-to-have, though not usually all that useful in practice.
Why not? It means that you can write a program which opens text files and
reads them without having to know which platform it's on. Which, since
Java is supposed to be platform-neutral, is rather useful in practice.
Now, i'm not actually sure that the above argument is true.
Likewise. ungetc() is usually implemented on a FILE*, which is buffered,
but I know I've handled line reading quite successfully from unbuffered
input.
Did you deal with input containing any combination of CR, LF and CRLF as
line terminators?
I've never stopped to question it before, but couldn't you just have a
flag in the reader, a boolean lastLineReadEndedWithCR, and the next
time you read a character, if it's set, you check to see if the
character is an LF, and if it is, throw it away.
Yes, until you meet a system that uses CR as the line separator.
No, it'd deal with that fine, that's the whole point. The logic is:
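// 'in' is the underlying, unbuffered java.io.Reader.
boolean lastLineReadEndedWithCR;

static boolean isCrOrLf(int ch) {
    return (ch == '\r') || (ch == '\n');
}

// Returns the next line without its terminator, or null at end of input.
String readLine() throws IOException {
    StringBuilder sb = new StringBuilder();
    int ch = read();
    if (ch == -1) return null;
    while (ch != -1 && !isCrOrLf(ch)) {
        sb.append((char) ch);
        ch = read();
    }
    // Remember a CR so that an LF directly after it can be dropped later.
    if (ch == '\r') lastLineReadEndedWithCR = true;
    return sb.toString();
}

// Reads one character (-1 at end of input), skipping the LF of a CRLF pair.
int read() throws IOException {
    int ch = in.read();
    if (lastLineReadEndedWithCR) {
        lastLineReadEndedWithCR = false;
        if (ch == '\n') ch = in.read();
    }
    return ch;
}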
I haven't tested that, but the idea is that you make a note when a line
ends on a CR, and then you throw away the next character read if and only
if it's an LF, on the grounds that it was part of the line terminator.
Single CRs or LFs don't result in anything being thrown away.
However, writing code to treat CR or LF as the line termination is
trivial. It's equally easy to save the last character read for comparison
purposes. Then you can discard LF if the last character was CR. This
will recognise CR, LF and CRLF as line separators.
Which is exactly what i said.
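
For comparison, here is a rough sketch of the "save the last character read" formulation, written as a self-contained class over any java.io.Reader. The class name LastCharLineReader and the field lastChar are made up for illustration, and returning null at end of input is an assumption borrowed from BufferedReader's convention. It is meant to show the idea, not to be a finished implementation.

```java
import java.io.IOException;
import java.io.Reader;

// Hypothetical class: line reading over an unbuffered Reader,
// treating CR, LF and CRLF as line terminators.
class LastCharLineReader {
    private final Reader in;
    // Terminator (or -1) that ended the previous readLine call.
    private int lastChar = -1;

    LastCharLineReader(Reader in) {
        this.in = in;
    }

    // Returns the next line without its terminator, or null at end of input.
    String readLine() throws IOException {
        StringBuilder sb = new StringBuilder();
        int ch = in.read();
        // An LF straight after a CR is the second half of a CRLF: discard it.
        if (ch == '\n' && lastChar == '\r') {
            ch = in.read();
        }
        if (ch == -1) {
            return null;
        }
        while (ch != -1 && ch != '\r' && ch != '\n') {
            sb.append((char) ch);
            ch = in.read();
        }
        lastChar = ch;
        return sb.toString();
    }
}
```

Note that the LF of a CRLF pair is only consumed at the start of the next readLine call, so the reader never has to look ahead past the CR. That is what lets it work without a buffer, and it also means a lone CR on an interactive stream does not block waiting for a character that may never come.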
Since buffering is almost always a good idea, though, shouldn't you
almost always be using a buffer? And if so, does it matter that
FileReader doesn't have readLine?
Yes, if you're dealing with a medium to large amount of data, but it's a
pain in the jacksie when you're reading from an interactive console
and/or want to deal with individual keystrokes.
In which case you won't be using readLine anyway.
It's also somewhat OTT if you're reading a small parameter file.
In that case, it's also a small buffer. Doesn't seem like a huge deal.
I do agree that it would be nicer if readLine didn't require a buffered
stream, and that it doesn't require that in principle. But i'm not all
that bothered that things are the way they are.
The fact that it's only BufferedWriter that has a newLine method, on the
other hand, makes me utterly furious!
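
For what it's worth, the workaround is tiny. BufferedWriter.newLine() is documented as writing the value of the line.separator system property, so a helper along these lines does the same job for any Writer. The class and method names here (LineWriting, newLine) are invented for the example.

```java
import java.io.IOException;
import java.io.Writer;

// Hypothetical helper class, not part of java.io.
class LineWriting {
    // Writes the platform line separator to any Writer, buffered or not.
    static void newLine(Writer out) throws IOException {
        out.write(System.getProperty("line.separator"));
    }
}
```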
tom
--
Baby got a masterplan. A foolproof masterplan.