Re: Reading LAST line from text file without iterating through the file?
On 27-02-2011 09:55, Ken Wesson wrote:
On Sat, 26 Feb 2011 13:29:48 +0000, Martin Gregorie wrote:
On Sat, 26 Feb 2011 12:15:21 +0100, Ken Wesson wrote:
On Fri, 25 Feb 2011 14:58:27 +0000, Martin Gregorie wrote:
a text file contains records. They are variable length records with a
'newline' encoding as the delimiter.
By that definition the concept of "record-based" vs. "not-record-based"
becomes completely meaningless.
It is pretty much meaningless unless you're referring to the way a
program handles data. Consider a file containing nothing but printable
characters:
- if a C or Java program reads the file byte by byte, or parses it
  by reading words separated by whitespace, then line delimiters are
  utterly meaningless and the program doesn't care whether the file
  contains records or not.
- OTOH if a different program reads the same file a line at a time, e.g.
  C using fgets() or Java using BufferedReader.readLine(), then this is
  pure record-level access.
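A small sketch of the two views Martin describes, applied to the same text. The class and method names (TwoViews, countChars, readLines) are made up for illustration. The stream view sees every character, delimiters included, as plain data; the line view uses BufferedReader.readLine(), which hands back one line per call with its terminator stripped.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

public class TwoViews {
    // Stream view: every character, '\n' included, is just data.
    static int countChars(Reader in) throws IOException {
        int n = 0;
        while (in.read() != -1) {
            n++;
        }
        return n;
    }

    // Record view: readLine() returns one line per call, with the
    // terminator ("\n", "\r" or "\r\n") removed.
    static List<String> readLines(Reader in) throws IOException {
        BufferedReader br = new BufferedReader(in);
        List<String> lines = new ArrayList<>();
        String line;
        while ((line = br.readLine()) != null) {
            lines.add(line);
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        String text = "foo bar\nbaz\n";
        System.out.println(countChars(new StringReader(text))); // 12
        System.out.println(readLines(new StringReader(text)));  // [foo bar, baz]
    }
}
```

The file contents are identical in both cases; only the program decides whether it is looking at characters or at records.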
But the text file itself is not "record-based". You can implement a
record-based format *on top of text* -- CSV goes a step further in that
direction -- but the resulting file, crucially, can still be manipulated
properly with tools designed for generic operations on arbitrary text
files. In particular, this should be lossless on it:
import java.io.*;
public class TextFileCopier {
    public static void main (String[] args) throws IOException {
        if (args.length< 3) {
            System.out.println("Please specify source and " +
                "destination file.");
            return;
        }
        File f = new File(args[1]);
        InputStream is = new FileInputStream(f);
        Reader rdr = new InputStreamReader(is);
        File g = new File(args[2]);
        OutputStream os = new FileOutputStream(g);
        Writer wtr = new OutputStreamWriter(os);
        int c;
        while ((c = rdr.read()) != -1) wtr.write(c);
    }
}
But this won't be lossless on the strange file formats Arne has become
obsessed with. At the reading stage, the record boundaries in those file
formats will be translated into some newline character or another, likely
\u000A. When that happens, the distinction between those and literal
\u000A characters in the source file will be lost and can never be
regained.
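The information loss Ken describes can be sketched in a few lines. Here a hypothetical stream-mode reader (recordsToText, an illustrative name, not any real VMS or Java API) presents each record followed by '\n'. Two different record lists then produce the same text, so no reader working on that text can recover which one it came from.

```java
import java.util.Arrays;
import java.util.List;

public class BoundaryLoss {
    // Hypothetical stream-mode view of a record file: each record's data
    // followed by '\n' standing in for the record boundary.
    static String recordsToText(List<String> records) {
        StringBuilder sb = new StringBuilder();
        for (String r : records) {
            sb.append(r).append('\n');
        }
        return sb.toString();
    }

    // Splitting on '\n' cannot tell a translated boundary from a literal
    // '\n' inside a record. split() drops the trailing empty string.
    static List<String> textToRecords(String text) {
        return Arrays.asList(text.split("\n"));
    }

    public static void main(String[] args) {
        List<String> withLiteralNewline = Arrays.asList("foo\nbar", "baz");
        List<String> threeRecords = Arrays.asList("foo", "bar", "baz");
        // Both produce "foo\nbar\nbaz\n".
        System.out.println(recordsToText(withLiteralNewline)
                .equals(recordsToText(threeRecords)));              // true
        System.out.println(textToRecords(recordsToText(withLiteralNewline))
                .size());                                            // 3
    }
}
```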
Surely you agree that a file format cannot be regarded as a true text
file format unless the above TextFileCopier can copy all files in that
format faithfully?
Actually I think it is a bit weird to test if a file consists
of text lines without the program being line-aware.
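For comparison, a line-aware copier could read with BufferedReader.readLine() and write with BufferedWriter.newLine(). This is only a sketch (LineCopier and copyLines are illustrative names). Such a copy is not byte-identical to the input: readLine() strips whichever terminator it found, and newLine() writes the platform line separator after every line, including the last one.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;

public class LineCopier {
    // Copies line by line: readLine() drops each terminator and
    // newLine() writes the platform separator in its place.
    static void copyLines(Reader in, Writer out) throws IOException {
        BufferedReader br = new BufferedReader(in);
        BufferedWriter bw = new BufferedWriter(out);
        String line;
        while ((line = br.readLine()) != null) {
            bw.write(line);
            bw.newLine();
        }
        bw.flush();
    }

    public static void main(String[] args) throws IOException {
        StringWriter sw = new StringWriter();
        copyLines(new StringReader("foo\r\nbar"), sw);
        // The lines survive, but "\r\n" and the missing final terminator
        // are both replaced by System.lineSeparator().
        String sep = System.lineSeparator();
        System.out.println(sw.toString().equals("foo" + sep + "bar" + sep));
    }
}
```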
And the code is rather bad:
- you are not using args[0] (in Java, unlike C, args[0] does not
  contain the name of the program)
- you are not calling close on rdr and wtr
but those are easy to fix.
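One way the two fixes might look: read args[0] and args[1], and let try-with-resources close both streams. Closing the writer also flushes what OutputStreamWriter still has buffered, so without it the output file can end up short or empty. Try-with-resources needs Java 7 or later. Like the original, this sketch uses the platform default charset. The copy loop is pulled out into a hypothetical copy(Reader, Writer) helper so it can be exercised without files.

```java
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.Writer;

public class TextFileCopier {
    // Character-by-character copy, the same loop as the original.
    static void copy(Reader rdr, Writer wtr) throws IOException {
        int c;
        while ((c = rdr.read()) != -1) {
            wtr.write(c);
        }
    }

    public static void main(String[] args) throws IOException {
        // In Java the first user argument is args[0], not the program name.
        if (args.length < 2) {
            System.out.println("Please specify source and destination file.");
            return;
        }
        // try-with-resources closes (and thereby flushes) both streams,
        // even if copy() throws.
        try (Reader rdr = new InputStreamReader(new FileInputStream(args[0]));
             Writer wtr = new OutputStreamWriter(new FileOutputStream(args[1]))) {
            copy(rdr, wtr);
        }
    }
}
```

Usage: `java TextFileCopier input.txt output.txt`.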
And I am happy to inform you that the above program
actually works with VMS variable length files.
Dump of input:
Record type: Variable
File organization: Sequential
Record attributes: Implied carriage control
End of file block: 1
End of file byte: 16
007A6162 00030072 61620A6F 6F660007 ..foo.bar...baz. 000000
Dump of output:
VAX-11 RMS attributes
Record type: Variable
File organization: Sequential
Record attributes: Implied carriage control
End of file block: 1
End of file byte: 16
007A6162 00030072 61620A6F 6F660007 ..foo.bar...baz. 000000
So QED.
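Reading the dump: DUMP prints each longword with its lowest-addressed byte on the right, so "6F660007" with ASCII "..fo" is the byte sequence 07 00 66 6F. Assuming the usual layout of RMS variable-length records (a 2-byte little-endian count before each record, each record padded to an even offset), the 16 bytes are two records: "foo", LF, "bar" (7 bytes) and "baz" (3 bytes). The first record therefore contains a literal 0x0A, which is exactly the case that was supposed to be lost, and it came through the copy unchanged. The sketch below does that decoding; the layout is an assumption, and it only handles the single-block case shown here.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class VarRecordDecoder {
    // Assumed layout: 2-byte little-endian length word, then that many
    // data bytes, then a pad byte if needed to reach an even offset.
    static List<String> decode(byte[] block, int endOfFile) {
        List<String> records = new ArrayList<>();
        int pos = 0;
        while (pos + 2 <= endOfFile) {
            int len = (block[pos] & 0xFF) | ((block[pos + 1] & 0xFF) << 8);
            pos += 2;
            if (pos + len > endOfFile) {
                throw new IllegalArgumentException("truncated record at " + pos);
            }
            records.add(new String(block, pos, len, StandardCharsets.US_ASCII));
            pos += len;
            if ((pos & 1) != 0) {
                pos++; // skip the pad byte after an odd-length record
            }
        }
        return records;
    }

    public static void main(String[] args) {
        // The 16 bytes from the dump above, in file order.
        byte[] data = {
            0x07, 0x00, 'f', 'o', 'o', 0x0A, 'b', 'a',
            'r', 0x00, 0x03, 0x00, 'b', 'a', 'z', 0x00
        };
        List<String> records = decode(data, 16);
        System.out.println(records.size());               // 2
        System.out.println(records.get(0).indexOf('\n')); // 3
    }
}
```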
Arne
PS: Those with a VMS system who want to test this themselves
should remember to set the logical name that tells Java to
open variable-length files in stream mode.
PPS: I am actually somewhat surprised that it works. It is
not that easy to get something as stream-oriented as this
to work in a record world. HP's Java and C engineers
must have been rather smart.