Re: FFEF behaviour
Alex Blekhman wrote:
<rpv30@yahoo.co.in> wrote:
Actually, we are trying to read lines from a file and
extract integers from them. We run into problems when we hit
a BOM that has somehow crept into the file. I debugged the
code and traced the failure to swscanf, so I wrote
this program to simulate the scenario.
I understand that BOMs in the middle of the file need to
be ignored, but I just wanted to understand this behaviour.
The problem is (besides 0xFEFF being an invalid character) that
your format string doesn't match the input. Your format
string says that the input must be a 64-bit integer, while it starts
with a non-white-space character that cannot be interpreted
as an integer.
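A minimal sketch of the reading loop described above, assuming the lines come from a wide stream (a std::wistringstream stands in for the file) and assuming `%lld` as the 64-bit format, since the original format string is not shown. The helper name scan_lines is hypothetical. swscanf returns 1 when the integer is converted and 0 on a matching failure, which is what a leading character that is neither white space nor part of a number produces.

```cpp
#include <cassert>
#include <cwchar>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Result of scanning one line: the swscanf return value and the parsed number.
struct LineResult {
    int converted;    // 1 on success, 0 on a matching failure, EOF for an empty line
    long long value;  // only meaningful when converted == 1
};

// Hypothetical helper: reads `in` line by line and tries to extract one
// 64-bit integer from each line with swscanf.
std::vector<LineResult> scan_lines(std::wistream& in)
{
    std::vector<LineResult> results;
    std::wstring line;
    while (std::getline(in, line)) {
        LineResult r{0, 0};
        // "%lld" first skips leading white space (as classified by iswspace),
        // then converts a signed integer of at least 64 bits.
        r.converted = std::swscanf(line.c_str(), L"%lld", &r.value);
        results.push_back(r);
    }
    return results;
}
```

Feeding it `L"42\n\xFEFF" L"123456\n"` reproduces the reported case: the first line converts, while the outcome for the second line depends on whether the C library skips U+FEFF as white space.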
Sorry, but you're wrong: code point 0xfeff is the zero-width no-break space in
Unicode. The non-character you are probably thinking of is 0xfffe,
which is explicitly guaranteed never to be used as a character. So,
since it is whitespace, it should be skipped in the input operation.
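Whether the scan actually skips it is decided by the C library: the C standard's description of fwscanf says input white-space wide characters are those "as specified by the iswspace function". The Unicode Character Database lists U+FEFF as a format character (category Cf), so results may differ between C libraries. A small probe, with the helper name library_treats_as_space being hypothetical, shows what a given library does:

```cpp
#include <cassert>
#include <cwctype>

// Hypothetical helper: reports whether the current C library (in the
// current locale) classifies `ch` as white space. swscanf's numeric
// conversions skip exactly the characters for which this returns true.
bool library_treats_as_space(wchar_t ch)
{
    return std::iswspace(static_cast<std::wint_t>(ch)) != 0;
}
```

Calling `library_treats_as_space(static_cast<wchar_t>(0xFEFF))` on the Windows 2000 and Windows 2003 machines would show whether the difference comes from this classification.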
std::wstring testString;
testString = 0xfeff;
testString += L"123456";
What I wonder is how the second statement is interpreted. I would have
expected an explicit cast to wchar_t to be necessary. Try this one
instead:
wchar_t const str[] = {
0xfeffu, L'1', L'2', L'3', L'4', L'5', L'6', L'\0'
};
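For what it's worth, the second statement appears to resolve to std::basic_string's operator=(charT) overload: the int literal 0xfeff is converted implicitly to wchar_t, so no cast is needed. A quick sketch (the function names are made up for illustration) checks that both spellings yield the same seven-character string:

```cpp
#include <cassert>
#include <string>

// Builds the test string the original way: `s = 0xfeff;` picks
// basic_string::operator=(wchar_t) after an implicit int -> wchar_t conversion.
std::wstring via_assignment()
{
    std::wstring s;
    s = 0xfeff;
    s += L"123456";
    return s;
}

// Builds the same string from an explicit, null-terminated wchar_t array.
std::wstring via_array()
{
    wchar_t const str[] = {
        0xfeffu, L'1', L'2', L'3', L'4', L'5', L'6', L'\0'
    };
    return std::wstring(str);
}
```

The array form has the advantage of not depending on which assignment overload the compiler selects.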
I don't know why it works under Windows 2000 but not
under Windows 2003 Server. It doesn't work on my Windows XP
SP2 either.
I'd say it's a quality-of-implementation issue, and the C library used on one
system is worse than the one on the other. Maybe it's possible to
fix, upgrade or replace it.
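Until the library is fixed, one portable workaround is to skip U+FEFF explicitly before handing the text to swscanf, so the result no longer depends on how iswspace classifies it. This is a sketch under the same `%lld` assumption as before; the helper name parse_int64_skip_bom is hypothetical.

```cpp
#include <cassert>
#include <cwchar>
#include <cwctype>

// Hypothetical helper: skips leading white space and any U+FEFF code units
// (in any order), then parses a 64-bit integer. Returns true on success.
bool parse_int64_skip_bom(const wchar_t* text, long long* out)
{
    const wchar_t kZwnbsp = static_cast<wchar_t>(0xFEFF);
    while (*text == kZwnbsp || std::iswspace(static_cast<std::wint_t>(*text)))
        ++text;
    // After skipping, swscanf only has to deal with the number itself.
    return std::swscanf(text, L"%lld", out) == 1;
}
```

Stripping the character explicitly keeps the parsing behaviour identical on every C library, at the cost of treating U+FEFF as ignorable everywhere at the start of a line.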
Uli