Re: Displaying Non-ASCII Characters in C++
* tushar.saxena@gmail.com:
This post is a follow-up to the post at:
http://groups.google.com/group/comp.lang.c++/browse_thread/thread/83af6123fa945e8b?hl=ug#9eaa6fab5622424e
as my original question was answered there, but I have some additional
problems now.
Basically what I want to do is: given an input UTF-8 encoded file
containing HTML sequences such as "&amp;", I want to be able to
replace these sequences with their UTF-8 representations (i.e. "&").
What I have so far: Looking at some of the source code of the Mozilla
Firefox project, I have a small class that can convert the HTML
sequences into a number representing the Unicode value of that
character. For example, "&amp;" is represented by a Unicode value of 38
(source : http://www.ascii.cl/htmlcodes.htm)
My question: How can I convert this Unicode value into the
character "&" and write it to a file/display on the terminal? I tried
using something along the lines of printf("\u0012"), but that returns
the following compilation error : "\u0012 is not a valid universal
character"
Basically you have to convert from Unicode to whatever character set
your terminal (or other display) expects.
Most systems today use character sets that are extensions of ASCII, and
in particular, ISO Latin-1 is an extension of ASCII, and Unicode is in
turn an extension of ISO Latin-1.
Characters in the ASCII range (0..127 inclusive) therefore probably need
no translation. On Windows, characters in the Latin-1 range (0..255)
can also be displayed directly by setting the console window to codepage
1252 (Windows ANSI Western), because CP 1252 agrees with Latin-1 on all
printable characters (it differs only in 128..159, which Latin-1
reserves for control codes).
On a system expecting UTF-8 you'll have to do conversion to UTF-8 for
characters outside the ASCII range, but unfortunately the C++ standard
library offers no means of doing conversions to particular character
sets; you might try the conversions ("narrow", "widen") in the locale
part of the library, and if they work, great, but if not, uh, not so
great... :-)
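
For the UTF-8 case the conversion is simple enough to do by hand. Here
is a minimal sketch, assuming you already have the character's Unicode
value as an integer (e.g. 38 for "&"). The function name to_utf8 is
made up for illustration, it is not part of any library, and returning
an empty string for invalid values is just one possible choice.

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper: encode one Unicode code point as a UTF-8 byte string.
// Returns an empty string for values that are not valid Unicode scalar
// values (surrogates 0xD800..0xDFFF, or anything above 0x10FFFF).
std::string to_utf8(std::uint32_t cp)
{
    std::string out;
    if (cp < 0x80) {
        // ASCII range: one byte, no translation needed.
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        // Two bytes: 110xxxxx 10xxxxxx
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        if (cp >= 0xD800 && cp <= 0xDFFF) {
            return out;  // surrogate halves are not characters
        }
        // Three bytes: 1110xxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp <= 0x10FFFF) {
        // Four bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}
```

Writing the result of to_utf8(38) to std::cout or to a std::ofstream
emits a plain "&". Values above 127 come out as multi-byte sequences,
which will only display correctly if the terminal (or whatever reads
the file) actually expects UTF-8.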
Cheers, & hth.,
- Alf
--
A: Because it messes up the order in which people normally read text.
Q: Why is it such a bad thing?
A: Top-posting.
Q: What is the most annoying thing on usenet and in e-mail?