Re: Charset problem
On Aug 29, 2:14 pm, magloca <magl...@mailinater.com> wrote:
winlin @ Thursday 28 August 2008 08:10:
Hi,
I am creating a PDF from the output received from a servlet. The output
contains special Swiss/German accented characters.
One such character is received from the servlet as &#E1;, for
which the Unicode escape is "\u00E1".
What I do is replace all occurrences of &#E1; with \u00E1, and
then it displays properly.
However, this only works on Windows. When I try to do the same thing
on Linux machines, I get garbage characters.
Those garbage characters look like they come from the KOI8 character set.
Can anyone help me please?
Technically, KOI-8 isn't a character set; it's a character encoding. But
I assume what you mean is that Cyrillic characters appear in the
output. Since the KOI-8 encoding (as well as Windows-1251, BTW) maps
codepoints to Cyrillic characters that in Unicode (and ISO-Latin1 et
al.) are mapped to the accented characters you want, it seems likely
that whatever it is you're using to generate the PDFs gets confused
about what encoding is in effect. Maybe you could tell us what PDF
generator you're using.
m.
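To illustrate the point about one byte mapping to different characters, here is a minimal sketch that decodes the single byte 0xE1 under three encodings. The class name DecodeOneByte and the helper decodeE1 are made up for this example, and it assumes the JRE ships the KOI8-R and windows-1251 charsets (it skips any that are missing). It also uses a String constructor that takes a Charset, which needs Java 6 or later.

```java
import java.nio.charset.Charset;

public class DecodeOneByte {

    // Decodes the single byte 0xE1 using the named charset.
    public static String decodeE1( String charsetName ) {
        byte[] raw = { (byte) 0xE1 };
        return new String( raw, Charset.forName( charsetName ) );
    }

    public static void main( String[] args ) {
        String[] names = { "ISO-8859-1", "KOI8-R", "windows-1251" };
        for ( int i = 0; i < names.length; i++ ) {
            String name = names[i];
            if ( !Charset.isSupported( name ) ) {
                System.out.println( name + " : not available in this JRE" );
                continue;
            }
            // Print the code point rather than the character itself, so the
            // result does not depend on the console encoding.
            String s = decodeE1( name );
            System.out.println( name + " : " + String.format( "U+%04X", (int) s.charAt( 0 ) ) );
        }
    }
}
```

Under ISO-8859-1 the byte comes out as U+00E1 (the accented a), while KOI8-R and windows-1251 each turn the same byte into a Cyrillic letter.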
Hi All,
First of all, thank you all for the effort you are taking to help
me out.
I have further reduced the problem to a small program, which gives me
different output on Windows and Linux (both running the same version of
Java, 1.4.2_16).
The output on Windows shows the actual character as expected; on Linux,
however, it shows garbage, probably from the KOI8-R encoding.
Please see if the program helps you get to the bottom of the problem.
I also read in the documentation for Character (version 5.0) that String
and char arrays use UTF-16 encoding, and I hope that's not a problem.

import java.io.UnsupportedEncodingException;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;
import java.util.Iterator;
import java.util.Map;
import java.util.Set;

public class TestCharSet {

    /**
     * Default Constructor
     */
    public TestCharSet() {
        super();
    }

    /**
     * @param args
     * @throws UnsupportedEncodingException
     */
    public static void main( String[] args ) throws UnsupportedEncodingException {
        //System.out.println("Special Characters:" + "á â ä è é ê ë ï ò ó ö ú ü") ;
        String stateProvince = "á"; // This is the character á (U+00E1)
        System.out.println("State Province before conversion : " + stateProvince) ;
        String stateProvince_post = unescapeXMLSpecialCharacters( stateProvince );
        System.out.println("State Province after conversion : " + stateProvince_post) ;
    }

    /**
     * Replaces all occurrences of the substring in the data string with the
     * replacement string.
     *
     * @param data the string to check.
     * @param substring the substring to replace.
     * @param replacement the string the substring is replaced with.
     * @return the result of the replacement(s).
     */
    // @PMD:REVIEWED:AvoidReassigningParameters: Bajrang Gupta
    private static String replace( String data, final String substring,
            final String replacement ) {
        int index = data.indexOf( substring, 0 ) ;
        while ( index >= 0 ) {
            data = data.substring( 0, index ) + replacement
                    + data.substring( index + substring.length() ) ;
            index += replacement.length() ;
            index = data.indexOf( substring, index ) ;
        }
        return data ;
    }

    /**
     * Replaces every occurrence of the literal character á in the string
     * with the same character written as the Unicode escape for U+00E1.
     *
     * @param xmlData the data string to convert.
     * @return the converted string.
     */
    public static String unescapeXMLSpecialCharacters( String xmlData )
            throws UnsupportedEncodingException {
        xmlData = replace( xmlData, "á", "\u00E1" ) ;
        return xmlData ;
    }
}
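
One way to narrow this down might be to print the code points the string actually holds, instead of the characters themselves, together with the JVM's default encoding. Below is a minimal sketch along those lines. The class name CodePointDump and the helper toCodePoints are hypothetical names for this example, not part of the program above. It writes the character as \u00E1 so the result does not depend on how the compiler decodes the source file, and it uses String.format, so it assumes Java 5 or later (on 1.4.2, Integer.toHexString would be a substitute).

```java
public class CodePointDump {

    // Returns the UTF-16 code units of s as "U+XXXX" values separated by spaces.
    public static String toCodePoints( String s ) {
        StringBuilder sb = new StringBuilder();
        for ( int i = 0; i < s.length(); i++ ) {
            if ( i > 0 ) {
                sb.append( ' ' );
            }
            sb.append( String.format( "U+%04X", (int) s.charAt( i ) ) );
        }
        return sb.toString();
    }

    public static void main( String[] args ) {
        // Written as an escape, so it is the same character on every platform.
        String escaped = "\u00E1";
        System.out.println( "file.encoding = " + System.getProperty( "file.encoding" ) );
        System.out.println( "escaped literal = " + toCodePoints( escaped ) );
    }
}
```

If the same helper is applied to the "á" literal from the program and it reports U+00E1 on Windows but a different value on Linux, that would suggest the literal was misread when the source was compiled (in the 1.4 series, javac reads source files in the platform default encoding unless -encoding is given). If both report U+00E1, the damage more likely happens when the string is written out.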