XML mapping; special characters?

One of our customers has noted that XML files which are imported (XML-to-Domain), converted to a different format using microflows, and exported (Domain-to-XML) using Mendix do not carry over special characters such as Ä or Ü. Instead, they come out as "�" and the like. I have read a few things about this on the forums, but the advice dates from quite a while ago. We are using 2.5.2.1 for this.

The incoming XML files have encoding="WINDOWS-1252", but Mendix seems to change this to UTF-8 somewhere in the mapping process. Can I get Mendix to read incoming XML files with umlauts and such correctly, and how can I make sure that the output XML also uses the correct encoding? Thanks.

UPDATE: Achiel helped me out. Some of the XML files have content whose actual encoding does not match the encoding declaration. Not a Mendix problem :)
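For anyone running into the same symptom: a quick way to spot such a mismatch is to try decoding the raw file bytes with both the declared charset and UTF-8 and see which one succeeds cleanly. A minimal sketch (the file name and the windows-1252 charset are assumptions for illustration, not part of the original question):

import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

public class EncodingCheck {

    // Returns true if the bytes form a valid sequence in the given charset.
    static boolean decodes(byte[] bytes, Charset cs) {
        try {
            cs.newDecoder()
              .onMalformedInput(CodingErrorAction.REPORT)
              .onUnmappableCharacter(CodingErrorAction.REPORT)
              .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        byte[] bytes = Files.readAllBytes(Paths.get("input.xml"));

        // Encoding claimed in the XML declaration, e.g. encoding="WINDOWS-1252"
        Charset declared = Charset.forName("windows-1252");

        System.out.println("valid as " + declared.name() + ": " + decodes(bytes, declared));
        System.out.println("valid as UTF-8: " + decodes(bytes, StandardCharsets.UTF_8));
        // If the bytes only decode cleanly as UTF-8 (or only as windows-1252),
        // the encoding declaration probably does not match the actual content.
    }
}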
1 answer

The encoding should be picked up by the SAX parser. From the InputSource documentation:

The SAX parser will use the InputSource object to determine how to read XML input. If there is a character stream available, the parser will read that stream directly, disregarding any text encoding declaration found in that stream. If there is no character stream, but there is a byte stream, the parser will use that byte stream, using the encoding specified in the InputSource or else (if no encoding is specified) autodetecting the character encoding using an algorithm such as the one in the XML specification.

The parser then hands the XML data to the XML importer one Java char at a time, which should result in correctly decoded XML.
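To illustrate the behaviour described in that quote, here is a minimal sketch of feeding a byte stream to a SAX parser via InputSource. This is not Mendix's actual importer; the file name and the explicit windows-1252 encoding are assumptions for the example (without setEncoding, the parser falls back to the XML declaration or autodetection):

import java.io.FileInputStream;
import java.io.InputStream;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class EncodingAwareImport {
    public static void main(String[] args) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();

        try (InputStream in = new FileInputStream("input.xml")) {
            InputSource source = new InputSource(in);
            // Forcing the encoding overrides autodetection; with a byte stream
            // and no explicit encoding, the parser uses the XML declaration
            // or autodetection as described in the quote above.
            source.setEncoding("windows-1252");

            parser.parse(source, new DefaultHandler() {
                @Override
                public void characters(char[] ch, int start, int length) {
                    // Character data arrives here as decoded Java chars (UTF-16),
                    // so umlauts like Ä or Ü are already correct at this point.
                    System.out.print(new String(ch, start, length));
                }
            });
        }
    }
}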

That being said, are you sure the data is parsed incorrectly? What does the data look like inside the database?
