Question

html code corrupted when modifying a string


Hi,

In our email module we need to modify the body text when, for example, forwarding an email. The original (HTML) body text is retrieved, and text (header info such as the To and CC addresses) is added to the new body text before it is presented to the user.

We encounter several issues with emails that contain text copied from MS Word (most of them are emails received from customers). A lot of MS Word formatting is included in the HTML, and when we modify the body HTML (adding the text), HTML characters like < and > are changed to their HTML entity equivalents (for example, < becomes &lt;). This causes the HTML to be treated as regular text, which messes up things like bullet lists in the emails.

How can we prevent HTML characters from being changed to their HTML entity equivalents?

Kind regards, Brian

Remark: Although I agree with cleaning the code to remove MS Word markup, the actual issue is caused by the conversion of the < character to & l t ; (had to add spaces, otherwise it renders as <). What is causing this conversion? If I can prevent it, my issue is resolved as well (and I will not have to bother with cleaning externally created emails). Any ideas what might be causing the conversion and how I can prevent it? It seems that the Mendix string modification (in a variable) in the microflow is actually doing the conversion. Any remarks are appreciated.

asked 2013-06-04

Brian Golsteijn

4 answers

Brian Golsteijn · Answer 1 · 2013-09-19

After some more research, it turned out that the Rich Text Editor (even the latest version) incorrectly translates the < character and so causes the issue. I tested with a form that displays the email in a text area (so I could read the HTML code), and the replacement did not occur. Once I added the RTE to the form (while leaving the text area in place), the < was replaced in both the RTE and the text area, which refer to the same attribute.

In my code I now try to replace the specific code before it is posted to the form so the text will be displayed properly to the user. Not a nice solution, but it does work.
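For anyone who wants to try the same workaround, here is a minimal sketch of what such a replacement could look like in plain Java (for example inside a Java action). The class and method names are hypothetical and not part of Mendix or any library, and it assumes that only the < and > characters were entity-encoded. Note that it will also turn text that was escaped on purpose (such as "a &lt; b") back into markup, so it is only suitable when the whole body is known to be HTML.

```java
// Hypothetical helper, not part of Mendix or Community Commons.
final class TagEntityDecoder {
    private TagEntityDecoder() {}

    /**
     * Turns entity-encoded angle brackets back into literal characters.
     * Assumption: only &lt; and &gt; were introduced by the editor.
     */
    static String decodeTagEntities(String body) {
        if (body == null) {
            return null;
        }
        return body
                .replace("&lt;", "<")
                .replace("&gt;", ">");
    }
}
```

Calling `TagEntityDecoder.decodeTagEntities(body)` on the stored body before it is shown should restore markup such as bullet lists, as long as the assumption above holds.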

Chris de Gelder · Answer 2 · 2013-06-05

According to the documentation, TinyMCE filters Word content when it is pasted. This looks like a bug in that feature.

BTW, the HTML created by MS Word can be pretty complex and huge.

Samet Kaya · Answer 3 · 2013-06-05

Maybe you could remove the MS Word formatting with a Java action. There are Java libraries available to do this.
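As a rough illustration of the idea, the sketch below strips a few common MS Word leftovers using only `java.util.regex`. The class name and the choice of patterns are my own assumptions, and regular expressions are not a robust way to process HTML in general, so a dedicated HTML parsing library is probably the better choice for real emails.

```java
import java.util.regex.Pattern;

// Hypothetical helper: removes a few typical MS Word artifacts from HTML.
final class WordMarkupCleaner {
    // Conditional comments such as <!--[if gte mso 9]> ... <![endif]-->
    private static final Pattern CONDITIONAL_COMMENT = Pattern.compile(
            "<!--\\[if[^\\]]*\\]>.*?<!\\[endif\\]-->",
            Pattern.DOTALL | Pattern.CASE_INSENSITIVE);

    // Office namespace tags such as <o:p>, </o:p>, <v:shape ...>, <w:...>
    private static final Pattern OFFICE_TAG = Pattern.compile(
            "</?[ovw]:[^>]*>", Pattern.CASE_INSENSITIVE);

    // Attributes such as class="MsoNormal"
    private static final Pattern MSO_CLASS = Pattern.compile(
            "\\s+class\\s*=\\s*(\"Mso[^\"]*\"|'Mso[^']*'|Mso\\w*)",
            Pattern.CASE_INSENSITIVE);

    private WordMarkupCleaner() {}

    static String clean(String html) {
        if (html == null) {
            return null;
        }
        String out = CONDITIONAL_COMMENT.matcher(html).replaceAll("");
        out = OFFICE_TAG.matcher(out).replaceAll("");
        out = MSO_CLASS.matcher(out).replaceAll("");
        return out;
    }
}
```

The conditional comments are removed first so that any Office tags nested inside them disappear along with their content; Office tags outside such comments are removed but their inner text is kept.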

Ronald Catersels · Answer 4 · 2013-06-05

The Community Commons module has an HTMLToPlaintext Java action in its string utilities section. From the documentation: "Use this function to convert HTML text to plain text. It will preserve linebreaks but strip all other markup, including html entity decoding."

Regards,

Ronald