Error processing XHTML after upgrading to Ver 9.12.0

3
I’m getting the following error:

com.mendix.modules.microflowengine.MicroflowException: Error processing XHTML
    at TestDoc_Word (DocumentExport : 'Generate Word 2007 (.docx) document using template 'TestDoc')

Advanced stacktrace:
    at com.mendix.modules.microflowengine.MicroflowUtil.processException(MicroflowUtil.java:83)

Caused by: com.mendix.systemwideinterfaces.MendixRuntimeException: Error processing XHTML
    at com.mendix.documentexporter.focomponents.DynamicLabel$1.run(DynamicLabel.java:149)

Caused by: org.xml.sax.SAXParseException: The entity "ldquo" was referenced, but not declared.

This happens when the Render XHTML property is set to true on the template. The issue is triggered by any HTML entity, like &nbsp; etc. This was working in Ver 8.18.13 and only became an issue once we upgraded to Ver 9.12.0. The template text containing the HTML entities is created by CKEditor.

Any ideas on how to fix this?
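For anyone who wants to reproduce the parser message outside Mendix, here is a minimal sketch. It assumes the JDK's built-in XML parser behaves similarly to whatever the document exporter uses internally, which I have not verified; the class and method names (EntityParseDemo, parseError) are made up for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

// Hypothetical helper, not part of Mendix: feeds a fragment to the JDK XML parser.
final class EntityParseDemo {

    /** Returns null if the fragment parses as XML, otherwise the parser's error message. */
    static String parseError(String xhtmlFragment) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            builder.parse(new InputSource(new StringReader(xhtmlFragment)));
            return null;
        } catch (SAXParseException e) {
            // Raised for undeclared entities such as &ldquo; when no DTD declares them.
            return e.getMessage();
        } catch (Exception e) {
            throw new IllegalStateException("Unexpected parser failure", e);
        }
    }

    public static void main(String[] args) {
        // Named HTML entity: not declared in plain XML, so an error message is expected.
        System.out.println(parseError("<p>&ldquo;quoted&rdquo;</p>"));
        // Numeric references and the predefined XML entities: expected to parse (prints null).
        System.out.println(parseError("<p>&#8220;quoted&#8221; &amp; &lt;ok&gt;</p>"));
    }
}
```

Plain XML only predefines amp, lt, gt, quot and apos, which is why the named HTML entities fail while numeric references do not.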
asked
10 answers
2

I ran into this same issue after an upgrade to 9.12.2. I specifically had issues with the entities “bull” and “nbsp”, which are the bullet and non-breaking space characters. I am thinking of adding a microflow that goes through all the commonly used typographic HTML entities and replaces each with its Unicode character. This worked in my initial testing, where I did the following before passing the text into Generate Document:

replaceAll($HTMLToChange,'&bull;','•')

replaceAll($HTMLToChange,'&nbsp;',' ')

In Ryan’s example, it could be fixed with:

replaceAll($HTMLToChange,'&ldquo;','“')

However, I’m hoping this gets fixed in 9.12.3, so I will likely wait for the fix, since I am worried I will forget an entity and break something in Production.
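If you would rather do this in one Java action than in a chain of replaceAll calls, the same idea could be sketched roughly as follows. The class name and the entity table are my own illustration and deliberately incomplete; extend the table with whatever your editor actually emits.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: replaces a small, hand-picked set of named HTML entities
// with the corresponding Unicode characters. The table is NOT complete.
final class TypographicEntities {

    private static final Map<String, String> ENTITY_TO_CHAR = new LinkedHashMap<>();
    static {
        ENTITY_TO_CHAR.put("&nbsp;", "\u00A0");   // non-breaking space
        ENTITY_TO_CHAR.put("&bull;", "\u2022");   // bullet
        ENTITY_TO_CHAR.put("&ldquo;", "\u201C");  // left double quotation mark
        ENTITY_TO_CHAR.put("&rdquo;", "\u201D");  // right double quotation mark
        ENTITY_TO_CHAR.put("&lsquo;", "\u2018");  // left single quotation mark
        ENTITY_TO_CHAR.put("&rsquo;", "\u2019");  // right single quotation mark
        ENTITY_TO_CHAR.put("&ndash;", "\u2013");  // en dash
        ENTITY_TO_CHAR.put("&hellip;", "\u2026"); // horizontal ellipsis
    }

    /** Replaces the listed entities; everything else (including &amp; and &lt;) is left alone. */
    static String replaceTypographicEntities(String html) {
        String result = html;
        for (Map.Entry<String, String> entry : ENTITY_TO_CHAR.entrySet()) {
            result = result.replace(entry.getKey(), entry.getValue());
        }
        return result;
    }
}
```

The downside is the one mentioned above: any entity missing from the table will still break the export.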

answered
1

Since Mendix Studio Pro 9.10.0, we forbid the use of external entities (like &nbsp; and &rdquo;) for security reasons. Therefore, entities like &rdquo; will indeed fail.

We suggest either using a numeric symbol instead (for entities like &rdquo;)

&#8220;I am in quotes&#8220;

or using a Unicode value directly (such as mentioned earlier by Michael). The full list of character entities with their numeric values can be found here. Entities that are used to escape reserved HTML characters are supported, e.g. &lt;, &gt;, &apos;, &amp;, etc.
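The numeric variant could be automated along these lines. This is only a sketch: the class name and the lookup table are assumptions of mine, the table is incomplete, and it needs Java 9 or later for Map.of and the StringBuilder overload of appendReplacement.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: rewrites known named entities as numeric character references.
final class NumericEntityRewriter {

    // Incomplete, hand-picked table of entity names and their code points.
    private static final Map<String, Integer> CODE_POINTS = Map.of(
            "nbsp", 160, "bull", 8226,
            "ldquo", 8220, "rdquo", 8221,
            "lsquo", 8216, "rsquo", 8217);

    private static final Pattern NAMED_ENTITY = Pattern.compile("&([A-Za-z][A-Za-z0-9]*);");

    /** Turns e.g. &ldquo; into &#8220;. Names not in the table (amp, lt, ...) are kept as-is. */
    static String toNumericReferences(String html) {
        Matcher matcher = NAMED_ENTITY.matcher(html);
        StringBuilder out = new StringBuilder();
        while (matcher.find()) {
            Integer codePoint = CODE_POINTS.get(matcher.group(1));
            String replacement = (codePoint == null) ? matcher.group() : "&#" + codePoint + ";";
            matcher.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        matcher.appendTail(out);
        return out.toString();
    }
}
```

Because unknown names pass through unchanged, the reserved-character entities stay escaped, but an unlisted typographic entity would still reach the exporter.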

answered
1

Hi guys,

thank you very much for your feedback and help.

I followed the tips and tried a few things, but I found something I didn't expect, so I want to share my feedback:

First I replaced the critical symbol (in my case &nbsp;) with its Unicode value, as described by Michael Hero: replaceAll($HTMLToChange,'&nbsp;',' '), and it worked.

Worried that other symbols and characters could also cause trouble, I then implemented the unescapeHTML4 routine as suggested.

The &nbsp; problem was solved as well, but I got a lot of other problems, for example with this text:

Mustermann, Max (R&D)
<max.mustermann@siemens.com>

It was converted into:

Mustermann, Max (R&amp;D)
&lt;max.mustermann@siemens.com&gt;

This originally worked fine with the PDF converter, but after applying the unescapeHTML4 routine it was converted to:

Mustermann, Max (R&D)

This leads to an error because the parser expects an entity reference after the &.

<max.mustermann@siemens.com>

This also leads to an error because the parser treats it as an HTML start tag.

Therefore I have now switched back to replacing the critical values manually.

Just for your information: unescaping the HTML does not really solve this problem.
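If someone still wants to use a full unescape routine, one possible way around the problem above is to protect the five XML entities before unescaping and restore them afterwards. The sketch below is untested inside Mendix; the class name is hypothetical, the unescaper is passed in as a function (for example a method reference to your unescape routine), and it assumes the text never contains the private-use characters U+E000 to U+E004.

```java
import java.util.function.UnaryOperator;

// Hypothetical wrapper: shields &amp; &lt; &gt; &quot; &apos; from an unescape routine.
final class SafeUnescape {

    private static final String[] XML_ENTITIES = {"&amp;", "&lt;", "&gt;", "&quot;", "&apos;"};

    // First of five private-use placeholders; assumed not to occur in the input.
    private static final char PLACEHOLDER_BASE = '\uE000';

    static String unescapeKeepingXmlEntities(String html, UnaryOperator<String> unescaper) {
        String work = html;
        // 1. Hide the XML entities behind placeholder characters.
        for (int i = 0; i < XML_ENTITIES.length; i++) {
            work = work.replace(XML_ENTITIES[i], String.valueOf((char) (PLACEHOLDER_BASE + i)));
        }
        // 2. Let the supplied routine unescape everything else (&nbsp;, &ldquo;, ...).
        work = unescaper.apply(work);
        // 3. Put the XML entities back in escaped form.
        for (int i = 0; i < XML_ENTITIES.length; i++) {
            work = work.replace(String.valueOf((char) (PLACEHOLDER_BASE + i)), XML_ENTITIES[i]);
        }
        return work;
    }
}
```

Note that numeric forms such as &#38; or &#60; are not protected by this, so an unescaper that resolves them would still produce a bare & or <.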

 

 

answered
0

Hi Ryan, were you able to fix this? We are running into the same issue at the moment.

answered
0

A few weeks ago I ran into the same problem and had to create a branch with the previous version.

At the time I read a rumor that this would be fixed in 9.13 (I haven’t tried it, btw).

Vale

answered
0

Sjoerd, not yet, but I’m testing 9.13.0 as per Valentina’s suggestion and will update as soon as I know.

answered
0

Unfortunately, the problem still persists in Ver 9.13.0

answered
0

Unfortunately, the problem still persists in Ver 9.21.0

answered
0

This issue can easily be solved by unescaping the HTML string with a custom Java action.

E.g.:

import org.apache.commons.lang.StringEscapeUtils;
import com.mendix.systemwideinterfaces.core.IContext;
import com.mendix.webui.CustomJavaAction;

public class UnescapeHTML extends CustomJavaAction<java.lang.String>
{
	private java.lang.String String;

	public UnescapeHTML(IContext context, java.lang.String String)
	{
		super(context);
		this.String = String;
	}

	@java.lang.Override
	public java.lang.String executeAction() throws Exception
	{
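		// BEGIN USER CODE
		// Converts named entities such as &nbsp; or &ldquo; to their Unicode characters.
		// Note: this also unescapes &amp;, &lt; and &gt;, which can itself break XHTML rendering.
		return StringEscapeUtils.unescapeHtml(String);
		// END USER CODE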
	}

	/**
	 * Returns a string representation of this action
	 * @return a string representation of this action
	 */
	@java.lang.Override
	public java.lang.String toString()
	{
		return "UnescapeHTML";
	}

	// BEGIN EXTRA CODE
	// END EXTRA CODE
}

 

answered
0

Dear Thijs,

What you have done doesn't make a lot of sense. We are wasting a huge amount of time trying to find the cause of these issues. The most recent one was caused by single quotes in text that your widget converted to a prohibited entity. Why would single quotes be insecure?

However, you need to realise that the issue is actually of your own doing: it is the CKEditor for Mendix that is causing these issues. It creates the very characters that later cause the whole document generation to fail. We have to go through the text by trial and error, removing it bit by bit until we find what causes the issue. Surely, before you prohibit these entities, you need to stop generating them?

Do you know when this will be fixed? These issues are really causing a lot of trouble and wasted time. Thank you.

answered