
Incorrect rendering of em dash within list item element of imported HTML user tool

Steven S



  • Accordance 14.0.7
  • Windows 11 [Version 10.0.22621.2134]


After importing an HTML user tool, Accordance incorrectly renders any em dash (—, U+2014) within a list item element (<li>) of an unordered list (<ul>), but only if the em dash does not immediately follow an HTML tag. For example, the list item "<li>word—</li>" is incorrectly rendered as "word,Äî", but the list item "<li><i>word</i>—</li>" is correctly rendered as "word—".


Importing the same HTML user tool into Accordance 3.3.4 produces the correct output in both cases, so this is probably a regression in 14.


Reproduction frequency


Reproduction steps

  1. Run the File > User Files > Import User Tool... command.
  2. Set Import to HTML.
  3. Select Create a new User Tool.
  4. Set Interpret as to General.
  5. Click OK.
  6. When prompted for the HTML file to import, select the attached user-tool-test.html file.
  7. Click Open.
  8. When prompted for the new user tool file name, accept the default value, and click OK.


Expected behavior
The four em dashes in the sample user tool should be rendered as em dashes.


Actual behavior
The first, second, and fourth em dashes in the sample user tool are rendered correctly. However, the third em dash is incorrectly rendered as the three-character sequence ",Äî" (\x2C\xC4\xEE). See the following screenshot:




Strangely, if I copy the incorrectly rendered characters ",Äî" out of Accordance and paste them into another text editor, the pasted text correctly appears as a single em dash ("—"). Note that, because of this behavior, the code points I listed above for the sequence ",Äî" may not be correct because I had to guess at which characters were displayed rather than being able to view them in a text editor. Apologies ahead of time if they're the wrong values.
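One possible explanation for both the garbled display and the clean copy-and-paste, offered as a hypothesis rather than a confirmed diagnosis: the stored text is fine, but the three UTF-8 bytes of the em dash are being displayed as if they were Mac OS Roman text. The Python sketch below shows what that misreading would produce. If this is what is happening, the first displayed character would be a single low-9 quotation mark (U+201A), which is easy to mistake for a comma, rather than U+002C. Nothing here comes from Accordance itself; the choice of Mac OS Roman is an assumption.

```python
# Hypothesis only: UTF-8 bytes displayed as if they were Mac OS Roman.
# This does not use or inspect Accordance in any way.
EM_DASH = "\u2014"

# The em dash as it would be stored in a UTF-8 HTML file.
utf8_bytes = EM_DASH.encode("utf-8")

# The same bytes read back with the wrong (assumed) legacy encoding.
misread = utf8_bytes.decode("mac_roman")

print(utf8_bytes.hex(" "))                      # the raw bytes, in hex
print(misread)                                  # the garbled three-character text
print([f"U+{ord(ch):04X}" for ch in misread])   # code points of the garbled text
```

Comparing the printed code points with the characters in the screenshot would confirm or rule out this guess.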

Edited by Steven S

There were some changes to em dash in 14.0.7 in the release notes. Is this what you're referring to?


1 hour ago, Nathan Parker said:

There were some changes to em dash in 14.0.7 in the release notes. Is this what you're referring to?


I don't believe so. The only thing I saw in the 14.0.7 release notes with respect to dashes is "Copy as Verse Reference will now use a standard dash character in all cases, to prevent encoding issues with En-Dash" (note en dash, not em dash, although I wouldn't be surprised if the devs addressed multiple dash characters in that change). My issue is specifically during user tool import, and, as you can see above, it works correctly in all but one case.


@Nathan Parker, I slightly modified my test case to demonstrate that the issue affects more than just an em dash. In the attached file, user-tool-test-2.html, I repeated the same test as in the OP, but this time with a normal quotation mark (U+0022) and a right double quotation mark (U+201D). When imported, 14.0.7 renders the normal quotation mark correctly in all four cases, but it renders the right double quotation mark incorrectly in the same case as was observed with the em dash. This time, instead of displaying ",Äî" (\x2C\xC4\xEE) in place of the expected character, it displays ",Äù" (\x2C\xC4\xF9).


Once again, Accordance 3.3.4 correctly renders the expected character in all cases.


Based on these new tests, I'm going to guess that the problem is not specific to an em dash, but rather to at least some subset of Unicode characters whose UTF-8 encoding consists of more than one byte.
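A quick way to check the multi-byte guess is to look at how each tested character is encoded. The Python sketch below prints the UTF-8 bytes for the three characters from the two test files, then shows a round trip under the assumption (not confirmed by anyone at Accordance) that the garbling is UTF-8 being read as Mac OS Roman. The helper names `misread_as_mac_roman` and `undo_misread` are made up for this illustration.

```python
# Characters used in the two test files; labels are for illustration only.
samples = {
    "quotation mark": "\u0022",
    "right double quotation mark": "\u201d",
    "em dash": "\u2014",
}

# Show the UTF-8 length and bytes of each character.
for name, ch in samples.items():
    encoded = ch.encode("utf-8")
    print(f"U+{ord(ch):04X} {name}: {len(encoded)} byte(s): {encoded.hex(' ')}")


def misread_as_mac_roman(text: str) -> str:
    """Assumed failure mode: UTF-8 bytes decoded as Mac OS Roman."""
    return text.encode("utf-8").decode("mac_roman")


def undo_misread(garbled: str) -> str:
    """Reverse the assumed failure mode to recover the original text."""
    return garbled.encode("mac_roman").decode("utf-8")


garbled_quote = misread_as_mac_roman("\u201d")
print(garbled_quote)                 # garbled form of the right double quotation mark
print(undo_misread(garbled_quote))   # back to the original character
```

Under this assumption, the ASCII quotation mark survives because its single byte means the same thing in both encodings, while the two failing characters share the leading bytes E2 80, which would account for the identical first two garbled characters in both cases.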



OK, sounds good. I'll test it this week and see if I need to report it as a bug along with the other one.


I can confirm this, so I'll report it.

