
Problem with Unicode Greek


Donald Cobb


I'm not sure if this should be reported as a bug or a feature request. Also, my description of the situation may be somewhat approximate.

 

A few years back, the decision was made to export Unicode Greek differently. If I'm getting this right, when Accordance previously exported Greek texts with accents, it would output each accented letter as a single Unicode character. Then, sometime around 2020, I think, this was changed, so that the letter is one character and each accent on it is a separate character. I don't remember if it was @Mark Allison or @Joel Brown who worked on this and gave an explanation at one point [edit: it was Joel, here]. Although neither of them works with Acc. anymore, perhaps one or the other could weigh in and give a better explanation than I'm able to.

 

The problem is that, as it is now, depending on the font, the accents often no longer line up correctly. I notice this when I copy and paste from Acc. into this forum but also in word processors, depending on the font. Up to now, this has been a minor nuisance. However, a small theological journal recently printed an article of mine and ended up having to strip most of the accents off the Greek quotes, because of insuperable problems with the accents. So it has gone from being a nuisance to being a real problem.

 

This problem has also been brought up in the past. The following is just one example (there are others):

 

 

Could Accordance please go back to the older system? I remember that a rationale was given for changing this (it was given here). But the problems it produces are really frustrating. At the very least, could an option be created so that those who need to could export the accents as one character with the letters they accent? This would take care of a long-standing frustration. Thank you!

 

 

 

Edited by Donald Cobb

Thanks for this. I’m showing this to @David Lang to get some insight on it. This was before I came to work here, so I’ll see what I can find out.


Yes, this has been an ongoing issue, to the point that for some time now I have been exporting Greek from Accordance without the accents.

 

 


It was definitely the right move to shift from exporting precomposed characters to instead exporting the industry-standard composite characters. Precomposed characters are the dark ages of Unicode.

When you encounter bad rendering, the fault is with either the font or more likely the text engine you are using. And, no surprise that your publisher had problems with it. Many publishers are famously, surprisingly bad at technical aspects of font-rendering (when you'd think they'd be good at it).

Any publisher, for instance, should be fluent in any number of tools that convert composite characters to precomposed characters. There are Perl scripts and other utilities like iconv that handle this, if an environment requires precomposed characters.

 

But, sure, Accordance could add an advanced export option to force export precomposed characters for those using inferior text engines or fonts or publishers.


52 minutes ago, Joe Weaks said:

It was definitely the right move to shift from exporting precomposed characters to instead exporting the industry-standard composite characters. Precomposed characters are the dark ages of Unicode.

When you encounter bad rendering, the fault is with either the font or more likely the text engine you are using. And, no surprise that your publisher had problems with it. Many publishers are famously, surprisingly bad at technical aspects of font-rendering (when you'd think they'd be good at it).

Any publisher, for instance, should be fluent in any number of tools that convert composite characters to precomposed characters. There are Perl scripts and other utilities like iconv that handle this, if an environment requires precomposed characters.

 

But, sure, Accordance could add an advanced export option to force export precomposed characters for those using inferior text engines or fonts or publishers.

 

Publishers are like emperors, and even an emperor whose technology is threadbare is still an emperor.

Bring on the advanced export option even if it is, umm, less advanced! :)

 

@Donald Cobb One possible workaround to investigate is to pass your paper through a converter like Joe said, but do it yourself before sending the paper off.

The linked Stack Overflow page has an answer that uses uconv. It's a Linux tool, but you might be able to install it on Windows 11 under "Ubuntu on Windows" to try it out.

Edited by Lawrence

2 hours ago, Lawrence said:

 

Publishers are like emperors, and even an emperor whose technology is threadbare is still an emperor.

Bring on the advanced export option even if it is, umm, less advanced! :)

 

@Donald Cobb One possible workaround to investigate is to pass your paper through a converter like Joe said, but do it yourself before sending the paper off.

The linked Stack Overflow page has an answer that uses uconv. It's a Linux tool, but you might be able to install it on Windows 11 under "Ubuntu on Windows" to try it out.

 

For the bit about emperors, yes, absolutely. That's the problem. It's delicate to tell a journal: "It's not me or my Greek, it's your font," or so on. Especially when other contributors don't seem to have the same difficulty. And it could certainly be the text engine. But I've experienced this problem with Accordance Unicode Greek in PowerPoint, Mellel and when I copy-paste elements onto this forum. So...

 

I'm on Mac, not Linux or Windows. Are there other options? Would sending it through something like TextEdit or some online site take care of it? The problem is, when there's lots of Greek (which in the case of my article there was), it quickly becomes tedious.


8 hours ago, Nathan Parker said:

Thanks for this. I’m showing this to @David Lang to get some insight on it. This was before I came to work here, so I’ll see what I can find out.

 

Thank you Nathan. As I said, this has cropped up on a number of occasions with multiple posts. Seeing the suddenness with which the conversion came in 2020, I can't imagine it being too difficult to implement as an option.


3 hours ago, Joe Weaks said:

It was definitely the right move to shift from exporting precomposed characters to instead exporting the industry-standard composite characters. Precomposed characters are the dark ages of Unicode.

When you encounter bad rendering, the fault is with either the font or more likely the text engine you are using. And, no surprise that your publisher had problems with it. Many publishers are famously, surprisingly bad at technical aspects of font-rendering (when you'd think they'd be good at it).

Any publisher, for instance, should be fluent in any number of tools that convert composite characters to precomposed characters. There are Perl scripts and other utilities like iconv that handle this, if an environment requires precomposed characters.

 

But, sure, Accordance could add an advanced export option to force export precomposed characters for those using inferior text engines or fonts or publishers.

 

Just as an add-on to what I wrote above. Sure, I suppose you could say that treating the letter and the accent as one and the same character is outdated. Yet, if I'm not mistaken, when using a polytonic Greek keyboard and typing directly into a word processor, etc., the two become inseparable. E.g., erasing one erases the other. I don't doubt that there's stuff going on in all this that I don't understand; I'm not a tech guy. But I also know that Greek characters exported from Acc. don't act in the same way as when I write using polytonic Greek.
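To make the difference concrete, here is a minimal Python sketch (an editorial illustration, not anything Accordance provides) that lists the code points behind one accented letter in each form. It uses only the standard `unicodedata` module; the helper name `describe` is made up for this example.

```python
import unicodedata


def describe(text):
    """Return (code point, Unicode name) pairs for each character in text."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in text]


# Epsilon with a grave accent, written two ways.
precomposed = "\u1f72"       # one code point: letter and accent together
decomposed = "\u03b5\u0300"  # two code points: letter, then combining accent

print(describe(precomposed))
print(describe(decomposed))

# The two spellings are canonically equivalent, so normalizing
# converts one into the other.
print(unicodedata.normalize("NFC", decomposed) == precomposed)
print(unicodedata.normalize("NFD", precomposed) == decomposed)
```

If the exported text is stored as two code points per accented letter, that would be consistent with an editor treating the accent as something that can be deleted or positioned separately from its letter.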


37 minutes ago, Donald Cobb said:

 

For the bit about emperors, yes, absolutely. That's the problem. It's delicate to tell a journal: "It's not me or my Greek, it's your font," or so on. Especially when other contributors don't seem to have the same difficulty. And it could certainly be the text engine. But I've experienced this problem with Accordance Unicode Greek in PowerPoint, Mellel and when I copy-paste elements onto this forum. So...

 

I'm on Mac, not Linux or Windows. Are there other options? Would sending it through something like TextEdit or some online site take care of it? The problem is, when there's lots of Greek (which in the case of my article there was), it quickly becomes tedious.

 

Mac is even closer to Linux since its core is UNIX compliant. Open up a terminal and give it a try.

 

Have a look at this thread: <https://apple.stackexchange.com/questions/201590/uconv-on-mac-os-x-anywhere>.
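If installing uconv turns out to be awkward, a small Python script is another route for plain-text files. The sketch below is only an illustration: it assumes Python 3 is installed (recent macOS versions do not ship it by default), that the file is UTF-8, and the function and file names are invented for the example.

```python
import unicodedata
from pathlib import Path


def to_precomposed(text):
    """Recombine letters and combining accents into precomposed characters (NFC)."""
    return unicodedata.normalize("NFC", text)


def convert_file(src, dst):
    """Read a UTF-8 text file and write an NFC-normalized copy to dst."""
    text = Path(src).read_text(encoding="utf-8")
    Path(dst).write_text(to_precomposed(text), encoding="utf-8")
```

Calling `convert_file("article.txt", "article-nfc.txt")` (hypothetical file names) would leave the original untouched and write the converted copy next to it. This only handles plain text such as .txt or Markdown, not word-processor files.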

Edited by Lawrence

14 hours ago, Joe Weaks said:

It was definitely the right move to shift from exporting precomposed characters to instead exporting the industry-standard composite characters. Precomposed characters are the dark ages of Unicode.

When you encounter bad rendering, the fault is with either the font or more likely the text engine you are using. And, no surprise that your publisher had problems with it. Many publishers are famously, surprisingly bad at technical aspects of font-rendering (when you'd think they'd be good at it).

Any publisher, for instance, should be fluent in any number of tools that convert composite characters to precomposed characters. There are Perl scripts and other utilities like iconv that handle this, if an environment requires precomposed characters.

 

But, sure, Accordance could add an advanced export option to force export precomposed characters for those using inferior text engines or fonts or publishers.

In theory, abandoning precomposed characters is the right thing to do. In practice, most of the world is not yet ready for this. If OakTree were a giant company like Apple, Microsoft, or Google, it could force the rest of the world to get its act in order by just doing the right thing. But OakTree doesn’t have that kind of market power, so when it does the right thing in this case, it just creates a problem for many of its users.

 

Also, converting to precomposed forms with uconv will work for a text file like Markdown, but not with a binary format like .docx. I suppose one could write a script that opens the .docx file as a zip file, reads the XML text, converts it to precomposed form, and writes it back to the zip, but this is way beyond the capability of an average user.
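For anyone curious what such a script might look like, here is a rough Python sketch of the idea described above. It is a sketch under stated assumptions, not a tested production tool: it assumes the XML parts inside the .docx are UTF-8, it only touches files under `word/`, and the function names are invented for this example.

```python
import re
import unicodedata
import zipfile

# Matches the text that sits between XML tags, so tag names and
# attribute values are left untouched.
_TEXT_BETWEEN_TAGS = re.compile(r"(?<=>)[^<]+")


def normalize_xml_text(xml):
    """NFC-normalize only the character data between tags."""
    return _TEXT_BETWEEN_TAGS.sub(
        lambda m: unicodedata.normalize("NFC", m.group()), xml
    )


def normalize_docx(src, dst):
    """Copy a .docx (a zip archive), precomposing text in its word/*.xml parts."""
    with zipfile.ZipFile(src) as zin, \
            zipfile.ZipFile(dst, "w", zipfile.ZIP_DEFLATED) as zout:
        for item in zin.infolist():
            data = zin.read(item.filename)
            if item.filename.startswith("word/") and item.filename.endswith(".xml"):
                data = normalize_xml_text(data.decode("utf-8")).encode("utf-8")
            # Every other part (images, styles, content types) is copied as-is.
            zout.writestr(item, data)
```

One known limitation of this approach: if the word processor has stored a letter and its accent in two separate text runs, they sit in different XML elements and will not be recombined. Always write to a new file and keep the original.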


I spoke with David on this. He said that, at the moment, @Joe Weaks's suggestion is the recommended workaround, and that I can file a feature request for the precomposed characters.


  • 4 weeks later...
On 5/26/2023 at 8:55 PM, Nathan Parker said:

I spoke with David on this. He said that, at the moment, @Joe Weaks's suggestion is the recommended workaround, and that I can file a feature request for the precomposed characters.

 

Hi @Nathan Parker, I've been meaning to get back to this for a while. Some additional details. Word processors handle decomposed characters unevenly. LibreOffice seems to do a good job all around. Mellel, depending on the font, handles this either well (Arial, SBL Greek) or terribly (Times New Roman). I don't have a recent version of MS Word, but the one I do have acts similarly to Mellel.

 

I find that feeding a text through TextEdit and converting it to plain text and then sending it to, e.g., Mellel seems to take care of the problem. I've noticed that, even on this forum, the decomposed characters are not reproduced the way they should be. I'm not sure how this will look on different screens, but on Safari, the accents aren't aligned. Notice especially the grave accents:

 

Τοῦ δὲ  Ἰησοῦ Χριστοῦ ἡ  γένεσις οὕτως ἦν. μνηστευθείσης τῆς μητρὸς αὐτοῦ Μαρίας τῷ Ἰωσήφ, πρὶν ἢ συνελθεῖν αὐτοὺς εὑρέθη ἐν γαστρὶ ἔχουσα ἐκ πνεύματος ἁγίου.

 

Here's the same thing sent through TextEdit:

 

Τοῦ δὲ  ⸂Ἰησοῦ Χριστοῦ⸃ ἡ  ⸀γένεσις οὕτως ἦν. μνηστευθείσης τῆς μητρὸς αὐτοῦ Μαρίας τῷ Ἰωσήφ, πρὶν ἢ συνελθεῖν αὐτοὺς εὑρέθη ἐν γαστρὶ ἔχουσα ἐκ πνεύματος ἁγίου.

 

Thanks for continuing to look into this.

 

 

Edited by Donald Cobb

Donald,

Nice addition to the discussion.
And to confirm, yes, in my rendering in Safari the first example, which uses decomposed (combining) characters, definitely renders worse than the second line, which consists of precomposed characters.
But this actually reinforces the point that the baseline export for generators such as Accordance needs to be decomposed. It is not difficult to get decomposed characters to convert and/or render well. In this instance, all you did was paste into and copy out of TextEdit. But there are many technical use cases where decomposed characters are what's needed. Hence, any generator of content such as Accordance really needs to provide (at least) decomposed characters. Converting decomposed to precomposed characters is not 100% predictable, and has some subjective calls. But trying to get decomposed characters from precomposed characters is even more unpredictable.
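One concrete reason conversions can surprise people, offered here as an editorial illustration rather than as the poster's own example: Unicode normalization is deterministic, but a round trip through decomposed and precomposed forms does not always give back the exact code point you started with. The Python sketch below uses only the standard `unicodedata` module; `round_trip` is a made-up helper name.

```python
import unicodedata


def round_trip(text):
    """Decompose (NFD) and then recompose (NFC) a string."""
    return unicodedata.normalize("NFC", unicodedata.normalize("NFD", text))


alpha_varia = "\u1f70"  # GREEK SMALL LETTER ALPHA WITH VARIA (grave)
alpha_oxia = "\u1f71"   # GREEK SMALL LETTER ALPHA WITH OXIA (acute, polytonic block)
alpha_tonos = "\u03ac"  # GREEK SMALL LETTER ALPHA WITH TONOS (acute, basic Greek block)

print(round_trip(alpha_varia) == alpha_varia)  # True: the grave survives unchanged
print(round_trip(alpha_oxia) == alpha_oxia)    # False: the oxia code point is not restored
print(round_trip(alpha_oxia) == alpha_tonos)   # True: it comes back as the tonos form
```

The oxia and tonos forms are canonically equivalent, so the text is still correct, but software that compares or searches for exact code points may treat the result as different from the input.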


Thanks for the feedback! I'll pass this along.

