
Incorrect Word Wrapping of Chinese Text Display


Wing Kei Wong


To illustrate the problem, let’s take John 13:3 as an example. 

 

The following three screenshots were taken at different line widths. Cases 1 and 3 render correctly, but case 2 does not.

 

For easy reference, John 13:2 is also displayed for word-counting purposes. The 26th, 27th and 28th Chinese words of the verse (counting punctuation) are marked in red, yellow and blue, respectively.

 

For case 1, the line occupies 26 Chinese words. The two verses are correctly aligned.

image.png.434d7a0a22a728f2d8538c9e3a43004c.png

 

For case 2, the line occupies 27 Chinese words. The first line of John 13:3 becomes blank. 

image.png.9cdd111086ed303e041ec5421619aa98.png

 

For case 3, the line occupies 28 Chinese words. Two verses are aligned again.

image.png.4cd46087fc4ba2bdac5819f6ab9306f4.png

 

In case 2, the last word on the first line should be the 27th word (yellow), and the next word (blue, a Chinese comma) would then wrap to the second line. To avoid putting punctuation at the beginning of a line, the program recalculates the layout, but unfortunately it moves the entire verse to the second line, leaving the first line blank.

 

This problem is especially noticeable in paragraph mode. In the following example, there are several blank areas in a paragraph of only 11 verses.

image.png.f06c40ff665b82b0220d6e1d4abbf96b.png

 

The following shows a suggested way to fix Chinese word wrapping. (We use Microsoft Word as an example here. Other word processors that can handle Chinese word wrapping, e.g., Pages or LibreOffice, would do as well.)

image.png.6c871b90caee555cd34fc3920cd67183.png

 

As we can see, the first line is no longer completely blank. The first 26 words still stay on the first line. The 27th word (yellow) is wrapped to the second line, even though there is still room for it on the first line, so that the punctuation does not land at the beginning of the second line.
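The counting in the three cases can be checked with a small Python sketch. It is only an illustration of the suggested rule, not Accordance's or Word's actual code: the name `first_line_length`, the `NO_LINE_START` set and the placeholder text (27 ordinary characters followed by a full-width comma) are assumptions made for this example.

```python
# Punctuation that should not begin a line (a small illustrative subset).
NO_LINE_START = set("，。、；：？！")

def first_line_length(text: str, width: int) -> int:
    """Return how many characters stay on the first line.

    Fill the line up to `width`; if the next line would then start with
    punctuation, move the preceding character down as well.
    """
    end = min(width, len(text))
    while 1 < end < len(text) and text[end] in NO_LINE_START:
        end -= 1
    return end

# Placeholder text: the 28th character is a full-width comma, as in the post.
sample = "字" * 27 + "，" + "字" * 10

for width in (26, 27, 28):
    print(width, first_line_length(sample, width))
```

With this placeholder, widths 26 and 27 both keep 26 characters on the first line, and width 28 keeps all 28, which matches the behaviour described for the Word screenshot.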

 

Note that this handling may occasionally leave a one-word gap at the right end of a line. In other words, the result is left-justified. Accordance already displays English text left-justified, so the same handling is acceptable for Chinese text.

 

This bug was tested on Mac and is also found on iOS devices. The Windows platform has not been tested yet.

 

Here is the platform information:

 

MacBook Air M2

macOS 13.3.1 (a)

Accordance 14.0.5

 

iPhone 13

iOS Version 16.3.1

Accordance 3.4.0


I’ll show this one to a team member.


@Wing Kei Wong @Nathan Parker

 

In each case where there is a blank line, it seems that if it printed normally, there would be a comma at the start of the next line.

Chinese is not alphabetic, so each character represents a whole word, and spaces are not normally required between characters.

But if the word-wrap algorithm is expecting words to be delimited by spaces, perhaps it treats each Chinese character as a 'letter' and the whole clause between punctuation marks as a 'word'. This would explain the weird formatting.
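To make this guess concrete, here is a minimal Python sketch of a wrapper that treats every clause (a run of characters plus its trailing punctuation) as one unbreakable "word". It only models the hypothesis above and is not Accordance's actual algorithm; `split_into_clauses`, `wrap_by_clause` and the placeholder text are made up for illustration.

```python
import re

PUNCT = "，。、；：？！"  # a small illustrative subset of full-width punctuation

def split_into_clauses(text: str) -> list[str]:
    """Split text into runs of characters, each with its trailing punctuation."""
    return re.findall(rf"[^{PUNCT}]+[{PUNCT}]*|[{PUNCT}]+", text)

def wrap_by_clause(text: str, width: int) -> list[str]:
    """Greedy wrap that never breaks inside a clause.

    A clause longer than `width` simply overflows in this sketch.
    """
    lines, current = [], ""
    for clause in split_into_clauses(text):
        # Start a new line if the whole clause does not fit on the current one.
        if current and len(current) + len(clause) > width:
            lines.append(current)
            current = ""
        current += clause
    if current:
        lines.append(current)
    return lines

sample = "甲乙丙，丁戊己庚辛，壬癸。"  # placeholder characters, not Bible text
for line in wrap_by_clause(sample, 6):
    print(line)
```

With a width of 6, the first line holds only the four characters of the first clause, because the second clause is moved down as a whole. A wrapper working this way would leave gaps at line ends whenever a clause does not fit.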

 

@Wing Kei Wong If that's the case, it might help the programmers if you can indicate where they should break a long line. For example, is it acceptable to break the line just before the comma and to start the next line with the comma? If not, is it enough to have one character before the comma?

Edited by Lawrence

Hi @Lawrence,

This is just a thought, but I don't think it is the comma since even if it needed to break at commas, multiple "blocks" of characters could fit on one line. I can attach a screenshot trying to show what I mean.

 

So that is just a thought, and I might be wrong. It also seems logical that it is connected to punctuation somehow, like you said, since the Chinese commas and periods take up more space than in European languages.

Kristin

Bildschirmfoto2023-05-13um00_19_34.png.515ad2bcce93c5010062f61c62ec438d.png


Hi @Kristin,

 

That's true - the line-break seems to bring along several clauses into the new line at times, but the last dangling comma still seems to be the trigger. Visually, the issue is more that every character (word or punctuation) takes up one square, so pulling a word from the last position in one line down to the first position in the next leaves a very visible hole at the end of a line. Two-character phrases are common, as are multi-character names in the Bible. Splitting off the last character of a phrase when there's a nice space right there at the end of the line can be cognitively jarring, particularly with names, which are sometimes typeset with a courtesy space. Hence the request for someone who is familiar with reading typeset Chinese to give the programmers some suggestions about more standard ways (in Chinese typesetting) of doing word-wrap.

 

I don't know why word-wrap is pulling multiple blocks into the new line, but if the dangling comma is the trigger, the programmers now have at least a running start to fixing it.


Thanks @Lawrence, @Kristin.
 
As @Lawrence said, I suspect that the blank-line problem is triggered when punctuation would otherwise wrap to the beginning of a line. The simplest fix is what I suggested in the original post: put the nearest preceding non-punctuation word at the beginning of the next line, and don't leave the first line blank.
 
 
 
 
For those who are interested in Chinese text processing, the following information may help, but it is quite long.
 
1) Every Chinese character has equal width (one square). It is sometimes called a full-block character. Unlike a letter of an alphabet, one Chinese block character is a word by itself. A chain of block characters can combine into a multi-character word or phrase. (As an aside, the tagged Chinese Bible text in Accordance currently attaches the Strong's number only to the last character of the corresponding Chinese word or phrase. We can still make use of this, but cannot enjoy the powerful search features to the full. I look forward to an enhanced tagging technique being applied to the Chinese text someday.)
 
2) Chinese punctuation is normally a full-block character as well. It is also called full-width punctuation and occupies one square, e.g., the full-width comma, full-width period, full-width colon, etc. For Chinese punctuation, see https://en.wikipedia.org/wiki/Chinese_punctuation
 
Half-width punctuation also exists, but we won't go into detail about it here. As far as I know, the Chinese Bible texts in Accordance do not contain half-width punctuation so far. In any case, character width is not the cause of the problem under discussion; a mix of full-width and half-width characters can also be wrapped correctly.
 
3) A Chinese word or phrase is not delimited by a space character or punctuation. A word-wrapping algorithm for Chinese therefore need not treat a space or punctuation as the end of a word or phrase. Besides, Chinese has no concept of keeping a multi-character word or phrase on the same line. (In poetry we do take care of phrases, but we won't consider that here.)
 
4) As such, splitting a multi-character word or phrase is acceptable. Take Abraham as an example. In Chinese, this multi-character word consists of four characters. When it falls near the end of a line, we can break it at any point and put the rest at the beginning of the next line.
 
All the following cases are possible.
image.png.83dda4602455b22e4538d4cb5bae161d.png
 
Thus, we can wrap a long Chinese sentence wherever needed. The only concern is punctuation.
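As a small illustration of point 4, the Python lines below list every place where the four-character name could be split between two lines. This assumes the common traditional-character rendering 亞伯拉罕 for Abraham; the variable names are only for this example.

```python
name = "亞伯拉罕"  # "Abraham" (assumed traditional-character rendering)

# Every way to leave a head at the end of one line and a tail on the next,
# including "whole name stays" and "whole name moves down".
splits = [(name[:i], name[i:]) for i in range(len(name) + 1)]

for head, tail in splits:
    print(f"{head} | {tail}")
```

All five outcomes are acceptable in ordinary prose, since no rule keeps the name together on one line.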
 
5) How to handle punctuation in word wrapping is a matter of Chinese typesetting. Different applications, such as browsers and word processors, handle it in different ways. In general, there are three approaches.
 
First, punctuation is allowed at the beginning of a line. In this case, the program simply fills up the line character by character without extra processing.
 
Second, punctuation is not allowed at the beginning of a line but is instead placed at the end of the previous line, even if it extends past the margin. This is called hanging punctuation.
 
Third, punctuation is not allowed at the beginning of a line, and the nearest preceding non-punctuation character is wrapped to the beginning of the next line. This is the most common approach. I suppose this is also what the Accordance program does. This method is fine; I think it is a good way to handle the issue.
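The three approaches can be compared side by side with a rough Python sketch. It is a simplified model under stated assumptions, not the code of any real application: the function `wrap_cjk`, the mode names `"allow"`, `"hang"` and `"pull"`, the short `NO_LINE_START` set and the placeholder text are all invented for this illustration, and every character is counted as one square.

```python
from typing import Literal

# Punctuation that should not begin a line (illustrative subset only).
NO_LINE_START = set("，。、；：？！")

Mode = Literal["allow", "hang", "pull"]

def wrap_cjk(text: str, width: int, mode: Mode = "pull") -> list[str]:
    """Wrap text character by character, one square per character.

    allow: punctuation may begin a line (no extra processing).
    hang:  punctuation stays on the previous line, past the margin.
    pull:  the character before the punctuation moves down with it.
    """
    lines = []
    start = 0
    while start < len(text):
        end = min(start + width, len(text))
        if mode == "hang":
            # Extend the line so the punctuation hangs beyond the margin.
            while end < len(text) and text[end] in NO_LINE_START:
                end += 1
        elif mode == "pull":
            # Shorten the line, but always keep at least one character on it.
            while end < len(text) and text[end] in NO_LINE_START and end - start > 1:
                end -= 1
        lines.append(text[start:end])
        start = end
    return lines

sample = "甲乙丙丁戊，己庚辛壬癸。"  # placeholder characters, not Bible text
for mode in ("allow", "hang", "pull"):
    print(mode, wrap_cjk(sample, 5, mode))
```

In none of the three modes does a line come out empty, so whichever approach is chosen, the blank first line should not occur.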
 
As I said at the beginning, fixing the blank-line problem is the primary concern. In fact, the line-breaking rules for Chinese are more complicated, and I don't think the Accordance program has to tackle them completely at this moment.
 
For the details of the line-breaking rule, one may find the following link,
Edited by Wing Kei Wong

Hi @Wing Kei Wong,

Thank you for the explanation about Chinese and the example of Abraham's name. I find that really interesting. Hopefully the Accordance team will be able to address the blank area, as I agree that needs to be resolved.

Kristin


@Wing Kei Wong

 

Thanks for the information. Of the 3 possibilities, your first suggestion looks the easiest to code up.

If punctuation is acceptable as the first character on a new line and preferable to weird blank lines, you might want to suggest this to Accordance as the first step to implement immediately for Chinese text, with hanging punctuation etc as a future refinement.


@Lawrence

 

Yes, I agree that the first option is the quick fix. The Logos program also displays Chinese this way.


@Wing Kei Wong Have you also reached out to our Tech Support on this one?


I’d recommend emailing Tech Support on this one. I’m not exactly sure the best way to answer this one.


  • 5 months later...

May I know if there has been any progress since I reported the issue to Tech Support around five months ago? 


What did Tech Support tell you?


They received my email and notified me that they would analyse the case. 


OK, I would recommend following up with them via email for an answer. They are swamped at the moment, so it might be a little while before they can look into it.
