UCS-2 data and bidirectional languages

Tags:
RPG Language Support
Unicode
I have an application whose data is stored in UCS-2 fields. Moving the data from a UCS-2 field to an alpha field works fine on our system. On another system, a line in English comes out fine, but the next line, in Hebrew, behaves as if a MOVE were performed instead of a MOVEL, no matter how I code it (MOVEL or %char), and the data ends up in the wrong positions of the field. Before we used UCS-2 data, the process worked fine. The program is compiled with CCSID(*CHAR:*JOBRUN). The system settings are CHRID 941, Code Page 424, CCSID 65535. The systems are V6R1M0.

Can anyone explain what is happening and how I fix it?  I need the code to be universal.

Answer Wiki


Hebrew, as you know, is a right-to-left language and, when used in combination with left-to-right languages such as English, provides bidirectional formatting of text.

Unicode stores text in logical order. That is, the characters are stored left-to-right in the order in which they are typed, even though the end user sees the text right-adjusted and flowing to the left.

CCSID 424 stores text in visual order. That is, the characters are stored right-to-left in the order in which they are typed and seen by the end-user.

When you are “moving” the data from the Unicode variable to the alphanumeric variable, the system converts the Hebrew data from logical order to visual order (due to the CCSID differences). The first character in an ‘English only’ line has a left-to-right orientation, so the visual text starts at the left. The first character in a ‘Hebrew only’ line has a right-to-left orientation, so the visual text starts at the right. A line of text containing both English and Hebrew has its orientation set by the first logical character encountered that is not neutral in terms of orientation. Visually, the Hebrew text should be exactly where you are finding it in the storage associated with the alphanumeric variable. This is obviously “different” from what you expected, though it is “correct”.
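The effect described above can be sketched in Python. This is a toy model, not the conversion the system actually performs. It assumes a fixed-width field and a line written in a single direction, and the function names `strong_directions` and `to_visual_field` are hypothetical. The Hebrew word is written with `\u` escapes so that the source itself is not reordered by an editor.

```python
import unicodedata

RTL_CLASSES = {"R", "AL"}  # Unicode bidi classes for strong right-to-left characters


def strong_directions(text):
    """Collect the strong directions ('L' or 'R') that occur in text."""
    found = set()
    for ch in text:
        bidi = unicodedata.bidirectional(ch)
        if bidi in RTL_CLASSES:
            found.add("R")
        elif bidi == "L":
            found.add("L")
    return found


def to_visual_field(logical, width):
    """Toy logical-to-visual conversion into a fixed-width field.

    Only single-direction lines are handled; mixed lines need a full
    bidirectional algorithm, which this sketch does not attempt.
    """
    dirs = strong_directions(logical)
    if dirs == {"L", "R"}:
        raise ValueError("mixed-direction text needs a full bidi algorithm")
    if dirs == {"R"}:
        # Right-to-left line: the first logical character lands in the last position.
        return logical[::-1].rjust(width)
    # Left-to-right (or neutral-only) line: the text starts in the first position.
    return logical.ljust(width)


english = to_visual_field("ABC", 8)
hebrew = to_visual_field("\u05e9\u05dc\u05d5\u05dd", 8)
print(repr(english))
print(repr(hebrew))
```

In this model the English value starts in the first position of the field, while the Hebrew value is preceded by blanks and ends in the last position, which matches the "MOVE instead of MOVEL" symptom in the question.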

A question for you: how is the Hebrew data, in alphanumeric form, being used by your application? Logical ordering is generally the preferred approach for any type of processing (as opposed to visual ordering, which is a pain for sorting, etc.), in which case you may want to delay the conversion to visual form. Normally you would convert to a visual CCSID such as 424 only for display or print purposes, in which case the user should see what they expect (right-to-left Hebrew text).

Depending on what you are trying to do, other EBCDIC CCSIDs are available that will provide a left-adjusted result, but they will also leave the Hebrew data in logical order (that is, not suitable for direct display or print). These other CCSIDs are 62211, 62235, and 62245.
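As a rough analogy only: Python's built-in `cp424` codec is a plain character map for the same Hebrew EBCDIC code page and does no reordering, so encoding with it leaves the data left-adjusted and in logical order. The sketch below relies on that assumption. It is not the IBM i conversion, and the CCSID numbers above are not Python codec names.

```python
# Logical-order Hebrew word, written with escapes to avoid editor reordering.
logical = "\u05e9\u05dc\u05d5\u05dd"

# Left-adjust in an 8-character field, as a MOVEL-style assignment would.
field = logical.ljust(8)

# Map each character to its code page 424 byte; no bidi reordering happens here.
ebcdic = field.encode("cp424")

# Decoding returns the same logical-order text, still left-adjusted.
round_trip = ebcdic.decode("cp424")

print(ebcdic.hex())
print(round_trip == field)
```

The trailing bytes are EBCDIC blanks (0x40) and the Hebrew letters occupy the first four positions, so a consumer that expects data to begin in position 1 finds it there. The bytes would still have to be reordered before direct display or print.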

To be able to help, more information is really needed on what the problem is.

Discuss This Question: 2  Replies

 
  • anncyb
    The data is moved to an alphanumeric field so that any user-specified editing (padding, numeric editing, etc.) can be performed, provided it does not contain double-byte data. This generic 'editing field' is long enough to support multiple data field sizes, hence the problem. Editing is performed, and the data is then passed, as UCS-2 data and along with its length, to another program that processes it into a print file/stream. With Hebrew data, the print driver program now receives a stream containing extra blanks and cuts data off, because it expects the data to begin in the first position of the stream passed, when it may now start anywhere, depending on the original length of the UCS-2 field. To support any language, would my best option be to fill the generic editing field using %subst? Or are there other options, like %char, that I am missing?
  • bvining
    Given your description of the processing, I would suggest changing your job CCSID from 424 to a logical CCSID such as 62211. CCSID 62211 uses the same code page as CCSID 424 (code page 424) and will not format the buffer the way visual CCSID 424 does.

    I suspect that trying to use CCSID 424 with edit functions that appear not to be bidi aware will just lead to more grief in the future, even if you do use %subst to position the data where the edit function expects it to be. Edit functions that are not bidi aware would (as an example) probably insert edit characters by pushing subsequent characters "to the right". This could truncate significant characters, because the "push" should be to the left (where the "trailing" blanks are with a visual CCSID).

    I would also suggest that converting the data from UCS-2 to EBCDIC for editing purposes, and then back to UCS-2 for presentation purposes (as it appears from your note that the print data stream is UCS-2), is a bit odd. It would be more typical to edit the data while it is encoded as UCS-2. I'm guessing that the edit functions are currently written on the assumption that the data to be edited is encoded as EBCDIC (and are probably making all kinds of assumptions about code point assignments).

    As an aside, "universal" code generally has localization routines that are specific to the language being processed. Generic edit functions that are not aware of the language being processed are doomed to make "mistakes" because they assume there are universal constants in how data should be edited: there aren't. But I believe CCSID 62211 may get you a bit further down the path.

    Good luck,
    Bruce Vining
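The truncation risk mentioned in this reply can be illustrated with a small Python sketch. It is a toy model under stated assumptions: an 8-character field, a hypothetical edit routine `insert_and_truncate` that is not bidi aware, and a single edit character inserted at the front of the field.

```python
def insert_and_truncate(field, pos, text):
    """Insert text at pos, pushing the rest to the right, then cut back to
    the original field width (a hypothetical non-bidi-aware edit routine)."""
    width = len(field)
    return (field[:pos] + text + field[pos:])[:width]


# Hebrew word in logical order, written with escapes to avoid editor reordering.
WORD = "\u05e9\u05dc\u05d5\u05dd"

logical_field = WORD.ljust(8)        # logical order: data first, blanks trail on the right
visual_field = WORD[::-1].rjust(8)   # visual order: blanks first, data ends the field

edited_logical = insert_and_truncate(logical_field, 0, "*")
edited_visual = insert_and_truncate(visual_field, 0, "*")

# Which letters of the original word were pushed out of each field?
lost_from_logical = [ch for ch in WORD if ch not in edited_logical]
lost_from_visual = [ch for ch in WORD if ch not in edited_visual]

print(lost_from_logical)
print(lost_from_visual)
```

With the logical-order field the push consumes a trailing blank and no letter is lost. With the visual-order field the push drops the rightmost character, which is the first letter of the word in logical order.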
