X Locale Database Definition

1. General

An X Locale Database contains the subset of a user's environment that depends on language in the X Window System. It is made up of one or more categories, each of which consists of classes and subclasses.

The database is provided as a plain ASCII text file, so a user can change its contents easily. This allows a user to customize the behavior of the internationalized portion of Xlib without changing Xlib itself.

This document describes:

    Database Format Definition
    Contents of Database in the sample implementation

Since it is hard to define the set of required information for all platforms, only a flexible database format is defined. The entries available in a database are implementation dependent.

2. Database Format Definition

The X Locale Database contains one or more category definitions. This section describes the format of each category definition.

A category definition consists of one or more class definitions. Each class definition is either a pair of a class name and a class value, or a class name followed by several subclasses enclosed in a left brace ({) and a right brace (}).

Comments are introduced by the number sign character (#). A number sign at the beginning of a line makes the entire line a comment. A number sign preceded by a whitespace character makes the rest of the line, from the number sign to the end of the line, a comment. A line can be continued by placing a backslash (\) as the last character on the line; this continuation character is discarded from the input. Comment lines cannot be continued onto a subsequent line with an escaped newline character.

The X Locale Database accepts only XPCS, the X Portable Character Set.
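As an illustration of the comment and continuation rules above, the following C function is a minimal sketch, not Xlib's actual parser; its name, strip_comments_and_join, is hypothetical. It assumes the whole file is already in memory with '\n' line endings, treats a continued physical line as part of the same line for the "beginning of a line" test, and assumes that quoted strings do not span lines.

```c
#include <stddef.h>

/*
 * Hypothetical helper (not part of Xlib): applies the comment and
 * line-continuation rules. Assumes `in` is NUL-terminated with '\n'
 * line endings and `out` is at least as large as `in`.
 */
void strip_comments_and_join(const char *in, char *out)
{
    char *start = out;
    int line_start = 1; /* at the first character of a physical line */
    int comment = 0;    /* 0: none, 1: trailing comment, 2: whole line */
    int quoted = 0;     /* inside "..." */
    char prev = '\n';   /* previous character seen outside comments */

    while (*in) {
        char c = *in;

        if (comment) {
            /* Skip to the newline. A backslash inside a comment is not
             * treated as a continuation, so comments never continue. */
            if (c == '\n') {
                if (comment == 1)
                    *out++ = '\n';   /* the line still held a value */
                comment = 0;
                line_start = 1;
                prev = '\n';
            }
            in++;
            continue;
        }

        if (c == '\\' && in[1] == '\n') {
            in += 2;                 /* continuation: drop backslash+newline */
            line_start = 0;
            continue;
        }
        if (c == '\\' && in[1] != '\0') {
            *out++ = c;              /* escape: copy the backslash and the */
            *out++ = in[1];          /* next character uninterpreted       */
            prev = in[1];
            line_start = 0;
            in += 2;
            continue;
        }

        if (c == '#' && !quoted) {
            if (line_start) {        /* '#' first on the line */
                comment = 2;
                in++;
                continue;
            }
            if (prev == ' ' || prev == '\t') {
                /* whitespace then '#': trim trailing blanks, start comment */
                while (out > start && (out[-1] == ' ' || out[-1] == '\t'))
                    out--;
                comment = 1;
                in++;
                continue;
            }
        }

        if (c == '"')
            quoted = !quoted;
        if (c == '\n') {
            quoted = 0;              /* assumed: strings do not span lines */
            line_start = 1;
        } else {
            line_start = 0;
        }
        *out++ = c;
        prev = c;
        in++;
    }
    *out = '\0';
}
```

For example, passing "fs0 {\n\tcharset\tISO8859-1:GL # comment\n}\n" leaves "fs0 {\n\tcharset\tISO8859-1:GL\n}\n" in the output buffer. Escaped characters are copied through unchanged so that a later tokenizer can apply the escape rules.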
The reserved symbols are the quotation mark ("), the number sign (#), the semicolon (;), the backslash (\), the left brace ({), and the right brace (}).

The format of a category definition is:

Elements separated by a vertical bar (|) are alternatives. Curly braces ({...}) indicate zero or more repetitions of the enclosed elements. Square brackets ([...]) indicate that the enclosed element is optional. Quotes ("...") are used around literal characters.

A backslash that is not the first character of a NumericString (such as \x1b) is recognized as an escape character, so that the character following it is treated literally. For example, the two-character sequence \" (a backslash followed by a quotation mark) is recognized and replaced with a quotation mark character. Any unquoted, unescaped whitespace character that is not a Delimiter is ignored.

3. Contents of Database

The available categories and classes depend on the implementation, because different platforms require different sets of information. For example, some platforms have a system locale and others do not. Furthermore, functionality may differ even among platforms that do have a system locale.

In the current sample implementation, the categories listed below are available.

4. XLC_FONTSET Category

The XLC_FONTSET category defines information related to XFontSet. It contains the CHARSET_REGISTRY-CHARSET_ENCODING name and the character mapping side (GL, GR, etc.), and is used by the Output Method (OM).

fsN
    Includes encoding information for the Nth charset, where N is the index number (0, 1, 2, ...). If there are 4 charsets available in the current locale, 4 fontsets, fs0, fs1, fs2, and fs3, should be defined. This class has two subclasses, ‘charset’ and ‘font’.

    charset
        Specifies the encoding information to be used internally in Xlib for this fontset.
        The value is a CHARSET_REGISTRY-CHARSET_ENCODING name followed by a colon and the mapping side, for example:

            ISO8859-1:GL

        For the detailed definition of CHARSET_REGISTRY-CHARSET_ENCODING, refer to the "X Logical Font Descriptions" document.

    font
        Specifies a list of encoding information used to search for an appropriate font for this fontset. The leftmost entry has the highest priority.

5. XLC_XLOCALE Category

The XLC_XLOCALE category defines character classification, conversion, and other character attributes.

encoding_name
    Specifies the codeset name of the current locale.

mb_cur_max
    Specifies the maximum number of bytes in a multibyte character. It corresponds to MB_CUR_MAX of "ISO/IEC 9899:1990 C Language Standard".

state_depend_encoding
    Indicates whether the current locale is state dependent. The value should be "True" or "False".

wc_encoding_mask
    Specifies a bit mask for parsing wide-character strings. Each wide character is ANDed with this bit mask and then classified into a unique charset by using ‘wc_encoding’.

wc_shift_bits
    Specifies the number of bits to shift when converting from a multibyte character to a wide character, and vice versa.

csN
    Includes character set information for the Nth charset, where N is the index number (0, 1, 2, ...). If there are 4 charsets available in the current locale, cs0, cs1, cs2, and cs3 should be defined. This class has five subclasses: ‘side’, ‘length’, ‘mb_encoding’, ‘wc_encoding’, and ‘ct_encoding’.

    side
        Specifies the mapping side of this charset, such as GL or GR. The suffix ":Default" can be appended; it indicates that a character belonging to the specified side is mapped to this charset in the initial state.

    length
        Specifies the number of bytes in a multibyte character of this charset. It should not include the length of any single-shift sequence.

    mb_encoding
        Specifies a list of shift sequences for parsing multibyte strings. Entries are separated by semicolons; each consists of a shift type in angle brackets, such as <LSL> or <SS>, followed by the bytes of the sequence. For example:

            <LSL> \x1b \x28 \x4a; <LSL> \x1b \x28 \x42

    wc_encoding
        Specifies an integer value for parsing wide-character strings.
        It is used to determine the charset of each wide character after the bitwise AND with ‘wc_encoding_mask’. This value should be unique across all csN classes.

    ct_encoding
        Specifies a list of encoding information that can be used for Compound Text.

6. Sample of X Locale Database

The following is a sample X Locale Database file.

    #
    # XLocale Database Sample for ja_JP.euc
    #

    #
    # XLC_FONTSET category
    #
    XLC_FONTSET
    # fs0 class (7 bit ASCII)
    fs0 {
        charset     ISO8859-1:GL
        font        ISO8859-1:GL; JISX0201.1976-0:GL
    }
    # fs1 class (Kanji)
    fs1 {
        charset     JISX0208.1983-0:GL
        font        JISX0208.1983-0:GL
    }
    # fs2 class (Half Kana)
    fs2 {
        charset     JISX0201.1976-0:GR
        font        JISX0201.1976-0:GR
    }
    # fs3 class (User Defined Character)
    # fs3 {
    #   charset     JISX0212.1990-0:GL
    #   font        JISX0212.1990-0:GL
    # }
    END XLC_FONTSET

    #
    # XLC_XLOCALE category
    #
    XLC_XLOCALE

    encoding_name           ja.euc
    mb_cur_max              3
    state_depend_encoding   False

    wc_encoding_mask        \x00008080
    wc_shift_bits           8

    # cs0 class
    cs0 {
        side        GL:Default
        length      1
        wc_encoding \x00000000
        ct_encoding ISO8859-1:GL; JISX0201.1976-0:GL
    }
    # cs1 class
    cs1 {
        side        GR:Default
        length      2
        wc_encoding \x00008080
        ct_encoding JISX0208.1983-0:GL; JISX0208.1983-0:GR;\
                    JISX0208.1983-1:GL; JISX0208.1983-1:GR
    }
    # cs2 class
    cs2 {
        side        GR
        length      1
        mb_encoding <SS> \x8e
        wc_encoding \x00000080
        ct_encoding JISX0201.1976-0:GR
    }
    # cs3 class
    # cs3 {
    #   side        GL
    #   length      2
    #   mb_encoding <SS> \x8f
    # #if HasWChar32
    #   wc_encoding \x20000000
    # #else
    #   wc_encoding \x00008000
    # #endif
    #   ct_encoding JISX0212.1990-0:GL; JISX0212.1990-0:GR
    # }
    END XLC_XLOCALE

7. Reference

[1] ISO/IEC 9899:1990 C Language Standard
[2] X Logical Font Descriptions
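The wc_encoding_mask, wc_shift_bits, and wc_encoding classes described in section 5 can be illustrated with the values of the ja_JP.euc sample in section 6. The C sketch below hard-codes those values; all helper names are hypothetical. The NumericString parser handles only the \x form that appears in the sample, and the multibyte-to-wide conversion in to_wc is an assumption about how wc_shift_bits is applied (the low 7 bits of each character byte, shifted wc_shift_bits apart, then ORed with the charset's wc_encoding), chosen because it is consistent with the sample values rather than taken from Xlib.

```c
#include <ctype.h>
#include <stdlib.h>

/* Values taken from the ja_JP.euc sample database. */
#define WC_ENCODING_MASK 0x00008080UL  /* wc_encoding_mask */
#define WC_SHIFT_BITS    8             /* wc_shift_bits */

/* wc_encoding of cs0, cs1 and cs2 (cs3 is commented out in the sample). */
static const unsigned long wc_encodings[] = {
    0x00000000UL,  /* cs0: ISO8859-1 / JISX0201.1976-0, GL */
    0x00008080UL,  /* cs1: JISX0208.1983-0, GR */
    0x00000080UL,  /* cs2: JISX0201.1976-0 GR, after <SS> \x8e */
};
#define N_CHARSETS (sizeof wc_encodings / sizeof wc_encodings[0])

/* Parse a NumericString written as \x followed by hex digits (the only
 * form shown in the sample). Returns 0 on success, -1 otherwise. */
static int parse_numeric(const char *s, unsigned long *value)
{
    char *end;

    if (s[0] != '\\' || s[1] != 'x' || !isxdigit((unsigned char)s[2]))
        return -1;
    *value = strtoul(s + 2, &end, 16);
    return *end == '\0' ? 0 : -1;
}

/* AND the wide character with wc_encoding_mask and return the N of the
 * csN class whose wc_encoding matches, or -1 if none does. */
static int classify(unsigned long wc)
{
    unsigned long key = wc & WC_ENCODING_MASK;

    for (size_t n = 0; n < N_CHARSETS; n++)
        if (wc_encodings[n] == key)
            return (int)n;
    return -1;
}

/* Assumed multibyte-to-wide conversion: keep the low 7 bits of each of
 * the `length` character bytes (any single-shift byte already removed),
 * shift by wc_shift_bits between bytes, then OR in csN's wc_encoding. */
static unsigned long to_wc(const unsigned char *bytes, int length, int n)
{
    unsigned long wc = 0;

    for (int i = 0; i < length; i++)
        wc = (wc << WC_SHIFT_BITS) | (bytes[i] & 0x7fUL);
    return wc | wc_encodings[n];
}
```

With these values, the EUC bytes 0xB0 0xA1 map to the wide character 0xB0A1; ANDing it with 0x8080 gives 0x8080, so it is classified as cs1. Likewise 0x41 is classified as cs0 and 0xB1 (the byte after the \x8e single shift) as cs2, which shows why each wc_encoding value must be unique across the csN classes.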

Yoshio Horiuchi
IBM Japan

Copyright © IBM Corporation 1994

All Rights Reserved

License to use, copy, modify, and distribute this software and its documentation for any purpose and without fee is hereby granted, provided that the above copyright notice appear in all copies and that both that copyright notice and this permission notice appear in supporting documentation, and that the name of IBM not be used in advertising or publicity pertaining to distribution of the software without specific, written prior permission.

IBM DISCLAIMS ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY, FITNESS, AND NONINFRINGEMENT OF THIRD PARTY RIGHTS, IN NO EVENT SHALL IBM BE LIABLE FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.

Copyright © 1994 X Consortium

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the ‘‘Software’’), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED ‘‘AS IS’’, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE X CONSORTIUM BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

Except as contained in this notice, the name of the X Consortium shall not be used in advertising or otherwise to promote the sale, use or other dealings in this Software without prior written authorization from the X Consortium.

X Window System is a trademark of X Consortium, Inc.
