Vikas Burman
Chief Technology Officer
Despite the spread of English through economic globalization, not all users of software speak English. Even though English is widely used as a second language throughout the world, not all speakers can use it efficiently in their work, and not everyone prefers to use English for their daily tasks; this is particularly true at the end-user level. In other words, national language identity is very much alive everywhere, for very practical reasons.
As Wikipedia says, Globalization is the process by which businesses or other organizations develop international influence or start operating on an international scale. In software, it means making applications work seamlessly, regardless of the user's language and culture.
Globalization starts with designing the application so that it can be used in non-English locales; this is called Internationalization. Features such as date and currency format choices, dynamic resizing of user interface elements, and the ability to input, view, and display data in different character sets, such as ideographic (Double-Byte Character Set) and simple text (Single-Byte Character Set), are all part of the concept of Internationalization. Complex bi-directional text languages such as Arabic and Hebrew must also be seamlessly supported. The ability to meet legally enforced regional requirements must be factored in, or else the product cannot be sold in the target country. Internationalization, therefore, is the process of producing an application design and code base that is free of any dependency on the language- and culture-specific attributes of the environment in which the application is used.
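To make the idea of code that is free of locale-specific assumptions a little more concrete, here is a minimal sketch in Python. The `CONVENTIONS` table, its three locale tags, and the function names are assumptions made for illustration only; a real application would more likely read such conventions from a locale database than hard-code them.

```python
from datetime import date

# Hypothetical per-locale conventions, hard-coded only for illustration.
CONVENTIONS = {
    "en-US": {"date": "%m/%d/%Y", "decimal": ".", "group": ","},
    "en-GB": {"date": "%d/%m/%Y", "decimal": ".", "group": ","},
    "de-DE": {"date": "%d.%m.%Y", "decimal": ",", "group": "."},
}


def format_date(value: date, locale_tag: str) -> str:
    """Render a date using the pattern configured for the locale."""
    return value.strftime(CONVENTIONS[locale_tag]["date"])


def format_number(value: float, locale_tag: str) -> str:
    """Render a number with the locale's grouping and decimal separators."""
    conv = CONVENTIONS[locale_tag]
    # Format with Python's fixed "," and "." first, then swap both
    # separators in a single pass so they do not overwrite each other.
    text = f"{value:,.2f}"
    return text.translate(str.maketrans({",": conv["group"], ".": conv["decimal"]}))


if __name__ == "__main__":
    for tag in CONVENTIONS:
        print(tag, format_date(date(2024, 3, 5), tag), format_number(1234567.891, tag))
```

The point of the sketch is that the calling code never mentions a separator or a date order; those live in data that can be extended per locale without touching the logic.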
However, an internationalized application is not usable in any region of the world unless it is localized for that specific region. It must speak the local language in every sense of the word. With a solid foundation of internationalization in place, it is relatively easy to localize the application into the desired language. Localization is the process of adapting an internationalized application to a specific language, script, culture, and coded character set environment. In localization, the semantics are preserved while the syntax may change. Localization goes beyond text translation and must also cater to local conventions. For instance, one can select Arabic as the language, but also Egypt as the specific locale of Arabic. A locale allows for locale-specific variations in formats, currency, spell-checking, punctuation, etc., all within the same language area.
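The relationship between a language (Arabic) and a specific locale (Egypt) can be sketched as a fall-back chain for resource lookup. The code below is only an illustration: the `RESOURCES` dictionary, its keys and values, and the choice of `en` as the last fall-back are all assumptions, not something prescribed by any particular framework.

```python
# Hypothetical resource bundles keyed by locale tag (illustration only).
RESOURCES = {
    "en": {"greeting": "Hello", "help_title": "Help", "currency_code": "USD"},
    "ar": {"greeting": "مرحبا"},
    "ar-EG": {"currency_code": "EGP"},
}


def fallback_chain(locale_tag: str, default: str = "en") -> list[str]:
    """Return the lookup order, e.g. 'ar-EG' -> ['ar-EG', 'ar', 'en']."""
    parts = locale_tag.split("-")
    chain = ["-".join(parts[:i]) for i in range(len(parts), 0, -1)]
    if default not in chain:
        chain.append(default)
    return chain


def lookup(key: str, locale_tag: str) -> str:
    """Find a resource in the most specific bundle that defines it."""
    for tag in fallback_chain(locale_tag):
        bundle = RESOURCES.get(tag, {})
        if key in bundle:
            return bundle[key]
    raise KeyError(f"{key!r} is not defined for {locale_tag!r} or its fall-backs")


if __name__ == "__main__":
    print(fallback_chain("ar-EG"))
    print(lookup("currency_code", "ar-EG"))  # defined at the locale level
    print(lookup("greeting", "ar-EG"))       # inherited from the language
    print(lookup("help_title", "ar-EG"))     # inherited from the default
```

With this arrangement, the Egypt-specific bundle only needs to hold what actually differs from generic Arabic, and anything not yet translated still resolves to the default.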
Putting Internationalization and Localization together, we get what we call Globalization. By convention, each of these words is abbreviated by counting the number of letters between its first and last letters, so Globalization = Internationalization + Localization is written as G11N = I18N + L10N. For all practical purposes, however, there is more to G11N than I18N and L10N.
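The letter-counting convention is easy to check mechanically. The short Python sketch below (the function name `numeronym` is simply a label chosen here) keeps the first and last letters and replaces everything in between with a count.

```python
def numeronym(word: str) -> str:
    """Abbreviate a word as first letter + count of inner letters + last letter."""
    if len(word) < 3:
        return word.upper()  # nothing between the first and last letter to count
    return f"{word[0]}{len(word) - 2}{word[-1]}".upper()


if __name__ == "__main__":
    for term in ("Globalization", "Internationalization", "Localization"):
        print(term, "->", numeronym(term))
```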
Fig 1: Globalization Components
At a very high level, globalization looks just like this. However, as you go deeper and start implementing things, there are hundreds of small and big decisions and activities to take care of. In this context, here is a list of 127 key considerations that can come in handy for any globalization project. These considerations are organized under the following next-level groups.
Fig 2: Globalization Overview
No. | Consideration | Notes |
1 | Globalization | |
2 | Internationalization | |
3 | Strings | |
4 | Unicode | Process everything as Unicode. |
5 | Resource Files | Loading of multiple resource files should be supported. |
6 | String Comparison | Only strings in the same charset should be compared. |
7 | Ordering and Sorting | Charset-specific ordering and sorting needs to be done. |
8 | Concatenation | Language specific grammar rules are to be applied. |
9 | Numbers | |
10 | Numerals | Generally straightforward, but decimal places, number formats, etc. are some considerations. |
11 | Currency | Currency conversions using rates as of a given date might need to be processed. |
12 | Units | Unit conversion might be needed if the same data is shown in different contexts. |
13 | Date & Time | |
14 | Serialization | Deserialization must recover the correct date and time. |
15 | Time zone Awareness | Ensure the time zone is considered in persistence and arithmetic. |
16 | Arithmetic | Date calculations need careful attention. |
17 | Culture | |
18 | Sensitivity | In some cases, culture aware operations need to be initiated explicitly. |
19 | Insensitivity | Not everything is required to be culture sensitive. |
20 | Input | |
21 | Fonts | The font in use must support all characters of the locale that the application uses. |
22 | Keyboards and IMEs | The correct Input Method Editor (IME) should be loaded. |
23 | Keyboard Shortcuts | Shortcut keys must be locale-centric. |
24 | Output | |
25 | Mirroring | Ensure the correct directional output is configured. |
26 | Media / Resource Path | Path strings on localized versions of the OS may differ. |
27 | Formatting | Locale-specific formatting of information will need some attention. |
28 | Persistence | |
29 | Database | Decide whether to use a single database or multiple databases for the various locales. |
30 | File System | A locale-centric folder structure would be required. |
31 | Cache | Cache operations that are aware of multiple locales can be tricky. |
32 | Configuration Settings | Locale-specific configuration settings need to be stored. |
33 | Localization | |
34 | Resources | |
35 | Text | |
36 | Translation | |
37 | Phrasing | |
38 | Consistency | Consistency of translation across the application needs to be maintained. |
39 | Vocabulary | The right vocabulary is to be used for every locale. |
40 | Gist | Ensure that the crux of the message is maintained during translation. |
41 | Symbols | Not all symbols might be available in all fonts, and some may have a different meaning in another language. |
42 | Formatting | Placeholders need to be in the right place in the translated text, as per the grammar of the language. |
43 | Typography | Not all font styles may be supported by all fonts. |
44 | Culture Sensitivity | Messages may need to change to maintain cultural affinity. |
45 | Transliteration | |
46 | Automated | Incorporate an automated transliteration system, if required. |
47 | Manual | Or provide for manual side-by-side entry. |
48 | Graphics | |
49 | Culture Sensitivity | The same graphics, icons, and colours may not be applicable in all cultures. |
50 | Media | |
51 | Locale Specific Versions | The whole set is to be made available for every supported locale. |
52 | Locale Neutral Version | Something that everyone understands, done in a culture-neutral manner. |
53 | Fonts | |
54 | Culture Aware Default | A different default font should be configured for each culture. |
55 | Layout | |
56 | Length of localized text | English text often takes less space, while other languages may take more. |
57 | Flexible placement | Shorter or longer text due to translation will need the UI to adjust dynamically. |
58 | Font size | Not all languages look good at the same font size. |
59 | Directional flow | Some languages go right to left; adjust the user interface dynamically. |
60 | Content | |
61 | Documentation | |
62 | User Documentation | End users might need instructions in a different language than administrators. |
63 | Online Content | Online links that the application opens need localization too, and they need to be context sensitive so that links in the correct language are opened. |
64 | Help Files | Cross-referencing across help files in multiple languages might also be required for decent fall-backs. |
65 | User Data | Locale-specific user data might need to be stored and processed, which calls for validation rules adapted to the language; e.g., a 50-character limit in English might need to be adjusted to a better number for an Arabic locale. |
66 | More | |
67 | Localizability Review | After localization is done, a thorough review of the algorithms, logic, user interface, validations, etc. needs to be done. |
68 | User Interface | |
69 | Strings | Are the right string resources being loaded? |
70 | Messages | |
71 | Information Stuffing | Do the messages make sense after data (numbers, text) is stuffed into them at run time? |
72 | Concatenation | If more than one message is concatenated for some context, does the meaning come out right? |
73 | System Dependent Nuances | |
74 | Dialog Boxes | Are dialog boxes in the right language being loaded from system libraries? |
75 | Error Messages | Are system error messages coming from the OS right for the locale? |
76 | Paper Sizes | When the print dialog is opened, are all paper sizes that apply to the selected locale displayed? |
77 | Folder/File Path Names | Are the concatenated folder and file names generated by the application correct? |
78 | File Extensions | Custom file extensions, if used, need to be checked against various locales. |
79 | Menus | |
80 | Shortcut Keys | Shortcut keys need to be adjusted as per the translated text. E.g., (F)ile in English would be (D)atei in German, so the shortcut keys differ. |
81 | Embedded Objects | Are embedded objects behaving correctly in various locales? |
82 | Complex Text Nuances | All of the items below have different formats and rules for different languages and cultures. |
83 | Telephone Numbers | |
84 | Addresses | |
85 | Title Conventions | |
86 | Pluralization | |
87 | Punctuation | |
88 | Capitalization | |
89 | Executable Code | |
90 | Culture Sensitive Processing | Is the application performing culture-sensitive operations in the right places, e.g., sorting of user data? |
91 | Culture Neutral Processing | Some operations are not to be processed as culture specific and have to be neutral, e.g., the order in which files from a folder are processed (generally by date and not by name). |
92 | Fall-back | |
93 | Culture | |
94 | Locale / Region | Are the right locale-specific resources being loaded? |
95 | Language | If locale resources are not available, is the application falling back to the correct language resources? |
96 | Font | Is font fall-back working correctly? |
97 | Charset | Is the right charset fall-back happening? |
98 | Media, Resources, etc. | Are the defined fall-back mechanisms working for media and other resource files? |
99 | Accessibility Requirements | Check that the required accessibility requirements are fulfilled at run time. |
100 | Automated Tests | |
101 | Deployment | |
102 | Locale Aware Resources | Is the build script bundling all required locale-specific resources? |
103 | Locale Specific System Dependencies | During installation, are locale-specific system dependencies being installed from the OS? |
104 | Text Handling | |
105 | Text Input | Are text input methods working correctly for various locales? |
106 | Clipboard Operations | Are copy/paste operations working in non-English locales? |
107 | Font Independence | Does the font have any major role to play in the application? Is fall-back defined correctly? |
108 | DBCS Encoding | Is double-byte character set text stored and retrieved correctly? |
109 | Buffer Size | For large text concatenation operations, is buffer overflow occurring in locales that use more characters than English? |
110 | Locale Aware Data Persistence | Is data persistence locale aware, and is the required locale context stored and retrieved correctly? |
111 | Translation Services | |
112 | Real-time | Decide whether a real-time translation service is to be used. |
113 | Offline | Or is offline translation the way to go? |
114 | Packaging | |
115 | Internationalized Code | The codebase has to be packaged separately from localized resources, so that deployments are smaller and only the required locales are installed. |
116 | Localized Resources | All localized items, resources, media, etc. are to be packaged separately. |
117 | Automated Build | An automated build process should package code and resources separately and provide a user interface to install them as per the user's choice. |
118 | Localization Plan | |
119 | Global Defaults | Ensure global defaults for various locales are defined correctly as the last fall-back. |
120 | Localized Defaults | Ensure defaults for various locales are defined as the first selection. |
121 | Localization Order | Define all the locales you need to support and the logical order in which they need to be processed. You may want to process similar locales together. |
122 | Security Considerations | |
123 | Internationalization | |
124 | Memory Buffers | A buffer overflow caused by longer text in some locale may terminate the program at an unpredictable location, leaking sensitive information. |
125 | Localization | |
126 | Malicious String Resources | Malicious or oversized string resources in some locale may cause a buffer overflow that terminates the program at an unpredictable location, leaking sensitive information. |
127 | String Delimiters | String delimiters that are translated incorrectly may cause trouble when the text is processed. |
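Several of the considerations above (concatenation, placeholder placement, information stuffing) come down to the same practice: translate whole messages with named placeholders instead of gluing fragments together. The Python sketch below illustrates this under stated assumptions: the `TEMPLATES` table and the `render` helper are hypothetical, and the Japanese string is an illustrative translation that has not been reviewed by a professional translator.

```python
# Hypothetical message catalogue: one complete template per locale.
TEMPLATES = {
    "en": "Page {page} of {total}",
    # Illustrative translation; note that the total comes first here.
    "ja": "{total}ページ中{page}ページ目",
}


def render(locale_tag: str, **values: object) -> str:
    """Fill the locale's template by placeholder name, not by position."""
    return TEMPLATES[locale_tag].format(**values)


def render_by_concatenation(page: int, total: int) -> str:
    """Anti-pattern: the English word order is baked into the code."""
    return "Page " + str(page) + " of " + str(total)


if __name__ == "__main__":
    print(render("en", page=2, total=5))
    print(render("ja", page=2, total=5))
    print(render_by_concatenation(2, 5))
```

Because the placeholders are named, a translator can move them anywhere in the sentence, which the concatenated version cannot accommodate without a code change.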