Characters
Getting started
Background reading
-
Character encodings for beginners
What is a character encoding, and why should I care?
-
Introducing character sets and encodings
A brief introduction to some of the concepts associated with character sets and encodings and the Web, with pointers to various techniques sections.
-
Character encodings: Essential concepts
Basic introductions to concepts related to character encoding. Includes: Unicode, character sets, coded character sets, character encodings, the document character set, character escapes, XHTML and MIME types, and standards vs. quirks modes.
-
One of the 10 quick tips for internationalization.
-
One of the 10 quick tips for internationalization.
Choosing and applying a character encoding
- Choose UTF-8 for all content.
- If you really can't use a Unicode encoding, use only the legacy encodings listed in the Encoding specification.
- Avoid the following encodings: UTF-16, UTF-32, JIS_C6226-1983, JIS_X0212-1990, HZ-GB-2312, JOHAB (Windows code page 1361), encodings based on ISO-2022 or EBCDIC, CESU-8, UTF-7, BOCU-1, and SCSU.
How to's
-
Choosing & applying a character encoding
Which character encoding should I use for my content, and how do I apply it?
Useful reference links
-
Encoding, 5.2 Names and labels
If you have a good reason for not using UTF-8, then use only the encodings and labels shown in the left column of this table.
Spec links
-
HTML5, 8.2.2.2 Character encodings
Recommendations for support of particular encodings for browsers implementing HTML5.
Background reading
-
Are corporate Web sites using Unicode right now? This article is somewhat outdated, now that Unicode accounts for around 80% of pages on the Web.
-
What is the 'Document Character Set' for XML and HTML, and how does it relate to the encodings I use for my documents?
-
Unicode over 60 percent of the web
A Google blog post by Mark Davis, with a graph showing the growth of Unicode encodings to over 60% of Web pages (around 80% if you include ASCII-only pages).
-
The official registry of character encoding names.
Changing to UTF-8
- Save the data as UTF-8; don't just change the encoding declaration.
- Declare the encoding in your page.
- Ensure that your server does the right thing: if it declares an encoding in the HTTP Content-Type header, that encoding must also be UTF-8.
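Saving as UTF-8 means converting the bytes, not relabeling them. The following is a minimal Python sketch, assuming you already know the legacy encoding of the original file (windows-1252 is used here only as an example); the helper name `transcode_to_utf8` is hypothetical.

```python
def transcode_to_utf8(data: bytes, source_encoding: str) -> bytes:
    """Re-encode legacy bytes as UTF-8 (hypothetical helper).

    Decoding with the file's actual encoding first is what makes the
    conversion safe; changing only the declaration leaves the bytes as they were.
    """
    text = data.decode(source_encoding)  # raises if the bytes don't fit that encoding
    return text.encode("utf-8")


legacy = "café".encode("cp1252")          # b'caf\xe9'
print(transcode_to_utf8(legacy, "cp1252"))  # b'caf\xc3\xa9'

# Relabeling alone fails: the old bytes are not valid UTF-8.
try:
    legacy.decode("utf-8")
except UnicodeDecodeError as err:
    print("not UTF-8:", err.reason)
```

After converting, update the in-page declaration to UTF-8 as well.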
How to's
-
Changing an HTML page encoding to UTF-8
How do I change the encoding of my HTML pages to UTF-8?
-
The byte-order mark (BOM) in HTML
What is the byte-order mark, and what do I need to know about it when creating HTML?
-
Detailed guidelines for the migration of software and data to Unicode.
Background reading
-
What is the 'Document Character Set' for XML and HTML, and how does it relate to the encodings I use for my documents?
Declaring the character encoding for HTML
- Use the HTTP header if it is available.
- Always use an in-document encoding declaration, even if you are also using the HTTP header.
- Ensure that the encoding declaration fits within the first 1024 bytes of the page.
- If you cannot use UTF-8, use the preferred encoding name indicated in the Encoding specification.
- Do not use the charset attribute on a or link elements.
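To check the 1024-byte rule on your own pages, a rough Python sketch can look for a meta encoding declaration in that window. The function name and regular expression are assumptions for illustration; the HTML standard's real prescan algorithm also handles comments and attribute parsing that this shortcut ignores (for example, it would wrongly match a declaration inside a comment).

```python
import re
from typing import Optional

# Simplified pattern: matches <meta charset="..."> and the older
# <meta http-equiv="Content-Type" content="text/html; charset=...">.
META_CHARSET = re.compile(
    rb"""<meta[^>]+charset\s*=\s*["']?([A-Za-z0-9_.:-]+)""",
    re.IGNORECASE,
)


def find_meta_charset(document: bytes) -> Optional[str]:
    """Return the declared label, lowercased, if it lies within the first 1024 bytes."""
    match = META_CHARSET.search(document[:1024])
    return match.group(1).decode("ascii").lower() if match else None


print(find_meta_charset(b'<!DOCTYPE html><html><head><meta charset="UTF-8">'))  # utf-8
```

A result of None on a page that does contain a declaration usually means it sits too far down the head, behind long comments, scripts, or style blocks.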
How to's
-
Declaring character encodings in HTML
How should I declare the encoding of my HTML file? This page contains a quick reference section, followed by more detailed information.
Useful reference links
-
Encoding, 5.2 Names and labels
If you have a good reason for not using UTF-8, then use only the encodings and labels shown in the left column of this table.
Spec links
-
Detailed technical information for browser implementers about how pages are parsed for recognition of the character encoding.
-
Information about preferred MIME names, ASCII compatible encodings, and Unicode characters.
-
4.2.5.5 Specifying the document's character encoding
How to use the meta element to declare the encoding.
Background reading
-
Introduces doctypes, MIME types, and the influence of standards vs. quirks mode on character encoding declarations.
-
Handling character encodings in HTML and CSS
Tutorial-style article that gathers and organizes pointers to articles that, taken together, help you understand the essential aspects of authoring HTML and CSS related to characters and character encodings.
-
The official registry of character encoding names.
-
C.1. Processing Instructions and the XML Declaration
Recommendation to avoid the XML declaration for compatibility with HTML.
-
Recommendations on use of meta encoding declarations for HTML compatibility.
-
Polyglot Markup: HTML-Compatible XHTML Documents
How to specify the encoding of documents that work as both HTML5 and XHTML5.
-
3. Specifying a Document's Character Encoding
How to specify the encoding of documents that work as both HTML5 and XHTML5.
-
2. Processing Instructions and the XML Declaration
The XML declaration is not allowed in documents that work as both HTML5 and XHTML5.
-
Character Model for the World Wide Web, 4.4.1 Mandating a unique character encoding, C034
You should use encoding declarations that are available.
-
HTML: The Markup Language, 4.2. Character encoding declaration
How to declare the character encoding in HTML5.
-
Extensible Markup Language (XML) 1.0, 4.3.3 Character Encoding in Entities
How to declare encodings in XML, with particular reference to the XML declaration.
-
HTML 4.01, 5.2.2 Specifying the character encoding
Character encoding information in the HTML 4.01 spec.
Declaring the character encoding for a CSS style sheet
- If you use UTF-8 as the character encoding for both your style sheets and your HTML pages, and declare that encoding in your HTML, there is no need to declare the encoding of your style sheets.
- If you use @charset, ensure that nothing (except a BOM) comes before it in the style sheet, and use the exact syntax, for example @charset "UTF-8"; (lowercase @charset, a single space, double quotes, and a closing semicolon).
- If you cannot use UTF-8, use the preferred encoding name indicated in the Encoding specification.
- Do not use the charset attribute on a or link elements.
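The "exact syntax" requirement exists because browsers look for a fixed byte sequence rather than parsing CSS. The Python sketch below mimics that byte-level check so you can lint style sheets; the function name is hypothetical, and it ignores the fact that, when a BOM is present, browsers take the encoding from the BOM instead.

```python
import codecs
from typing import Optional


def css_charset(sheet: bytes) -> Optional[str]:
    """Return the @charset label only if the sheet starts with the exact byte pattern."""
    head = sheet[:1024]
    if head.startswith(codecs.BOM_UTF8):
        head = head[len(codecs.BOM_UTF8):]  # a BOM is the only thing allowed before @charset
    prefix = b'@charset "'
    if not head.startswith(prefix):
        return None  # wrong case, single quotes, leading space or comment, etc.
    end = head.find(b'";', len(prefix))
    if end == -1:
        return None
    return head[len(prefix):end].decode("ascii", errors="replace")


print(css_charset(b'@charset "UTF-8";\nbody { color: black; }'))  # UTF-8
print(css_charset(b"@charset 'UTF-8';"))                        # None
```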
How to's
-
CSS character encoding declarations
How do I declare the character encoding of a CSS style sheet?
Useful reference links
-
Encoding, 5.2 Names and labels
If you have a good reason for not using UTF-8, then use only the encodings and labels shown in the left column of this table.
Spec links
-
CSS Syntax Level 3, 3.2. The input byte stream
Character encoding information in the CSS Level 3 spec.
-
Handling character encodings in HTML and CSS
Tutorial-style article that gathers and organizes pointers to articles that, taken together, help you understand the essential aspects of authoring HTML and CSS related to characters and character encodings.
-
The official registry of character encoding names.
-
CSS 2.1, 4.4 CSS style sheet representation
Character encoding information in the CSS 2.1 spec.
Using escapes to represent characters
- Avoid using escapes whenever possible; UTF-8 can represent every character you need.
- Use escapes for invisible or ambiguous characters.
- Use CSS escapes, rather than HTML escapes, for CSS embedded in HTML.
- Always use Unicode code points for the numeric part of a character escape. Do not use code point values from non-Unicode encodings.
- Use a single escape (representing the Unicode code point value) for supplementary characters. Do not escape surrogate pairs.
- Ensure that all href attribute values escape ampersands in query parameters, i.e. use &amp; rather than just &.
- Avoid named character entities in XHTML.
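Several of these rules can be illustrated with a small Python sketch. The two helper names are hypothetical; html.escape is the standard library function. Note that Python strings hold code points, so ord() on an emoji gives the single supplementary code point you should escape, never a UTF-16 surrogate.

```python
import html


def html_escape_codepoint(ch: str) -> str:
    """Hexadecimal numeric character reference using the Unicode code point."""
    return f"&#x{ord(ch):X};"


def css_escape_codepoint(ch: str) -> str:
    """CSS escape; the trailing space terminates the hex digits."""
    return f"\\{ord(ch):X} "


emoji = "\U0001F600"  # one supplementary code point, U+1F600
print(html_escape_codepoint(emoji))         # &#x1F600;  (not &#xD83D;&#xDE00;)
print(css_escape_codepoint("é"))            # \E9 followed by a space
print(html.escape("search?q=1&lang=fr"))    # search?q=1&amp;lang=fr
```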
How to's
-
Using character escapes in markup and CSS
How can I use character escapes in markup and CSS, and when should I use or not use them?
Useful reference links
-
HTML5, 8.5 Named character references
Character reference names that are supported by HTML, and the code points to which they refer.
Spec links
-
HTML5, 8.5 Named character references
Character reference names that are supported by HTML, and the code points to which they refer.
-
Handling character encodings in HTML and CSS
Tutorial-style article that gathers and organizes pointers to articles that, taken together, help you understand the essential aspects of authoring HTML and CSS related to characters and character encodings.
-
-
Numeric character references and character entity references described in the HTML4 spec.
-
24 Character entity references in HTML 4
List of character entity references supported by HTML 4 and XHTML 1.0.
-
C.12. Using Ampersands in Attribute Values (and Elsewhere)
Advice on use of ampersands in href attributes in XHTML.
-
C.16. The Named Character Reference &apos;
Advice not to use &apos; in XHTML.
Checking the encoding of a document
How to's
-
W3C Internationalization Checker
Shows the HTTP header information for a page, and all in-page encoding declarations. Also highlights conflicts.
-
How can I check the character encoding information sent in the HTTP header of a web document?
-
Checking the character encoding using the validator
How can I check that the character encoding of my document is correct using the W3C HTML Validator?
Useful reference links
-
Shows the HTTP header information for a page.
-
Shows the HTTP header information for a page.
-
Shows the HTTP header information for a page.
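If you want to script the HTTP header check, Python's standard library can already extract the charset parameter from a Content-Type value. The sketch below works offline on a header string; the function name is hypothetical.

```python
from email.message import Message
from typing import Optional


def charset_from_content_type(value: str) -> Optional[str]:
    """Extract the charset parameter from a Content-Type header value."""
    msg = Message()
    msg["Content-Type"] = value
    return msg.get_content_charset()  # lowercased, or None if there is no charset


print(charset_from_content_type("text/html; charset=UTF-8"))  # utf-8
print(charset_from_content_type("text/html"))                 # None
```

For a live page, `urllib.request.urlopen(url).headers.get_content_charset()` gives the same information, because the response headers object is a subclass of `email.message.Message`. Compare the result with the in-page declaration to spot conflicts.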
Handling the byte-order mark (BOM)
- If you use the byte-order mark with UTF-8-encoded pages, check that any scripts and back-end processes can handle the BOM.
- If you ignored the advice above and encoded your page as UTF-16, always ensure that it starts with a BOM.
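A back-end process that reads HTML files can detect and strip a leading BOM before doing anything else with the bytes. This Python sketch recognizes the three BOMs used by the Encoding specification's BOM sniffing and also flags a stray U+FEFF later in the text, which typically comes from concatenated files or includes. The function names are hypothetical.

```python
import codecs
from typing import Optional, Tuple

# BOMs recognized by BOM sniffing in the Encoding specification.
BOMS = (
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_BE, "utf-16be"),
    (codecs.BOM_UTF16_LE, "utf-16le"),
)


def split_bom(data: bytes) -> Tuple[Optional[str], bytes]:
    """Return (encoding implied by a leading BOM, bytes without the BOM)."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name, data[len(bom):]
    return None, data


def has_stray_bom(text: str) -> bool:
    """True if U+FEFF appears anywhere after the first character."""
    return "\ufeff" in text[1:]


print(split_bom(b"\xef\xbb\xbf<p>hi</p>"))  # ('utf-8', b'<p>hi</p>')
```

For UTF-8 input only, Python's built-in "utf-8-sig" codec decodes and drops a leading BOM in one step.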
How to's
-
The byte-order mark (BOM) in HTML
What is the byte-order mark, and what do I need to know about it when creating HTML?
Useful reference links
-
W3C Internationalization Checker
Tells you whether your page starts with a BOM, and whether there is a BOM later in the content.
Spec links
-
HTML5, 8.2.2 The input byte stream
How HTML5 detects the character encoding of a page, including how browsers should handle BOM detection.
-
CSS Syntax Level 3, 3.2. The input byte stream
Character encoding information in CSS. Mentions how browsers should handle the BOM.
-
Handling character encodings in HTML and CSS
Tutorial-style article that gathers and organizes pointers to articles that, taken together, help you understand the essential aspects of authoring HTML and CSS related to characters and character encodings.
-
CSS 2.1, 4.4 CSS style sheet representation
Character encoding information in the CSS 2.1 spec. Mentions how browsers should handle the BOM.
Handling character normalization
- Ensure that all HTML class names and CSS selectors are saved using the same Unicode normalization form (NFC is recommended).
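Two strings that look identical can differ in code points, so a selector saved in one form will not match a class name saved in another. The following Python sketch shows the mismatch and a simple way to list names that are not already NFC; the helper names are hypothetical, and unicodedata.normalize is the standard library call.

```python
import unicodedata


def nfc(s: str) -> str:
    return unicodedata.normalize("NFC", s)


def non_nfc(names):
    """Return the names (class names, ids, selectors) that are not already in NFC."""
    return [name for name in names if nfc(name) != name]


composed = "caf\u00e9"     # é as a single code point
decomposed = "cafe\u0301"  # e followed by COMBINING ACUTE ACCENT
print(composed == decomposed)            # False: a selector would not match
print(nfc(composed) == nfc(decomposed))  # True
print(non_nfc([composed, decomposed]))
```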
How to's
-
What are normalization forms, and why do I need to know about them when creating HTML and CSS content?
Useful reference links
-
W3C Internationalization Checker
Tells you whether your HTML page contains non-NFC class names and IDs.
Handling encoding issues in forms
- Use UTF-8 for the character encoding of your page.
- Consider checking on the server that form data is arriving in UTF-8.
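A server-side check can be as simple as decoding the submitted data strictly as UTF-8 and rejecting anything that fails. This is a minimal Python sketch for application/x-www-form-urlencoded bodies; the function name is hypothetical, while parse_qs and its encoding and errors parameters are standard library features.

```python
from typing import Dict, List, Optional
from urllib.parse import parse_qs


def parse_utf8_form(body: str) -> Optional[Dict[str, List[str]]]:
    """Decode a urlencoded form body, returning None if it is not valid UTF-8."""
    try:
        return parse_qs(body, encoding="utf-8", errors="strict")
    except UnicodeDecodeError:
        return None  # e.g. a client that submitted windows-1252 bytes


print(parse_utf8_form("name=Jos%C3%A9"))  # {'name': ['José']}
print(parse_utf8_form("name=Jos%E9"))     # None
```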
How to's
-
What is the best way to deal with encoding issues in forms that may use multiple languages and scripts?
Using Unicode control codes
- Don't use Unicode control characters if there is markup to do the same job.
- Use character escapes to represent control codes, so that they are visible.
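To make invisible controls visible in source files, you can replace them with numeric character references before saving. This Python sketch escapes every character in Unicode category Cf (format characters such as U+200F RIGHT-TO-LEFT MARK); the function name is hypothetical. Because Cf also contains characters you may want to keep literal, such as U+200D ZERO WIDTH JOINER in emoji sequences, a real tool would probably use an explicit list instead.

```python
import unicodedata


def make_format_chars_visible(text: str) -> str:
    """Replace invisible format characters (category Cf) with hex NCRs."""
    return "".join(
        f"&#x{ord(ch):X};" if unicodedata.category(ch) == "Cf" else ch
        for ch in text
    )


print(make_format_chars_visible("(C)\u200f"))  # (C)&#x200F;
```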
How to's
-
There is a range of control-like Unicode characters, some of which fulfill the same role as markup. Which should I use, and which should I avoid?
-
Unicode in XML & Other Markup Languages
Guidelines on the use of the Unicode Standard in conjunction with markup languages such as XML.
-
Unicode controls vs. markup for bidi support
To correctly format bidi text in (X)HTML or XML content, should I use Unicode control codes or markup?
-
Using Unicode controls for bidi text
If I'm unable to use markup to correctly order bidirectional text, what can I do?
-
HTML, XHTML, XML and Control Codes
How do I handle control codes (i.e. the 'C0' range U+0000-U+001F, DEL U+007F, and the 'C1' range U+0080-U+009F) in XML, XHTML and HTML?
Working around unavailable characters/glyphs
How to's
-
What to do if a Unicode character or font glyph is missing.
Using non-ASCII web addresses
Useful reference links
-
Internationalized country code top-level domain
Wikipedia article. Contains news about recent developments.
-
Wikipedia page.
-
mod_fileiri: new Apache module under development
Martin Dürst's fileiri Apache module.
Spec links
-
RFC 3987 Internationalized Resource Identifiers (IRIs)
IETF Proposed Standard for handling of IRIs.
-
Unicode Technical Report #36 Unicode Security Considerations
Describes security issues related to phishing.
Background reading
-
An Introduction to Multilingual Web Addresses
How IDN and IRIs work, aimed at content authors and general users who want to understand the basics without too many gory technical details.
Other links
-
Lists of working IDNs, with links to the sites.
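Under the hood, a non-ASCII domain name is converted to an ASCII "xn--" form, and non-ASCII path characters are percent-encoded as UTF-8. The Python sketch below shows both conversions; the function names are hypothetical. Python's built-in "idna" codec follows the older IDNA 2003 rules, so treat it as an illustration rather than a substitute for an IDNA 2008 implementation.

```python
from urllib.parse import quote


def host_to_ascii(host: str) -> str:
    """Convert an internationalized domain name to its ASCII (Punycode) form."""
    return host.encode("idna").decode("ascii")


def iri_path_to_uri(path: str) -> str:
    """Percent-encode a non-ASCII path as UTF-8, keeping '/' separators."""
    return quote(path, safe="/")


print(host_to_ascii("bücher.example"))  # xn--bcher-kva.example
print(iri_path_to_uri("/bücher/"))      # /b%C3%BCcher/
```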
Language
Getting started
Background reading
-
W3C Getting Started article.
-
W3C tutorial.
-
How to choose the right attribute values. W3C article.
-
One of the 10 quick tips for internationalization.
-
Why use the language attribute?
Why should I use the language attribute in web pages?
Declaring the overall language of a page
- Always declare the default language for text in the page using attributes on the html tag.
- Do NOT use the meta element with the http-equiv attribute set to Content-Language.
- Use language attributes rather than HTTP to declare the default language for 'text processing' (i.e. when language needs to be known for things such as font choice, styling, spell-checking, hyphenation, quote mark styling, etc.).
- Do not declare the default language of a document in the body element; use the html element.
- Where a document contains content aimed at speakers of more than one language, decide whether you want to declare one language in the html tag, or leave the languages undefined until later.
- Where a document contains content aimed at speakers of more than one language, try to divide the document linguistically at the highest possible level, and declare the appropriate language for each of those divisions.
- For HTML use the lang attribute only; for XHTML 1.0 served as text/html use both the lang and xml:lang attributes; and for XHTML served as XML use the xml:lang attribute only.
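A quick lint for the first and fourth rules can be built on Python's standard HTML parser. This sketch, with hypothetical class and function names, records the lang attribute on the html and body start tags and reports problems; it deliberately ignores xml:lang, so adapt it if you serve XHTML as XML.

```python
from html.parser import HTMLParser
from typing import List, Optional


class RootLanguageCheck(HTMLParser):
    """Record lang attributes found on the html and body start tags."""

    def __init__(self) -> None:
        super().__init__()
        self.html_lang: Optional[str] = None
        self.body_lang: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "html":
            self.html_lang = attributes.get("lang")
        elif tag == "body":
            self.body_lang = attributes.get("lang")


def default_language_warnings(document: str) -> List[str]:
    checker = RootLanguageCheck()
    checker.feed(document)
    checker.close()
    warnings = []
    if not checker.html_lang:
        warnings.append("declare the default language with lang on <html>")
    if checker.body_lang is not None:
        warnings.append("lang on <body>: declare the default language on <html> instead")
    return warnings


print(default_language_warnings('<html><body lang="fr"><p>Bonjour</p></body></html>'))
```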
How to's
-
How should I set the language of the content in my HTML page?
Background reading
-
Why use the language attribute?
Why should I use the language attribute in web pages? A number of useful reasons.
-
HTTP headers, meta elements and language information
For HTML, should we put language declarations in HTTP headers and meta elements, and how are they different from those in language attributes?
Spec links
-
3.2.3.3 The lang and xml:lang attributes
The language attributes in HTML5.
-
How HTML5 deals with a meta element with http-equiv set to Content-Language.
-
HTML4.01, 8.1 Specifying the language of content: the lang attribute
lang attribute definition in HTML 4.01
-
XML 1.0, 2.12 Language Identification
xml:lang attribute definition in XML.
-
WCAG, Guideline 4. Clarify natural language usage
Recommendation to express natural language in a document in Web Content Accessibility Guidelines.
-
WCAG, 2.2 Identifying the primary language
Recommendation to use the lang attribute on the html tag in Web Content Accessibility Techniques for HTML.
-
XHTML 1.1, 3. The XHTML 1.1 Document Type
The 2nd edition introduced the lang attribute to go with the xml:lang attribute.
-
XHTML 1.0, C.7 The lang and xml:lang Attributes
xml:lang and lang attribute definitions in XHTML 1.0.
-
HTTP 1.1, 14.12 Content-Language
Content-Language definition in HTTP 1.1.
-
Polyglot markup, 7.2 Language Attributes
Using lang and xml:lang in HTML5 polyglot documents.
-
Polyglot markup, 6.5.1.1 Content-Language
Content-Language and HTML5 polyglot documents.
Identifying in-document language changes
- When the page contains content in another language, add a language attribute to an element surrounding that content.
- For HTML use the lang attribute only; for XHTML 1.0 served as text/html use both the lang and xml:lang attributes; and for XHTML served as XML use the xml:lang attribute only.
- If the text in attribute values and element content is in different languages, consider using a nested approach: declare the attribute text's language on the element itself, and wrap the content in an inner element with its own language attribute.
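Language attributes are inherited: text takes the language of its nearest ancestor that declares one. The Python sketch below, using hypothetical names, tracks that inheritance with a stack while parsing, which is roughly what a spell-checker or translation tool must do. It assumes well-formed markup with explicit end tags and ignores xml:lang.

```python
from html.parser import HTMLParser
from typing import List, Tuple

VOID_ELEMENTS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
                 "link", "meta", "source", "track", "wbr"}


class LanguageRuns(HTMLParser):
    """Pair each text run with the language it inherits from its ancestors."""

    def __init__(self) -> None:
        super().__init__()
        self.stack: List[str] = [""]  # "" means the language is unknown
        self.runs: List[Tuple[str, str]] = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_ELEMENTS:
            return
        lang = dict(attrs).get("lang")
        # An element without lang inherits the language of its parent.
        self.stack.append(self.stack[-1] if lang is None else lang)

    def handle_endtag(self, tag):
        if tag not in VOID_ELEMENTS and len(self.stack) > 1:
            self.stack.pop()

    def handle_data(self, data):
        if data.strip():
            self.runs.append((data, self.stack[-1]))


parser = LanguageRuns()
parser.feed('<html lang="en"><body><p>The French word '
            '<i lang="fr">bonjour</i> means hello.</p></body></html>')
parser.close()
print(parser.runs)
# [('The French word ', 'en'), ('bonjour', 'fr'), (' means hello.', 'en')]
```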
How to's
-
How should I set the language of the content in my HTML page?
Background reading
-
Why use the language attribute?
Why should I use the language attribute in web pages? A number of useful reasons.
Spec links
-
HTML5, 3.2.3.3 The lang and xml:lang attributes
The language attributes in HTML5.
-
HTML4.01, 8.1 Specifying the language of content: the lang attribute
lang attribute definition in HTML 4.01
-
XHTML 1.0, C.7. The lang and xml:lang Attributes
Use both lang and xml:lang.
-
XHTML 1.1, 3. The XHTML 1.1 Document Type
The 2nd edition introduced the lang attribute to go with the xml:lang attribute.
-
XML 1.0, 2.12 Language Identification
xml:lang attribute definition in XML.
-
WCAG, Guideline 4. Clarify natural language usage
Recommendation to express natural language in a document in Web Content Accessibility Guidelines.
-
WCAG Techniques, 2.1 Identifying changes in language
Recommendation to use lang attribute when language changes in a document, in Web Content Accessibility Techniques for HTML.
-
Polyglot markup, 7.2 Language Attributes
Using lang and xml:lang in HTML5 polyglot documents.
Choosing language tags
- Use subtags as defined by BCP 47 for language attribute values.
- Use the shortest possible language tag values.
- Where possible, use the codes zh-Hans and zh-Hant to refer to Simplified and Traditional Chinese, respectively.
- Use the subtag zxx when the text is known not to be in any language.
- When the language is undetermined and you have to label it, use lang="".
- If you are serving XML, and the format you are using supports it, use xml:lang=""; otherwise use xml:lang="und" when the language is undetermined and you have to label it.
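Language tags are case-insensitive, but RFC 5646 (section 2.1.1) recommends conventional casing: lowercase in general, uppercase for two-letter region subtags, and titlecase for four-letter script subtags, except at the start of the tag or after a singleton. This Python sketch, with a hypothetical function name, applies only those casing conventions; it does not check subtags against the IANA Language Subtag Registry.

```python
def conventional_case(tag: str) -> str:
    """Apply the casing conventions of RFC 5646, section 2.1.1."""
    out = []
    after_singleton = False
    for i, subtag in enumerate(tag.split("-")):
        if i == 0 or after_singleton:
            out.append(subtag.lower())
        elif len(subtag) == 2 and subtag.isalpha():
            out.append(subtag.upper())       # region, e.g. TW
        elif len(subtag) == 4 and subtag.isalpha():
            out.append(subtag.capitalize())  # script, e.g. Hant
        else:
            out.append(subtag.lower())
        if len(subtag) == 1:
            after_singleton = True           # extension or private-use subtags follow
    return "-".join(out)


print(conventional_case("ZH-hant-tw"))  # zh-Hant-TW
print(conventional_case("en-ca-X-CA"))  # en-CA-x-ca
```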
How to's
-
Which language tag is right for me? How do I choose language and other subtags? Covers all the subtag types in the latest version of BCP 47.
-
A simple overview of the syntax for language tags in BCP 47.
-
How do I use language markup in HTML or XML content when I don't know the language, or the content is non-linguistic?
-
Two-letter or three-letter language codes
Should I use two-letter or three-letter ISO language codes in language tags? W3C article.
-
Picking the Right Language Identifier
Describes how to select Unicode language identifiers.
Useful reference links
-
This is the official location where you will find all subtags available for use in language tags.
-
User-friendly interface to IANA's language tag registry. Supports checking subtags as well as looking them up. Up to date with the latest version of BCP 47.
-
Points to a document containing both RFC 5646 (Tags for the Identification of Languages) and RFC 4647 (Matching of Language Tags).
-
RFC 5646 Tags for the Identification of Languages
The specification that describes language tag syntax.
-
RFC 4647 Matching of Language Tags
The specification that describes alternative ways of matching language tags.
-
Provides various useful links about language tags and a good place to find up-to-date information.
Spec links
-
HTML5, 3.2.3.3 The lang and xml:lang attributes
The language attributes in HTML5.
-
Specifying the language of content: the lang attribute
lang in the HTML 4.01 spec (section 8.1)
-
xml:lang in the XML spec (section 2.12)
-
ISO 3166: Codes for Country Names
ISO country codes
-
ISO 639: Codes for the Representation of Names of Languages
ISO language codes
-
RFC 4646 Tags for the Identification of Languages
[Of historic interest only] An earlier version of the specification that describes language tag syntax.
-
RFC 3066 Tags for the Identification of Languages
[Of historic interest only] The previous IETF document that used to define how to use language tags to identify languages, now obsolete.
-
Understanding the New Language Tags
[Of historic interest only] Overview of planned improvements for RFC3066bis by one of its authors.
Declaring metadata about the language(s) of the intended audience
- Consider using a Content-Language HTTP header to declare metadata about the language(s) of the intended audience of a document.
- Where a document contains content aimed at speakers of more than one language, use the HTTP Content-Language header with a comma-separated list of language tags.
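As a sketch of how a server might send audience metadata for a bilingual page, here is a minimal Python WSGI application; the function names and page content are hypothetical. Note that the lang attributes in the markup still declare the language of the text itself; the header only describes the intended audience.

```python
from typing import Iterable, List, Tuple


def audience_headers(audience_langs: Iterable[str]) -> List[Tuple[str, str]]:
    """Response headers for a page aimed at readers of one or more languages."""
    return [
        ("Content-Type", "text/html; charset=utf-8"),
        # Comma-separated list of language tags describing the intended audience.
        ("Content-Language", ", ".join(audience_langs)),
    ]


def app(environ, start_response):
    """Minimal WSGI app serving a German/French page (hypothetical example)."""
    body = ('<!DOCTYPE html><html lang="de"><head><title>Hallo</title></head>'
            '<body><p>Hallo!</p><p lang="fr">Bonjour !</p></body></html>')
    start_response("200 OK", audience_headers(["de", "fr"]))
    return [body.encode("utf-8")]
```

To try it locally: `from wsgiref.simple_server import make_server; make_server("", 8000, app).serve_forever()`.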
How to's
-
HTTP headers, meta elements and language information
For HTML, should we put language declarations in HTTP headers and meta elements, and how are they different from those in language attributes?
-
How should I set the language of the content in my HTML page? Includes:
-
Specifying metadata about the audience language
Talks about using HTTP headers to provide metadata.
Spec links
-
HTML5, 4.2.5.3 Pragma directives
How HTML5 deals with a meta element with http-equiv set to Content-Language.
-
HTTP 1.1, 14.12 Content-Language
The Content-Language HTTP header described in the HTTP 1.1 specification.
-
HTML 4.01, 8.1 Specifying the language of content: the lang attribute
Content-Language in the HTML specification: only says that the lang attribute takes precedence over it.
Indicating the language of a link destination
- When pointing to a resource in another language, consider the pros and cons before indicating the language of the target document.
- If you want to indicate that the target document of an a element is in another language, consider the pros and cons before using hreflang with CSS.
- Do not use flag icons to indicate languages.
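If you do decide to indicate the target language, one flag-free approach is to combine hreflang with a visible text label written in the target language. The Python sketch below generates such markup; the AUTONYMS table and function name are hypothetical, and a real site would need its own list of language names.

```python
import html

# Hypothetical lookup table: language names written in the language itself.
AUTONYMS = {"de": "Deutsch", "fr": "français", "ja": "日本語"}


def language_link(href: str, text: str, hreflang: str) -> str:
    """Link to a page in another language, labeled with text rather than a flag."""
    name = AUTONYMS.get(hreflang, hreflang)
    return (
        f'<a href="{html.escape(href)}" hreflang="{html.escape(hreflang)}">'
        f"{html.escape(text)}</a> "
        f'<span lang="{html.escape(hreflang)}">({html.escape(name)})</span>'
    )


print(language_link("/de/guide.html", "Product guide", "de"))
# <a href="/de/guide.html" hreflang="de">Product guide</a> <span lang="de">(Deutsch)</span>
```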
How to's
-
Indicating the language of a link destination
What should I bear in mind if I want to indicate to the reader that a link points to a page in a different language?
-
Why country flags as symbols for languages are problematic, and what you should do instead.
Spec links
-
HTML5, 4.12.2 Links created by a and area elements
hreflang in the HTML5 spec
-
CSS 2.1, 12.1 The :before and :after pseudo-elements
:before and :after in the CSS 2.1 spec
-
hreflang in the HTML 4.01 spec.
Setting & changing browser language preferences
How to's
-
Setting language preferences in a browser
How do I check or change the language settings of my browser?
-
Tells you what your current Accept-Language header is set to. (See the bottom of the information table.)
Useful reference links
-
Apache documentation on content negotiation.
-
Debian web site in different languages
How to set language preferences in a variety of legacy versions of browsers.
Using Accept-Language for locale setting
How to's
-
Accept-Language used for locale setting
Is it a good idea to use the HTTP Accept-Language header to determine the locale of the user?
-
Date formats, Option Three: Use the Accept-Language HTTP header
How do I prepare my web pages to display varying international date formats?
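Whatever you decide about locales, it helps to know what the header actually contains: a list of language ranges with optional q-values expressing preference. This Python sketch, with a hypothetical function name, ranks the ranges by q-value. It tells you only which languages a user prefers to read, not their locale, which is why the article above advises caution.

```python
from typing import List


def ranked_languages(header: str) -> List[str]:
    """Order the language ranges in an Accept-Language header by q-value."""
    ranked = []
    for position, item in enumerate(header.split(",")):
        item = item.strip()
        if not item:
            continue
        tag, _, params = item.partition(";")
        quality = 1.0
        params = params.strip()
        if params.startswith("q="):
            try:
                quality = float(params[2:])
            except ValueError:
                quality = 0.0
        if quality > 0:  # q=0 means "not acceptable"
            ranked.append((-quality, position, tag.strip()))
    return [tag for _, _, tag in sorted(ranked)]  # ties keep header order


print(ranked_languages("fr-CH, fr;q=0.9, en;q=0.8, de;q=0.7, *;q=0.5"))
# ['fr-CH', 'fr', 'en', 'de', '*']
```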
Markup & text
Getting started
How to's
-
Quick tips: Presentation vs. content
One of the 10 quick tips for internationalization.
-
One of the 10 quick tips for internationalization.
Working with composite strings and string re-use
- Use a topic-comment approach whenever possible.
- Avoid sentence-like arrangements when they contain substrings that are predefined translatable text or numeric text.
- Use sentence-like arrangements with care if you have non-numeric and non-translatable text substrings (i.e. text created at runtime).
- Where the parts of a composite message appear in separate locations, provide the translator with contextual information to show how the various parts of a composite message relate to each other.
- Provide information to the translator, where needed, to clarify what a substring represents.
- When requested by the localization group, be prepared to provide information about the size of each substring.
- Reuse strings only where the text is always used in exactly the same context, or where the string is a self-contained, independent sentence or phrase.
- Reused strings must not refer to more than one text, graphic or conceptual context.
- If in doubt as to whether a string is a good candidate for reuse, don't reuse it.
- If reused strings will be displayed in fixed-size display areas of varying sizes, ensure that the translation fits in the smallest one.
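When a sentence-like arrangement is unavoidable, keep the whole sentence in one translatable string and use named placeholders, so each translation can place the substrings where its grammar requires. In this Python sketch the MESSAGES catalog and render function are hypothetical, and plural forms (for example count=1) are not handled; they need separate treatment.

```python
# Hypothetical message catalog. Named placeholders let each translation put
# the substrings wherever its grammar needs them.
MESSAGES = {
    "en": "Deleted {count} files from the folder {folder}.",
    "de": "{count} Dateien wurden aus dem Ordner {folder} gelöscht.",
}


def render(locale: str, **values) -> str:
    return MESSAGES[locale].format(**values)


print(render("en", count=3, folder="Downloads"))
print(render("de", count=3, folder="Downloads"))
# 3 Dateien wurden aus dem Ordner Downloads gelöscht.
```

Building the same message by concatenating "Deleted ", a number, and " files from " would fix the English word order for every language.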
How to's
-
Working with Composite Messages
Why you need to be very careful about splitting up and reusing text on-screen. The linguistic differences between languages can lead to real headaches for localizers and may in some cases make a reasonable translation impossible to achieve.
-
Re-using Strings in Scripted Content
Things to be aware of if you plan to use the same text string in different places on your site or user interface.
Useful reference links
-
<Insert Title Here> (or, Variables in Interface Language)
Article by Chris Noessel illustrating a number of examples where composite messages can cause problems.
Using ruby markup
How to's
-
Discusses how to use ruby markup in HTML5, and has pointers to what currently works in browsers.
Useful reference links
-
Typography index: Ruby annotation
Links to requirements for ruby in the typography index.
Background reading
-
What are 'ruby' annotations?
-
A summary of how bopomofo is used and the implications for support on the Web.
-
Use Cases & Exploratory Approaches for Ruby Markup
Discussion about what is needed in the HTML5 specification, and possibly other markup vocabularies, to adequately support ruby markup. It looks at a number of use cases and how well they are supported by the various markup models.
-
CJKV Information Processing
Useful information about ruby in general (Ken Lunde's book, CJKV Information Processing, ISBN 1-56592-224-7, especially chapters 6 and 7)
Spec links
-
The ruby element in the HTML5 spec
-
The rt element in the HTML5 spec
-
rp in the HTML5 spec
-
Proposed extensions to the HTML5 markup model.
-
Looks at various possible models for marking up ruby, with a view to informing discussion about the models found in the HTML5 draft of 25 October 2012 and the proposed Ruby Extension Spec as of 25 February 2013.
-
Ruby Annotation Recommendation
W3C Recommendation that defines markup for ruby, in the form of an XHTML module. The HTML5 markup model should eventually replace this specification.
-
W3C Working Draft that defines how ruby elements can be styled in various different ways. This draft is likely to change significantly as it is reworked to support the HTML5 markup model. (For more about styling ruby see Styling ruby text).
-
XHTML 1.1, 3. The XHTML 1.1 Document Type
Ruby Annotation in the XHTML 1.1 spec (bottom of the page)
-
Sample module implementations of the Ruby Annotation Specification in several schemas (W3C Personal Note)
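The basic HTML ruby model pairs each base text with an rt annotation, with optional rp parentheses that only browsers without ruby support display. The Python sketch below generates that markup from (base, annotation) pairs; the function name is hypothetical, and the Japanese readings are simply an example.

```python
import html
from typing import Iterable, Tuple


def ruby_markup(pairs: Iterable[Tuple[str, str]]) -> str:
    """Build HTML ruby markup from (base text, annotation) pairs."""
    parts = ["<ruby>"]
    for base, annotation in pairs:
        parts.append(
            f"{html.escape(base)}<rp>(</rp><rt>{html.escape(annotation)}</rt><rp>)</rp>"
        )
    parts.append("</ruby>")
    return "".join(parts)


print(ruby_markup([("漢", "かん"), ("字", "じ")]))
# <ruby>漢<rp>(</rp><rt>かん</rt><rp>)</rp>字<rp>(</rp><rt>じ</rt><rp>)</rp></ruby>
```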
Using b and i tags
- Use the class attribute on a b or i element to identify why the element is being used.
- Consider whether other elements might be more applicable than the b or i element because they carry the right semantics.
How to's
-
Should I use <b> and <i> elements, and if so, what do I need to know?
Spec links
-
The i element in the HTML5 spec
-
The b element in the HTML5 spec
Working with form controls
How to's
-
As part of a form, I have a list of terms in a drop-down box. Why are they not correctly sorted when I translate the items in the list?
Indicating what should and should not be translated
- Use the translate attribute on an element to prevent its content from being translated by online translation services or by computer-assisted translation tools.
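The translate attribute is inherited: translate="no" switches translation off for an element and its descendants, translate="yes" (or an empty value) switches it back on, and other values inherit. The Python sketch below, with hypothetical names, shows how a translation tool might collect only the translatable text. It assumes well-formed markup with explicit end tags.

```python
from html.parser import HTMLParser
from typing import List

VOID_ELEMENTS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
                 "link", "meta", "source", "track", "wbr"}


class TranslatableText(HTMLParser):
    """Collect text that a translation tool should translate."""

    def __init__(self) -> None:
        super().__init__()
        self.stack: List[bool] = [True]  # the document root is translatable
        self.segments: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_ELEMENTS:
            return
        attributes = dict(attrs)
        state = self.stack[-1]  # default: inherit from the parent
        if "translate" in attributes:
            value = (attributes["translate"] or "").strip().lower()
            if value in ("", "yes"):
                state = True
            elif value == "no":
                state = False
        self.stack.append(state)

    def handle_endtag(self, tag):
        if tag not in VOID_ELEMENTS and len(self.stack) > 1:
            self.stack.pop()

    def handle_data(self, data):
        if self.stack[-1] and data.strip():
            self.segments.append(data)


def translatable_segments(document: str) -> List[str]:
    parser = TranslatableText()
    parser.feed(document)
    parser.close()
    return parser.segments


print(translatable_segments(
    '<p>Type <code translate="no">git commit</code> to save your work.</p>'))
# ['Type ', ' to save your work.']
```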
How to's
-
Using HTML's translate attribute
What is the translate attribute for, and how should I use it?
Spec links
-
HTML5, 3.2.3.4 The translate attribute
The translate attribute in the HTML5 spec
-
HTML5 adds new translate attribute
Blog post describing the translate attribute, how it works, and why it is needed.
Text direction
Getting started
How to's
-
Unicode Bidirectional Algorithm basics
A gentle introduction to how the bidi algorithm works.
-
Creating HTML Pages in Arabic, Hebrew and Other Right-to-left Scripts
A tutorial that gathers and organizes pointers to articles that, taken together, help you understand the essential aspects of working with right-to-left scripts and bidirectional text when authoring HTML and CSS.
-
Quick tips: Right-to-left text
One of the top 10 quick tips for internationalization is about right-to-left text.
Setting up a right-to-left page
- Only use bidi markup to set the base direction for the document as a whole, or where you need to change the base direction.
- Add dir="rtl" to the html tag any time the overall document direction is right-to-left.
- Don't add dir="rtl" to the body tag.
- If you need to avoid the scroll bar moving on some browsers, put dir on the head element and a div just inside the body element.
- Use logical order, not visual order, for Hebrew, and choose an appropriate encoding.
- If you have to use an ISO encoding for a Hebrew page, declare the encoding as ISO-8859-8-i rather than ISO-8859-8.
- Do not use CSS styling to control directionality in HTML. Use markup.
How to's
-
Text direction and structural markup in HTML
How to use the dir attribute and handle alignment. Includes:
-
Setting direction at the document level
Using dir on the html tag to set the default direction of the document.
-
Working with browsers that change the browser chrome
Workarounds if you don't want the browser to change the UI when dir is set on the html tag.
-
Visual vs. logical ordering of text
What is the difference between visual and logical ordering of text, and which should I use?
Setting direction on block elements
- Add the dir attribute to a block element to change its base direction.
- Do not use CSS styling to control directionality in HTML. Use markup.
- Only use bidi markup to set the base direction for the document as a whole, or where you need to change the base direction.
How to's
-
Text direction and structural markup in HTML
How to use the dir attribute and handle alignment. Includes:
-
Setting direction on block elements
How to use the dir attribute and handle alignment.
-
Particular advice for working with tables.
-
Handling content whose direction is not known in advance
Besides form-related information, how to insert text into a page with the right base direction, using HTML5 features.
-
Displaying bidi text in the textarea and pre elements
How dir=auto affects elements with multiple paragraphs of plain text.
-
-
Unicode controls vs. markup for bidi support
To correctly format bidi text in (X)HTML or XML content, should I use Unicode control codes or markup?
-
CSS vs. markup for bidi support
Should I use CSS or markup to correctly format Unicode-based bidirectional (bidi) text in HTML and XML-based markup languages?
Managing text direction in form controls
- Add dir="auto" to input tags to automatically align text to the correct side of an input field.
- Add dir="auto" to textarea and pre tags to make each paragraph align to the left or right according to its initial strong character.
- Consider using the dirname attribute to pass information to the server about the direction of text in a text or search form control.
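The dir="auto" behaviour described above, where the first strong character decides the direction, can be approximated as follows. This is a sketch under a loud assumption: letters from the Hebrew, Arabic, Syriac, Thaana and N'Ko scripts are treated as strong right-to-left and every other letter as strong left-to-right, whereas browsers use the full Unicode Bidi_Class data. The function names are hypothetical.

```typescript
// ASSUMPTION: a short list of RTL scripts stands in for the real Bidi_Class data.
const RTL_SCRIPT =
  /[\p{Script=Hebrew}\p{Script=Arabic}\p{Script=Syriac}\p{Script=Thaana}\p{Script=Nko}]/u;
const LETTER = /\p{L}/u;

// Returns the direction of the first strong (letter) character, or null if none.
function firstStrongDirection(text: string): "ltr" | "rtl" | null {
  for (const ch of text) {
    if (!LETTER.test(ch)) continue; // digits, punctuation and spaces are not strong
    return RTL_SCRIPT.test(ch) ? "rtl" : "ltr";
  }
  return null; // no strong character: the browser falls back to a default direction
}

// Rough model of dir="auto" on textarea/pre: each line is treated as its own paragraph.
function paragraphDirections(text: string): Array<"ltr" | "rtl" | null> {
  return text.split("\n").map(firstStrongDirection);
}

console.log(paragraphDirections("Hello\nשלום עולם\n123 مرحبا"));
```

Splitting on line breaks only roughly models how browsers treat paragraphs in textarea and pre.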
How to's
-
Text direction and structural markup in HTML
How should I use the dir attribute to set text direction on structural elements in HTML? Includes:
-
Correcting display of opposite-direction text in the input element
HTML5 techniques for getting the cursor and text to the right side of the input element.
-
Displaying bidi text in the textarea and pre elements
Using dir=auto in HTML5 to assign direction to each paragraph independently.
-
Reporting direction to the server
Using HTML5's dirname attribute to pass direction information to the server.
-
Setting direction on forms explicitly
Keystrokes that make browsers set the direction of form entry fields.
-
-
Using Unicode controls for bidi text
If I'm unable to use markup to correctly order bidirectional text, what can I do?
Mixing text direction inline
- Tightly wrap every opposite-direction phrase in markup that sets its base direction.
- If you know the phrase's direction, wrap it in an element with a dir attribute. If you don't already have an element around the text, use span or bdi.
- If you don't know the phrase's direction, i.e. unknown text that will be injected at run time, then either wrap the phrase in bdi (no dir attribute needed), or, if the phrase is already tightly wrapped by an element, just add dir="auto" to that element.
- To bulletproof the code for Edge or legacy browsers: if the tightly wrapped phrase is followed inline (possibly after some intervening neutral characters) by a number, or is one of a list of separate phrases with the same direction, add a directional mark (RLM or LRM) immediately after the markup of that phrase.
- Only use Unicode control characters for bidirectional control in attribute text or element text that allows no internal markup.
- Consider using Unicode control characters to set the base direction around bidirectional text that will be displayed as tooltips, page titles, or in JavaScript dialog boxes.
- Do not leave white space at the end of inline elements that mark a directional boundary.
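A minimal sketch of the advice for text injected at run time: the phrase is escaped, trimmed so that no white space sits at the directional boundary, and wrapped tightly in bdi. The optional mark after the markup is our assumption of how to apply the Edge/legacy workaround: it matches the direction of the surrounding paragraph. isolatePhrase and escapeText are hypothetical names.

```typescript
const LRM = "\u200E"; // LEFT-TO-RIGHT MARK
const RLM = "\u200F"; // RIGHT-TO-LEFT MARK

function escapeText(s: string): string {
  return s.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

// Wraps the phrase tightly in bdi (no dir attribute needed). If a number or another
// same-direction phrase follows, pass the surrounding paragraph's direction
// (an assumption) to append a matching mark after the closing tag.
function isolatePhrase(phrase: string, contextDir?: "ltr" | "rtl"): string {
  const mark = contextDir === "rtl" ? RLM : contextDir === "ltr" ? LRM : "";
  return `<bdi>${escapeText(phrase.trim())}</bdi>${mark}`;
}

// e.g. a user name of unknown direction followed by a count, in an English paragraph:
console.log(`<p>${isolatePhrase("مستخدم ", "ltr")} 5 posts</p>`);
```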
How to's
-
Inline markup and bidirectional text in HTML
How to use markup in HTML4 and new HTML5 features for inline bidirectional text. Introduces you gently to the bidi algorithm, if you need that. Includes:
-
Handling inline bidirectional text in HTML
Brief steps for marking up any type of inline bidirectional text. Following sections give worked examples.
-
Use Unicode control characters where markup isn't allowed.
-
-
Why does my browser collapse spaces between Latin and Arabic/Hebrew text?
-
CSS vs. markup for bidi support
Should I use CSS or markup to correctly format Unicode-based bidirectional (bidi) text in HTML and XML-based markup languages?
-
Unicode controls vs. markup for bidi support
To correctly format bidi text in (X)HTML or XML content, should I use Unicode control codes or markup?
-
Using Unicode controls for bidi text
If I'm unable to use markup to correctly order bidirectional text, what can I do?
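Where markup is not possible (attribute values, page titles, dialog messages), the phrase can be bracketed with Unicode control characters instead. This sketch uses the directional isolates added in Unicode 6.3 (LRI, RLI and FSI, each closed by PDI); older advice used the embedding controls (LRE or RLE, closed by PDF), and how well a given title bar, tooltip or dialog honours isolates depends on the platform. isolateForPlainText is a hypothetical name.

```typescript
const LRI = "\u2066"; // LEFT-TO-RIGHT ISOLATE
const RLI = "\u2067"; // RIGHT-TO-LEFT ISOLATE
const FSI = "\u2068"; // FIRST STRONG ISOLATE: direction comes from the text itself
const PDI = "\u2069"; // POP DIRECTIONAL ISOLATE: closes any of the three above

// For strings that cannot contain markup, such as a title attribute,
// document.title, or the message passed to alert().
function isolateForPlainText(text: string, dir: "ltr" | "rtl" | "auto"): string {
  const open = dir === "ltr" ? LRI : dir === "rtl" ? RLI : FSI;
  return open + text + PDI;
}

console.log(JSON.stringify(isolateForPlainText("HTML و CSS", "rtl")));
```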
Handling parentheses and other mirrored characters
- Treat mirrored characters as if the word 'left' in their Unicode names meant 'opening', and 'right' meant 'closing'.
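The characters this applies to are those with the Unicode Bidi_Mirrored property. For example, U+0028 is named LEFT PARENTHESIS but acts as the opening parenthesis in either direction, so in right-to-left text it is drawn with a mirrored glyph that looks like ")". A quick check using JavaScript's Unicode property escapes (the function name is hypothetical):

```typescript
// True if the single character has the Unicode Bidi_Mirrored property.
function isBidiMirrored(ch: string): boolean {
  return /^\p{Bidi_Mirrored}$/u.test(ch);
}

for (const ch of ["(", ")", "[", "<", "a", "-"]) {
  console.log(JSON.stringify(ch), isBidiMirrored(ch));
}
```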
How to's
-
Inline markup and bidirectional text in HTML
How to use markup in HTML4 and new HTML5 features for inline bidirectional text. Introduces you gently to the bidi algorithm, if you need that. Includes:
-
Understanding how parentheses and other mirroring characters work in bidirectional text.
-
Overriding the Unicode bidirectional algorithm
- Use the bdo element to force the directionality of a sequence of inline characters.
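A small sketch of generating bdo markup from script. HTML requires an explicit dir value of ltr or rtl on bdo (auto is not allowed), so the helper accepts only those two; forceDirection and escapeText are hypothetical names.

```typescript
function escapeText(s: string): string {
  return s.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

// Overrides the bidi algorithm for a run of inline text: characters are laid out
// strictly in the given direction, whatever their own directional properties.
function forceDirection(text: string, dir: "ltr" | "rtl"): string {
  return `<bdo dir="${dir}">${escapeText(text)}</bdo>`;
}

console.log(forceDirection("abc", "rtl"));
```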
How to's
-
Inline markup and bidirectional text in HTML
How to use markup in HTML4 and new HTML5 features for inline bidirectional text. Introduces you gently to the bidi algorithm, if you need that. Includes:
-
How to disable the bidi algorithm, when needed.
-
Creating vertical text
How to's
-
Styling vertical Chinese, Japanese, Korean and Mongolian text (Draft)
How to use CSS to create vertical text, and what is currently supported. Includes:
-
Use writing-mode to achieve the basic direction.
-
Changing the glyph orientation for embedded text
How to make non-native text stand upright, rather than flow down the page.
-
Make numbers and short texts run horizontally within the vertical line.
-
Working with forms, lists and tables.
-
Other links
-
Preview of upcoming proposals for CSS3. In W3C article, CSS3 and International Text.
Styling & layout
Preparing for text expansion during translation
- Ensure that your graphic backgrounds can automatically expand with the text they are related to, avoid highly constrained spaces, and anticipate that the box containing your text may grow during translation.
How to's
-
Background images that support localization
How can I ensure that when text expands in translation the background images will still work?
Background reading
-
Overview of text expansion issues.
-
Do I need to worry because display capabilities (screen sizes, number of colors, etc.) of computers vary in other countries?
-
Douglas Bowman's article in A List Apart about how to layer background images, allowing them to slide over each other to create certain effects. (A note from the editors: While brilliant for its time, this article no longer reflects modern best practices.)
Styling by language
- Use :lang to set language-specific styling.
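The language-matching step behind :lang() and [lang|=...] can be sketched as below: a tag matches a range if it equals the range or starts with it followed by a hyphen, compared case-insensitively. This leaves out the important difference that :lang() also matches elements that inherit their language from an ancestor, as well as the wildcard matching added in Selectors Level 4. langMatches is a hypothetical name.

```typescript
function langMatches(tag: string, range: string): boolean {
  const t = tag.toLowerCase();
  const r = range.toLowerCase();
  // Equal, or the range followed by a hyphen: "zh" matches "zh-Hant-TW" but not "zhx".
  return t === r || t.startsWith(r + "-");
}

console.log(langMatches("zh-Hant-TW", "zh"), langMatches("zh", "zh-Hant"));
```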
How to's
-
Styling using the lang attribute
Compares the :lang() pseudo-class with the [lang|="..."] and [lang="..."] attribute selectors, for both HTML and XML. Includes:
-
The :lang() pseudo-class selector
How to use it.
-
Using CSS selectors in XML with xml:lang
Dealing with namespaces in documents served as XML.
-
-
How language tags work and where to find which one to use.
Styling lists
How to's
-
How to define your own counter styles when the predefined styles don't fit your needs.
-
Cut-and-paste templates for a large number of international counter styles that can be used for ordered lists and other such counters.
Useful reference links
-
Allows you to convert numbers written in ASCII digits into other representations that can be used for ordered list counters, headings, etc., using the algorithms described by CSS3 Counter Styles.
-
Typography index: Lists, counters, etc
Links to information about lists and counter-styles in the typography index.
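To show what such converters do, here is a sketch of two of the generic algorithms defined in CSS Counter Styles Level 3: numeric (positional, as used by decimal or arabic-indic) and alphabetic (as used by lower-alpha, where 27 becomes "aa"). It handles only non-negative values for numeric and positive values for alphabetic, and ignores the negative, pad and range descriptors. The function names are hypothetical.

```typescript
// "numeric" system: positional notation, symbols[0] plays the role of zero.
function numericCounter(value: number, symbols: string[]): string {
  if (!Number.isInteger(value) || value < 0) {
    throw new RangeError("numericCounter expects a non-negative integer");
  }
  if (value === 0) return symbols[0];
  const base = symbols.length;
  let result = "";
  while (value !== 0) {
    result = symbols[value % base] + result;
    value = Math.floor(value / base);
  }
  return result;
}

// "alphabetic" system: no zero symbol, so the sequence runs a..z, aa, ab, ...
function alphabeticCounter(value: number, symbols: string[]): string {
  if (!Number.isInteger(value) || value < 1) {
    throw new RangeError("alphabeticCounter expects a positive integer");
  }
  const base = symbols.length;
  let result = "";
  while (value !== 0) {
    value -= 1;
    result = symbols[value % base] + result;
    value = Math.floor(value / base);
  }
  return result;
}

// Symbol sets equivalent to the predefined arabic-indic and lower-alpha styles.
const arabicIndic = Array.from({ length: 10 }, (_, i) => String.fromCharCode(0x0660 + i));
const lowerAlpha = Array.from({ length: 26 }, (_, i) => String.fromCharCode(0x61 + i));

console.log(numericCounter(2024, arabicIndic), alphabeticCounter(28, lowerAlpha));
```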
-
CSS3 and International Text: Lists
Preview of upcoming proposals for CSS3 written in 2003.
Managing line breaks
How to's
-
Specifies whether or not the browser should insert line breaks wherever the text would otherwise overflow its content box.
-
CSS property used to specify how (or if) to break lines when working with punctuation and symbols. Only affects text in Chinese, Japanese, or Korean (CJK).
-
Specifies how words should be hyphenated when text wraps across multiple lines. Also includes a table of supported languages in browsers.
Useful reference links
-
Typography index: Line-breaking
Links to information about line-breaking in the typography index.
-
Links to information about hyphenation in the typography index.
Background reading
-
High-level summary of various typographic strategies for wrapping text at the end of a line, for a variety of scripts.
-
CSS3 and international text: Line breaking
Preview of upcoming proposals for CSS3 written in 2003.
Justifying and aligning text
How to's
-
Specifies the horizontal alignment of the inline-level content inside a block container or table-cell box, including the value justify, which is used to turn on justification.
-
Defines what type of justification should be applied to text when it is justified (i.e. when text-align: justify is set). Values include inter-word and inter-character.
Useful reference links
-
Typography index: Justification & line-end alignment
Links to information about justification in the typography index.
Background reading
-
High-level summary of various typographic strategies for fully justifying text on a line and in a paragraph for a variety of scripts, and some advice for authors and implementers.
-
CSS3 and international text: Line breaking
Preview of upcoming proposals for CSS3 written in 2003.
-
CSS3 and international text: Text spacing
Preview of upcoming proposals for CSS3 written in 2003.
Styling ruby text
How to's
-
Discusses how to use CSS styling to affect the rendering of ruby content.
-
Defines the distribution of the different ruby elements over the base.
Useful reference links
-
Typography index: Ruby annotation
Links to requirements for ruby in the typography index.
Background reading
-
What is 'ruby'?
-
CJKV Information Processing
Useful information about ruby in general (Ken Lunde's book, CJKV Information Processing, ISBN 1-56592-224-7, especially chapters 6 and 7)
Tests
-
Includes tests for ruby-position, ruby-align, ruby-merge, and ruby autohide.
-
Introduction to styling ruby with CSS3 Ruby Module. In W3C article, Ruby Markup and Styling.
Applying various script-specific typographic conventions
Other links
-
Preview of upcoming proposals for CSS3. In W3C article, CSS3 and International Text.
-
Preview of upcoming proposals for CSS3. In W3C article, CSS3 and International Text.
-
Preview of upcoming proposals for CSS3. In W3C article, CSS3 and International Text.
Using fonts & webfonts
How to's
-
How to Use Cross Browser Web Fonts
Useful tutorial on how to use webfonts, and some things to look out for.
-
Fonts supplied with Windows and Mac OS X, by script
Lists of fonts provided by the Windows 7/8 and Mac OS X Snow Leopard/Lion operating systems, grouped by non-Latin script. Useful for setting font-family styles in CSS.
Working with date formats
How to's
-
How do I prepare my web pages to display varying international date formats?
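One way to see why this matters: the same date formatted with the built-in Intl.DateTimeFormat API comes out as 1/5/2024 for en-US and 05/01/2024 for en-GB. The exact strings come from the runtime's locale data, so treat the output for other locales as illustrative; formatFor is a hypothetical name.

```typescript
// The same instant, 5 January 2024, formatted with each locale's default date pattern.
const sample = new Date(Date.UTC(2024, 0, 5));

function formatFor(locale: string, date: Date = sample): string {
  // timeZone is fixed to UTC so the result does not depend on the machine's zone.
  return new Intl.DateTimeFormat(locale, { timeZone: "UTC" }).format(date);
}

for (const locale of ["en-US", "en-GB", "de-DE", "ja-JP"]) {
  console.log(locale, formatFor(locale));
}

// An ISO 8601 date (YYYY-MM-DD) is one locale-neutral alternative for storage.
console.log(sample.toISOString().slice(0, 10)); // 2024-01-05
```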
Working with personal names
- Ask yourself whether you really need to have separate fields for given name and family name.
- Make input fields long enough to enter long names, and ensure that if the name is displayed on a web page later there is enough space for it.
- Avoid limiting the field size for names in your database.
- Try to avoid using the labels 'first name' and 'last name' in non-localized forms.
- Consider whether it would make sense to have one or more extra fields, in addition to the full name field, where you ask the user to enter the part(s) of their name that you need to use for a specific purpose.
- Ask separately, when setting up a profile for example, how that person would like you to address them.
- If you have separate fields for parts of a person's name, ensure that you clearly label which parts you want where.
- Be careful about assumptions built into algorithms that pull out the parts of a name automatically.
- Be as clear as possible about telling people how to specify their name.
- Don't assume that a single letter name is an initial.
- Don't require that people supply a family name.
- Don't forget to allow people to use punctuation such as hyphens, apostrophes, etc. in names.
- Don't require names to be entered all in upper case.
- Allow the user to enter a name with spaces.
- Don't assume that members of the same family will share the same family name.
- It may be better for a form to ask for 'Previous name' rather than 'Maiden name' or 'née'.
- If you hope to get Latin-only or ASCII-only input, you need to tell the user.
- You may want to store the name in both Latin and native scripts, in which case you probably need to ask the user to submit their name in both native script and Latin-only form, using separate fields.
- If you do accept non-ASCII names, you should use a Unicode character encoding (eg. UTF-8) in your pages, your back-end databases and in all the software code in between.
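A sketch of a deliberately permissive check for a single full-name field, following the points above: any Unicode letters and combining marks, spaces, hyphens, apostrophes, periods, middle dots, and the zero-width (non-)joiners used in scripts such as Persian are allowed; single letters are accepted; no family name, case or length limit is enforced. The character list is an assumption and is certainly incomplete, so you may prefer to check only that the field is not empty. isPlausibleName is a hypothetical name.

```typescript
// ASSUMPTION: an illustrative, not exhaustive, set of characters allowed in names:
// letters, combining marks, ZWNJ/ZWJ, space, apostrophes, period, middle dots, hyphen.
const NAME_CHARS = /^[\p{L}\p{M}\u200C\u200D '\u2019.\u00B7\u30FB-]+$/u;
const HAS_LETTER = /\p{L}/u;

function isPlausibleName(input: string): boolean {
  // Normalize so precomposed and decomposed forms are treated alike.
  const name = input.normalize("NFC").trim();
  return NAME_CHARS.test(name) && HAS_LETTER.test(name);
}

for (const n of ["O'Brien", "Jean-Luc Picard", "毛泽东", "A", "'t Hooft", "1234"]) {
  console.log(n, isPlausibleName(n));
}
```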
How to's
-
Personal names around the world
How do people's names differ around the world, and what are the implications of those differences on the design of forms, databases, ontologies, etc. for the Web?