W3C Official: HTML & CSS Internationalization Techniques

Original link: www.w3.org
Characters
Choosing and applying a character encoding
  • Choose UTF-8 for all content.
  • If you really can't use a Unicode encoding, use only those legacy encodings listed in the Encoding specification.
  • Avoid the following encodings: UTF-16, UTF-32, JIS_C6226-1983, JIS_X0212-1990, HZ-GB-2312, JOHAB (Windows code page 1361), encodings based on ISO-2022 or EBCDIC, CESU-8, UTF-7, BOCU-1, and SCSU.
Useful reference links
  • Encoding, 5.2 Names and labels

    If you have a good reason for not using UTF-8, then use only the encodings and labels shown in the left column of this table.

Background reading
  • Who uses Unicode?

    Are corporate Web sites using Unicode right now?  This article is somewhat outdated, now that Unicode accounts for around 80% of pages on the Web.

  • Document character set

    What is the 'Document Character Set' for XML and HTML, and how does it relate to the encodings I use for my documents?

Changing to UTF-8
  • Save the data as UTF-8, don't just change the encoding declaration.
  • Declare the encoding in your page.
  • Ensure that your server does the right thing.
Background reading
  • Document character set

    What is the 'Document Character Set' for XML and HTML, and how does it relate to the encodings I use for my documents?

Declaring the character encoding for HTML
  • Use the HTTP header if it is available.
  • Always use an in-document encoding declaration, even if you are also using the HTTP header.
  • Ensure that the encoding declaration fits within the first 1024 bytes of the page.
  • If you cannot use UTF-8, use the preferred encoding name indicated in the Encoding specification.
  • Do not use the charset attribute on a or link elements.
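As a minimal sketch of the in-document declaration, assuming a UTF-8 page: the meta element is placed first in the head so that it falls within the first 1024 bytes. The title and body text are placeholders.

```html
<!DOCTYPE html>
<html lang="en">
<head>
  <!-- Declare the encoding before any other head content. -->
  <meta charset="utf-8">
  <title>Placeholder title</title>
</head>
<body>
  <p>Placeholder content.</p>
</body>
</html>
```

When the server also declares the encoding, the matching HTTP header would typically read Content-Type: text/html; charset=utf-8. The in-document declaration is kept either way, as the list above advises.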
Useful reference links
  • Encoding, 5.2 Names and labels

    If you have a good reason for not using UTF-8, then use only the encodings and labels shown in the left column of this table.

Background reading
  • Serving HTML & XHTML

    Introduces doctypes, mime-types, and the influence of standards- vs. quirks-mode on character encoding declarations.

  • Handling character encodings in HTML and CSS

    Tutorial style article that gathers together and organizes pointers to articles that, taken together, help you understand how to handle the essential aspects of authoring HTML and CSS related to characters and character encodings.

Declaring the character encoding for a CSS style sheet
  • If you use UTF-8 as the character encoding for your style sheets and your HTML pages, and declare that encoding in your HTML, there is no need to declare the encoding for your style sheet.
  • If you use @charset, ensure that nothing (except a BOM) comes before it in the style sheet, and use the exact syntax.
  • If you cannot use UTF-8, use the preferred encoding name indicated in the Encoding specification.
  • Do not use the charset attribute on a or link elements.
Useful reference links
  • Encoding, 5.2 Names and labels

    If you have a good reason for not using UTF-8, then use only the encodings and labels shown in the left column of this table.

Using escapes to represent characters
  • Avoid using escapes whenever possible. When you use UTF-8 it supports all the characters you need.
  • Use escapes for invisible or ambiguous characters.
  • Use CSS escapes for CSS embedded in HTML, rather than HTML escapes.
  • Always use Unicode codepoints for the numeric part of a character escape. Do not use codepoint values of non-Unicode encodings.
  • Use a single escape (representing the Unicode codepoint value) for supplementary characters. Do not escape surrogate character pairs.
  • Ensure that all href attribute values have escaped ampersands in query parameters, ie. &amp; rather than just &.
  • Avoid named character entities in XHTML.
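A few of the points above can be illustrated with a short fragment. This is only a sketch: the URL, class name, and sample characters are assumptions chosen for illustration.

```html
<!-- Ampersands in query parameters are written as &amp; inside href. -->
<a href="https://example.org/search?q=ruby&amp;lang=ja">Search</a>

<!-- An invisible character (RIGHT-TO-LEFT MARK, U+200F) is written as an
     escape so that it can be seen in the source. -->
<p>Example text&#x200F;</p>

<!-- A supplementary character (U+20BB7) uses one escape for its Unicode
     code point, not the surrogate pair &#xD842;&#xDFB7;. -->
<p>&#x20BB7;</p>

<!-- CSS embedded in HTML uses a CSS escape (here U+00A0 NO-BREAK SPACE),
     not an HTML escape. -->
<style>
  .price::after { content: "\00A0 EUR"; }
</style>
```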
Checking the encoding of a document
Handling the byte-order mark (BOM)
  • If you use the byte-order mark with UTF-8-encoded pages, check that any scripts and back-end processes can handle the BOM.
  • If you ignored the advice above and encoded your page as UTF-16, always ensure that it starts with a BOM.
Handling character normalization
  • Ensure that all HTML class names and CSS selectors are saved using the same Unicode normalization form (NFC is recommended).
Handling encoding issues in forms
  • Use UTF-8 for the character encoding of your page.
  • Consider checking on the server that form data is arriving in UTF-8.
How to's
  • Multilingual form encoding

    What is the best way to deal with encoding issues in forms that may use multiple languages and scripts?

Using Unicode control codes
  • Don't use Unicode control characters if there is markup to do the same job.
  • Use character escapes to represent control codes, so that they are visible.
Working around unavailable characters/glyphs
Using non-ASCII web addresses
Language
Declaring the overall language of a page
  • Always declare the default language for text in the page using attributes on the html tag.
  • Do NOT use the meta element with the content attribute set to Content-Language.
  • Use language attributes rather than HTTP to declare the default language for 'text processing' (ie. when language needs to be known for things such as font choice, styling, spell-checking, hyphenation, quote mark styling, etc.).
  • Do not declare the default language of a document in the body element; use the html element instead.
  • Where a document contains content aimed at speakers of more than one language, decide whether you want to declare one language in the html tag, or leave the languages undefined until later.
  • Where a document contains content aimed at speakers of more than one language, try to divide the document linguistically at the highest possible level, and declare the appropriate language for each of those divisions.
  • For HTML use the lang attribute only, for XHTML 1.0 served as text/html use the lang and xml:lang attributes, and for XHTML served as XML use the xml:lang attribute only.
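A minimal sketch of a page-level language declaration for HTML, assuming a French page. The title and text are placeholders.

```html
<!DOCTYPE html>
<!-- The default language is declared on the html element, not on body. -->
<html lang="fr">
<head>
  <meta charset="utf-8">
  <title>Titre provisoire</title>
</head>
<body>
  <p>Bonjour le monde.</p>
</body>
</html>
```

For XHTML 1.0 served as text/html, the same opening tag would carry both lang="fr" and xml:lang="fr", following the last item in the list above.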
Identifying in-document language changes
  • When the page contains content in another language, add a language attribute to an element surrounding that content.
  • For HTML use the lang attribute only, for XHTML 1.0 served as text/html use the lang and xml:lang attributes, and for XHTML served as XML use the xml:lang attribute only.
  • If the text in attribute values and element content is in different languages, consider using a nested approach.
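The following fragment sketches both cases, assuming an English page. The link target is hypothetical.

```html
<!-- A French phrase inside English text gets its own lang attribute. -->
<p>The chef insists on <span lang="fr">mise en place</span> before service.</p>

<!-- Nested approach: the title attribute is English (inherited from the
     page), while the link text is Spanish. -->
<span title="Spanish"><a lang="es" href="/es/">Español</a></span>
```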
Choosing language tags
  • Use subtags as defined by BCP 47 for language attribute values.
  • Use the shortest possible language tag values.
  • Where possible, use the codes zh-Hans and zh-Hant to refer to Simplified and Traditional Chinese, respectively.
  • Use the subtag zxx when the text is known to be not in any language.
  • When the language is undetermined and you have to label it, use lang="".
  • If you are serving XML, and the format you are using supports it, use xml:lang="", otherwise use xml:lang="und" when the language is undetermined and you have to label it.
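As a rough illustration of these tag choices in HTML (the sample text and the part number are invented for the example):

```html
<!-- Short tag: a region subtag is added only when it matters. -->
<p lang="en">Hello, world.</p>

<!-- Script subtags for Simplified and Traditional Chinese. -->
<p lang="zh-Hans">简体中文</p>
<p lang="zh-Hant">繁體中文</p>

<!-- Content that is not in any language, such as a part number. -->
<p>Part number: <code lang="zxx">XK-4471-B</code></p>

<!-- Language undetermined but a label is required. -->
<p lang="">Text whose language is not known.</p>
```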
Declaring metadata about the language(s) of the intended audience
  • Consider using a Content-Language HTTP header to declare metadata about the language(s) of the intended audience of a document.
  • Where a document contains content aimed at speakers of more than one language, use the HTTP Content-Language header with a comma-separated list of language tags.
Indicating the language of a link destination
  • When pointing to a resource in another language, consider the pros and cons before indicating the language of the target document.
  • If you want to indicate that the target document of an a element is in another language, consider the pros and cons before using hreflang with CSS.
  • Do not use flag icons to indicate languages.
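If, after weighing the pros and cons, hreflang with CSS is the chosen route, one possible sketch looks like this. The selector, the bracketed label format, and the link target are assumptions rather than a recommended pattern.

```html
<style>
  /* Append the target language tag after any link that declares hreflang. */
  a[hreflang]::after {
    content: " [" attr(hreflang) "]";
  }
</style>

<p><a href="/fr/" hreflang="fr">Version française</a></p>
```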
Setting & changing browser language preferences
Using Accept-Language for locale setting
Markup & text
Working with composite strings and string re-use
  • Use a topic-comment approach whenever possible.
  • Avoid sentence-like arrangements when they contain substrings that are predefined translatable text or numeric text.
  • Use sentence-like arrangements with care if you have non-numeric and non-translatable text substrings (ie. text created at runtime).
  • Where the parts of a composite message appear in separate locations, provide the translator with contextual information to show how the various parts of a composite message relate to each other.
  • Provide information to the translator, where needed, to clarify what a substring represents.
  • When requested by the localization group, be prepared to provide information about the size of each substring.
  • Strings should be reused where text is always used in exactly the same context, or where the string is a self-contained, independent sentence or phrase.
  • Reused strings must not refer to more than one text, graphic or conceptual context.
  • If in doubt as to whether a string is a good candidate for re-use, don't.
  • If re-used strings will be displayed in fixed-size display boxes of varying sizes, ensure that every translation will fit in the smallest display box.
How to's
  • Working with Composite Messages

    Why you need to be very careful about splitting up and reusing text on-screen. The linguistic differences between languages can lead to real headaches for localizers and may in some cases make a reasonable translation impossible to achieve.

  • Re-using Strings in Scripted Content

    Things to be aware of if you plan to use the same text string in different places on your site or user interface.

Using ruby markup
How to's
  • Ruby markup

    Discusses how to use ruby markup in HTML5, and has pointers to what currently works in browsers.

Background reading
  • What is ruby?

    What are 'ruby' annotations?

  • Bopomofo on the Web

    A summary of how bopomofo is used and the implications for support on the Web.

  • Use Cases & Exploratory Approaches for Ruby Markup

    Discussion about what is needed in the HTML5 specification, and possibly other markup vocabularies, to adequately support ruby markup. It looks at a number of use cases and how well they are supported by the various markup models.

  • CJKV Information Processing

    Useful information about ruby in general (Ken Lunde's book, CJKV Information Processing, ISBN 1-56592-224-7, especially chapters 6 and 7)

Other links
  • Ruby extension markup

    Looks at various possible models for marking up ruby, with a view to informing discussion about the models found in the HTML5 draft of 25 October 2012 and the proposed Ruby Extension Spec as of 25 February 2013.

  • Ruby Annotation Recommendation

    W3C Recommendation that defines markup for ruby, in the form of an XHTML module. The HTML5 markup model should eventually replace this specification.

  • CSS3 Ruby Module

    W3C Working Draft that defines how ruby elements can be styled in various different ways. This draft is likely to change significantly as it is reworked to support the HTML5 markup model. (For more about styling ruby see Styling ruby text).

  • XHTML 1.1, 3. The XHTML 1.1 Document Type

    Ruby Annotation in the XHTML 1.1 spec (bottom of the page)

  • Implementing the Ruby Module

    Sample module implementations of the Ruby Annotation Specification in several schemas (W3C Personal Note)

Using b and i tags
  • Use the class attribute on a b or i element to identify why the element is being used.
  • Consider whether other elements might be more applicable than the b or i element because they carry the right semantics.
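A short sketch of both points. The class names are hypothetical and stand for whatever reasons your content has for using b or i.

```html
<!-- The class records why i or b is used, so a localized style sheet can
     restyle each purpose separately. -->
<p>The domestic cat (<i class="taxonomy">Felis catus</i>) is a small carnivore.</p>
<p>Press <b class="keyword">Save</b> to keep your changes.</p>

<!-- Where the intent is stress emphasis, em carries the right semantics. -->
<p>You <em>must</em> back up the file first.</p>
```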
Working with form controls
How to's
  • Sorting select options

    As part of a form, I have a list of terms in a drop-down box. Why are they not correctly sorted when I translate the items in the list?

Indicating what should and should not be translated
  • Use the translate attribute on an element to prevent its content being translated by online translation services or by computer-assisted translation tools.
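For instance, a label that must match the text printed on a device could be protected like this. The sentence and class name are illustrative.

```html
<!-- translate="no" asks translation tools to leave the span's content as is. -->
<p>Click the <span class="panelmsg" translate="no">Resume</span> button on the status display.</p>
```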
Text direction
Setting up a right-to-left page
  • Only use bidi markup to set the base direction for the document as a whole, or where you need to change the base direction.
  • Add dir="rtl" to the html tag any time the overall document direction is right-to-left.
  • Don't add dir="rtl" to the body tag.
  • If you need to avoid the scroll bar moving on some browsers, put dir on the head element and a div just inside the body element.
  • Use logical, not visual, ordering for Hebrew, and choose an appropriate encoding.
  • If you have to use an ISO encoding for a Hebrew page, declare the encoding as ISO-8859-8-i rather than ISO-8859-8.
  • Do not use CSS styling to control directionality in HTML. Use markup.
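A minimal sketch of a right-to-left page, assuming Arabic content. The text is a placeholder.

```html
<!DOCTYPE html>
<!-- Base direction is set once, on the html element, with markup. -->
<html dir="rtl" lang="ar">
<head>
  <meta charset="utf-8">
  <title>عنوان الصفحة</title>
</head>
<body>
  <p>مرحبا بالعالم</p>
</body>
</html>
```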
Setting direction on block elements
  • Add the dir attribute to a block element to change base direction.
  • Do not use CSS styling to control directionality in HTML. Use markup.
  • Only use bidi markup to set the base direction for the document as a whole, or where you need to change the base direction.
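As an illustration, a Hebrew quotation inside an otherwise left-to-right page might be marked up as follows (sample text only):

```html
<p>This paragraph inherits the page's left-to-right base direction.</p>

<!-- dir appears only where the base direction changes. -->
<blockquote dir="rtl" lang="he">
  <p>שלום עולם</p>
</blockquote>
```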
Managing text direction in form controls
  • Add dir="auto" to input tags to automatically align text to the correct side of an input field.
  • Add dir="auto" to textarea and pre tags to make paragraphs align to the left or right according to the initial strong character.
  • Consider using the dirname attribute to pass information to the server about the direction of text in a text or search form control.
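A sketch of a form using these attributes. The action URL and field names are assumptions, and browser support for dirname may vary, so treat the submitted direction as optional on the server.

```html
<form action="/search" method="get">
  <!-- dir="auto" aligns the text by its first strong character.
       dirname submits an extra field, q.dir, set to ltr or rtl. -->
  <input type="search" name="q" dir="auto" dirname="q.dir">

  <!-- In a textarea, dir="auto" is resolved per paragraph. -->
  <textarea name="comment" dir="auto" rows="4" cols="40"></textarea>

  <button type="submit">Send</button>
</form>
```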
Mixing text direction inline
  • Tightly wrap every opposite-direction phrase in markup that sets its base direction.
  • If you know the phrase's direction, wrap it in an element with a dir attribute. If you don't already have an element around the text, use span or bdi.
  • If you don't know the phrase's direction, ie. unknown text that will be injected at run time, then either wrap the phrase in bdi (no dir attribute needed), or if the phrase is tightly wrapped by an element already, just add dir="auto" to that element.
  • To bulletproof the code for Edge or legacy browsers, if the tightly-wrapped phrase is followed inline (possibly after some intervening neutral characters) by a number, or is one of a list of separate phrases with the same direction, then add a directional mark (RLM or LRM) immediately after the markup of that phrase.
  • Only use Unicode control characters for bidirectional control in attribute text or element text that allows no internal markup.
  • Consider using Unicode control characters to set the base direction around bidirectional text that will be displayed as tooltips, page titles, or on JavaScript dialog boxes.
  • Do not leave white space at the end of inline elements that mark a directional boundary.
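The first three items can be sketched as follows, assuming a left-to-right page. The placeholder names stand for text injected at run time, and the link target is hypothetical.

```html
<!-- Known direction: the phrase is tightly wrapped, with no trailing
     white space inside the span. -->
<p>The title is <span dir="rtl" lang="he">מדריך HTML</span> in Hebrew.</p>

<!-- Unknown direction, no existing wrapper: bdi needs no dir attribute. -->
<p>Posted by <bdi>PLACEHOLDER_NAME</bdi>: 3 comments</p>

<!-- Unknown direction, phrase already tightly wrapped by an element. -->
<p>Latest page: <a href="/pages/1" dir="auto">PLACEHOLDER_TITLE</a></p>
```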
Handling parentheses and other mirrored characters
  • Treat mirrored characters as if the word 'left' in the character name meant 'opening', and 'right' meant 'closing'.
Overriding the Unicode bidirectional algorithm
  • Use the bdo element to force the directionality of a sequence of inline characters.
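As a small example, bdo with dir="ltr" forces Hebrew letters to display in the order they are stored, left to right. The sample letters are arbitrary.

```html
<!-- bdo requires a dir attribute; it overrides the bidirectional
     algorithm for its content. -->
<p>Characters in stored order: <bdo dir="ltr">אבג</bdo></p>
```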
Creating vertical text
Other links
  • Text direction

    Preview of upcoming proposals for CSS3. In W3C article, CSS3 and International Text.

Styling & layout
Preparing for text expansion during translation
  • Ensure that your graphic backgrounds can automatically expand with the text they are related to, avoid highly constrained spaces, and anticipate that the box containing your text may grow during translation.
Background reading
  • Text size in translation

    Overview of text expansion issues.

  • Display capabilities

    Do I need to worry because display capabilities (screen sizes, number of colors, etc.) of computers vary in other countries?

  • Sliding Doors of CSS

    Douglas Bowman's article in A List Apart about how to layer background images, allowing them to slide over each other to create certain effects. (A note from the editors: While brilliant for its time, this article no longer reflects modern best practices.)

Styling by language
  • Use :lang to set language-specific styling.
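A minimal sketch of :lang in an embedded style sheet. The specific style choices are assumptions for illustration, not recommendations for these languages.

```html
<style>
  /* :lang matches on the element's language, including language
     inherited from an ancestor. */
  :lang(ar) { font-size: 120%; }
  :lang(ja) em { font-style: normal; font-weight: bold; }
</style>

<p lang="ar">مرحبا بالعالم</p>
<p lang="ja">これは<em>重要</em>です。</p>
```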
Styling lists
How to's
  • MDN: @counter-style

    How to define your own counter styles when the pre-defined styles aren't fitting your needs.

  • Ready-made Counter Styles

    Cut-and-paste templates for a large number of international counter styles that can be used for ordered lists and other such counters.

Useful reference links
  • Counter styles converter

    Allows you to convert ASCII numbers into other representations that can be used for ordered list counters, headings, etc, using the algorithms described by CSS3 Counter Styles.

  • Typography index: Lists, counters, etc

    Links to information about lists and counter-styles in the typography index.

Managing line breaks
How to's
  • MDN: word-break

    Specifies whether or not the browser should insert line breaks wherever the text would otherwise overflow its content box.

  • MDN: line-break

    CSS property used to specify how (or if) to break lines when working with punctuation and symbols. Only affects text in Chinese, Japanese, or Korean (CJK).

  • MDN: hyphens

    Specifies how words should be hyphenated when text wraps across multiple lines. Also includes a table of supported languages in browsers.

Background reading
  • Approaches to line breaking

    High level summary of various typographic strategies for wrapping text at the end of a line, for a variety of scripts.

Justifying and aligning text
How to's
  • MDN: text-align

    Specifies the horizontal alignment of an inline or table-cell box, including the value justify, which is used to turn on justification.

  • MDN: text-justify

    Defines what type of justification should be applied to text when it is justified (ie. when text-align:justify is set). Values include inter-word and inter-character.

Background reading
  • Approaches to justification

    High level summary of various typographic strategies for fully justifying text on a line and in a paragraph for a variety of scripts, and some advice for authors and implementers.

Styling ruby text
How to's
  • Ruby Styling (draft!)

    Discusses how to use CSS styling to affect the rendering of ruby content.

  • MDN: ruby-align

    Defines the distribution of the different ruby elements over the base.

Background reading
  • Ruby

    What is 'ruby'?

  • CJKV Information Processing

    Useful information about ruby in general (Ken Lunde's book, CJKV Information Processing, ISBN 1-56592-224-7, especially chapters 6 and 7)

Tests
  • CSS3 Ruby

    Includes tests for ruby-position, ruby-align, ruby-merge, and ruby autohide

Other links
  • Ruby style

    Introduction to styling ruby with CSS3 Ruby Module. In W3C article, Ruby Markup and Styling.

Applying various script-specific typographic conventions
Other links
  • Document grids

    Preview of upcoming proposals for CSS3. In W3C article, CSS3 and International Text.

  • Kumimoji and warichu

    Preview of upcoming proposals for CSS3. In W3C article, CSS3 and International Text.

  • Emphasis

    Preview of upcoming proposals for CSS3. In W3C article, CSS3 and International Text.

Using fonts & webfonts
Working with date formats
How to's
  • Date formats

    How do I prepare my web pages to display varying international date formats?

Working with personal names
  • Ask yourself whether you really need to have separate fields for given name and family name.
  • Make input fields long enough to enter long names, and ensure that if the name is displayed on a web page later there is enough space for it.
  • Avoid limiting the field size for names in your database.
  • Try to avoid using the labels 'first name' and 'last name' in non-localized forms.
  • Consider whether it would make sense to have one or more extra fields, in addition to the full name field, where you ask the user to enter the part(s) of their name that you need to use for a specific purpose.
  • Ask separately, when setting up a profile for example, how that person would like you to address them.
  • If you have separate fields for parts of a person's name, ensure that you label clearly which parts you want where.
  • Be careful about assumptions built into algorithms that pull out the parts of a name automatically.
  • Be as clear as possible about telling people how to specify their name.
  • Don't assume that a single letter name is an initial.
  • Don't require that people supply a family name.
  • Don't forget to allow people to use punctuation such as hyphens, apostrophes, etc. in names.
  • Don't require names to be entered all in upper case.
  • Allow the user to enter a name with spaces.
  • Don't assume that members of the same family will share the same family name.
  • It may be better for a form to ask for 'Previous name' rather than 'Maiden name' or 'née'.
  • If you hope to get Latin- or ASCII-only, you need to tell the user.
  • You may want to store the name in both Latin and native scripts, in which case you probably need to ask the user to submit their name in both native script and Latin-only form, using separate fields.
  • If you do accept non-ASCII names, you should use a Unicode character encoding (eg. UTF-8) in your pages, your back end databases and in all the software code in between.
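One way some of these points might look in a form is sketched below. The action URL, field names, and label wording are hypothetical. No pattern or maxlength is set, so spaces, hyphens, apostrophes, and non-ASCII characters are all accepted.

```html
<form action="/profile" method="post" accept-charset="utf-8">
  <p>
    <!-- A single full-name field instead of 'first name' and 'last name'. -->
    <label for="full-name">Full name</label>
    <input type="text" id="full-name" name="full_name" size="60" autocomplete="name">
  </p>
  <p>
    <!-- Asked separately: how the person would like to be addressed. -->
    <label for="address-as">What should we call you?</label>
    <input type="text" id="address-as" name="address_as" size="40" autocomplete="nickname">
  </p>
  <button type="submit">Save</button>
</form>
```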
How to's
  • Personal names around the world

    How do people's names differ around the world, and what are the implications of those differences on the design of forms, databases, ontologies, etc. for the Web?

Navigation
Linking to localized content
  • Use server-based, language-related content negotiation to point the user to the page that matches their browser preferences, but also add links to each page so that the user can change languages easily if they prefer.
  • Consider how to indicate to the user where the in-page language links are, and if the page is available in a long list of languages, consider whether or not to use something like a select control (and if so, how to make it obvious what its function is).
  • Locate pull-down menus or selection lists at or near the top of the page.
  • Use a recognizable image alongside a pull-down menu to indicate that it is a control which will take the user to localized pages. Do not use text.
  • Consider using the size attribute to display the first set of options in a select control.
  • Translate the links or options into the target language.
  • Encode your page as UTF-8, so that it supports the necessary characters.
  • Decide whether it is a problem that a user won't have fonts for all the list items or menu options. If it is, use JavaScript menus or some other graphic-based approach.
  • Decide whether to add a description alongside each option, using the language of the current page, so that users can tell what the native word means.
  • Find the most appropriate way of ordering the list of options.
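A sketch of a language selector along these lines, assuming a UTF-8 page. The action URL, image path, and set of languages are invented for the example.

```html
<form action="/choose-language" method="get">
  <!-- An image, rather than text in one language, marks the control. -->
  <label for="lang-select">
    <img src="/img/language-icon.png" alt="Language" width="24" height="24">
  </label>

  <!-- size shows the first options without opening the list; each option
       is written in its own language and tagged with lang. -->
  <select id="lang-select" name="lang" size="4">
    <option value="de" lang="de">Deutsch</option>
    <option value="en" lang="en">English</option>
    <option value="es" lang="es">Español</option>
    <option value="ja" lang="ja">日本語</option>
  </select>

  <button type="submit">OK</button>
</form>
```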
How to's
  • Guiding users to translated pages

    If my site contains alternative language versions of the same page, what can I do to help the user see the page in their preferred language?

  • Using <select> to Link to Localized Content

    What are the best practices for using pull-down menus based on the select element to direct visitors to localized content?

  • About languages and flags

    On some Web pages you’ll find country flags as symbols for languages. This article explains why this approach is problematic, and what you should do instead.

Using content negotiation
  • Use server-based, language-related content negotiation to point the user to the page that matches their browser preferences, but also add links to each page so that the user can change languages easily if they prefer.
  • If the user switches to a different language, offer them the opportunity to remember that choice and serve up subsequent pages in that language, overriding their browser settings.