Links
- Formatting Messages | ICU User Guide
- Pattern_Syntax and Pattern_White_Space - Unicode pattern syntax.
  - argNameOrNumber in MessageFormat (the first argument) is composed of either digits or anything that is NOT from Pattern_Syntax and Pattern_White_Space (so it excludes all punctuation, spaces, etc.)
- JS Lib: SlexAxton/messageformat.js
Patterns and Their Interpretation
Ref: icu-project.org/…/MessageFormat.html, Pattern_Syntax
MessageFormat uses patterns of the following form:
message = messageText (argument messageText)*
argument = noneArg | simpleArg | pluralArg | selectArg | selectordinalArg

noneArg = '{' argNameOrNumber '}'
simpleArg = '{' argNameOrNumber ',' argType [',' argStyle] '}'
pluralArg = '{' argNameOrNumber ',' "plural" ',' pluralStyle '}'
selectArg = '{' argNameOrNumber ',' "select" ',' selectStyle '}'
selectordinalArg = '{' argNameOrNumber ',' "selectordinal" ',' pluralStyle '}'

argNameOrNumber = argName | argNumber
argName = [^[[:Pattern_Syntax:][:Pattern_White_Space:]]]+
argNumber = '0' | ('1'..'9' ('0'..'9')*)

argType = "number" | "date" | "time" | "spellout" | "ordinal" | "duration"
argStyle = "short" | "medium" | "long" | "full" | "integer" | "currency" | "percent" | argStyleText

pluralStyle = [offsetValue] (explicitValue | pluralKeyword '{' message '}')+  // the "other" pluralKeyword is required.
offsetValue = "offset:" number
explicitValue = '=' number  // adjacent, no white space in between
pluralKeyword = 'zero' | 'one' | 'two' | 'few' | 'many' | 'other' | keyword

selectStyle = (selectKeyword '{' message '}')+  // the "other" selectKeyword is required.
selectKeyword = 'other' | keyword
keyword = [^[[:Pattern_Syntax:][:Pattern_White_Space:]]]+
Pattern_White_Space between syntax elements is ignored, except:
- between the {curly braces} and their sub-message
- between the '=' and the number in explicitValue (i.e. there must be no space between them, e.g. "=1")
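The argNameOrNumber rule from the grammar above can be sketched as a small classifier. This is only an illustration, not ICU's parser: the helper name classify_arg is hypothetical, and since Python's re module has no Pattern_Syntax property, only the ASCII part of that property is listed here as an approximation.

```python
import re

# Pattern_White_Space is a small, fixed set of code points.
PATTERN_WHITE_SPACE = "\t\n\x0b\x0c\r \x85\u200e\u200f\u2028\u2029"

# ASSUMPTION: only the ASCII part of Pattern_Syntax is listed; the real
# property also covers many non-ASCII punctuation and symbol characters.
ASCII_PATTERN_SYNTAX = "!\"#$%&'()*+,-./:;<=>?@[\\]^`{|}~"

# argNumber = '0' | ('1'..'9' ('0'..'9')*)
ARG_NUMBER = re.compile(r"0|[1-9][0-9]*")

# argName = one or more characters outside both sets above.
ARG_NAME = re.compile(
    "[^" + re.escape(ASCII_PATTERN_SYNTAX + PATTERN_WHITE_SPACE) + "]+"
)


def classify_arg(text):
    """Return 'number', 'name', or None for an argNameOrNumber candidate.

    Follows the grammar text literally; a real parser may apply extra checks.
    """
    if ARG_NUMBER.fullmatch(text):
        return "number"
    if ARG_NAME.fullmatch(text):
        return "name"
    return None
```

With this sketch, "0" and "12" classify as numbers, "count" and "user_name" as names (the underscore is not a Pattern_Syntax character), and "first name" or "a{b" are rejected.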
Plurals
Predefined keywords: 'zero', 'one', 'two', 'few', 'many' and 'other'.
You must always define message text for the other case (it's the fallback).
Matching Priority / Algorithm
- Exact Matches
  - Match the input number against the explicitValue clauses. If found, use that messageText and return.
- Keyword Matches
  - Set keyword = PluralRules(input_number - offset) (offset defaults to 0).
  - Use the clause corresponding to this keyword and return (if found).
- Fallback
  - Use the messageText corresponding to the other clause.
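The three steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not ICU's implementation: english_plural_rule is a hypothetical stand-in for PluralRules that only knows English cardinals, clauses are passed as a plain dict with "=N" keys for explicit values, and the input is assumed to be an integer.

```python
def english_plural_rule(n):
    """HYPOTHETICAL stand-in for PluralRules: English cardinal rules only."""
    return "one" if n == 1 else "other"


def select_plural_clause(number, clauses, offset=0,
                         plural_rule=english_plural_rule):
    """Pick a sub-message from `clauses` for an integer `number`.

    `clauses` maps selectors to sub-messages: explicit values are "=N"
    keys, keywords are plain strings. The "other" key must be present.
    """
    # 1. Exact match: compare the raw input number (no offset applied).
    explicit = "=" + str(number)
    if explicit in clauses:
        return clauses[explicit]

    # 2. Keyword match: the plural rule sees number minus offset.
    keyword = plural_rule(number - offset)
    if keyword in clauses:
        return clauses[keyword]

    # 3. Fallback: the required "other" clause.
    return clauses["other"]


clauses = {
    "=0": "Nobody is coming.",
    "=1": "Only you are coming.",
    "one": "You and one other person are coming.",
    "other": "You and # others are coming.",
}
print(select_plural_clause(2, clauses, offset=1))
# prints: You and one other person are coming.
```

Note that the exact match in step 1 uses the original number, while the keyword lookup in step 2 uses the number after the offset has been subtracted.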
Formatting
- Format number-minus-offset using a NumberFormat for the PluralFormat's locale.
  - If you need special number formatting, you have to use a MessageFormat and explicitly specify a NumberFormat argument. (Note that this argument is formatted without subtracting the offset! If you need a custom format and have a non-zero offset, then you need to pass the number-minus-offset value as a separate parameter.)
- Replace an unquoted pound sign (#) in the selected sub-message by the formatted number-minus-offset value from the previous step.
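The pound-sign replacement can be sketched as a small scanner that tracks whether it is inside a quoted literal. This is an illustration only: substitute_pound is a hypothetical helper, Python's format(..., ",") stands in for a locale-aware NumberFormat, nested arguments are not handled, and apostrophes are left in place for a later unquoting pass.

```python
def substitute_pound(sub_message, number, offset=0):
    """Replace each unquoted '#' with the formatted number-minus-offset."""
    # ASSUMPTION: format(..., ",") stands in for a locale-aware NumberFormat.
    formatted = format(number - offset, ",")
    out = []
    in_quote = False
    i = 0
    while i < len(sub_message):
        ch = sub_message[i]
        nxt = sub_message[i + 1:i + 2]  # "" at the end of the string
        if ch == "'":
            if nxt == "'":
                # Doubled apostrophe: keep the pair, quoting state unchanged.
                out.append("''")
                i += 2
                continue
            if in_quote:
                in_quote = False          # closing apostrophe
            elif nxt in ("{", "}", "#"):
                in_quote = True           # apostrophe + syntax char opens a quote
            out.append(ch)
        elif ch == "#" and not in_quote:
            out.append(formatted)
        else:
            out.append(ch)
        i += 1
    return "".join(out)


print(substitute_pound("You and # others are coming.", 5, offset=1))
# prints: You and 4 others are coming.
```

A quoted pound sign is left alone: substitute_pound("'#' is # items", 3) keeps the first "#" and replaces only the second.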
Gender and "select"
The main use case for the select format (selectArg) is gender-based inflection.
When names or nouns are inserted into sentences, their gender can affect pronouns, verb forms, articles, and adjectives. Special care needs to be taken for the case where the gender cannot be determined. The impact varies between languages:
- English has three genders, and unknown gender is handled as a special case. Names use the gender of the named person (if known), nouns referring to people use natural gender, and inanimate objects are usually neuter. The gender only affects pronouns: "he", "she", "it", "they".
- German differs from English in that the gender of nouns is rather arbitrary, even for nouns referring to people ("Mädchen", girl, is neuter). The gender affects pronouns ("er", "sie", "es"), articles ("der", "die", "das"), and adjective forms ("guter Mann", "gute Frau", "gutes Mädchen").
- French has only two genders; as in German the gender of nouns is rather arbitrary - for sun and moon, the genders are the opposite of those in German. The gender affects pronouns ("il", "elle"), articles ("le", "la"), adjective forms ("bon", "bonne"), and sometimes verb forms ("allé", "allée").
- Polish distinguishes five genders (or noun classes): human masculine, animate non-human masculine, inanimate masculine, feminine, and neuter.
- Noun classes: Some other languages have noun classes that are not related to gender, but similar in grammatical use. Some African languages have around 20 noun classes.
The fallback keyword is "other" (just like with pluralization). Some common keywords are: "male", "female", "mixed" (for groups of people) and "unknown".
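Selection with the "other" fallback amounts to a keyed lookup. A minimal sketch, assuming the clauses have already been parsed into a dict and that keywords are compared as exact strings (select_clause is a hypothetical helper, not part of any library):

```python
def select_clause(value, clauses):
    """Return the sub-message for `value`, falling back to "other".

    Raises KeyError if the required "other" clause is missing.
    """
    return clauses.get(value, clauses["other"])


liked = {
    "male": "He liked this.",
    "female": "She liked this.",
    "other": "They liked this.",
}
print(select_clause("female", liked))   # prints: She liked this.
print(select_clause("unknown", liked))  # prints: They liked this.
```

Because "unknown" has no clause of its own in this example, it falls through to "other", which is why that clause is mandatory.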
Quoting
messageText can contain quoted literal strings including syntax characters.
- A quoted literal string begins with an ASCII apostrophe and a syntax character (usually a curly brace, { or }) and continues until the next single apostrophe.
- A double ASCII apostrophe inside or outside of a quoted string represents one literal apostrophe.
- Quotable syntax characters are the curly braces ("{", "}") in all messageText parts, plus the "#" sign in a messageText immediately inside a pluralStyle.
- See also MessagePattern.ApostropheMode
- In argStyleText, every single ASCII apostrophe begins and ends quoted literal text, and unquoted {curly braces} must occur in matched pairs.
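The quoting rules for messageText can be sketched as a single pass that resolves apostrophes into literal text. This is an illustration, not ICU's parser: unquote is a hypothetical helper, and it assumes that a lone apostrophe not followed by a syntax character is kept as a literal apostrophe. It does not cover the stricter argStyleText rule.

```python
# Syntax characters that can open a quote in ordinary messageText.
# Pass ("{", "}", "#") for text immediately inside a pluralStyle.
SYNTAX_CHARS = ("{", "}")


def unquote(message_text, syntax_chars=SYNTAX_CHARS):
    """Resolve apostrophe quoting in `message_text` to literal text."""
    out = []
    in_quote = False
    i = 0
    while i < len(message_text):
        ch = message_text[i]
        nxt = message_text[i + 1:i + 2]  # "" at the end of the string
        if ch != "'":
            out.append(ch)
            i += 1
        elif nxt == "'":
            out.append("'")      # doubled apostrophe -> one literal apostrophe
            i += 2
        elif in_quote:
            in_quote = False     # closing apostrophe is dropped
            i += 1
        elif nxt and nxt in syntax_chars:
            in_quote = True      # opening apostrophe is dropped
            i += 1
        else:
            # ASSUMPTION: a lone apostrophe in plain text stays literal.
            out.append("'")
            i += 1
    return "".join(out)


print(unquote("'{'name'}' isn''t an argument"))
# prints: {name} isn't an argument
```

Passing syntax_chars=("{", "}", "#") makes "'#'" resolve to a literal "#", which matches the extra quotable character inside a pluralStyle.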
Recommendation: Use the real apostrophe character, «’» (U+2019), for human-readable text, and use the ASCII apostrophe, «'» (U+0027), only in program syntax, like quoting in MessageFormat. See the annotations for U+0027 Apostrophe in The Unicode Standard.