java.lang.Object
|
+----java.text.Collation
|
+----java.text.TableCollation
TableCollation has the following restrictions for efficiency (other subclasses may be used for more complex languages):
1. The French secondary ordering is applied to the whole collation object.
2. All non-mentioned Unicode characters are at the end of the collation order.
3. Private use characters are treated as identical. The private use area in Unicode is 0xE800-0xF8FF.
The collation table is composed of a list of collation rules, where each rule takes one of three forms:
< modifier >
< relation > < text-argument >
< reset > < text-argument >
The following characters are used when creating your own collation rules:
'@' : Modifier. Indicates that accents are sorted backwards, as in French.
'&' : Reset. Indicates that the next rule is placed immediately after the position where the reset text-argument would be sorted. The reset itself does not put the text-argument into the sorting sequence.
This sounds more complicated than it is in practice. For example, the following are equivalent ways of expressing the same thing:
a < b < c
a < b & b < c
a < c & a < b
Notice that the order is important, as the subsequent item goes immediately
after the text-argument. The following are not equivalent:
a < b & a < c
a < c & a < b
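The effect of a reset can be checked with a small sketch. TableCollation is not available in current JDKs, so the sketch uses java.text.RuleBasedCollator as a stand-in, on the assumption that it accepts the same rule syntax. The class ResetOrderDemo and its order helper are hypothetical names introduced only for illustration.

```java
import java.text.ParseException;
import java.text.RuleBasedCollator;
import java.util.Arrays;

final class ResetOrderDemo {
    // Sorts the letters a, b, c with a collator built from the given rules
    // and returns them space-separated.
    // Assumption: RuleBasedCollator stands in for TableCollation.
    static String order(String rules) {
        try {
            RuleBasedCollator collator = new RuleBasedCollator(rules);
            String[] letters = {"c", "b", "a"};
            Arrays.sort(letters, collator);
            return String.join(" ", letters);
        } catch (ParseException e) {
            throw new IllegalArgumentException("bad rules: " + rules, e);
        }
    }

    public static void main(String[] args) {
        // Three equivalent spellings of the same order.
        System.out.println(order("< a < b < c"));
        System.out.println(order("< a < b & b < c"));
        System.out.println(order("< a < c & a < b"));
        // Not equivalent: c is placed immediately after a, ahead of b.
        System.out.println(order("< a < b & a < c"));
    }
}
```

The first three calls should print the same sequence, while the last one places "c" directly after "a", which is why the order of the rules matters.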
Either the text-argument must already be present in the sequence, or some
initial substring of the text-argument must be present. (e.g. "a < b & ae <
e" is valid since "a" is present in the sequence before "ae" is reset). In
this latter case, "ae" is not entered and treated as a single character;
instead, "e" is sorted as if it were expanded to two characters: "a"
followed by an "e". This difference appears in natural languages: in
traditional Spanish "ch" is treated as though it contracts to a single
character (expressed as "c < ch < d"), while in traditional German "ä"
(a-umlaut) is treated as though it expands to two characters (expressed as
"a & ae ; ä < b").
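The difference between a contraction and an expansion can be sketched as follows. This is an illustration only: it assumes java.text.RuleBasedCollator as a stand-in for TableCollation, and the rule strings are shortened, hypothetical tables rather than complete Spanish or German rules. The class and method names are also hypothetical.

```java
import java.text.Collator;
import java.text.ParseException;
import java.text.RuleBasedCollator;

final class ContractExpandDemo {
    // Builds a collator, converting the checked exception for brevity.
    static RuleBasedCollator build(String rules) {
        try {
            return new RuleBasedCollator(rules);
        } catch (ParseException e) {
            throw new IllegalArgumentException("bad rules: " + rules, e);
        }
    }

    // Contraction: "ch" is a single collation element between c and d.
    static RuleBasedCollator spanishLike() {
        return build("< c < ch < d < h < z");
    }

    // Expansion: a-umlaut sorts as "a" followed by "e", with an accent difference.
    static RuleBasedCollator germanLike() {
        return build("< a < b < e & ae ; \u00e4");
    }

    // Compares two strings while ignoring accent and case differences.
    static int comparePrimary(RuleBasedCollator collator, String s, String t) {
        collator.setStrength(Collator.PRIMARY);
        return collator.compare(s, t);
    }

    public static void main(String[] args) {
        // "cz" sorts before "ch" because "ch" counts as one letter after c.
        System.out.println(spanishLike().compare("cz", "ch") < 0);
        System.out.println(spanishLike().compare("ch", "d") < 0);
        // a-umlaut behaves like "ae": after "ab", before "b".
        System.out.println(germanLike().compare("\u00e4", "ab") > 0);
        System.out.println(germanLike().compare("\u00e4", "b") < 0);
        // At primary strength the accent difference disappears.
        System.out.println(comparePrimary(germanLike(), "\u00e4", "ae") == 0);
    }
}
```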
Ignorable Characters
For ignorable characters, the first rule must start with a relation (the examples we have used above are really fragments; "a < b" really should be "< a < b"). If, however, the first relation is not "<", then all the text-arguments up to the first "<" are ignorable. For example, ", - < a < b" makes "-" an ignorable character, as we saw earlier in the word "black-birds". In the samples for different languages, you see that most accents are ignorable.
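A minimal sketch of an ignorable character, assuming java.text.RuleBasedCollator as a stand-in for TableCollation. Note one difference from the rule text quoted above: RuleBasedCollator rejects unquoted punctuation in a text-argument, so the hyphen is written as '-' here. The class and method names are hypothetical.

```java
import java.text.Collator;
import java.text.ParseException;
import java.text.RuleBasedCollator;

final class IgnorableDemo {
    // The hyphen appears before the first "<", which makes it ignorable.
    static RuleBasedCollator hyphenIgnorable() {
        try {
            return new RuleBasedCollator(", '-' < a < b");
        } catch (ParseException e) {
            throw new IllegalArgumentException("bad rules", e);
        }
    }

    // Compares at the collator's default strength.
    static int compareDefault(String s, String t) {
        return hyphenIgnorable().compare(s, t);
    }

    // Compares base letters only; the ignorable hyphen is skipped.
    static int comparePrimary(String s, String t) {
        RuleBasedCollator collator = hyphenIgnorable();
        collator.setStrength(Collator.PRIMARY);
        return collator.compare(s, t);
    }

    public static void main(String[] args) {
        // The hyphen does not affect the base-letter comparison.
        System.out.println(comparePrimary("a-b", "ab") == 0);
        System.out.println(comparePrimary("a-b", "b") < 0);
        // At the default strength the hyphen remains a minor difference.
        System.out.println(compareDefault("a-b", "ab") != 0);
    }
}
```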
Normalization and Accents
The Collation object automatically normalizes text internally to separate accents from base characters where possible. This is done both when processing the rules and when comparing two strings. Collation also uses the Unicode canonical mapping to ensure that combining sequences are sorted properly (for more information, see The Unicode Standard, Version 2.0).
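The following sketch illustrates what separating accents from base characters means in practice. It assumes java.text.Normalizer and java.text.RuleBasedCollator from current JDKs as stand-ins for the internal behavior described here; the rule string and the class NormalizationDemo are hypothetical.

```java
import java.text.Collator;
import java.text.Normalizer;
import java.text.ParseException;
import java.text.RuleBasedCollator;

final class NormalizationDemo {
    static final String PRECOMPOSED = "\u00e4";  // a-umlaut as one code point
    static final String DECOMPOSED = "a\u0308";  // "a" plus combining diaeresis

    // Canonical decomposition: splits accents from their base characters.
    static String decompose(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFD);
    }

    // Compares the two spellings of a-umlaut with a small rule table.
    static int compareForms() {
        try {
            RuleBasedCollator collator = new RuleBasedCollator("< a ; \u00e4 < b");
            // Ask the collator to decompose text before comparing.
            collator.setDecomposition(Collator.CANONICAL_DECOMPOSITION);
            return collator.compare(PRECOMPOSED, DECOMPOSED);
        } catch (ParseException e) {
            throw new IllegalArgumentException("bad rules", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(decompose(PRECOMPOSED).equals(DECOMPOSED));
        // Both spellings should collate as the same text.
        System.out.println(compareForms() == 0);
    }
}
```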
Errors
The following are errors:
Examples
Simple: "< a < b < c < d"
Norwegian: "< a,A< b,B< c,C< d,D< e,E< f,F< g,G< h,H< i,I< j,J < k,K< l,L< m,M< n,N< o,O< p,P< q,Q< r,R< s,S< t,T < u,U< v,V< w,W< x,X< y,Y< z,Z < å=a\u030A,Å=A\u030A ;aa,AA< æ,Æ< ø,Ø"
To create a table-based collation object, simply supply the collation rules to the TableCollation constructor. For example:
TableCollation mySimple = new TableCollation(Simple);
Another example:
TableCollation myNorwegian = new TableCollation(Norwegian);
To add rules on top of an existing table, simply supply the original rules
and modifications to the TableCollation constructor. For example:
Traditional Spanish (fragment): ... & C < ch , cH , Ch , CH ...
German (fragment) : ...< y , Y < z , Z & AE , Ä & Ae ; ä & OE , Ö & Oe ; ö & UE , Ü & Ue ; ü
Symbols (fragment) : ...< y, Y < z , Z & Question-mark ; ? & Ampersand ; '&' & Dollar-sign ; $
To create a collation object for traditional Spanish, the user can take the English collation rules and add the additional rules to the table. For example:
TableCollation mySpanish = new
TableCollation(CollationRules.DEFAULTRULES +
"& C < ch, cH, Ch, CH");
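The same pattern of appending rules to a base table can be tried with a short sketch. It assumes java.text.RuleBasedCollator as a stand-in for TableCollation, and BASE_RULES is a hypothetical, heavily shortened substitute for CollationRules.DEFAULTRULES.

```java
import java.text.ParseException;
import java.text.RuleBasedCollator;
import java.util.Arrays;

final class AppendRulesDemo {
    // Hypothetical, shortened stand-in for CollationRules.DEFAULTRULES.
    static final String BASE_RULES = "< a, A < b, B < c, C < d, D < z, Z";

    // Appends a reset that places "ch" after "C" without editing the base table.
    static RuleBasedCollator traditionalSpanish() {
        try {
            return new RuleBasedCollator(BASE_RULES + " & C < ch, cH, Ch, CH");
        } catch (ParseException e) {
            throw new IllegalArgumentException("bad rules", e);
        }
    }

    // Sorts a copy of the given words and returns them space-separated.
    static String sorted(String... words) {
        String[] copy = words.clone();
        Arrays.sort(copy, traditionalSpanish());
        return String.join(" ", copy);
    }

    public static void main(String[] args) {
        // "ch" sorts after every other word starting with c, and before d.
        System.out.println(sorted("d", "ch", "cz", "c"));
    }
}
```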
To sort symbols in an order similar to that of their alphabetic equivalents, you can do the following:
TableCollation myTable = new
TableCollation(CollationRules.DEFAULTRULES +
"& Question-mark ; ?" +
"& Ampersand ; '&'" +
"& Dollar-sign ; $");
Another way of creating the table-based collation object, mySimple, is:
TableCollation mySimple = new
TableCollation(" < a < b & b < c & c < d");
Or,
TableCollation mySimple = new
TableCollation(" < a < b < d & b < c");
Both alternatives work because "< a < b < c < d" is the same as
"< a < b < d & b < c" or "< a < b & b < c & c < d".
NOTE: Typically, a collation object is created with Collation.getDefault().
public TableCollation(String rules) throws FormatException
public String getRules()
public byte compare(String source, String target)
public byte compare(String source, int start, int end, String target, int targetStart, int targetEnd)
public SortKey getSortKey(String source)
public SortKey getSortKey(String source, int start, int end)
public CollationKey getCollationKey(String source)
public CollationKey getCollationKey(String source, int start, int end)
public Object clone()
public boolean equals(Object obj)
public int hashCode()