fn:collation-key
Given a string value and a collation, generates an internal value called a collation key, with the property that the matching and ordering of collation keys reflects the matching and ordering of strings under the specified collation.
Signatures
fn:collation-key($key as xs:string) as xs:base64Binary
fn:collation-key(
$key as xs:string,
$collation as xs:string
) as xs:base64Binary
Properties
This function is deterministic, context-dependent, and focus-independent. It depends on collations.
Rules
Calling the one-argument version of this function is equivalent to calling the two-argument version supplying the default collation as the second argument.
The function returns an implementation-dependent value with the property that, for any two strings $K1 and $K2:
- collation-key($K1, $C) eq collation-key($K2, $C) if and only if compare($K1, $K2, $C) eq 0
- collation-key($K1, $C) lt collation-key($K2, $C) if and only if compare($K1, $K2, $C) lt 0
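The two conditions can be checked directly for a particular pair of strings. The following is a minimal sketch, assuming an XQuery 3.1 processor that recognizes the UCA collation URI shown and can generate keys for it; the sample strings are illustrative and not taken from the specification.

```xquery
xquery version "3.1";

(: Assumption: the processor supports this collation URI. :)
let $C := 'http://www.w3.org/2013/collation/UCA?strength=primary'
(: Illustrative sample strings. :)
let $K1 := "apple"
let $K2 := "Banana"
return (
  (: Key equality should agree with equality under the collation. :)
  (collation-key($K1, $C) eq collation-key($K2, $C))
    eq (compare($K1, $K2, $C) eq 0),
  (: Key ordering should agree with ordering under the collation. :)
  (collation-key($K1, $C) lt collation-key($K2, $C))
    eq (compare($K1, $K2, $C) lt 0)
)
```

If the implementation honors the stated property, each item in the result is true, whichever strings are chosen.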
The collation used by this function is determined according to the rules in Choosing a collation. Collation keys are defined as xs:base64Binary values to ensure unambiguous and context-free comparison semantics.
An implementation is free to generate a collation key in any convenient way provided that it always generates the same collation key for two strings that are equal under the collation, and different collation keys for strings that are not equal. This holds only within a single execution scope; an implementation is under no obligation to generate the same collation keys during a subsequent unrelated query or transformation.
It is possible to define collations that do not have the ability to generate collation keys. Supplying such a collation will cause the function to fail. The ability to generate collation keys is an implementation-defined property of the collation.
Error Conditions
An error is raised [err:FOCH0004] if the specified collation does not support the generation of collation keys.
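A caller that cannot know in advance whether a collation supports key generation may want to trap this error. The following is a sketch only: local:key-or-empty is a hypothetical helper, not part of this specification, and the collation URI is assumed to be recognized by the processor.

```xquery
xquery version "3.1";

declare namespace err = "http://www.w3.org/2005/xqt-errors";

(: Hypothetical helper: returns the collation key, or the empty
   sequence when the collation cannot generate collation keys. :)
declare function local:key-or-empty(
  $key as xs:string,
  $collation as xs:string
) as xs:base64Binary? {
  try {
    collation-key($key, $collation)
  } catch err:FOCH0004 {
    (: The collation does not support key generation. :)
    ()
  }
};

local:key-or-empty("A", 'http://www.w3.org/2013/collation/UCA?strength=primary')
```

Only err:FOCH0004 is caught here, so any other error raised by the call still propagates to the caller.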
Notes
The function is provided primarily for use with maps. If a map is required where codepoint equality is inappropriate for comparing keys, then a common technique is to normalize the key so that equality matching becomes feasible. There are many ways keys can be normalized, for example by use of functions such as fn:upper-case, fn:lower-case, fn:normalize-space, or fn:normalize-unicode, but this function provides a way of normalizing them according to the rules of a specified collation. For example, if the collation ignores accents, then the function will generate the same collation key for two input strings that differ only in their use of accents.
The result of the function is defined to be an xs:base64Binary value. Binary values are chosen because they have unambiguous and context-free comparison semantics, because the value space is unbounded, and because the ordering rules are such that between any two values in the ordered value space, an arbitrary number of further values can be interpolated. The choice between xs:base64Binary and xs:hexBinary is arbitrary; the only operation that behaves differently between the two binary data types is conversion to/from a string, and this operation is not one that is normally required for effective use of collation keys.
For collations based on the Unicode Collation Algorithm, an algorithm for computing collation keys is provided in [UTS #10]. Implementations are not required to use this algorithm.
This specification does not mandate that collation keys should retain ordering. This is partly because the primary use case is for maps, where only equality comparisons are required, and partly to allow the use of binary data types (which are currently unordered types) for the result. The specification may be revised in a future release to specify that ordering is preserved.
The fact that collation keys are ordered can be exploited in XQuery, whose order by clause does not allow the collation to be selected dynamically. This restriction can be circumvented by rewriting the clause order by $e/@key collation "URI" as order by fn:collation-key($e/@key, $collation), where $collation allows the collation to be chosen dynamically.
Note that xs:base64Binary becomes an ordered type in XPath 3.1, making binary collation keys possible.
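The rewrite described above might look as follows in a complete query. This is a sketch under stated assumptions: the item elements are invented sample data, the default collation URI is assumed to be supported by the processor, and the sorted result relies on the implementation generating order-preserving keys.

```xquery
xquery version "3.1";

(: The collation is supplied at run time; the default value is an
   assumption made for this illustration. :)
declare variable $collation as xs:string external
  := 'http://www.w3.org/2013/collation/UCA?strength=primary';

(: Invented sample data. :)
for $e in (<item key="b"/>, <item key="A"/>, <item key="c"/>)
(: Sort by collation key rather than by a static collation URI. :)
order by collation-key($e/@key, $collation)
return string($e/@key)
```

Because $collation is an ordinary variable, the same query can sort under different collations without being rewritten.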
Examples
let $C := 'http://www.w3.org/2013/collation/UCA?strength=primary'
The expression map:merge((map{collation-key("A", $C):1}, map{collation-key("a", $C):2}), map{"duplicates":"use-last"})(collation-key("A", $C)) returns 2. (Given that the keys of the two entries are equal under the rules of the chosen collation, only one of the entries can appear in the result; the one that is chosen is the one from the last map in the input sequence.)
The expression let $M := map{collation-key("A", $C):1, collation-key("B", $C):2} return $M(collation-key("a", $C)) returns 1. (The strings "A" and "a" have the same collation key under this collation.)
As the above examples illustrate, when the collation-key function is used to add entries to a map, it must also be used when retrieving entries from the map. This process can be made less error-prone by encapsulating the map within a function: function($k) {$M(collation-key($k, $collation))}.
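A complete query using this encapsulation could be written as below. It is a minimal sketch that reuses the map and collation from the examples above; the variable name $lookup is introduced here for illustration, and the collation URI is assumed to be supported by the processor.

```xquery
xquery version "3.1";

let $C := 'http://www.w3.org/2013/collation/UCA?strength=primary'
(: Keys are stored as collation keys, never as raw strings. :)
let $M := map { collation-key("A", $C): 1, collation-key("B", $C): 2 }
(: $lookup (illustrative name) hides the key conversion from callers. :)
let $lookup := function($k as xs:string) { $M(collation-key($k, $C)) }
return $lookup("a")
```

Callers pass plain strings to $lookup and never handle collation keys themselves, so the conversion cannot be forgotten on retrieval.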