Literal values are ordered and indexed using a skip list. The aim of this index is threefold.
library(semweb/litindex).
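The following minimal Python sketch illustrates why a skip list suits an ordered literal index. Keys live in a sorted linked list at level 0, and randomly promoted nodes form sparser "express lanes" above it. Locating the first key ≥ a search key therefore takes expected logarithmic time, and an ordered scan from there simply follows level 0. This is not rdf_db's own implementation; the class and method names (`SkipList`, `insert`, `iter_from`, `prefix_scan`) are hypothetical.

```python
import random


class _Node:
    __slots__ = ("key", "next")

    def __init__(self, key, level):
        self.key = key
        self.next = [None] * level  # next[i] is the successor on level i


class SkipList:
    """Minimal ordered skip list (illustrative sketch, not rdf_db's code)."""

    MAX_LEVEL = 16
    P = 0.5  # probability of promoting a node one level up

    def __init__(self, seed=None):
        self.head = _Node(None, self.MAX_LEVEL)  # sentinel, key never compared
        self.level = 1
        self.rng = random.Random(seed)

    def _random_level(self):
        lvl = 1
        while lvl < self.MAX_LEVEL and self.rng.random() < self.P:
            lvl += 1
        return lvl

    def insert(self, key):
        # update[i] is the last node on level i whose key is < key.
        update = [self.head] * self.MAX_LEVEL
        node = self.head
        for i in range(self.level - 1, -1, -1):
            while node.next[i] is not None and node.next[i].key < key:
                node = node.next[i]
            update[i] = node
        lvl = self._random_level()
        self.level = max(self.level, lvl)
        new = _Node(key, lvl)
        for i in range(lvl):
            new.next[i] = update[i].next[i]
            update[i].next[i] = new

    def iter_from(self, key):
        """Yield all keys >= key in ascending order."""
        node = self.head
        for i in range(self.level - 1, -1, -1):
            while node.next[i] is not None and node.next[i].key < key:
                node = node.next[i]
        node = node.next[0]
        while node is not None:
            yield node.key
            node = node.next[0]

    def prefix_scan(self, prefix):
        """Yield string keys starting with prefix; stop at the first non-match."""
        for key in self.iter_from(prefix):
            if not key.startswith(prefix):
                break
            yield key
```

For example, after inserting `"pear"`, `"apple"` and `"apricot"`, `list(sl.prefix_scan("ap"))` gives `["apple", "apricot"]`. Matching keys are contiguous in sorted order, so the scan can stop at the first key that no longer matches.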
As string literal matching is most frequently used for searching
purposes, the match is executed case-insensitively and after removal of
diacritics. Case matching and diacritic removal are based on Unicode
character properties and are independent of the current locale. Case
conversion is based on the ‘simple uppercase mapping’ defined
by Unicode, and diacritic removal on the ‘decomposition type’.
The approach is lightweight, but somewhat simple-minded for some
languages. The tables are generated for Unicode characters up to 0x7fff.
For more information, please check the source code of the mapping-table
generator unicode_map.pl, available in the sources of this package.
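The folding described above can be approximated in Python with the standard `unicodedata` module. This is a sketch, not the package's generated tables. It assumes that canonical (NFD) decomposition followed by dropping combining marks is a fair stand-in for removal based on the ‘decomposition type’. It approximates the simple uppercase mapping by accepting only one-to-one results of `str.upper()`, and it leaves characters above 0x7fff untouched to mirror the table limit. The function name `match_key` is hypothetical.

```python
import unicodedata


def match_key(text: str) -> str:
    """Hypothetical fold for case- and diacritic-insensitive matching."""
    out = []
    for ch in text:
        # Mirror the documented table limit: characters above 0x7fff are not mapped.
        if ord(ch) > 0x7FFF:
            out.append(ch)
            continue
        # Diacritic removal (approximation): decompose, then drop combining marks.
        for base in unicodedata.normalize("NFD", ch):
            if unicodedata.combining(base):
                continue
            # Approximate the *simple* uppercase mapping: str.upper() applies full
            # mappings (e.g. 'ß' -> 'SS'), so keep only one-to-one results.
            up = base.upper()
            out.append(up if len(up) == 1 else base)
    return "".join(out)
```

With this fold, `"Café"`, `"cafe"` and `"CAFE"` all map to `"CAFE"`, while `"straße"` becomes `"STRAßE"` because `ß` has no simple uppercase mapping.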
Currently the total order of literals is first based on the type of literal, using the ordering numeric < string < term. Numeric values (integer and float) are ordered by value; if an integer and a float represent the same value, the integer precedes the float. Strings are sorted alphabetically after case mapping and diacritic removal as described above. If they compare equal, uppercase precedes lowercase and diacritics are ordered by their Unicode value. If they still compare equal, literals without any qualifier precede literals with a type qualifier, which in turn precede literals with a language qualifier. Literals with the same kind of qualifier (both type or both language) are sorted alphabetically on the qualifier.
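The ordering rules above can be summarised as a Python sort key. This is a hedged model, not the library's comparison function, and it rests on several assumptions. The `Lit` class, `order_key` and `match_key` are hypothetical. Qualifiers are modelled as `None`, `("type", T)` or `("lang", L)`. The text does not pin down the exact case/diacritic tie-break, so the sketch compares uppercase-before-lowercase position by position and then raw code points. Terms are ordered by their `repr()` only to make the key total.

```python
import unicodedata
from dataclasses import dataclass
from typing import Any, Optional, Tuple


def match_key(text: str) -> str:
    # Approximate case folding + diacritic removal (characters <= 0x7fff only).
    out = []
    for ch in text:
        if ord(ch) > 0x7FFF:
            out.append(ch)
            continue
        for b in unicodedata.normalize("NFD", ch):
            if not unicodedata.combining(b):
                up = b.upper()
                out.append(up if len(up) == 1 else b)
    return "".join(out)


@dataclass(frozen=True)
class Lit:
    """Hypothetical literal: a value plus an optional ("type"|"lang", name) qualifier."""
    value: Any
    qualifier: Optional[Tuple[str, str]] = None


QUALIFIER_RANK = {"type": 1, "lang": 2}  # no qualifier ranks 0


def qualifier_key(q):
    # none < type < lang; same kind of qualifier: alphabetical on the name
    return (0, "") if q is None else (QUALIFIER_RANK[q[0]], q[1])


def order_key(lit: Lit):
    v, q = lit.value, qualifier_key(lit.qualifier)
    if isinstance(v, (int, float)) and not isinstance(v, bool):
        # numeric < string < term; for equal values the integer comes first
        return (0, v, 0 if isinstance(v, int) else 1, q)
    if isinstance(v, str):
        case = tuple(0 if c.isupper() else 1 for c in v)  # assumed: uppercase first
        return (1, match_key(v), case, tuple(map(ord, v)), q)
    return (2, repr(v), q)  # assumption: terms ordered by repr()
```

Sorting with `sorted(literals, key=order_key)` puts `Lit(1)` before `Lit(1.0)`, and a plain `Lit("cafe")` before the same string with a type qualifier, which in turn comes before the one with a language qualifier.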
The ordered index is used for indexed execution of
literal(prefix(Prefix), Literal) as well as literal(like(Like), Literal)
if Like does not start with a ‘*’. Note that results
of queries that use this index are returned in alphabetical order.
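The following Python sketch shows why an ordered index helps these two queries. A sorted list searched with `bisect` stands in for the skip list, since both give ordered access from a start key. A prefix query seeks to the first folded key ≥ the folded prefix and scans forward until keys stop matching, so results come out in index order. For `like`, the sketch assumes that the fixed text before the first ‘*’ is used as such a prefix to narrow the scan, after which the full pattern is checked. A pattern starting with ‘*’ has no fixed prefix, so the whole index must be scanned. `LiteralIndex`, its methods and `match_key` are hypothetical, and `*` is assumed to be the only wildcard.

```python
import bisect
import re
import unicodedata


def match_key(text: str) -> str:
    """Approximate case folding + diacritic removal (characters <= 0x7fff only)."""
    out = []
    for ch in text:
        if ord(ch) > 0x7FFF:
            out.append(ch)
            continue
        for b in unicodedata.normalize("NFD", ch):
            if not unicodedata.combining(b):
                up = b.upper()
                out.append(up if len(up) == 1 else b)
    return "".join(out)


class LiteralIndex:
    """Hypothetical ordered index; a sorted list + bisect stands in for a skip list."""

    def __init__(self, literals):
        self.entries = sorted((match_key(s), s) for s in literals)
        self.keys = [k for k, _ in self.entries]

    def prefix(self, prefix):
        """Models literal(prefix(Prefix), L): seek, then scan while keys match."""
        p = match_key(prefix)
        i = bisect.bisect_left(self.keys, p)
        out = []
        while i < len(self.keys) and self.keys[i].startswith(p):
            out.append(self.entries[i][1])
            i += 1
        return out

    def like(self, pattern):
        """Models literal(like(Pattern), L), assuming '*' is the only wildcard."""
        parts = pattern.split("*")
        # The fixed text before the first '*' narrows the scan; a pattern that
        # starts with '*' yields an empty prefix, i.e. a full scan.
        candidates = self.prefix(parts[0])
        regex = re.compile(".*".join(re.escape(match_key(p)) for p in parts))
        return [s for s in candidates if regex.fullmatch(match_key(s))]
```

For instance, with the literals `"Amsterdam"`, `"amstel"`, `"Ämter"`, `"Berlin"` and `"ams"`, `prefix("ams")` returns `["ams", "amstel", "Amsterdam"]` in index order. `like("*ter*")` has to examine every entry before returning `["Amsterdam", "Ämter"]`.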