SWI-Prolog -- Literal matching and indexing

3.3.8 Literal matching and indexing

Literal values are ordered and indexed using a skip list. The aim of this index is threefold.

- Unlike hash tables, ordered structures such as binary trees and skip lists allow efficient prefix and range matching. Prefix matching is useful in interactive applications, for example to provide auto-completion feedback while the user is typing.
- Because we keep a table of unique literals, we can generate creation and destruction events for them (see rdf_monitor/2). These events can be used to maintain additional indexes on literals, such as a ‘by word’ index. See library(semweb/litindex).
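
To illustrate why an ordered index supports prefix matching where a hash table does not, here is a minimal skip list sketch in Python. It is not the rdf_db implementation, which is written in C. The class and method names are hypothetical, keys are compared by raw code point rather than by the case-folded order described below, and the random generator is seeded only to make runs reproducible. A prefix query costs one logarithmic seek to the first key >= Prefix, followed by a walk along the bottom level for as long as keys still start with Prefix.

```python
import random
from typing import Iterator, Optional


class _Node:
    __slots__ = ("key", "forward")

    def __init__(self, key: Optional[str], level: int):
        self.key = key                  # None only for the head sentinel
        self.forward = [None] * level   # successor pointer per level


class SkipList:
    """Ordered set of unique strings (illustrative sketch only)."""

    MAX_LEVEL = 16
    P = 0.5  # probability of promoting a node one level up

    def __init__(self, seed: int = 0):
        self._rng = random.Random(seed)
        self._head = _Node(None, self.MAX_LEVEL)
        self._level = 1

    def _random_level(self) -> int:
        level = 1
        while level < self.MAX_LEVEL and self._rng.random() < self.P:
            level += 1
        return level

    def _find_path(self, key: str) -> list:
        """Per level, return the last node whose key is < key."""
        update = [self._head] * self.MAX_LEVEL
        node = self._head
        for i in range(self._level - 1, -1, -1):
            nxt = node.forward[i]
            while nxt is not None and nxt.key < key:
                node = nxt
                nxt = node.forward[i]
            update[i] = node
        return update

    def insert(self, key: str) -> bool:
        """Insert key; return False if it is already present (keys are unique)."""
        update = self._find_path(key)
        nxt = update[0].forward[0]
        if nxt is not None and nxt.key == key:
            return False
        level = self._random_level()
        # Levels above the current height already point at the head in `update`.
        self._level = max(self._level, level)
        node = _Node(key, level)
        for i in range(level):
            node.forward[i] = update[i].forward[i]
            update[i].forward[i] = node
        return True

    def prefix(self, p: str) -> Iterator[str]:
        """Yield keys starting with p in order: one seek, then a level-0 walk."""
        node = self._find_path(p)[0].forward[0]
        while node is not None and node.key.startswith(p):
            yield node.key
            node = node.forward[0]
```

Because all keys sharing a prefix are contiguous in sorted order, the walk stops at the first non-matching key, so the cost is the seek plus the number of matches. A hash table scatters those keys and would need a full scan.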

As string literal matching is most frequently used for searching, matching is case-insensitive and ignores diacritics. Case mapping and diacritic removal are based on Unicode character properties and are independent of the current locale. Case conversion uses the ‘simple uppercase mapping’ defined by Unicode, and diacritic removal uses the ‘decomposition type’. This approach is lightweight, but somewhat simplistic for some languages. The tables are generated for Unicode characters up to 0x7fff. For more information, see the source code of the mapping-table generator unicode_map.pl, available in the sources of this package.
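
The following Python sketch models the case mapping and diacritic removal described above using the standard unicodedata module. It approximates the behaviour under stated assumptions and is not the table produced by unicode_map.pl. Diacritics are removed by following canonical decompositions down to their first (base) character. The simple uppercase mapping is approximated by leaving a character unchanged when Python's full uppercase mapping would expand it to several characters, as happens for ‘ß’. Characters above 0x7fff are left untouched, mirroring the table limit stated above. The names fold, fold_char, strip_diacritic and simple_upper are hypothetical.

```python
import unicodedata

MAX_MAPPED = 0x7FFF  # per the text, the mapping tables stop at 0x7fff


def strip_diacritic(ch: str) -> str:
    """Follow canonical decompositions to the first (base) character."""
    while True:
        d = unicodedata.decomposition(ch)
        # Empty string: no decomposition. '<tag> ...': compatibility
        # decomposition, which this sketch deliberately ignores.
        if not d or d.startswith("<"):
            return ch
        ch = chr(int(d.split()[0], 16))


def simple_upper(ch: str) -> str:
    """Approximate Unicode's simple uppercase mapping (one char in, one out)."""
    up = ch.upper()  # Python applies the *full* mapping, e.g. 'ß' -> 'SS'
    return up if len(up) == 1 else ch


def fold_char(ch: str) -> str:
    if ord(ch) > MAX_MAPPED:
        return ch
    return simple_upper(strip_diacritic(ch))


def fold(text: str) -> str:
    """Key under which two strings match case- and diacritic-insensitively."""
    return "".join(fold_char(c) for c in text)
```

Under this model two strings match when their folded forms are equal, so "Café", "CAFE" and "cafe" all fold to "CAFE", while "straße" folds to "STRAßE" because ‘ß’ has no single-character uppercase.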

Currently, the total order of literals is first based on the type of literal, using the ordering numeric < string < term. Numeric values (integers and floats) are ordered by value; if an integer and a float represent the same value, the integer comes first. Strings are sorted alphabetically after case mapping and diacritic removal, as described above. If two strings compare equal under this mapping, uppercase precedes lowercase and diacritics are ordered on their Unicode value. If they still compare equal, literals without a qualifier precede literals with a type qualifier, which in turn precede literals with a language qualifier. Literals with the same kind of qualifier (both type or both language) are sorted alphabetically on the qualifier.
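
The ordering rules above can be written as a Python sort key. This is a sketch under assumptions, not the C comparison routine used by rdf_db. The Lit class and its ('type', IRI) / ('lang', tag) qualifier encoding are hypothetical. The uppercase-before-lowercase and diacritic tie-break is modelled by comparing the original strings by code point, which puts ‘E’ before ‘e’ before ‘é’. Terms receive a placeholder ordering via repr(). A simplified model of the case mapping and diacritic removal is repeated so the block runs on its own.

```python
import unicodedata
from dataclasses import dataclass
from typing import Optional, Tuple


def _fold_char(ch: str) -> str:
    """Simplified model: strip diacritics, then a one-character uppercase."""
    if ord(ch) > 0x7FFF:  # the text limits the mapping tables to 0x7fff
        return ch
    while True:  # follow canonical decompositions down to the base character
        d = unicodedata.decomposition(ch)
        if not d or d.startswith("<"):
            break
        ch = chr(int(d.split()[0], 16))
    up = ch.upper()
    return up if len(up) == 1 else ch


def fold(text: str) -> str:
    return "".join(_fold_char(c) for c in text)


@dataclass(frozen=True)
class Lit:
    """Hypothetical literal: a value plus an optional ('type', IRI) or ('lang', tag)."""
    value: object
    qualifier: Optional[Tuple[str, str]] = None


_QUALIFIER_RANK = {None: 0, "type": 1, "lang": 2}  # none < type < language


def sort_key(lit: Lit) -> tuple:
    kind, name = lit.qualifier if lit.qualifier else (None, "")
    qual = (_QUALIFIER_RANK[kind], name)  # same kind: alphabetical on qualifier
    v = lit.value
    if isinstance(v, (int, float)) and not isinstance(v, bool):
        # Rank 0, numeric: by value; at equal value the integer (False) comes first.
        return (0, v, isinstance(v, float), qual)
    if isinstance(v, str):
        # Rank 1, string: folded text first, raw code points as the tie-break.
        return (1, fold(v), v, qual)
    # Rank 2, term: repr() is a placeholder, not Prolog's standard order of terms.
    return (2, repr(v), "", qual)
```

For example, sorting [Lit("b"), Lit(2.0), Lit("B"), Lit(2)] with key=sort_key yields Lit(2), Lit(2.0), Lit("B"), Lit("b"). The numbers come first, and "B" and "b" fold to the same key, so the code-point tie-break decides between them.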

The ordered index is used for indexed execution of literal(prefix(Prefix), Literal), as well as of literal(like(Like), Literal) if Like does not start with a ‘*’. Note that results of queries that use this index are returned in alphabetical order.
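
The sketch below shows how one ordered index can serve both query types. It assumes that ‘*’ in a like pattern matches any (possibly empty) sequence of characters, and it uses a sorted Python list searched with bisect as a stand-in for the skip list. LiteralIndex and its methods are hypothetical names. For a like pattern, the text before the first ‘*’ serves as a prefix that restricts the scan, and the remaining candidates are filtered with a case- and diacritic-insensitive wildcard match. A pattern starting with ‘*’ yields an empty prefix, so the scan degenerates into a walk over the whole index, which is why only patterns without a leading ‘*’ benefit from the index.

```python
import bisect
import re
import unicodedata
from typing import Iterable, List, Tuple


def _fold_char(ch: str) -> str:
    """Simplified model of the case mapping and diacritic removal."""
    if ord(ch) > 0x7FFF:
        return ch
    while True:
        d = unicodedata.decomposition(ch)
        if not d or d.startswith("<"):
            break
        ch = chr(int(d.split()[0], 16))
    up = ch.upper()
    return up if len(up) == 1 else ch


def fold(text: str) -> str:
    return "".join(_fold_char(c) for c in text)


class LiteralIndex:
    """Hypothetical ordered index; a sorted list stands in for the skip list."""

    def __init__(self, texts: Iterable[str]):
        # Unique literals, kept sorted on (folded key, original text).
        self._entries: List[Tuple[str, str]] = sorted({(fold(t), t) for t in texts})

    def _scan(self, key: str) -> List[str]:
        """Seek to the first folded key >= key, then walk while it starts with key."""
        i = bisect.bisect_left(self._entries, (key,))
        out = []
        while i < len(self._entries) and self._entries[i][0].startswith(key):
            out.append(self._entries[i][1])
            i += 1
        return out

    def prefix(self, prefix: str) -> List[str]:
        return self._scan(fold(prefix))

    def like(self, pattern: str) -> List[str]:
        folded = fold(pattern)
        # Assumption: '*' matches any (possibly empty) sequence of characters.
        regex = re.compile(".*".join(re.escape(p) for p in folded.split("*")), re.DOTALL)
        # The text before the first '*' restricts the scan; a leading '*'
        # gives an empty prefix, i.e. a scan of the whole index.
        candidates = self._scan(folded.split("*", 1)[0])
        return [t for t in candidates if regex.fullmatch(fold(t))]
```

With the literals "Amsterdam", "amstel", "Rotterdam" and "Ämter", prefix("ams") returns ["amstel", "Amsterdam"] and like("am*er*") returns ["Amsterdam", "Ämter"], both in index order, matching the note above that indexed queries return results alphabetically.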