1 library(unicode): Unicode string handling

Availability: :- use_module(library(unicode)). (can be autoloaded)
[det] unicode_map(+In, -Out, +Options)
Perform a Unicode mapping on In, returning Out. Options is a list of the flags below; flags can be combined, subject to the constraints noted for individual flags. A call is roughly equivalent to utf8proc_map(In, Options) in the C API.
stable
Respect Unicode versioning stability: the result does not depend on which (recent) version of Unicode is in use.
compat
Use compatibility decomposition (i.e. formatting information is lost).
compose
Produce a composed result (NFC, or NFKC when compat is also given).
decompose
Produce a decomposed result (NFD, or NFKD when compat is also given).
ignore
Strip "default ignorable" characters (e.g. soft hyphen, zero-width space).
rejectna
Raise an error instead of returning output when the input contains unassigned code points.
nlf2ls
Convert all NLF-sequences (LF, CRLF, CR, NEL) to U+2028 LINE SEPARATOR.
nlf2ps
Convert all NLF-sequences to U+2029 PARAGRAPH SEPARATOR.
nlf2lf
Convert all NLF-sequences to U+000A LINE FEED.
stripcc
Strip or convert control characters. NLF-sequences become a space unless one of the NLF-conversion flags (nlf2ls, nlf2ps, nlf2lf) is set. When stripcc is set, HT and FF are treated as NLF-sequences. All other control characters are removed.
casefold
Apply Unicode case folding (for caseless comparison).
charbound
Insert a 0xFF byte at the beginning of every grapheme cluster (see UAX #29). The result can be split on 0xFF to recover the individual graphemes; atom_graphemes/2 wraps this pattern.
lump
Normalise typographic variants to their ASCII equivalents (see module header for the full list). Combined with nlf2lf, paragraph and line separators become U+000A as well.
stripmark
Strip all combining marks (non-spacing, spacing, enclosing). Must be combined with compose or decompose.
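The flags above can be tried out with a small sketch like the following. It assumes unicode_map/3 behaves as described in this section; map_codes/3 and demo/0 are hypothetical helpers introduced only for illustration and are not part of the library. Working on lists of code points keeps the example independent of source-file and terminal encoding.

```prolog
:- use_module(library(unicode)).

% map_codes(+CodesIn, +Options, -CodesOut)
% Hypothetical helper (not part of library(unicode)): apply
% unicode_map/3 to a list of code points and return the result
% as a list of code points.
map_codes(CodesIn, Options, CodesOut) :-
    atom_codes(In, CodesIn),
    unicode_map(In, Out, Options),
    atom_codes(Out, CodesOut).      % works whether Out is an atom or a string

demo :-
    % decompose: U+00E9 (e with acute) into U+0065 U+0301
    map_codes([0xE9], [decompose], Decomposed),
    % compose: U+0065 U+0301 back into U+00E9
    map_codes([0x65, 0x301], [compose], Composed),
    % compat + compose (NFKC): U+FB01 (fi ligature) into "f", "i"
    map_codes([0xFB01], [compat, compose], Compat),
    % casefold: "A" into "a"
    map_codes([0'A], [casefold], Folded),
    % stripmark needs compose or decompose: U+00E9 into plain "e"
    map_codes([0xE9], [decompose, stripmark], Stripped),
    format("decompose: ~w~ncompose:   ~w~ncompat:    ~w~ncasefold:  ~w~nstripmark: ~w~n",
           [Decomposed, Composed, Compat, Folded, Stripped]).
```

Running `?- demo.` prints the resulting code point lists, which makes it easy to compare the effect of each flag without depending on how the console renders combining characters.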