With the introduction of strings as a Prolog data type, there are
three main ways to represent text: using strings, using atoms and using
lists of character codes. As a fourth way, one may also use lists of
chars. This section explains which representation to choose for which purpose. Both
strings and atoms are atomic objects: you can only look inside
them using dedicated predicates, while lists of character codes or chars
are compound terms (ordinary Prolog lists) whose elements must follow a
convention: every element is a character code, or every element is a
one-character atom.
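The sketch below makes the atomic/compound distinction concrete by classifying a term as one of the four representations. It assumes SWI-Prolog 7 with its default flags (double_quotes=string, back_quotes=codes). Under those flags the empty list [] is not an atom. text_kind/2, is_code/1 and is_char/1 are illustrative names, not library predicates.

```prolog
% Classify a term as one of the four text representations.
% Assumes SWI-Prolog 7 defaults: double_quotes=string, back_quotes=codes.
% text_kind/2, is_code/1 and is_char/1 are illustrative names.
text_kind(T, atom)   :- atom(T), !.
text_kind(T, string) :- string(T), !.
text_kind(T, codes)  :- is_list(T), T \== [], maplist(is_code, T), !.
text_kind(T, chars)  :- is_list(T), T \== [], maplist(is_char, T).
% The empty list fails every clause: it is both an empty code list
% and an empty char list, so it cannot be classified.

% A character code is an integer in the Unicode code point range.
is_code(C) :- integer(C), between(0, 0x10FFFF, C).

% A char is an atom of length one.
is_char(C) :- atom(C), atom_length(C, 1).
```

For example, text_kind(`hello`, K) yields K = codes. Meanwhile atomic("hello") succeeds and compound(`hello`) succeeds, because the back-quoted text is an ordinary list.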
- Lists of character codes
- are what you need if you want to parse text using Prolog grammar
rules (DCGs, see phrase/3).
Most of the text reading predicates (e.g., read_line_to_codes/2)
return a list of character codes because most applications need to parse
these lines before the data can be processed. As said above, the back-quoted
text notation (`hello`) can be used to easily specify
a list of character codes. The 0'c notation can be used to
specify a single character code.
- Atoms
- are identifiers. They are typically used where identity comparison
is the main operation and the text is typically neither composed nor
taken apart. Examples are RDF resources (URIs that identify
something), system identifiers (e.g., 'Boeing 747'), but
also individual words in a natural language processing system. They are
also used where other languages would use enumerated types,
such as the names of the days of the week. Unlike enumerated types, Prolog
atoms do not form a fixed set and the same atom can represent different
things in different contexts.
- Strings
- typically represent text that is processed as a unit most of the time,
but which is not an identifier for something. Format specifications for
format/3 are a good example. Another example is a descriptive text provided
in an application. Strings may be composed and decomposed using, e.g.,
string_concat/3 and sub_string/5, converted to a list of codes for parsing
using string_codes/2, or created from the codes produced by a generative
grammar rule, again using string_codes/2.
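The entry on lists of character codes can be illustrated with a small grammar that parses a line read as codes. This is a sketch assuming SWI-Prolog 7. digits//1, digit//1 and first_line_number/2 are hypothetical names. open_string/2 is used only so that in-memory text can stand in for a file read with read_line_to_codes/2.

```prolog
:- use_module(library(readutil)).   % read_line_to_codes/2

% digits//1 (hypothetical): one or more decimal digit codes.
% The 0'c notation denotes a single character code.
digits([D|Ds]) --> digit(D), digits(Ds).
digits([D])    --> digit(D).

digit(D) --> [D], { between(0'0, 0'9, D) }.

% first_line_number/2 (hypothetical): read the first line of Text as a
% list of codes, parse it with digits//1 and convert it to an integer.
% End of input (read_line_to_codes/2 returns -1) is not handled here.
first_line_number(Text, N) :-
    setup_call_cleanup(
        open_string(Text, In),
        read_line_to_codes(In, Line),
        close(In)),
    phrase(digits(Ds), Line),
    number_codes(N, Ds).
```

For example, first_line_number("42\nmore text", N) gives N = 42. In contrast, phrase(digits(_), `12a`) fails because the trailing code is not a digit.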
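The atoms entry can be illustrated with a sketch in which atoms play the role of an enumerated type and are compared by identity. day_kind/2 and same_resource/2 are illustrative names. In this sketch, the facts define which atoms are meaningful, not a closed type declaration.

```prolog
% Day names as atoms, used where other languages would use an
% enumerated type. day_kind/2 is an illustrative name.
day_kind(monday,    weekday).
day_kind(tuesday,   weekday).
day_kind(wednesday, weekday).
day_kind(thursday,  weekday).
day_kind(friday,    weekday).
day_kind(saturday,  weekend).
day_kind(sunday,    weekend).

% The set of atoms is open: an atom such as funday can still exist;
% day_kind/2 simply has no fact for it.

% Identity comparison: == succeeds only if both are the same atom.
% same_resource/2 is an illustrative name.
same_resource(A, B) :- atom(A), atom(B), A == B.
```

Looking inside an atom requires dedicated predicates. For example, atom_length('Boeing 747', L) gives L = 10.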
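The strings entry can be illustrated with a sketch that exercises the predicates it mentions. It assumes SWI-Prolog 7 defaults, under which double-quoted text is a string and string literals inside a DCG body are matched as code lists. items_message/2, join_with_comma/3, after_comma/2, greeting//1, code_seq//1 and greeting_string/2 are hypothetical names.

```prolog
% A string used as a format specification for format/3.
items_message(N, Message) :-
    format(string(Message), "~d items processed", [N]).

% Compose: concatenate two strings with ", " in between.
join_with_comma(A, B, Joined) :-
    string_concat(A, ", ", Tmp),
    string_concat(Tmp, B, Joined).

% Decompose: the part of String after the first ", ".
after_comma(String, Rest) :-
    sub_string(String, _, 2, After, ", "),
    !,
    sub_string(String, _, After, 0, Rest).

% A generative grammar rule that produces a list of codes.
greeting(Name) --> "Hello, ", { string_codes(Name, Cs) }, code_seq(Cs), "!".

code_seq([])     --> [].
code_seq([C|Cs]) --> [C], code_seq(Cs).

% Turn the generated codes into a single string.
greeting_string(Name, String) :-
    phrase(greeting(Name), Codes),
    string_codes(String, Codes).
```

For example, greeting_string("Ann", S) gives S = "Hello, Ann!". The grammar generates a list of codes, and string_codes/2 turns it into a string. Going the other way, string_codes("42", Cs) yields a code list that a grammar can parse.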