2.18.1 Wide character encodings on streams

2.18.1.1 BOM: Byte Order Mark

From section 2.18.1, you may have got the impression that text files are complicated. This section deals with a related topic that often makes life easier for the user, but gives the programmer another thing to worry about. A BOM (Byte Order Mark) is a technique for identifying Unicode text files as well as the encoding they use. Such files start with the Unicode character 0xFEFF, a zero-width non-breaking space. Once encoded, this character yields a byte sequence that is unlikely to appear at the start of a non-Unicode file and that differs for each Unicode encoding form, so it identifies the format uniquely. As it is a zero-width blank, it does not even produce any output.

This solves all problems, or does it? Some formats start off as US-ASCII and may contain an encoding mark to switch to UTF-8, such as the encoding="UTF-8" declaration in an XML header. Such formats often explicitly forbid the use of a UTF-8 BOM. In other cases there is additional information revealing the encoding, making a BOM redundant or even illegal.
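You can check that the BOM tells the Unicode formats apart by encoding U+FEFF in each of them. The Python sketch below is for illustration only; it is not part of SWI-Prolog, and the name `bom_bytes` is hypothetical. It prints the leading bytes that each format produces.

```python
# The BOM is the single character U+FEFF (ZERO WIDTH NO-BREAK SPACE).
BOM = "\ufeff"

# Explicit-endian codec names are used on purpose: Python's plain "utf-16"
# and "utf-32" codecs would prepend a BOM of their own.
ENCODINGS = ["utf-8", "utf-16-be", "utf-16-le", "utf-32-be", "utf-32-le"]


def bom_bytes(encoding: str) -> bytes:
    """Return the byte sequence U+FEFF becomes in the given encoding."""
    return BOM.encode(encoding)


if __name__ == "__main__":
    for enc in ENCODINGS:
        # bytes.hex(sep) requires Python 3.8 or later.
        print(f"{enc:10} {bom_bytes(enc).hex(' ')}")
```

The UTF-32-LE mark (`ff fe 00 00`) begins with the UTF-16-LE mark (`ff fe`). A detector therefore has to test the longer patterns before the shorter ones.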

The BOM is handled by SWI-Prolog's open/4 predicate. By default, text files are probed for a BOM when opened for reading. If a BOM is found, the encoding is set accordingly and the property bom(true) is available through stream_property/2. When opening a file for writing, you can request a BOM by passing the option bom(true) to open/4.
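The two cases described above are probing for a BOM on read and writing one only on request. The sketch below models them at the byte level. It is Python, not SWI-Prolog's implementation, and it does not use SWI-Prolog's API. The helpers `read_text_probing` and `write_text` are hypothetical, and the assumed fallback encoding (UTF-8) is an illustrative choice rather than SWI-Prolog's default.

```python
import os
import tempfile

# Known BOMs, longest first: the UTF-32-LE BOM (FF FE 00 00) begins with
# the UTF-16-LE BOM (FF FE), so the 4-byte patterns must be tested first.
BOMS = [
    (b"\x00\x00\xfe\xff", "utf-32-be"),
    (b"\xff\xfe\x00\x00", "utf-32-le"),
    (b"\xef\xbb\xbf", "utf-8"),
    (b"\xfe\xff", "utf-16-be"),
    (b"\xff\xfe", "utf-16-le"),
]


def read_text_probing(path, default_encoding="utf-8"):
    """Hypothetical helper: probe for a BOM, then decode the rest.

    Returns (text, encoding, bom_found). The BOM is stripped from text.
    """
    with open(path, "rb") as f:
        data = f.read()
    for bom, encoding in BOMS:
        if data.startswith(bom):
            return data[len(bom):].decode(encoding), encoding, True
    # No BOM: fall back to the assumed default encoding.
    return data.decode(default_encoding), default_encoding, False


def write_text(path, text, encoding="utf-8", bom=False):
    """Hypothetical helper: write text, preceded by a BOM only if asked.

    Use explicit-endian names such as "utf-16-le"; Python's plain "utf-16"
    codec would add a second BOM of its own.
    """
    with open(path, "wb") as f:
        if bom:
            f.write("\ufeff".encode(encoding))
        f.write(text.encode(encoding))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.txt")
        write_text(path, "hello\n", encoding="utf-16-le", bom=True)
        print(read_text_probing(path))  # ('hello\n', 'utf-16-le', True)
```

As with bom(true) in open/4, the writer adds a BOM only when explicitly requested. This matters for formats that forbid one.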