The C++ API remains a work in progress.
SWI-Prolog string handling has evolved over time. The functions that
create atoms or strings using char*
or wchar_t*
are “old school” ; similarly with functions that get the
string as
char*
or wchar_t*
. The PL_get,unify,put_[nw]chars()
family is more friendly when it comes to different input, output,
encoding and exception handling.
Roughly, the modern API is PL_get_nchars(), PL_unify_chars() and PL_put_chars() on terms. There is only half of the API for atoms as PL_new_atom_mbchars() and PL-atom_mbchars(), which take an encoding, length and char*.
For return values, char*
is dangerous because it can
point to local or stack memory. For this reason, wherever possible, the
C++ API returns a std::string
, which contains a copy of the
string. This can be slightly less efficient that returning a
char*
, but it avoids some subtle and pervasive bugs that
even address sanitizers can't detect.23If
we wish to minimize the overhead of passing strings, this can be done by
passing in a pointer to a string rather than returning a string value;
but this is more cumbersome and modern compilers can often optimize the
code to avoid copying the return value.
Some functions require allocating string space using PL_STRINGS_MARK().
The PlStringBuffers
class provides a RAII wrapper
that ensures the matching PL_STRINGS_RELEASE() is done. The PlAtom
or PlTerm
member functions that need the string buffer use PlStringBuffers
,
and then copy the resulting string to a std::string
value.
The C++ API has functions such as PlTerm::get_nchars()
that use
PlStringBuffers
and then copy the result to a
std::string
result, so the programmer often doesn't need to
use PlStringBuffers
.
BUF_STACK
. This isn't needed if you use a method such as
PlTerm::as_string(), but
is needed for calling certain PL_*() or Plx_*() wrapped functions.
The constructor calls PL_STRINGS_MARK() and the destructor calls PL_STRINGS_RELEASE(). Here is an example of its use, for writing an atom to a stream, using Plx_atom_wchars(), which must be called within a strings buffer:
PREDICATE(w_atom_cpp, 2) { auto stream(A1), term(A2); PlStream strm(stream, STIO_OUTPUT); PlStringBuffers _string_buffers; const pl_wchar_t *sa = Plx_atom_wchars(term.as_atom().unwrap(), nullptr); strm.printfX("/%Ws/", sa); return true; }
PlStream
can be used to get a stream from a Prolog term,
or to lock the stream so that other threads cannot interleave their
output. With either usage, PlStream
is a RAII
class that ensure the matchin PL_release_stream() is done, and
also handles some subtle problems with C++ exceptions.
The methods are:
PlStream
object to an invalid stream (see PlStream::check_stream()).IOSTREAM*
, PlStream
is implicitly converted to IOSTREAM*
.PlStream
object contains a valid stream and throws an
exception if it doesn't. This is used to ensure that PlStream::release()
hasn't been called.
Most of the stream I/O functions have corresponding methods in PlStream
.
For example, Sfprintf() corresponds to
PlStream::printf(). PlStream::seek() and PlStream::tell()
call
Sseek64() and Stell64() instead of long
(they
are also deprecated: PlStream::seek64() and PlStream::tell64()
are preferred).
The C interface to stream I/O doesn't raise a Prolog error when
there's a stream error (typically indicated by a -1 return code).
Instead, the error sets a flag on the stream and
PL_release_stream() creates the error term. The
PlStream
destructor calls PL_release_stream(); but
it's a fatal error in C++ to raise an exception in a destructor if the
destructor is invoked by stack-unwinding due to another exception,
including the pseudo-exceptions PlFail
and
PlExceptionFail
.
To get around this, the various stream I/O functions have wrapper
methods in the PlStream
class that check for an error and
call PlStream::release()
to create the Prolog error, which is thrown as a C++ error.
The destructor calls PlStream::release(), which throws a C++ exception if there is a stream error. This is outside the destructor, so it is safe - the destructor checks if the stream has been released and does nothing in that situation.
The following two code examples do essentially the same thing:
PREDICATE(name_arity, 1) { PlStream strm(Scurrent_output); strm.printf("name = %s, arity = %zd\n", A1.name().as_string().c_str(), A1.arity()); return true; }
PREDICATE(name_arity, 1) { PlStream strm(Scurrent_output); try { strm.printf("name = %s, arity = %zd\n", A1.name().as_string().c_str(), A1.arity()); } PREDICATE_CATCH({strm.release(); return false;}) return true; }
If you write the code as follows, using Sfprintf() directly, it is possible that a fatal exception will be raised on an I/O error:
PREDICATE(name_arity, 1) { PlStream strm(Scurrent_output); Sfprintf(strm, "name = %s, arity = %zd\n", A1.name().as_string().c_str(), A1.arity()); return true; // WARNING: the PlStream destructor might throw a C++ // exception on stack unwinding, giving a fatal // fatal runtime exception. }
If you don't use these, and want to throw an exception if there's an
error, the following code works because PlStream
(and the
underlying PL_acquire_stream()) can be called recursively:
{ PlStream strm(...); strm.release(); }
Many of the “opaque object handles” , such as atom_t
,
term_t
, and functor_t
are integers.24Typically uintptr_t
values, which the C standard defines as “an unsigned integer type
with the property that any valid pointer to void can be converted to
this type, then converted back to pointer to void, and the result will
compare equal to the original pointer.’ As such,
there is no compile-time detection of passing the wrong handle to a
function.
This leads to a problem with classes such as PlTerm
-
C++ overloading cannot be used to distinguish, for example, creating a
term from an atom versus creating a term from an integer. There are a
number of possible solutions, including:
struct
instead of an
integer.It is impractical to change the C code, both because of the amount of edits that would be required and also because of the possibility that the changes would inhibit some optimizations.
There isn't much difference between subclasses versus tags; but as a matter of design, it's better to specify things as constants than as (theoretically) variables, so the decision was to use subclasses.