12.4.10 BLOBS: Using atoms to store arbitrary binary data

12.4.10.1 Defining a BLOB type

The type PL_blob_t represents a structure with the layout displayed below. The structure contains additional fields, at the position marked ..., for internal bookkeeping as well as future extensions.

typedef struct PL_blob_t
{ uintptr_t     magic;          /* PL_BLOB_MAGIC */
  uintptr_t     flags;          /* Bitwise or of PL_BLOB_* */
  const char *  name;           /* name of the type */
  int           (*release)(atom_t a);
  int           (*compare)(atom_t a, atom_t b);
  int           (*write)(IOSTREAM *s, atom_t a, int flags);
  void          (*acquire)(atom_t a);
  int           (*save)(atom_t a, IOSTREAM *s);
  atom_t        (*load)(IOSTREAM *s);
  ...
} PL_blob_t;

For each type, exactly one such structure should be allocated and must not be moved because the address of the structure determines the blob's "type". Its first field must be initialised to PL_BLOB_MAGIC. If a blob type is registered from a loadable object (shared object or DLL) the blob type must be deregistered using PL_unregister_blob_type() before the object may be released.

The flags field is a bitwise or of the following constants:

PL_BLOB_TEXT
If specified, the blob is assumed to contain text and is considered a normal Prolog atom. The (currently) two predefined blob types that represent atoms have this flag set. User-defined blobs may not specify this, even if they contain only text. Applications should not use the blob API to create normal text atoms or get access to the text represented by normal text atoms. Most applications should use PL_get_nchars() and PL_unify_chars() to get text from Prolog terms or create Prolog terms that represent text.
PL_BLOB_UNIQUE
If specified the system ensures that the blob-handle is a unique reference for a blob with the given type, length and content. If this flag is not specified, each lookup creates a new blob. Uniqueness is determined by comparing the bytes in the blobs unless PL_BLOB_NOCOPY is also specified, in which case the pointers are compared. Note that the lookup does not use the blob's compare function when testing for equality, but only tests the bytes; this means that terms from the recorded database or C++-style strings will typically not compare as equal when doing blob lookup.
PL_BLOB_NOCOPY
By default the content of the blob is copied. Using this flag the blob references the external data directly. The user must ensure the provided pointer is valid as long as the atom lives. If PL_BLOB_UNIQUE is also specified, uniqueness is determined by comparing the pointer rather than the data pointed at. PL_BLOB_UNIQUE|PL_BLOB_NOCOPY can be used to make a blob reference an arbitrary pointer where the pointer data may be reclaimed in the release() handler.
PL_BLOB_WCHAR
If PL_BLOB_TEXT is also set, then the text is made up of pl_wchar_t items and the blob's length is the number of bytes (that is, the number of characters times sizeof(pl_wchar_t)). As with PL_BLOB_TEXT, this flag should not be set in user-defined blobs.
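
The byte-wise uniqueness test described under PL_BLOB_UNIQUE can be illustrated without the Prolog headers. The sketch below is a stand-alone model, not SWI-Prolog code: struct name_ref, struct name_inline, same_bytes() and init_name_inline() are hypothetical names, and memcmp() merely stands in for the byte comparison that the lookup is described as performing. It shows why content that holds a pointer to its text does not compare as equal, while content that embeds the text does, provided unused bytes are zeroed.

```c
#include <string.h>

/* Hypothetical blob contents, for illustration only. */
struct name_ref                 /* holds a pointer to the text */
{ const char *text;
};

struct name_inline              /* embeds the text itself */
{ char text[16];
};

/* Model of a byte-wise equality test: same length and same bytes. */
static int
same_bytes(const void *a, size_t alen, const void *b, size_t blen)
{ return alen == blen && memcmp(a, b, alen) == 0;
}

/* Fill an inline content block.  The memset() gives all unused bytes a
   fixed value, so equal text always yields equal bytes. */
static void
init_name_inline(struct name_inline *n, const char *text)
{ memset(n, 0, sizeof(*n));
  strncpy(n->text, text, sizeof(n->text)-1);
}
```

Two name_ref values that point at separate copies of the same text differ in their bytes, because the pointers differ. Two name_inline values initialised from the same text are byte-identical.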

The name field represents the type name as available to Prolog. See also current_blob/2. The other fields are function pointers that must be initialised to proper functions or NULL to get the default behaviour of built-in atoms. Below are the defined member functions:

void acquire(atom_t a)
Called if a new blob of this type is created through PL_put_blob(), PL_unify_blob(), or PL_new_blob(). Note that this call is done as part of creating the blob. The call to PL_unify_blob() may fail if the unification fails or cannot be completed due to a resource error. PL_put_blob() has no such error conditions. This callback is typically used to store the atom_t handle into the content of the blob. Given a pointer to the content, we can now use PL_unify_atom() to bind a Prolog term with a reference to the pointed-to object. If the content of the blob can be modified (PL_BLOB_UNIQUE is not present) this is the only way to get access to the atom_t handle that belongs to this blob. If PL_BLOB_UNIQUE is provided and respected, PL_unify_blob() given the same pointer and length will produce the same atom_t handle.
int release(atom_t a)
The blob (atom) a is about to be released. The release() function is called when the atom is reclaimed by the atom garbage collector, when an explicit call to PL_free_blob() is made or during shutdown of Prolog. This function can retrieve the data of the blob using PL_blob_data(). If the release() function returns FALSE, the atom garbage collector will not reclaim the atom. For critical resources such as file handles or significant memory resources, it may be desirable to have an explicit call to dispose (most of) the resources. For example, close/1 reclaims the file handle and most of the resources associated with a stream, leaving only a tiny bit of content to the garbage collector. See also setup_call_cleanup/3.

The release() callback is called in the context of the thread executing the atom garbage collection, the thread executing PL_free_blob() or the thread initiating the shutdown. Normally the thread gc runs all atom and clause garbage collections. The release() function may not call any of the PL_*() functions except for PL_blob_data() or PL_unregister_atom() to unregister other atoms that are part of the data associated with the blob. Calling any of the other PL_*() functions may result in deadlocks or crashes. The release() function should not call any potentially slow or blocking functions as this may cause serious slowdowns in the rest of the system.

Blobs that require cleanup that is slow, blocking or requires calling Prolog must pass the data to be cleaned to another thread. Be aware that if the blob uses PL_BLOB_NOCOPY the user is responsible for discarding the data, otherwise the atom garbage collector will free the data.

As the SWI-Prolog atom garbage collector is conservative, there is no guarantee that the release() function will ever be called. If it is important to clean up some resource, there should be an explicit predicate for doing that, and calling that predicate should be guaranteed by using setup_call_cleanup/3 or a process finalization hook such as at_halt/1.

Normally, Prolog does not clean memory during shutdown. It does so on an explicit call to PL_cleanup(), or if the system is compiled with the cmake build type Debug. In such a situation, there is no guarantee of the order in which atoms are released; if a blob contains an atom (or another blob), those atoms (or blobs) may have already been released. See also PL_blob_data().
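
The split between explicit disposal and release() can be modelled in plain C. This is a sketch under stated assumptions rather than SWI-Prolog code: struct my_resource, my_resource_close(), my_resource_release() and the counter closes_done are all hypothetical, and the functions take the content pointer directly where a real release() callback would receive an atom_t and fetch the content with PL_blob_data().

```c
/* Hypothetical blob content wrapping some critical resource. */
struct my_resource
{ int handle;                   /* stand-in for e.g. a file descriptor;
                                   -1 means already closed */
};

static int closes_done = 0;     /* counts real closes (demonstration only) */

/* Explicit disposal, as an explicit "close" predicate would perform it.
   Idempotent: returns 1 if it closed the resource, 0 if it was closed. */
static int
my_resource_close(struct my_resource *r)
{ if ( r->handle < 0 )
    return 0;
  r->handle = -1;               /* real code would close the handle here */
  closes_done++;
  return 1;
}

/* What a release() callback would do with the content: only fast,
   non-blocking work.  Returns 1 (TRUE) so the atom may be reclaimed. */
static int
my_resource_release(struct my_resource *r)
{ my_resource_close(r);         /* no-op if closed explicitly before */
  return 1;
}
```

Because the close step is idempotent, it does not matter whether the explicit predicate, the garbage collector, or both reach the resource.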

int compare(atom_t a, atom_t b)
Compare the blobs a and b, both of which are of the type associated to this blob type. Return values are as memcmp(): < 0 if a is less than b, = 0 if both are equal, and > 0 otherwise. The default implementation is a bitwise comparison of the blobs' contents. This default implementation suffices if PL_BLOB_UNIQUE is set and the blob follows the requirement that its contents do not change, although it might give an unexpected ordering, and the ordering may change if the blob is saved and restored using save_program/2.

If the compare() function is defined, the sort/2 predicate uses that to determine if two blobs are equal and only keeps one of them. This can cause unexpected results with blobs that are actually different; if you cannot guarantee that the blobs all have unique contents, then you should incorporate the blob address (the system guarantees that blobs are not shifted in memory after they are allocated). This function should not call any PL_*() functions other than PL_blob_data().

The following minimal compare function gives a stable total ordering:

static int
compare_my_blob(atom_t a, atom_t b)
{ const struct my_blob_data *blob_a = PL_blob_data(a, NULL, NULL);
  const struct my_blob_data *blob_b = PL_blob_data(b, NULL, NULL);
  return (blob_a < blob_b) ? -1 : (blob_a > blob_b) ? 1 : 0;
}
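
Where ordering by content is wanted, the address can serve as a tie-breaker so that distinct blobs with equal content never compare as equal. The following stand-alone sketch assumes a hypothetical priority field in struct my_blob_data and a hypothetical helper compare_my_blob_data() that works on content pointers; in a real compare() callback these pointers would come from PL_blob_data().

```c
#include <stdint.h>

/* Hypothetical blob content; the priority field is an assumption. */
struct my_blob_data
{ int priority;
};

/* Order by priority first; break ties on the content address. */
static int
compare_my_blob_data(const struct my_blob_data *a,
                     const struct my_blob_data *b)
{ if ( a->priority != b->priority )
    return a->priority < b->priority ? -1 : 1;

  /* Compare addresses as integers to avoid relational comparison of
     pointers into unrelated objects. */
  uintptr_t pa = (uintptr_t)a;
  uintptr_t pb = (uintptr_t)b;
  return pa < pb ? -1 : pa > pb ? 1 : 0;
}
```

With this ordering the result is 0 only when both arguments refer to the same content block.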
int write(IOSTREAM *s, atom_t a, int flags)
Write the content of the blob a to the stream s respecting the flags. The return value is TRUE or FALSE and does not follow the Unix convention of the number of bytes (where zero is possible) and negative for errors. Any I/O operations to s are in the context of a PL_acquire_stream(); upon return, the PL_release_stream() handles any errors, so it is safe to not check return codes from Sfprintf(), etc.

In general, the output from the write() callback should be minimal. If you wish to output more debug information, it is suggested that you either add a debug option to your "open" predicate to output more information, or provide a "properties" predicate. A typical implementation is:

static int write_my_blob(IOSTREAM *s, atom_t symbol, int flags)
{ (void)flags; /* unused */
  Sfprintf(s, "<my_blob>(%p)", PL_blob_data(symbol, NULL, NULL));
  return TRUE;
}

The flags are a bitwise or of zero or more of the PL_WRT_* flags, defined in SWI-Prolog.h, that were passed to the PL_write_term() call that invoked write(). The flags do not have the PL_WRT_NEWLINE bit set, so it is safe to call PL_write_term() and there is no need for writing a trailing newline. This prototype is available if SWI-Stream.h is included before SWI-Prolog.h. This function can retrieve the data of the blob using PL_blob_data().

Most blobs reference some external data identified by a pointer and the write() function writes <type>(address). If this function is not provided, write/1 emits the content of the blob for blobs of type PL_BLOB_TEXT or a string of the format <#hex data> for binary blobs.

int save(atom_t a, IOSTREAM *s)
Write the blob to stream s, in an opaque form that is known only to the blob. If a “save” function is not provided (that is, the field is NULL), the default implementation saves and restores the blob as if it is an array of bytes which may contain null ('\0') bytes.

SWI-Stream.h defines a number of PL_qlf_put_*() functions that write data in a machine-independent form that can be read by the corresponding PL_qlf_get_*() functions.

If the “save” function encounters an error, it should call PL_warning(), raise an exception (see PL_raise_exception()), and return FALSE. (Details are subject to change.) Note that failure to save/restore a blob makes it impossible to compile a file that contains such a blob using qcompile/2, and equally impossible to create a saved state from a program that contains such a blob. Here, contains means that the blob appears in a clause or directive.

atom_t load(IOSTREAM *s)
Read the blob from its saved form as written by the “save” function of the same blob type. If this cannot be done (e.g., a stream read failure or a corrupted external form), the “load” function should call PL_warning(), then PL_fatal_error(), and return FALSE. (Details are subject to change; see the “save” function.) If a “load” function is not provided (that is, the field is NULL), the default implementation assumes that the blob was written by the default “save”, that is, as an array of bytes.

SWI-Stream.h defines a number of PL_qlf_get_*() functions that read data in a machine-independent form, as written by the corresponding PL_qlf_put_*() functions.

The atom that the “load” function returns can be created using PL_new_blob().
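
To make the notion of a machine-independent form concrete: one common approach is to write integers in a fixed byte order, so that the saved form does not depend on the endianness of the machine that wrote it. The sketch below illustrates that idea only. It does not describe how the PL_qlf_put_*() and PL_qlf_get_*() functions encode data, and put_u32_be() and get_u32_be() are hypothetical helpers working on a memory buffer instead of an IOSTREAM.

```c
#include <stddef.h>
#include <stdint.h>

/* Store v into buf as 4 bytes, most significant byte first.
   Returns the number of bytes written. */
static size_t
put_u32_be(unsigned char *buf, uint32_t v)
{ buf[0] = (unsigned char)(v >> 24);
  buf[1] = (unsigned char)(v >> 16);
  buf[2] = (unsigned char)(v >> 8);
  buf[3] = (unsigned char)v;
  return 4;
}

/* Read back a value stored by put_u32_be(). */
static uint32_t
get_u32_be(const unsigned char *buf)
{ return ((uint32_t)buf[0] << 24) |
         ((uint32_t)buf[1] << 16) |
         ((uint32_t)buf[2] << 8)  |
          (uint32_t)buf[3];
}
```

A “save” and “load” pair for a blob must agree on such an encoding for every field, since the state may be written on one machine and read on another.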

bool PL_unregister_blob_type(PL_blob_t *type)
Unlink the blob type from the registered types and transform the type of possible living blobs to unregistered, avoiding further reference to the type structure, functions referred to by it, as well as the data. This function returns TRUE if no blobs of this type existed and FALSE otherwise. PL_unregister_blob_type() is intended for the uninstall() hook of foreign modules, avoiding further references to the module.
int PL_register_blob_type(PL_blob_t *type)
This function does not need to be called explicitly. It is called if needed when a blob is created by PL_unify_blob(), PL_put_blob(), or PL_new_blob().