gitty_driver_files.pl -- Gitty plain files driver

This version of the driver uses plain files to store the gitty data. It consists of a nested directory structure with files named after the hash. Objects and hash computation are the same as for git. The heads (files) are computed on startup by scanning all objects. There is a file ref/head that is updated whenever a head is updated; other clients can watch this file and update their notion of the heads. This implies that the store can handle multiple clients that access a shared file system, for example shared via NFS from different machines.
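For illustration only (the object path sharding is an assumption; see gitty_object_file/3 below for the hash-to-path mapping), such a store might look like:

    storage/
        2c/1d/<remaining hex digits of the hash>     one compressed object per file
        ref/head                                     updated whenever a head changes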

The store is simple and robust. The main disadvantages are startup times that grow with the number of objects in the store, and relatively high disk usage because small objects are rounded up to disk allocation units.

bug
- Shared access does not work on Windows.
gitty_open(+Store, +Options) is det
Driver-specific initialization. Handles setting up a Redis connection when requested. It processes the following options:
redis(+DB)
Name of the redis DB to connect to. See redis_server/3.
redis_ro(+DB)
Read-only redis DB.
redis_prefix(+Prefix)
Prefix for all keys. This can be used to host multiple SWISH servers on the same redis cluster. Default is swish.
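For example, a hedged sketch of opening a store with a Redis connection; the store path, server name, and address are illustrative:

    :- use_module(library(redis)).

    init_store :-
        % Register a Redis server under a name first (see redis_server/3).
        redis_server(swish_db, localhost:6379, []),
        % Open the gitty store using that connection and the default prefix.
        gitty_open('/srv/swish/data/storage',
                   [ redis(swish_db),
                     redis_prefix(swish)
                   ]).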
gitty_close(+Store) is det
Close resources associated with a store.
gitty_file(+Store, ?File, ?Ext, ?Head) is nondet
True when File is an entry in the gitty store and Head is its HEAD revision.
load_plain_commit(+Store, +Hash, -Meta:dict) is semidet
Load the commit data as a dict. Loaded commits are cached in commit/3. Note that only adding a fact to the cache is synchronized. This means that in a race we may load the same object from disk multiple times, which is harmless, whereas a lock around the whole predicate would serialize loading different objects, which is not needed.
store_object(+Store, +Hash, +Header:string, +Data:string) is det
Store the actual object. The store must associate Hash with the concatenation of Header and Data.
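As the module description notes, hashes are computed as for git: the SHA-1 of the header followed by the data. A minimal sketch, assuming library(crypto); the helper name is hypothetical, and Size must be the byte length of Data (for ASCII content the character count suffices):

    :- use_module(library(crypto)).

    % Hash = SHA-1 over "<Type> <Size>\u0000<Data>", as for git objects.
    git_object_hash(Type, Data, Hash) :-
        string_length(Data, Size),
        format(string(Blob), "~w ~d\u0000~w", [Type, Size, Data]),
        crypto_data_hash(Blob, Hash, [algorithm(sha1), encoding(utf8)]).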
store_object_raw(+Store, +Hash, +Bytes:string, -New) is det[private]
Store an object from raw bytes. This is used for replicating objects.
load_object(+Store, +Hash, -Data, -Type, -Size) is det
Load the given object.
has_object(+Store, +Hash) is det[private]
True when Hash exists in store.
load_object_raw(+Store, +Hash, -Data)[private]
Load the compressed data for an object. Intended for replication.
object_bytes(+Type, +Size, +Data, -Bytes) is det[private]
Encode an object with the given parameters in memory.
load_object_header(+Store, +Hash, -Type, -Size) is det[private]
Load the header of an object
gitty_rescan(?Store) is det
Update our view of the shared storage for all stores matching Store.
gitty_scan(+Store) is det[private]
Scan gitty store for files (entries), filling head/3. This is performed lazily at first access to the store.

@tbd Possibly we need to maintain a cached version of this index to avoid having to open all objects of the gitty store.

read_heads_from_objects(+Store) is det[private]
Establish the head(Store,File,Ext,Hash) relation by reading all objects and adding a fact for the most recent commit.
gitty_scan_latest(+Store)[private]
Scans the gitty store, extracting the latest version of each named entry.
gitty_hash(+Store, ?Hash) is nondet
True when Hash is an object in the store.
delete_object(+Store, +Hash)
Delete an existing object
gitty_object_file(+Store, +Hash, -Path) is det
True when Path is the file at which the object with Hash is stored.
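A minimal sketch of such a mapping, assuming a git-like sharding of the hash over two two-character directories; the helper name is hypothetical and the actual sharding may differ:

    % Hypothetical path mapping: Store/<h1h2>/<h3h4>/<rest-of-hash>
    object_path_sketch(Store, Hash, Path) :-
        sub_atom(Hash, 0, 2, _, Dir0),
        sub_atom(Hash, 2, 2, _, Dir1),
        sub_atom(Hash, 4, _, 0, File),
        atomic_list_concat([Store, Dir0, Dir1, File], /, Path).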
gitty_update_head(+Store, +Name, +OldCommit, +NewCommit, +DataHash) is det
Update the head of a gitty store for Name. OldCommit is the current head and NewCommit is the new head. If Name is being created and thus has no head yet, OldCommit must be -.

This operation can fail because another writer has updated the head. The conflicting writer may be in the same process or in another process.
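For example (store handle and hashes abbreviated and purely illustrative):

    % Create a new entry: there is no head yet, so OldCommit is -.
    ?- gitty_update_head(store, 'hello.pl', -, 'c9f3...', 'aa01...').

    % Advance an existing head; fails if another writer updated
    % 'hello.pl' in the meantime.
    ?- gitty_update_head(store, 'hello.pl', 'c9f3...', 'd41e...', 'bb02...').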

remote_updates(+Store)[private]
Watch for remote updates to the store. We only do this if we have not done so within the last second.
remote_updates(+Store, -List) is det[private]
Find updates from other gitties on the same filesystem. Note that we have to push/pop the input context to avoid creating a notion of an input context that could incorrectly relate messages to the sync file.
restore_heads_from_remote(Store)[private]
Restore the known heads by reading the remote sync file.
delete_head(+Store, +Head) is det
Delete Head from Store. Used by gitty_fsck/1 to remove heads that have no commits. Should we forward this to remotes, or should they do their own thing?
set_head(+Store, +File, +Hash) is det
Set the head of the given File to Hash
repack_objects(+Store, +Options) is det[multifile]
Repack objects of Store for reduced disk usage and enhanced performance. By default this picks up all file objects of the store and all existing small pack files. Options:
small_pack(+Bytes)
Consider all packs smaller than Bytes as small and repack them. Default is 10Mb.
min_files(+Count)
Do not repack if there are fewer than Count new files. Default is 1,000.
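For example, a call that makes the documented defaults explicit (the store handle is illustrative):

    ?- repack_objects(store, [ small_pack(10_000_000),    % 10Mb
                               min_files(1_000)
                             ]).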
pack_objects(+Store, +Objects, +Packs, +PackDir, -PackFile, +Options) is det
Pack the given objects and pack files into a new pack.
add_file(+Out, +Store, +Object) is det[private]
Add Object from Store to the pack stream Out.
gitty_fsck(+Store) is det
Validate all packs associated with Store
fsck_pack(+File) is det
Validate the integrity of the pack file File.
gitty_attach_packs(+Store) is det[private]
Attach all packs for Store
attach_pack(+Store, +PackFile)
Load the index of PackFile into memory.
detach_pack(+Store, +Pack) is det[private]
Remove a pack file from the memory index.
load_object_from_pack(+Hash, -Data, -Type, -Size) is semidet
True when Hash is in a pack and can be loaded.
unpack_packs(+Store) is det[multifile]
Unpack all packs.
unpack_pack(+Store, +Pack) is det
Turn a pack back into plain object files.
remove_objects_after_pack(+Store, +Objects, +Options) is det[private]
Remove the indicated (file) objects from Store.
remove_repacked_packs(+Store, +Packs, +Options)[private]
Remove packs that have been repacked.
prune_empty_directories(+Dir) is det[private]
Prune directories that are empty below Dir. Dir itself is not removed, even if it is empty.
redis_file(+Store, ?Name, ?Ext, ?Hash)[private]
redis_ensure_heads(+Store)[private]
Ensure the redis db contains a hashmap mapping all file names to their head hashes.
redis_update_head(+Store, +Name, +OldCommit, +NewCommit, +DataHash)[private]
redis_delete_head(Store, Head) is det[private]
Unregister Head
redis_set_head(+Store, +File, +Hash) is det[private]
redis_replicate_get(+Store, +Hash)[private]
Try to get an object from another SWISH server in the network. We implement this replication using the Redis PUB/SUB protocol. This is not ideal, as this route of synchronisation is only used if for some reason this server lacks an object, which typically happens when the node is new to the cluster or has been offline for a long time. In a large cluster, most nodes will have the object and each of them will send it around. A consumer-group based solution is not ideal either, as the message may be picked up by a node that does not have the object, after which we need the failure-recovery protocol to get it right. This is particularly the case with two nodes, where there is a fair chance that we ourselves will be asked for the very hash we are missing.

We could improve on this in two ways: (1) publish the hash in a short-lived key on Redis and have other nodes check that key, which is likely to avoid many nodes sending the same object, or (2) check how many nodes are in the pool and switch to a consumer-group based approach if this number is high (and thus we are unlikely to be asked ourselves for the missing hash).

See also
- publish_objects/2 for the incremental replication
redis_replicate_get(+Store, +Hash) is semidet[private]
Get Hash if we do not have it locally. This initiates a Redis discover request for the hash. The replies are picked up by gitty_message/1 above.

The code may be subject to various race conditions, but fortunately objects are immutable. It also seems possible that the Redis stream gets lost; it is not clear when and how. For now, we restart if we get no reply, but no more than once per minute.

publish_objects(+Store, +Hashes)[private]
Make the objects we just stored globally known. These are added to the Redis stream gitty:replicate and received by replicate/1 below.

This realizes eager replication, as opposed to the code above (redis_replicate_get/2), which performs lazy replication. Eager replication ensures the object is present in multiple places in case the node on which it was saved dies shortly afterwards.

Note that we also receive the object we just saved. That is unavoidable in a network where all nodes are equal.
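A hedged sketch of what publishing a single object to the stream could look like, assuming library(redis_streams) and illustrative field names; the module's actual payload layout may differ:

    :- use_module(library(redis)).
    :- use_module(library(redis_streams)).

    % Announce a stored object on the shared stream so that peer nodes
    % can add it to their stores (field names are assumptions).
    publish_object_sketch(DB, Hash, Bytes) :-
        xadd(DB, 'gitty:replicate', _Id, _{hash: Hash, data: Bytes}).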

replicate(+Data) is det[private]
Act on a message sent to the gitty:replicate stream. Add the object to our store unless we already have it. Note that we receive our own objects as well.
redis_hcas(+DB, +Hash, +Key, +Old, +New) is semidet[private]
Update Hash.Key to New provided the current value is Old.
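The semantics are an atomic compare-and-set on a Redis hash field. A minimal sketch using a server-side Lua script via EVAL, assuming library(redis); this is illustrative, not necessarily the module's implementation:

    :- use_module(library(redis)).

    % Set Key in Hash to New only if its current value equals Old;
    % succeeds iff the script returns 1.
    redis_hcas_sketch(DB, Hash, Key, Old, New) :-
        Script = "if redis.call('hget', KEYS[1], ARGV[1]) == ARGV[2] then \c
                      redis.call('hset', KEYS[1], ARGV[1], ARGV[3]) \c
                      return 1 \c
                  else \c
                      return 0 \c
                  end",
        redis(DB, eval(Script, 1, Hash, Key, Old, New), 1).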