file-o-bot
Problem
You have files spread across separate forms of storage (internal SSD, SFTP, S3, etc.) but want a single centralized directory to access all of them. You also want to be able to shift these files between those storage solutions automatically.
Hello file-o-bot
This script creates a folder with a huge list of soft links to every object across your storage mediums. You can also define "lifetimes" for each object in each form of storage, so that it is slowly "downgraded" in storage priority as it is accessed less and less. This allows you to move those files transparently from faster storage to cheaper archive storage.
The entire system relies on the infamous atime attribute of POSIX-compliant filesystems, which records the last time a file was accessed. By filtering on atime values with find, we can quickly pick out the files to be moved between paths.
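The atime filter can be sketched roughly like this. The function name list_stale and the 30-day threshold are made up for illustration; this is not lifted from the file-o-bot script itself.

```shell
# list_stale DIR DAYS
# Print regular files under DIR whose last access is older than DAYS days.
# find rounds a file's age down to whole 24-hour periods, so "-atime +30"
# only matches files last accessed at least 31 days ago.
list_stale() {
  local dir="$1" days="$2"
  find "$dir" -type f -atime "+$days" -print
}

# Hypothetical usage: list everything on the fast tier untouched for a month.
# list_stale /mnt/fast 30
```

Listing files with find does not itself count as an access to those files, so running the filter does not disturb the values it reads.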
file-o-bot is inspired by the Autoclass feature of Google Cloud Storage buckets. file-o-bot is merely a redneck version of it.
Limitations
- atime is disabled on most Linux systems because of its impact on disk performance. The filesystem attempts a write every time it records a new atime value, so every read also triggers a write, and thousands of them can cripple the disk. Want to check how your filesystem handles this? Run $ mount | grep your_mount_path and look for noatime or relatime in the mount options.
- The other option on modern Linux systems is relatime. relatime only updates the access time if the current value is older than the modification or status change time (or, on recent kernels, more than a day old). It still provides an inaccurate access time value, though.
- This requires a POSIX-compliant storage path for each of the storage mediums, which means you have to configure the mounts separately and maintain them yourself. There is no fail-safe here.
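If you would rather script the mount check, here is a small sketch. The function names atime_mode and mount_atime_mode are hypothetical, and the second one assumes findmnt from util-linux is installed.

```shell
# atime_mode OPTIONS
# Classify a comma-separated mount option string, e.g. "rw,relatime".
atime_mode() {
  case ",$1," in
    *,noatime,*)  echo "noatime" ;;
    *,relatime,*) echo "relatime" ;;
    # Neither flag is listed. On Linux this usually means atime is updated
    # on every access, but treat that as an assumption and verify it.
    *)            echo "other" ;;
  esac
}

# mount_atime_mode PATH
# Look up the options of the filesystem that holds PATH (Linux only).
mount_atime_mode() {
  atime_mode "$(findmnt -n -o OPTIONS --target "$1")"
}
```

Wrapping the option string in commas keeps nodiratime from being mistaken for noatime.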
Installation
- Clone this repo and change to the directory.
- Run the install script.
$ bash ./install.sh
Docker Support
Storage aliases are still in progress. Once they land, they should enable file-o-bot to link the files accurately.
FAQ
What happens if files/folder structures overlap across my storage mediums?
Movements still apply to the files.
Storage paths are sorted via the sort utility, and preceding paths have higher priority over subsequent ones. All storage paths are "merged". For overlapping files, the file in the highest-priority storage is the one chosen to be soft-linked.
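The merge rule above can be sketched like so. This is only an illustration of the priority logic, not the actual script: link_merged is a made-up name, and it assumes absolute storage paths and file names without newlines.

```shell
# link_merged LINK_DIR ROOT...
# Soft-link every file under the given storage roots into LINK_DIR.
# Roots are processed in sort order, so earlier roots win on overlap.
link_merged() {
  local link_dir="$1"
  shift
  printf '%s\n' "$@" | sort | while IFS= read -r root; do
    root="${root%/}"   # drop a trailing slash so prefix stripping works
    find "$root" -type f | while IFS= read -r target; do
      rel="${target#"$root"/}"
      # A higher-priority root already provided this path: leave it alone.
      if [ -e "$link_dir/$rel" ] || [ -L "$link_dir/$rel" ]; then
        continue
      fi
      mkdir -p "$link_dir/$(dirname "$rel")"
      ln -s "$target" "$link_dir/$rel"
    done
  done
}

# Hypothetical usage:
# link_merged /srv/all /mnt/a_ssd /mnt/b_sftp /mnt/c_s3
```

Since priority comes from sort order, naming the mount points with a prefix (a_, b_, c_) is one way to control which storage wins.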
What kind of file protections do I have when executing movements?
None. The files are transferred using rsync but that's about it.
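For a rough idea of what a single movement could look like, here is a sketch. move_file is a hypothetical helper and the flags are an assumption about reasonable rsync usage, not a copy of what the script runs.

```shell
# move_file SRC_ROOT DST_ROOT REL_PATH
# Move one file between storage roots, keeping its relative path.
move_file() {
  local src_root="${1%/}" dst_root="${2%/}" rel="$3"
  mkdir -p "$dst_root/$(dirname "$rel")" &&
    # -a preserves permissions and timestamps; --remove-source-files
    # deletes the source only after rsync has transferred it successfully.
    rsync -a --remove-source-files "$src_root/$rel" "$dst_root/$rel"
}

# Hypothetical usage:
# move_file /mnt/a_ssd /mnt/c_s3 photos/2019/cat.jpg
```

That is still not a real safety net: there is no checksum verification after the move and no rollback if the destination mount disappears halfway.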
Why soft links?
Soft links work over network file shares like NFS, Samba, etc., but hard links do not. Do note that soft links also have their own program-breaking limitations on certain filesystems.
This is madness.
Yes.
Future Plans
Yeah I'm thinking of actually converting this into a proper service of sorts written in a big-boy language, but this is where we are at the moment.