Compare commits

...

6 Commits

Author SHA1 Message Date
8c77d65341 Added section about relatime 2024-06-25 23:14:14 +08:00
3a0a411586 Update README 2024-06-25 08:37:37 +08:00
3d63f9c254 Fixed installations and xargs args. 2024-05-30 23:24:55 +08:00
c20053df10 Some installation updates 2024-05-30 23:16:18 +08:00
6298a324b0 Disable CentOS for now 2024-05-29 00:04:52 +08:00
d5a674a690 Smooth out installation stuff 2024-05-29 00:03:47 +08:00
5 changed files with 91 additions and 35 deletions

View File

@ -1,7 +1,6 @@
FROM debian:bookworm-slim
RUN apt update && apt upgrade -y && apt install rsync parallel jq
COPY . /app
WORKDIR /app
CMD bash /app/file-o-bot.sh
RUN bash ./install.sh
CMD bash /app/file-o-bot

View File

@ -1,16 +1,33 @@
# file-o-bot
### Problem
## Problem
You have file objects in separate forms of storage (internal SSD, SFTP, S3, etc), but want to have a centralized directory to access all those files. You want to be able to shift these files across these storage solutions automatically.
### Hello file-o-bot
## Hello file-o-bot
This script creates a folder with a huge list of soft links to every object across your storage mediums. You can also define "lifetimes" for each object in each form of storage such that it is slowly "downgraded" in storage priority as it becomes less and less accessed. This allows you to move those files transparently from faster storage to cheaper archive storage.
The entire system relies on the infamous `atime` attribute of POSIX-compliant filesystems, which records the last time a file was accessed. By filtering on `atime` values with `find`, we can quickly select the files to be moved between paths.
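As a rough illustration of the `find`-based filtering described above, here is a minimal sketch. The function name `list_stale_files` is hypothetical and is not part of file-o-bot; it only shows how `find -amin` can select files by access time.

```shell
# Hypothetical helper (not part of file-o-bot): list regular files under a
# directory whose last access time is more than N minutes in the past.
list_stale_files() {
    local dir=$1 minutes=$2
    # -amin +N matches files last accessed more than N minutes ago.
    # -print0 keeps file names with spaces or newlines intact.
    find "$dir" -type f -amin "+${minutes}" -print0
}
```

For example, `list_stale_files ./ssd 1 | xargs -0 ls -l` would show the files under `./ssd` that have not been read for more than a minute, assuming the filesystem actually records access times.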
file-o-bot is inspired by the Autoclass feature of Google Cloud Storage buckets. file-o-bot is merely a redneck version of it.
### FAQ
## Limitations
#### What happens if files/folder structures overlap across my storage mediums?
1. Strict `atime` updates are disabled on most Linux systems because of their impact on disk performance. The filesystem attempts a write every time it records a new `atime` value, so every read can trigger a write, leading to 1000s of extra writes that cripple the disk. Want to check which access-time option your filesystem is mounted with? Run `$ mount | grep your_mount_path`.
1. The other option on modern Linux systems is `relatime`. `relatime` only updates the access time if the previous value is earlier than or equal to the modification or status change time, or (since Linux 2.6.30) if it is more than 24 hours old. The access time it reports can therefore lag behind the real last access.
1. This requires a POSIX-compliant storage path for each of the storage mediums, which means you have to configure the mounting separately and maintain them accordingly. There is no fail-safe here.
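To make the mount check in the list above a little less manual, the sketch below classifies a mount option string by its access-time behaviour. The function name `atime_mode` is hypothetical, and the sketch assumes the options arrive as the usual comma-separated list.

```shell
# Hypothetical helper: classify a comma-separated mount option string
# (as printed by `mount` or found in /proc/mounts) by access-time behaviour.
atime_mode() {
    # Wrap in commas so each option can be matched as a whole word.
    local opts=",$1,"
    case $opts in
        *,noatime,*)     echo "noatime" ;;
        *,relatime,*)    echo "relatime" ;;
        *,strictatime,*) echo "strictatime" ;;
        # No access-time option listed at all.
        *)               echo "unspecified" ;;
    esac
}
```

One possible use is `atime_mode "$(findmnt -no OPTIONS --target your_mount_path)"`. A result of `noatime` means access times are not updated at all, so `atime`-based movements would never trigger on that mount.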
## Installation
1. Clone this repo and change to the directory.
1. Run the install script. `$ bash ./install.sh`
### Docker Support
Storage aliases are still in progress, but that should enable file-o-bot to accurately link the files.
## FAQ
### What happens if files/folder structures overlap across my storage mediums?
Movements still apply to the files.
@ -18,10 +35,18 @@ Storage paths are sorted via the `sort` utility and the preceding paths have hig
For overlapping files, the file in the highest priority storage would be chosen to be soft-linked.
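The priority rule can be sketched roughly as follows. This is not the script's actual implementation; `pick_link_target` is a hypothetical name, and the sketch assumes storage roots without newlines in their names.

```shell
# Hypothetical helper: print the file that would win the soft link for a
# given relative path. Arguments: the relative path, then the storage roots.
pick_link_target() {
    local rel=$1
    shift
    # Sort the roots with plain `sort`, then take the first root that
    # actually holds the file. Earlier roots have higher priority.
    printf '%s\n' "$@" | sort | while IFS= read -r root; do
        if [ -e "$root/$rel" ]; then
            printf '%s\n' "$root/$rel"
            break
        fi
    done
}
```

Because the order comes from `sort`, it depends on the path names (and on the locale), so naming the storage directories deliberately is the only way to control which copy is linked.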
#### What kind of file protections do I have when executing movements?
### What kind of file protections do I have when executing movements?
None. The files are transferred using rsync but that's about it.
#### This is madness.
### Why soft links?
Soft links can point across filesystems and network file shares such as NFS and Samba, whereas hard links cannot cross filesystem boundaries. Do note that soft links also have their program-breaking limitations on certain filesystems.
### This is madness.
Yes.
## Future Plans
Yeah I'm thinking of actually converting this into a proper service of sorts written in a big-boy language, but this is where we are at the moment.

52
file-o-bot Normal file → Executable file
View File

@ -2,7 +2,12 @@
set -euo pipefail
DEFAULT_CONFIG_PATH=.
DEFAULT_CONFIG_PATH=/etc/file-o-bot
if ! [[ -d $DEFAULT_CONFIG_PATH ]]; then
printf "Config path not found. Exiting.\n"
exit 1
fi
# Read default configs from default config file
. "${DEFAULT_CONFIG_PATH}"/default-config.ini
@ -36,12 +41,30 @@ if ! [[ -f ${MOVEMENT_JSON_PATH} ]]; then
exit 1
fi
# Compile storage paths from all the movements
readarray -t STORAGE_PATHS \
< <( jq -r '.movements[] | .sourcePath, .destinationPath' "${MOVEMENT_JSON_PATH}" | xargs -I {} readlink -e "{}" | sort --unique )
STORAGE_PATHS_LEN=${#STORAGE_PATHS[@]}
# Verify that all storage paths work
for ((i=0 ; i<${STORAGE_PATHS_LEN}; i++))
do
STORAGE_PATH=${STORAGE_PATHS[$i]}
if ! [[ -d $STORAGE_PATH ]]; then
printf 'Storage path %s missing. Exiting.\n' "${STORAGE_PATH}"
exit 1
fi
done
# Prepare path aliases
PATH_ALIASES_LENGTH=$( jq '.pathAliases | length' "${MOVEMENT_JSON_PATH}" )
# For each lifecycle rule, execute a movement using rsync
RULE_LENGTH=$( jq '.movements | length' "${MOVEMENT_JSON_PATH}" )
for (( i=0; i<$RULE_LENGTH ; i++ ))
do
SOURCE_PATH=$( jq -r ".movements[$i].sourcePath" "${MOVEMENT_JSON_PATH}" )
DESTINATION_PATH=$( jq -r ".movements[$i].destinationPath" "${MOVEMENT_JSON_PATH}" )
SOURCE_PATH=$( readlink -e "$( jq -r ".movements[$i].sourcePath" "${MOVEMENT_JSON_PATH}" )" )
DESTINATION_PATH=$( readlink -e "$( jq -r ".movements[$i].destinationPath" "${MOVEMENT_JSON_PATH}" )" )
AMINS=$( jq -r ".movements[$i].amins" "${MOVEMENT_JSON_PATH}" )
start_movement "$SOURCE_PATH" "$DESTINATION_PATH" "$AMINS"
done
@ -55,23 +78,14 @@ GREEN_PATH=$( readlink -f "${MAP_ROOT_DIRECTORY}"/green )
RED_PATH=$( readlink -f "${MAP_ROOT_DIRECTORY}"/red )
BLUE_PATH=$( readlink -f "${MAP_ROOT_DIRECTORY}"/blue )
if ! [[ -d ${GREEN_PATH} ]]; then
mkdir "${GREEN_PATH}"
mkdir -p "${GREEN_PATH}"
fi
if ! [[ -d ${RED_PATH} ]]; then
mkdir -p "${RED_PATH}"
fi
if ! [[ -d ${BLUE_PATH} ]]; then
mkdir -p "${BLUE_PATH}"
fi
# Compile storage paths from all the movements
readarray -t STORAGE_PATHS \
< <( jq -r '.movements[] | .sourcePath, .destinationPath' "${MOVEMENT_JSON_PATH}" | xargs -n 1 -I {} readlink -e "{}" | sort --unique )
STORAGE_PATHS_LEN=${#STORAGE_PATHS[@]}
# Verify that all storage paths work
for ((i=0 ; i<${STORAGE_PATHS_LEN}; i++))
do
STORAGE_PATH=${STORAGE_PATHS[$i]}
if ! [[ -d $STORAGE_PATH ]]; then
printf "Storage path ${STORAGE_PATH} missing. Exiting.\n"
exit 1
fi
done
# Build folder structure
printf '%s\0' "${STORAGE_PATHS[@]}" | \

View File

@ -2,18 +2,24 @@
"movements": [
{
"sourcePath": "./hdd",
"destinationPath": "./s3-bucket",
"destinationPath": "s3-bucket",
"amins": 1
},
{
"sourcePath": "./ssd",
"destinationPath": "./s3-bucket",
"sourcePath": "ssd",
"destinationPath": "s3-bucket",
"amins": 1
},
{
"sourcePath": "./s3-bucket",
"destinationPath": "./backblaze-b2",
"sourcePath": "s3-bucket",
"destinationPath": "backblaze-b2",
"amins": 1
}
],
"pathAliases": {
"ssd": "./ssd",
"hdd": "./hdd",
"s3-bucket": "./file-o-bot-bucket",
"backblaze-b2": "./file-o-bot-rclone-b2"
}
}

16
install.sh Normal file → Executable file
View File

@ -2,6 +2,18 @@
set -euo pipefail
. /etc/os-release
if [[ $ID == "debian" ]]; then
apt update && apt-get install -y rsync jq parallel
elif [[ $ID == "arch" || ${ID_LIKE:-} == "arch" ]]; then
pacman -Syyu && pacman -S parallel jq rsync
else
printf "Distro not found. Exiting.\n"
exit 1
fi
mkdir -p /etc/file-o-bot/config.d
cd ./install-files/
cp -r * /etc/file-o-bot/
cp file-o-bot /usr/local/bin/file-o-bot
cp ./install-files/{default-config.ini,movement.json} /etc/file-o-bot/
cp ./install-files/config.ini /etc/file-o-bot/config.d/
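
The distro check in install.sh reads `ID` and `ID_LIKE` from `/etc/os-release`. Since `ID_LIKE` is optional and may hold several space-separated values, one possible way to factor the check out is sketched below. `pkg_family` is a hypothetical name, and treating Debian derivatives as `apt` systems is an assumption that goes beyond what the script currently does.

```shell
# Hypothetical helper: map os-release ID / ID_LIKE values to a package
# manager family. ID_LIKE may be empty, since not every distro sets it.
pkg_family() {
    local id=$1 id_like=${2:-}
    # Pad with spaces so each distro name is matched as a whole word.
    case " $id $id_like " in
        *" debian "*) echo "apt" ;;
        *" arch "*)   echo "pacman" ;;
        *)            echo "unsupported" ;;
    esac
}
```

After sourcing `/etc/os-release`, it could be called as `pkg_family "$ID" "${ID_LIKE:-}"`, which stays safe under `set -u` when `ID_LIKE` is missing.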