Running more than one 'syncoid' at the same time against the same host results in two jobs referencing the same control socket.
This usually produces only the "already exists, disabling multiplexing" warning, but on more than one occasion it has failed with the following:
ControlSocket /tmp/syncoid-zfsbackup-zfsbackup@10.0.0.1-1649107066 already exists, disabling multiplexing
lzop: Inappropriate ioctl for device: <stdin>
CRITICAL ERROR: ssh -S /tmp/syncoid-zfsbackup-zfsbackup@10.0.0.1-1649107066 zfsbackup@10.0.0.1 ' zfs send -I '"'"'pool/office'"'"'@'"'"'autosnap_2022-04-04_21:00:00_frequently'"'"' '"'"'pool/office'"'"'@'"'"'autosnap_2022-04-04_21:15:00_frequently'"'"' | lzop | mbuffer -R 5m -q -s 128k -m 16M 2>/dev/null' | mbuffer -q -s 128k -m 16M 2>/dev/null | lzop -dfc | pv -s 18356312 | zfs receive -s -F 'zfs-pool/vault/office' 2>&1 failed: 256 at /usr/sbin/syncoid line 786.
Sample use-case:
Using Monit, Cron, or some other scheduler to trigger more than one syncoid run against the same host, in order to sync two datasets.
Possible fixes: stagger the syncs so that no two jobs start at the same time, or add some form of randomization to the socket name so that two jobs can safely start at the same time.
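The randomization idea can be sketched as follows. This is a minimal Python illustration, not syncoid's actual (Perl) code: the function name `control_socket_path` and the exact name layout are hypothetical, chosen only to resemble the socket path in the log above.

```python
import os
import secrets
import time


def control_socket_path(user_host: str, tmp_dir: str = "/tmp") -> str:
    """Build a control-socket path that two concurrent jobs are unlikely to share.

    Hypothetical helper: the timestamp mirrors the name seen in the log,
    and the random suffix is the proposed addition.
    """
    timestamp = int(time.time())      # same second for jobs started together
    suffix = secrets.token_hex(4)     # 8 random hex characters per job
    return os.path.join(tmp_dir, f"syncoid-{user_host}-{timestamp}-{suffix}")


if __name__ == "__main__":
    # Two jobs started in the same second still get different socket names.
    print(control_socket_path("zfsbackup@10.0.0.1"))
    print(control_socket_path("zfsbackup@10.0.0.1"))
```

With only a timestamp in the name, two jobs launched within the same second resolve to the same path; the random suffix removes that dependency on start time.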
Extends syncoid remote capabilities to match that of ssh as closely as
possible: allow a remote dataset to be specified without a username.
- Detect if a remote reference is possible by looking for a : before
any / characters.
- Check if there are any pool names that might conflict with this
name. E.g., 'weird:symbol/ds' might refer to the pool "symbol" on
host "weird", and dataset ds. OR it might refer to the local pool
"weird:symbol" and dataset ds.
- Prefer local pools, matching existing behavior. No preexisting
functioning configurations will break.
- The name of the control socket is changed slightly.
- A bug in the handling of remote datasets with colons in the name
is addressed.
Signed-off-by: Antonio Russo <aerusso@aerusso.net>
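The detection rule described in the list above can be sketched roughly as follows. This is a Python illustration under stated assumptions, not the Perl code from the commit: `split_remote` is a hypothetical name, and the list of local pool names is assumed to be supplied by the caller.

```python
from typing import Iterable, Optional, Tuple


def split_remote(target: str, local_pools: Iterable[str]) -> Tuple[Optional[str], str]:
    """Return (host, dataset); host is None when the target is treated as local."""
    # Everything before the first '/' is the candidate pool name.
    pool = target.split("/", 1)[0]

    # Prefer local pools, so existing configurations keep working.
    if pool in set(local_pools):
        return None, target

    # A remote reference is only possible if a ':' appears before any '/'.
    if ":" not in pool:
        return None, target

    # Split at the first ':'; the host part may or may not include "user@".
    host, dataset = target.split(":", 1)
    return host, dataset


if __name__ == "__main__":
    print(split_remote("weird:symbol/ds", ["weird:symbol"]))
    print(split_remote("weird:symbol/ds", ["tank"]))
```

The first call treats `weird:symbol` as a local pool because it is in the list; the second treats `weird` as a host, since no local pool of that name exists.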
The `--sshkey` option specifies the ssh private key, *not* the ssh public key. Empirically, it fails if you point it at the public key and works if you point it at the private key. This makes sense, since the value is passed as the argument to `-i` on the `ssh` command line.
Change getzfsvalue() so that, if called in array context, it returns
the value and the error as a two-element list. This allows the caller
to receive the error from the zfs command.
If the dataset went away before fetching the sanoid:sync property, just
issue a warning and skip it.
If the source dataset disappeared before we were able to get the
sanoid:sync property, do not set an error exit code. Handle this
gracefully, as it should not be considered an error.
Fixes #380
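The pattern in the last three changes (return the value together with the error, then warn and skip when the dataset has vanished) can be sketched in Python as below. This is an analogue only: Python has no array context, so the getter always returns a pair. The names `get_zfs_value` and `read_property` are hypothetical, and the property name is passed in by the caller rather than fixed.

```python
import subprocess
import sys
from typing import Callable, Dict, Iterable, Optional, Tuple

Result = Tuple[Optional[str], Optional[str]]


def get_zfs_value(dataset: str, prop: str) -> Result:
    """Return (value, error); exactly one of the two is None."""
    try:
        proc = subprocess.run(
            ["zfs", "get", "-H", "-o", "value", prop, dataset],
            capture_output=True,
            text=True,
        )
    except OSError as exc:  # e.g. the zfs binary is not installed
        return None, str(exc)
    if proc.returncode != 0:
        return None, proc.stderr.strip()
    return proc.stdout.strip(), None


def read_property(
    datasets: Iterable[str],
    prop: str,
    getter: Callable[[str, str], Result] = get_zfs_value,
) -> Dict[str, str]:
    """Collect the property for each dataset, skipping ones that cannot be read."""
    values: Dict[str, str] = {}
    for dataset in datasets:
        value, error = getter(dataset, prop)
        if error is not None or value is None:
            # The dataset may have been destroyed since it was listed:
            # warn and move on without flagging the whole run as failed.
            print(f"WARNING: skipping {dataset}: {error}", file=sys.stderr)
            continue
        values[dataset] = value
    return values
```

Passing the getter as a parameter is a design choice made for this sketch so that the skip logic can be exercised without a ZFS installation.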