From 0 to ZFS replication in 5m with syncoid

The ZFS filesystem has many features that, once you have tried them, you can never give up. One of the lesser-known ones is probably the ability to replicate a ZFS filesystem by sending only the changes over the network with zfs send/receive.
Technically, the filesystem changes don't even need to be sent over a network: you could just as well dump them to a removable disk, then receive them from that same disk.
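
As a rough sketch of the removable-disk variant: the dataset tank/data, the mount point /mnt/usb and the pool backup are hypothetical names, and the small run helper is my own addition so that the commands are only printed unless you set DRY_RUN=0.

```shell
#!/bin/sh
# Sketch only: dataset, mount point and pool names below are hypothetical.
# Commands are printed, not executed, unless DRY_RUN=0 is set.
DRY_RUN="${DRY_RUN:-1}"

run() {
    # Print the command line in dry-run mode, otherwise evaluate it.
    if [ "$DRY_RUN" = "1" ]; then
        printf '%s\n' "$*"
    else
        eval "$*"
    fi
}

dump_to_disk() {
    # $1 = snapshot to create and send, $2 = file on the removable disk
    run "zfs snapshot $1"
    run "zfs send $1 > $2"
}

restore_from_disk() {
    # $1 = file on the removable disk, $2 = dataset to create from the stream
    run "zfs receive $2 < $1"
}

dump_to_disk tank/data@offsite /mnt/usb/data.zfs
restore_from_disk /mnt/usb/data.zfs backup/data
```

The stream file is just the output of zfs send, so the receiving machine only needs ZFS and access to the disk.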

I suppose the reason send/receive is not instantly popular among Linux users starting out with ZFS is that they have had access to rsync for such a long time, and it works so well, that they just don't feel the need for another replication tool.

TIP: when rsync'ing to ZFS use --inplace to improve performance

ZFS send/receive works by selecting just the blocks that changed between two snapshots. It does not have to walk the filesystem tree and compute, exchange and compare hashes, sizes, timestamps and so on like rsync does, which makes it extremely efficient. Also, leaving compression aside, it is more bandwidth efficient than rsync because the two sides do not have to exchange file lists.
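
To make the "changes between two snapshots" idea concrete, here is a minimal sketch that builds an incremental send pipeline. The dataset, the snapshot names monday and tuesday, and the host are made up for illustration. The function only prints the command so you can review it first.

```shell
#!/bin/sh
# Sketch only: pool, snapshot and host names are hypothetical.

incremental_send_cmd() {
    # $1 = dataset, $2 = older snapshot name, $3 = newer snapshot name,
    # $4 = ssh destination, $5 = dataset on the destination
    # "zfs send -i" streams only the blocks that changed between the two snapshots.
    printf 'zfs send -i %s@%s %s@%s | ssh %s zfs receive %s\n' \
        "$1" "$2" "$1" "$3" "$4" "$5"
}

# Print the pipeline for review; pipe the output to sh to actually run it.
incremental_send_cmd tank/data monday tuesday root@remotehost tank/data
```

The receiving side must already hold the older snapshot, otherwise it has nothing to apply the increment to.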

While you could script zfs send/receive manually, the fastest way to replicate one pool or filesystem to another is by using one of the many tools building upon it. Probably the best tool for this job is syncoid (which is a part of sanoid).

Caveat: for the first replication syncoid requires that the target filesystem does not exist: it will complain and refuse to work if it does. It is best to start with an empty target or replicate one filesystem at a time.

While not strictly necessary, it is a good idea to install the following extra packages to run syncoid at its full potential: mbuffer, lzop, pv, git. Ubuntu provides binaries for all of them; other distributions might not.
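
A quick way to see which of those tools are missing on a machine is sketched below. The missing_tools helper is my own; the apt-get hint assumes Ubuntu, where the package names match the tool names.

```shell
#!/bin/sh
# Report which of the given tools cannot be found in PATH, one per line.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

missing="$(missing_tools mbuffer lzop pv git)"
if [ -n "$missing" ]; then
    # $missing is left unquoted on purpose so the names end up on one line.
    echo "Missing:" $missing
    echo "Try: sudo apt-get install" $missing
else
    echo "All optional tools found"
fi
```

Run the same check on the remote host too, since compression and buffering need the tools on both ends.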

To use syncoid, first clone the sanoid GitHub repository:

git clone https://github.com/jimsalterjrs/sanoid.git

Then cd into sanoid or copy syncoid to a directory in your path. I have also found that syncoid attempts to use an SSH cipher that might not always be available. If that happens, edit line 28 of the syncoid script and leave the sshcipher option empty (some ciphers are probably more efficient than others for this purpose, but I'm not an expert. If you know of any, let me know in the comments).

After that, set up ssh for key-based authentication and then go ahead and try your first replication:
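
If you have never set up key-based authentication, the steps look roughly like this. The key path and root@remotehost are placeholders, and the script only prints the steps unless you set DRY_RUN=0.

```shell
#!/bin/sh
# Sketch only: key path and host are hypothetical. Steps are printed rather
# than executed unless DRY_RUN=0 is set.
DRY_RUN="${DRY_RUN:-1}"
KEY="${KEY:-$HOME/.ssh/id_ed25519_syncoid}"
REMOTE="${REMOTE:-root@remotehost}"

step() {
    # Print the command in dry-run mode, otherwise execute it as given.
    if [ "$DRY_RUN" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

setup_ssh_key() {
    # Generate a key with an empty passphrase so replication can run unattended.
    step ssh-keygen -t ed25519 -N "" -f "$KEY"
    # Install the public key on the remote side.
    step ssh-copy-id -i "$KEY.pub" "$REMOTE"
    # Confirm that a non-interactive login works and ZFS is reachable.
    step ssh -i "$KEY" -o BatchMode=yes "$REMOTE" zfs list
}

setup_ssh_key
```

With a non-default key path like this one, ssh has to be told which key to use for that host, for example with an IdentityFile entry in ~/.ssh/config.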

syncoid --recursive tank root@remotehost:tank

Try adding --debug if you want to look under the hood.

Syncoid uses mbuffer to read large chunks of data into a memory buffer, compresses them with lzop and transfers them over the network with ssh. The same happens on the receiving side, only in reverse order.
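
Written out by hand, the pipeline described above looks roughly like the sketch below. This is my approximation, not the exact command syncoid builds: the snapshot name tank@first and the mbuffer sizes are assumptions. Run syncoid with --debug to see what it really executes.

```shell
#!/bin/sh
# Sketch only: approximates the pipeline described above, it is NOT the exact
# command syncoid builds. Snapshot name and buffer sizes are assumptions.

replication_pipeline() {
    # $1 = snapshot to send, $2 = ssh destination, $3 = target dataset
    # Sending side: stream the snapshot, buffer it in memory, compress it.
    sender="zfs send $1 | mbuffer -q -s 128k -m 16M | lzop"
    # Receiving side: the same stages in reverse order.
    receiver="lzop -dfc | mbuffer -q -s 128k -m 16M | zfs receive $3"
    printf "%s | ssh %s '%s'\n" "$sender" "$2" "$receiver"
}

# Print the pipeline for review instead of running it.
replication_pipeline tank@first root@remotehost tank
```

The buffer on each side smooths out bursts, so a slow disk or a network hiccup does not stall the other stages right away.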

To replicate again, re-run syncoid with the same options. Later runs only send the changes made since the previous replication.
