Tomorrow will be my first day at the (online) Studienkolleg Mittelhessen. The timetable is filled with courses, all scheduled, most importantly, in the Central European time zone. Some courses end at 9:00pm, which translates to a whopping 3:00am in my time zone (Tieling).
Therefore, in the near future, I see little chance of devoting any large portion of time to developing bieaz or testing new Root on ZFS installation guides.
Fortunately, bieaz can now be considered feature-complete, and not much work is needed to keep it useful. The Root on ZFS guides, written for Arch, Fedora, RHEL and NixOS, have also been accepted into the openzfs-docs repo and published.
At this point, I think my work in this area is done and switching to maintenance mode is the right choice. After all, I believe in a user experience where the user spends their time doing things that matter, while the system stays as inconspicuous and unobtrusive as possible.
The following text serves as a reflection on the development of both bieaz and the Root on ZFS guides.
Fiddling with self-hosted, networked storage has been one of my favorite hobbies since 2015, the year I graduated from junior high school.
That year, my parents elected to move to a new home near the senior high, and I proposed to buy some computer parts and build a NAS. The primary purpose of this NAS was to become a new home for our family photos, about 500 GB in size.
The first several iterations of our home NAS involved Xpenology, running Synology DiskStation Manager on third-party hardware. Synology, much like Apple, runs their business as a walled garden. The DSM experience is fine if you only use the officially supported packages. With their non-standard, modified Linux system, anything else is a no-go, and there's no guarantee that your system will survive the next update. Also, their recommended RAID layout, Synology Hybrid RAID, based on btrfs on LVM, is just too convoluted. To administer such a system, the user needs to master not only a set of btrfs commands but also another set of LVM commands and concepts. I just plain dislike this sort of complexity.
Note: I just discovered that Xpenology is stuck on the version I used back in 2017, DSM 6. This means no one has cracked Synology's hardware authentication on newer versions yet, and these Xpenology boxes are vulnerable to unpatched security holes.
In short, out of security, reliability, legal and ethical concerns, do not use cracked software!
A bit later I learned of the Free Software movement. As a consequence, I banished Synology from our home and replaced our only iPad Air 2013 with a Samsung Galaxy Tab S 10.5, running LineageOS of course. For the NAS, I experimented with Rockstor, openmediavault, Proxmox and NAS4Free.
Rockstor uses btrfs, and in 2017, as today, there was still the problem of write holes. Also, when I actually began using it, the command line utilities were unintuitive and hard to use, and development of Rockstor was not very active back then. One definitely does not want to base their newly built NAS system on something that could go out of support tomorrow.
With openmediavault, which is based on Debian, ZFS support is provided by a third-party plugin. The actual zfs-dkms package is also pulled in by this plugin during installation. The WebUI, the main selling point of using openmediavault instead of plain Debian, was not so robust or up-to-date with the latest ZFS features. In hindsight, having a WebUI was itself a security liability, only useful for users who are not comfortable with the command line interface.
Proxmox, as a Debian-based hypervisor, is oriented towards virtualization, not storage management, and its WebUI demonstrates this. To build an ordinary networked storage out of a pristine Proxmox installation, extensive configuration on the command line was still required.
I also religiously followed the Debian Root on ZFS guide, attempting to perform such an installation successfully. The goal, besides familiarizing myself with ZFS and manual Debian installation, was to install either Proxmox or openmediavault onto the resultant base system. Alas, for some mysterious reason, the resultant system was always broken in one way or another.
NAS4Free, now named XigmaNAS, is a NAS solution built on the foundation of FreeBSD. It shipped with a sane combination of NAS applications such as BitTorrent, SMART monitoring, NFS shares and Samba shares, all configurable within a few clicks in the WebUI. XigmaNAS also supports, but does not recommend, Root on ZFS installation and boot environments. It was here that I was introduced to the concept of a BE. However, as it turned out, BE support on XigmaNAS via the WebUI was less than perfect, and compounded with my then limited knowledge, I often did not understand how to use the BE manager. Therefore, the built-in BE management did not save me when my attempts to shoehorn other services into the system failed.
Despite the setbacks on NAS4Free, I decided that I would have to do without those additional services and be content with the options provided by the WebUI. I was not doing too well with the exams at senior high, leaving my parents worrying about a jobless and impoverished future for me.
ZFS is, naturally, integrated into NAS4Free. I encountered ZFS on NAS4Free, and no server of mine will be without ZFS in the future. Clear and simple.
In June of 2018, I graduated from senior high and applied for university admission. Things didn't work out for me, as I got accepted into the hotel management major, instead of compsci, which I had actually applied for.
So, in early November 2018, I was granted a one-year break (to Sept. 2019) by the university. When I got back home, the first thing I did was to rebuild the hardware with ECC support and experiment with Proxmox.
Coinciding with the move to Proxmox was the rollout of native IPv6 service in my area, as mandated by the government. This allowed me to build a small Nextcloud instance and ejabberd server on this machine, but not before figuring out how to do dynamic DNS for IPv6. For context, my internet provider only provides one private IPv4 address generated by carrier-grade NAT for each user.
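Conceptually, the dynamic DNS glue is just "find the machine's global IPv6 address, then push it to the DNS provider's update API". A minimal sketch of that idea follows; the sample `ip` output, provider URL and token are all hypothetical placeholders, and only the address extraction is exercised here:

```shell
#!/bin/sh
# Extract the first global-scope IPv6 address from `ip -6 addr` output.
# The text is passed in as an argument so the logic can be tested offline.
get_global_ipv6() {
    printf '%s\n' "$1" | awk '/inet6/ { sub(/\/.*/, "", $2); print $2; exit }'
}

# In real use: ip_out=$(ip -6 addr show dev eth0 scope global)
ip_out='    inet6 2001:db8::1/64 scope global dynamic'
addr=$(get_global_ipv6 "$ip_out")

# The update itself is provider-specific; a typical HTTP API call might be:
# curl -s "https://ddns.example.com/update?host=nas&addr=${addr}&token=${TOKEN}"
echo "would update AAAA record to ${addr}"
```

Run from cron or a systemd timer, a script like this keeps the AAAA record tracking whatever prefix the ISP currently delegates.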
Fast-forward a lot of configuration, and I helped my family switch to ejabberd and Nextcloud for instant messaging and photo/contacts backup. The services are hosted in unprivileged Linux Containers, with ZFS as the root filesystem, via an option in the Proxmox installer.
With the switch to Proxmox, the need for additional services is satisfied, and continued support of ZFS is guaranteed.
Before the migration to networked storage, and before the advent of various "clouds" (read: evil corps' computers), our family photos and transfers from miniDV were scattered between burned DVDs and portable hard disk drives.
When I first started out with Xpenology, a spare desktop computer was used. Later, it was upgraded to an Asrock J3455-ITX based system, to mimic the then-popular DS918 model.
The rebuild during the first one-year break saw the system upgraded to Supermicro X10SRi-F with Intel E5-2630Lv3 and 32GB of DDR4-RegECC RAM.
I was granted another one-year break starting from January 2020. During the Sept. 2019 to Jan. 2020 semester, I received automated emails from the server, generated by the ZFS Event Daemon, reporting errors on one disk. 1-to-2 SATA power extension cables were thought to be the culprit.
In short, do not touch custom builds for a critical server, not even with a ten-foot pole.
So, in March 2020, after waiting more than two months for delivery due to the coronavirus, the custom Supermicro build was replaced with a Dell R720xd server. No read errors anymore, but boy, the fan noise and power consumption were huge! The server drew about 90 watts with only one measly E5-2620v2 CPU, a single 750W power supply and 6 HDDs installed, with the unbearable fans turned off, of course.
In June 2021, during the short-lived Chia cryptocoin craze, the R720xd was sold for an inflated price and replaced by a quiet, power-efficient (25 watts idle with 4 HDDs) Dell T20 server, with 8GB of DDR3 unregistered ECC memory.
In the above section, I mainly detailed how I have been using ZFS on my server since the day I started with NAS4Free. I can't claim any knowledge of ZFS internals, but I'm certainly aware of at least some of the benefits of using ZFS. It was then reasonable, for me, to adopt ZFS for other uses.
On the other hand, I have also been using various Linux distributions since I got my first computer, a Dell XPS 9360, after graduating senior high in mid 2018. Among other things, "Free as in Freedom", rock-solid stability, transparency and predictability made GNU/Linux my operating system of choice.
Like many others, I had used Windows before and chose Ubuntu as my first distro. Ubuntu 18.04, the latest at that time, was the first version to adopt GNOME as default. Details of my first experience with this version are mostly forgotten, but one thing stands out: the spring animation in the dock, triggered when viewing all available applications, was quite sluggish. For reasons including this one, I switched to Ubuntu MATE.
Later, when I became more aware of some of Ubuntu's more questionable practices, Debian 10 Buster was released. Finding that Debian, as a community, is more devoted to the spirit of Free Software, I made the jump to Debian.
At this point, I was also busy configuring the various services on my home server. A browser and a terminal emulator were sufficient for this purpose, so the filesystem used for / was not of particular concern to me.
Nevertheless, among the options provided by the Debian installer shipped with the live image, btrfs appeared to be the most advanced. One btrfs partition to rule them all: no more separate partitions for / and /home, and the prospect of rolling back catastrophic system failures (caused by myself) made btrfs an easy choice. Full disk encryption with LUKS1 and GRUB was also available in this installer as a simple option.
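For context, the "one partition" approach relies on btrfs subvolumes in place of separate partitions. A sketch of the idea, using the common @/@home naming convention (these names are a community convention, not necessarily what the Debian installer creates; the commands below are printed rather than executed):

```shell
#!/bin/sh
# Dry-run wrapper: swap `echo "+ $*"` for "$@" to actually run the commands.
run() { echo "+ $*"; }

MNT=/mnt   # the mounted top-level btrfs volume (hypothetical mount point)

run btrfs subvolume create "$MNT/@"       # to be mounted at /
run btrfs subvolume create "$MNT/@home"   # to be mounted at /home
# Snapshotting @ alone can later roll back the system without touching /home:
run btrfs subvolume snapshot "$MNT/@" "$MNT/@-backup"
```

Because subvolumes share one pool of free space, there is no partition-sizing guesswork, while / and /home can still be snapshotted independently.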
I was using the GNOME shipped with Debian 10 then. The optimized spring animation, though still not perfect, was tolerable in this release. I was using an HP Elite x2 1012 G2 convertible by this point. The screen was high-res (3K) but small, and on it, I got tired of constantly maximizing and minimizing windows.
What's the use of a bunch of overlapping windows anyway? Better to open every program maximized by default, and show the title bars as tabs at the top of the screen. As usual, surfing the web gave me the answer: something called a tiling window manager would be the solution to all my troubles.
I was also aware of the acute shortcomings of the X.Org display server in day-to-day usage, namely screen tearing, poor touch support (I'm using a convertible) and no support for different DPI on multiple monitors. Wayland was, and still is, the only way forward, for me and for the entire desktop ecosystem. Combining these requirements, the clear answer emerged: the Sway window manager.
For all its comprehensiveness, sway was too new a program to be in the Debian Buster repo. Unwilling to get my hands dirty with either a manual installation or enabling Debian Unstable, I looked to other distros instead. Debian, with its stable but never-changing set of packages, is fine for servers, but not so much for desktop usage, I realized. Maybe this was the chance to jump to another, more up-to-date distro.
Arch Linux meets the requirements and has some excellent community documentation in the form of the ArchWiki. As a first-time user, I installed Arch using the generic installation guide with the most basic of options: one FAT EFI system partition and one ext4 partition for the actual system. Nothing besides the base package, kernel and bootloader was installed. Of course I was not able to get WiFi working after reboot, and several more trips to the live system were needed to get things into working order.
Installed, configured and launched sway. Everything was going smoothly, but wait! I'm using a portable computer with a rolling release model; where's the essential full-disk encryption and system rollback?
The next step was to switch the root filesystem to btrfs, because:
- ZFS was still too much of a hassle to configure and is not officially supported by Arch Linux;
- I had failed to set up Debian Root on ZFS previously, despite the existence of an official guide;
- I had been using btrfs on Debian before and it didn't cause me any trouble; and
- I could configure btrfs full-disk encryption with the proven Debian method.
On 3 December 2020, after several nights (and midnights), I edited the Snapper ArchWiki page under the I2Oc9 account to include a btrfs subvolume layout. It was considered unnecessarily complex by some, and reverted by the administrator. I then moved the work to a subpage belonging to my account. Here, I created two articles:
A bit of burnout ensued. To alleviate this, I created a new account, S0x9v, and initially continued to work on root on btrfs.
After using the btrfs management commands, snapper, grub-btrfs and snap-pac first hand, I realized that, although achieving system rollback with the combined effort of grub-btrfs and snap-pac is possible, managing btrfs snapshots is just a painful experience, and I found myself desperately searching for an excuse to switch to working on ZFS instead.
The excuse was not hard to find: write holes, instability, a decade of development hell, you name it. These shortcomings eventually outweighed the difficulties given above. So, on 5 December 2020, I published the first draft of the Root on ZFS guide.
It turned out that installing Arch with ZFS as the root file system is not the most difficult part of the journey; replicating the combination of grub-btrfs and snap-pac is.
Allow me to summarize the purpose of snapper, snap-pac and grub-btrfs.
- snapper: manages btrfs snapshots; the btrfs snapshot command is so painful to use that another utility had to be written for this purpose
- snap-pac: a pacman hook; calls snapper to create pre/post snapshots and passes command arguments to snapper
- grub-btrfs: scans for new snapshots and adds them to the boot menu
Combined, they let pacman create system snapshots before and after each transaction, so the user can roll back the system state, or better, enter an alternative system state, should the current state become faulted.
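To illustrate the mechanism, a simplified pre-transaction hook in snap-pac's style could look like the following. This is a hand-written sketch, not snap-pac's actual shipped hook, and the file name is hypothetical:

```ini
# /usr/share/libalpm/hooks/05-snapshot-pre.hook (hypothetical name)
[Trigger]
Operation = Install
Operation = Upgrade
Operation = Remove
Type = Package
Target = *

[Action]
Description = Creating pre-transaction btrfs snapshot...
When = PreTransaction
Exec = /usr/bin/snapper --config root create --type pre --print-number --description pacman
```

A matching PostTransaction hook creates the paired "post" snapshot, giving snapper a before/after pair for every package operation.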
There are limitations to the btrfs approach: the snapshot entry GRUB boots into is read-only. But the important takeaway is that the implementation for ZFS does not have such a limitation.
So my task was, I thought, to create a package for Root on ZFS that combines the roles of snap-pac and grub-btrfs.
The resultant package was rozb3-pac (11 Dec 2020), where rozb3 stands for root on zfs boot 3nvironment, a name I chose hoping that no one else would use such a prefix, so I could safely purge everything that begins with rozb3.
This is a single pacman hook that creates BEs. And purges BEs. And generates GRUB menu entries. Yes. A supremely stupid idea.
That's not where stupidity ends. In fact, I (poorly) version-controlled the thing not with git, but with MediaWiki article history!
Hindsight is 20/20, anyway. But at that time, I was burnt out again by using such an inefficient way to write the script, along with continuously testing it in a virtual machine. So I created yet another account, m0p.
After creating the account, from 11 to 27 Dec. 2020, there wasn't much activity on my wiki account. Now, nine months later, I can't remember exactly what happened, but perhaps the burn-out was more severe than the previous episode.
On 12th, Dec. 2020, a repo for rozb3-pac was created, here's the initial commit.
I also realized the mistake of lumping so much into rozb3-pac, and created another repo named bieaz, a homonym (sound-alike) of "BEs", boot environments. The initial commit was made on 2 Jan. 2021.
Well, here I was with two packages to maintain. rozb3-pac was reduced to a simple script, passing arguments to bieaz when needed. The focus thus moved to bieaz, the main script where BE creation, destruction and GRUB menu generation take place.
bieaz can be divided into three parts:
- the probe script, which ensures a bieaz-compatible dataset layout
- the main script, responsible for BE-specific commands such as listing and creation
- the GRUB menu script, which handles GRUB menu generation
In the latest commit, the three parts are all merged into one file: 534 lines of shell script. But it has not always been this way.
As one can easily see from the initial commit, they started out as separate scripts.
The next important improvement to bieaz was moving grub.cfg to the EFI system partition, ESP for short.
To understand the motivation behind this, I need to explain how GRUB boots, from a user's perspective.
GRUB boots the computer in two stages.
The first stage:
- very minimal; fits in the BIOS boot sector
- resides in the BIOS boot sector, or in the EFI system partition as grubx64.efi or BOOTX64.EFI (fallback)
- its sole purpose is to load the second stage
- hard-codes the absolute path to the second stage at boot loader installation time
The second stage:
- resides in a partition, typically appearing as /boot/grub
- loads other modules for various functions such as TPM support and LUKS encryption
- loads the all-important grub.cfg file
- boots the computer with the instructions given in grub.cfg
When switching boot environments, the absolute path to /boot/grub changes. If we do not reinstall the boot loader, that is, update the hard-coded absolute path to /boot/grub, the bieaz submenu will become outdated at best; at worst, if the user has destroyed the hard-coded boot environment, GRUB will not be able to load the second stage, requiring manual intervention to boot the computer.
The solution, discovered while I was writing the Fedora Root on ZFS guide, is to install the second stage in the ESP. The location of the ESP is fixed and does not change between BE switches. This way, an up-to-date bieaz submenu is always loaded by GRUB.
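Concretely, the trick amounts to pointing grub-install's --boot-directory at the ESP mount point, so the grub/ directory (second stage plus grub.cfg) lives on the ESP itself. The paths below are typical but not universal, and the command is printed rather than executed:

```shell
#!/bin/sh
# Install GRUB's second stage onto the ESP, so the path baked into
# grubx64.efi never changes when boot environments come and go.
ESP=/boot/efi    # typical ESP mount point; adjust to your system

cmd="grub-install --target=x86_64-efi --efi-directory=$ESP --boot-directory=$ESP --bootloader-id=GRUB"
echo "$cmd"      # dry-run; run the command directly to actually install
```

With both stages anchored to the ESP, destroying or renaming boot environments can no longer strand the first stage pointing at a dataset that no longer exists.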
There are also other benefits: I could drop the dangerous GRUB reinstall commands from bieaz, and setting the default boot environment can now be done from anywhere. Previously, being inside the BE was required.
There is a minor inconvenience caused by this change: a copy of grub.cfg needs to be present at /boot/grub to facilitate the faster menu generation described in the above section.
Needless to say, the Root on ZFS guides were the inspiration for bieaz, and the latter was written in tandem with the guides.
By authoring the guides, I became familiar with the mounting process of ZFS file systems at boot. This is essential, because "boot environment" is just a fancy term for correctly mounting a desired dataset clone at / at boot.
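In ZFS terms, that boils down to a snapshot, a clone, and two property changes. The dataset names below are illustrative, not bieaz's actual layout, and the commands are printed rather than executed:

```shell
#!/bin/sh
# Sketch of creating and activating a boot environment by hand.
# Dry-run wrapper: swap `echo "+ $*"` for "$@" to actually run the commands.
run() { echo "+ $*"; }

POOL=rpool
CURRENT="$POOL/ROOT/default"       # the running boot environment
NEW_BE="$POOL/ROOT/pre-upgrade"    # the BE we want to create

run zfs snapshot "${CURRENT}@pre-upgrade"                            # freeze current state
run zfs clone -o canmount=noauto "${CURRENT}@pre-upgrade" "$NEW_BE"  # writable copy
run zfs set mountpoint=/ "$NEW_BE"         # the clone mounts at / when chosen
run zpool set bootfs="$NEW_BE" "$POOL"     # tell the boot loader which BE to use
```

Everything bieaz does (probing the layout, listing BEs, generating GRUB entries) is machinery wrapped around this core sequence.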
In total, I probably spent more time on these guides than on bieaz itself. A link to all merged pull requests to openzfs-docs repo, authored by myself, is available here. Readers can refer to the link for detailed changelog.
Important additions include:
- Support for Arch, Fedora, RHEL and NixOS
- Encrypted boot pool and hibernation support for Arch and NixOS
- bieaz support for Arch, Fedora, RHEL. NixOS, on the other hand, has its own rollback mechanism and is thus not supported.
- Handling of kernel downgrades for Arch
In the end, although still blissfully ignorant of the internals of ZFS, I think I can declare my mission accomplished.
My implementation of BE management is not perfect; it's a complete hack, and there's certainly a multitude of programmers much more competent than me. But somebody's got to do it, and that somebody just happened to be me.
I neither planned nor took a structured approach to the development. The experience was more like this: someone gave me a hammer named shell and a stone called the ZFS on Linux boot environment, and I hit the stone with the hammer until it broke. Quite inefficient.