Update: The host mentioned in this infrastructure has since been replaced with another; the upgrade process is covered here.
My personal infrastructure has gone through a number of iterations. It started as a 450MHz Pentium III Ubuntu 7.04 server running SMB from a single 5400 RPM IDE disk, cobbled together behind a BT Home Hub and some cheap megabit switches. It later became an Ubuntu 14.04 host on a laptop with a broken screen and gigabit switches, then a Pentium 4 desktop, and then a lightweight Gigabyte Brix mini-PC, before I decided to get serious with the entire thing.
All of these iterations were basically functional, but after suffering a catastrophic data loss when the drive controller blew on a disk, I set out to build something resilient, secure and functional, to learn as much as possible about the technologies I was working with at the time, and, as ever, to do the whole thing at the lowest possible cost.
Host: HP MicroServer N36L, 1.3GHz dual-core AMD Turion, 8GB RAM, StarTech PCI gigabit NIC
Host Disks: 4 x 2TB WD HDDs, 1 x 256GB Kingston SSD
RAID Controller: HP P410 RAID Controller
Firewall: Juniper SRX100B
Switches (Managed): TP-Link TL-SG108E and TL-SG105E
Switches (Unmanaged): TP-Link TL-SG108 and TL-SG105
Modem: TP-Link W9980
WiFi Access Point: Ubiquiti UniFi AP-AC
Additional Storage: Netgear ReadyNAS Duo2
Backup Disks: 2 x external hard drives (4TB, 1TB)
The hardware provides a solid base for a highly available and resilient infrastructure (though its low power and limited RAM hardly make it enterprise-grade), and it serves its functions with little difficulty.
Almost all of the hardware was purchased second hand, some of it was donated and some was bought new. Better options are available today, but this same hardware can now be had for around £600, probably significantly less.
I obtained the host for free, but they can usually be found now for around £90-120.
Initially I had hoped to use the embedded RAID controller to deploy all disks as one RAID 5 array, but some earlier trial, error and outright disasters proved that the embedded controller is little more than fake RAID: it can present a RAID volume to an OS on bare metal, but the lack of a proper abstraction layer is a disaster for hypervisors. This led to the purchase of a P410 RAID controller for about £20 on eBay.
The server has 4 bays for 3.5″ HDDs at a maximum capacity of 2TB each, and a single SAS connector serves all four disks, which allows them to be connected to the RAID controller and configured at the hardware level as RAID 10 (don't be suckered in, as I initially was, by JBOD; it provides no redundancy and cost me a week of recovery time when a disk failed). The RAID 10 configuration provides a volume of just under 4TB. RAID 5 would yield more capacity, but it limits disk throughput, carries a bigger recovery overhead after a disk failure and requires a larger backup volume, all of which I decided against after the previous failure. The disks I used can be obtained for around £35 each.
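If the array is built from a Linux live environment rather than the controller's boot-time menu, HP's hpssacli tool can do the same job; a rough sketch, assuming the P410 sits in slot 0 (check with the first command) and all four disks are unassigned:
# list controllers, slots and any existing arrays
hpssacli ctrl all show config
# build a single RAID 10 logical drive from all unassigned disks
hpssacli ctrl slot=0 create type=ld drives=allunassigned raid=1+0
# confirm the new ~4TB logical drive
hpssacli ctrl slot=0 ld all show detail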
8GB of RAM was added in 2 DIMMs at a cost of around £30. Whilst this is a rather low amount, a headless Linux distribution that doesn't require much interaction typically only needs around 1024-2048MB depending on load (and in times of real constraint can go even lower).
The hypervisor OS I settled on was ESXi 6.5 (Free Edition), the final version to support my CPU (displacing my long-time use of KVM/qemu/libvirt, as I now spend much more time working with vSphere). Installation proved to have zero issues: drivers were detected without problems and virtualization support was provided by the CPU and kernel.
The operating system for the VMs was my mainstay of Ubuntu 16.04 headless, being the best trade-off between lightweight, featured and supported. I also have more experience with Debian variants than RHEL variants, and Debian itself tends to suffer more driver issues, so Ubuntu was the obvious choice.
Before attempting the installation of KVM we first need to know whether the CPU even supports hardware virtualization, otherwise this is a non-starter. Booting into a Linux live environment was the quickest means of testing this.
Given that this is an AMD CPU we use:
grep -c svm /proc/cpuinfo
svm refers to AMD's SVM (AMD-V) virtualization technology. If this command returns anything other than zero, we have support and can install the virtualization software and libraries:
sudo apt-get install qemu-kvm libvirt-bin virtinst virt-viewer ebtables dnsmasq
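A quick sanity check after the install (assuming the cpu-checker package for kvm-ok) confirms the kernel module is loaded and libvirt is answering:
sudo apt-get install cpu-checker
sudo kvm-ok                 # should report "KVM acceleration can be used"
lsmod | grep kvm            # expect kvm_amd and kvm to be listed
virsh list --all            # libvirt responds (empty list on a fresh install)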
The installation of ESXi was performed to the SSD, with the rest of the SSD's space used as a datastore to house the VMs' small OS disks (Ubuntu has only around a 16GB minimum requirement, which I raised to 30GB, thin provisioned).
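Thin provisioning can be done from the vSphere client or directly on the host with vmkfstools; a sketch, assuming a datastore named SSD and a VM directory of ubuntu-fs (both names invented for illustration):
# create a 30GB thin-provisioned disk on the SSD datastore
vmkfstools -c 30G -d thin /vmfs/volumes/SSD/ubuntu-fs/ubuntu-fs.vmdk
# provisioned size vs blocks actually consumed
ls -lh /vmfs/volumes/SSD/ubuntu-fs/ubuntu-fs-flat.vmdk
du -h /vmfs/volumes/SSD/ubuntu-fs/ubuntu-fs-flat.vmdk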
Networking was configured to use the onboard NIC for management traffic, with the additional gigabit NIC carrying all VM traffic in a bridged configuration via a dedicated vSwitch/port group.
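The equivalent from the ESXi shell looks roughly like this, assuming the StarTech NIC shows up as vmnic1 and using my own names for the vSwitch and port group:
esxcli network vswitch standard add --vswitch-name=vSwitch1
esxcli network vswitch standard uplink add --vswitch-name=vSwitch1 --uplink-name=vmnic1
esxcli network vswitch standard portgroup add --vswitch-name=vSwitch1 --portgroup-name="VM Traffic"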
We now have a working hypervisor and can start to install VMs. The first of these serves our data over the SMB protocol (via Samba), indexed in a MySQL database. Given the large data storage requirement, the RAID 10 volume is presented to VMware as a single datastore, then attached to the file server VM as a second disk and added to fstab so that it is mounted at the OS level.
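Inside the VM the second disk just needs a filesystem and an fstab entry; a sketch, assuming it appears as /dev/sdb and is mounted at /srv/data (both assumptions):
sudo parted --script /dev/sdb mklabel gpt mkpart primary ext4 0% 100%
sudo mkfs.ext4 /dev/sdb1
sudo mkdir -p /srv/data
# /etc/fstab - a UUID (from blkid) is safer than the device name
/dev/sdb1  /srv/data  ext4  defaults  0  2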
At some point the 4TB of storage reached its limit and needed extending, so another 1TB was added in the form of a NAS containing two 1TB drives in a RAID 1 configuration.
This offers data over the SMB protocol and is mounted into the VM mentioned previously to allow transfer between nodes.
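Mounting the NAS share into the file server VM is a single cifs entry in fstab; a sketch with an invented hostname, share name and credentials file:
sudo apt-get install cifs-utils
# /etc/fstab
//nas.lan/archive  /mnt/nas  cifs  credentials=/root/.smb-nas,_netdev  0  0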
I wrestled with a few backup options. Initially I used several custom rsync scripts, which were robust, but I ran into a number of issues and maintaining the logs was too much overhead. I then considered switching to Bacula, but that solution is too enterprise-oriented and more than I needed.
Eventually I settled on NAKIVO, a robust, vSphere-integrated platform which is free for up to 10 VMs (and works with the ESXi free edition). It runs on an external box and allows for multiple repositories, so data and OS disks can be split.
Backups are sent to an external 4TB HDD, with incremental changes sent daily and rolled up on a weekly basis; OS disks are backed up to the NAS.
The NAS itself luckily provides its own backup utility, targeting a single 1TB HDD which cost around £30; these backups follow the same incremental and weekly roll-up scheme.
The modem is a TP-Link W9980, which is sold as a VDSL2 WiFi router, though I'm only interested in its modem functionality. Since I use a non-BT ISP, authentication is provided by MER (MAC Encapsulation Routing), which requires a beta firmware from TP-Link in order to authenticate with the ISP. These can be obtained for around £20 now.
The managed switches can be obtained for around £40 for the 108E model and £30 for the 105E, and the unmanaged for around £25 and £15 respectively.
Connected to an Ethernet port on the modem is one of the managed switches (the 105E), with an unmanaged switch extending the number of data VLAN access ports. This connection is dedicated to a single VLAN, which is tagged at the firewall on the other managed switch. A single run of cable connects these devices to the other side of the house, where it terminates on a trunk port on the 108E and is split out into further access ports, extending the data VLAN into the 108E.
The modem is set to forward all traffic to the IP address of the firewall's outside (untrust) interface; the modem in turn acts as the default route for all outgoing traffic from the firewall, and the untrust interface is where the NAT configuration lives.
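On the SRX side that amounts to a default route pointing at the modem and interface-based source NAT out of the untrust zone; a rough sketch with assumed addressing (192.168.1.1 for the modem, fe-0/0/0 as the untrust interface):
set interfaces fe-0/0/0 unit 0 family inet address 192.168.1.2/24
set routing-options static route 0.0.0.0/0 next-hop 192.168.1.1
set security nat source rule-set trust-to-untrust from zone trust
set security nat source rule-set trust-to-untrust to zone untrust
set security nat source rule-set trust-to-untrust rule nat-all match source-address 0.0.0.0/0
set security nat source rule-set trust-to-untrust rule nat-all then source-nat interface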
The WiFi access point is then connected to the 108E, where several VLAN tags are applied to a single switch port, allowing multiple WiFi SSIDs to be mapped to multiple VLANs. The WiFi AP can be picked up for around £40-50.
The managed switches support 802.1Q VLANs, plus an unusual, seemingly TP-Link-only VLAN method that doesn't appear in any other implementation, is awful and serves no purpose I can work out.
By default, the VLAN tag of 1 (DEFAULT) cannot be removed unless the firmware is upgraded to the latest version, which is of course terrible practice, so upgrade the firmware as soon as the switches are out of the box.
The managed switches support a number of layer 2 VLANs, trunks and multiple tags for access ports and work akin to HP/Aruba switches.
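For completeness, the firewall end of that trunk is just tagged subinterfaces, one per VLAN; a sketch with invented VLAN IDs and addressing:
set interfaces fe-0/0/1 vlan-tagging
set interfaces fe-0/0/1 unit 10 vlan-id 10
set interfaces fe-0/0/1 unit 10 family inet address 10.0.10.1/24
set interfaces fe-0/0/1 unit 20 vlan-id 20
set interfaces fe-0/0/1 unit 20 family inet address 10.0.20.1/24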