Improvements to the BFQ I/O scheduler

Very good news for the BFQ I/O scheduler. One of the main challenges was to make it fast enough to keep up with the speed of modern, flash-based storage. And with the changes that have just entered linux-4.15, the maximum throughput sustainable by BFQ has grown by 60%-90%, depending on the CPU. For instance, on the octa-core ARM Cortex-A53 of the HiKey board, the throughput sustainable by BFQ has grown from 40 to 80 KIOPS (320 MB/s of 4KB random I/O), while on an Intel i7-4850HQ it has grown from 250 to 400 KIOPS (1.6 GB/s of 4KB random I/O).
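
As a quick sanity check on those figures, here is a minimal arithmetic sketch in C (the function name is just illustrative, and 4096-byte requests are assumed) relating an IOPS figure to the corresponding bandwidth:

```c
#include <stdio.h>

/* Rough arithmetic behind the figures above: IOPS times the request
 * size gives the resulting bandwidth. */
static double iops_to_mb_per_sec(double iops, double req_bytes)
{
	return iops * req_bytes / 1e6; /* bytes/s -> MB/s (decimal) */
}

int main(void)
{
	/* 80 KIOPS of 4KB requests is ~330 MB/s, and 400 KIOPS is ~1.6 GB/s,
	 * consistent with the rounded figures quoted in the text. */
	printf("HiKey:     %.0f MB/s\n", iops_to_mb_per_sec(80e3, 4096));
	printf("i7-4850HQ: %.0f MB/s\n", iops_to_mb_per_sec(400e3, 4096));
	return 0;
}
```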

This performance boost opens the door to mid- and high-end storage. Yet, exactly for this reason, the next big challenge is already around the corner. With these fast devices, an I/O scheduler must not only keep pace with their high speed, but also make them reach that speed, while at the same time preserving the service guarantees it has been designed for (low latency, guaranteed bandwidth for each application, …). Here are some details, just to give a glimpse of why this challenge is so hard (and has actually never been met so far). High-speed devices go fast only if their multiple internal queues of I/O requests are kept non-empty at all times. In this respect, it would be easy for an I/O scheduler to just pass incoming I/O requests to the underlying device as they arrive. Yet, to provide strong service guarantees to applications, an accurate I/O scheduler like BFQ must also control the service order of I/O requests. And modern storage devices do re-order requests internally! So the hard challenge is to keep the device fed enough to make it go fast, but not so fed as to lose control over service order.
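
To make the trade-off a bit more concrete, here is a toy sketch in C. It is not BFQ's actual code, and every name in it is made up for illustration: the scheduler dispatches requests in the order its policy has chosen, but stops feeding the device once a cap on in-flight requests is reached, so that the device cannot re-order arbitrarily large batches. Picking the cap so that the device stays both fast and under control is exactly the hard part.

```c
#include <stdio.h>

/* Hypothetical cap: too low starves the device's internal queues,
 * too high forfeits control over service order. */
#define MAX_IN_FLIGHT 8

struct request { int id; };

static struct request pending[32]; /* toy queue, in policy-chosen order */
static int n_pending, next_idx, in_flight;

/* Stand-in for handing a request to the device driver. */
static void device_queue_rq(struct request *rq)
{
	printf("dispatch request %d (in flight: %d)\n", rq->id, in_flight + 1);
}

/* Dispatch until the policy has nothing ready or the cap is reached. */
static void dispatch_loop(void)
{
	while (in_flight < MAX_IN_FLIGHT && next_idx < n_pending) {
		device_queue_rq(&pending[next_idx++]);
		in_flight++; /* decremented again on completion (not shown) */
	}
}

int main(void)
{
	for (n_pending = 0; n_pending < 32; n_pending++)
		pending[n_pending].id = n_pending;
	dispatch_loop(); /* only MAX_IN_FLIGHT requests reach the device */
	return 0;
}
```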

Finally, although no longer a hot news item, it is worth mentioning that, on the other hand, BFQ is now mature for low- to mid-speed, single-queue storage devices (eMMC, SD cards, low- and mid-speed SSDs, HDDs, RAID, …). In fact, on these devices, BFQ has a negligible per-I/O-request overhead, and guarantees:
- Very high throughput.
- Very low latency for tasks involving I/O, from interactive tasks, such as starting applications, to soft real-time tasks, such as playing back audio or video frames.
- Strong bandwidth guarantees: any application can be reserved, and actually guaranteed, the desired fraction of the device throughput (see the sketch after this list).
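
As for the last point, here is a minimal user-space sketch of how such a bandwidth reservation can be exercised through cgroups. It assumes a cgroup-v2 hierarchy mounted at /sys/fs/cgroup with the io controller enabled, an already-created group named myapp, and a device named sda that is driven through blk-mq (where BFQ is available); all of these are example values, not anything mandated by BFQ, and in the legacy hierarchy the attribute is blkio.bfq.weight instead of io.bfq.weight. It must be run as root.

```c
#include <stdio.h>
#include <stdlib.h>

/* Write a single string value to a sysfs/cgroup attribute file. */
static int write_str(const char *path, const char *val)
{
	FILE *f = fopen(path, "w");

	if (!f) {
		perror(path);
		return -1;
	}
	fprintf(f, "%s\n", val);
	return fclose(f);
}

int main(void)
{
	/* Select BFQ for the (example) device sda. */
	if (write_str("/sys/block/sda/queue/scheduler", "bfq"))
		return EXIT_FAILURE;

	/* Give the myapp group twice the default BFQ weight (100), so the
	 * processes in it receive a proportionally larger share of the
	 * device throughput under contention. */
	if (write_str("/sys/fs/cgroup/myapp/io.bfq.weight", "200"))
		return EXIT_FAILURE;

	return 0;
}
```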