Large-scale file storage has reached a tipping point. Unstructured data has been growing steadily, and that growth has accelerated to the point that companies wonder how they will manage the ever-increasing scale of their digital assets. In addition, the global reach of public clouds is creating new demand for the mobility of file data.
As a result, new requirements for file storage are emerging for enterprises, underscoring the need for a scale-across file storage system. Such a system would have no upper limit on the number of files it could manage, regardless of size, and it would be able to run anywhere, on-prem or in the cloud, or both.
Qumulo offers an enterprise-proven, highly scalable, hybrid cloud file storage system that can span the data center and the cloud. It scales to billions of files, costs less, and has a lower TCO than legacy storage solutions. Qumulo also provides the highest-performance file storage system on-prem and in the cloud. With built-in real-time analytics, administrators can easily manage data no matter the file size or where it’s located globally.
Qumulo’s continuous replication enables data to move where it’s needed, when it’s needed; for example, between on-prem clusters and clusters running in the cloud, or between cloud clusters. Qumulo’s software runs on Intel® Xeon® Gold-based industry-standard hardware and was designed from the ground up to meet today’s requirements for scale. Qumulo offers the world’s first scale-across file storage system, allowing modern enterprises to easily store and manage files numbering in the billions, in any operating environment, anywhere in the world.
A Perfect Storm
By 2022, IDC predicts that the amount of data deployed for file services in public clouds, private clouds, and on-prem will reach 45.5 exabytes, 10.6 exabytes, and 57.3 exabytes, respectively.1 An exabyte is a million terabytes. To put that in perspective, you can store 341 billion three-minute MP3s in an exabyte. That is a lot of music.
The rise of the public cloud signalled that compute resources and global reach were now achievable without building data centers across the world. Consequently, new ways of working have arrived and are here to stay. Businesses realize that, in the future, they will no longer be running their workloads out of single, self-managed data centers. Instead, they will be moving to multiple data centers, with one or more in the public cloud, or completely to the cloud. This flexibility will help them adapt to a world with geographically dispersed employees and business partners. Companies will focus their resources on their core business lines instead of on IT expenditures. Most will improve their disaster recovery and business continuity plans, and many will do this by taking advantage of the cloud.
The storage industry finds itself at a crossroads, facing both new challenges and new opportunities. Without innovation among storage providers, users of large-scale file storage will continue to struggle to understand what is going on inside their systems. They will struggle to cope with massive amounts of data. They will struggle to meet the demands for global reach, with few viable options for file data that spans both the data center and the cloud.
Scale-Across File Storage
Traditionally, companies face two problems when deploying file-based storage systems: they need to scale both capacity and performance simultaneously. In a world where the growth of unstructured data is unrelenting, scale is no longer limited to these two axes. New criteria for scale have emerged, including the number and size of files stored, the ability to control enormous amounts of data in real time, the ability to distribute data globally, and the flexibility to leverage on-prem, hybrid, or cloud deployments. These requirements define a new market category called scale-across file storage.
Scale-across file storage works across operating environments, including on-prem data centers, as well as private and public clouds. Proprietary hardware is increasingly a dead end for users of large-scale file storage. Today’s businesses need flexibility and choice. They want to store files in data centers, in private clouds, in public clouds, or in any combination of these, choosing based on business decisions rather than on the technical limitations of their storage platform.
Re-Thinking The File Storage Industry
Legacy scale-up and scale-out file systems are not capable of meeting the emerging requirements of managing storage at scale on-prem, in the cloud, or both. The engineers who designed them 20 years ago never anticipated the number of files and directories, and the mixed file sizes, that characterize modern workloads. Nor could they foresee cloud computing.
With Qumulo’s file storage, cloud instances or computing nodes with Intel® Xeon® Gold-based standard hardware work together to form clusters that provide scalable performance and a single, unified file system. Qumulo clusters work together to form a globally distributed, highly connected storage solution tied together with continuous replication.
How Qumulo Works
Qumulo is a new kind of storage company, based entirely on advanced software and modern development practices. Intel-based, industry-standard hardware running advanced, distributed software is the basis of modern, low-cost, scalable computing. This is just as true for file storage at large scale as it is for search engines and social media platforms.
Qumulo’s file system is unique in how it approaches the problems of scalability. Its design implements principles similar to those used by modern, large-scale, distributed databases. The result is a file system with unmatched scale characteristics.
THE QUMULO FILE SYSTEM
For massively scalable files and directories, Qumulo’s file system makes extensive use of index data structures known as B-trees. B-trees minimize the amount of I/O required for each operation as the amount of data increases. With B-trees as a foundation, the computational cost of reading or inserting data blocks grows very slowly as the amount of data increases.
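To illustrate why the cost grows so slowly, the following minimal sketch counts how many levels an idealized, completely full B-tree needs to index a given number of entries. The fanout of 256 and the function name `btree_levels` are assumptions chosen for illustration; they are not figures or names from Qumulo's implementation.

```python
def btree_levels(num_entries: int, fanout: int) -> int:
    """Return the smallest number of levels L such that fanout**L >= num_entries.

    Models an idealized, completely full B-tree; real trees are only
    partially full, so they may need an extra level.
    """
    if num_entries < 1 or fanout < 2:
        raise ValueError("need at least one entry and a fanout of 2 or more")
    levels = 1
    capacity = fanout  # entries reachable with the current number of levels
    while capacity < num_entries:
        capacity *= fanout
        levels += 1
    return levels


# Hypothetical fanout of 256 entries per node.
for n in (1_000, 1_000_000, 1_000_000_000, 1_000_000_000_000):
    print(f"{n:>17,} entries -> {btree_levels(n, fanout=256)} levels")
```

Under these assumptions, growing from a thousand entries to a trillion (a billion-fold increase) raises the number of levels visited per lookup from 2 to 5.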
REAL-TIME ANALYTICS WITH QUMULO
When people are introduced to Qumulo’s real-time analytics and watch them perform at scale, the first question is usually, “How can it be that fast?” The key to the breakthrough performance of Qumulo’s analytics is that the file system continually maintains up-to-date metadata summaries for each directory. It uses the file system’s B-trees to collect information about the file system as changes occur. Various metadata fields are summarized inside the file system to create a virtual index. The performance analytics that you see in the GUI, and can pull out with the REST API, are based on sampling mechanisms that are enabled by Qumulo’s metadata aggregation. In contrast, metadata queries in legacy storage appliances are answered outside of the core file system by an unrelated software component.
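The idea of maintaining per-directory summaries as changes occur can be sketched as follows. This is a simplified illustration, not Qumulo's implementation: it uses plain in-memory parent pointers rather than B-trees, and the `Directory` class with its method names is hypothetical. Each write or delete pushes a delta up to the root, so any directory can report its subtree totals without walking the tree.

```python
class Directory:
    """A directory that keeps running totals for everything beneath it."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.files = {}        # file name -> size in bytes
        self.children = {}     # subdirectory name -> Directory
        self.total_bytes = 0   # aggregate over this whole subtree
        self.total_files = 0   # aggregate over this whole subtree

    def mkdir(self, name):
        child = Directory(name, parent=self)
        self.children[name] = child
        return child

    def write_file(self, name, size):
        previous = self.files.get(name)
        self.files[name] = size
        if previous is None:
            self._propagate(size, 1)             # new file
        else:
            self._propagate(size - previous, 0)  # overwrite: size delta only

    def delete_file(self, name):
        size = self.files.pop(name)
        self._propagate(-size, -1)

    def _propagate(self, delta_bytes, delta_files):
        # Apply the change to this directory and every ancestor up to the root.
        node = self
        while node is not None:
            node.total_bytes += delta_bytes
            node.total_files += delta_files
            node = node.parent


root = Directory("/")
projects = root.mkdir("projects")
renders = projects.mkdir("renders")
renders.write_file("frame0001.exr", 40_000_000)
projects.write_file("notes.txt", 2_000)
print(root.total_bytes, root.total_files)
```

The cost of each update is proportional to the directory depth, while a query such as "how much data is under this directory?" becomes a single field read.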
Just as real-time aggregation of metadata enables Qumulo’s real-time analytics, it also enables real-time capacity quotas. Quotas allow administrators to specify how much capacity a given directory is allowed to use for files.
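A minimal sketch of how aggregated usage can make a quota check immediate is shown below. It assumes a hypothetical `QuotaTracker` that stores one running byte total per directory path; the class, its methods, and the paths are illustrative and do not represent Qumulo's API.

```python
from pathlib import PurePosixPath


class QuotaExceeded(Exception):
    pass


class QuotaTracker:
    """Tracks aggregated bytes per directory and enforces optional limits."""

    def __init__(self):
        self.usage = {}   # directory path -> bytes stored beneath it
        self.limits = {}  # directory path -> quota in bytes

    def set_quota(self, directory, limit_bytes):
        self.limits[directory] = limit_bytes

    def charge(self, file_path, n_bytes):
        # Every ancestor directory of the file, from nearest up to "/".
        dirs = [str(d) for d in PurePosixPath(file_path).parents]
        # Check all quotas first so a rejected write changes nothing.
        for d in dirs:
            limit = self.limits.get(d)
            if limit is not None and self.usage.get(d, 0) + n_bytes > limit:
                raise QuotaExceeded(f"{d}: limit {limit} bytes")
        for d in dirs:
            self.usage[d] = self.usage.get(d, 0) + n_bytes


tracker = QuotaTracker()
tracker.set_quota("/projects", 100)
tracker.charge("/projects/a/one.bin", 60)
try:
    tracker.charge("/projects/b/two.bin", 50)
except QuotaExceeded as err:
    print("rejected:", err)
```

Because the totals are already up to date, the check inspects only the ancestors of the file being written and never scans directory contents.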
Qumulo’s auditing capability is easy to set up and integrates with standard monitoring systems for enhanced security. Auditing tracks all events and actions on your data and can scale from thousands to millions of IOPS with minimal performance impact.
Snapshots let system administrators capture the state of a file system or directory at a given point in time. If a file or directory is modified or deleted unintentionally, users or administrators can revert it to its saved state. Snapshots in Qumulo’s file system have an extremely efficient and scalable implementation. A single Qumulo cluster can have a virtually unlimited number of concurrent snapshots without performance or capacity degradation.
Qumulo provides continuous replication across storage clusters, whether on-prem or in the cloud. Once a replication relationship between a source cluster and a target cluster has been established and synchronized, Qumulo’s software automatically keeps data consistent. There’s no need to manage the complex job queues for replication associated with legacy storage appliances.
SCALABLE BLOCK STORE (SBS)
The Qumulo file system sits on top of a transactional virtual layer of protected storage blocks called the Scalable Block Store (SBS). Instead of a system where every file must figure out its protection for itself, data protection exists beneath the file system, at the block level. Qumulo’s block-based protection, as implemented by SBS, provides outstanding performance in environments that have petabytes of data and workloads with mixed file sizes. SBS has many benefits, including:
- Fast rebuild times in case of a failed disk drive;
- The ability to continue normal file operations during rebuild operations;
- No performance degradation due to contention between normal file writes and rebuild writes;
- Equal storage efficiency for small files and for large files;
- Timely, accurate reporting of usable space;
- Efficient transactions that allow Qumulo clusters to scale to many hundreds of nodes; and
- The ability to balance performance during rebuilds.
The virtualized protected block functionality of SBS is a huge advantage for the Qumulo file system. In legacy storage systems that do not have SBS, protection occurs on a file-by-file basis or using fixed RAID groups, which introduces many difficult problems such as long rebuild times, inefficient storage of small files, and costly and inefficient management of disk layouts.
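To make the idea of protection beneath the file system concrete, here is a minimal sketch that protects a group of fixed-size blocks with a single XOR parity block and rebuilds one lost block. XOR parity is a deliberately simple stand-in and is not a description of the encoding SBS actually uses; the block size and all function names are assumptions. The point is that the protection code sees only blocks and never needs to know which file, large or small, a block belongs to.

```python
from functools import reduce

BLOCK_SIZE = 8  # hypothetical, tiny for readability


def split_into_blocks(payload: bytes, block_size: int = BLOCK_SIZE):
    """Zero-pad the payload and cut it into fixed-size blocks."""
    padded_len = -(-len(payload) // block_size) * block_size  # round up
    padded = payload.ljust(padded_len, b"\0")
    return [padded[i:i + block_size] for i in range(0, padded_len, block_size)]


def xor_blocks(blocks):
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def make_parity(data_blocks):
    return xor_blocks(data_blocks)


def rebuild_missing(surviving_blocks, parity):
    """Recover the single missing data block from the survivors and parity."""
    return xor_blocks([*surviving_blocks, parity])


# Blocks from a small file and a larger file are protected together.
blocks = split_into_blocks(b"tiny") + split_into_blocks(b"a somewhat larger file")
parity = make_parity(blocks)

lost_index = 2  # pretend the drive holding this block failed
survivors = blocks[:lost_index] + blocks[lost_index + 1:]
recovered = rebuild_missing(survivors, parity)
print(recovered == blocks[lost_index])
```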