Cloud and web hosting companies need storage solutions that help them manage critical business data. With the rise of open source software and high-performance storage systems, the adoption of both in cloud technology was all but inevitable. Ceph storage was built on this principle of high-performance storage for fast compute.
What is Ceph Storage?
Ceph storage is a software-defined storage solution that distributes data across clusters of storage resources. It is a fault-tolerant and scale-out storage system, where multiple Ceph storage nodes (servers) cooperate to present a single storage system that can hold many petabytes (1PB = 1,000 TB = 1,000,000 GB) of data.
Most storage products are block only, file only, object only, or file and block. Ceph storage, however, unifies file, block and object data in one system. It also includes storage management features such as thin provisioning, replication, erasure coding, inline compression and cache tiering. Ceph is one of the few storage platforms that is at once open source, software defined, enterprise class and unified (object, block and file). It supports iSCSI block storage, allowing Linux, VMware, Unix and Windows servers to access storage from a Ceph cluster.
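Replication and erasure coding protect data in different ways and cost different amounts of raw capacity, which simple arithmetic makes concrete. The sketch below is illustrative only: the function names and the example figures (1 PB raw, 3 replicas, a 4+2 erasure-coding layout) are assumptions rather than values from any real cluster, and it ignores practical overheads such as free-space headroom.

```python
def usable_replicated(raw_tb: float, replicas: int) -> float:
    """Usable capacity when every object is stored `replicas` times."""
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    return raw_tb / replicas


def usable_erasure_coded(raw_tb: float, k: int, m: int) -> float:
    """Usable capacity when each object is split into k data chunks plus m coding chunks."""
    if k < 1 or m < 0:
        raise ValueError("k must be >= 1 and m must be >= 0")
    return raw_tb * k / (k + m)


raw_tb = 1000.0  # 1 PB expressed in TB (hypothetical cluster size)
replicated_tb = usable_replicated(raw_tb, replicas=3)
erasure_tb = usable_erasure_coded(raw_tb, k=4, m=2)
print(f"3x replication: {replicated_tb:.1f} TB usable")
print(f"4+2 erasure coding: {erasure_tb:.1f} TB usable")
```

Under these assumptions, three-way replication leaves about a third of the raw capacity usable, while the 4+2 layout leaves about two thirds.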
The CephFS file system runs on top of the same storage system that provides the object and block storage interfaces. A Ceph metadata server cluster maps the directories and file names of the file system to objects stored in the cluster. Enterprises use Ceph storage for its unified file-block-object management, multi-platform support, scalability and built-in fault tolerance. Ceph storage requires no proprietary hardware components, so administrators can use low-cost commodity hardware.
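The idea of mapping file names to stored objects can be sketched as a lookup followed by a split into fixed-size pieces. This is a toy model, not the metadata server's actual implementation: the metadata table, the 4 MiB object size and the object naming scheme are all assumptions made for illustration.

```python
OBJECT_SIZE = 4 * 1024 * 1024  # bytes per object; an assumed value for this sketch

# Hypothetical metadata table: path -> (inode number, file size in bytes)
metadata = {
    "/home/alice/report.pdf": (0x1000, 9 * 1024 * 1024),
}


def objects_for(path: str) -> list:
    """Names of the objects holding the file's data, in order."""
    inode, size = metadata[path]
    count = max(1, -(-size // OBJECT_SIZE))  # ceiling division, at least one object
    # Illustrative naming scheme: inode in hex, then a zero-padded object index.
    return [f"{inode:x}.{index:08x}" for index in range(count)]


names = objects_for("/home/alice/report.pdf")
print(names)
```

A 9 MiB file with a 4 MiB object size spans three objects, so the lookup returns three names that the storage layer can then place independently.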
How does Ceph storage work?
Ceph provides the Ceph Block Device, a virtual disk that can be attached to bare-metal Linux-based servers or virtual machines. RADOS (Reliable Autonomic Distributed Object Store), the core component of Ceph, underpins block storage capabilities such as snapshots and replication, which can be integrated with OpenStack Block Storage. Ceph also offers a POSIX-compliant (Portable Operating System Interface) file system, CephFS, to store data in its storage clusters. The file system uses the same clustered system as Ceph block storage and object storage to store large amounts of data.
On the whole, the way Ceph functions as a storage system is conceptually simple, which is one reason many hosting and IT solution providers deploy it for their clients.
What are the features of Ceph storage and why do we need it?
1) Ceph supports emerging IT infrastructure: Software-defined storage is an increasingly common way to store or archive large volumes of data. One of the main reasons is that legacy infrastructure and solutions cannot meet today's storage needs at a reasonable cost. Moreover, as IT organizations rely more heavily on cloud technology, a storage solution suited to it becomes necessary. All these factors have helped Ceph earn an important place in new infrastructure.
2) Ceph provides dynamic storage clusters: Most storage applications do not make the most of the CPU and RAM available in a typical commodity server, but Ceph storage does. From rebalancing clusters to recovering from errors and faults, Ceph offloads work from clients by using the distributed computing power of its OSDs (Object Storage Daemons).
3) Ceph is scalable, reliable and easy to manage: Ceph allows organizations to scale without a disproportionate increase in CapEx or OpEx. A Ceph node leverages commodity hardware and intelligent daemons, and the nodes in a Ceph Storage Cluster communicate with each other to replicate and redistribute data dynamically. Ceph monitors track these nodes to ensure high availability.
How is Ceph storage beneficial for web professionals and how can they make the most out of it?
a) Data safety- Ceph makes each data update visible to clients. It also lets clients know when the updated data has been safely replicated on disk and will survive power or other failures. To do this, RADOS separates synchronization from safety when acknowledging updates, which allows Ceph to offer both low-latency updates for application synchronization and well-defined data safety semantics. In this manner, Ceph storage ensures data safety for users.
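The separation between "the update is visible" and "the update is safe on disk" can be pictured as two notifications per write. The following is a minimal in-memory sketch, not Ceph code: the `Replica` class, the `write` function and the event labels "ack" and "commit" are hypothetical names chosen for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """Toy stand-in for one OSD holding a copy of the data (hypothetical class)."""
    name: str
    memory: dict = field(default_factory=dict)  # applied, but not yet durable
    disk: dict = field(default_factory=dict)    # durable

    def apply(self, key: str, value: str) -> None:
        self.memory[key] = value

    def flush(self) -> None:
        self.disk.update(self.memory)


def write(replicas: list, key: str, value: str) -> list:
    """Apply an update everywhere and return the notifications a client would see."""
    events = []
    for replica in replicas:
        replica.apply(key, value)
    events.append("ack")      # update is now visible on every replica
    for replica in replicas:
        replica.flush()
    events.append("commit")   # update is now durable on every replica
    return events


replicas = [Replica("osd.0"), Replica("osd.1"), Replica("osd.2")]
events = write(replicas, "obj", "v1")
print(events)
```

An application that only needs other clients to see the update can continue after the first notification, while one that needs durability waits for the second.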
b) Failure detection- Spotting errors and failures promptly is essential to keeping data safe, but it becomes difficult at scale across many nodes. Some failures can be self-reported by the OSDs (Object Storage Daemons) themselves. For failures that an OSD cannot report on its own, RADOS considers two dimensions of the OSD's state: (i) whether it is reachable and (ii) whether it is assigned data by CRUSH. An OSD that is not responsive is marked down, and any primary responsibility it holds passes temporarily to the next OSD. In this way Ceph detects anomalies in a distributed environment, and because detection is itself distributed, failures are found quickly without burdening the monitors while inconsistencies are resolved.
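The mark-down and hand-over behavior can be sketched in a few lines. This is a simplified model under stated assumptions: the heartbeat timestamps, the 10-second grace period and the function names are hypothetical, and real failure handling in Ceph involves considerably more state than this.

```python
def mark_down(last_heard: dict, now: float, grace: float) -> set:
    """Return the OSDs whose last heartbeat is older than `grace` seconds."""
    return {osd for osd, seen in last_heard.items() if now - seen > grace}


def acting_primary(pg_osds: list, down: set):
    """First OSD in the placement group's ordered list that is not marked down."""
    for osd in pg_osds:
        if osd not in down:
            return osd
    return None  # no OSD is available for this placement group


# Hypothetical timestamps (in seconds) of the last heartbeat heard from each OSD.
last_heard = {"osd.1": 100.0, "osd.2": 118.0, "osd.3": 119.0}
down = mark_down(last_heard, now=120.0, grace=10.0)
primary = acting_primary(["osd.1", "osd.2", "osd.3"], down)
print(down, primary)
```

Here osd.1 has been silent for 20 seconds, so it is marked down and osd.2, the next OSD in the ordered list, temporarily acts as primary.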
c) Cluster recovery and updates- When OSDs fail, the OSD cluster map changes. To make recovery fast, each OSD maintains a version number for each object and a log of recent changes. Consider, for example, OSD1 and OSD2. If OSD1 crashes and is marked down, its status is updated and OSD2 takes over. Once OSD1 recovers, it requests the latest map on boot and a monitor marks it as up. OSD2 then realizes that it no longer needs to carry out primary responsibilities and hands them back to OSD1, which retrieves the log entries it missed. In this manner, Ceph not only keeps stored data safe but also recovers quickly.
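The OSD1 and OSD2 example can be modeled with a version counter and a change log. The sketch below is a heavily simplified assumption-laden model: `ToyOSD` and its methods are hypothetical, and it tracks a single running version per OSD instead of the per-object versions described above.

```python
class ToyOSD:
    """Hypothetical, heavily simplified OSD: objects plus a log of recent changes."""

    def __init__(self, name: str):
        self.name = name
        self.objects = {}   # object name -> (version, data)
        self.log = []       # (version, object name, data), oldest first
        self.version = 0    # version of the most recent update applied

    def apply(self, version: int, obj: str, data: str) -> None:
        self.objects[obj] = (version, data)
        self.log.append((version, obj, data))
        self.version = version

    def entries_since(self, version: int) -> list:
        """Log entries newer than `version`."""
        return [entry for entry in self.log if entry[0] > version]

    def catch_up(self, peer: "ToyOSD") -> int:
        """Replay the updates this OSD missed; return how many were applied."""
        missed = peer.entries_since(self.version)
        for version, obj, data in missed:
            self.apply(version, obj, data)
        return len(missed)


osd1, osd2 = ToyOSD("osd.1"), ToyOSD("osd.2")
for osd in (osd1, osd2):
    osd.apply(1, "a", "first")
# osd1 is down at this point; osd2 acts as primary and accepts two more updates.
osd2.apply(2, "a", "second")
osd2.apply(3, "b", "new")
# osd1 comes back and retrieves only the log entries it missed.
replayed = osd1.catch_up(osd2)
print(replayed, osd1.objects == osd2.objects)
```

Because only the missed log entries are transferred, the recovering OSD does not need to copy every object again, which is what keeps recovery fast in this model.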
d) Data distribution & replication- Ceph adopts a simple strategy for distributing data. It maps objects into PGs (placement groups) using a simple hash function. These placement groups are then assigned to OSDs using CRUSH to store object replicas. This differs from conventional approaches that depend on large amounts of metadata, though Ceph still uses a small amount of metadata. Replication also works in terms of placement groups, each of which is mapped to an ordered list of OSDs. This approach to distribution and replication has made Ceph a scalable storage solution.
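The two-step mapping, object to placement group and placement group to an ordered list of OSDs, can be sketched as follows. Note the substitution: this example uses rendezvous (highest-random-weight) hashing as a stand-in for CRUSH, which is a different and much simpler algorithm. The function names, the PG count of 128 and the six OSD names are assumptions for illustration.

```python
import hashlib


def stable_hash(text: str) -> int:
    """Deterministic 64-bit hash (Python's built-in hash() is salted per process)."""
    return int.from_bytes(hashlib.sha256(text.encode()).digest()[:8], "big")


def object_to_pg(obj_name: str, pg_count: int) -> int:
    """Step 1: hash the object name into one of `pg_count` placement groups."""
    return stable_hash(obj_name) % pg_count


def pg_to_osds(pg: int, osds: list, replicas: int) -> list:
    """Step 2: ordered list of OSDs for a PG; rendezvous hashing stands in for CRUSH."""
    ranked = sorted(osds, key=lambda osd: stable_hash(f"{pg}:{osd}"), reverse=True)
    return ranked[:replicas]


osds = [f"osd.{i}" for i in range(6)]
pg = object_to_pg("photo.jpg", pg_count=128)
placement = pg_to_osds(pg, osds, replicas=3)
print(pg, placement)
```

Any client can recompute the same placement from the object name and the list of OSDs alone, which is why no per-object lookup table is needed. The first OSD in the returned list plays the role of the primary.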
Ceph addresses three crucial challenges of storage systems: scalability, reliability and performance. It also has an important and useful role in securing cloud storage. Most importantly, its central building blocks, RADOS, CRUSH and POSIX-compliant file access, have made Ceph a holistic storage system.