The Nutanix Distributed Filesystem (NDFS) is at the core of the Nutanix Platform. It manages all metadata and data, and enables core features. NDFS is the underpinning architectural element that connects the storage, compute resources, controller VM, and the hypervisor. It also provides full Information Lifecycle Management (ILM), including localizing data to the optimal node.
Metadata is distributed among all nodes in the cluster to eliminate any single point of failure and to allow scalability that increases linearly with cluster growth. The metadata is partitioned using a consistent hashing scheme, which minimizes the redistribution of keys when nodes are added to or removed from the cluster.
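To illustrate why consistent hashing limits key movement during resizing, here is a minimal, illustrative sketch; the class, node names, and virtual-node count are invented for the example, not Nutanix internals:

```python
import bisect
import hashlib

class ConsistentHashRing:
    """Toy consistent-hash ring: metadata keys map to nodes, and adding a
    node only remaps the keys adjacent to its points on the ring."""

    def __init__(self, nodes=(), vnodes=32):
        self.vnodes = vnodes   # virtual nodes smooth out the distribution
        self.ring = []         # sorted list of (hash, node) points
        for node in nodes:
            self.add_node(node)

    def _hash(self, key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def add_node(self, node):
        for i in range(self.vnodes):
            bisect.insort(self.ring, (self._hash(f"{node}#{i}"), node))

    def node_for(self, key):
        # Walk clockwise to the first ring point at or past the key's hash.
        h = self._hash(key)
        idx = bisect.bisect(self.ring, (h, chr(0x10FFFF)))
        return self.ring[idx % len(self.ring)][1]
```

Adding a fourth node remaps only the keys that land on the new node's ring points; every other key keeps its original owner, which is exactly the property that makes dynamic add-node cheap.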
The system enforces strong consistency through a distributed consensus algorithm. Quorum-based leadership election eliminates potential “split brain” scenarios, which ensures strict consistency of configuration data.
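The split-brain guarantee follows from simple arithmetic: two disjoint network partitions can never both hold a strict majority of the cluster. A minimal sketch of that quorum rule (function names are illustrative, not Nutanix APIs):

```python
def majority(cluster_size):
    """Smallest number of members that constitutes a strict majority."""
    return cluster_size // 2 + 1

def can_elect(partition_size, cluster_size):
    # A partition may elect a leader only if it holds a strict majority,
    # so at most one side of any split can ever hold leadership.
    return partition_size >= majority(cluster_size)
```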
NDFS was designed from the ground up to be extremely fault-resilient. It ensures data availability in the event of a node, controller, or disk failure. NDFS uses a replication factor (RF) that keeps redundant copies of the data. Writes to the platform are logged in the PCIe SSD tier, which can be configured to replicate to another controller before the write is committed. If a failure occurs, NDFS automatically rebuilds data copies to maintain the highest level of availability.
The platform is self-healing. Leveraging distributed MapReduce jobs, it proactively scrubs data to resolve disk or data errors. If a controller VM fails, all I/O requests are automatically forwarded to another controller VM until the local controller becomes available again. This Nutanix auto-pathing technology is completely transparent to the hypervisor, and guest VMs continue to run normally. In the case of a node failure, an HA event is automatically triggered and VMs fail over to other hosts within the cluster. Nutanix ILM then localizes I/O by migrating data to each virtual machine's new local controller VM, while data is simultaneously re-replicated to maintain RF and overall availability.
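The auto-pathing behavior reduces to a simple routing preference: the local controller VM first, any healthy remote controller as a fallback. A minimal sketch, with invented names for illustration:

```python
def route_io(local_cvm, remote_cvms, healthy):
    """Auto-pathing sketch: prefer the local controller VM; if it is down,
    transparently forward I/O to any healthy remote controller."""
    if local_cvm in healthy:
        return local_cvm
    for cvm in remote_cvms:
        if cvm in healthy:
            return cvm
    raise RuntimeError("no healthy controller available")
```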
NDFS provides built-in converged backup and disaster recovery (DR). The converged-backup capabilities leverage array-side snapshots and clones, which are performed using sub-block-level change tracking at the VM and file level. Snapshots and clones are instantaneous, and thin provisioning keeps overhead very low. These capabilities also support hypervisor offload mechanisms such as VMware vStorage APIs for Array Integration (VAAI).
Snapshots can be configured on a standard schedule to align with RPOs and RTOs, and can be replicated to remote sites using array-side replication. Replication is configurable at the VM level, and only the sub-block-level changes are shipped to the remote replication site.
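Shipping only the changed sub-blocks can be modeled as a per-block comparison between two snapshots; the block size and data structures below are illustrative, not the actual change-tracking format:

```python
def changed_blocks(prev_snapshot, curr_snapshot):
    """Change-tracking sketch: compare two snapshots block by block and
    return only the blocks that must ship to the remote site."""
    return {
        offset: data
        for offset, data in curr_snapshot.items()
        if prev_snapshot.get(offset) != data
    }
```

Only the delta crosses the wire, which is why the replication cost tracks the change rate rather than the VM's total size.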
A core design principle of the Nutanix platform is data localization. It keeps data proximate to the VM and allows write I/O operations to be served on that same node. If a VM migrates to another host, for example via DRS or vMotion, the data automatically follows the VM so it maintains the highest performance. After a certain number of read requests from a VM to a controller residing on another node, Nutanix ILM transparently moves the remote data to the local controller, so read I/O is served locally instead of traversing the network.
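The read-localization trigger can be modeled as a per-extent counter of remote reads. The threshold of three below is an invented illustration, not a documented Nutanix constant:

```python
class ReadLocalizer:
    """ILM-style sketch: count remote reads per extent and migrate the
    extent to the reading node once the count crosses a threshold."""

    def __init__(self, threshold=3):
        self.threshold = threshold
        self.remote_reads = {}   # extent -> remote read count
        self.location = {}       # extent -> node currently holding the data

    def read(self, extent, local_node):
        owner = self.location.setdefault(extent, "remote-node")
        if owner != local_node:
            n = self.remote_reads.get(extent, 0) + 1
            self.remote_reads[extent] = n
            if n >= self.threshold:
                self.location[extent] = local_node  # migrate extent locally
        return self.location[extent]
```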
Nutanix incorporates heat-optimized tiering (HOT), which leverages multiple tiers of storage and optimally places data on the tier that provides the best performance. The architecture supports local disks attached to the controller VM (PCIe SSD, SSD, HDD) as well as remote (NAS) and cloud-based storage targets. The tiering logic is fully extensible, allowing new tiers to be added dynamically. The Nutanix system continuously monitors data-access patterns to determine whether a workload is random, sequential, or mixed. Random I/O is kept in the SSD tier to minimize seek times, while sequential I/O is placed on the HDD tier, which handles it efficiently and preserves flash endurance.
The most frequently accessed data (hot data) resides on the highest performance tier (PCIe SSD). That tier is not just a cache; it is a truly persistent tier for both read and write operations. The next hottest data is placed on the SSD tier, which serves as spillover for the highest-performance tier (PCIe SSD), as well as QoS-controlled data. Cold data sits on hard disk drives, the highest-capacity, most economical tier.
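The placement rules above reduce to a small decision function. The tier names match the text; the heat categories and cutoffs are illustrative:

```python
def place_extent(access_pattern, heat):
    """Tiering sketch: sequential I/O goes to HDD to preserve flash
    endurance; random I/O lands on flash according to how hot it is."""
    if access_pattern == "sequential":
        return "hdd"
    return {"hot": "pcie-ssd", "warm": "ssd"}.get(heat, "hdd")
```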
The Elastic Deduplication Engine is a software-driven, massively scalable, intelligent data-reduction technology. It performs inline deduplication in the RAM and flash tiers, and background deduplication in the capacity tier (hard disks), to maximize efficiency. Unlike traditional deduplication technologies, which focus only on the storage tier, the Nutanix Elastic Deduplication Engine spans memory, flash, and disk resources simultaneously in a natively converged platform.
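Fingerprint-based deduplication, the general technique underlying engines of this kind, can be sketched as follows. SHA-1 chunk fingerprints are a common choice in such systems; this is a conceptual sketch, not Nutanix's actual implementation:

```python
import hashlib

class DedupStore:
    """Dedup sketch: chunks are stored once, keyed by a content
    fingerprint; duplicate writes only add a reference."""

    def __init__(self):
        self.chunks = {}   # fingerprint -> chunk bytes
        self.refs = {}     # fingerprint -> reference count

    def write(self, data):
        fp = hashlib.sha1(data).hexdigest()
        if fp not in self.chunks:
            self.chunks[fp] = data               # first copy: store the bytes
        self.refs[fp] = self.refs.get(fp, 0) + 1
        return fp

    def physical_bytes(self):
        return sum(len(d) for d in self.chunks.values())
```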
NDFS array-side compression works in combination with Nutanix ILM. For sequential workloads, data is compressed inline during the write operation. For batch workloads, post-process compression adds significant value: data is compressed once it becomes idle and ILM has moved it down to the highest-capacity tier (HDD). Compression is configured at the container level but operates at a granular VM and file level. Decompression is done at the sub-block level to ensure precise granularity. The ILM process monitors these operations and proactively moves frequently accessed, decompressed data back up to a higher-performance tier.
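The inline-versus-post-process decision can be sketched as a small policy function, using zlib as a stand-in compressor. The policy shape follows the text; the function and its parameters are illustrative:

```python
import zlib

def compress_policy(workload, data, idle=False):
    """Compression-policy sketch: sequential writes compress inline; batch
    data compresses post-process once it has gone idle and been tiered
    down. Returns (stored_bytes, compressed_flag)."""
    if workload == "sequential":
        return zlib.compress(data), True   # inline compression on write
    if idle:
        return zlib.compress(data), True   # post-process compression
    return data, False                     # active batch data stays raw
```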
The Nutanix Virtual Computing Platform converges compute and storage into a single system, eliminating the need for traditional storage arrays. The Nutanix 2U chassis (block) contains two to four independent nodes, each optimized for high-performance compute, memory, and storage. Each node runs an industry-standard hypervisor, and a Nutanix controller VM. The controller VM handles all data I/O operations for the local hypervisor.
All storage is directly mounted into the controller VM using a device pass-through mechanism. Storage resources are then exposed to the hypervisor through traditional interfaces, such as NFS or iSCSI. As new Nutanix nodes are added to the cluster, the number of controller VMs scales 1:1 to provide linear performance. Storage capacity from all nodes is aggregated into a global storage pool, which is accessible by all Nutanix controllers and hosts in the cluster. Containers are then defined from the storage pool, creating a logical datastore. Containers are the main access point for the hypervisor and are accessed using traditional interfaces.
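The pool-and-container relationship can be sketched as simple capacity aggregation; the class names are illustrative, not Nutanix APIs:

```python
class StoragePool:
    """Sketch of capacity aggregation: every node's disks feed one global
    pool, and containers carve logical datastores out of it."""

    def __init__(self):
        self.node_capacity = {}   # node -> bytes contributed to the pool

    def add_node(self, node, capacity_bytes):
        self.node_capacity[node] = capacity_bytes

    def total_capacity(self):
        return sum(self.node_capacity.values())

class Container:
    """Logical datastore defined over the pool, exposed to the hypervisor."""
    def __init__(self, pool, name):
        self.pool, self.name = pool, name
```

Adding a node grows the pool, and every container defined over the pool sees the new capacity without reconfiguration.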
The Nutanix platform uses industry-standard hardware. It does not rely on custom FPGAs, ASICs, RAID controllers, or disk drives. As a software-defined solution, Nutanix maintains the control logic in the software, and enables new features through simple software upgrades. NDFS is extensible. The Nutanix platform does not require a shared backplane for communication. Instead, it leverages standard 10GbE for all communications between nodes and controllers, as well as for VM traffic.
The Nutanix platform is based on the same architectural precepts that enable the world’s largest datacenters to scale. Google, Facebook, and Amazon all use a similar design. The Nutanix Distributed File System (NDFS) scales to thousands of nodes and maintains performance and availability as your system grows. Modular, converged building blocks (nodes) allow datacenter managers to start small and scale seamlessly to support future growth.
The Nutanix n-way controller model scales the number of storage controllers with the number of nodes, eliminating the performance bottlenecks common with traditional dual-controller storage arrays. Each Nutanix node added to a cluster uses its local controller VM as its gateway to NDFS and as its primary I/O point. Nutanix takes a big-data approach, using a distributed MapReduce framework to manage cluster-wide operations such as self-healing and the redistribution of data for high availability.
IT can mix various Nutanix node types, whether compute-heavy or storage-heavy, so a team can construct an infrastructure with the right balance for a particular environment or workload. Once powered on, new Nutanix nodes are automatically discovered using the Linux Avahi protocol and IPv6 link-local addresses, then added through the dynamic add-node process with zero downtime. Cluster metadata is distributed to new nodes as they are added, and their storage resources join the cluster's storage pool, extending container capacity transparently. VMs can then be provisioned on the new hosts, and cluster-balancing features such as DRS move VMs onto them.