PEAK:AIO and Los Alamos National Laboratory have launched Lattice, an open-source pNFS metadata server for AI and high-performance computing storage systems.
Lattice was developed through a long-term engineering collaboration between the Manchester-based storage start-up and the US national laboratory. The project has been launched under the Linux Foundation as an open-source effort focused on parallel file system design.
The launch aims to address a longstanding problem in large-scale storage: the metadata layer that directs access to files can become a bottleneck as computing systems grow. In AI and high-performance computing environments, that can limit the rate at which processors and accelerators are fed with data.
According to PEAK:AIO, Lattice uses a distributed metadata design that separates the control plane into four layers: Protocol State Plane, Lattice Core, MD Catalog Authority and Data Server Control Plane. This design allows metadata services to run independently of persistent metadata and to be deployed across commodity hardware as demand changes.
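The idea of running metadata services independently of persistent metadata can be illustrated with a minimal Python sketch. It is not Lattice code and does not reflect its API: the names Catalog, MetadataService and ServicePool, the file path, the data server labels and the byte-sum routing rule are all hypothetical, chosen only to show that stateless front ends sharing one catalog can be added or removed without changing what a lookup returns.

```python
# Illustrative sketch only: not Lattice's code or API. All names are hypothetical.
from dataclasses import dataclass, field


@dataclass
class Catalog:
    """Stands in for the persistent metadata store (the catalog)."""
    entries: dict = field(default_factory=dict)

    def put(self, path, layout):
        self.entries[path] = layout

    def get(self, path):
        return self.entries[path]


class MetadataService:
    """A stateless front end: it keeps no metadata, only a catalog reference."""

    def __init__(self, name, catalog):
        self.name = name
        self.catalog = catalog

    def lookup(self, path):
        return self.catalog.get(path)


class ServicePool:
    """A resizable set of metadata services backed by one shared catalog."""

    def __init__(self, catalog):
        self.catalog = catalog
        self.services = []

    def resize(self, count):
        if count < 1:
            raise ValueError("at least one metadata service is required")
        # Services hold no state, so the pool can simply be rebuilt.
        self.services = [
            MetadataService(f"mds-{i}", self.catalog) for i in range(count)
        ]

    def lookup(self, path):
        # Assumed routing rule: byte sum of the path modulo the pool size.
        index = sum(path.encode()) % len(self.services)
        service = self.services[index]
        return service.name, service.lookup(path)


catalog = Catalog()
catalog.put("/data/a.bin", ["ds-0", "ds-1"])

pool = ServicePool(catalog)
pool.resize(1)
before = pool.lookup("/data/a.bin")

pool.resize(4)  # scale out; the catalog is untouched
after = pool.lookup("/data/a.bin")
```

In this sketch the layout returned for the file is the same before and after the pool is resized; only the name of the service that answered may differ.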
The groups said the architecture can scale from a single server to more than 1,000 metadata servers. They described the software as Linux-based and designed to run in user space.
Attention on storage design has increased as AI operators build larger clusters for model training, inference and related applications. While processor performance has advanced rapidly, storage software has faced growing scrutiny over whether it can handle the volume and parallelism those systems require.
PEAK:AIO cited data from Cast AI showing average GPU utilisation across 23,000 production clusters at 5%. It argued that storage software, and metadata systems in particular, have become a key reason expensive computing resources are not fully used.
Performance tests
The organisations disclosed several early test results from the collaboration, including performance gains from 70 GB/s to 400 GB/s in one set of measurements.
On production hardware at Los Alamos, standard Linux NFS configurations delivered between 3 GB/s and 7 GB/s throughput, while the pNFS Lattice design reached 40 GB/s on the same servers, they said. Tests with a Tier 1 technical university also showed metadata-heavy workload improvements of more than 300% over conventional approaches.
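As a rough guide to what those figures imply, the short Python calculation below converts them into speed-up factors. The function names are illustrative, and reading "an improvement of more than 300%" as more than four times the baseline is an assumption; some writers use the phrase to mean three times.

```python
# Worked arithmetic on the reported figures; function names are illustrative.

def speedup(baseline_gbps, new_gbps):
    """Return how many times faster the new throughput is than the baseline."""
    return new_gbps / baseline_gbps


def improvement_to_factor(percent):
    """Convert a percentage improvement into a multiple of the baseline.

    Assumes "improved by N%" means baseline * (1 + N / 100).
    """
    return 1 + percent / 100


# Standard Linux NFS at 3 to 7 GB/s against 40 GB/s on the same servers.
low = speedup(7, 40)    # measured against the best baseline figure
high = speedup(3, 40)   # measured against the worst baseline figure

# "More than 300%" on metadata-heavy workloads, under the stated assumption.
metadata_factor = improvement_to_factor(300)

print(f"Throughput speed-up: {low:.1f}x to {high:.1f}x")
print(f"Metadata-heavy workloads: more than {metadata_factor:.0f}x")
```

On these numbers, the Los Alamos result corresponds to a gain of roughly 5.7 to 13.3 times over standard Linux NFS on the same hardware.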
In MDtest benchmarks, early testing showed up to a 10x improvement over standard Linux KNFSD, according to the groups. Performance work is continuing.
Gary Grider, HPC division leader at Los Alamos National Laboratory, gave technical detail on the design.
"PNFS-Lattice is unique in that it is an open-source, user-space, scalable PNFS metadata server, from the ground up, by leveraging the concept of separating the PNFS metadata service from the Metadata Store (catalog)," Grider said.
"Since the service is separate from the persistent metadata and it runs in user space, it is well poised to be an ephemeral service that could be resized on the fly. Further, since it is open-source and user space, it lowers the bar for community participation, encouraging more innovation driven by AI, HPC, and other community needs," he said.
Commercial model
Alongside the open-source project, PEAK:AIO will offer PEAK:AIO pNFS, a commercially supported version of Lattice for customers that want support agreements and additional features without managing the open-source software directly.
The arrangement is intended to mirror the relationship between Lustre and commercial distributions built around that file system, while keeping the underlying foundation open and standards-based.
Roger Cummings, president and chief executive officer of PEAK:AIO, linked the launch to wider changes in AI infrastructure economics.
"AI infrastructure markets are approaching an inflection point where scaling compute alone no longer delivers any meaningful efficiency gains," Cummings said.
"Our collaboration with Los Alamos National Laboratory was built around the idea that if AI infrastructure is to scale efficiently, metadata must become elastic, distributed and open. Lattice represents that transition, and we're excited to build it with the Linux and HPC communities beside us," he said.
Mark Klarzynski, Chief Strategy Officer and co-founder of PEAK:AIO, described the underlying design changes in similar terms.
"The key innovation behind Lattice is that it breaks apart what has traditionally been locked inside a single metadata server into four distinct layers: the Protocol State Plane, the Lattice Core, the MD Catalog Authority, and the Data Server Control Plane," Klarzynski said.
"That separation unlocks intelligent scale in a way traditional storage architectures were never designed to support. Metadata and data services can now become distributed, elastic participants that scale, fail over and adapt around the workload, rather than remaining fixed appliances or static MDS pairs. This is a fundamental step forward for pNFS and parallel file system design for ultra-high-performance storage, allowing metadata to move beyond the limitations that have constrained scale-out storage for decades," he said.