
NetApp Expands Google Cloud Collaboration for Sovereign, Air-Gapped Deployments

16 April 2026 at 17:25

NetApp announced an expanded collaboration with Google Cloud, formalized through a four-year enterprise agreement to accelerate the deployment of NetApp storage within Google Distributed Cloud (GDC) Air-Gapped environments. Delivered with World Wide Technology (WWT), the offering targets sovereign cloud use cases that require strict data residency, security, and operational isolation.

The joint solution integrates NetApp’s data platform with Google Distributed Cloud’s full-stack private cloud architecture. The result is an air-gapped environment that supports sensitive and classified workloads while maintaining compliance with national sovereignty requirements. NetApp positions its storage systems as secure-by-design, enabling organizations to deploy controlled infrastructure that supports modern applications and AI workflows without external connectivity.

Google Cloud Air Gapped graphic

NetApp integrates its AFF all-flash systems, StorageGRID object storage, and Trident Kubernetes storage orchestration into the GDC stack. Together, these components form what the company calls an intelligent data infrastructure. Within GDC, this architecture supports zero-trust security models, local data storage, customer-managed encryption keys, and full operational control. The platform enables organizations to extend cloud capabilities to on-premises or edge environments while maintaining isolation, or to operate in fully disconnected, air-gapped configurations.

The collaboration is primarily aimed at government and regulated industries, where data-handling requirements limit the use of traditional public cloud. NetApp leadership highlighted that these environments require infrastructure capable of handling classified data while supporting modernization initiatives. By integrating with GDC, NetApp enables enterprise-grade AI and analytics capabilities within accredited environments, allowing agencies to derive insights and automate processes without compromising compliance or sovereignty.

Google Distributed Cloud is designed to extend Google Cloud services to customer-controlled locations, including on-premises data centers and edge sites. Google noted that public-sector organizations face growing pressure to extract value from data while complying with strict regulatory frameworks. GDC addresses this by enabling the deployment of cloud-native services and advanced AI in sovereign and disconnected environments.

As part of this effort, Google has expanded the availability of its AI capabilities for regulated use cases. Gemini models are now supported in GDC environments, enabling generative AI functions such as automation, content generation, discovery, and summarization directly on-premises. These capabilities can run in fully disconnected deployments, allowing organizations to leverage advanced AI while maintaining strict security and compliance boundaries.

The NetApp and Google Cloud partnership reflects a broader trend of bringing cloud and AI capabilities into controlled environments. By combining enterprise storage with sovereign cloud infrastructure, the companies are targeting organizations that require both advanced data services and strict operational isolation.



Broadcom Extends VMware Tanzu Platform with Agent Foundations for Enterprise AI

15 April 2026 at 18:02
VMware Tanzu Platform graphic

At the AI in Finance Summit, Broadcom introduced VMware Tanzu Platform agent foundations, positioning it as a secure-by-default runtime for building and operating autonomous AI applications on VMware Cloud Foundation (VCF). The release extends Tanzu’s established code-to-production model to AI agents, targeting enterprise teams seeking to move from isolated AI experiments to governed, production-scale deployments.

Moving AI Agents into Enterprise Operations

As AI agents assume execution and decision-making roles, operational requirements shift toward governance, security, and integration with enterprise systems. Many organizations still run AI workloads in isolated environments that lack access to core data and standardized controls.

VMware tanzu platform graphic

Tanzu Platform agent foundations address this gap by providing a pre-engineered platform-as-a-service layer for agent workloads, built directly on VCF. This enables platform engineering teams to manage AI services alongside traditional applications with familiar tooling and processes, without requiring deep specialization in AI infrastructure.

Deny-by-Default Agent Runtime

The agentic runtime introduces a set of controls to constrain agent behavior and reduce operational risk.

The software supply chain is managed using trusted Buildpacks rather than user-defined Dockerfiles. Containers are automatically built, patched, and verified, reducing exposure to embedded vulnerabilities or malicious components.

Secrets management is enforced at the structural level, preventing agents from accessing credentials outside their scope. This isolation is reinforced by VMware vDefend, which extends protections across infrastructure services and external SaaS integrations, limiting lateral movement.

Networking uses a zero-trust model. Agents operate within predefined resource and connectivity boundaries and have no default access to internal systems or models. Access is granted explicitly via secure service bindings, ensuring agents interact only with authorized data sources and services.
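To make the service-binding model concrete, the sketch below shows how an agent process could discover the credentials a platform has explicitly bound to it, assuming the bindings are projected using the Kubernetes Service Binding convention (a `SERVICE_BINDING_ROOT` directory of credential files). The binding name and fields are hypothetical, not details from Broadcom's announcement.

```python
import os
from pathlib import Path


def load_bindings(root: str | None = None) -> dict[str, dict[str, str]]:
    """Read service-binding credentials projected under SERVICE_BINDING_ROOT."""
    base = Path(root or os.environ["SERVICE_BINDING_ROOT"])
    bindings: dict[str, dict[str, str]] = {}
    for binding_dir in base.iterdir():
        if binding_dir.is_dir():
            # Each file name is a key (e.g. type, uri, username, password).
            bindings[binding_dir.name] = {
                f.name: f.read_text().strip()
                for f in binding_dir.iterdir()
                if f.is_file()
            }
    return bindings


if __name__ == "__main__":
    creds = load_bindings()
    db = creds.get("agent-postgres", {})  # hypothetical binding name
    print(db.get("type"), db.get("uri"))
```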

Developer Onboarding and Integrated Data Services

The platform includes pre-built agent templates to accelerate onboarding. Developers can provision agents with governed access to models, Model Context Protocol servers, and curated marketplace services defined by IT.

Data services are integrated into the platform, including Tanzu for Postgres with pgvector, as well as caching, streaming, and data flow services. Support for Spring AI memory services enables stateful agent behavior that aligns with enterprise application patterns.
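As a rough illustration of the kind of vector-backed memory this enables, the sketch below uses pgvector from Python to store and query embeddings; the connection string, table name, and tiny embedding dimension are placeholders rather than anything prescribed by Tanzu.

```python
import psycopg2

# Placeholder connection details; in practice these would come from a service binding.
conn = psycopg2.connect("dbname=agents user=agent password=secret host=localhost")
cur = conn.cursor()

cur.execute("CREATE EXTENSION IF NOT EXISTS vector;")
cur.execute(
    """
    CREATE TABLE IF NOT EXISTS agent_memory (
        id SERIAL PRIMARY KEY,
        content TEXT,
        embedding vector(3)  -- tiny dimension for illustration only
    );
    """
)
cur.execute(
    "INSERT INTO agent_memory (content, embedding) VALUES (%s, %s::vector)",
    ("example memory", "[0.10, 0.20, 0.30]"),
)

# Nearest-neighbor lookup by L2 distance using pgvector's <-> operator.
cur.execute(
    "SELECT content FROM agent_memory ORDER BY embedding <-> %s::vector LIMIT 5",
    ("[0.10, 0.20, 0.25]",),
)
print(cur.fetchall())
conn.commit()
cur.close()
conn.close()
```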

Operational Scaling on VMware Cloud Foundation

Tanzu Platform agent foundations integrate with VCF infrastructure APIs to abstract away resource provisioning and lifecycle management. This ensures that agent workloads and their dependencies receive the required compute, storage, and networking resources without direct interaction with the infrastructure.

Elastic scaling allows environments to scale up or down based on workload demand, supporting both short-lived and persistent agents while optimizing cost and utilization.

High availability is achieved through multiple layers of redundancy and automated remediation. The platform continuously monitors and self-heals the underlying infrastructure to maintain service continuity for mission-critical autonomous applications.

An integrated AI gateway provides centralized control of model and tool access. It manages availability, usage policies, cost controls, and safety filtering for both public and private models on VCF.

According to Purnima Padmanabhan, General Manager of the Tanzu Division at Broadcom, the rapid pace of agentic application development is driving collaboration with customers to accelerate innovation. She highlighted that Tanzu Platform agent foundations enable rapid deployment of agentic ideas on modern private clouds, specifically on VMware Cloud Foundation 9.

With agent foundations, Broadcom is aligning Tanzu Platform with emerging enterprise AI requirements, with a focus on governance, security, and operational consistency. The approach builds on existing VMware infrastructure investments and introduces a standardized runtime for agent-based applications, making AI deployment more predictable and manageable at scale.


Wasabi Technologies to Acquire Seagate Lyve Cloud Business

15 April 2026 at 18:01

Wasabi Technologies has reached a definitive agreement to acquire the Lyve Cloud business from Seagate Technology. As part of the transaction, Seagate will receive an equity stake in Wasabi, officially becoming a shareholder. While specific financial details remain undisclosed, the move marks a significant consolidation in the pure-play cloud storage market.

Wasabi security graphic

David Friend, co-founder and CEO of Wasabi, noted that the acquisition bolsters the company’s position as a leader in independent cloud storage. The integration of Lyve Cloud brings a dedicated enterprise customer base into Wasabi’s ecosystem. These customers will transition to Wasabi’s global data center infrastructure, which features specialized security tools such as Covert Copy and integrated AI capabilities. The provider intends to maintain high levels of technical support and partner integration for the incoming Lyve Cloud users.

For Seagate, the divestiture serves a specific strategic purpose. Gianluca Romano, Seagate’s CFO, indicated that the sale allows the company to refocus resources on its core mass-capacity storage hardware business. As the demand for high-capacity drives continues to climb, Seagate aims to prioritize manufacturing and innovation in hard drive technology. By transitioning the cloud service to Wasabi, Seagate ensures that a specialized provider services its existing cloud customers while the manufacturer maintains an indirect interest through its new equity position.

Engineering for Enterprise Scale

The proliferation of AI initiatives, large-scale analytics, and extensive video workloads is currently driving demand for enterprise-grade storage. As organizations manage data volumes reaching the petabyte scale, the total cost of ownership and vendor complexity become critical factors in infrastructure design. Many firms are moving away from traditional hyperscalers in favor of providers that offer predictable pricing models and robust security without the egress fees often associated with legacy cloud platforms.

Lyve Cloud established itself as a viable enterprise platform by prioritizing compliance and security features. By merging these assets with Wasabi’s established channel reach and execution strategy, the combined entity provides a streamlined alternative for professional IT environments. The acquisition aims to deliver consistent performance at scale while addressing the economic challenges of long-term data retention.

Ecosystem Integration and Data Protection

The consolidation of these two platforms simplifies the data protection and backup landscape for administrators. Both Wasabi and Lyve Cloud maintain deep integrations and certifications with leading backup software providers, including Veeam, Rubrik, and Commvault. This overlap ensures that existing automated workflows and S3-compatible API calls remain functional during and after the transition.
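In practice, that compatibility means backup tooling and scripts typically keep working by pointing an S3 client at a different endpoint and credentials. A minimal sketch with boto3 (the endpoint URL, bucket, and keys are placeholders, not official Wasabi or Lyve Cloud values):

```python
import boto3

# Placeholder endpoint and credentials for an S3-compatible object store.
s3 = boto3.client(
    "s3",
    endpoint_url="https://s3.example-object-store.com",
    aws_access_key_id="ACCESS_KEY",
    aws_secret_access_key="SECRET_KEY",
)

# Standard S3 calls work unchanged against an S3-compatible service.
s3.upload_file("backup.tar.gz", "nightly-backups", "2026/04/backup.tar.gz")
for obj in s3.list_objects_v2(Bucket="nightly-backups").get("Contents", []):
    print(obj["Key"], obj["Size"])
```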

For channel partners and system integrators, the acquisition reduces the overhead of managing multiple independent S3-compatible storage vendors. By unifying the service under a single banner, Wasabi enhances its ability to support mission-critical backup and recovery workloads. This move strengthens the broad ecosystem of independent storage solutions, providing technical teams with a reliable, cost-effective target for enterprise data offloading.



Scale Computing and Nexsan Address Asymmetric Growth in HCI Environments

8 April 2026 at 16:32
Scale Computing hypervisor graphic

While hyperconverged infrastructure (HCI) has simplified virtualization via streamlined deployments and reduced operational overhead, traditional architectures often struggle with asymmetric scaling. This is particularly evident when storage requirements for large unstructured datasets outpace compute needs, forcing IT teams into inefficient and costly node expansions.

scale computing and nexsan logos

To address this imbalance, Scale Computing and Nexsan have introduced a joint architecture that integrates the SC//HyperCore virtualization suite with enterprise-grade external storage. This combined solution allows organizations to decouple storage growth from compute resources, providing a scalable and cost-effective model for capacity-intensive workloads like video retention, backup repositories, and long-term archives.

Addressing Real-World Infrastructure Constraints

Many IT teams are modernizing their infrastructure while still managing legacy storage investments and growing volumes of unstructured data. Requirements such as long-term video retention, secure backup strategies, and preservation of existing SAN and NAS assets create architectural friction. Traditional approaches often force a tradeoff between adopting fully integrated HCI stacks or continuing with less efficient legacy systems.

The combined Scale Computing and Nexsan approach avoids this binary decision. It enables organizations to retain the simplicity of HCI for core workloads while extending storage capacity through external systems that scale independently.

Architecture Overview

SC//HyperCore provides a tightly integrated virtualization platform with built-in high availability and simplified lifecycle management. It is designed to minimize administrative overhead, particularly in edge and remote deployments.

Scale computing hypervisor graphic

Nexsan complements this with a portfolio of external storage platforms that support block, file, and object protocols. These systems are designed for capacity scaling, long-term retention, and data protection. Together, the platforms enable a hybrid model in which performance-sensitive workloads remain on-cluster while capacity-heavy datasets are offloaded to external storage.

This separation allows IT teams to align infrastructure decisions with actual workload characteristics rather than forcing all applications into a single scaling model.

Edge and Distributed Use Cases

The joint solution is particularly relevant in edge environments across sectors such as retail, healthcare, manufacturing, education, and government. These deployments often require local compute resources to ensure application performance while supporting centralized data strategies.

Nexsan e-series e60 image front facing

SC//HyperCore simplifies operations at remote sites with limited IT presence, while Nexsan platforms handle the associated data growth. This includes centralized archives, backup repositories, and long-term video storage. The result is an edge-to-core architecture that maintains edge simplicity without sacrificing enterprise storage capabilities.

Flexible Storage Integration

A key aspect of the joint approach is support for multiple storage access methods based on workload requirements. Organizations can deploy iSCSI for block-based virtual machine storage, NFS or SMB for file services, and S3-compatible object storage for modern data workflows.

This flexibility enables use cases such as immutable backups, lifecycle-managed archives, and centralized data repositories. It also supports edge-to-core data flows, in which applications run locally while large datasets are aggregated centrally.
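As an illustration of the immutable-backup case, the sketch below writes an object under S3 Object Lock compliance-mode retention; it assumes the S3-compatible target exposes the Object Lock API and that the bucket was created with Object Lock enabled, and the endpoint, bucket, and retention period are hypothetical.

```python
from datetime import datetime, timedelta, timezone

import boto3

s3 = boto3.client("s3", endpoint_url="https://objects.example.internal")  # placeholder endpoint

# Write a restore point under COMPLIANCE-mode retention so it cannot be
# deleted or overwritten until the retain-until date has passed.
with open("restore-point.vbk", "rb") as data:
    s3.put_object(
        Bucket="backup-repo",  # bucket must have Object Lock enabled at creation
        Key="backups/2026-04-08/restore-point.vbk",
        Body=data,
        ObjectLockMode="COMPLIANCE",
        ObjectLockRetainUntilDate=datetime.now(timezone.utc) + timedelta(days=90),
    )
```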

Security and Data Protection Considerations

Infrastructure decisions increasingly prioritize cyber resilience alongside performance and capacity. Nexsan platforms incorporate features such as immutable snapshots, object locking, replication, and encryption. These capabilities support secure backup, compliance retention, and rapid recovery workflows.

The Unity NV-Series targets mixed workloads with an emphasis on ransomware resilience, while the E-Series P focuses on dense, high-capacity block storage scenarios such as surveillance. These design points align with environments where data protection and recoverability are critical operational requirements.

Use Cases

The joint solution is best suited for environments with uneven growth patterns and a need for operational simplicity. Common use cases include video surveillance retention, backup and disaster recovery repositories, centralized file services, and long-term archival storage.

It also aligns well with organizations that are modernizing their virtualization platforms while preserving existing storage investments. For channel partners and managed service providers, the architecture supports repeatable solution design that can be tailored to specific vertical requirements.

By separating compute and storage scaling while maintaining a unified operational model, Scale Computing and Nexsan provide a pragmatic approach to modern infrastructure design that reflects how enterprise workloads and data actually grow.


Nutanix and NetApp Announce Integration to Align ONTAP with Nutanix Cloud Platform

7 April 2026 at 20:51

At the Nutanix .NEXT Conference in Chicago, Nutanix and NetApp announced a strategic partnership to integrate NetApp Intelligent Data Infrastructure with the Nutanix Cloud Platform (NCP), including support for the Nutanix AHV hypervisor. The integration is expected later this year and targets enterprises looking to align virtualization and data management strategies across on-premises, hybrid cloud, and containerized environments.

Nutanix cloud platform graphic

The joint approach brings NetApp ONTAP into the Nutanix ecosystem as a primary data layer, combining ONTAP’s mature data services with NCP’s unified operational model. This positions ONTAP as the storage backbone while Nutanix continues to deliver compute, virtualization, and cloud orchestration through AHV and its broader platform stack.

NetApp data platform graphic

NetApp and Nutanix are streamlining the modernization of virtualized environments by providing secure and efficient solutions, according to Sandeep Singh, Senior Vice President and General Manager of Enterprise Storage at NetApp. Singh emphasized the importance of Intelligent Data Infrastructure as the foundation for transforming virtualization and data operations, highlighting that their collaboration simplifies the operation of virtualized workloads at the enterprise level.

NFS-Based Connectivity

The integration centers on NFS-based connectivity between Nutanix and ONTAP systems. This allows virtual machines to run on Nutanix while leveraging external NetApp storage, enabling a disaggregated architecture where compute and storage scale independently. This model is particularly relevant for organizations seeking to optimize resource utilization or extend existing NetApp investments into Nutanix environments.

Migration is a key focus area. The companies are aligning NetApp Shift and Nutanix Move to enable faster VM migrations to AHV environments. The tooling is designed to enable data-in-place conversions, reducing the need for full data copies and shortening migration timelines to minutes in some scenarios. This approach is intended to minimize operational disruption while accelerating adoption of the Nutanix platform.

Operational simplification is another stated goal. By offloading storage services to ONTAP, organizations can centralize data management functions such as snapshots, replication, and tiering, while Nutanix manages compute and virtualization. The combined environment is expected to offer unified visibility and control, reducing administrative overhead and simplifying troubleshooting across the stack.

VM-Level Granularity

The integration also introduces VM-level granularity for storage operations. Administrators can apply policies for performance, capacity, and data protection at the individual VM level, rather than managing resources at a broader datastore or cluster level. This aligns with enterprise requirements for fine-grained control in multi-tenant or mixed workload environments.

Cyber resilience is addressed through native ONTAP capabilities. The solution is expected to incorporate Autonomous Ransomware Protection with AI and NetApp’s ransomware resilience services, providing real-time detection of anomalies and potential data exfiltration. These features extend Nutanix’s existing security posture with deeper storage-layer intelligence.

NetApp Chief Commercial Officer Dallas Olson emphasized that partnering with Nutanix enhances NetApp’s position as a leader in storage and data management for virtualization. The collaboration aims to provide enterprises with a robust foundation for building an Intelligent Data Infrastructure that offers high performance, resilience, and scalability to support growing virtualization requirements.

Nutanix President and Chief Commercial Officer Tarkan Maner announced that their partnership with NetApp enables customers to modernize their virtualization platforms and leverage Intelligent Data Infrastructure at their own pace, combining modernization with advanced data management capabilities.

Overall, the partnership reflects a shift toward composable infrastructure models, in which best-of-breed compute and storage platforms are integrated via standard protocols and unified management layers.

The two vendors also plan to collaborate on AI initiatives. ONTAP integration with the Nutanix Agentic AI stack is intended to support emerging enterprise AI use cases, focusing on data accessibility, governance, and performance in AI-driven workflows.


AMD Instinct MI355X Achieves MLPerf Inference v6.0 Gains with Over 1 Million Tokens per Second and Supports Scalable ROCm Stack

2 April 2026 at 11:50

AMD has released its MLPerf Inference v6.0 results, positioning the Instinct MI355X GPU as a scalable inference platform across single-node, multinode, and heterogeneous deployments. The submission extends beyond incremental gains by adding new workloads, demonstrating cluster-scale throughput exceeding 1 million tokens per second, and validating reproducibility across a growing partner ecosystem.

CDNA 4 Architecture Targets High-Capacity Inference

The Instinct MI355X GPU is based on AMD’s CDNA 4 architecture and uses a dual-process chiplet design on TSMC FinFET nodes: the compute dies (XCDs) are fabricated on the 3nm node, while the I/O dies use 6nm. The package integrates 185 billion transistors across the multi-chiplet design rather than a monolithic die, and supports FP4 and FP6 data formats. Each GPU includes up to 288GB of HBM3E memory, enabling support for models up to 520 billion parameters on a single device. AMD positions this combination of compute density and memory capacity as critical for large-model inference without excessive model partitioning.
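A quick back-of-the-envelope check (our illustration, not an AMD figure) shows why the 288GB capacity matters: a 520-billion-parameter model stored in FP4 needs roughly 260GB for weights alone, leaving headroom for KV cache and activations.

```python
params = 520e9          # model parameters
bytes_per_param = 0.5   # FP4 = 4 bits = 0.5 bytes per parameter
weights_gb = params * bytes_per_param / 1e9

print(f"Approximate FP4 weight footprint: {weights_gb:.0f} GB of 288 GB HBM3E")
# Approximate FP4 weight footprint: 260 GB of 288 GB HBM3E
```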

The platform is available in UBB8 configurations with both air-cooled and direct liquid-cooled options, aligning with data center deployment requirements.

Multinode Throughput Surpasses 1 Million Tokens per Second

A key result from this round is AMD surpassing 1 million tokens per second at the cluster scale. Using Instinct MI355X GPUs, AMD achieved this threshold on Llama 2 70B in both Server and Offline scenarios, and on GPT-OSS-120B in Offline.

AMD MLPerf 1M tokens per second graphic

These results reflect a shift toward evaluating inference performance at the cluster level rather than per accelerator. Aggregate throughput and time-to-serve are increasingly used to determine production readiness for large-scale AI deployments.

AMD also demonstrated efficient scaling. On Llama 2 70B, a configuration of 11 nodes and 87 GPUs reached over 1 million tokens per second across Offline, Server, and Interactive scenarios, with scale-out efficiency ranging from 93% to 98%. On GPT-OSS-120B, a 12-node, 94-GPU cluster achieved similar throughput with over 90% scaling efficiency. These results indicate that performance gains translate effectively as deployments expand beyond a single system.
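Scale-out efficiency in this context is measured cluster throughput divided by ideal linear scaling (single-node throughput times node count). The sketch below works through the calculation with made-up numbers, not AMD's measured figures.

```python
def scaling_efficiency(cluster_tps: float, single_node_tps: float, nodes: int) -> float:
    """Fraction of ideal linear scaling achieved by a multinode cluster."""
    ideal_tps = single_node_tps * nodes
    return cluster_tps / ideal_tps


# Hypothetical example: 11 nodes, ~98,000 tokens/sec per node, 1.02M tokens/sec measured.
print(f"{scaling_efficiency(1_020_000, 98_000, 11):.1%}")  # ~94.6%
```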

Generational Gains and Competitive Single-Node Performance

AMD reported a 3.1x performance increase on Llama 2 70B Server compared to the prior Instinct MI325X generation, reaching 100,282 tokens per second. The improvement reflects both architectural changes and ROCm software optimizations. Offline scores improved by 4.4x and Server scores improved by 4.8x compared to prior rounds. These gains are primarily attributed to FP4 quantization.

AMD Inference results vs previous gen graphic

In single-node comparisons, the MI355X demonstrated competitive positioning against NVIDIA platforms. On Llama 2 70B, AMD matched the NVIDIA B200 in Offline throughput, reached near parity in Server performance, and exceeded it in Interactive performance. Against NVIDIA’s B300, the MI355X delivered 92% of its throughput in Offline mode and 93% in Server mode, and surpassed it at 104% in Interactive mode.

First-Time Model Enablement Expands Coverage

MLPerf Inference v6.0 includes several new workloads, and AMD used this round to demonstrate rapid model enablement. GPT-OSS-120B, a mixture-of-experts model, was introduced for the first time and achieved competitive results compared to NVIDIA systems across both Offline and Server scenarios.

AMD also submitted results for Wan-2.2 text-to-video generation, marking its entry into multimodal and generative video inference. While the official submission focused on Single Stream latency, the results were competitive with those of existing platforms. Post-submission tuning further improved performance, indicating headroom for optimization as software matures.

These additions highlight AMD’s focus on expanding beyond traditional LLM benchmarks to support emerging AI workloads.

ROCm Software Enables Scaling and Heterogeneous Inference

AMD attributes much of the performance and scalability to its ROCm software stack. Enhancements include optimized FP4 execution, improved GPU-to-GPU communication for distributed inference, and support for dynamic workload distribution across heterogeneous environments.

AMD MLPerf inference results instinct mI355x graphic

The initial MLPerf heterogeneous submission was developed using three AMD Instinct GPU models: MI300X, MI325X, and MI355X. Submitted by Dell and MangoBoost, the configuration achieved 141,521 tokens per second on Llama 2 70B Server and 151,843 tokens per second on Llama 2 70B Offline.

Worth noting, the AMD Instinct MI355X platform was located in Dell’s lab in the United States, while the Instinct MI300X and MI325X platforms were in Korea. This demonstrates the capability to coordinate systems across different geographic locations.

Ecosystem Growth and Reproducibility

AMD’s partner ecosystem expanded in this MLPerf round, with nine companies submitting results across multiple Instinct GPU generations. Participating vendors included Cisco, Dell, Giga Computing, HPE, MangoBoost, MiTAC, Oracle, Supermicro, and Red Hat.

Partner submissions closely matched AMD’s internal results, typically within 4% and, in some cases, within 1%. This consistency indicates that performance is reproducible across OEM and cloud platforms, reducing deployment risk and improving confidence in real-world outcomes.


Dell Technologies Enhances PowerProtect Portfolio for Improved Cyber Resilience

2 April 2026 at 11:45
Dell PowerProtect open top view

Dell Technologies has announced several updates to its PowerProtect portfolio, focusing on management simplicity, integrated artificial intelligence, and expanded hardware options for mid-sized environments. These enhancements target the growing complexity of distributed workloads across on-premises, edge, and cloud infrastructure.

PowerProtect Data Manager Evolution

The latest iteration of PowerProtect Data Manager introduces a unified dashboard that centralizes visibility across distributed systems. This interface consolidates monitoring into a single view to reduce operational overhead and provide a clearer picture of protection status across the enterprise.

Dell is also integrating a new AI Assistant directly into the Data Manager UI. This tool provides contextual guidance and intelligent navigation to assist administrators in troubleshooting and optimizing configurations. By offering proactive recommendations, the assistant aims to accelerate problem resolution and simplify compliance auditing processes.

Advanced Anomaly Detection and Security

Security capabilities within Data Manager now include expanded anomaly detection for Dell PowerStore snapshots. This feature is designed to identify potential ransomware threats early in the data lifecycle. All anomaly signals from workloads, storage, and protection policies are now aggregated on a dedicated landing page in the UI, enabling faster response to irregularities.

To meet evolving regulatory requirements and NIST standards, the PowerProtect Data Domain Operating System now supports TLS 1.3. This update ensures that the underlying infrastructure remains compliant with modern security protocols and encrypted communication standards.
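On the client side, enforcing and verifying a TLS 1.3 connection can be done with Python's standard library; this generic sketch is not specific to Data Domain, and the management hostname is a placeholder.

```python
import socket
import ssl

context = ssl.create_default_context()
context.minimum_version = ssl.TLSVersion.TLSv1_3  # refuse anything older than TLS 1.3

host = "dd-mgmt.example.internal"  # placeholder management endpoint
with socket.create_connection((host, 443)) as sock:
    with context.wrap_socket(sock, server_hostname=host) as tls:
        print("Negotiated protocol:", tls.version())  # expect "TLSv1.3"
```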

PowerProtect Data Domain DD3410 Appliance

The PowerProtect Data Domain DD3410 is now available, targeting medium-sized businesses and remote office/branch office (ROBO) locations. Occupying a 2U footprint, the appliance scales from 8TB to 40TB of usable capacity. It is designed for low power and cooling requirements while maintaining the security features found in larger Data Domain models.

Dell PowerProtect open top view


The DD3410 supports both traditional and modern workloads and integrates natively with PowerStore for streamlined backup and recovery. Furthermore, the appliance is now supported as a vault target within the PowerProtect Cyber Recovery ecosystem, bringing enterprise-grade vaulting to smaller sites.

Storage Efficiency and Cyber Recovery

Data Domain continues to lead in storage efficiency, with real-world telemetry from Data Manager users indicating data reduction ratios as high as 75:1. This level of deduplication significantly lowers the total cost of ownership by reducing the physical storage footprint required for long-term retention.
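For a sense of scale (our arithmetic, not Dell telemetry), a 75:1 reduction ratio means roughly 750TB of logical backup data can land in about 10TB of physical capacity.

```python
logical_tb = 750        # hypothetical protected (logical) data
reduction_ratio = 75    # 75:1 data reduction
physical_tb = logical_tb / reduction_ratio

print(f"{logical_tb} TB logical -> {physical_tb:.0f} TB physical at {reduction_ratio}:1")
```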

For organizations deploying Cyber Recovery and CyberSense, Dell has introduced Cyber Recovery Essentials. This offering provides pre-validated reference architectures and standardized configurations to accelerate deployment. Additionally, the portfolio now includes enhanced analytics support for Oracle RAC with ASM, broadening the scope of protected database environments.


NVIDIA Sets MLPerf Inference v6.0 Records with Blackwell Ultra Platform

1 April 2026 at 18:35
NVIDIA MLPerf v6 graphic

NVIDIA has published results for MLPerf Inference v6.0, highlighting system-level gains driven by tight co-design across hardware, software, and models. The company positions inference throughput and token economics as the primary metrics for AI factory performance, moving beyond peak accelerator specifications to measured output under real workloads.

In this round, systems built on NVIDIA Blackwell Ultra GPUs delivered the highest throughput across all submitted models and scenarios. The ecosystem around the platform also expanded, with 14 partners submitting results, including major OEMs, cloud providers, and integrators such as ASUS, Cisco, CoreWeave, Dell Technologies, GigaComputing, Google Cloud, HPE, Lenovo, Nebius, Netweb Technology, QCT, Red Hat, Supermicro, and Lambda.

Expanded Benchmark Coverage Reflects Emerging Workloads

MLPerf Inference v6.0 introduces several new benchmarks to better represent current AI deployments. NVIDIA was the only vendor to submit across all new tests, spanning large language models, multimodal systems, generative video, and recommendation engines.

Key additions include DeepSeek-R1 Interactive, which evaluates higher interactivity with faster token delivery and reduced time to first token compared to prior server scenarios. The suite also adds Qwen3-VL-235B-A22B, marking the first multimodal vision-language model in MLPerf Inference, and GPT-OSS-120B, a mixture-of-experts reasoning model tested across offline, server, and interactive scenarios.

Scenario    | DeepSeek-R1           | GPT-OSS-120B         | Qwen3-VL       | Wan 2.2                   | DLRMv3
Offline     | 2,494,310 tokens/sec* | 1,046,150 tokens/sec | 79 samples/sec | 0.059 samples/sec         | 104,637 samples/sec
Server      | 1,555,110 tokens/sec* | 1,096,770 tokens/sec | 68 queries/sec | 21 secs** (Single Stream) | 99,997 queries/sec
Interactive | 250,634 tokens/sec    | 677,199 tokens/sec   | ***            | ***                       | ***

* Not a new scenario in MLPerf Inference v6.0
** Wan 2.2 features a Single Stream scenario, which measures end-to-end request latency, instead of a Server scenario. Lower is better.
*** Not tested in MLPerf Inference v6.0

Generative media and recommendation workloads are now included. The Wan 2.2 text-to-video model features both latency-sensitive and throughput-focused tests, while DLRMv3 replaces previous recommendation benchmarks with a transformer-based architecture that boosts compute intensity and model complexity.

Software Optimization Drives Measurable Gains

A notable aspect of this submission is the performance uplift achieved on existing hardware through software updates. NVIDIA reports up to 2.7x higher token throughput on the GB300 NVL72 platform for DeepSeek-R1 server scenarios compared to results from six months prior. This improvement translates to materially lower cost per token and higher utilization of deployed infrastructure.

NVIDIA MLPerf v6 graphic

These gains are attributed to updates in the TensorRT-LLM stack and associated frameworks. Kernel-level optimizations and fusion techniques reduce execution overhead, while improved attention data parallelism more effectively balances workloads across GPUs. Additional enhancements in the Dynamo distributed inference framework enable disaggregated serving, allowing independent optimization of prefill and decode phases.

For mixture-of-experts models, techniques like Wide Expert Parallel distribute expert weights across GPUs to reduce memory bottlenecks. Multi-token prediction boosts compute efficiency in low-batch, latency-sensitive scenarios by generating and validating multiple tokens at once. KV-aware routing further enhances scheduling by directing inference requests based on estimated compute costs.

Benchmark                | GB300 NVL72 v5.1     | GB300 NVL72 v6.0     | Speedup
DeepSeek-R1 (Server)     | 2,907 tokens/sec/gpu | 8,064 tokens/sec/gpu | 2.77x
DeepSeek-R1 (Offline)    | 5,842 tokens/sec/gpu | 9,821 tokens/sec/gpu | 1.68x
Llama 3.1 405B (Server)  | 170 tokens/sec/gpu   | 259 tokens/sec/gpu   | 1.52x
Llama 3.1 405B (Offline) | 224 tokens/sec/gpu   | 271 tokens/sec/gpu   | 1.21x

NVIDIA also demonstrated continued scaling on established models. On Llama 3.1 405B, the GB300 NVL72 platform achieved a 1.5x performance increase in server scenarios, indicating ongoing optimization for dense LLMs alongside newer architectures.

Open Ecosystem and Framework Integration

Submissions across new workloads leveraged a mix of NVIDIA and open-source frameworks. The Qwen3-VL benchmark used the vLLM framework, reflecting the rapid development in multimodal inference optimization. The Wan 2.2 text-to-video results were powered by TensorRT-LLM VisualGen, targeting diffusion-based pipelines on GPUs.

For DLRMv3, NVIDIA combined its recsys-example framework with GPU-accelerated embedding lookup technologies to handle the increased demands of transformer-based recommendation models. These integrations underscore the role of the broader software ecosystem in extracting performance from the underlying hardware.

Scale-Out Performance with InfiniBand

NVIDIA also showcased large-scale inference performance using four GB300 NVL72 systems connected via Quantum-X800 InfiniBand. This setup, with a total of 288 Blackwell Ultra GPUs, marks the largest MLPerf Inference submission to date and achieved system-level throughput of millions of tokens per second on DeepSeek-R1.

DeepSeek-R1 (4x GB300 NVL72) | Tokens/Second
Offline                      | 2,494,310
Server                       | 1,555,110

The results highlight the importance of high-performance interconnects in scaling inference workloads, particularly for distributed LLM serving and high-throughput batch processing.

Toward Service-Level Benchmarking

Looking ahead, NVIDIA is helping develop the MLPerf Endpoints within the MLCommons consortium. This upcoming benchmark aims to measure deployed inference services using real API traffic, giving insight into latency, throughput, and efficiency at the service level rather than just at the component level.

As AI workloads develop into agentic systems with longer context windows, benchmarks that measure end-to-end service performance are expected to become more important for both cloud providers and enterprise deployments.


WEKA Integrates NeuralMesh with NVIDIA STX to Address AI Inference Memory Bottlenecks

25 March 2026 at 18:35

WEKA announced integration of its NeuralMesh platform with the NVIDIA STX reference architecture, positioning its Augmented Memory Grid as a core component for next-generation AI infrastructure. The combined solution targets one of the primary constraints in large-scale inference environments: memory limitations that impact performance, cost, and scalability.

Running on NeuralMesh, WEKA’s Augmented Memory Grid extends GPU memory by externalizing and persisting key-value cache. In NVIDIA STX deployments, this architecture supports high-throughput context memory storage for agentic AI workloads, enabling long-context reasoning across sessions, tools, and workflows. The company states that configurations leveraging NVIDIA Vera Rubin NVL72 systems, BlueField-4 DPUs, and Spectrum-X Ethernet can increase context memory token throughput by 4x to 10x. The platform is also expected to deliver at least 320 GB/s read and 150 GB/s write throughput, more than doubling the performance of conventional AI storage systems.

NVIDIA Vera Rubin NVL72 top open

Memory Infrastructure Becomes the Inference Bottleneck

WEKA frames the integration around a growing constraint in AI deployments: the memory wall. In modern inference pipelines, limited high-bandwidth memory on GPUs leads to frequent KV cache evictions, resulting in recomputation and reduced efficiency. As concurrency increases, these inefficiencies compound, driving higher infrastructure costs and reducing system predictability.

The company advocates for shared KV cache infrastructure as a solution. By maintaining persistent context across users and sessions, shared cache eliminates redundant computation and stabilizes token throughput. NVIDIA STX provides a reference architecture for implementing this model, with WEKA supplying the storage and memory extension layer.
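A simplified way to see why cache reuse matters: if a share of each request's prompt tokens can be served from a persistent KV cache instead of being re-prefilled, prefill compute drops roughly in proportion. The toy model below is purely illustrative; the numbers and the linear-cost assumption are ours, not WEKA's.

```python
def tokens_to_recompute(prompt_tokens: int, kv_hit_rate: float) -> float:
    """Prompt tokens that still require prefill, assuming cache hits skip recomputation."""
    return prompt_tokens * (1.0 - kv_hit_rate)


prompt = 32_000  # hypothetical long-context prompt length
for hit_rate in (0.0, 0.5, 0.9):
    remaining = tokens_to_recompute(prompt, hit_rate)
    print(f"KV cache hit rate {hit_rate:.0%}: re-prefill {remaining:,.0f} tokens")
```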

NeuralMesh and Augmented Memory Grid Architecture

NeuralMesh serves as WEKA’s distributed storage platform, designed to operate across the full NVIDIA STX stack. The system provides high-performance data services tailored for AI workloads, while the Augmented Memory Grid functions as a dedicated memory extension layer that pools KV cache outside GPU memory.

WEKA Augmented Memory Grid graphic

This approach allows inference environments to maintain long-context sessions without exhausting GPU resources. By preserving cache state and enabling reuse across workloads, the platform aims to sustain high utilization and consistent performance as deployments scale.

WEKA reports that Augmented Memory Grid, initially introduced at GTC 2025 and now generally available, has been validated on NVIDIA Grace CPU platforms with BlueField DPUs. The architecture delivers measurable improvements in inference efficiency, including significantly faster time-to-first-token, higher token throughput per GPU, and sustained performance as concurrency increases. Offloading the storage data path to BlueField-4 further reduces CPU overhead and minimizes I/O bottlenecks.

Performance and Efficiency Gains

In production-aligned environments, the platform is designed to improve responsiveness and infrastructure efficiency. WEKA indicates that Augmented Memory Grid can reduce time-to-first-token by 4x to 20x while increasing token output per GPU by up to 6.5x. These gains are driven by higher KV cache hit rates and reduced recomputation, allowing systems to maintain performance as context windows and user counts grow.

Firmus, an AI infrastructure provider, is cited as an early adopter using NeuralMesh with NVIDIA-aligned infrastructure. The company reports improved token throughput and reduced latency at scale, attributing the gains to more efficient use of existing GPU resources rather than an expansion of the hardware footprint.

Implications for AI Infrastructure Design

The integration underscores a shift in AI system architecture, in which memory and storage design increasingly dictate overall performance and cost efficiency. As agentic AI workloads expand and context windows grow, DRAM-only approaches become less viable due to escalating recomputation overhead and underutilized GPUs.

WEKA positions persistent, shared KV cache as a foundational capability for AI factories. Organizations that adopt this model can maintain higher GPU utilization, reduce energy consumption per inference task, and achieve more predictable scaling characteristics. Conversely, environments that continue to rely solely on local GPU memory are likely to face increasing operational costs and diminishing returns as workloads scale.

Availability

WEKA’s Augmented Memory Grid is generally available as part of the NeuralMesh platform.


WEKA Announces General Availability of NeuralMesh AIDP

25 March 2026 at 18:30
WEKA NeuralMesh Dashboard graphic

WEKA has announced the general availability of its NeuralMesh AI Data Platform, an enterprise-focused, composable infrastructure designed for AI factory deployments. Built on the NVIDIA AI Data Platform reference architecture, NeuralMesh AIDP provides an integrated stack that delivers AI-ready data to production environments, with a focus on accelerating time-to-deployment for large-scale AI applications.

The platform targets a persistent challenge in enterprise AI adoption. While many organizations complete proof-of-concept projects, scaling those implementations into production often exposes limitations in data infrastructure, performance consistency, and operational complexity. NeuralMesh is designed to address this transition by providing a unified data platform optimized for both early-stage experimentation and production-scale workloads.

Architecture and Design

NeuralMesh is built on more than a decade of AI-native storage development and incorporates over 170 patents. The platform is designed to scale efficiently into exabyte-class environments while maintaining performance and resilience. WEKA positions NeuralMesh as an adaptive system that improves with increasing deployments, particularly in distributed AI environments where data access patterns and concurrency increase over time.

WEKA NeuralMesh Dashboard graphic

At the core of the platform is WEKA’s Augmented Memory Grid, which extends GPU memory by externalizing and persisting inference context. This architecture enables higher utilization of GPU resources by reducing redundant computation and maintaining context across sessions. In inference workloads, the company reports up to 6.5 times more tokens per GPU compared to traditional storage approaches, reflecting improved efficiency in handling large context windows and concurrent workloads.

Integration with NVIDIA AI Infrastructure

NeuralMesh AIDP is aligned with NVIDIA’s AI Data Platform, enabling tight integration with GPU, networking, and data processing technologies. The platform is designed to support continuous data pipelines and persistent inference context, which are critical for production agentic AI systems. NVIDIA highlights the importance of a persistent context layer to maintain stability and performance as inference workloads scale.

The solution is available as a pre-integrated platform that includes validated configurations with NVIDIA RTX 6000 PRO and RTX 4500 PRO Server Edition GPUs, along with ecosystem integrations from vendors such as Red Hat, Spectro Cloud, and Supermicro. This appliance-style delivery model reduces deployment complexity and accelerates time to production.

AI Factory Enablement

WEKA positions NeuralMesh AIDP as the infrastructure for AI factories, in which data ingestion, processing, and inference operate as continuous, interconnected workflows. These environments require more than raw storage capacity. They depend on sustained data movement, context persistence, and predictable performance under load.

NeuralMesh is designed to support these requirements by enabling a continuous data loop between storage and compute resources. This allows organizations to maintain high GPU utilization while supporting dynamic, multi-stage AI pipelines that include training, fine-tuning, and inference.

Pre-Built AI Workloads

The platform includes preconfigured pipelines for a range of AI applications, allowing organizations to deploy common workloads without extensive integration. These include semantic search, video search and summarization, AlphaFold workflows for drug discovery, and agentic retrieval-augmented generation use cases.

In production environments, NeuralMesh is being applied across multiple sectors. In healthcare and life sciences, it supports large-scale data analysis workflows such as identifying patient cohorts and processing cryo-electron microscopy datasets. In financial services, it enables analysis of market signals and secure access to institutional knowledge. Public sector deployments focus on contextual threat detection and automated evidence synthesis. In robotics and physical AI, the platform reduces the time between data collection and model retraining, improving system responsiveness and deployment cycles.

Availability

NeuralMesh AI Data Platform is available immediately as an appliance-style solution.


ASRock Rack Unveils Liquid-Cooled AI Systems Built Around NVIDIA Rubin and Blackwell at GTC 2026

23 March 2026 at 19:35
NVIDIA HGX Rubin NVL8

ASRock Rack used NVIDIA GTC 2026 to introduce a broader range of liquid-cooled AI platforms for high-density enterprise and data center deployments. The announcement centered on new systems based on the NVIDIA HGX Rubin NVL8 platform, as well as NVIDIA MGX-based servers designed for the NVIDIA RTX PRO 4500 Blackwell Server Edition and liquid-cooled RTX PRO 6000 Blackwell Server Edition GPUs.

The launch reflects a broader infrastructure shift as AI accelerators increase rack power and thermal demand. For training, inference, and HPC environments, cooling is increasingly a system-level design constraint rather than a secondary consideration. ASRock Rack is positioning liquid cooling as a practical way to sustain performance under continuous utilization while improving thermal efficiency and enabling denser deployments.

Rubin-based Systems Target Enterprise AI Factories

At the top of the new portfolio are the 2U16X-GNR2/DLC RUBIN and 4U16X-GNR2/DLC RUBIN, both built on the NVIDIA HGX Rubin NVL8 platform. ASRock Rack is targeting these systems for enterprise-scale AI factory deployments and HPC environments that require sustained accelerator performance and predictable thermal performance under heavy load.

NVIDIA HGX Rubin NVL8

To complement the node-level systems, the company also showed a 44RU liquid-cooled rack configuration populated with eight 4U16X-TURIN2 systems. The rack is presented as a more complete infrastructure building block for organizations looking to accelerate deployment of Rubin-based AI capacity rather than assembling rack-scale liquid-cooled infrastructure independently.

Blackwell Systems Address Enterprise AI and Visual Computing

ASRock Rack also expanded its portfolio to include platforms for enterprise AI, data processing, and visual computing. The 6UXGM-GNR2/DLC supports up to 8 liquid-cooled NVIDIA RTX PRO 6000 Blackwell Server Edition GPUs, targeting environments where both GPU density and thermal control are critical.

For customers seeking a more flexible, potentially lower-power configuration, the 4UXGM-GNR2 CX8 supports NVIDIA RTX PRO 4500 Blackwell Server Edition GPUs. ASRock Rack positions the platform for AI, video, visual computing, and data-intensive workloads, with an emphasis on single-slot efficiency.

Liquid Cooling Moves Deeper into Mainstream AI Infrastructure

ASRock Rack framed the launch around the growing importance of liquid-cooled designs in AI infrastructure. Company president Weishi Sa said AI platforms are being reshaped by the need for higher compute density and sustained utilization, making liquid cooling increasingly important for large-scale deployments. In practical terms, ASRock Rack is arguing that liquid-cooled architectures are no longer limited to niche HPC installations and are becoming part of mainstream enterprise AI infrastructure planning.

Vera-based CPU systems broaden the roadmap

In addition to its GPU-centric platforms, ASRock Rack disclosed CPU-based systems powered by NVIDIA’s Vera architecture. The company said these systems are intended for emerging agentic AI and reinforcement learning workloads at data center scale.

According to ASRock Rack, Vera combines custom CPU cores, LPDDR5X memory, and NVIDIA Scalable Coherency Fabric to improve efficiency and increase AI factory throughput. While the announcement was light on platform specifics, the inclusion of Vera-based systems suggests that ASRock Rack is aligning its roadmap with a broader NVIDIA stack that spans both accelerated and CPU-side infrastructure for next-generation AI deployments.


IBM and NVIDIA Announce Expanded Partnership to Operationalize Enterprise AI

23 March 2026 at 19:34

At GTC 2026, IBM and NVIDIA announced a significant expansion of their more than decade-long partnership, focusing on moving AI from pilot phases to full-scale production. The collaboration targets several critical bottlenecks in enterprise AI adoption, including GPU-native data analytics, intelligent document processing, and infrastructure for regulated environments. The joint effort aims to provide a unified stack of data foundations, hardware, and consulting expertise to help organizations manage fragmented data and compliance requirements.

IBM Chairman and CEO Arvind Krishna noted that the next phase of enterprise AI depends on the tight integration of data, infrastructure, and orchestration. He stated that the partnership is designed to provide the necessary components for businesses to transition from experimentation to operational deployment. NVIDIA founder and CEO Jensen Huang highlighted that by integrating CUDA acceleration into the data layer, the companies are attempting to turn traditional data processing into real-time intelligence engines.

Accelerating Structured Data Analytics via cuDF and Presto

A primary technical focus of the announcement is the integration of the NVIDIA cuDF library with the Presto SQL engine in IBM watsonx.data. This open-source integration enables GPU-accelerated query execution on massive datasets, significantly reducing the time and cost of extracting intelligence from structured data.
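The flavor of that acceleration is easiest to see in standalone cuDF, where pandas-style operations run on the GPU; in watsonx.data the same class of work is pushed down through Presto. The sketch below uses cuDF directly with a hypothetical file and column names, and is not the Presto integration itself.

```python
import cudf  # requires an NVIDIA GPU and the RAPIDS cuDF package

# Hypothetical order-to-cash extract; file and column names are illustrative only.
orders = cudf.read_parquet("orders.parquet")
summary = (
    orders.groupby("region")
    .agg({"order_value": "sum", "order_id": "count"})
    .sort_values("order_value", ascending=False)
)
print(summary.head())
```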

The companies validated this approach in a production environment with Nestlé’s Order-to-Cash data mart, which processes terabytes of data across 44 tables globally. On traditional CPU infrastructure, a single data refresh took 15 minutes and was limited to a few cycles per day. By moving to the GPU-accelerated watsonx.data engine, Nestlé reported a reduction in query runtime to three minutes. This represents a 30X improvement in price-performance and an 83 percent reduction in costs. Nestlé’s Chief Information and Digital Officer, Chris Wright, indicated that this capability enables faster decision-making in manufacturing and warehousing by providing near-real-time operational data.

Unlocking Unstructured Data with Docling and Nemotron

To address the challenge of data trapped in unstructured formats such as SharePoint sites and CMS systems, IBM and NVIDIA introduced a joint solution leveraging IBM Docling and NVIDIA Nemotron open models. Docling is designed to standardize and convert complex documents into AI-ready formats while maintaining traceability to the source.

IBM docling logo

When paired with NVIDIA Nemotron models, the system accelerates the ingestion of multi-modal content. Early testing indicates significantly higher throughput compared to existing open-source models. This approach is intended to help enterprises build a trusted data foundation for AI by making internal knowledge bases more accessible and easier to standardize for automated reasoning.

NVIDIA Nemotron image
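As an illustration of the document-conversion step, the open-source Docling package converts a source file into a structured representation that can be exported for downstream AI pipelines. The sketch below assumes Docling's published Python API and uses a placeholder file name.

```python
from docling.document_converter import DocumentConverter

converter = DocumentConverter()
result = converter.convert("quarterly-report.pdf")  # placeholder source document

# Export the parsed document to Markdown for AI-ready downstream processing.
print(result.document.export_to_markdown())
```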

GPU-Optimized Infrastructure and Sovereign AI

The partnership also extends deep into the storage and infrastructure layers. NVIDIA has selected the IBM Storage Scale System 6000 to provide 10 PB of high-performance storage for its GPU-native advanced analytics engines. The Storage Scale 6000 is certified and validated for NVIDIA DGX platforms, combining IBM’s parallel throughput capabilities with NVIDIA’s GPU data pipelines to eliminate I/O bottlenecks.

IBM Storage Scale System rack front facing

Organizations with strict data residency and regulatory requirements are exploring the integration of IBM Sovereign Core with NVIDIA infrastructure. This initiative aims to enable GPU-intensive AI workloads to run entirely within specific regional borders, ensuring governance and compliance standards are met without sacrificing compute performance.

Expanding the AI Stack with Blackwell and Red Hat

IBM announced plans to introduce NVIDIA Blackwell Ultra GPUs to the IBM Cloud in early Q2 2026. These resources will be targeted at large-scale model training and high-throughput inference. The Blackwell architecture will also be integrated into the Red Hat AI Factory, alongside NVIDIA and VPC servers, to provide enterprise-grade controls over data residency.

nvidia dgx racks

Furthermore, IBM Consulting will offer the Red Hat AI Factory with NVIDIA through the IBM Consulting Advantage platform. This move is intended to streamline the process of data preparation, model creation, and deployment. By combining these technologies, IBM and NVIDIA are attempting to provide a more seamless path for enterprises to scale AI across diverse technology environments while maintaining oversight and performance.


VDURA Introduces RDMA and Context-Aware Tiering for AI Data Platforms at GTC 2026

23 March 2026 at 19:31
VDURA Global Namespace

During GTC 2026, VDURA showcased updates to its Data Platform that improve GPU utilization and storage efficiency in AI environments. The announcement includes the general availability of Remote Direct Memory Access (RDMA), a preview of its Context-Aware Tiering technology, and validated infrastructure setups based on AMD EPYC Turin CPUs and NVIDIA ConnectX-7 networking.

The updates aim to eliminate data movement bottlenecks between GPU clusters and storage and to optimize data placement across storage tiers for large-scale AI training and inference workloads.

RDMA Enables GPU-Direct Data Paths

VDURA has added RDMA support across its platform, allowing GPU servers to access storage directly over the network without CPU involvement. This enables GPU-to-storage data transfers that bypass traditional kernel and CPU-mediated paths, reducing latency and increasing throughput.

VDURA Global Namespace

The implementation integrates with VDURA DirectFlow, the company’s data movement layer, to ensure all GPU server traffic uses RDMA. By eliminating CPU overhead in the data path, compute resources remain dedicated to model training and inference tasks. This approach is intended to sustain higher GPU utilization rates while minimizing pipeline latency in distributed AI clusters.

Context-Aware Tiering Targets Data Placement Efficiency

VDURA also detailed the first phase of its Context-Aware Tiering capability, scheduled for release later this year. This feature introduces automated data placement across storage tiers based on workload behavior and access patterns.

The initial phase extends the DirectFlow buffer into local NVMe SSDs, allowing frequently accessed data to reside closer to compute resources. This reduces dependency on shared or network-attached storage for hot data and improves response times for active workloads.

The platform also introduces KVCache writeback controls, which selectively persist only critical inference data to durable storage. This reduces unnecessary write activity while maintaining persistence guarantees required by production inference pipelines.

Additionally, VDURA is implementing a unified Context Cache Tiering framework that spans DRAM and local SSD. This enables high-speed read and write access aligned with LMCache-class performance, supporting use cases such as long-context LLM inference and retrieval-augmented generation.
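VDURA has not published an API for these capabilities, so purely as an illustration of the general pattern, the sketch below shows a two-tier cache that keeps hot entries in memory, spills colder entries to a local SSD path, and persists entries flagged as critical, loosely mirroring the DRAM/SSD tiering and selective writeback described above. All class and path names are hypothetical.

```python
import os
import pickle
from collections import OrderedDict

class TwoTierCache:
    """Illustrative DRAM + local-SSD cache sketch; not VDURA's implementation."""

    def __init__(self, dram_entries=1024, ssd_dir="./ssd_cache"):
        self.dram = OrderedDict()        # hot tier with LRU ordering
        self.dram_entries = dram_entries
        self.ssd_dir = ssd_dir           # cold tier; in practice a local NVMe mount
        os.makedirs(ssd_dir, exist_ok=True)

    def put(self, key, value, critical=False):
        self.dram[key] = value
        self.dram.move_to_end(key)
        if critical:                     # selective writeback: persist only critical data
            self._spill(key, value)
        while len(self.dram) > self.dram_entries:
            old_key, old_val = self.dram.popitem(last=False)
            self._spill(old_key, old_val)  # evict the LRU entry to the SSD tier

    def get(self, key):
        if key in self.dram:
            self.dram.move_to_end(key)
            return self.dram[key]
        path = self._path(key)
        if os.path.exists(path):         # promote from SSD back into DRAM on a hit
            with open(path, "rb") as f:
                value = pickle.load(f)
            self.put(key, value)
            return value
        return None

    def _spill(self, key, value):
        with open(self._path(key), "wb") as f:
            pickle.dump(value, f)

    def _path(self, key):
        return os.path.join(self.ssd_dir, f"{key}.bin")
```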

VDURA indicated that future phases of Context-Aware Tiering will expand into application-aware data placement, improved cache coherence across nodes, and support for emerging infrastructure components such as NVIDIA BlueField-4 DPUs.

The company also introduced optimized platform configurations combining AMD EPYC Turin processors with NVIDIA ConnectX-7 network adapters. These configurations are designed to complement RDMA-enabled data paths and support high-throughput, low-latency communication between GPU clusters and storage systems.

Full-Stack AI Data Pipeline Focus

VDURA CEO Ken Claffey highlighted the company’s AI storage platform, which spans the entire data hierarchy from memory to long-term storage, and emphasized its performance. He said the platform uses RDMA for direct, CPU-free data access and features Context-Aware Tiering to position data across storage tiers. Claffey noted that these innovations help organizations support larger models, handle more inference requests, and scale AI infrastructure while meeting production AI reliability requirements.

The combined approach is intended to support larger model sizes, increase inference throughput, and improve infrastructure efficiency while maintaining reliability requirements for production AI deployments.

Availability

RDMA is now available on the VDURA V5000 and V7000 platforms. Context-Aware Tiering Phase 1 is expected to reach general availability later in 2026, with early access programs currently underway.

The post VDURA Introduces RDMA and Context-Aware Tiering for AI Data Platforms at GTC 2026 appeared first on StorageReview.com.

HPE Introduces AI Grid to Connect AI Factories and Distributed Inference Clusters Using NVIDIA Reference Architecture

18 March 2026 at 15:18
HPE AI Grid image

HPE has announced the HPE AI Grid, a comprehensive infrastructure solution aligned with the NVIDIA AI Grid reference architecture. It is designed to securely connect AI factories and distributed inference clusters across regional and remote edge locations. HPE positions this platform for service providers that need to deploy and manage thousands of distributed inference sites as a single, coordinated system.

HPE AI Grid image

HPE introduces the AI Grid as a solution for AI-native applications that increasingly require predictable latency, deterministic performance, and distributed deployment. The company claims that the solution delivers ultra-low latency at scale, with zero-touch provisioning, integrated orchestration, and automated security to make lifecycle management easier across large, geographically dispersed deployments.

Rami Rahim, EVP, President, and GM of Networking at HPE, described the strategy as bringing intelligence closer to where data is generated and used, with network infrastructure serving as a key enabler for real-time AI services. NVIDIA’s Chris Penrose, Global Vice President of Telco, emphasized the importance of an AI Grid in connecting geographically dispersed clusters and dynamically assigning workloads based on performance, cost, and latency, with HPE providing multicloud routing and edge infrastructure and NVIDIA delivering accelerated compute and networking components.

Full-stack hardware and networking foundation

HPE states that the HPE AI Grid offers a unified hardware and software platform designed to support service-provider operational models, including multi-tenancy and cloud-native security. The architecture is centered on HPE Juniper capabilities for telco-grade networking, including multicloud routing and coherent optics for long-haul and metro connectivity. HPE also highlights integrated firewalls, WAN automation, and orchestration to enable zero-touch deployment and continuous lifecycle management.

HPE ProLiant Compute DL380a Gen12

On the compute side, HPE is pairing edge and rack servers with NVIDIA accelerated computing and a high-performance networking and I/O stack. HPE lists support for NVIDIA RTX PRO 6000 Blackwell Server Edition GPUs, as well as NVIDIA BlueField DPUs, Spectrum-X Ethernet switches, and ConnectX SuperNICs. The stack also includes NVIDIA AI blueprints to accelerate the deployment of inference services across distributed sites.

Service provider focus

HPE is positioning AI Grid for applications needing predictable latency and reliable connectivity, including retail personalization, predictive maintenance, edge healthcare, and carrier-grade AI services. The company states the platform can help operators turn sites with existing power and connectivity into RAN-ready AI grid nodes, effectively broadening where inference can occur without treating each location as a separate deployment.

Field trials and partner ecosystem

As part of the broader AI Grid messaging, Comcast announced new AI field trials on its distributed network focused on real-time edge inferencing. HPE stated that early trials included HPE ProLiant servers running small language models from Personal AI (part of HPE’s Unleash AI partner program) on NVIDIA GPUs for AI-driven “front desk” services aimed at small businesses.

The post HPE Introduces AI Grid to Connect AI Factories and Distributed Inference Clusters Using NVIDIA Reference Architecture appeared first on StorageReview.com.

HPE Expands NVIDIA AI Computing Portfolio with Scalable Private Cloud AI and Blackwell GPU Integration

18 March 2026 at 15:18
HPE ProLiant Compute DL380a Gen12

HPE has announced a significant expansion of the NVIDIA AI Computing by HPE portfolio, introducing integrated systems designed to scale enterprise AI deployments while maintaining security and governance. The update focuses on co-engineered, validated architectures intended to accelerate time-to-value for AI inferencing and model development.

HPE CEO Antonio Neri and NVIDIA CEO Jensen Huang positioned the collaboration as a new standard for enterprise AI infrastructure. The partnership centers on developing AI factories and grids that leverage HPE’s leadership in private cloud and networking to embed intelligence across enterprise workflows.

Scaling HPE Private Cloud AI and Security

HPE Private Cloud AI, a turnkey enterprise AI factory, now includes network expansion racks that allow deployments to scale up to 128 GPUs. This expansion provides a consistent operational experience for demanding AI workloads. For organizations requiring strict data isolation, a new air-gapped configuration is available to ensure sensitive data remains disconnected from external networks.

 HPE ProLiant Compute DL380a Gen12

Security enhancements include the certification of HPE ProLiant Compute DL380a Gen12 servers for Fortanix Confidential AI. This solution uses NVIDIA Confidential Computing to process sensitive data on-premises without exposing it. Additionally, CrowdStrike is providing agentic security for the platform, offering AI-powered threat detection for infrastructure, models, and autonomous agents.

Software Blueprints and Blackwell GPU Support

The platform includes the latest NVIDIA AI Enterprise software and specialized blueprints. The NVIDIA AI-Q blueprint enables developers to build and control customizable AI agents, while the NVIDIA Omniverse blueprint supports digital-twin development. HPE is also updating its ProLiant servers and AI factories to support NVIDIA Nemotron open models, simplifying the deployment of production-ready agentic workflows.

NVIDIA RTX Pro 6000

Hardware updates include the availability of NVIDIA RTX PRO 6000 Blackwell Server Edition GPUs across all HPE Private Cloud AI and AI factory configurations.

Specialized Solutions for Edge and Industry Use Cases

HPE introduced new multi-workload solutions co-designed with NVIDIA for retail, medical research, and manufacturing. These stacks combine HPE ProLiant servers with NVIDIA Spectrum-X Ethernet, BlueField DPUs, and ConnectX NICs. The architectures incorporate NVIDIA CUDA-X libraries, Multi-Instance GPU (MIG), and vGPU technologies, managed via HPE Compute Ops Management.

Nvidia spectrum x

To support edge intelligence and small-language models, HPE is adding the NVIDIA RTX PRO 4500 Blackwell Server Edition GPU to the ProLiant portfolio. This includes integration with the NVIDIA Retail Shopping Assistant Blueprint to streamline sector-specific deployments.

High-Scale Networking and AI Data Pipelines

At NVIDIA GTC, HPE introduced networking solutions using HPE Juniper Networking routers and coherent optics to connect distributed AI deployments. The company also expanded its at-scale AI factories with systems built on the NVIDIA Vera Rubin architecture.

HPE Alletra Storage MP X10000

To address performance bottlenecks in AI data lifecycles, HPE is evolving the HPE Alletra MP X10000. The X10000 is the first object-based system to achieve NVIDIA-Certified Storage Foundation-level validation, confirming its ability to efficiently feed data to up to 128 GPUs for high-throughput training and low-latency inference. HPE also confirmed support for the NVIDIA STX rack-scale reference architecture for future storage solutions powered by Vera Rubin and BlueField-4.

Services and Financing

HPE Services launched an agents hub to help organizations adopt agentic AI with NVIDIA Nemotron models. Additionally, a new blueprint developed with Protopia AI enables trustworthy, multi-tenant AI factories for regulated environments. To assist with modernization costs, HPE Financial Services introduced the 90/9 Advantage program (no payments for 90 days, then 1% for the following 9 months) across the compute and networking portfolios.

Availability

  • NVIDIA RTX PRO 4500 Blackwell Server Edition GPUs for HPE ProLiant: Q1 and Q2 2026.
  • HPE Private Cloud AI (Air-gapped, RTX PRO 6000 support, AI-Q/Omniverse blueprints): Available now.
  • Network expansion racks (up to 128 GPUs): July 2026.
  • HPE and Protopia secure blueprint: Q2 2026.
  • Fortanix support for HPE ProLiant DL380a Gen12: Q3 2026.

The post HPE Expands NVIDIA AI Computing Portfolio with Scalable Private Cloud AI and Blackwell GPU Integration appeared first on StorageReview.com.

HPE Cray GX5000 and AI Factory Get NVIDIA Vera Rubin NVL72, Quantum-X800 InfiniBand, and New Blackwell Options

17 March 2026 at 17:39

HPE has unveiled updates to the NVIDIA AI Computing by HPE portfolio to support large-scale AI factories and next-generation supercomputers. The offerings combine compute, GPUs, networking, liquid cooling, software, and services into full‑stack solutions designed for at‑scale and sovereign environments.

NVIDIA AI Integrated into HPE Exascale Supercomputing Platform

Argonne National Laboratory, HLRS, Hudson River Trading, and the Korea Institute of Science and Technology Information have adopted HPE AI infrastructure integrated with NVIDIA technologies to accelerate scientific and industrial innovation.

HPE Cray Discovery

HPE is extending NVIDIA technologies into its second-generation exascale-class platform, the HPE Cray Supercomputing GX5000, which unifies AI and HPC in a single architecture. Research labs, sovereign entities, and enterprises are increasingly blending AI models with traditional HPC simulations, and HPE is positioning the platform to meet these converging requirements.

A new option in the lineup is an HPE compute blade based on NVIDIA Vera CPUs. Each liquid‑cooled HPE Cray Supercomputing GX240 Compute blade includes up to 16 NVIDIA Vera CPUs. A full rack supports up to 40 blades, totaling 640 CPUs and 56,320 Arm-compatible cores. The GX240 targets high‑density, high‑performance AI compute installations that require sustained throughput.

HPE is also expanding networking options for large-scale systems, adding support for NVIDIA Quantum‑X800 InfiniBand. These switches provide 144 ports operating at 800 Gb/s and incorporate power-efficiency features, which are increasingly important at exascale.

Trish Damkroger, Senior Vice President and General Manager of HPC and AI Infrastructure Solutions at HPE, said the company’s experience in exascale supercomputing is shaping how it integrates advanced AI workloads with traditional HPC. She noted that the ongoing collaboration with NVIDIA helps customers reach higher performance density and drive progress across fields such as medicine, life sciences, engineering, and manufacturing.

Enhancements to HPE AI Factory for At‑Scale and Sovereign Deployments

Beyond its supercomputing portfolio, HPE is expanding the HPE AI Factory to support service providers, sovereign governments, and large enterprises that are adopting the NVIDIA Vera Rubin and NVIDIA Blackwell platforms.

The next-generation NVIDIA Vera Rubin NVL72 by HPE is the headline addition. Built for frontier‑scale models exceeding one trillion parameters, the rack-scale system is engineered for high-efficiency deployments in neo-cloud environments. It includes 36 NVIDIA Vera CPUs, 72 NVIDIA Rubin GPUs, sixth‑generation NVIDIA NVLink scale-up networking, NVIDIA ConnectX‑9 SuperNICs, and NVIDIA BlueField‑4 DPUs. HPE integrates the platform with its liquid cooling technologies and data center design services to streamline large-scale rollouts.

NVIDIA Bluefield-4 GTC Screencap

HPE is also introducing the HPE Compute XD700, an OCP‑inspired system built on NVIDIA HGX Rubin NVL8. The XD700 is intended for dense AI training and inference, supporting up to 128 Rubin GPUs per rack. The system doubles the GPU density of the previous generation while targeting reductions in space, power, and cooling.

In addition, HPE is broadening access to NVIDIA Blackwell by making the NVIDIA RTX PRO 6000 Blackwell Server Edition GPU available across all HPE AI Factory systems.

Software and Services Updates for Faster AI Deployment

HPE updates extend into the software and services stack that supports large-scale AI factories. The HPE AI Factory portfolio is now endorsed under the NVIDIA Cloud Partner program, which simplifies NVIDIA Cloud Provider certification for cloud operators. This supports faster validation cycles and easier integration for service providers building multi-tenant AI environments.

HPE is also expanding multi-tenancy options by supporting virtual machine GPU passthrough and secure Kubernetes namespaces using NVIDIA Multi‑Instance GPU (MIG). These capabilities are enabled by integration with SUSE Virtualization and SUSE Rancher Prime, giving providers the choice between strict and flexible tenancy models.

Red Hat Enterprise Linux and Red Hat OpenShift remain supported as part of Red Hat AI Enterprise, integrating cleanly with NVIDIA AI Enterprise software for customers standardizing on enterprise Linux.

HPE AI Factory at scale and HPE AI Factory sovereign will also integrate NVIDIA Mission Control software. Mission Control provides a unified operational layer for AI factories, streamlining orchestration via NVIDIA Run:ai and adding monitoring and autonomous recovery features from NVIDIA Dynamo. The intent is to simplify management and improve operational consistency for platform teams running large AI clusters.

NVIDIA Mission Control Dashboard

Chris Marriott, Vice President of Enterprise Platforms at NVIDIA, emphasized that AI development depends on robust infrastructure. He highlighted the joint engineering work between HPE and NVIDIA in accelerated computing, advanced networking, and liquid cooling to support faster insights in large-scale and sovereign deployments.

Availability

NVIDIA RTX PRO 6000 Blackwell Server Edition GPUs are available today in the HPE AI Factory portfolio. Integration of Red Hat Enterprise Linux and Red Hat OpenShift with NVIDIA is also available today.

HPE AI Factory multi‑tenancy and GPU passthrough capabilities will be available in Spring 2026. NVIDIA Mission Control support for HPE AI Factory at scale and sovereign is planned for later in 2026. The NVIDIA Vera Rubin NVL72 by HPE rack‑scale system will be available in December 2026.

The HPE Compute XD700 will be available in early 2027. The HPE Cray Supercomputing GX240 Compute blade with up to 16 NVIDIA Vera CPUs and NVIDIA Quantum‑X800 InfiniBand networking for the HPE Cray Supercomputing GX5000 will also be available in 2027.

The post HPE Cray GX5000 and AI Factory Get NVIDIA Vera Rubin NVL72, Quantum-X800 InfiniBand, and New Blackwell Options appeared first on StorageReview.com.

Lenovo Expands Hybrid AI Advantage with NVIDIA at GTC 2026: New Inference Platforms, Workstations, and Rack-Scale AI Cloud

17 March 2026 at 17:18

At NVIDIA GTC, Lenovo introduced an expanded phase of its Lenovo Hybrid AI Advantage with NVIDIA, positioning the portfolio as an end-to-end path for production AI inferencing across client devices, enterprise infrastructure, and large-scale AI cloud deployments. The announcement centers on accelerating AI adoption, cutting time-to-first-token (TTFT), and improving per-token economics as organizations move from model training toward real-time, decision-oriented AI.

Lenovo framed the update as a continuation of its inferencing acceleration initiatives first discussed at Lenovo Tech World, now extending beyond enterprise deployments to include rack-scale and “gigawatt-scale” AI cloud buildouts. The company’s message is that inferencing is becoming the operational bottleneck and value driver for agentic AI. The increased inference volume elevates the importance of cost control, security, and predictable performance across edge, on-prem, and cloud environments.

Hybrid AI Demand

Lenovo cited the “CIO Playbook 2026” (commissioned by Lenovo and conducted by IDC), noting that 84% of organizations expect to run AI on-premises or at the edge alongside cloud resources. The implication is that hybrid architectures are becoming the default for production inference, driving demand for validated platforms that can be deployed consistently across sites while meeting enterprise requirements for governance, latency, and data locality.

Lenovo Hybrid AI Advantage graphic

Lenovo CEO Yuanqing Yang emphasized that the Lenovo-NVIDIA partnership is intended to push AI from pilot projects into enterprise production and cloud-scale deployments. His focus was on agentic AI, increasing inference workloads, and managing costs while optimizing per-token performance. Lenovo said it is combining NVIDIA AI Enterprise software with its hybrid AI platforms to streamline deployment and improve scaling efficiency.

NVIDIA CEO Jensen Huang similarly framed the moment as an “AI production era,” in which real-time intelligence generation and agentic behaviors require scalable, accelerated computing, software, and infrastructure. NVIDIA’s view is that full-stack platforms will be needed to keep pace as agents evolve to reason, plan, and act.

Device-Side AI

On the client side, Lenovo is extending AI development and inference capabilities to mobile and desktop workstations powered by NVIDIA RTX Pro Blackwell GPUs. Lenovo’s next-generation mobile workstation lineup includes NVIDIA RTX PRO Blackwell laptop GPUs across systems such as the ThinkPad P14s Gen 7, ThinkPad P16s Gen 5, and ThinkPad P1 Gen 9, targeting professionals who need local acceleration for AI workflows and content pipelines.

Lenovo Thinkstation P5 front facing

For fixed workstations, Lenovo highlighted the ThinkStation P5 Gen 2, configurable with up to two NVIDIA RTX PRO 6000 Blackwell Max-Q Workstation Edition GPUs and support for NVIDIA OpenShell, described as a safe, private runtime for autonomous AI agents. Lenovo also pointed to a “ThinkStation PGX” AI developer device designed for secure, private, on-prem development and inference, claiming up to 1 petaflop of AI compute and support for models with up to 200B parameters.

Lenovo also packaged the workstation story with deployment and manageability components. The company described “Lenovo AI Developer” as a full-stack development suite with pre-designed blueprints to help teams build and secure workflows, and referenced Lenovo Imaging Services for Devices to simplify PC fleet deployment and day-one readiness.

In parallel, Lenovo previewed a laptop and workstation battery proof of concept: a silicon-anode design cited at 1,000 Wh/L, with up to 99.9 Wh of capacity, intended to improve battery life and sustained performance without increasing device footprint.

Enterprise Inference

For data center and edge, Lenovo positioned its Hybrid AI Advantage with NVIDIA as a production inferencing platform, designed to bring AI on-premises with tighter operational control and improved economics compared to cloud-only approaches. Lenovo claimed ROI in under six months and up to 8x lower per-token cost than “comparable cloud IaaS,” aligning the message with TTFT, throughput, and per-token cost as decision criteria.

Lenovo introduced new inference-optimized ThinkSystem and ThinkEdge servers, along with enhanced hybrid AI platforms and partner integrations, designed for real-time inference across retail, manufacturing, healthcare, sports, and smart cities.

RTX PRO 6000 Blackwell Server Edition GPU

The expanded portfolio includes NVIDIA-Certified Systems integrated with NVIDIA AI Enterprise software, with several specific platform tiers:

  • Two Lenovo Hybrid AI platforms: one built around NVIDIA RTX PRO 6000 Blackwell Server Edition GPUs for scale-out enterprise AI and multimodal inferencing, and another based on NVIDIA Blackwell Ultra targeting training, fine-tuning, and large-scale inference.
  • Hybrid AI inferencing starter platform: using NVIDIA RTX PRO 4500 Blackwell Server Edition GPUs, positioned for single-node deployments. Lenovo claimed up to 3x performance gains for vision AI and up to 4x performance for content generation compared to NVIDIA L4.
  • ThinkAgile HX650a: paired with Nutanix Enterprise AI and Nutanix Kubernetes Platform as a validated foundation for protected inferencing and agentic workloads.
  • Data and protection integrations: Lenovo Hybrid AI platforms with Cloudian for scalable, sovereignty-aligned data pipelines, and Veeam Kasten for Kubernetes-native protection of AI models and services.

Lenovo also said the portfolio is backed by an expanded global collaboration with IBM Technology Lifecycle Services, positioning the relationship to accelerate deployment and operations of hybrid AI at scale.

Industry Solutions

Lenovo is expanding its Lenovo AI Library with vertical solutions built on Lenovo Hybrid AI Advantage with NVIDIA, emphasizing production-grade inferencing embedded into workflows rather than stand-alone demos.

In sports, Lenovo described use cases for real-time analytics, operational intelligence, and broadcast optimization, with the underlying theme of low-latency inference for live production and venue operations. In retail, Lenovo highlighted in-store and digital assistants delivered through the Lenovo xIQ Agent Platform with NVIDIA, aiming at personalization and operational efficiency.

For industrial and mobility environments, Lenovo described “physical AI” offerings that combine robotics, edge compute, and multimodal sensing to automate inspection, improve worker safety, optimize fleet operations, and reduce downtime. The company also referenced an “Auto AI Box” intended to extend edge inference into vehicle computing platforms for advanced driver assistance, predictive maintenance, and real-time fleet intelligence.

Lenovo tied these efforts to its Lenovo AI Innovators ecosystem, naming partners AiFi, RocketBoots, and Vaidio as part of a validated-solutions approach for the public sector, smart cities, retail, and other verticals.

AI Cloud

On the cloud infrastructure side, Lenovo positioned itself as a launch partner for NVIDIA’s Vera Rubin NVL72, offering liquid-cooled, rack-scale AI systems for hyperscale and sovereign AI cloud providers. Lenovo’s pitch here is faster deployment and improved token economics, citing up to 10x higher throughput and reduced cost per token.

Lenovo also introduced NVL8 systems and said it is collaborating with Nscale to support large-scale inference and agentic workloads. Lenovo Hybrid AI Factory Services is being used to wrap these deployments with lifecycle management, global rollout support, and operational optimization, with the stated goal of reducing risk and accelerating time-to-revenue for AI cloud providers.

Channel and Go-to-Market

Finally, Lenovo emphasized that its partnership with NVIDIA is structured for broad partner delivery through the Lenovo 360 framework. The company said partners can package devices, infrastructure, services, NVIDIA AI Enterprise software, accelerated computing, and networking into a guided engagement model designed to move customers from pilot to production and then scale AI capabilities across hybrid environments.

The post Lenovo Expands Hybrid AI Advantage with NVIDIA at GTC 2026: New Inference Platforms, Workstations, and Rack-Scale AI Cloud appeared first on StorageReview.com.

Dell Expands AI Factory with NVIDIA at GTC 2026: New Data Engines, Lightning File System, and Exascale Storage

17 March 2026 at 17:10

Dell Technologies has introduced the Dell AI Data Platform, a set of data and storage technologies aligned with NVIDIA’s AI ecosystem, designed to help enterprises move from AI pilots to production-scale, agentic systems. The platform is designed to address a familiar constraint in enterprise AI: data that is too slow, siloed, or poorly governed to support high-value AI workloads.

As organizations attempt to operationalize AI agents and long-context workflows, they often discover that the limiting factor is not model choice but the ability to access, structure, and govern data across fragmented systems. Dell positions the AI Data Platform as part of the Dell AI Factory with NVIDIA, providing a combined stack of data engines, GPU-accelerated processing, and high-throughput storage.

Dell Rack 7000 door open

Dell cites internal benchmarks showing up to 12x faster vector indexing, 3x faster data processing, and 19x faster time-to-first-token compared with “traditional computing methods,” with acceleration driven by NVIDIA GPUs and CUDA-X libraries integrated into the data layer.

Automating the AI Data Lifecycle with Dell Data Engines

At the core of the Dell AI Data Platform is a set of Dell data engines, accelerated by NVIDIA AI infrastructure, to automate the AI data lifecycle and reduce data prep time while preserving enterprise controls.

Dell AI Data Platform with NVIDIA graphic

The Dell Data Orchestration Engine, based on technology from Dell’s Dataloop acquisition, is designed to manage data from ingestion through to AI-ready datasets. It automatically discovers structured, unstructured, and multimodal data sources, then labels, enriches, and transforms them into governed datasets suitable for training and inference. The platform combines automated pipelines with active learning and human-in-the-loop review to iteratively improve dataset quality and model accuracy without sacrificing governance.

To speed deployment, the Data Orchestration Engine includes a Marketplace with pre-built data workflows that incorporate NVIDIA NIM microservices, NVIDIA AI blueprints, and more than 200 additional models, applications, and templates. This marketplace-style approach is intended to shorten time-to-value by providing reusable components for common enterprise AI patterns.

Integration with NVIDIA AI Blueprints, NIM, and Nemotron

The Dell AI Data Platform aligns with NVIDIA’s AI-Q blueprint for building AI agents that deliver actionable insights across enterprise data. Dell integrates NVIDIA-accelerated data engines to support efficient data preparation, retrieval, and reasoning across structured and unstructured sources, with a focus on retrieval-augmented generation and agentic workflows.

Dell AI Data Platform with NVIDIA features graphic

Enterprises can access a library of NVIDIA content, including:

  • Pre-built NVIDIA AI Blueprints for common agent and application patterns
  • NVIDIA NIM microservices for model deployment and inference
  • The NVIDIA Nemotron 3 Super model via the Dell Enterprise Hub on Hugging Face

This integration allows customers to pair Dell’s data layer and storage stack with NVIDIA’s model, inference, and microservice ecosystem in a supported configuration.

Support for NVIDIA STX, Vera Rubin NVL72, and Spectrum-X

Dell is also aligning its infrastructure portfolio with NVIDIA’s STX modular reference design. The company plans to support NVIDIA STX, powered by the NVIDIA Vera Rubin NVL72, NVIDIA BlueField-4 DPUs, and NVIDIA Spectrum-X Ethernet networking. The intent is to provide a modular, scalable platform for AI data management and processing that tightly couples GPU, DPU, and high-performance networking with Dell’s storage and data software.

NVIDIA Spectrum-X front exploded

This STX-based approach targets environments that need to manage, process, and retrieve large volumes of data for training, fine-tuning, and real-time inference, while maintaining performance isolation and predictable throughput.

AI Assistant for SQL Analytics

Within the Dell Data Analytics Engine, Dell is introducing an AI Assistant that provides a conversational interface to SQL analytics. The assistant allows business and technical users to:

  • Query governed, structured data using natural language instead of raw SQL
  • Visualize query results for analysis and collaboration
  • Work within existing governance boundaries while expanding data access to non-SQL experts

For organizations deploying AI agents that need high-quality structured data, this SQL-focused assistant is intended to reduce reliance on specialized data engineering skills and accelerate decision cycles.
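Dell has not published the assistant's interface, but the natural-language-to-SQL pattern it describes can be sketched generically. In the example below, `nl_to_sql()` is a hypothetical placeholder for a model call, the database file is assumed to already exist, and a read-only connection stands in for the governance boundary.

```python
import sqlite3

def nl_to_sql(question: str, schema: str) -> str:
    # Hypothetical placeholder for a model call (e.g., an LLM prompted with the
    # schema); a real assistant would generate SQL from the user's question.
    return "SELECT region, SUM(revenue) AS revenue FROM sales GROUP BY region;"

def answer(question: str, db_path: str = "analytics.db"):
    # Read-only connection stands in for the governance boundary: the assistant
    # can query approved data but cannot modify it.
    conn = sqlite3.connect(f"file:{db_path}?mode=ro", uri=True)
    schema = "\n".join(
        row[0] for row in
        conn.execute("SELECT sql FROM sqlite_master WHERE sql IS NOT NULL")
    )
    sql = nl_to_sql(question, schema)
    rows = conn.execute(sql).fetchall()
    conn.close()
    return sql, rows

# Example: answer("What was revenue by region last quarter?")
```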

NVIDIA RTX PRO Blackwell and CUDA-X in the Data Layer

A key design point of the Dell AI Data Platform is the integration of NVIDIA GPUs directly into the data layer. Dell is incorporating NVIDIA RTX PRO Blackwell Server Edition GPUs to accelerate data processing close to where data resides.

NVIDIA RTX PRO Blackwell Server Edition GPU

NVIDIA CUDA-X libraries, such as cuDF for columnar operations on structured data and cuVS for vector indexing, run alongside Dell’s data engines to offload and parallelize data workloads. Dell reports performance improvements of up to 3x for SQL queries and up to 12x for vector indexing compared to traditional, CPU-centric approaches.
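As a point of reference for what GPU-resident columnar processing looks like, the sketch below is a generic RAPIDS cuDF example rather than Dell's data engine; the file name and column names are illustrative, and an NVIDIA GPU with the cudf package is assumed.

```python
# Generic RAPIDS cuDF sketch; file and column names are illustrative and not
# tied to Dell's data engines.
import cudf

events = cudf.read_parquet("events.parquet")   # columnar data loaded into GPU memory

# Filter and aggregate entirely on the GPU, mirroring the kind of structured-data
# preparation that would otherwise run on CPUs ahead of training or RAG indexing.
summary = (
    events[events["status"] == "ok"]
    .groupby("customer_id")
    .agg({"latency_ms": "mean", "bytes": "sum"})
)
print(summary.head())
```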

By embedding GPU acceleration into the data path, Dell aims to keep downstream AI training and inference pipelines fed with prepared, indexed data at a rate that matches GPU consumption, which is a common bottleneck in production-scale AI environments.

Storage: Keeping GPUs Utilized at Scale

As AI workloads move beyond experimentation, storage bandwidth and latency often become the limiting factor, leaving costly GPUs idle. Dell is responding with AI-optimized storage engines that are architected to sustain high performance as capacity and node counts grow.

Dell Lightning File System

The Dell Lightning File System is a high-performance parallel file system tuned for AI training and inference workloads. Dell specifies up to 150 GB/s per rack unit, with performance advantages versus traditional flash-only and parallel file systems.

Dell Lightning FS graphic

Lightning uses a fabric-based architecture that enables direct, high-bandwidth access from compute nodes to storage, aiming to maintain GPU utilization at scale. The system integrates with NVIDIA-based AI infrastructure and is well-suited to GPU-dense training clusters where sustained throughput is critical.

Dell Exascale Storage

Dell Exascale Storage is presented as a 3-in-1 storage system for AI and high-performance computing, supporting concurrent file, object, and parallel file storage on Dell PowerEdge servers. On a single hardware platform, organizations can deploy:

  • Dell PowerScale for scale-out file system
  • Dell ObjectScale for S3-compatible object storage
  • Dell Lightning File System for a high-throughput parallel file system

This converged approach targets demanding AI and HPC workloads, including high-frequency trading and “neoclouds,” where mixed protocols and extreme throughput requirements coexist. Exascale Storage supports NVIDIA CX-8 and CX-9 SuperNICs and up to 800 GbE network links, and Dell quotes performance of up to 6 TB/s per rack for AI workloads that must ingest very large datasets quickly.

KV Cache Offload with NVIDIA CMX and Dell Storage

To support long-context and agentic AI workloads, Dell and NVIDIA are enabling KV Cache offload from GPU memory to shared storage. Using NVIDIA CMX memory-storage technologies and inference-acceleration capabilities, KV Cache can be deployed on Dell CMX Storage and high-speed networks across PowerScale, ObjectScale, and Lightning File System.

By relocating the KV Cache from limited GPU memory to high-performance shared storage, organizations can:

  • Increase effective context length for large language models
  • Maintain continuity for long-running agent interactions
  • Improve GPU utilization by freeing VRAM for compute rather than cache

This is particularly relevant for AI agents that need to reference extensive historical data or maintain long, stateful conversations without exhausting GPU memory.
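Dell and NVIDIA's CMX-based mechanism operates at the memory and storage layer rather than in application code, but the basic idea of relocating the cache can be sketched at the framework level. The PyTorch example below assumes the legacy tuple-of-(key, value) past_key_values layout and simply parks per-layer KV tensors off the GPU between turns of a long-running agent; it illustrates the concept only.

```python
# Framework-level sketch of KV-cache offload; the CMX approach described above
# works in the storage layer, so this is conceptual only. Assumes the legacy
# tuple-of-(key, value) past_key_values layout used by many transformer models.
import torch

def offload_kv(past_key_values, device="cpu"):
    """Move each layer's key/value tensors off the GPU to free VRAM."""
    return tuple(
        (k.to(device, non_blocking=True), v.to(device, non_blocking=True))
        for k, v in past_key_values
    )

def restore_kv(past_key_values, device="cuda"):
    """Bring the cached tensors back before the next decoding step."""
    return tuple((k.to(device), v.to(device)) for k, v in past_key_values)

# Typical flow for a long-running agent turn (model call omitted):
#   outputs = model(input_ids, past_key_values=cache, use_cache=True)
#   cache = offload_kv(outputs.past_key_values)   # park the context off-GPU
#   ...
#   cache = restore_kv(cache)                     # reload when the agent resumes
```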

PowerScale pNFS for Large File Performance

Dell PowerScale’s software-driven Parallel Network File System (pNFS) architecture is optimized for large-file throughput in AI environments. Dell reports up to 6x faster performance for large files compared with NFSv3.

pNFS distributes file access across nodes to maintain high bandwidth and reduce hot spots, helping keep data flowing to GPU workloads. The goal is to minimize I/O bottlenecks and reduce idle GPU time, especially in training scenarios with large checkpoint files and datasets.

Dell AI Factory with NVIDIA: Scale and ROI

The Dell AI Data Platform and storage enhancements arrive as Dell marks the two-year milestone of the Dell AI Factory with NVIDIA. According to Dell, more than 4,000 customers are deploying the AI Factory stack, with early adopters reporting up to 2.6x ROI in the first year.

The broader Dell AI Factory portfolio spans infrastructure, software, solutions, and services designed to move AI from isolated pilots to production at scale. The AI Data Platform, data engines, and storage stack outlined here are intended to address the data access and performance aspects of that journey, which, in many environments, have proven more challenging than standing up GPU clusters or models.

The Dell AI Data Platform with NVIDIA offers an integrated, data-centric architecture that combines lifecycle automation, GPU-accelerated data processing, and high-throughput storage, all designed to keep AI agents and models fully utilized.

The post Dell Expands AI Factory with NVIDIA at GTC 2026: New Data Engines, Lightning File System, and Exascale Storage appeared first on StorageReview.com.

Everpure Aligns FlashBlade//EXA with NVIDIA AI Factory Architectures, Previews Data Stream

16 March 2026 at 20:30

Everpure is aligning its FlashBlade//EXA platform with NVIDIA’s evolving AI Factory architectures while previewing a new automation layer called Everpure Data Stream. The announcement extends Evergreen//One support to EXA and introduces a service designed to streamline data movement across AI pipelines, targeting one of the most common enterprise AI challenges: projects that work in pilot environments but stall before reaching production scale.

FlashBlade//EXA is positioned as the high-performance data backbone for these deployments, supporting large training runs and high-concurrency inference workloads. The upcoming Data Stream service focuses on automating data ingestion, preparation, and delivery to GPU infrastructure, reducing the operational complexity that often slows AI programs as they move from experimentation to production environments.

Storagereview Everpure Evergreen//One stack

EG1 for AI now extends to FlashBlade//EXA, delivering the performance, scale, and throughput needed for larger training runs and high‑concurrency inference. Everpure Data Stream, entering beta later in 2026, is designed to automate data movement from ingestion to model execution, reducing manual pipeline work and operational delays that often slow down AI projects.

Kaycee Lai, Everpure’s AI Vice President, frames the problem as treating AI as “just another workload” rather than as a data‑centric, continuous system. She highlighted that Everpure positions its stack to collapse data silos and move AI programs from experimentation to repeatable production outcomes, enabled by predictable performance and operational flexibility.

Benchmark‑Proven AI Storage and Data Path

For AI deployments, storage and data infrastructure must keep high‑value GPUs running near full utilization. Everpure is aligning FlashBlade//EXA with NVIDIA’s modular STX reference architecture to support next‑generation AI Factory designs built on the Vera Rubin platform.

Everpure FlashBlade//EXA compute diagram

The combined architecture integrates EXA’s scalable file and object performance with STX components, such as BlueField-enabled storage controllers and context memory architectures. The goal is to optimize the entire AI data pipeline: data preparation, feature and embedding creation, and long‑context inference. Special emphasis is placed on context memory because large-scale, agentic, multi-step reasoning systems rely on quick access to extensive context windows and history. The EXA/STX design addresses these giga‑scale inference demands by delivering sustained bandwidth and minimizing tail latency.

Recent industry benchmarks validate the platform’s behavior under realistic, high‑concurrency AI workloads. The benchmarks include:

  • SPECstorage Solution 2020 AI_Image: FlashBlade//EXA achieved the highest score recorded for the SPEC Storage AI_Image benchmark, powering 6,300 simultaneous AI jobs. This result illustrates the system’s ability to support large numbers of concurrent training and preprocessing tasks at full performance, an increasingly common pattern in multi‑tenant and multi‑team AI environments.
  • MLPerf‑aligned GPU Utilization and Throughput: Internal, model‑driven workloads aligned with MLPerf show that FlashBlade//EXA can transfer data nearly twice as fast as its closest competitor while using less than half the storage footprint of a rack. In tests, the platform maintained over 90% GPU utilization across large H100 clusters. This suggests the storage system is unlikely to be a bottleneck, allowing expensive accelerators to stay busy as datasets and models grow. EXA’s design scales linearly, maintaining this utilization as more compute and storage are added.

Everpure is also expanding NVIDIA‑Certified Storage (NVCS) validation to FlashBlade//EXA. This effort provides a clearer baseline for compatibility and performance and serves as a stepping stone to the NVCS “NCP” certification level, aligned with NVIDIA Cloud Partner reference architectures. For enterprises standardizing on NVIDIA-focused AI solutions, this type of storage certification helps reduce integration risk and makes it easier to adopt reference designs.

Automating AI Data Orchestration

High storage performance alone does not guarantee AI success if data pipelines into the AI stack remain fragmented and manual. Everpure Data Stream is introduced as an orchestration layer that automates key steps from data ingestion through preparation and delivery into GPU infrastructure.

The service focuses on curating and orchestrating “AI‑ready” data so that training and inference systems are continuously fed with current datasets without requiring heavy operational intervention. The intent is to shorten the time from the initial experiment to a stable production run by reducing ad hoc scripting, manual data staging, and repeated engineering workarounds for dataset refreshes.

An AI Data Platform (AIDP) for Everpure Data Stream, co-engineered with Supermicro, offers a compact reference design for organizations seeking a smaller initial footprint. This combination integrates Supermicro’s server and accelerator hardware with Everpure’s software-defined storage layer, providing a ready-made solution for deploying a data plane that supports both training and inference pipelines.

As part of this AIDP strategy, Everpure also supports accelerated platforms, including the NVIDIA RTX PRO 6000 Blackwell Server Edition, and plans to extend support to the RTX PRO 4500 Blackwell Server Edition. These configurations target customers who need strong inference and edge or departmental training capabilities without having to immediately invest in large data center GPU clusters.

Continuous Data Optimization

Everpure builds its platform around the idea that AI infrastructure isn’t just a one-time investment but an ongoing process of data improvement and performance testing. In this view, AI readiness isn’t just about deploying technology but involves a continuous cycle of collecting new data, retraining or tuning models, and verifying performance as workloads change.

By integrating FlashBlade//EXA, Evergreen//One’s consumption model, Data Stream automation, and alignment with NVIDIA STX and NVCS certifications, Everpure aims to help organizations move from isolated AI pilots to repeatable, production‑grade AI factories, while maintaining focus on GPU utilization and operational efficiency.

Internal MLPerf component measurements support some of these claims, although they were not submitted as official MLPerf results. From a technical perspective, the key points are the demonstrated concurrency under SPECstorage AI_Image, the reported GPU utilization figures in H100 environments, and the move toward fully validated NVIDIA‑aligned reference architectures.

The post Everpure Aligns FlashBlade//EXA with NVIDIA AI Factory Architectures, Previews Data Stream appeared first on StorageReview.com.
