
NetApp Expands Google Cloud Collaboration for Sovereign, Air-Gapped Deployments

16 April 2026 at 17:25

NetApp announced an expanded collaboration with Google Cloud, formalized through a four-year enterprise agreement to accelerate the deployment of NetApp storage within Google Distributed Cloud (GDC) Air-Gapped environments. Delivered with World Wide Technology (WWT), the offering targets sovereign cloud use cases that require strict data residency, security, and operational isolation.

The joint solution integrates NetApp’s data platform with Google Distributed Cloud’s full-stack private cloud architecture. The result is an air-gapped environment that supports sensitive and classified workloads while maintaining compliance with national sovereignty requirements. NetApp positions its storage systems as secure-by-design, enabling organizations to deploy controlled infrastructure that supports modern applications and AI workflows without external connectivity.

Google Cloud Air Gapped graphic

NetApp integrates its AFF all-flash systems, StorageGRID object storage, and Trident Kubernetes storage orchestration into the GDC stack. Together, these components form what the company calls an intelligent data infrastructure. Within GDC, this architecture supports zero-trust security models, local data storage, customer-managed encryption keys, and full operational control. The platform enables organizations to extend cloud capabilities to on-premises or edge environments while maintaining isolation, or to operate in fully disconnected, air-gapped configurations.

The collaboration is primarily aimed at government and regulated industries, where data-handling requirements limit the use of traditional public cloud. NetApp leadership highlighted that these environments require infrastructure capable of handling classified data while supporting modernization initiatives. By integrating with GDC, NetApp enables enterprise-grade AI and analytics capabilities within accredited environments, allowing agencies to derive insights and automate processes without compromising compliance or sovereignty.

Google Distributed Cloud is designed to extend Google Cloud services to customer-controlled locations, including on-premises data centers and edge sites. Google noted that public-sector organizations face growing pressure to extract value from data while complying with strict regulatory frameworks. GDC addresses this by enabling the deployment of cloud-native services and advanced AI in sovereign and disconnected environments.

As part of this effort, Google has expanded the availability of its AI capabilities for regulated use cases. Gemini models are now supported in GDC environments, enabling generative AI functions such as automation, content generation, discovery, and summarization directly on-premises. These capabilities can run in fully disconnected deployments, allowing organizations to leverage advanced AI while maintaining strict security and compliance boundaries.

The NetApp and Google Cloud partnership reflects a broader trend of bringing cloud and AI capabilities into controlled environments. By combining enterprise storage with sovereign cloud infrastructure, the companies are targeting organizations that require both advanced data services and strict operational isolation.

The post NetApp Expands Google Cloud Collaboration for Sovereign, Air-Gapped Deployments appeared first on StorageReview.com.

Comino Grando RTX PRO 6000 Review: 768GB of VRAM in a Liquid-Cooled 4U Chassis

16 April 2026 at 16:27

Comino recently sent us the latest version of the Comino Grando for review, configured with eight NVIDIA RTX PRO 6000 Blackwell cards, each with 96GB of VRAM, for a total of 768GB of GPU memory. We reviewed the Grando back in 2024, configured with 6x RTX 4090s for 144GB of total GPU memory, as well as a version with NVIDIA H100s. This latest update marks a substantial generational leap in both raw memory capacity and the range of workloads the platform can address.

Comino Grando RTX PRO 6000 full front bezel and GPU I/O

The Grando is a purpose-built 4U platform designed to resolve the critical conflict between high-density GPU compute and thermal management. While standard air-cooled chassis crumble under the sustained 600W+ TDP demands of modern professional cards, the Grando takes a fundamentally different approach, built from the ground up around a liquid-cooled architecture capable of dissipating a massive 6.5kW of continuous heat. This is not a retrofit or an afterthought; the entire chassis, from its inverted motherboard layout to its color-coded quick-disconnect manifold system, has been engineered around the cooling loop.

The result is a platform that can sustain eight full-TDP professional GPUs in a single 4U chassis, running 24/7 in ambient environments of 3-38°C, without thermal throttling, without the acoustic assault of high-RPM air cooling, and without compromising serviceability. For organizations deploying AI inference, machine learning training, or high-performance simulation workloads at scale, the Grando offers something genuinely rare: a server that does not ask you to choose between density, thermals, and reliability.

Comino Grando Specifications

The table below shows the physical specifications and supported hardware configurations for the Comino Grando platform.

Specification / Feature Comino Grando
Comino Grando Server & Rackable Workstation
Cooling Capacity 6.5kW (maximum 6,500W @ 20°C intake air temperature)
Motherboards Up to EATX & EEB
GPUs (Server) Up to 8;
NVIDIA: RTX A6000, RTX 6000 ADA, RTX PRO 6000, A40, L40, L40S, A100, H100, H200
GPUs (Rackable Workstation) Up to 6;
NVIDIA: 3090, 4090, 5080, 5090, RTX A6000, RTX 6000 ADA, RTX PRO 6000, A40, L40, L40S, A100, H100, H200;
AMD: W7800, W7900
CPUs Up to 2;
Single Socket: Intel Xeon W-2400/2500 & 3400/3500, Intel Xeon Scalable 4th/5th Gen, Xeon 6, AMD Threadripper PRO 5000WX/7000WX/9000WX, AMD EPYC 9004/9005
Dual Socket: Intel Xeon Scalable 4th/5th Gen, Xeon 6, AMD EPYC 9004/9005
RAM Up to 2TB
M.2 drives Up to 8x NVMe
Storage Back panel hot swap cages: up to 4x hot swap SSDs (4x 7mm or 2x 15mm) and up to 4 more (4x 7mm or 2x 15mm) instead of 4th PSU;
Internal 3.5″ cage up to 4x 3.5″ or 4x 2.5″ 15mm or 12x 2.5″ 7mm;
Internal 2.5″ slots: up to 4x 2.5″ SSD 7mm
Power Supply & Operating Voltage Up to 4x 2000W Hot Swap CRPS @ 180-264V
Up to 4x 1000W Hot Swap CRPS @ 90-140V
Redundancy modes: 4+0, 3+1, 2+2
Noise level 39dB-70dB
LAN Up to 2x 10Gbit/s on the motherboard and up to 400Gbit/s via PCIe
OS Ubuntu / Windows 11 (Pro/Home) / Windows Server
Physical & Cooling Specifications
Liquid cooling CPU with VRM and GPU with GDDR and VRM
Reservoir Comino custom 450ml with integrated pumps
Fans 3x Ultra High Flow 6200RPM (high noise level) or
3x High Flow 3000RPM (low noise level)
Installation 19″ rack-mountable or standalone as a Workstation
Required rack space 4U
Size 439 x 681 x 177mm (without handles and protruding parts)
Weight 4 GPUs: 49kg (net), 67kg (gross)
6 GPUs: 52kg (net), 70kg (gross)
8 GPUs: 55kg (net), 72kg (gross)
Operating & storage temperature range Storage: -5..50°C / 23..122°F
Operating: 3..38°C / 38..100°F
Comino Monitoring System (CMS)
Overview Controller Board with Sensors & Software for Real-Time Monitoring
Key Advantages Cooling System & CPU/GPU Monitoring, Web Interface, Cooling System Log, Centralized Monitoring for Workgroups
Sensors & Connected Devices Temperature (air and coolant), % Humidity, Voltage, Coolant flow, Reservoir coolant level, Fans, Pumps, Motherboard, Display, and buttons
Integration Possibilities Establish monitoring via a REST API and push sensor data to monitoring software (e.g., Zabbix, Grafana) or databases (e.g., InfluxDB).
CMS Technical Requirements
OS Windows 11/10
Ubuntu 22.04/20.04 (dependency for Ubuntu: the target system must have the nvidia-smi and sensors utilities installed)
Web Browsers Mozilla Firefox, Google Chrome, Chromium, Apple Safari, Microsoft Edge (Attention: Internet Explorer 11 is not supported)
Hard disk drive 300MB
Controller firmware version 1.0.6 or newer
Controller PCB version 2.xx.xx

Design, Build, and GPU Density

Chassis Layout and Deployment

The Grando Server is a masterclass in space optimization, measuring 17.3 x 26.8 x 6.97 inches (4U). Unlike traditional servers, it places the motherboard’s rear at the front of the chassis, inverting the conventional internal layout. This ensures that air-cooled components, such as RAM modules and VRMs, receive the coldest possible intake air before it reaches the liquid-cooling radiator at the rear.

The chassis itself is built to an exacting standard, featuring solid steel construction with a matte black powder-coat finish applied inside and out. This deliberate choice extends to the tubing, cables, radiator, and PCB solder mask, reflecting a clear intention for a clean, professional aesthetic throughout. Furthermore, the system supports versatile deployment, functioning seamlessly as either a 19-inch rack-mountable unit or a standalone desktop unit. Depending on the configuration, it weighs between 148 and 159 lbs.

Comino Grando RTX PRO 6000 top down view

GPU Cold Plates and Water Blocks

The proprietary copper water blocks form the core of the Grando’s density, cooling not only the GPU die but also other components such as memory and voltage regulators. Each GPU starts as an off-the-shelf card, on which Comino mounts a custom cold-plate assembly. In practice, this thin-profile design reduces each card to a single-slot footprint, allowing six or even eight professional GPUs to sit side by side within a single 4U chassis. Our review unit shipped with eight NVIDIA RTX PRO 6000 Blackwell cards, each with a TDP of 600W, resulting in a total cooling requirement of 4,800W under full load.

Comino Grando NVIDIA RTX PRO 6000 Pair cooler side profile

Achieving the Comino’s eight-GPU single-slot density would be nearly impossible with air cooling, since stock NVIDIA RTX PRO 6000 cards each occupy two slots and require substantial airflow. In contrast, these custom-cooled cards occupy just one slot each. The cold plates are built solidly, adding noticeable weight to each card, but that weight reflects the quality and cooling performance required at this level.

Each pair of GPUs is plumbed through a dedicated sub-manifold that consolidates both cards into a single inlet and outlet connection to the main coolant manifold. This paired approach simplifies the overall loop architecture, reduces the number of connections at the main manifold, and allows a technician to disconnect a single pair of quick-disconnect couplings to remove two cards at once, further streamlining maintenance.

Comino Grando connected pair of GPU cards tubing and quick connect fittings

Water Distribution and Manifold

At the center of the system sits a large water distribution manifold that supplies cool liquid to each GPU and CPU cold plate and provides the return path to the radiator. All connections between the manifold and the GPUs and CPU use Comino’s “TheQ” quick-disconnect couplings. These stainless-steel dripless fittings are color-coded with red and blue rings to clearly identify the hot and cold sides of the loop, removing any ambiguity during installation or servicing.

Comino Grando TheQ Quick Disconnect Couplings close up

They leave minimal residue on the mating surface when disconnected, allowing technicians to remove or replace individual GPUs or the CPU without draining the 450ml reservoir or the rest of the loop. In this way, the Grando brings the maintenance simplicity of air-cooled systems to a high-performance liquid-cooled platform.

CPU Cooling and Memory

The CPU and its voltage regulators also benefit from a dedicated cold plate connected directly to the coolant loop, preventing the processor from becoming a bottleneck during intense multi-GPU workloads. Our review unit shipped with an AMD Turin/Genoa board featuring a single AMD EPYC 9474F 48-core processor. The cold plate mirrors the quality of the card cold-plates, machined from solid copper and secured with stainless-steel hardware.

Comino Grando CPU waterblock

Flanking the CPU on both sides are eight fully populated DRAM slots that support configurations of up to 2TB of RAM. Our review unit came equipped with 512GB of DDR5 RAM. A support bar spans the GPU and CPU area of the chassis, securing sensitive components like the GPUs and maintaining chassis rigidity during transport.

Radiator and Fans

Cooling is handled by a large triple 140mm radiator mounted at the rear of the chassis, paired with three high-speed 140mm fans capable of reaching 6,200 RPM and moving up to 1,000 m³/h of airflow. The dense fin stack of the thick radiator underscores the thermal headroom designed into the platform, which is rated to dissipate up to 6.5kW of sustained heat.

What is perhaps most surprising is that despite that workload and those fan speeds, the unit manages to stay within a tolerable noise envelope, with sound levels sitting at around 70dB at full tilt. That is loud by workstation standards but notably restrained for a system dissipating the thermal output of a small electric furnace, which speaks to how effectively the Comino’s liquid loop transfers heat away from the components.

Comino Grando radiator and fans

Front Panel and Telemetry Display

On the front panel, an LED display provides a live readout of key telemetry data, including pump status, ambient air temperature, coolant temperature, and fan speed. Users navigate the menu using illuminated buttons on the cooling module, with short presses to scroll through available data. A long press on the PB2 button opens additional menu branches, including Commands, Service settings, and an Event Log. In addition, the front I/O panel includes a VGA port for display output, alongside a serial port, multiple USB ports, and network connections for peripheral and device connectivity.

Comino Grando front I/O and Power button with LCD

Power and Storage Architecture

Power Delivery and Redundancy

Supporting this level of compute requires equally robust power delivery. The Grando supports up to four hot-swap 1000W or 2000W CRPS modules in a redundant configuration, delivering up to 8.0kW at 180–264V. With support for 4+0, 3+1, and 2+2 redundancy modes, the system can tolerate PSU failures while maintaining continuous operation for 24/7 AI and HPC workloads.
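To make the redundancy modes concrete, here is a minimal sketch of the usable capacity each N+M scheme leaves, assuming the 2000W high-line modules fitted to our review unit; this is illustrative arithmetic, not a Comino sizing tool.

```python
# Usable power under each CRPS redundancy mode, assuming 2000W modules.
# In an N+M scheme, M modules are held in reserve, so usable capacity
# is N * module_watts and the system survives M simultaneous failures.

MODULE_WATTS = 2000

def usable_capacity(active: int, module_watts: int = MODULE_WATTS) -> int:
    """Return usable wattage for the N (active) side of an N+M PSU config."""
    return active * module_watts

for n, m in [(4, 0), (3, 1), (2, 2)]:
    print(f"{n}+{m}: {usable_capacity(n)}W usable, survives {m} PSU failure(s)")
```

Under this arithmetic, the full 8.0kW figure applies only in 4+0 mode; 3+1 and 2+2 trade capacity for fault tolerance.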

Comino Grando RTX PRO 6000 rear power and storage.

Our review unit shipped with four Great Wall 2000W 80 Plus Platinum hot-swap power supplies, forming the full 8.0kW configuration.

Comino Grando single hot-swap 2000W PSU

Power delivery to each GPU runs through a centralized 12-pin power distribution board mounted between the GPU array and the main cable run. The Grando uses this distribution board to consolidate incoming power feeds and then branch them to each GPU in an organized, space-efficient manner.

Comino Grando GPU power breakout and cables

PCIe, Storage, and Networking

The Grando comfortably supports six GPUs without compromising slot bandwidth, and the chassis scales to a full eight-card configuration for maximum density. The Comino’s ASRock Rack GENOAD8X-2T/BCM motherboard provides seven x16 slots and one x8 PCIe Gen 5 slot, meaning seven of the eight GPUs run at full x16 bandwidth while the eighth operates at x8. This is a trade-off between the number of PCIe lanes a single-socket CPU can supply and Comino’s reluctance to add the size, cost, and complexity of a PCIe switch board. Moving to a dual-socket motherboard would provide more PCIe lanes but fewer slots, since the second socket would occupy space otherwise used by PCIe slots in this space-constrained form factor.

Comino Grando GPU display connectivity.

Running eight GPUs in a single-socket system consumes the lion’s share of available PCIe lanes, and that comes with trade-offs. Our review unit, based on AMD Genoa, has 128 PCIe Gen 5 lanes available in total. With the eight GPUs consuming 120 of those lanes, the remaining eight are routed as x4 links to the M.2 SSD slots, so it is not possible to simultaneously run eight GPUs and a full complement of NVMe drives in the rear of the chassis via the two MCIO connectors. In our full 8-GPU configuration, only two M.2 slots were available for storage. Administrators who need additional NVMe capacity alongside maximum GPU density should be aware that adding rear hot-swap NVMe storage via the back-panel cages consumes additional PCIe lanes and reduces the number of GPUs the system can host.
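The lane budget above can be expressed as simple bookkeeping; the slot widths follow the article’s description of the single-socket Genoa platform, and the snippet is purely illustrative.

```python
# PCIe Gen 5 lane budget for the 8-GPU configuration: 128 CPU lanes,
# seven GPUs at x16 plus one at x8, remainder split into x4 M.2 links.

TOTAL_LANES = 128
gpu_slots = [16] * 7 + [8]           # seven x16 slots and one x8 slot
gpu_lanes = sum(gpu_slots)           # lanes consumed by the GPU array
remaining = TOTAL_LANES - gpu_lanes  # lanes left for storage
m2_slots = remaining // 4            # each M.2 slot takes a x4 link

print(f"GPU lanes: {gpu_lanes}, remaining: {remaining}, x4 M.2 slots: {m2_slots}")
# GPU lanes: 120, remaining: 8, x4 M.2 slots: 2
```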

Comino Grando Single Socket motherboard block diagram

ASRock Rack GENOAD8X-2T/BCM motherboard block diagram showing CPU, PCIe Gen 5 slots, DIMM channels, M.2 slots, BMC, USB, SATA, and networking connections.

With that said, storage is equally modular and expansive, though the configuration does affect the PCIe lane budget for GPUs, which is worth planning around for the intended use case. The rear panel of our review unit features a 2.5″ drive cage that supports up to four 2.5-inch SSDs in either 4x 7mm or 2x 15mm configurations, with an optional second set of up to four available in place of the fourth PSU slot. Because our review unit required all four power-supply bays to support the full 8-GPU configuration, we had access only to the first of the two hot-swap bays. Internally, the chassis can support a 3.5-inch cage that accommodates up to four 3.5-inch drives, four 2.5-inch 15mm drives, or up to twelve 2.5-inch 7mm drives, plus four additional internal 2.5-inch 7mm SSD slots if configured.

Comino Grando 2.5" SSD Trays

For networking, two onboard RJ45 10 Gb/s ports powered by the Broadcom BCM57416 are standard on the motherboard, alongside a dedicated Gigabit Ethernet IPMI management port. Administrators can further increase bandwidth by installing PCIe NICs that support up to 400 Gb/s for high-bandwidth fabric connectivity, though note that additional PCIe NICs occupy GPU slots, reducing the maximum number of GPUs the system can host.

Comino Grando view of card tubes and M.2 storage

Remote Management and System Intelligence

To safeguard the hardware and optimize performance, the system includes the Comino Monitoring System (CMS). A separate, autonomous controller board drives the CMS and serves as the server’s “brain,” independent of the main operating system. In practice, this controller reads a comprehensive array of sensors that track air and coolant temperatures, humidity levels, coolant flow rates, and reservoir levels in real time. Crucially, this autonomous design enables the CMS to perform self-diagnosis and trigger emergency shutdowns upon detecting a leak or a pump failure, protecting the expensive internal hardware from damage.

A web-based GUI handles day-to-day management, providing administrators with clear visibility into cooling performance, uptime, and real-time energy consumption for the CPU and GPUs. For enterprise-scale deployments, the CMS also connects via a REST API to centralized monitoring tools such as Zabbix and Grafana, or to databases such as InfluxDB. Together, these capabilities help administrators maintain a three-year service interval and keep the server running at peak efficiency without thermal throttling, even in high-ambient environments.
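As a sketch of the kind of integration the REST API enables, the function below formats a sensor reading as an InfluxDB line-protocol record. The measurement and field names are hypothetical, since the review does not document the CMS payload schema.

```python
def to_line_protocol(measurement: str, tags: dict, fields: dict, ts_ns: int) -> str:
    """Format one sensor reading as an InfluxDB line-protocol record."""
    tag_str = ",".join(f"{k}={v}" for k, v in sorted(tags.items()))
    field_str = ",".join(f"{k}={v}" for k, v in sorted(fields.items()))
    return f"{measurement},{tag_str} {field_str} {ts_ns}"

# Hypothetical reading, shaped as a CMS REST endpoint might report it.
reading = {"coolant_temp_c": 31.4, "flow_lpm": 9.8, "reservoir_pct": 97}
line = to_line_protocol("grando_cooling", {"host": "grando-01"}, reading,
                        1_700_000_000_000_000_000)
print(line)
# The record would then be POSTed to InfluxDB's write endpoint, or the same
# readings exposed to Zabbix/Grafana through their own collection agents.
```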

Beyond AI: Creative and Engineering Applications

While our testing focused on AI inference workloads, the Grando serves an equally practical role for creative professionals and engineers who need substantial local GPU compute. The 768GB of aggregate VRAM across eight RTX PRO 6000 cards unlocks capabilities that conventional workstation configurations cannot match.

FX artists and motion graphics professionals can render complex scenes with massive texture sets entirely in VRAM, eliminating the disk-swapping bottlenecks that plague productions using 8K footage or high-polygon environments. CAD engineers running computational fluid dynamics or structural simulations can tackle assemblies of unprecedented complexity without partitioning their models into multiple runs. Video editors working with multi-stream 8K RAW timelines, colorists applying ML-based noise reduction at full resolution, and 3D artists rendering path-traced finals locally rather than waiting for cloud farm availability all benefit from this density of GPU memory and compute.

The Grando does not require a full eight-GPU configuration. Comino offers the platform in four-GPU, six-GPU, and eight-GPU configurations, with all variants available for immediate shipment. Smaller studios, independent creators, and engineering teams can right-size their investment to current needs while retaining a clear upgrade path as workloads grow.

Platform Trade-offs: Density vs. Expandability

The Grando’s compact design delivers exceptional GPU density and thermal management within a standard 4U footprint, but that density involves architectural trade-offs worth understanding before deployment.

The chassis accommodates motherboards with EATX and EEB form factors, but not extended server boards found in traditional dual-socket platforms. This limits the total number of PCIe lanes available for peripherals beyond the GPU array. In our eight-GPU configuration, the AMD EPYC processor’s 128 PCIe Gen 5 lanes are almost entirely consumed by the GPUs, leaving little bandwidth for additional NVMe storage or high-speed networking beyond the onboard 10GbE ports.

This contrasts with the eight-GPU platforms we have reviewed from Dell, HPE, and Supermicro. Those systems use larger chassis, dual-socket configurations, and PCIe switch topologies to support significantly more peripheral connectivity. They typically accommodate four to eight additional NICs or DPUs alongside the full GPU complement, plus eight or more hot-swap NVMe bays, making them well-suited for distributed inference workloads that require high-bandwidth fabric interconnects.

However, that expanded capability comes at a substantial cost. Power draws exceed 8kW. Thermal loads require dedicated data center cooling infrastructure. Noise floors preclude deployment outside purpose-built machine rooms. And lead times frequently stretch six to eighteen months due to persistent supply constraints on enterprise GPU platforms.

The Grando occupies a different position. For organizations that prioritize rapid deployment, manageable operating environments, and inference or creative workloads over large-scale distributed training, the trade-offs are often favorable. Teams that need their hardware now, in an environment they can actually work with, may find the Grando’s approach to density more practical than waiting in a queue for a platform they cannot realistically deploy once it arrives.

Comino Grando Performance Testing Results

Comino Grando top view water cooling manifold

System Configuration

  • Chassis: Comino Grando
  • Motherboard: ASRock Rack GENOAD8X-2T/BCM
  • CPU: AMD EPYC 9474F 48C
  • Memory: 512GB DDR5
  • GPU: 8 x NVIDIA RTX PRO 6000
  • Storage: M.2 SSD

Claude Code Serving – MiniMax M2.5

Beyond traditional raw LLM inference benchmarks, we wanted to evaluate how well this hardware performs in an agentic coding workflow, specifically by serving multiple concurrent Claude Code sessions using a locally hosted model. This use case maps directly to development team productivity: how many engineers can simultaneously use an AI coding assistant served from a single node before the experience degrades?

To test this, we built a benchmark harness that generates a dataset of moderately difficult coding problems (such as implementing an LRU cache, building a CLI todo application, writing a markdown converter, and constructing a REST API) and runs each Claude Code session in a separate Docker container against the local vLLM server. A transparent proxy sits between the sessions and the inference endpoint, capturing per-request metrics for each Claude Code instance. The model used was MiniMax M2.5, served via vLLM on the system’s eight NVIDIA RTX PRO 6000 GPUs. While not the top-ranked coding model on public leaderboards, M2.5 is a capable model that many users, including our developer friends, run locally.

For a baseline reference point, we use Anthropic’s Claude Opus 4.6 average output throughput via OpenRouter.ai, one of the most popular routing services for production API access. That baseline comes in at approximately 37 tokens per second per API request.

We measured two key metrics: the average output tokens per second per Claude Code session (what each developer experiences) and the aggregate output tokens per second across all sessions (the total work the server produces).
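These two metrics can be computed from per-request proxy logs along the following lines; the record schema here is illustrative, not the harness’s actual format.

```python
# Per-session throughput: what one developer experiences, averaged across
# their requests. Aggregate throughput: total output tokens over the
# wall-clock span of the whole run.

def session_tok_per_s(records: list[dict]) -> float:
    """Average output tokens/sec across a session's requests."""
    rates = [r["output_tokens"] / (r["end"] - r["start"]) for r in records]
    return sum(rates) / len(rates)

def aggregate_tok_per_s(all_records: list[dict]) -> float:
    """Total output tokens divided by the wall-clock span of the run."""
    total = sum(r["output_tokens"] for r in all_records)
    span = max(r["end"] for r in all_records) - min(r["start"] for r in all_records)
    return total / span

logs = [
    {"output_tokens": 500, "start": 0.0, "end": 10.0},   # 50 tok/s
    {"output_tokens": 900, "start": 2.0, "end": 20.0},   # 50 tok/s
]
print(session_tok_per_s(logs), aggregate_tok_per_s(logs))
```

Note that the two numbers diverge under overlap: each request here streams at 50 tok/s, yet the server produces 70 tok/s in aggregate because the requests run concurrently.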

Based on the results, a single concurrent Claude Code session delivers 67.3 tok/s per user and an aggregate output of 64.7 tok/s. At two sessions, per-instance throughput drops modestly to 57.4 tok/s, while aggregate output climbs to 95.1 tok/s as vLLM’s batching begins to amortize overhead. Four concurrent sessions maintain 49.2 tok/s per user, still a highly responsive experience for interactive coding workflows, while aggregate throughput reaches 177.2 tok/s. Eight sessions represent the sweet spot for aggregate output, peaking at 206.7 tok/s total, while per-instance throughput settles at 38.7 tok/s, a level that remains comfortable for real-time code generation and iteration.

At 16 concurrent sessions, the system exhibits the classic batching trade-off: per-instance throughput drops to 31.1 tok/s, and aggregate output falls to 105.8 tok/s. This suggests that, at this concurrency level, the 230B MiniMax M2.5 model is pushing the limits of what eight GPUs can sustain without introducing meaningful latency for each user. The aggregate dip from 8 to 16 sessions reflects the memory-bandwidth demands of a large MoE architecture under heavy simultaneous decode load, rather than a scheduling inefficiency.

For organizations evaluating self-hosted AI infrastructure for developer tooling, the Grando makes a strong case. Running a frontier-class 230B model, it can comfortably serve up to eight simultaneous Claude Code sessions at throughput levels that feel genuinely interactive, with per-user speeds exceeding 38 tok/s at peak aggregate output. Teams of four to eight engineers can operate at near-optimal throughput without perceptible degradation in responsiveness.

The liquid-cooled architecture also makes this level of compute practical in environments where traditional GPU servers cannot operate. The system runs quietly enough to sit in a startup office, a small machine room, or a dedicated corner of an open workspace. Air-cooled systems with similar GPU density typically reach 90 dB or higher, which is loud enough to require dedicated data center space or, at a minimum, a closed server closet with serious acoustic treatment. The Grando can coexist with the team that uses it. Combined with full data locality, no per-token API costs, and complete control over model selection, it offers a self-hosted path that scales with a growing development team without requiring datacenter infrastructure or lockstep cost increases.

vLLM Online Serving – LLM Inference Performance

vLLM is one of the most popular high-throughput inference and serving engines for LLMs. The vLLM online serving benchmark evaluates the real-world serving performance of this inference engine under concurrent requests. It simulates production workloads by sending requests to a running vLLM server, with configurable parameters such as request rate, input and output lengths, and the number of concurrent clients. The benchmark measures key metrics, including throughput (tokens per second), time to first token, and time per output token (TPOT), helping users understand how vLLM performs under different load conditions.
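TTFT and TPOT, as typically defined for streaming inference, can be computed from token arrival timestamps; a minimal sketch with illustrative timings:

```python
# TTFT: time from request submission to the first generated token.
# TPOT: mean gap between subsequent tokens for the rest of the stream.

def ttft(request_start: float, token_times: list[float]) -> float:
    """Time to first token, in seconds."""
    return token_times[0] - request_start

def tpot(token_times: list[float]) -> float:
    """Mean time per output token after the first, in seconds."""
    gaps = [b - a for a, b in zip(token_times, token_times[1:])]
    return sum(gaps) / len(gaps)

times = [0.25, 0.30, 0.35, 0.40]   # token arrival times for one request
print(f"TTFT={ttft(0.0, times):.2f}s  TPOT={tpot(times) * 1000:.0f}ms")
# TTFT=0.25s  TPOT=50ms
```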

We tested inference performance across a comprehensive suite of models spanning various architectures, parameter scales, and quantization strategies to evaluate throughput under different concurrency profiles.

Summary Of Results

Comino Grando w/ 8× RTX PRO 6000 Blackwell — vLLM Inference Results (tok/s, peak at BS=256)

Model | Precision | Equal (256/256) | Prefill-Heavy (8k/1k) | Decode-Heavy (1k/8k)
GPT-OSS 20B | ep_dp1 | 17,280 | 32,061 | 11,187
GPT-OSS 120B | ep_dp1 | 11,726 | 21,636 | 7,570
Llama 3.1 8B Instruct | FP8 | 12,109 | 20,137 | 7,353
Llama 3.1 8B Instruct | FP4 | 11,954 | 20,206 | 7,239
Llama 3.1 8B Instruct | BF16 | 11,752 | 17,346 | 6,155
Qwen3 Coder 30B A3B | FP8 | 10,985 | 16,659 | 4,907
Qwen3 Coder 30B A3B | BF16 | 10,588 | 16,680 | 4,829
Mistral Small 3.1 24B | BF16 | 8,925 | 11,846 | 4,975
MiniMax M2.5 (230B) | ep_dp1 | 5,753 | 7,357* | 2,555

All values in tok/s, peak throughput at BS=256. *MiniMax M2.5 prefill-heavy peaked at BS=128 (7,357 tok/s); BS=256 was 7,141 tok/s.

GPT-OSS 120B and 20B

The GPT-OSS model family was tested in both 120B and 20B configurations on the Comino Grando.

GPT-OSS 120B

Under equal workload (256/256), the 120B model delivers 268.85 tok/s at BS=1, reaches 6,666.23 tok/s at BS=64, and peaks at 11,726.04 tok/s at BS=256. Prefill-heavy (8k/1k) starts at 1,375.69 tok/s, climbs to 16,374.19 tok/s at BS=64 and 17,944.55 tok/s at BS=128, and peaks at 21,636.41 tok/s at BS=256. Decode-heavy (1k/8k) grows from 196.28 tok/s at BS=1 to 7,569.97 tok/s at BS=256, with latency well-controlled at lower concurrency levels.

GPT-OSS 20B

The 20B model delivers 334.80 tok/s at BS=1 under equal workload, reaches 10,303.56 tok/s at BS=64, and peaks at 17,280.12 tok/s at BS=256. Prefill-heavy starts at 2,007.90 tok/s, climbs to 24,990.46 tok/s at BS=64 and 26,866.25 tok/s at BS=128, peaking at 32,060.72 tok/s at BS=256, the highest absolute prefill throughput recorded across both model sizes. Decode-heavy grows from 286.08 tok/s at BS=1 to 11,187.36 tok/s at BS=256, delivering roughly 1.5× the decode throughput of the 120B at peak concurrency while maintaining tighter latency throughout.

Qwen3 Coder 30B A3B Instruct and FP8 Instruct

The Qwen3-Coder-30B-A3B-Instruct model was tested with both BF16 and FP8 precision.

Qwen3-Coder-30B-A3B-Instruct (BF16)

Under an equal workload (256/256), the BF16 model delivers 1,902.32 tok/s at BS=8, reaches 6,683.58 tok/s at BS=64, and peaks at 10,587.56 tok/s at BS=256. Prefill-heavy (8k/1k) starts at 1,256.03 tok/s at BS=1, climbs to 14,400.57 tok/s at BS=64 and 15,308.35 tok/s at BS=128, and peaks at 16,679.52 tok/s at BS=256. Decode-heavy (1k/8k) grows from 169.19 tok/s at BS=1 to 4,828.82 tok/s at BS=256, with latency well-controlled at lower concurrency levels.

Qwen3-Coder-30B-A3B-Instruct (FP8)

The FP8 model delivers throughput comparable to BF16 across most scenarios, with equal workload reaching 6,478.54 tok/s at BS=64 and peaking at 10,984.61 tok/s at BS=256, a slight improvement over BF16 at peak concurrency. Prefill-heavy starts at 987.48 tok/s at BS=1, climbs to 14,036.46 tok/s at BS=64 and 15,156.69 tok/s at BS=128, and peaks at 16,658.98 tok/s at BS=256. Decode-heavy grows from 130.70 tok/s at BS=1 to 4,906.51 tok/s at BS=256, marginally outpacing BF16 at peak concurrency while the two configurations remain closely matched throughout the rest of the concurrency range.

Mistral Small 3.1 24B Instruct 2503

Under an equal workload (256/256), the model delivers 1,598.79 tok/s at BS=8, reaches 4,713.84 tok/s at BS=64, and scales strongly to 8,925.12 tok/s at BS=256. Prefill-heavy (8k/1k) starts at 897.84 tok/s at BS=1, climbs to 9,632.58 tok/s at BS=64 and 11,488.13 tok/s at BS=128, peaking at 11,846.15 tok/s at BS=256. Decode-heavy (1k/8k) grows from 124.98 tok/s at BS=1 to 2,653.82 tok/s at BS=64, then accelerates noticeably at higher concurrency levels, reaching 4,262.53 tok/s at BS=128 and peaking at 4,975.06 tok/s at BS=256, reflecting the model’s ability to sustain strong decode throughput as concurrency scales.

Llama 3.1 8B Instruct

The Llama-3.1-8B-Instruct model was tested across three precision configurations on the Comino, providing a clear view of how quantization affects throughput for this model size.

Llama 3.1 8B Instruct BF16

Under an equal workload (256/256), the BF16 model delivers 2,776.42 tok/s at BS=8, reaches 7,369.01 tok/s at BS=64, and peaks at 11,751.56 tok/s at BS=256. Prefill-heavy (8k/1k) starts at 1,645.29 tok/s at BS=1, climbs to 14,990.47 tok/s at BS=64 and 17,140.71 tok/s at BS=128, and peaks at 17,345.80 tok/s at BS=256. Decode-heavy (1k/8k) grows from 234.78 tok/s at BS=1 to 6,154.73 tok/s at BS=256.

Llama 3.1 8B Instruct FP8

FP8 quantization delivers a meaningful uplift across all scenarios. The equal workload reaches 7,530.39 tok/s at BS=64 and peaks at 12,108.98 tok/s at BS=256. Prefill-heavy climbs to 16,546.53 tok/s at BS=64 and 19,306.49 tok/s at BS=128, peaking at 20,137.35 tok/s at BS=256, roughly a 16% gain over BF16 at peak concurrency. Decode-heavy peaks at 7,353.40 tok/s at BS=256, approximately 19% ahead of BF16.

Llama 3.1 8B Instruct FP4

FP4 delivers throughput that is closely competitive with FP8 at higher concurrency levels, though it falls slightly behind at lower batch sizes. The equal workload peaks at 11,954.40 tok/s at BS=256, and prefill-heavy reaches its highest point at 20,205.57 tok/s at BS=256, narrowly edging out FP8 at peak concurrency. Decode-heavy peaks at 7,239.29 tok/s at BS=256, remaining within a few percent of FP8 throughout, making FP4 a compelling option when memory efficiency is a priority without a meaningful sacrifice in throughput.
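The quantization deltas quoted above can be checked directly from the peak figures. A quick sketch (the `peaks` table simply restates the BS=256 numbers from this section; the helper is ours):

```python
# Peak (BS=256) throughput for Llama 3.1 8B from the runs above, in tok/s.
peaks = {
    "bf16": {"equal": 11_751.56, "prefill": 17_345.80, "decode": 6_154.73},
    "fp8":  {"equal": 12_108.98, "prefill": 20_137.35, "decode": 7_353.40},
    "fp4":  {"equal": 11_954.40, "prefill": 20_205.57, "decode": 7_239.29},
}

def uplift(variant: str, scenario: str) -> float:
    """Percentage gain of a quantized variant over BF16 for one scenario."""
    return (peaks[variant][scenario] / peaks["bf16"][scenario] - 1) * 100

for variant in ("fp8", "fp4"):
    for scenario in ("equal", "prefill", "decode"):
        print(f"{variant} {scenario}: {uplift(variant, scenario):+.1f}% vs BF16")
```

Running this reproduces the roughly 16% prefill and 19% decode advantages of FP8 over BF16, and shows FP4 landing within a few percent of FP8 across the board.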

MiniMax M2.5

The MiniMax-M2.5 230B, tested on the Comino Grando, was the largest and most demanding model we used.

Under an equal workload (256/256), the model starts at 16.35 tok/s at BS=1, reaches 2,751.25 tok/s at BS=64, and scales strongly at higher concurrency, peaking at 5,753.24 tok/s at BS=256. Prefill-heavy (8k/1k) starts at 606.97 tok/s at BS=1, climbs steadily to 5,351.02 tok/s at BS=32 and 6,557.92 tok/s at BS=64, reaching its peak at 7,357.26 tok/s at BS=128 before slightly tapering to 7,140.74 tok/s at BS=256, suggesting the model approaches saturation in prefill throughput beyond BS=128. Decode-heavy (1k/8k) grows consistently from 82.21 tok/s at BS=1 to 1,485.28 tok/s at BS=64, peaking at 2,554.87 tok/s at BS=256, reflecting the expected memory bandwidth demands of a 230B MoE architecture under sustained decode workloads.

Conclusion

The Comino Grando is best understood as a system purpose-built to unlock the full potential of eight NVIDIA RTX PRO 6000 GPUs. Every major design decision, from the inverted motherboard layout to the cooling loop and integrated monitoring stack, is intended to ensure those GPUs can operate continuously at full 600W TDP without thermal or power constraints.

Comino Grando RTX PRO 6000 GPUs

What makes the Grando compelling is not any single feature in isolation but the way the entire system coheres. The liquid cooling is not a bolt-on addition; it is the architecture. The power delivery is redundant, hot-swappable, and scaled to the 4,800W load of eight 600W cards with headroom to spare. The monitoring system goes beyond reporting temperatures; it autonomously protects the hardware when something goes wrong. Nothing here feels like an afterthought.

The performance numbers reinforce that cohesion. Across a diverse suite of models, from Llama 3.1 8B to the 230B MiniMax M2.5, the Grando delivered throughput figures that hold up well for a self-hosted platform. Claude Code concurrency testing put a finer point on the practical value: eight engineers can run simultaneous agentic coding sessions against a locally hosted 230B model at interactive speeds, with per-user throughput exceeding 38 tok/s at peak aggregate output. Teams of four to eight can operate at near-optimal throughput without perceptible degradation.

The value of this configuration extends beyond AI inference. With 96GB of VRAM per GPU and dense multi-GPU scaling, the platform is equally well suited for high-end creative and engineering workloads, including VFX rendering, large-scale simulation, and complex CAD pipelines. The system scales down to four-GPU and two-GPU configurations, making this level of performance accessible to smaller studios and teams that still require workstation-class density.

Where the Grando differs most from the enterprise eight-GPU platforms we have reviewed is in deployment practicality. Those systems offer more PCIe lane headroom, more NIC slots, and deeper storage connectivity, but they also require dedicated data center infrastructure, draw well over 8kW, and have lead times that can stretch beyond a year. The Grando trades some of that peripheral expandability for a system that runs quietly enough to share a room with its users, dissipates less heat into the surrounding environment, and ships now. For organizations that prioritize rapid deployment and manageable operating environments over maximum fabric connectivity, the trade-off is favorable.

Product Page – Comino Grando
Comino Configurator

The post Comino Grando RTX PRO 6000 Review: 768GB of VRAM in a Liquid-Cooled 4U Chassis appeared first on StorageReview.com.


Broadcom Extends VMware Tanzu Platform with Agent Foundations for Enterprise AI

15 April 2026 at 18:02
VMware tanzu platform graphic

At the AI in Finance Summit, Broadcom introduced VMware Tanzu Platform agent foundations, positioning it as a secure-by-default runtime for building and operating autonomous AI applications on VMware Cloud Foundation (VCF). The release extends Tanzu’s established code-to-production model to AI agents, targeting enterprise teams seeking to move from isolated AI experiments to governed, production-scale deployments.

Moving AI Agents into Enterprise Operations

As AI agents assume execution and decision-making roles, operational requirements shift toward governance, security, and integration with enterprise systems. Many organizations still run AI workloads in isolated environments that lack access to core data and standardized controls.

VMware tanzu platform graphic

Tanzu Platform agent foundations address this gap by providing a pre-engineered platform-as-a-service layer for agent workloads, built directly on VCF. This enables platform engineering teams to manage AI services alongside traditional applications with familiar tooling and processes, without requiring deep specialization in AI infrastructure.

Deny-by-Default Agent Runtime

The agentic runtime introduces a set of controls to constrain agent behavior and reduce operational risk.

The software supply chain is managed using trusted Buildpacks rather than user-defined Dockerfiles. Containers are automatically built, patched, and verified, reducing exposure to embedded vulnerabilities or malicious components.

Secrets management is enforced at the structural level, preventing agents from accessing credentials outside their scope. This isolation is reinforced by VMware vDefend, which extends protections across infrastructure services and external SaaS integrations, limiting lateral movement.

Networking uses a zero-trust model. Agents operate within predefined resource and connectivity boundaries and have no default access to internal systems or models. Access is granted explicitly via secure service bindings, ensuring agents interact only with authorized data sources and services.

Developer Onboarding and Integrated Data Services

The platform includes pre-built agent templates to accelerate onboarding. Developers can provision agents with governed access to models, Model Context Protocol servers, and curated marketplace services defined by IT.

Data services are integrated into the platform, including Tanzu for Postgres with pgvector, as well as caching, streaming, and data flow services. Support for Spring AI memory services enables stateful agent behavior that aligns with enterprise application patterns.

Operational Scaling on VMware Cloud Foundation

Tanzu Platform agent foundations integrate with VCF infrastructure APIs to abstract away resource provisioning and lifecycle management. This ensures that agent workloads and their dependencies receive the required compute, storage, and networking resources without direct interaction with the infrastructure.

Elastic scaling allows environments to scale up or down based on workload demand, supporting both short-lived and persistent agents while optimizing cost and utilization.

High availability is achieved through multiple layers of redundancy and automated remediation. The platform continuously monitors and self-heals the underlying infrastructure to maintain service continuity for mission-critical autonomous applications.

An integrated AI gateway provides centralized control of model and tool access. It manages availability, usage policies, cost controls, and safety filtering for both public and private models on VCF.

According to Purnima Padmanabhan, General Manager of the Tanzu Division at Broadcom, rapid agentic application development is driving collaboration with customers to accelerate innovation. She highlighted that Tanzu Platform agent foundations enable teams to take agentic ideas into production quickly on modern private clouds, specifically those built on VMware Cloud Foundation 9.

With agent foundations, Broadcom is aligning Tanzu Platform with emerging enterprise AI requirements, with a focus on governance, security, and operational consistency. The approach builds on existing VMware infrastructure investments and introduces a standardized runtime for agent-based applications, making AI deployment more predictable and manageable at scale.

The post Broadcom Extends VMware Tanzu Platform with Agent Foundations for Enterprise AI appeared first on StorageReview.com.

Wasabi Technologies to Acquire Seagate Lyve Cloud Business

15 April 2026 at 18:01

Wasabi Technologies has reached a definitive agreement to acquire the Lyve Cloud business from Seagate Technology. As part of the transaction, Seagate will receive an equity stake in Wasabi, officially becoming a shareholder. While specific financial details remain undisclosed, the move marks a significant consolidation in the pure-play cloud storage market.

Wasabi security graphic

David Friend, co-founder and CEO of Wasabi, noted that the acquisition bolsters the company’s position as a leader in independent cloud storage. The integration of Lyve Cloud brings a dedicated enterprise customer base into Wasabi’s ecosystem. These customers will transition to Wasabi’s global data center infrastructure, which features specialized security tools such as Covert Copy and integrated AI capabilities. The provider intends to maintain high levels of technical support and partner integration for the incoming Lyve Cloud users.

For Seagate, the divestiture serves a specific strategic purpose. Gianluca Romano, Seagate’s CFO, indicated that the sale allows the company to refocus resources on its core mass-capacity storage hardware business. As the demand for high-capacity drives continues to climb, Seagate aims to prioritize manufacturing and innovation in hard drive technology. By transitioning the cloud service to Wasabi, Seagate ensures that a specialized provider services its existing cloud customers while the manufacturer maintains an indirect interest through its new equity position.

Engineering for Enterprise Scale

The proliferation of AI initiatives, large-scale analytics, and extensive video workloads is currently driving demand for enterprise-grade storage. As organizations manage data volumes reaching the petabyte scale, the total cost of ownership and vendor complexity become critical factors in infrastructure design. Many firms are moving away from traditional hyperscalers in favor of providers that offer predictable pricing models and robust security without the egress fees often associated with legacy cloud platforms.

Lyve Cloud established itself as a viable enterprise platform by prioritizing compliance and security features. By merging these assets with Wasabi’s established channel reach and execution strategy, the combined entity provides a streamlined alternative for professional IT environments. The acquisition aims to deliver consistent performance at scale while addressing the economic challenges of long-term data retention.

Ecosystem Integration and Data Protection

The consolidation of these two platforms simplifies the data protection and backup landscape for administrators. Both Wasabi and Lyve Cloud maintain deep integrations and certifications with leading backup software providers, including Veeam, Rubrik, and Commvault. This overlap ensures that existing automated workflows and S3-compatible API calls remain functional during and after the transition.

For channel partners and system integrators, the acquisition reduces the overhead of managing multiple independent S3-compatible storage vendors. By unifying the service under a single banner, Wasabi enhances its ability to support mission-critical backup and recovery workloads. This move strengthens the broad ecosystem of independent storage solutions, providing technical teams with a reliable, cost-effective target for enterprise data offloading.

The post Wasabi Technologies to Acquire Seagate Lyve Cloud Business appeared first on StorageReview.com.


Supermicro Unveils Three New Edge AI Systems Built on AMD EPYC 4005

14 April 2026 at 15:52

Supermicro has introduced three new compact edge computing systems based on AMD’s EPYC 4005 series processors, expanding its push into AI workloads beyond the traditional data center. The new lineup includes the AS-E300-14GR, AS-1116R-FN4, and AS-3015TR-i4.

The systems are designed for deployments where space is tight, power is limited, and dedicated IT support may not be readily available. That makes them a good fit for settings such as retail stores, manufacturing sites, healthcare environments, and branch offices, where companies increasingly process data locally rather than sending it back to centralized infrastructure.

Supermicro says the systems are built for real-time inference and other business-critical edge workloads, with a focus on keeping power consumption and operating costs in check. Use cases include loss prevention, frictionless checkout, and in-store analytics, as well as other applications that rely on fast on-site processing.

All three systems include security features that are becoming standard requirements in edge and distributed IT environments. They support TPM 2.0 and AMD Secure Encrypted Virtualization (SEV) to help protect workloads and data. Those features are paired with IPMI 2.0 remote management, which simplifies monitoring and administration for systems deployed far from centralized IT teams.

The platforms also include four GbE ports, enabling connections to point-of-sale infrastructure, cameras, and enterprise networks. This capability is important in edge environments where a single system may need to interface with multiple devices and applications simultaneously, particularly in retail and industrial settings.

At the processor level, the new systems are built around AMD’s EPYC 4005 series, based on the company’s Zen 5 architecture. The chips support DDR5 memory and PCIe Gen 5, with TDPs starting at 65W. Some models also feature AMD’s 3D V-Cache, which can boost performance in data-intensive workloads by improving access to frequently used data.

Supermicro AS-E300-14GR

First is the AS-E300-14GR, a compact 1U mini box system housed in a 2.5-liter enclosure. It supports up to 16-core processors and up to 192GB of DDR5 memory, and is designed for embedded or space-constrained environments. Supermicro said it is suited to point-of-sale applications via HDMI and MiniDisplay connectivity, as well as network gateway roles. It includes a dedicated out-of-band management port alongside four GbE ports.

Supermicro AS-E300-14GR front view

The AS-E300-14GR is a Mini-1U embedded system that supports up to 192GB of DDR5-5600 memory across four DIMM slots, a substantial amount for a compact edge box.

On the storage side, it includes one internal 2.5-inch SATA bay, two M.2 PCIe 5.0 x4 NVMe slots, and a low-profile PCIe 5.0 x16 slot for expansion or accelerator support. Connectivity is a strong point, with four 1GbE LAN ports, a dedicated BMC management port, rear USB 3.2 ports, HDMI 2.1, and Mini-DP. All of that fits inside a fan-based embedded chassis measuring just 264.8 x 43 x 225.8mm, making it a practical option for edge deployments where space is limited but performance, networking, and remote management still matter.

Supermicro AS-E300-14GR Specifications

Specification AS-E300-14GR
Overview
Model IoT SuperServer AS-E300-14GR
System Type Mini-1U embedded system with AMD EPYC™ 4004/4005 Series Processor up to 65W TDP
Key Applications Healthcare, Surveillance Security Server, AI Inference, Digital Signage / PoS
Form Factor Fan-based Embedded
Chassis CSE-E300
Motherboard Super H14SRV-HLN4F
Processor and Memory
Processor Single Socket AM5 (LGA-1718)
AMD EPYC™ 4005/4004 Series Processor
16C/32T; 64MB Cache
System Memory Slot Count: 4 DIMM slots
Max Memory (1DPC): 192GB 5600MT/s ECC/non-ECC DDR5 UDIMM
Storage and Expansion
Drive Bays Configuration Default: Total 1 bay
1 internal fixed 2.5″ SATA* drive bay
(*SATA support may require additional storage controller and/or cables)
M.2 1 M.2 PCIe 5.0 x4 NVMe slot (M-key 2280)
1 M.2 PCIe 5.0 x4 NVMe slot (M-key 22110)
Expansion Slots Default*
1 PCIe 5.0 x16 (in x16) LP slot
(*Requires additional parts, please see the optional parts list for details. For more details on PCIe slot configuration options, please refer to the system callout images above.)
Networking and I/O
LAN 4 RJ45 1 GbE LAN ports (Intel I350-AM4)
1 RJ45 1 GbE Dedicated BMC LAN port (ASPEED AST2600)
USB 2 USB 3.2 Gen2 Type-A ports(Rear)
1 USB 3.2 Gen1 Type-A port(Rear)
1 USB 3.2 Gen1 Type-C port(Rear)
Video 1 HDMI 2.1 port(Rear)
1 Mini-DP port(Rear)
TPM 1 TPM header
1 TPM Onboard / port 80
Onboard Devices AMD B650
Power, Cooling, and Management
System Cooling Fans: Up to 1 CPU heatsink with 70x70x15mm Fan(s)
Up to 2x 4-PIN PWM 40x40x28mm Fan(s)
Power Supply 1 x 180W power supply
System BIOS BIOS Type: AMI 32MB UEFI
BIOS Features: ACPI 6.5
SMBIOS 3.7 or later
UEFI 2.9
Management SuperCloud Composer; Supermicro Server Manager (SSM); Super Diagnostics Offline (SDO); Supermicro Thin-Agent Service (TAS); SuperServer Automation Assistant (SAA) New!; Plug-ins for 3rd Party Software
PC Health Monitoring FAN: Status monitor for speed control
Physical and Environmental
Enclosure 264.8 x 43 x 225.8mm (10.43″ x 1.69″ x 8.89″)
Package 381 x 276 x 142mm (15″ x 10.87″ x 5.59″)
Weight Gross Weight: 7.5 lbs (3.4 kg)
Net Weight: 3.7 lbs (1.6 kg)
Available Color Black
Operating Environment Operating Temperature: 0°C to 40°C (32°F to 104°F) with 0.7 m/s airflow
Non-operating Temperature: -40°C to 70°C (-40°F to 158°F)
Operating Relative Humidity: 8% to 90% (non-condensing)
Non-operating Relative Humidity: 5% to 95% (non-condensing)

Supermicro AS-1116R-FN4

The AS-1116R-FN4 is a compact 1U rackmount system designed for installations where storage density and rack efficiency are priorities. It is geared toward branch offices and retail back-end consolidation, where organizations may want to consolidate multiple workloads into a smaller physical footprint.

Supermicro-AS1116R-FN4

The AS-1116R-FN4 takes a more rack-focused approach while keeping the footprint compact. It is also a Mini-1U system with a 249mm short-depth chassis and support for up to 192GB of DDR5-5600 memory. Storage is more flexible than in the smaller box system, with support for either two internal 2.5-inch NVMe bays or one internal 3.5-inch SATA bay, plus two M.2 PCIe 5.0 slots.

It also includes a low-profile PCIe 5.0 x16 expansion slot, four 1GbE LAN ports, a dedicated BMC management port, rear USB connectivity, HDMI 2.1, and Mini-DP. With a 200W Gold power supply, three counter-rotating fans, and remote management support, it is well-suited for branch, retail back-end, and other edge deployments that require server-class features in a very compact rackmount chassis.

Supermicro AS-1116R-FN4 Specifications

Specification AS-1116R-FN4
Overview
Model IoT SuperServer AS-1116R-FN4
System Type H14 1U Ultra-short depth 249mm chassis with AMD EPYC 4005/4004 65W server
Key Applications AI Inference and Machine Learning, Cloud Computing, Healthcare, Surveillance Security Server
Form Factor Mini-1U
Chassis CSE-505-203B
Motherboard Super H14SRV-HLN4F
Processor and Memory
Processor Single Socket AM5 (LGA-1718)
AMD EPYC 4005/4004 Series Processor
16C/32T; 64MB Cache
System Memory Slot Count: 2 DIMM slots
Max Memory (1DPC): 192GB 5600MT/s ECC/non-ECC DDR5 UDIMM
Storage and Expansion
Drive Bays Configuration Default: Total 2 bays
2 internal fixed 2.5″ NVMe drive bays
Option A: Total 1 bay
1 internal fixed 3.5″ SATA drive bay
M.2 1 M.2 PCIe 5.0 x4 slot (M-Key 2280)
1 M.2 PCIe 5.0 x4 NVMe slot (M-key 22110)
Expansion Slots 1 PCIe 5.0 x16 (in x16) LP slot
Networking and I/O
LAN 4 RJ45 1 GbE LAN ports (Intel I350-AM4)
1 RJ45 1 GbE Dedicated BMC LAN port (ASPEED AST2600)
USB 1 USB 3.2 Gen2 Type-C port(Rear)
2 USB 3.2 Gen2 Type-A ports(Rear)
1 USB 3.2 Gen1 Type-A port(Rear)
Video 1 HDMI 2.1 port(Rear)
1 Mini-DP port(Rear)
TPM 1 TPM header
1 TPM Onboard / port 80
Onboard Devices AMD B650
Power, Cooling, and Management
System Cooling Fans: 3 counter-rotating 40x40x28mm Fan(s)
Air Shroud: 1 Air Shroud
Power Supply 1x 200W Gold Level (91%) power supply
System BIOS BIOS Type: AMI 32MB UEFI
BIOS Features: ACPI 6.5
SMBIOS 3.7 or later
UEFI 2.9
Management SuperCloud Composer; Supermicro Server Manager (SSM); Super Diagnostics Offline (SDO); Supermicro Thin-Agent Service (TAS); SuperServer Automation Assistant (SAA) New!; Plug-ins for 3rd Party Software
PC Health Monitoring FAN: Status monitor for speed control
Physical and Environmental
Enclosure 437 x 43 x 249mm (17.2″ x 1.7″ x 9.8″)
Package 655 x 155 x 465mm (25.8″ x 6.1″ x 18.3″)
Weight Gross Weight: 10 lbs (4.54 kg)
Available Color Black
Operating Environment Operating Temperature: 0°C to 40°C (32°F to 104°F)
Non-operating Temperature: -40°C to 70°C (-40°F to 158°F)
Operating Relative Humidity: 8% to 90% (non-condensing)
Non-operating Relative Humidity: 5% to 95% (non-condensing)

Supermicro AS-3015TR-i4

Lastly, the AS-3015TR-i4 is a slim tower system designed for quieter environments and for easier installation in edge locations without dedicated server rooms. The data sheet was unavailable at the time of writing; however, the tower can accommodate a dual-slot GPU measuring up to 2.7 inches high by 6.6 inches long, such as the NVIDIA RTX PRO 2000 Blackwell. The 9-liter chassis also includes options for a slim optical drive and a 3.5-inch disk drive, providing additional flexibility for edge deployments that still require local media or storage.

Supermicro AMD EPYC 4000 Systems

The post Supermicro Unveils Three New Edge AI Systems Built on AMD EPYC 4005 appeared first on StorageReview.com.

Ubiquiti UniFi G6 Turret Review: 4K PoE Camera with On-Device AI for $199

13 April 2026 at 20:52
wall mounted ubiquiti g6 turret

Ubiquiti’s G6 Turret is a 4K PoE camera with a turret design, featuring on-device face and license plate recognition and full UniFi Protect integration, all at a $199 price point. The turret design sets it apart from traditional domes by placing the lens module in a ball-and-socket housing. You can physically adjust the module on three axes after mounting, giving installers direct control over framing without being locked into the bracket’s angle. For jobs involving a specific entry lane, a retail counter, or a tight corridor, this hands-on adjustability considerably speeds up installation.

wall mounted ubiquiti g6 turret

Hardware Overview

The G6 Turret has a 1/1.8″ 8MP sensor and a quad-core processor with a Multi-TOPS AI Engine. In addition to local face and license plate recognition, this small camera offers 30-meter IR night vision and connects to UniFi Protect over standard PoE, without requiring PoE+.

The IK04 rating makes this camera better suited to controlled commercial spaces than high-exposure public areas. As a result, it belongs in offices, retail shops, or covered entrances, where frequent physical tampering isn't expected, rather than on unmonitored street-side mounts.

Specification Ubiquiti UniFi G6 Turret
General
Dimensions ⌀100 × 95 mm (⌀3.9 × 3.7″)
Weight 550 g (1.2 lb)
Enclosure Material Aluminum alloy, polycarbonate
Weatherproofing IP66
Tamper Resistance IK04
Ambient Operating Temperature -30 to 50°C (-22 to 122°F)
Ambient Operating Humidity 0 to 90% noncondensing
Button (1) Factory reset
Video
Resolution 4K, 8MP, 3864 × 2160 (16:9)
Max. Frame Rate 30 FPS
Image Settings Color, brightness, sharpness, contrast, white balance, exposure control, 2DNR, 3DNR, NR by motion, masking, text overlay, HDR
Optics
Sensor 1/1.8″ 8MP
Lens Fixed focal length
Field of View H: 109.9°, V: 56.7°, D: 134.1°
Night Mode Built-in IR LED illumination and IR cut filter
IR Night Vision Range 30 m (98 ft)
Intelligence
Face Recognition
License Plate Recognition
Smart Detections People, Vehicles, Animals
Audio
Audio Microphone
Hardware
Processor Quad-core ARM Cortex-A53-based chip
Power
Power Method PoE
Supported Voltage Range 37 – 57V DC
Max. Power Consumption 12.5W
Networking
Networking Interface 10/100 MbE RJ45 port
UniFi Application Suite Protect
Cable
Cable Connector Type RJ45
Cable Diameter 4.5 mm (0.2″)
Cable Length 30 cm (1 ft)
Jacket Material Thermoplastic elastomer
Jacket Enclosure Dimensions ⌀20 × 70.6 mm (0.8 × 2.8″)
Jacket Enclosure Material Thermoplastic elastomer, polycarbonate, silicone rubber
Mounting
Included Mounting Ceiling, Wall
Optional Mounting Arm, Pendant, Junction Box

Design and Build

The turret form factor works differently from a dome. Rather than positioning a fixed lens behind a polycarbonate cover, the G6 Turret places its lens module in an exposed ball-and-socket housing that rotates freely until you tighten it down. Three-axis adjustment allows independent pan, tilt, and rotation, which is particularly useful on wall mounts, where a ceiling-only mount angle would otherwise require repositioning the entire bracket. Only a screwdriver is needed for adjustments, so framing the shot on-site is quick.

rear mounted-side view of the ubiquiti g6 turret

The camera measures ⌀100 × 95 mm and weighs 550 g (1.2 lb). Build quality is solid throughout, with an aluminum alloy and polycarbonate construction that matches the broader G6 lineup. The white finish blends cleanly against standard commercial ceilings, though the exposed ball joint makes this camera more visible than a low-profile dome. If a discreet install is a priority, a recessed dome is the better choice.

IP66 weatherproofing allows for outdoor use without a cover, so it handles car parks, entry canopies, and similar positions without issue. The IK04 rating covers standard commercial use cases but isn’t suited to high-impact or high-interference locations. The operating temperature range runs from -30 to 50°C (-22 to 122°F), so cold climates aren’t a concern either.

Optics and AI

The 1/1.8″ 8MP sensor records 4K at 30 FPS with a full image settings suite including HDR, 2DNR, 3DNR, masking, and text overlay. The field of view spans 109.9 degrees horizontal and 134.1 degrees diagonal, which is wide enough to cover most fixed camera positions without needing to zoom in on subjects. Built-in IR LED illumination handles night operation out to 30 meters (98 ft), and an IR cut filter switches automatically at dusk.
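For placement planning, the horizontal field of view translates directly into scene width at a given distance via w = 2 · d · tan(FOV/2). A quick sketch using the G6 Turret's quoted 109.9° horizontal FOV (the distances are illustrative):

```python
import math

# Horizontal scene width covered at a given distance, from the camera's
# 109.9-degree horizontal field of view: w = 2 * d * tan(FOV / 2).
def coverage_width(distance_m: float, fov_deg: float = 109.9) -> float:
    return 2 * distance_m * math.tan(math.radians(fov_deg / 2))

for d in (3, 5, 10):
    print(f"at {d} m: {coverage_width(d):.1f} m wide")
```

At 5 meters the camera spans a scene roughly 14 meters wide, which is why a single unit can cover a retail counter or entry lane without zooming.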

ubiquiti g6 turret night vision view

On-device AI runs via the quad-core Arm Cortex-A53 and the Multi-TOPS AI Engine. Face and license plate recognition process locally at the camera level rather than waiting on the NVR, which keeps alert latency low and reduces load on the host recorder. Smart detection monitors people, vehicles, and animals and works with Protect’s configurable zone masking to deliver targeted alerts.

in software face detection view of ubiquiti g6 turret

The fixed focal-length lens consistently covers the full field of view without barrel distortion, so identification accuracy remains high. Physical three-axis adjustment handles positioning, and once you tighten the ball joint, the framing holds reliably.

Management and Installation

The G6 Turret operates on standard PoE with a maximum power draw of 12.5W, which stays within the 15.4W limit of 802.3af, eliminating the need for PoE+ switching. Even so, you should account for the draw when budgeting power across a dense switch. The included 30 cm (1 ft) pigtail features an RJ45 connector with a thermoplastic elastomer jacket that seals the connection cleanly at the camera body. Protect detects the camera immediately upon first power-up and then guides you through setup.
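Budgeting that draw across a dense switch is straightforward multiplication. A minimal sketch, assuming worst-case per-camera draw and an illustrative switch budget (the 195W figure is hypothetical, not a specific product's rating):

```python
# Hypothetical PoE budget check: whether a set of cameras fits a switch's
# total PoE power budget, using the G6 Turret's 12.5W maximum draw.
def fits_budget(camera_watts: float, count: int, switch_budget_w: float) -> bool:
    return camera_watts * count <= switch_budget_w

# e.g. a 24-port switch with a 195W PoE budget (illustrative figure):
cameras = 16
print(f"{cameras} cameras draw {12.5 * cameras:.0f}W ->",
      "fits" if fits_budget(12.5, cameras, 195) else "over budget")
```

Sixteen cameras at worst-case draw already exceed a 195W budget, so sizing against the maximum rather than typical draw avoids surprises when every port lights up at once.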

ubiquiti unifi protect software device view of g6 turret

Once you adopt the camera, the UniFi Protect dashboard provides centralized management for connection status, image tuning, and recording settings. Adjusting the framing is straightforward and takes about a minute. Loosen the collar, rotate the lens, and retighten, which is considerably faster than repositioning a fixed dome’s backplate.

alternate rear view of camera adjustment for ubiquiti g6 turret

Before deploying the G6 Turret, a few practical details are worth noting. First, Protect lets you configure motion and smart detection masks independently, so you can exclude footpaths or busy roads from triggering alerts without turning off detection across the entire frame. Second, the G6 Turret has no MicroSD slot and therefore won’t record locally if the NVR connection drops. Finally, ceiling and wall mounts come in the box, and arm, pendant, and junction box mounts are available separately for non-standard implementations.

Conclusion

Overall, the G6 Turret delivers 4K at 30 FPS, on-device face and license plate recognition, and 30-meter IR, all for $199. The three-axis manual adjustment is a genuine practical advantage in the field, especially on wall mounts, where fixed cameras require more bracket work. Additionally, on-device AI processing via the Multi-TOPS engine keeps detection fast and reduces NVR load.

Ubiquiti G6 Turret Side view

That said, the IK04 rating and the absence of local storage are worth confirming against your deployment requirements before you commit. For controlled commercial spaces, retail, and offices, those limitations rarely matter. Overall, the G6 Turret is a well-specified camera that integrates cleanly into any UniFi Protect system.

Product Page – G6 Turret

The post Ubiquiti UniFi G6 Turret Review: 4K PoE Camera with On-Device AI for $199 appeared first on StorageReview.com.

Supermicro JumpStart Review: H14 with AMD Instinct MI350X

13 April 2026 at 19:50

Supermicro’s JumpStart program has established itself as one of the more useful tools in the pre-purchase evaluation toolkit for AI infrastructure. Rather than a scripted demo in a shared environment, JumpStart gives qualified users free, time-boxed, bare-metal access to real production servers via SSH, IPMI, and VNC, enabling them to run workloads on actual hardware. We covered the program in depth last November using an X14 system with an NVIDIA HGX B200, and came away with a clear picture of what a week of focused access can and cannot tell you. This time, Supermicro provided access to an H14 8U system with a very different accelerator story.

We tested the AS-8126GS-TNMR system, an 8U air-cooled platform built around dual AMD EPYC 9575F processors and eight AMD Instinct MI350X GPUs. The MI350X is AMD's current flagship data center accelerator, built on the 4th Gen CDNA architecture at TSMC's 3nm node and featuring 288GB of HBM3e per GPU. Across eight GPUs interconnected via AMD Infinity Fabric, with 1,024 GB/s of aggregate fabric bandwidth, the server offers 2.3TB of total GPU memory in a single node. The full system uses six 5,250W Titanium-level power supplies in a 3+3 redundant configuration, and Supermicro has provisioned dedicated 400 Gbps networking per GPU for scale-out deployments.

GPU A+ Server AS-8126GS-TNMR Front.

AMD’s position in the data center GPU market has shifted meaningfully in the past two years, and the MI350X generation represents a more serious competitive challenge to NVIDIA than any prior Instinct product. ROCm 7, released in September 2025 and now at version 7.2, brought native MI350X support alongside dramatically improved inference performance, HIP API updates that close the CUDA compatibility gap, and broadened framework support, including PyTorch, JAX, TensorFlow, ONNX Runtime, vLLM, and SGLang.

The vLLM project added a dedicated AMD ROCm CI pipeline in late December 2025, making AMD hardware a first-class platform in that inference stack rather than a downstream port. The ecosystem’s adoption is also hard to ignore: AMD and Meta announced a multi-year, multi-generation 6-gigawatt GPU deployment agreement in February 2026, building on Meta’s existing production deployments of MI300 and MI350 series hardware. That level of commitment from one of the world’s largest AI infrastructure operators is no marketing footnote.

For organizations currently evaluating AI accelerator infrastructure, the lead time for NVIDIA hardware remains a concern. The question is whether AMD is a credible alternative rather than a fallback. Based on a week of testing with ROCm 7.2.0 and vLLM 0.18, the answer is meaningfully different from what it was 18 months ago.

GPU A+ Server AS-8126GS-TNMR side profile

Our testing covered a selection of popular models; the 2.3TB of HBM3e across a single node enabled single-server inference on large-parameter models, including Moonshot’s Kimi K2.5 and MiniMax M2.5.

AMD Instinct MI350X: Architecture and Generational Improvements

The MI350X represents AMD’s most architecturally ambitious generational leap in the Instinct product line to date. Understanding the engineering decisions behind it provides important context for interpreting the subsequent performance results.

CDNA 4 Architecture and Process Node Transition

The foundational shift from the MI300 series to the MI350 series centers on adopting TSMC’s N3P process node for the Accelerator Compute Chiplets (XCDs), moving from the 5nm fabrication used in the prior generation. The total transistor count reaches approximately 185 billion, a roughly 21% increase over the MI300 generation, achieved without a corresponding increase in power consumption.

The MI350X retains AMD’s proven multi-chiplet packaging strategy. At its core, the GPU package features eight Accelerator Compute Chiplets (XCDs) as the primary computational engines. Each XCD houses four shader engines, each with eight active CDNA 4 compute units, yielding 32 CUs per XCD and a total of 256 CUs for the full accelerator.

The I/O Die layer was also consolidated from four tiles to two in the CDNA 4 package design. This reorganization enabled AMD to double the Infinity Fabric bus width, improving bisection bandwidth while lowering the bus frequency and operating voltage to reduce power consumption.

Redesigned Compute Units and Expanded Precision Support

The CDNA 4 compute unit's matrix math capabilities receive a substantial boost: MI350 CUs deliver 2x the throughput per CU for 16-bit (BF16, FP16) and 8-bit (FP8, INT8) operations compared to their MI300 counterparts.

Beyond raw throughput gains, CDNA 4 introduces hardware support for lower-precision data types absent from the MI300 series, specifically FP6 and FP4, alongside the existing FP8 support carried forward from the prior generation.

In addition to these standard formats, the MI350X adds native hardware support for the OCP microscaling variants: MXFP4, MXFP6, and MXFP8. Microscaling formats are designed to deliver the throughput advantages of lower-precision compute while maintaining output quality closer to higher-precision baselines than standard quantization typically allows. This is not an AMD-specific development. NVIDIA’s NVFP4 format operates on the same microscaling principles and has seen broad adoption across frontier model deployments, with the GPT-OSS family from OpenAI as one of the most prominent examples built around these formats. The MI350X’s native MXFP4 support allows it to serve these and similar quantized model families without falling back to software emulation or precision promotion.

The MI350X delivers 9.2 PFLOPs at MXFP4 and MXFP6, compared with 4.6 PFLOPs at OCP-FP8, with FP16 at 2.3 PFLOPs and a peak engine clock of 2,200 MHz. For inference-optimized deployments where microscaling quantization is viable, the compute headroom effectively doubles relative to FP8 workloads. A new vector ALU has also been added to the CDNA 4 compute unit, supporting 2-bit operations and capable of accumulating BF16 results into FP32, providing additional flexibility for low-precision vector workloads outside the primary matrix compute path.
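
The per-precision peaks quoted above follow a clean halving pattern; a quick arithmetic check, using only the PFLOPs figures cited in this paragraph:

```python
# Peak dense matrix throughput for one MI350X, in PFLOPs, per the figures above.
PEAK_PFLOPS = {"FP16": 2.3, "FP8": 4.6, "MXFP4/MXFP6": 9.2}

def speedup(lo: str, hi: str) -> float:
    """Throughput multiple gained by moving from one precision to a narrower one."""
    return PEAK_PFLOPS[hi] / PEAK_PFLOPS[lo]

# Each halving of operand width doubles peak matrix throughput.
print(speedup("FP16", "FP8"))         # 2.0
print(speedup("FP8", "MXFP4/MXFP6"))  # 2.0
```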

Memory Subsystem: HBM3e, Infinity Cache, and Bandwidth Efficiency

The MI350 series features a substantially upgraded memory subsystem with eight HBM3e memory stacks, providing a total capacity of 288GB per GPU. Each 36GB stack, composed of 12-high 24Gbit devices, operates at the full HBM3e pin speed of 8Gbps per pin. The architecture retains AMD’s Infinity Cache, a memory-side cache positioned between the HBM and the Infinity Fabric/L2 caches. It comprises 128 channels, each backed by 2 MB of cache, for a total of 256 MB per GPU. AMD has widened the on-die network buses within the IODs and operates them at a reduced voltage, enabling approximately 1.3x higher memory bandwidth per watt compared to the MI300 series.

The increase in memory capacity from the MI300X’s 192GB to 288GB extends AMD’s lead in per-GPU memory headroom, with direct implications for large-model inference. Each MI350X GPU can independently host models with more than 500 billion parameters. Across an eight-GPU server, the aggregate 2.3TB of HBM3e eliminates the multi-node distribution requirements that complicate the largest model deployments, as the Kimi K2.5 and MiniMax M2.5 results in this review demonstrate.
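
As a rough illustration of that headroom, weight footprint scales linearly with parameter count and bytes per parameter. The sketch below ignores KV cache, activations, and runtime overhead; the capacity figures come from this review, the rest is generic estimation:

```python
BYTES_PER_PARAM = {"BF16": 2.0, "FP8": 1.0, "FP4/INT4": 0.5}

HBM_PER_GPU_GB = 288                   # MI350X
HBM_PER_NODE_GB = 8 * HBM_PER_GPU_GB   # 2,304 GB across the 8-GPU H14

def weights_gb(params_b: float, precision: str) -> float:
    """Approximate weight footprint in GB for a model of params_b billion parameters."""
    return params_b * BYTES_PER_PARAM[precision]  # 1e9 params * bytes / 1e9 = GB

# A 500B-parameter model quantized to 4-bit fits on a single GPU...
print(weights_gb(500, "FP4/INT4"), "GB vs", HBM_PER_GPU_GB, "GB per GPU")
# ...and a trillion-parameter model at INT4 fits comfortably inside one node.
print(weights_gb(1000, "FP4/INT4"), "GB vs", HBM_PER_NODE_GB, "GB per node")
```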

Flexible Partitioning and Deployment Architecture

The MI350 series supports flexible GPU partitioning per socket, with memory split into two separate clusters. This flexibility also applies to the XCDs, where the quad XCD cluster can be split into dual or single blocks, enabling the chip to support configurations such as 8 instances of 70B models in CPX+NPS2. For organizations running heterogeneous inference workloads across shared infrastructure, this partitioning capability reduces the need for dedicated hardware per model tier and improves utilization economics across mixed deployment environments.

The MI350 series also maintains drop-in compatibility with the UBB (Universal Base Board) infrastructure used in MI300 Series systems. Existing server chassis, power delivery, and cooling infrastructure carry forward without modification, reducing upgrade friction for organizations with active MI300 deployments.

MI355X: The Liquid-Cooled Sibling

The MI350 series ships in two variants built on identical underlying silicon and optimized for different thermal operating envelopes. The MI350X tested here is the air-cooled variant, while the MI355X is its liquid-cooled counterpart, designed for higher-density deployments where direct liquid-cooling infrastructure is available.

While both variants are built on the same fundamental hardware, the MI355X’s higher operational power envelope enables higher sustained clock frequencies, resulting in an approximate 20% performance advantage in real-world, end-to-end workloads compared to the MI350X. The MI355X carries a TBP ceiling of 1,400W versus the MI350X’s 1,000W, with clock speed topping out at 2.4 GHz compared to 2.2 GHz on the air-cooled variant.

In generational terms, the MI355X platform delivers up to 4x peak theoretical performance improvement over the MI300X, with real-world inference gains of approximately 4.2x in agentic and chatbot workloads and about 3x in content generation scenarios. For organizations evaluating MI350X deployments, the 20% performance differential between the two variants represents a clear ceiling. Facilities with DLC infrastructure should evaluate the MI355X to determine whether the thermal investment yields sufficient throughput uplift for their specific workload profile before committing to air-cooled configurations at scale.

Accessing the AMD Instinct MI350X via Supermicro JumpStart Program

Getting started with JumpStart requires registering on Supermicro’s portal, where qualified users can browse available systems and schedule a reservation window. Once approved, the portal provides SSH credentials, IPMI access, and a web-based remote console for the duration of the booking. The system arrives preinstalled with Ubuntu and ready to use. There is no provisioning delay and no support interaction required to get started. Our reservation ran from March 23 through March 27, 2026, giving us a full week on the platform, consistent with our prior JumpStart engagement on the HGX B200.

The screenshot below shows our JumpStart terminal output from the H14 system, with the amd-smi tool displaying the eight AMD Instinct MI350X GPUs and their running software versions.

AMD Instinct MI350X Performance Testing Results

System Configuration

  • Chassis: Supermicro H14
  • CPU: Dual AMD EPYC 9575F
  • Memory: 3TB DDR5
  • GPU: eight AMD Instinct MI350X
  • Storage: 2x 3.8TB PCIe 4.0 M.2 NVMe SSDs and 1x 1.92TB M.2 NVMe SSD

Summary of Results

Model | Precision | Equal (256/256) | Prefill-Heavy (8k/1k) | Decode-Heavy (1k/8k)
GPT-OSS 20B | NVFP4 | 62,247 | 123,714 | 32,468
GPT-OSS 120B | NVFP4 | 33,538 | 84,018 | 20,602
Llama 3.1 8B Instruct | BF16 | 51,467 | 77,658* | 19,326
Mistral Small 3.1 24B | FP8 | 40,742 | 56,093 | 14,557
Mistral Small 3.1 24B | BF16 | 30,530 | 53,740 | 13,559
Qwen3 Coder 30B A3B | BF16 | 34,980 | 51,550 | 11,782
Qwen3 Coder 30B A3B | FP8 | 25,928 | 47,179 | 11,014
MiniMax M2.5 | Block-Scaled FP8 | 14,391 | 23,689 | 6,068
Kimi K2.5 | INT4 QAT + BF16 | 6,527 | 11,256 | 2,513

All values in tok/s, peak throughput at BS=256. *Llama 3.1 8B prefill-heavy peaked at BS=128 (77,658 tok/s); BS=256 was 76,893 tok/s.

Claude Code Serving – MiniMax M2.5

Beyond traditional raw LLM inference benchmarks, we wanted to evaluate how well this hardware performs in an agentic coding workflow, specifically serving multiple concurrent Claude Code sessions with a locally hosted model. This use case maps directly to development team productivity: how many engineers can simultaneously use an AI coding assistant served from a single node before the experience degrades?

To test this, we built a benchmark harness that generates a dataset of moderately difficult coding problems (tasks like implementing an LRU cache, building a CLI todo application, writing a markdown converter, and constructing a REST API) and runs each Claude Code session in its own Docker container against the local vLLM server. A transparent proxy sits between the sessions and the inference endpoint, capturing per-request metrics for each Claude Code instance. The model used was MiniMax M2.5, served via vLLM on the eight MI350X GPUs. While not the top-ranked coding model on public leaderboards, M2.5 is a capable model that many users run locally, including many of our developer friends.
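
The aggregation step behind those metrics is straightforward; a minimal sketch deriving per-session and aggregate output rates from a per-request log (the record layout here is hypothetical, not the harness's actual schema):

```python
from dataclasses import dataclass

@dataclass
class RequestRecord:
    session_id: str
    output_tokens: int
    duration_s: float   # generation time for this request

def per_session_rate(log: list[RequestRecord]) -> dict[str, float]:
    """Average output tok/s per session, over that session's generation time."""
    toks, secs = {}, {}
    for r in log:
        toks[r.session_id] = toks.get(r.session_id, 0) + r.output_tokens
        secs[r.session_id] = secs.get(r.session_id, 0.0) + r.duration_s
    return {s: toks[s] / secs[s] for s in toks}

def aggregate_rate(log: list[RequestRecord], wall_clock_s: float) -> float:
    """Total output tok/s the server produced over the whole benchmark window."""
    return sum(r.output_tokens for r in log) / wall_clock_s

log = [RequestRecord("dev-1", 800, 20.0), RequestRecord("dev-2", 600, 20.0)]
print(per_session_rate(log))      # {'dev-1': 40.0, 'dev-2': 30.0}
print(aggregate_rate(log, 20.0))  # 70.0
```

Computing the aggregate over wall-clock time, rather than over summed generation time, is why aggregate output can sit below sessions × per-session rate when agentic sessions pause between requests.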

For a baseline reference point, we use Anthropic’s Claude Opus 4.6 average output throughput via OpenRouter.ai, one of the most popular routing services for production API access. That baseline comes in at approximately 37 tokens per second per API request.

We measured two key metrics: the average output tokens per second per Claude Code session (what each developer experiences) and the aggregate output tokens per second across all sessions (the total work the server is producing).

Looking at the results, a single concurrent session delivers 38.8 tok/s per user and 38 tok/s aggregate, slightly above the OpenRouter cloud baseline. At two sessions, the system edges up to 39.5 tok/s per user as vLLM’s batching begins to amortize overhead, with aggregate throughput climbing to 63 tok/s. Four concurrent sessions hold at 37.3 tok/s per user, matching the cloud baseline while serving four developers simultaneously, with aggregate throughput reaching 128 tok/s. From eight sessions onward, per-session throughput begins to decline: 34.6 tok/s per user at eight sessions, 31.4 tok/s at sixteen with an aggregate of 190 tok/s, settling around 23 tok/s per user at 32 and 64 sessions while aggregate throughput climbs to 578 tok/s and 986 tok/s, respectively. This is the classic trade-off between batch throughput and interactivity: the system achieves significantly higher total throughput by batching more requests, but each user sees slower responses. Even at 64 concurrent users, each developer still gets a usable interactive experience, though noticeably slower than the cloud baseline.
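
Expressed against the roughly 37 tok/s OpenRouter baseline, the per-user rates above translate into simple retention ratios; a small sketch using only the numbers reported in this section:

```python
BASELINE = 37.0  # OpenRouter Claude Opus 4.6 average, tok/s per API request
per_user = {1: 38.8, 2: 39.5, 4: 37.3, 8: 34.6, 16: 31.4, 32: 23.0, 64: 23.0}

# Fraction of the cloud-baseline speed each developer sees at each concurrency level.
retention = {n: round(r / BASELINE, 2) for n, r in per_user.items()}
print(retention)  # 16 sessions retain ~85% of baseline speed, 64 sessions ~62%
```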

For organizations weighing the cost of dozens of simultaneous commercial API subscriptions against self-hosted infrastructure, the tradeoff is clear: a single MI350X node can serve a development team of 16 to 32 engineers, maintaining per-user response speeds within 60-85% of the cloud baseline while delivering aggregate output of 600 to 1,000 tok/s, with added benefits of data locality, no per-token API charges, and full control over model selection.

vLLM Online Serving – LLM Inference Performance

vLLM is one of the most popular high-throughput inference and serving engines for LLMs. The vLLM online serving benchmark evaluates the real-world serving performance of this inference engine under concurrent requests. It simulates production workloads by sending requests to a running vLLM server, with configurable parameters such as request rate, input/output lengths, and the number of concurrent clients. The benchmark measures key metrics, including throughput (tokens per second), time to first token, and time per output token (TPOT), helping users understand how vLLM performs under different load conditions.
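
The latency metrics the benchmark reports can be stated precisely; a minimal sketch of generic TTFT and TPOT definitions computed from a request's token arrival timestamps (illustrative definitions, not vLLM's internal implementation):

```python
def ttft(request_start: float, token_times: list[float]) -> float:
    """Time to first token: arrival of the first output token after the request was sent."""
    return token_times[0] - request_start

def tpot(token_times: list[float]) -> float:
    """Time per output token: mean inter-token gap after the first token."""
    if len(token_times) < 2:
        return 0.0
    return (token_times[-1] - token_times[0]) / (len(token_times) - 1)

# A request sent at t=0.0 whose first token lands at 0.5s, then one every 25 ms:
times = [0.5 + 0.025 * i for i in range(101)]
print(ttft(0.0, times))       # 0.5
print(round(tpot(times), 3))  # 0.025
```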

We tested inference performance across a comprehensive suite of models spanning various architectures, parameter scales, and quantization strategies to evaluate throughput under different concurrency profiles.

GPT-OSS 120B and 20B

The GPT-OSS model family was tested in both 120B and 20B configurations on the Supermicro H14.

GPT-OSS 120B

The 120B model under an equal workload (256/256) delivers 313.42 tok/s at BS=1, reaches 11,261.72 tok/s at BS=64, and peaks at 33,538.23 tok/s at BS=256. Prefill-heavy (8k/1k) starts at 1,724.84 tok/s, climbs to 36,156.80 tok/s at BS=32 and 79,247.76 tok/s at BS=128, peaking at 84,018.79 tok/s at BS=256. Decode-heavy (1k/8k) grows from 288.90 tok/s at BS=1 to 20,602.52 tok/s at BS=256, with latency remaining well-controlled at lower concurrency levels.

GPT-OSS 20B

The 20B model delivers 485.17 tok/s at BS=1 under the equal workload, reaching 17,986.36 tok/s at BS=64 and peaking at 62,247.52 tok/s at BS=256. Prefill-heavy starts at 3,120.72 tok/s, climbs to 48,132.52 tok/s at BS=32 and 83,968.71 tok/s at BS=64, peaking at 123,714.50 tok/s at BS=256—the highest absolute prefill throughput recorded across both model sizes. Decode-heavy grows from 378.20 tok/s at BS=1 to 32,468.67 tok/s at BS=256, delivering roughly 1.6× the decode throughput of the 120B at peak concurrency while maintaining tighter latency characteristics throughout.

Qwen3 Coder 30B A3B Instruct and FP8 Instruct

The Qwen3-Coder-30B-A3B-Instruct on the Supermicro H14 was tested at both standard (BF16) and FP8 precisions.

Qwen3-Coder-30B-A3B-Instruct (BF16)

At BF16, the equal workload (256/256) delivers 240.53 tok/s at BS=1, reaching 13,312.70 tok/s at BS=64 and 21,333.79 tok/s at BS=128, with peak throughput of 34,980.97 tok/s at BS=256. Prefill-heavy (8k/1k) starts at 1,276.76 tok/s, climbs to 25,069.32 tok/s at BS=32 and 50,198.94 tok/s at BS=128, peaking at 51,550.66 tok/s at BS=256. Decode-heavy (1k/8k) grows steadily from approximately 188 tok/s at BS=1 to 11,782 tok/s at BS=256, maintaining the tightest latency profile of the three scenarios.

Qwen3-Coder-30B-A3B-Instruct (FP8)

The FP8 variant delivers 188.92 tok/s at BS=1 under the equal workload, reaching 10,866.27 tok/s at BS=64 and 17,617.60 tok/s at BS=128, peaking at 25,928.77 tok/s at BS=256—running slightly behind BF16 results across the full range. Prefill-heavy starts at 860.07 tok/s, climbs to 20,513.77 tok/s at BS=32 and 44,205.46 tok/s at BS=128, peaking at 47,179.15 tok/s at BS=256. Decode-heavy grows from 133.79 tok/s at BS=1 to 11,014.95 tok/s at BS=256, scaling consistently and remaining close to BF16 throughout.

Mistral Small 3.1 24B Instruct 2503

The Mistral-Small-3.1-24B-Instruct-2503 on the H14 was tested with both standard and FP8-dynamic precision, showing consistent scaling across all three workload profiles.

Mistral-Small-3.1-24B-Instruct-2503 (BF16)

With BF16 precision, the equal workload (256/256) delivers 236.15 tok/s at BS=1, reaching 15,494.56 tok/s at BS=64, 24,216.52 tok/s at BS=128, and peaking at 30,530.54 tok/s at BS=256. Prefill-heavy (8k/1k) starts at 1,429.41 tok/s, climbs to 29,631.68 tok/s at BS=32, peaks at 54,871.74 tok/s at BS=128, and eases to 53,740.04 tok/s at BS=256. Decode-heavy (1k/8k) grows from 242.66 tok/s at BS=1 to 13,559.19 tok/s at BS=256, scaling steadily across the full range.

Mistral-Small-3.1-24B-Instruct-2503 (FP8-dynamic)

The FP8-dynamic variant delivers 184.25 tok/s at BS=1 under the equal workload, reaching 16,113.95 tok/s at BS=64 and 26,409.01 tok/s at BS=128, peaking at 40,742.04 tok/s at BS=256. Prefill-heavy starts at 1,210.06 tok/s, climbs to 28,773.52 tok/s at BS=32, peaks at 57,765.02 tok/s at BS=128, and settles at 56,093.09 tok/s at BS=256, leading the standard-precision result from BS=64 onward. Decode-heavy grows from 183.94 tok/s at BS=1 to 14,557.94 tok/s at BS=256, tracking closely through mid-range before pulling slightly ahead at BS=128 and BS=256.

Llama 3.1 8B Instruct

For Llama-3.1-8B-Instruct, the equal workload (256/256) delivers 373.26 tok/s at BS=1, reaching 19,363.33 tok/s at BS=64, 34,155.70 tok/s at BS=128, and peaking at 51,467.30 tok/s at BS=256. Prefill-heavy (8k/1k) starts at 1,959.04 tok/s, climbs to 37,227.63 tok/s at BS=32 and 60,062.40 tok/s at BS=64, peaking at 77,658.50 tok/s at BS=128 before tailing off slightly to 76,893.77 tok/s at BS=256. Decode-heavy (1k/8k) starts at 326.48 tok/s, reaching 17,877.52 tok/s at BS=128 and 19,326.35 tok/s at BS=256, maintaining lower per-token latency further into the concurrency range than any of the larger models tested.

MiniMax M2.5

The MiniMax-M2.5 on the H14 rounds out the model lineup, sitting between the Kimi K2.5 and the mid-sized models in terms of throughput profile, with characteristics that reflect its mixture-of-experts architecture. The equal workload (256/256) delivers 79.31 tok/s at BS=1, reaching 5,029.76 tok/s at BS=64, 7,801.10 tok/s at BS=128, and 14,391.98 tok/s at BS=256. Prefill-heavy (8k/1k) shows the strongest scaling of the three scenarios, starting at 424.41 tok/s and climbing to 10,376.75 tok/s at BS=32 and 20,658.57 tok/s at BS=128, peaking at 23,689.18 tok/s at BS=256. Decode-heavy (1k/8k) scales steadily to 4,257.68 tok/s at BS=128 and 6,068.70 tok/s at BS=256, offering the most consistent latency growth across the full concurrency range.

Kimi K2.5

The Kimi K2.5 1-trillion-parameter model on the H14 is the largest and most capable model tested in this review, and its throughput reflects that weight.

The equal workload (256/256) delivers 72.06 tok/s at BS=1, reaching 2,693.07 tok/s at BS=64, 4,244.27 tok/s at BS=128, and peaking at 6,527.62 tok/s at BS=256. Prefill-heavy (8k/1k) scales more aggressively, starting at 185.29 tok/s and reaching 3,798.85 tok/s at BS=32 and 9,153.12 tok/s at BS=128, with peak throughput of 11,256.69 tok/s at BS=256. The step increase from BS=128 to BS=256 carries a significant latency cost, indicating the system is approaching its memory and compute limits at full batch depth for this model size. Decode-heavy (1k/8k) grows from 29.88 tok/s at BS=1 to 2,513.85 tok/s at BS=256, delivering the tightest scaling curve of the three scenarios while demonstrating consistent throughput gains across the full range.

Conclusion

The AMD Instinct MI350X delivers very competitive inference performance across the workload profiles tested here, and the Supermicro AS-8126GS-TNMR provides a well-engineered platform to take advantage of it. With 288GB of HBM3e per accelerator and eight GPUs interconnected over Infinity Fabric, the 2.3TB of aggregate GPU memory available in a single node is sufficient to serve models up to the trillion-parameter class, including Kimi K2.5 and MiniMax M2.5, without requiring multi-node distribution or model-partitioning workarounds. This capability materially simplifies deployment architecture for large-scale inference.

Smaller models also delivered strong results. Llama 3.1 8B exceeded 77,000 tok/s under prefill-heavy workloads, and mid-range architectures such as Mistral Small 3.1 24B and Qwen3 Coder 30B sustained high throughput with well-controlled latency across the concurrency range. Across the board, the results indicate a hardware platform that scales predictably under load rather than falling off a cliff at higher batch depths.

GPU A+ Server AS-8126GS-TNMR rear

ROCm 7.2 brings significant improvements to the AMD inference software stack, particularly when paired with vLLM 0.18. This pairing delivers a noticeably more stable and higher-performing serving experience than prior ROCm generations, with broader framework support and fewer of the rough edges that characterized earlier Instinct deployments. The ecosystem momentum around AMD hardware is also worth noting: upstream vLLM now maintains a dedicated AMD ROCm CI pipeline, and Meta’s multi-generation deployment commitment at the 6-gigawatt scale reinforces that production validation extends well beyond controlled benchmarking environments.

The Claude Code serving evaluation adds a practical lens to the raw throughput numbers. A single MI350X node sustained near-cloud-baseline response speeds for up to 16 concurrent coding sessions and remained interactive with up to 64 simultaneous users while producing nearly 1,000 tok/s of aggregate output. For organizations weighing the cost of commercial API subscriptions against self-hosted infrastructure, the economics become straightforward at that density, with additional advantages in data locality, elimination of per-token costs, and unrestricted model selection.

Supermicro’s JumpStart program continues to earn its place in the infrastructure evaluation process. Bare-metal access to production hardware, with no provisioning overhead, allowed us to run real workloads under real-world conditions throughout the full test window. For teams conducting accelerator procurement evaluations, this level of hands-on access remains far more informative than spec sheet comparisons or curated vendor demonstrations.

Supermicro JumpStart Program

Product Page – GPU A+ Server AS-8126GS-TNMR

The post Supermicro JumpStart Review: H14 with AMD Instinct MI350X appeared first on StorageReview.com.

Scale Computing and Nexsan Address Asymmetric Growth in HCI Environments

8 April 2026 at 16:32
Scale computing hypervisor graphic

While hyperconverged infrastructure (HCI) has simplified virtualization via streamlined deployments and reduced operational overhead, traditional architectures often struggle with asymmetric scaling. This is particularly evident when storage requirements for large unstructured datasets outpace compute needs, forcing IT teams into inefficient and costly node expansions.

scale computing and nexsan logos

To address this imbalance, Scale Computing and Nexsan have introduced a joint architecture that integrates the SC//HyperCore virtualization suite with enterprise-grade external storage. This combined solution allows organizations to decouple storage growth from compute resources, providing a scalable and cost-effective model for capacity-intensive workloads like video retention, backup repositories, and long-term archives.

Addressing Real-World Infrastructure Constraints

Many IT teams are modernizing their infrastructure while still managing legacy storage investments and growing volumes of unstructured data. Requirements such as long-term video retention, secure backup strategies, and preservation of existing SAN and NAS assets create architectural friction. Traditional approaches often force a tradeoff between adopting fully integrated HCI stacks or continuing with less efficient legacy systems.

The combined Scale Computing and Nexsan approach avoids this binary decision. It enables organizations to retain the simplicity of HCI for core workloads while extending storage capacity through external systems that scale independently.

Architecture Overview

SC//HyperCore provides a tightly integrated virtualization platform with built-in high availability and simplified lifecycle management. It is designed to minimize administrative overhead, particularly in edge and remote deployments.

Scale computing hypervisor graphic

Nexsan complements this with a portfolio of external storage platforms that support block, file, and object protocols. These systems are designed for capacity scaling, long-term retention, and data protection. Together, the platforms enable a hybrid model in which performance-sensitive workloads remain on-cluster while capacity-heavy datasets are offloaded to external storage.

This separation allows IT teams to align infrastructure decisions with actual workload characteristics rather than forcing all applications into a single scaling model.

Edge and Distributed Use Cases

The joint solution is particularly relevant in edge environments across sectors such as retail, healthcare, manufacturing, education, and government. These deployments often require local compute resources to ensure application performance while supporting centralized data strategies.

Nexsan e-series e60 image front facing

SC//HyperCore simplifies operations at remote sites with limited IT presence, while Nexsan platforms handle the associated data growth. This includes centralized archives, backup repositories, and long-term video storage. The result is an edge-to-core architecture that maintains edge simplicity without sacrificing enterprise storage capabilities.

Flexible Storage Integration

A key aspect of the joint approach is support for multiple storage access methods based on workload requirements. Organizations can deploy iSCSI for block-based virtual machine storage, NFS or SMB for file services, and S3-compatible object storage for modern data workflows.

This flexibility enables use cases such as immutable backups, lifecycle-managed archives, and centralized data repositories. It also supports edge-to-core data flows, in which applications run locally while large datasets are aggregated centrally.
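
For the immutable-backup workflow, an S3-compatible target with object locking is driven through the standard S3 API. A hedged sketch using boto3's put_object object-lock parameters; the endpoint, bucket, key, and 90-day retention window are placeholders, and Nexsan's own documentation governs platform-specific configuration:

```python
from datetime import datetime, timedelta, timezone

def immutable_put_kwargs(bucket: str, key: str, body: bytes, retain_days: int) -> dict:
    """Build put_object arguments for a compliance-locked (immutable) backup object."""
    return {
        "Bucket": bucket,
        "Key": key,
        "Body": body,
        # COMPLIANCE mode: retention cannot be shortened or removed until it expires.
        "ObjectLockMode": "COMPLIANCE",
        "ObjectLockRetainUntilDate": datetime.now(timezone.utc) + timedelta(days=retain_days),
    }

# Usage against an S3-compatible endpoint (placeholder URL and names):
# import boto3
# s3 = boto3.client("s3", endpoint_url="https://storage.example.internal")
# s3.put_object(**immutable_put_kwargs("backups", "db/2026-04-08.dump", data, retain_days=90))
```

Object locking must also be enabled on the bucket at creation time for these parameters to take effect.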

Security and Data Protection Considerations

Infrastructure decisions increasingly prioritize cyber resilience alongside performance and capacity. Nexsan platforms incorporate features such as immutable snapshots, object locking, replication, and encryption. These capabilities support secure backup, compliance retention, and rapid recovery workflows.

The Unity NV-Series targets mixed workloads with an emphasis on ransomware resilience, while the E-Series P focuses on dense, high-capacity block storage scenarios such as surveillance. These design points align with environments where data protection and recoverability are critical operational requirements.

Use Cases

The joint solution is best suited for environments with uneven growth patterns and a need for operational simplicity. Common use cases include video surveillance retention, backup and disaster recovery repositories, centralized file services, and long-term archival storage.

It also aligns well with organizations that are modernizing their virtualization while preserving existing storage investments. For channel partners and managed service providers, the architecture supports repeatable solution design that can be tailored to specific vertical requirements.

By separating compute and storage scaling while maintaining a unified operational model, Scale Computing and Nexsan provide a pragmatic approach to modern infrastructure design that reflects how enterprise workloads and data actually grow.

The post Scale Computing and Nexsan Address Asymmetric Growth in HCI Environments appeared first on StorageReview.com.

Dell PowerEdge R770AP Review: Dell’s Purpose-Built Answer for Latency-Sensitive Workloads

8 April 2026 at 13:51

The Dell PowerEdge R770AP is not a general-purpose server, and that is entirely the point. Where most 2U dual-socket platforms chase flexibility, the R770AP strips it away, trading GPU support, mixed storage options, and raw memory capacity for the highest core density, memory bandwidth, and execution determinism available in Dell’s current Intel lineup. It is a server built around a specific processor architecture for a particular class of workloads, and it makes no excuses for what it omits.

Dell PowerEdge R770AP Front with Bezel

To understand why it exists, start with the platform it sits alongside. Dell’s PowerEdge R7x0 line has historically been the company’s most versatile 2U Intel server, with the AMD-powered PowerEdge R7725 filling an equivalent role on the EPYC side. The PowerEdge R770 carries that Intel tradition forward with support for both Xeon 6 P-core and E-core processors, GPU accelerators, mixed SAS/SATA/NVMe storage, up to 8 TB of memory across 32 DIMM slots, and enough PCIe Gen5 expansion to cover everything from virtualization to AI inference.

The PowerEdge R770AP is not that server.

The “AP” designation stands for Advanced Performance, but the name undersells how different this machine is. While the R770 uses Intel’s Granite Rapids-SP silicon on the LGA 4710 socket with 8 memory channels and up to 86 P-cores, the R770AP moves to the Granite Rapids-AP platform on the LGA 7529 socket, delivering up to 128 P-cores per socket (120 cores in our test configuration) and 12 DDR5 memory channels. This is the same distinction Intel draws across its entire Xeon 6 6900-series strategy: the 6900P parts on the AP platform represent Intel’s highest-performance server silicon, purpose-built for workloads where per-core performance, memory bandwidth, and execution determinism matter more than overall server configuration flexibility.

Dell PowerEdge R770AP LGA 7529 socket

Intel’s new Granite Rapids-AP LGA 7529 Socket

Intel’s broader Xeon 6 architecture splits the data center into two lanes. E-core processors target density and power efficiency for cloud-native, scale-out workloads like microservices and content delivery. P-core processors target compute-intensive work where consistent per-thread performance is critical: HPC simulations, real-time analytics, large in-memory databases, and latency-sensitive financial compute. The 6900P series sits at the top of that P-core stack, pairing the highest available core counts with 12-channel memory bandwidth, up to 96 PCIe Gen5 lanes per socket, up to 6 UPI 2.0 links, and L3 cache pools that reach 504MB on top SKUs like the Intel Xeon 6978P. The architectural goal is not just raw throughput but predictable throughput, minimizing scheduling jitter and memory-access variability that erode performance in timing-critical environments.

The R770AP is Dell’s chassis expression of that philosophy. It strips away everything the Granite Rapids-AP platform doesn’t need: GPU support is gone entirely, SAS and SATA storage options are removed in favor of NVMe-only configurations (up to 16x 2.5-inch Gen5 NVMe or up to 32x E3.S Gen5 NVMe, configuration dependent), memory capacity tops out at 3 TB across 24 DIMM slots (12 per socket, 1DPC for maximum per-channel speed), and PCIe expansion is trimmed to five Gen5 x16 slots plus dual OCP NIC 3.0. What remains is a 2U dual-socket platform optimized for compute density, memory bandwidth, and the deterministic behavior demanded by workloads such as high-frequency trading, real-time risk analysis, and massively parallel simulation.


Kevin holding the R770AP heatsink with the Intel Xeon 6900-series chip

Our review unit pairs two Intel Xeon 6978P processors, each with 120 P-cores running at a 2.1 GHz base and 3.2 GHz all-core turbo, with 3TB of DDR5-6400 memory across all 24 DIMM slots. Compared to the R770, which is equipped with dual Xeon 6787P processors (86 cores each, 8 memory channels, 2 TB DDR5), the R770AP offers 39.5% more cores and 50% more memory channels. The question is whether those architectural advantages translate into proportional real-world gains, and whether the platform trade-offs are worth it for the workloads Dell and Intel are targeting.
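Those uplift figures are straightforward to verify from the spec-sheet numbers quoted above:

```python
# Core counts and per-socket memory channels as configured in this review.
r770_cores, r770ap_cores = 2 * 86, 2 * 120      # dual 6787P vs dual 6978P
r770_channels, r770ap_channels = 8, 12          # DDR5 channels per socket

print(f"cores: {(r770ap_cores / r770_cores - 1) * 100:.1f}% more")                  # 39.5% more
print(f"memory channels: {(r770ap_channels / r770_channels - 1) * 100:.0f}% more")  # 50% more
```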

Dell PowerEdge R770AP Specifications

The table below highlights the physical and supported configuration specifications for the Dell PowerEdge R770AP Platform.

Specification Dell PowerEdge R770AP
Processor
Processor Two Intel® Xeon® 6 6900-series processors with P-Cores, up to 128 Cores each
Memory
DIMM Slots 24 DDR5 DIMM slots
Maximum Memory 3 TB
Memory Speed Up to 6400 MT/s
Memory Type Registered ECC DDR5 RDIMMs only
Storage
Storage Controllers (RAID) PERC H975i DC-MHS front (internal)
Internal Boot BOSS-N1 DC-MHS: HWRAID 1, 2x M.2 NVMe SSDs or USB
Front Drive Bays Up to 16x 2.5-inch G5 x4 NVMe SSD (max 245.76 TB)
Up to 16x 2.5-inch G5 x2 NVMe SSD (max 245.76 TB)
Up to 32x EDSFF E3.S Gen5 NVMe SSD (max 491.52 TB)
Rear Drive Bays N/A
Power
Power Supplies 1500 W Titanium, 100-120 LLAC or 200-240 HLAC, 240 VDC, hot swap redundant
1800 W Titanium, 200-240 HLAC, 240 VDC, hot swap redundant
2400 W Titanium, 100-120 LLAC or 200-240 HLAC, 240 VDC, hot swap redundant
3200 W Titanium, 200-220 HLAC or 220.1-240 HLAC, 240 VDC, hot swap redundant
3200 W Titanium, 277 Vac & HVDC, hot swap redundant*
Cooling & Fans
Cooling Options Air cooling
Fans Up to 6 hot swappable fans
Form Factor & Dimensions
Form Factor 2U rack server
Height 86.8 mm (3.42 inches)
Width 482 mm (19.0 inches)
Depth (with bezel) 802.40 mm (31.59 inches)
Depth (without bezel) 801.51 mm (31.56 inches)
Bezel Optional metal bezel
Networking & Expansion
OCP Network Options Up to two OCP NIC 3.0 cards
Slot 4: 1×8 or 1×16 Gen5 OCP 3.0
Slot 10: 1×16 Gen5 OCP 3.0
Embedded NIC 1 Gb dedicated BMC Ethernet port
PCIe Slots Up to 5 Gen5 PCIe slots (x16 connectors)
Slot 2: 1×16 Gen5, full height, half length
Slot 3: 1×16 Gen5, full height/low profile, half length
Slot 5: 1×16 Gen5, full height, half length
Slot 7: 1×16 Gen5, full height, half length
Slot 9: 1×16 Gen5, full height/low profile, half length
GPU Options N/A
Ports
Front Ports 1x USB 2.0 Type-C
Rear Ports 1x Dedicated BMC Ethernet port
2x USB 3.1 Type-A
1x VGA
Internal Ports 1x USB 3.1 Type-A
Management
Embedded Management iDRAC10, iDRAC Direct, iDRAC RESTful API with Redfish, RACADM CLI, iDRAC Service Module
Security
Security Features Cryptographically signed firmware, Data at Rest Encryption (SEDs with local or external key mgmt), Secure Boot, Secured Component Verification (hardware integrity check), Secure Erase, Silicon Root of Trust, System Lockdown (requires iDRAC10 Enterprise or Datacenter), TPM 2.0 FIPS/CC-TCG certified, Chassis Intrusion Detection
Operating Systems & Hypervisors
Supported OS / Hypervisors Canonical Ubuntu Server LTS, Red Hat Enterprise Linux, SUSE Linux Enterprise Server, VMware vSAN / VMware ESXi*, Microsoft Windows, Microsoft Windows Server, Microsoft Windows Server Datacenter

Design and Build

The Dell PowerEdge R770AP is a 2U rack server in Dell’s 17th Generation PowerEdge lineup, sharing the same design language as the R770 we reviewed. It measures 3.42 inches tall, 19.0 inches wide, and 31.59 inches deep, and the front bezel is optional. The front ear houses iDRAC Direct access, a USB 2.0 Type-C port, a power button, and a system ID button.

Dell PowerEdge R770AP front power button and I/O

Storage

The R770AP supports three storage configurations. Our review unit shipped with the first: up to 16x 2.5-inch Gen 5 x4 NVMe SSDs, with a maximum capacity of 245.76 TB. Also available are up to 16x 2.5-inch Gen 5 x2 NVMe SSDs, likewise capping at 245.76 TB, and up to 32x EDSFF E3.S Gen 5 NVMe SSDs, which scale up to 491.52 TB. In the 16-bay configurations, Dell divides the drives into two banks of eight on the left and right of the chassis, with the middle serving as an airflow intake.

Dell PowerEdge R770AP front storage bays

Looking further back into the chassis, the R770AP features a clean, direct NVMe cabling layout. The cables run straight from the storage backplane to the front edge of the motherboard, keeping signal paths short and the interior organized.

Dell PowerEdge R770AP backplane and NVMe direct cabling

Rear I/O and Networking

Two redundant 2400W PSUs anchor either end of the R770AP’s rear panel. The BOSS-N1 module handles boot duties, housing two 480GB M.2 NVMe drives for the OS.

For expansion, the server offers up to five Gen 5 PCIe slots across slots 2, 3, 5, 7, and 9, all using x16 connectors in full-height configurations. OCP 3.0 networking is handled by up to two cards: slot 4 supports x8 or x16 Gen 5, and slot 10 provides a dedicated x16 Gen 5 connection. Our unit shipped with a 200GbE OCP card alongside multiple 100GbE cards, leaving no shortage of network bandwidth.

Standard rear I/O includes a dedicated BMC Ethernet port, two USB 3.1 Type-A ports, and a VGA port.

Dell PowerEdge R770AP rear PCIe, storage, power, and I/O

A closer look at the BOSS-N1 module reveals the two 480GB boot drives side by side, both hot-swappable and quick to access and replace when needed.

Dell PowerEdge R770AP BOSS drives

With the top cover and airflow shrouds removed, the R770AP’s interior layout is clean and well organized. Six hot-swappable fans push air across the large heatsinks, cooling the Xeon 6900 series processors, with the dual-CPU and memory configuration laid out symmetrically across the board. Also visible are the blue tabs throughout the chassis, which serve as disassembly guides for cable removal and component access.

Dell PowerEdge R770AP top down view

Processor

With the CPU removed, the sheer size of the Intel Xeon 6900 series chip is immediately apparent. The R770AP uses the LGA 7529 socket, and our review unit shipped with two Intel Xeon 6978P processors. Each chip carries a 500W TDP and 120 cores, bringing the total core count to 240 across both sockets.

Dell PowerEdge R770AP heatsink and cpu

Cooling and Memory

To manage 1000W of CPU thermal output through air cooling alone, Dell engineered a deliberate heatsink design. The front and rear heatsinks use horizontal fins with heat pipes to move heat efficiently. At the same time, the center section features a vertical fin stack that increases airflow dwell time and surface area, giving the fans more opportunity to pull heat away before it exits the chassis. Nestled among the coolers are 24 DIMM slots in total, with each CPU flanked by 12 slots split six per side.

Dell PowerEdge R770AP memory and heatsinks

Power

The R770AP supports four PSU options, all 80 Plus Titanium-rated and hot-swap redundant: 1500W, 1800W, 2400W, and 3200W. With up to 1000W consumed by the CPUs alone, the 1500W baseline leaves little headroom once drives and expansion cards are factored in. Our unit shipped with the 2400W units, rated at 96% efficiency; that capacity is the practical minimum for a fully loaded storage configuration.

Dell PowerEdge R770AP hot-swap redundant 2400W PSU

iDRAC 10 Management

Remote management for the R770AP is handled by iDRAC10, the same platform Dell ships as standard across its entire 17th-generation PowerEdge lineup, including the PowerEdge R770 and PowerEdge R7725 we previously reviewed. The interface is consistent across the portfolio, so administrators already familiar with iDRAC on other PowerEdge platforms will feel right at home.

The iDRAC10 dashboard provides a full, at-a-glance health summary of every major subsystem: System Health, Processor, Memory, Cooling, Storage, Voltages, Power Supplies, Batteries, and Intrusion Detection. On our review unit, all subsystems reported healthy at the time of testing. System information and firmware version details are displayed directly on the dashboard alongside license status, confirmed as Enterprise on our unit. The Task Summary panel tracks pending, in-progress, and completed jobs; ours showed completed jobs from an initial provisioning cycle, including a small number with errors and one failure, typical of a fresh deployment.

Drilling into the System Environments section reveals cooling details, including individual fan status, PWM speeds, thermal profile settings, and inlet temperature readings, all in real time. This is especially useful for validating airflow in dense rack configurations or troubleshooting thermal issues without needing physical access to the server.

 

Power visibility follows the same pattern. The Power Info section breaks down PSU health, current draw, and capacity utilization alongside a rolling historical trend graph. Administrators can quickly see average and peak wattage over time, which is valuable for capacity planning and identifying workload-driven power spikes without needing a separate power monitoring tool.

Together, these views make iDRAC10 a capable out-of-band management solution that covers the full operational lifecycle of the R770AP, from initial deployment through day-to-day monitoring, all accessible remotely via browser or the RESTful Redfish API.

Dell PowerEdge R770AP Performance

To evaluate the R770AP, we compared it directly against the R770. The R770AP is equipped with dual Intel Xeon 6978P processors, each with 120 cores, for a total of 240 cores and 3 TB of DDR5 memory. The R770, by contrast, runs dual Intel Xeon 6787P processors, for a total of 172 cores and 2 TB of DDR5 memory.

Dell PowerEdge R770AP cooling shroud for memory and CPUs

To stress the CPUs across both systems, we used a focused set of compute benchmarks. y-cruncher was used to evaluate raw arithmetic throughput and multithreaded floating point performance. Blender provided a real-world rendering workload that scales with available cores and memory bandwidth. Phoronix Test Suite rounded out the benchmark set with a broader collection of CPU-bound workloads, giving a more complete picture of sustained compute performance across both platforms.

Test System Specifications

  • Platform: Dell PowerEdge R770AP
  • CPU: Dual Intel Xeon 6978P, 120 cores
  • Memory: 3 TB DDR5
  • Storage: BOSS-N1 (RAID 1)

y-cruncher

y-cruncher is a popular benchmarking and stress-testing application that launched back in 2009. This test is multithreaded and scalable, computing Pi and other constants up to the trillions of digits. Faster is better in this test. This software has been fantastic for testing high-core-count platforms and demonstrating compute advantages between single- and dual-socket platforms.

In the y-cruncher benchmark, the R770AP consistently outperformed the R770 across all test sizes. At the 1-billion-digit run, the R770AP completed in 2.692 seconds, compared to 2.753 seconds on the R770. At 10 billion digits, the R770AP finished in 30.399 seconds versus 34.873 seconds. At 50 billion digits, the R770AP turned in 192.128 seconds against 221.255 seconds. The gap widened at the largest workload, with the 100-billion-digit run completing in 430.208 seconds on the R770AP compared to 491.737 seconds on the R770, a difference of roughly 61 seconds and a run-time reduction of approximately 12.5%.

Y-cruncher (lower duration is better) Dell PowerEdge R770 (2x Intel Xeon 6787P | 2TB RAM) Dell PowerEdge R770AP (2x Intel Xeon 6978P | 3TB RAM)
1 Billion 2.753 seconds 2.692 seconds
2.5 Billion 7.365 seconds 6.747 seconds
5 Billion 16.223 seconds 14.235 seconds
10 Billion 34.873 seconds 30.399 seconds
25 Billion 99.324 seconds 86.298 seconds
50 Billion 221.255 seconds 192.128 seconds
100 Billion 491.737 seconds 430.208 seconds
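The percentage gaps here are reductions in wall-clock time; they can be reproduced from the table above with a few lines:

```python
# Wall-clock times (seconds) from the y-cruncher table above.
r770 = {"1B": 2.753, "2.5B": 7.365, "5B": 16.223, "10B": 34.873,
        "25B": 99.324, "50B": 221.255, "100B": 491.737}
r770ap = {"1B": 2.692, "2.5B": 6.747, "5B": 14.235, "10B": 30.399,
          "25B": 86.298, "50B": 192.128, "100B": 430.208}

for digits, base in r770.items():
    reduction = (1 - r770ap[digits] / base) * 100
    print(f"{digits:>4} digits: {reduction:4.1f}% less time on the R770AP")
```

The reduction grows from about 2% at 1 billion digits to roughly 12-13% at the larger runs, consistent with the gap widening once working sets spill past cache.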

Blender

Blender is an open-source 3D modeling and rendering application. This benchmark was run using the Blender Benchmark utility; the score is samples per minute, with higher being better.

In the Blender 4.3 benchmark, the R770AP outperformed the R770 across all three scenes. On the Monster scene, the R770AP scored 2,200.116 samples per minute compared to 1,706.002 on the R770. The Junkshop scene saw the R770AP turn in 1,565.643 samples per minute, compared to 1,169.370 on the R770. In the Classroom scene, the R770AP scored 1,076.122 samples per minute, compared to 791.475 on the R770, representing roughly a 36% performance advantage on that workload.

Blender 4.3 CPU Benchmark (higher samples per minute is better) Dell PowerEdge R770 (2x Intel Xeon 6787P | 2TB RAM) Dell PowerEdge R770AP (2x Intel Xeon 6978P | 3TB RAM)
Monster 1,706.002 samples/min 2,200.116 samples/min
Junkshop 1,169.370 samples/min 1,565.643 samples/min
Classroom 791.475 samples/min 1,076.122 samples/min

Phoronix Benchmarks

Phoronix Test Suite is an open-source, automated benchmarking platform that supports over 450 test profiles and 100+ test suites via OpenBenchmarking.org. It handles everything from installing dependencies to running tests and collecting results, making it ideal for performance comparisons, hardware validation, and continuous integration. We focus on comparing the R770AP and R770 across the Stream, 7-Zip, Linux kernel build, Apache, and OpenSSL tests.

Stream

In the Stream memory bandwidth test, the R770AP delivered a substantial leap over the R770, scoring 869,965.3 MB/s compared to 472,135.6 MB/s. That is nearly double the bandwidth of the baseline system, reflecting the R770AP’s 12-channel-per-socket DDR5-6400 memory configuration against the R770’s 8 channels.
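As a rough sanity check on that number, the theoretical ceiling for the R770AP’s configuration follows from channel count and transfer rate (assuming all 24 channels run at the DDR5-6400 this unit shipped with, and ignoring real-world controller overhead):

```python
# Theoretical DDR5 bandwidth ceiling for the R770AP as tested:
# 12 channels per socket x 2 sockets, DDR5-6400, 64-bit (8-byte) data bus.
channels = 12 * 2
transfers_per_sec = 6400e6
bytes_per_transfer = 8

peak_gbs = channels * transfers_per_sec * bytes_per_transfer / 1e9
measured_gbs = 869_965.3 / 1e3          # Stream result above, MB/s -> GB/s

print(f"theoretical peak: {peak_gbs:.1f} GB/s")             # 1228.8 GB/s
print(f"Stream efficiency: {measured_gbs / peak_gbs:.0%}")  # 71%
```

Roughly 70% of theoretical peak is a healthy Stream result for a fully populated dual-socket system.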

7-Zip

In the 7-Zip compression benchmark, the R770AP scored 806,375 MIPS, compared to 628,206 MIPS on the R770, a solid uplift driven by the higher core count of the 6978P processors.

Kernel Compile

In the Linux kernel compile test, where a lower time is better, the R770AP completed the allmod build in 176.391 seconds compared to 188.793 seconds on the R770, shaving roughly 12 seconds off the compile time.

Apache

The Apache test was the one area where the R770 came out ahead, scoring 60,258.5 requests per second versus 48,729.63 on the R770AP. This is worth noting: web-serving workloads do not always scale linearly with core count and can be sensitive to memory latency, NUMA topology, and I/O characteristics.

OpenSSL

In the OpenSSL verification test, the R770AP scored 2,515,270,390,853 verify/s compared to 2,216,883,554,350 verify/s on the R770, a meaningful gain in cryptographic throughput that highlights the compute efficiency of the 6978P at scale.

Phoronix Benchmarks Dell PowerEdge R770 (2x Intel Xeon 6787P | 2TB RAM) Dell PowerEdge R770AP (2x Intel Xeon 6978P | 3TB RAM)
Stream 472,135.6 MB/s 869,965.3 MB/s
7-Zip 628,206 MIPS 806,375 MIPS
Kernel Compile (allmod) (lower is better) 188.793 Seconds 176.391 Seconds
Apache (requests per second) 60,258.5 R/s 48,729.63 R/s
OpenSSL 2,216,883,554,350 Verify/s 2,515,270,390,853 Verify/s

 

Dell PowerEdge R770AP: High-Frequency Trading and Deterministic Performance

While our standard benchmark suite focuses on compute throughput, memory bandwidth, and general workload scaling, the R770AP’s design priorities extend into territory we don’t typically test: microsecond-level execution determinism. To illustrate what this platform can do for its most demanding target audience, Dell published a technical brief in partnership with Metrum AI that evaluates the R770AP specifically for high-frequency trading workloads. We did not conduct this testing, nor did we independently audit the results. Still, we’re including a summary here because it provides the most direct demonstration of why this server is a distinct product from the R770.

The Metrum AI methodology centers on a custom tool called jitter-c, which measures per-core wake-up latency jitter, essentially how consistently a thread scheduled to execute at a precise moment actually begins running. This metric isolates CPU scheduling variability from network, memory, and application-level factors, making it a clean point of comparison across processor generations. Using an R770AP with dual Xeon 6980P processors (256 total cores) against a prior-generation R760 with dual Xeon Platinum 8592+ processors (128 total cores), the study found that the Granite Rapids-AP architecture reduced p99 wake-up jitter to approximately 1 microsecond, roughly half that of the older platform, while simultaneously doubling core density. Those jitter profiles were then injected into a backtesting simulation engine to model the financial impact, with the results summarized below.

Metrum AI HFT Backtest Results Dell PowerEdge R760 (2x Xeon 8592+, 128 cores) Dell PowerEdge R770AP (2x Xeon 6980P, 256 cores)
p99 Wake-Up Jitter ~2 µs ~1 µs
Mean Reversion: Total Trades 5,175 6,229 (+20.4%)
Mean Reversion: Trades/sec 819 991 (+21.1%)
Market Making: Total Trades 21,765 32,491 (+49.3%)
Market Making: Trades/sec 2,067 3,072 (+48.6%)
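Metrum AI’s jitter-c tool itself is not published in the brief, but the metric it reports is simple to sketch: schedule a thread to wake at a fixed deadline, record how late it actually wakes, and take the p99 of those deltas. Below is a minimal, hypothetical Python illustration of that measurement loop; the function name and parameters are ours, and a tuned C implementation with pinned cores would be far more precise:

```python
import statistics
import time

def wakeup_jitter_p99_ns(samples: int = 1000, period_ns: int = 1_000_000) -> float:
    """p99 wake-up latency: how late a sleep-until-deadline actually wakes."""
    deltas = []
    deadline = time.perf_counter_ns() + period_ns
    for _ in range(samples):
        remaining = deadline - time.perf_counter_ns()
        if remaining > 0:
            time.sleep(remaining / 1e9)                   # sleep until the deadline
        deltas.append(time.perf_counter_ns() - deadline)  # lateness, in ns
        deadline += period_ns
    return statistics.quantiles(deltas, n=100)[98]        # 99th percentile

print(f"p99 wake-up jitter: {wakeup_jitter_p99_ns() / 1000:.1f} µs")
```

On a general-purpose OS without core pinning or interrupt isolation, this figure will typically land well above the roughly 1 µs the study reports for a tuned Granite Rapids-AP system.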

 

As Dell’s Seamus Jones framed it in his commentary on the study, the value proposition is not about being fast but about being predictably fast: in trading, a system that is quick but inconsistent is a source of risk, while a deterministic system is a strategic asset.

Conclusion

The Dell PowerEdge R770AP occupies a purposeful, narrow position within the 17th-generation PowerEdge lineup. It is not a replacement for the R770, and Dell is not positioning it as one. The R770 remains the versatile, broadly configurable 2U Intel platform it has always been, with GPU support, mixed SAS/SATA/NVMe storage, E-core and P-core processor options, and up to 8TB of memory across 32 DIMM slots. For organizations running general virtualization, mixed enterprise applications, or workloads that benefit from that flexibility in configuration, the R770 is still the right call.

Dell PowerEdge R770AP in dell lab with SR sloth

The R770AP exists for the workloads that the R770 was never optimized to serve. By moving to the Granite Rapids-AP platform, with its 12-channel memory architecture, up to 128 P-cores per socket, and 504 MB of L3 cache, Dell has built a 2U system that prioritizes compute density, memory bandwidth, and execution determinism over versatility. Our benchmarks reflect that focus: STREAM bandwidth nearly doubled, Blender rendering improved 29-36%, and y-cruncher scaling widened consistently as working sets grew beyond cache. The Apache regression is worth noting, as it demonstrates that the R770AP’s NUMA topology requires workload awareness to extract full performance, and not every application will benefit from the platform shift without tuning.

The Metrum AI testing Dell published alongside this platform puts a finer point on the determinism story. Cutting p99 scheduling jitter in half while doubling core density is a meaningful architectural improvement for firms running high-frequency trading, real-time risk engines, large-scale in-memory analytics, and massively parallel simulations. For those workloads, the R770AP is a well-executed, purpose-built platform. For everything else, the R770 and R7725 remain the better-suited options in the mainstream PowerEdge portfolio.

Product Page – Dell PowerEdge R770AP

The post Dell PowerEdge R770AP Review: Dell’s Purpose-Built Answer for Latency-Sensitive Workloads appeared first on StorageReview.com.

Nutanix and NetApp Announce Integration to Align ONTAP with Nutanix Cloud Platform

7 April 2026 at 20:51

At the Nutanix .NEXT Conference in Chicago, Nutanix and NetApp announced a strategic partnership to integrate NetApp Intelligent Data Infrastructure with the Nutanix Cloud Platform (NCP), including support for the Nutanix AHV hypervisor. The integration is expected later this year and targets enterprises looking to align virtualization and data management strategies across on-premises, hybrid cloud, and containerized environments.

Nutanix cloud platform graphic

The joint approach brings NetApp ONTAP into the Nutanix ecosystem as a primary data layer, combining ONTAP’s mature data services with NCP’s unified operational model. This positions ONTAP as the storage backbone while Nutanix continues to deliver compute, virtualization, and cloud orchestration through AHV and its broader platform stack.

NetApp data platform graphic

NetApp and Nutanix are streamlining the modernization of virtualized environments by providing secure and efficient solutions, according to Sandeep Singh, Senior Vice President and General Manager of Enterprise Storage at NetApp. Singh emphasized the importance of Intelligent Data Infrastructure as the foundation for transforming virtualization and data operations, highlighting that their collaboration simplifies the operation of virtualized workloads at the enterprise level.

NFS-Based Connectivity

The integration centers on NFS-based connectivity between Nutanix and ONTAP systems. This allows virtual machines to run on Nutanix while leveraging external NetApp storage, enabling a disaggregated architecture where compute and storage scale independently. This model is particularly relevant for organizations seeking to optimize resource utilization or extend existing NetApp investments into Nutanix environments.

Migration is a key focus area. The companies are aligning NetApp Shift and Nutanix Move to enable faster VM migrations to AHV environments. The tooling is designed to enable data-in-place conversions, reducing the need for full data copies and shortening migration timelines to minutes in some scenarios. This approach is intended to minimize operational disruption while accelerating adoption of the Nutanix platform.

Operational simplification is another stated goal. By offloading storage services to ONTAP, organizations can centralize data management functions such as snapshots, replication, and tiering, while Nutanix manages compute and virtualization. The combined environment is expected to offer unified visibility and control, reducing administrative overhead and simplifying troubleshooting across the stack.

VM-Level Granularity

The integration also introduces VM-level granularity for storage operations. Administrators can apply policies for performance, capacity, and data protection at the individual VM level, rather than managing resources at a broader datastore or cluster level. This aligns with enterprise requirements for fine-grained control in multi-tenant or mixed workload environments.

Cyber resilience is addressed through native ONTAP capabilities. The solution is expected to incorporate Autonomous Ransomware Protection with AI and NetApp’s ransomware resilience services, providing real-time detection of anomalies and potential data exfiltration. These features extend Nutanix’s existing security posture with deeper storage-layer intelligence.

NetApp Chief Commercial Officer Dallas Olson emphasized that partnering with Nutanix enhances NetApp’s position as a leader in storage and data management for virtualization. The collaboration aims to provide enterprises with a robust foundation for building an Intelligent Data Infrastructure that offers high performance, resilience, and scalability to support growing virtualization requirements.

Nutanix President and Chief Commercial Officer Tarkan Maner announced that their partnership with NetApp enables customers to modernize their virtualization platforms and leverage Intelligent Data Infrastructure at their own pace, combining modernization with advanced data management capabilities.

Overall, the partnership reflects a shift toward composable infrastructure models, in which best-of-breed compute and storage platforms are integrated via standard protocols and unified management layers.

The two vendors also plan to collaborate on AI initiatives. ONTAP integration with the Nutanix Agentic AI stack is intended to support emerging enterprise AI use cases, focusing on data accessibility, governance, and performance in AI-driven workflows.

The post Nutanix and NetApp Announce Integration to Align ONTAP with Nutanix Cloud Platform appeared first on StorageReview.com.

NVMe Performance Compared: Windows Server 2025 vs. Ubuntu Server 24.04.4 LTS

3 April 2026 at 19:06

After publishing our article about Microsoft’s opt-in native NVMe feature on Windows Server 2025, we received multiple requests for a direct comparison of storage performance between Windows Server 2025 with native NVMe and a Linux-based server OS. One especially enthusiastic Redditor even offered us beer to do it! Since there were obviously no other reasonable options, we decided to run the same tests on Linux.

Windows Server 2025 NVMe vs Linux performance ssds

A Long Time Ago, in an OS Version Far, Far Away

The Linux kernel has supported NVMe since version 3.3, released in March 2012. Similarly, the protocol has been supported on Windows Server (non-natively, via SCSI translation) since 2012 R2, around October 2013. More than a decade later, users are still debating whether Windows or Linux is better for storage, so we thought we’d add some more fuel to the fire with benchmark results comparing the two.

Since we have test results for Windows Server 2025 using both non-native and native storage stacks, we thought it was appropriate to evaluate two storage stacks on Linux. For our FIO benchmarks, we used both libaio and io_uring, two of the most popular APIs for storage transactions. While io_uring is considerably newer and provides many improvements for asynchronous I/O, libaio remains widely used for its flexibility and ease of use (Didona, Pfefferle, Ioannou, Metzler, & Trivedi, 2022). Complete architecture overviews for both stacks are beyond the scope of this article, but we are still providing results for a direct comparison.
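For readers who want to reproduce this kind of comparison, switching I/O engines in fio is a one-line change in the job file. The job below is an illustrative sketch, not our exact test profile; the device path and parameters are placeholders:

```ini
; Toggle the engine under test here: libaio or io_uring
[global]
ioengine=io_uring
direct=1
rw=randread
bs=4k
iodepth=32
numjobs=8
runtime=60
time_based=1
group_reporting=1

[nvme-test]
filename=/dev/nvme0n1
```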

Testing NVMe on Ubuntu Server 24.04.4 LTS

Our hardware platform for this comparison is the same server used in our Windows Server 2025 native NVMe article. To ensure maximum throughput and consistent results, it is equipped with two 128-core AMD EPYC 9754 CPUs, 768GB of DDR5 memory at 4800 MT/s, and fifteen 30.72 TB Solidigm P5316 NVMe SSDs with PCIe 4.0 in a JBOD configuration.

As we mentioned in our previous article, the Solidigm P5316 has an indirection unit size of 64 kilobytes, which means write performance for smaller sizes (such as 4K tests) is often worse than expected. Once again, we ran different test patterns at block sizes of 4K, 64K, and 128K to provide a broad range of results for read and write operations.
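The indirection unit’s impact on small writes can be framed with a rough worst-case model: any random write smaller than the IU forces a read-modify-write of the full 64 KB unit. This is a simplified bound; real drives coalesce writes and behave better than this in practice:

```python
# Rough worst-case write-amplification model for a 64 KB indirection unit (IU):
# a random write smaller than the IU forces the drive to rewrite the whole IU.
IU = 64 * 1024

for bs in (4 * 1024, 64 * 1024, 128 * 1024):
    amplification = max(IU / bs, 1.0)
    print(f"{bs // 1024:>3}K random writes -> up to {amplification:.0f}x amplification")
```

This is why the 4K write results below trail the 64K and 128K numbers by such a wide margin.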

We selected Ubuntu Server 24.04.4 LTS as our Linux example because of its popularity and long-term support. It ships with Linux kernel 6.8 by default, which isn’t the newest or most advanced, but likely represents a large portion of installations worldwide.

Highlights

  • Windows Server 2025 native NVMe wins in three out of four read performance benchmarks
  • Lower CPU usage observed with Windows Server during most tests
  • Ubuntu Server 24.04.4 LTS wins in three out of four write performance benchmarks
Metric Random 4K Random 64K
Windows Non-Native Windows Native Linux libaio Linux io_uring Windows Non-Native Windows Native Linux libaio Linux io_uring
Random Read
Bandwidth (GiB/s) 6.1 10.058 9.198 9.504 74.291 91.165 77.517 77.7
IOPS 1,598,959 2,636,516 2,411,000 2,491,000 1,217,176 1,493,637 1,270,000 1,273,000
Average latency (ms) 0.169 0.104 0.198 0.192 0.239 0.207 0.377 0.376
Total CPU Usage (%) 72.67 74.22 99.77 99.76 68.44 65.11 83.16 84.72

 

Metric Sequential 64K Sequential 128K
Windows Non-Native Windows Native Linux libaio Linux io_uring Windows Non-Native Windows Native Linux libaio Linux io_uring
Sequential Read
Bandwidth (GiB/s) 35.596 35.623 31.867 31.433 86.791 92.562 97.05 97
IOPS 583,192 583,638 522,000 515,000 710,978 758,252 795,000 795,000
Average latency (ms) 0.809 0.812 0.919 0.932 0.613 0.608 0.603 0.604
Total CPU Usage (%) 44.89 37.11 53.94 41.74 61.56 49.56 75.14 76.90

 

Metric Random 4K Random 64K
Windows Non-Native Windows Native Linux libaio Linux io_uring Windows Non-Native Windows Native Linux libaio Linux io_uring
Random Write
Bandwidth (GiB/s) 1.803 1.756 1.876 1.815 7.654 7.655 7.652 7.651
IOPS 472,725 460,383 492,000 476,000 125,391 125,406 125,000 125,000
Average latency (ms) 0.992 1.028 0.974 1.007 3.814 3.816 3.827 3.828
Total CPU Usage (%) 26.00 20.67 45.76 22.80 12.22 9.33 20.07 10.90

 

Metric Sequential 64K Sequential 128K
Windows Non-Native Windows Native Linux libaio Linux io_uring Windows Non-Native Windows Native Linux libaio Linux io_uring
Sequential Write
Bandwidth (GiB/s) 44.67 50.087 52.283 52.25 50.477 50.079 52 52.083
IOPS 731,859 820,603 856,000 856,000 413,495 410,232 426,000 427,000
Average latency (ms) 0.399 0.558 0.560 0.560 1.022 1.149 1.126 1.125
Total CPU Usage (%) 70.44 57.78 61.88 62.75 58.44 47.33 61.49 44.27


Note:
Our Linux IOPS results are rounded to the nearest thousand due to differences in FIO reporting between Windows Server 2025 and Ubuntu Server 24.04.4 LTS. Bandwidth, latency, and CPU usage results are rounded consistently across both platforms.

The Numbers Don’t Lie

Immediately, we see that Ubuntu does not outperform Windows in every category. While libaio and io_uring delivered excellent throughput in our random-read bandwidth tests, they did not match the performance of Microsoft’s native NVMe stack. The Windows NT kernel beat the Linux kernel by about 17% in our random-read 64K tests, with a winning 91.165 GiB/s on native NVMe versus io_uring’s best of 77.7 GiB/s.

However, not all hope is lost for Torvalds’ technological terror. Ubuntu Server narrowly beat Windows Server in one of our read performance benchmarks: the sequential 128K test. Here, Linux’s libaio performed best at 97.05 GiB/s, compared to Windows’ native NVMe at 92.562 GiB/s, a difference of about 5%. This indicates that Linux may hold a slight edge when managing block sizes larger than the drives’ indirection units.

Random write bandwidth was consistent across both Linux and Windows, especially in 64K benchmarks. The best and worst results from those tests differed by only 0.05%, suggesting that all storage stacks realized the drives’ full potential.

Interestingly, the Linux 6.8 kernel claimed victory in sequential write bandwidth tests for block sizes of 64 K and 128 K. While the difference was not massive, the open-source software stacks beat Windows Server’s native NVMe by about 2 GiB/s in both cases.

Latency results generally followed throughput test results, shown best by the difference in random read averages. Unfortunately for Tux, libaio and io_uring had higher latency, with the largest difference of 0.17 ms between Windows Server native NVMe (0.207 ms) and libaio (0.377 ms) for 64 K random reads.

Perhaps the most shocking revelation from our benchmarks is the massive delta in CPU usage between Windows Server 2025 and Ubuntu Server 24.04.4 LTS. In three of four random and sequential read benchmarks, Windows Server native NVMe had the lowest CPU usage. The most notable result came in the sequential read 128K run, where Windows used 27.34 percentage points less CPU than Linux.

CPU usage with libaio and io_uring performed slightly better in random and sequential write tests, but it was still not enough to prevent native NVMe on Windows Server from winning in three of the benchmarks. A notable exception was libaio’s CPU usage during the random write 4K test, which reached 45.76% of the system’s CPU, while the other storage stacks hovered around 20%.

Winner Winner, CPU Dinner

As our results show, Windows Server and Ubuntu Server perform closely when tested head-to-head in both random and sequential performance tests at different block sizes. In terms of bandwidth, Windows Server 2025 with native NVMe generally outperformed Linux in most read tests, while Linux responded with slightly better results in our write tests. Our latency figures told a similar story, but the real highlight was Windows Server 2025’s CPU efficiency when using native NVMe.

Microsoft has clearly put effort into making its newest storage stack its best, and while it does not always win against libaio and io_uring, it puts up a good fight. While these results are not definitive across all use cases and server setups, they may help server administrators decide whether to deploy a Windows or Linux server when storage performance is more critical than OS compatibility.

Let us know what you think about these results by commenting on our social platforms or the SR Discord! Did you expect Windows Server to do so well in our tests, or were you rooting for Linux? Would you like to see more Linux server distributions or kernels tested? We’re always looking for your feedback, and reader-requested tests like these often become our favorite articles.

References

Didona, D., Pfefferle, J., Ioannou, N., Metzler, B., & Trivedi, A. (2022, June 13). Understanding Modern Storage APIs: A systematic study of libaio, SPDK, and io_uring. SYSTOR ’22, 120-121. Retrieved April 3, 2026, from https://atlarge-research.com/pdfs/2022-systor-apis.pdf

The post NVMe Performance Compared: Windows Server 2025 vs. Ubuntu Server 24.04.4 LTS appeared first on StorageReview.com.

Backblaze Publishes Q1 2026 Cloud Storage Performance Results

2 April 2026 at 23:26
Backblaze Performance Stats Q1 2026 US East Upload comparisons

Backblaze has published its Q1 2026 Performance Stats report, a quarterly series comparing cloud storage performance across Backblaze B2, AWS S3, Cloudflare R2, and Wasabi Object Storage. The report covers testing in US-East and EU-Central and includes both the results and the methodology used, with the stated goal of letting others review, reproduce, and compare the findings. As with any vendor-produced benchmark, the data comes from the company running the tests, so the results should be read with that context in mind.

Backblaze says its early Q1 2026 testing showed faster average upload and download times in US-East for most providers and file sizes than in Q4 2025, while results in EU-Central followed a different pattern. The data also showed wider variation in sustained throughput than in average transfer times, especially in multithreaded tests, and Backblaze noted that some of its own larger-file throughput tests hit rate limits, which it disclosed in the methodology update.

Backblaze Cloud Storage Performance Results Q1 2026: US-East

US-East was one of two regions included in Backblaze’s Q1 2026 testing, alongside EU-Central. The upload test measures average time, in milliseconds, to upload files of 256KiB, 2MiB, and 5MiB, using averages collected across a month. Lower times indicate better results. Based on Backblaze’s test data, Backblaze B2 posted the lowest average upload time for 256KiB files at 7.08 ms and for 5MiB files at 87.62 ms, while Wasabi recorded the lowest result for 2MiB files at 56.74 ms.

Backblaze Performance Stats Q1 2026 US East Upload comparisons

The quarter-over-quarter comparison indicates lower average upload times across all providers that had prior-quarter data in US-East. Backblaze, AWS S3, and Cloudflare R2 each posted lower averages than in Q4 2025 across the three file sizes shown. Wasabi was not included in the earlier quarter’s chart, so there is no direct Q4 comparison for that provider in this section.

Backblaze Performance Stats Q1 2026 US East avg Upload comparisons

In US-East multithreaded upload testing over five minutes, higher totals indicated more data transferred during the test window. Backblaze included two sets of results for its own service: one under standard conditions and one marked rate-limited. The 256KiB and 5MiB figures were identical in both cases at 80.00 and 324.00, while the larger file sizes diverged after rate limits were triggered.

For 50MiB files, Backblaze B2 was listed at 1,194.80, versus 544.70 for the rate-limited account. For 100MiB files, it was 1,726.10 versus 563.50. Backblaze says these larger-size results reflect bandwidth caps encountered during testing and notes that other providers may apply different limits under their own policies.

Across the same test, Wasabi posted the highest figures in all four file-size categories shown: 157.30 for 256KiB, 815.80 for 5MiB, 2,488.50 for 50MiB, and 3,030.90 for 100MiB. AWS S3 was listed at 84.40, 774.70, 2,238.90, and 2,947.20, while Cloudflare R2 was listed at 28.40, 366.30, 510.10, and 1,450.40.

Backblaze also notes that quarter-over-quarter comparisons in this test are limited because the methodology changed, so these figures are best read as part of an early dataset produced under the company’s test setup.

US-East Five Minute Single-Threaded Upload Test

The five-minute single-threaded upload test measures sustained upload throughput with a single thread across four file sizes. In US-East, Backblaze posted the highest result for 256KiB at 9.40, 50MiB at 119.20, and 100MiB at 164.00. AWS S3 led the 5MiB category at 44.70, just ahead of Wasabi at 44.60. Cloudflare R2 trailed the other providers across all four sizes, with results of 1.20, 15.00, 50.90, and 64.20.

The spread between the highest and lowest results widened as file sizes increased. At 256KiB, the gap ran from 1.20 to 9.40. At 100MiB, it ranged from 64.20 to 164.00. Based on Backblaze’s test setup, the single-threaded results show a narrower contest among Backblaze, AWS, and Wasabi at 5MiB and above, while Cloudflare R2 posted materially lower figures in this set of US-East upload measurements.

US-East Download Testing

In US-East download testing, AWS S3 had lower average times and lower TTFB for 256KiB and 5MiB downloads, while Backblaze posted the lowest average download time for 2MiB files. The quarter-over-quarter view showed lower download times across many categories for providers with prior data, but the dataset remains limited, and Wasabi had no Q4 comparison in these charts.

In the five-minute multithreaded download benchmark, Backblaze led at 256KiB; AWS S3 led at 5MiB and 50MiB; and Cloudflare R2 led at 100MiB, with Backblaze also reporting separate rate-limited results for its own service. In the five-minute single-threaded download test, Wasabi led at 256KiB and 5MiB, while Backblaze led at 50MiB and 100MiB. Across these download results, rankings varied by test type and file size, making the data more useful as a snapshot of the Backblaze test environment than as a definitive measure of overall provider performance.

Five-minute single-threaded download throughput

In US-East single-threaded download throughput over five minutes, Wasabi posted the top result for smaller files, leading at 256KiB with 11.20 and at 5MiB with 55.70. Backblaze led the larger file sizes, recording 95.90 at 50MiB and 164.00 at 100MiB. AWS S3 stayed close to the leaders in each category, with 5.60, 47.00, 87.90, and 138.10, while Cloudflare R2 trailed on all four file sizes at 2.70, 28.40, 78.10, and 64.20. The spread was narrower at the smallest file sizes and wider at 100MiB, where Backblaze posted the highest result.

Backblaze Cloud Storage Performance Results Q1 2026: EU-Central

Upload Averages

In the EU-Central average upload times, Cloudflare R2 posted the lowest results for 256KiB files at 8.94 ms, while Backblaze posted the lowest times for 2MiB at 47.32 ms and 5MiB at 87.54 ms. AWS S3 recorded 17.14 ms, 68.93 ms, and 87.98 ms across the three file sizes, and Wasabi posted 11.48 ms, 63.24 ms, and 98.89 ms. These results differed from the US-East pattern, with provider rankings changing by region in Backblaze’s test setup.

Upload Throughput

In the EU-Central five-minute multithreaded upload throughput, higher results favored Wasabi at 256KiB with 147.10 and at 100MiB with 2,990.60, while AWS S3 led at 5MiB with 848.10 and at 50MiB with 2,515.00. Backblaze listed 104.70, 216.40, 843.40, and 896.70 for its standard account, alongside separate rate-limited results of 104.70, 216.40, 561.00, and 500.70.

In the five-minute single-threaded upload test, Wasabi led all four file sizes at 8.20, 47.40, 113.60, and 169.60, ahead of Backblaze at the smallest size and AWS S3 at the larger sizes. Across both throughput tests, Cloudflare R2 trailed the other providers in EU-Central.

Download Averages and TTFB

In the EU-Central average download testing, lower times favored Cloudflare R2 in three categories. It posted the lowest time to first byte at 108.84 ms, the lowest 256KiB average at 76.97 ms, and the lowest 2MiB average at 110.79 ms. AWS S3 led the 5MiB category at 141.63 ms. The spread across providers was wide in several categories. Backblaze recorded 284.61 ms for TTFB and 230.01 ms, 318.20 ms, and 407.49 ms for average downloads of 256KiB, 2MiB, and 5MiB, which were the highest figures in this EU-Central set.

Download Throughput

In the EU-Central five-minute multithreaded download throughput, Backblaze led the 256KiB category at 198.30, while AWS S3 led the 5MiB category at 1,382.80, the 50MiB category at 1,761.00, and the 100MiB category at 1,665.10. Backblaze also showed a separate rate-limited line for its own service, with materially lower figures at the larger file sizes.

In the five-minute single-threaded download test, Wasabi led 256KiB at 9.20, while AWS S3 led 5MiB at 50.10, 50MiB at 90.10, and 100MiB at 91.40. Across these EU-Central download tests, the top result varied by file size and test type, while Cloudflare R2 was strongest in average download latency rather than sustained download throughput.

What to Make of the Data

Backblaze says it ran these benchmarks using repeatable synthetic tests from a Vultr-hosted Ubuntu virtual machine in each region, with traffic routed through Catchpoint to each provider’s object storage service. The testing looked at average upload times for 256KiB, 2MiB, and 5MiB files, along with download time to first byte and average download times for those same file sizes. It also ran separate five-minute throughput tests for uploads and downloads using both single-threaded and 20-thread workloads, with file sizes of 256KiB, 5MiB, 50MiB, and 100MiB. Since Backblaze designed and ran the tests itself, the figures should be viewed as vendor-published benchmark data rather than as an independent third-party study.
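The mechanics Backblaze describes (repeated timed transfers for the averages, and fixed-window loops for the throughput tests) can be approximated with a minimal harness. This sketch is ours, not Backblaze's tooling; the `transfer` callable stands in for a real object-storage PUT or GET from whatever client library you use:

```python
import time
import statistics

def average_transfer_ms(transfer, payload: bytes, runs: int = 5) -> float:
    """Time repeated transfers of one payload and return the mean in ms.

    `transfer` is any callable that moves the payload; in a real test it
    would be an object-storage upload or download call.
    """
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        transfer(payload)
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.mean(samples)

def sustained_throughput_mib(transfer, payload: bytes, window_s: float = 300.0) -> float:
    """Repeat transfers for a fixed window and return total MiB moved,
    analogous to a five-minute single-threaded throughput test."""
    moved = 0
    deadline = time.perf_counter() + window_s
    while time.perf_counter() < deadline:
        transfer(payload)
        moved += len(payload)
    return moved / (1024 * 1024)
```

A 20-thread variant would run the same fixed-window loop across a thread pool and sum the bytes moved, which is where rate limits like the ones Backblaze disclosed would start to show up.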

Backblaze also lays out several limitations, noting that synthetic testing does not capture the full range of real production workloads, global network conditions, concurrency patterns, or data locality. It also said it cannot control routing and peering once traffic leaves the test node, and that caching, traffic shaping, or rate limiting may have affected some results. Moreover, its current setup still has limitations around large-file testing. It points to Wasabi temporarily blacklisting test IPs during high-volume testing as one example of how provider policies can influence the numbers.

The results also varied depending on region, file size, and test type, which is why they described the dataset as directional rather than definitive. Backblaze says it plans to expand the testing over time to cover more regions, workloads, and conditions.

For anyone comparing cloud storage options, these results are most useful as one reference point alongside testing in their own environment.

Backblaze Performance Stats Q1 2026 full report

The post Backblaze Publishes Q1 2026 Cloud Storage Performance Results appeared first on StorageReview.com.

Brady M511 Review: We Finally Labeled Our Lab

2 April 2026 at 18:43

StorageReview has been testing enterprise storage and infrastructure hardware for over 25 years. During that time, we have reviewed petabytes of data across drives, dozens of all-flash arrays, countless switches, servers, and networking gear. We have benchmarked hardware that costs more than most people’s houses. What we have never done, not once, is properly label any of it.

This is not something we are especially proud of. For years, the StorageReview lab has operated on a system best described as “whoever plugged it in probably remembers where it goes.” Cables have been traced by feel. Ports have been identified largely by educated guessing. New team members have received the time-honored orientation of “just follow the cable and see where it ends up.” It has mostly worked, like a lot of things do until they suddenly don’t.

Recently, we started a major lab refresh. New Dell switches to support 800GbE connectivity, more powerful GPU systems, faster storage, retirement of old gear, the kind of upgrade that makes you look at your existing cable management situation and feel a special sort of organizational shame. At some point during the planning process, someone said sensible words that changed our plan: “We should probably label some of this.” And so here we are.

Brady label maker from DCW

Brady has been on our radar since Data Center World last year, when we covered the company’s large-format label-printing systems designed for high-volume data center operations. When Brady reached out about the M511, a more compact and portable Bluetooth label printer aimed at smaller facilities, remote sites, and teams that need to move around a space rather than print from a fixed station, it felt like the data center universe was sending a message specifically to our unlabeled lab.

Launched in September 2023, the M511 emerged directly from customer feedback following Brady’s 2022 introduction of the M211, with users asking for wider labels, edge-to-edge printing, and the ability to share the printer across a team. For a lab environment like ours, where multiple people work across the same racks and infrastructure, that multi-user angle is exactly what makes the M511 relevant rather than just a bigger version of its smaller sibling.

The M511 prints up to 1.5″ wide labels from edge to edge at 300 dpi, connects via Bluetooth 5.0 with a 65-foot range to up to five devices simultaneously, and runs on an internal Li-Ion battery rated for 8+ hours or roughly 1,000 labels per charge. It carries MIL-STD-810G durability ratings, surviving 6-foot drops (which we inadvertently tested), 250-pound crushes, and blowing sand and dust, which is more punishment than our lab will ever dish out, but we appreciate the commitment. Print speed is 1.3 inches per second, and an auto label cutter holds each finished label in place until you’re ready to pull it.

Brady M511 mobile label maker

Brady sent us the M511-KIT configuration, which bundles the printer with a hard case, an indoor/outdoor vinyl label cartridge, self-laminating cable wraps, a nylon cloth label cartridge, an AC adapter, a mounting magnet, a utility hook, a power brick, and Brady Workstation Design and Print Pro software with the Product and Wire ID suite. Essentially, everything needed to walk into a facility and start labeling from scratch, which, as it turns out, describes our situation exactly.

Along with the kit’s included starter materials, we asked Brady to send a selection of label types specific to our needs. They provided four additional cartridges: M4C-375-595-WT-BK, an all-weather permanent adhesive vinyl continuous tape in 3/8″ width for general asset and rack identification; M4-1425-FP, a P-Flag polypropylene flag label designed for cable identification with strong solvent resistance; M4-214-483, Brady’s QuickFlag tapered polyester flag labels that wrap cables neatly without mismatched edges; and M4-48-417, a high-adhesion self-laminating vinyl wrap-around label built specifically for wire ID in challenging environments like high humidity and with newer wire jacketing materials such as Teflon and silicone. We detail each of these in our testing experiences.

Label design can be handled via the Express Labels mobile app over Bluetooth on a phone or tablet, or via Brady Workstation on a PC. Recent updates to the app added BradyVoice, a voice-dictation labeling assistant, and an Image-to-Text feature that uses the phone’s camera to convert printed text or handwritten notes directly into label content. The latter is particularly useful for anyone needing to replicate existing labels in a hurry without manually retyping everything.

The standalone M511 printer is priced at $399.99 direct from Brady, while our M511-KIT review unit comes in at $551.99 and includes the hard case, three label cartridges, mounting accessories, power brick, and Brady Workstation Product and Wire ID software.

Specifications

Specification Brady M511 Portable Bluetooth Label Printer Kit
Key Characteristics
Trade Name M511
UPC 888434620557
Color Black, Yellow
Dimensions
Height 3.6 in
Width 6 in
Depth 6.4 in
Weight 2.646 lb
Power & Battery
Battery Type Internal, not removable, Rechargeable Lithium-ion
Battery mAh Rating 2450 mAh
Shipped With Battery Yes – shipped with battery installed
Recharge Time 2.5 hours
Power Supply Voltage 110 – 240 V
Port Type USB-C
Auto Shut-Off / Power Conserve Yes, User Configurable
Connectivity & Interface
Connectivity Bluetooth® 5 Low Energy (Class II)
Device Connectivity Mobile device connected, PC connected
User Interface Mobile device, PC
Memory Via connected mobile device
Device Indicators Bluetooth indicating lights: pulsing blue = broadcasting signal; solid blue = device connected or paired; LEDs indicating battery life
Durability
Drop Test & Durability Resistant to 6-foot drops, Resistant to 250-lb crushes, Resistant to military-grade shocks (MIL-STD-810G), Sand & Dust
Printing Specifications
Print Technology Thermal Transfer
Print Resolution 300 dpi
Maximum Print Speed 1.33 in/s
Color Printing Capability Single Color
Maximum Label/Tape Width 1.5 in
Maximum Print Width 1.44 in
Minimum Label Length 0.240 in
Maximum Printed Label Length 39 in
Maximum Labels per Charge 1000
Cutter Type Auto Cutter
Calibration Automated through Smart Cell
Label Retention Feature Yes
Font Sizes 4 – 150 pt
Barcode Symbologies — 2D Data Matrix, PDF417, QR Code, More through Brady Workstation, More through software
Barcode Symbologies — Linear Code 128, Code 128A, Code 128B, Code 128C, Code 39, Code 39 Full ASCII, Code 93, Code 93 Full ASCII, EAN-13, EAN-13 Extension 2, EAN-13 Extension 5, EAN-8, EAN-8 Extension 2, EAN-8 Extension 5, GS1-128, HIBC, Interleaved 2 of 5, JAN-13, JAN-8, UPC-A, UPC-E
Built-In Label Wizards Breaker Box, Flags, General, Patch Panel, Pipe Marker, Safety, Sleeves, Slide, Terminal Block, Tube, Vial, Wire Wrap
Compatibility
Compatible Media M4-, M4C-, M5-, M5C-
Label Material Types BradyGrip® Polyester, FreezerBondz™ Polyester, Heat-shrink Polyolefin, Metalized Polyester, Nylon Cloth, Polyester, Polypropylene, Reflective Tape, Self-laminating Polyester, Self-laminating Vinyl, StainerBondz™ Polyester, Tamper-resistant Vinyl, Vinyl, Vinyl Cloth, Water Dissolvable Paper
Materials Supported Continuous, Die-cut
Phones & Tablets Supported Android devices with Android OS 6+, iPhone 5S or newer with iOS 10+
Software Compatibility Brady Workstation, Express Labels Mobile App, Windows-based driver for 3rd-party software use
Applications
Application Asset Tracking, Circuit Board Labeling, Component and Equipment Labeling, Data and Telecommunications Labeling, Electrical Labeling, Facility Identification, General Identification, Inventory and Inspection Labeling, Laboratory Labeling, Lean and 5S Labeling, Safety Identification, Warehouse Marking, Wire and Cable Labeling

Hands On, Labels Out

With the M511 kit unpacked and the Brady Express app installed, we put it straight to work on our first real task: a new batch of cables for our 800G networking deployment. In high-density environments, keeping breakouts organized by speed and strand numbering is not optional. It is the difference between a clean install and a troubleshooting nightmare down the road.

Brady P-Flag label on fiber cable.

Example of a Brady P-Flag label applied to a cable.

The first step was installing the Brady Express mobile app, available on both iOS and Android. On iOS, the pairing process was about as simple as it gets. Power on the M511. The Bluetooth indicator lights up blue and blinks, showing it is waiting to pair. Open the app, and the printer is immediately detected and ready to connect; then the light goes solid. No digging through settings, no manual pairing codes, no driver installs. From unboxing to first print took only a few minutes.

Once connected, the app automatically detects the installed label cartridge and adjusts accordingly. Before getting into labels, though, the app prompts you on first launch to select your trade. Brady offers four options: Electrical/Datacom, Lab, Maintenance/Mechanical, and Custom. This is a small but thoughtful touch. By selecting your trade upfront, the app reorganizes the dashboard to surface the label categories most relevant to your work and trims out the ones you are unlikely to need. For a lab or data center environment, selecting Electrical/Datacom keeps the clutter down and puts the right category types front and center.

For Electrical/Datacom, which we focused on specifically, Brady breaks the dashboard into eight label categories: blank, label layouts, breaker box, flags, patch panel, sleeves, terminal blocks, and cable wraps. Each category is tailored to the types of labeling jobs common to that trade, so rather than browsing through a generic list, you are working from a focused set of options that actually map to what you are doing in the field or in the rack.

How We Labeled the Cables

For this deployment, we focused on three label types: flags, wraps, and blank tape. Each breakout in the batch received two labels. The trunk got a self-laminating wrap label identifying it as a 4x100G cable, and each breakout strand got a durable flag label identifying it as A-100G, B-100G, C-100G, or D-100G. This gives anyone pulling cables in the rack immediate context on both the cable type and the specific strand without having to trace anything back to a patch panel or documentation.
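For larger runs, this naming scheme is simple enough to script ahead of time and feed into the app's spreadsheet-import feature. A minimal sketch, with function and field names of our own invention rather than anything from Brady's tooling:

```python
def breakout_labels(trunk_speed: str = "4x100G",
                    strands: int = 4,
                    strand_speed: str = "100G") -> dict:
    """Generate label text for one breakout cable: one wrap label for the
    trunk and one flag label per strand (A-100G, B-100G, ...)."""
    flags = [f"{chr(ord('A') + i)}-{strand_speed}" for i in range(strands)]
    return {"trunk": trunk_speed, "flags": flags}

# One entry per breakout cable in the batch, ready to export as rows.
batch = [breakout_labels() for _ in range(12)]
```

An 8x100G breakout would just be `breakout_labels("8x100G", strands=8)`, keeping the strand lettering consistent across the rack.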

Brady M511 with labeled breakout cable.

Example breakout cable with flags A-D, with speeds and cable wraps.

Selecting a label category in the app automatically prompts you to use the installed cartridge material, preventing you from accidentally designing something that does not match the loaded material. From there, you get a live view of the printable area and full control over the layout.

Designing in the App

The design toolset in Brady Express is more capable than the compact hardware would suggest. You can add and format text, insert images, place barcodes, add dates, draw shapes, build sequences for batch numbering, and import data from a spreadsheet or scan from an external scanner. For repetitive labeling jobs like cable runs, the sequence and import features alone save significant time compared to designing labels one at a time.

Brady preloads 20+ barcode types, 85+ fonts, and 1,400+ symbols directly in the app, and if the built-in library does not cover your needs, you can upload your own fonts as well. The app also supports 35 languages, making it a practical option for teams operating across different regions or facilities. None of it requires an internet connection or a separate design tool to put together a professional label.

When you back out of a label design, the app prompts you to save it as a template. In practice, this proved more useful than expected. Working through multiple cycles of cables, having a saved template meant we could pull up the same design each time without rebuilding it or remembering which font size and styling were used in the previous batch. It keeps the labeling uniform throughout the entire run and reduces the small decisions that slow you down mid-job.

Brady Express also includes a feature called Brady Voice, which lets you speak to create labels instead of typing everything manually. For longer text strings or repetitive label content, this saves a noticeable amount of time. In a busy lab environment where your hands may already be occupied, it is a practical addition that offers more than novelty.

Compatibility and Multi-Device Printing

The Brady Express app works across a solid lineup of Brady printers, including the M211, M610, M611, M511, M710, S3700, i4311, i5300, and i7500. Connectivity varies by model. The M211, M610, and i7500 support one connected device at a time; the i4311 supports up to four; and the M511 sits in the top tier alongside the M611, M710, S3700, and i5300, supporting up to five simultaneous connected devices.

Brady M511 with Brady Express Labels mobile app.

That last point matters in a lab or data center setting. With five devices connected at once, multiple technicians can have the app open and queued to the same printer without anyone having to disconnect and reconnect. For a team working through a large batch of cables, that kind of parallel workflow adds up quickly.

Conclusion

The Brady M511 is one of those tools that is hard to appreciate until you actually have it in your hands and a real job in front of you. On paper, it is a compact Bluetooth label printer. In practice, it is the thing that finally gave the StorageReview lab a labeling workflow that we will actually use in day-to-day operation.

Durability is not a concern. MIL-STD-810G ratings, a battery good for 1,000+ labels per charge, and a material lineup covering vinyl, polyester, self-laminating wraps, nylon cloth, heat shrink, and more mean the M511 travels well beyond a lab setting. Remote sites, field deployments, warehouse floors: it handles them all. The 65-foot Bluetooth range and simultaneous connectivity for up to five devices reinforce its value as a shared team tool rather than something that gets passed around from person to person.

The experience from first pairing to finished labels is frictionless. The Brady Express app is well thought out; the trade-based setup keeps the interface focused, and features like template saving, Brady Voice, and sequence printing noticeably speed up your workflow. For larger or more complex labeling projects, Brady Workstation on the desktop extends that further with deeper design control, batch printing, and the full Product and Wire ID suite for teams that need to manage labeling at scale. For our 800G cable deployment, having a consistent, readable labeling system across every breakout strand is the kind of detail that pays off every time someone works in that rack. And with rack labeling, server identification, and asset tagging all on the roadmap as the lab projects continue, the M511 will stay busy.

On price, the standalone M511 at $399.99 is a straightforward buy for any facility serious about infrastructure organization. The M511-KIT at $551.99 is the most valuable entry point for most users, bundling a hard case, multiple label cartridges, a battery bank, mounting accessories, and Brady software. For a team starting from scratch, it covers everything needed in one purchase, and at that price point, the value is hard to argue with.

After 25 years of “just follow the cable and see where it ends up,” the StorageReview lab is finally labeled. It only took an 800GbE refresh and a little organizational shame to get us here.

Product Page – Brady M511 Portable Bluetooth Label Printer

The post Brady M511 Review: We Finally Labeled Our Lab appeared first on StorageReview.com.

NerdioCon ’26 Focuses on Azure Virtual Desktop, Windows 365, and AI Convergence

2 April 2026 at 14:26
nerdiocon 26 event graphic

We’ve attended many events over the years, but something about NerdioCon feels different. Maybe it’s the focus. Maybe it’s the people. Or perhaps it’s that when you’re interested in Azure Virtual Desktop (AVD) and Windows 365 (W365), you find yourself surrounded by hundreds of others who care just as much as you do. Whatever the reason, we’re genuinely excited to attend NerdioCon in Palm Springs from May 4-6.

nerdiocon 26 event graphic

We cover End User Computing (EUC), virtualization, and increasingly, the intersection of AI. AVD and W365 are central to this convergence. These are no longer niche solutions; they are strategic platforms for organizations aiming to modernize their desktops without sacrificing control or user experience. Over the past few years, Nerdio has positioned itself at the heart of this transformation, and NerdioCon represents the peak of that momentum. Since we have not attended this event in the past, we reached out to recognized EUC experts for more information for this article.

Bernhard Tritsch captured it perfectly when he said, “AVD and Windows 365 knowledge in perfect combination with a spectacular venue – that’s NerdioCon in Palm Springs. This is the place to be if you want to learn how to provision, manage, and optimize Windows desktops on Azure.” We feel that the combination of deep technical knowledge and a destination that encourages real conversations is hard to replicate. Palm Springs is not just a backdrop; it creates an atmosphere where people are likely to linger, talk shop, and go deeper and more truthfully than they might in a rather sterile convention center ballroom.

Niall Jennings seemed to agree with Bernhard when he reflected on his time at NerdioCon. “I last attended NerdioCon in 2023 when it was still in Cancun, and it’s great to see it has only grown bigger and better since moving to Palm Springs. The sales and technical streams were clearly signposted, so it was easy to choose the sessions most relevant to me. I found the technical breakout sessions especially valuable from both an education and networking perspective. It’s a great opportunity to learn new things and connect with vendors across the virtual desktop space. The all-inclusive resort certainly helped with the networking! Overall, great event, hoping to get back soon.” We totally agree with Niall’s view on how transparent the topics are, which makes it easier to pick sessions relevant to your interests.

There is also the people factor. Tom Dodds from 10ZiG put it this way: “I’ve been to the past two NerdioCon in Punta Cana and Palm Springs. It’s fair to say that Nerdio really knows how to put on an event. It is the perfect balance of great people, great content, and great solutions, which all come together to make a thoroughly enjoyable and rewarding event.” That balance is critical; too much marketing and you tune out. Too much deep technical content without context, and you lose part of the audience. NerdioCon seems to walk that line intentionally.

Trentent Tye from ControlUp adds another layer to why this event matters. “You know what you get when you bring together a bunch of top-tier talent that really knows their AVD stuff? NerdioCon! If you’re ever looking for inspiration, knowledge, or ideas for your AVD environment, the people who attend NerdioCon will have your back!” We agree with Trentent that an IT community is not about dispersing marketing fluff; it is about a deep understanding of the technology, and with cloud-based desktops, the technology can move quickly. Best practices evolve. Microsoft introduces new capabilities. Costs shift. Having a gathering of practitioners who deal with and understand all of this makes the event invaluable.

Chantelle Morales Smith, Alliance and Channel Manager at Numecent, builds on this and summarizes NerdioCon when she said, “My role at Numecent is centered on collaboration. NerdioCon is the perfect venue to celebrate and explore that. I’m excited to dive in with my peers at Nerdio, Microsoft, and other strategic partners. Just as important is connecting with our customers and the MSP community to understand the challenges they face and how we can better support their evolving needs. It’s all about building a stronger, more connected community together.”

NerdioCon is about perspective. It’s easy to get lost in daily testing, writing, benchmarking, and lab work. At StorageReview.com, we dedicate a lot of time and energy to evaluating platforms, trying out different configurations, and considering how emerging technologies like AI will intersect with desktop virtualization. Events like this give us a chance to step back, talk with others outside our usual circle, and see the bigger picture. To do our job well, we need to understand what customers are struggling with, what people in the ecosystem are building around AVD and Windows 365, and where the market is headed in the coming year.

We look forward to hallway conversations almost as much as the keynotes. Some of our best ideas for articles and projects come from informal chats at events like these. A casual comment about image sprawl can turn into an in-depth discussion on automation. A quick demo of a monitoring tool can ignite a broader conversation about observability in cloud desktops. Those accidental moments are hard to plan, but they are often the most valuable.

Yes, we are excited to be heading to NerdioCon. We’re eager to learn about the latest technology in cloud-based desktops. We’re also looking forward to reconnecting with people we’ve met before and meeting new ones who share our passion for AVD and Windows 365. Most of all, we’re excited to see how Nerdio has evolved its platform and matured its ecosystem. We believe we’ll return with new ideas, contacts, and a clearer understanding of where the cloud desktop industry is headed next.

NerdioCon 2026 will be held May 4-7 at the La Quinta Resort & Club, 49-499 Eisenhower Drive, La Quinta, California, near Palm Springs. You can register for the event here.

The post NerdioCon ’26 Focuses on Azure Virtual Desktop, Windows 365, and AI Convergence appeared first on StorageReview.com.

AMD Instinct MI355X Achieves MLPerf Inference v6.0 Gains with Over 1 Million Tokens per Second and Supports Scalable ROCm Stack

2 April 2026 at 11:50

AMD has released its MLPerf Inference v6.0 results, positioning the Instinct MI355X GPU as a scalable inference platform across single-node, multinode, and heterogeneous deployments. The submission extends beyond incremental gains by adding new workloads, demonstrating cluster-scale throughput exceeding 1 million tokens per second, and validating reproducibility across a growing partner ecosystem.

CDNA 4 Architecture Targets High-Capacity Inference

The Instinct MI355X GPU is based on AMD’s CDNA 4 architecture, a multi-chiplet design fabricated on TSMC’s 3nm and 6nm FinFET processes: the compute dies (XCDs) use the 3nm node, while the I/O dies use 6nm. The package integrates 185 billion transistors across those chiplets rather than on a monolithic die, and supports FP4 and FP6 data formats. Each GPU includes up to 288GB of HBM3E memory, enabling support for models up to 520 billion parameters on a single device. AMD positions this combination of compute density and memory capacity as critical to large-model inference without excessive model partitioning.
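The single-device claim checks out arithmetically: at FP4, each parameter occupies half a byte, so 520 billion parameters need roughly 260GB of weights, which fits inside the MI355X's 288GB of HBM3E (ignoring KV cache and runtime overhead). A quick sanity check:

```python
# Back-of-the-envelope weight footprint for a model at a given precision.
# Illustrative only: ignores KV cache, activations, and runtime overhead.
def weight_footprint_gb(num_params: float, bits_per_param: int) -> float:
    """Return model weight size in GB (10^9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9

params = 520e9  # 520B-parameter model, per AMD's claim

fp4_gb = weight_footprint_gb(params, 4)    # FP4 = 4 bits per parameter
fp16_gb = weight_footprint_gb(params, 16)  # FP16 for comparison

print(f"FP4 weights:  {fp4_gb:.0f} GB")   # ~260 GB, under 288 GB of HBM3E
print(f"FP16 weights: {fp16_gb:.0f} GB")  # ~1040 GB, would need partitioning
```

The same arithmetic shows why FP16 would force the model to be partitioned across four or more devices.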

The platform is available in UBB8 configurations with both air-cooled and direct liquid-cooled options, aligning with data center deployment requirements.

Multinode Throughput Surpasses 1 Million Tokens per Second

A key result from this round is AMD surpassing 1 million tokens per second at the cluster scale. Using Instinct MI355X GPUs, AMD achieved this threshold on Llama 2 70B in both Server and Offline scenarios, and on GPT-OSS-120B in Offline.

AMD MLPerf 1M tokens per second graphic

These results reflect a shift toward evaluating inference performance at the cluster level rather than per accelerator. Aggregate throughput and time-to-serve are increasingly used to determine production readiness for large-scale AI deployments.

AMD also demonstrated efficient scaling. On Llama 2 70B, a configuration of 11 nodes and 87 GPUs reached over 1 million tokens per second across Offline, Server, and Interactive scenarios, with scale-out efficiency ranging from 93% to 98%. On GPT-OSS-120B, a 12-node, 94-GPU cluster achieved similar throughput with over 90% scaling efficiency. These results indicate that performance gains translate effectively as deployments expand beyond a single system.
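Scale-out efficiency here is the usual ratio of measured cluster throughput to ideal linear scaling (node count times per-node throughput). AMD did not publish the per-node baseline in this summary, so the numbers below are hypothetical, chosen only to illustrate the calculation:

```python
# Scale-out efficiency: measured cluster throughput vs. ideal linear scaling.
# The per-node baseline below is hypothetical, not an AMD-published figure.
def scaleout_efficiency(cluster_tps: float, nodes: int, per_node_tps: float) -> float:
    """Fraction of ideal linear scaling achieved by the cluster."""
    return cluster_tps / (nodes * per_node_tps)

# Hypothetical: 11 nodes, assumed 100,000 tokens/sec per node in isolation,
# cluster measured at 1,023,000 tokens/sec aggregate.
eff = scaleout_efficiency(1_023_000, 11, 100_000)
print(f"{eff:.1%}")  # 93.0%
```

Efficiencies in the 93% to 98% range mean the interconnect and scheduler are losing only a few percent of throughput as nodes are added.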

Generational Gains and Competitive Single-Node Performance

AMD reported a 3.1x performance increase on Llama 2 70B Server compared to the prior Instinct MI325X generation, reaching 100,282 tokens per second. The improvement reflects both architectural changes and ROCm software optimizations. Offline scores improved by 4.4x and Server scores improved by 4.8x compared to prior rounds. These gains are primarily attributed to FP4 quantization.

AMD Inference results vs previous gen graphic

In single-node comparisons, the MI355X demonstrated competitive positioning against NVIDIA platforms. On Llama 2 70B, AMD matched the NVIDIA B200 in Offline throughput, reached near parity in Server performance, and exceeded it in Interactive performance. Against NVIDIA’s B300, the MI355X delivered 92% of its Offline throughput and 93% in Server mode, and exceeded it at 104% in Interactive mode.

First-Time Model Enablement Expands Coverage

MLPerf Inference v6.0 includes several new workloads, and AMD used this round to demonstrate rapid model enablement. GPT-OSS-120B, a mixture-of-experts model, was introduced for the first time and achieved competitive results compared to NVIDIA systems across both Offline and Server scenarios.

AMD also submitted results for Wan-2.2 text-to-video generation, marking its entry into multimodal and generative video inference. While the official submission focused on Single Stream latency, the results were competitive with those of existing platforms. Post-submission tuning further improved performance, indicating headroom for optimization as software matures.

These additions highlight AMD’s focus on expanding beyond traditional LLM benchmarks to support emerging AI workloads.

ROCm Software Enables Scaling and Heterogeneous Inference

AMD attributes much of the performance and scalability to its ROCm software stack. Enhancements include optimized FP4 execution, improved GPU-to-GPU communication for distributed inference, and support for dynamic workload distribution across heterogeneous environments.

AMD MLPerf inference results instinct mI355x graphic

The initial MLPerf heterogeneous submission was developed using three AMD Instinct GPU models: MI300X, MI325X, and MI355X. Submitted by Dell and MangoBoost, the configuration achieved 141,521 tokens per second on Llama 2 70B Server and 151,843 tokens per second on Llama 2 70B Offline.

Worth noting, the AMD Instinct MI355X platform was located in Dell’s lab in the United States, while the Instinct MI300X and MI325X platforms were in Korea. This demonstrates the capability to coordinate systems across different geographic locations.

Ecosystem Growth and Reproducibility

AMD’s partner ecosystem expanded in this MLPerf round, with nine companies submitting results across multiple Instinct GPU generations. Participating vendors included Cisco, Dell, Giga Computing, HPE, MangoBoost, MiTAC, Oracle, Supermicro, and Red Hat.

Partner submissions closely matched AMD’s internal results, typically within 4% and, in some cases, within 1%. This consistency indicates that performance is reproducible across OEM and cloud platforms, reducing deployment risk and improving confidence in real-world outcomes.

The post AMD Instinct MI355X Achieves MLPerf Inference v6.0 Gains with Over 1 Million Tokens per Second and Supports Scalable ROCm Stack appeared first on StorageReview.com.

Dell Technologies Enhances PowerProtect Portfolio for Improved Cyber Resilience

2 April 2026 at 11:45
Dell PowerProtect open top view

Dell Technologies has announced several updates to its PowerProtect portfolio, focusing on management simplicity, integrated artificial intelligence, and expanded hardware options for mid-sized environments. These enhancements target the growing complexity of distributed workloads across on-premises, edge, and cloud infrastructure.

PowerProtect Data Manager Evolution

The latest iteration of PowerProtect Data Manager introduces a unified dashboard that centralizes visibility across distributed systems. This interface consolidates monitoring into a single view to reduce operational overhead and provide a clearer picture of protection status across the enterprise.

Dell is also integrating a new AI Assistant directly into the Data Manager UI. This tool provides contextual guidance and intelligent navigation to assist administrators in troubleshooting and optimizing configurations. By offering proactive recommendations, the assistant aims to accelerate problem resolution and simplify compliance auditing processes.

Advanced Anomaly Detection and Security

Security capabilities within Data Manager now include expanded anomaly detection for Dell PowerStore snapshots. This feature is designed to identify potential ransomware threats early in the data lifecycle. All anomaly signals from workloads, storage, and protection policies are now aggregated on a dedicated landing page in the UI, enabling faster response to irregularities.

To meet evolving regulatory requirements and NIST standards, the PowerProtect Data Domain Operating System now supports TLS 1.3. This update ensures that the underlying infrastructure remains compliant with modern security protocols and encrypted communication standards.

PowerProtect Data Domain DD3410 Appliance

The PowerProtect Data Domain DD3410 is now available, targeting medium-sized businesses and remote office/branch office (ROBO) locations. Occupying a 2U footprint, the appliance scales from 8TB to 40TB of usable capacity. It is designed for low power and cooling requirements while maintaining the security features found in larger Data Domain models.

Dell PowerProtect open top view

The DD3410 supports both traditional and modern workloads and integrates natively with PowerStore for streamlined backup and recovery. Furthermore, the appliance is now supported as a vault target within the PowerProtect Cyber Recovery ecosystem, bringing enterprise-grade vaulting to smaller sites.

Storage Efficiency and Cyber Recovery

Data Domain continues to lead in storage efficiency, with real-world telemetry from Data Manager users indicating data reduction ratios as high as 75:1. This level of deduplication significantly lowers the total cost of ownership by reducing the physical storage footprint required for long-term retention.
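A reduction ratio translates directly into physical footprint: physical capacity is logical data divided by the ratio. A quick sketch with a hypothetical logical dataset size (the 75:1 ratio is Dell's telemetry figure; the dataset size is made up):

```python
# Physical storage required under a given data-reduction ratio.
def physical_tb(logical_tb: float, reduction_ratio: float) -> float:
    """Physical capacity needed after deduplication/compression."""
    return logical_tb / reduction_ratio

# Hypothetical example: 1,500 TB of logical backup data at Dell's cited 75:1
logical = 1500.0  # TB
print(f"{physical_tb(logical, 75):.0f} TB physical")  # 20 TB
```

At that ratio, long-term retention that would otherwise consume petabytes lands in a footprint a DD3410-class appliance can hold.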

For organizations deploying Cyber Recovery and CyberSense, Dell has introduced Cyber Recovery Essentials. This offering provides pre-validated reference architectures and standardized configurations to accelerate deployment. Additionally, the portfolio now includes enhanced analytics support for Oracle RAC with ASM, broadening the scope of protected database environments.

The post Dell Technologies Enhances PowerProtect Portfolio for Improved Cyber Resilience appeared first on StorageReview.com.

NVIDIA Sets MLPerf Inference v6.0 Records with Blackwell Ultra Platform

1 April 2026 at 18:35
NVIDIA MLPerf v6 graphic

NVIDIA has published results for MLPerf Inference v6.0, highlighting system-level gains driven by tight co-design across hardware, software, and models. The company positions inference throughput and token economics as the primary metrics for AI factory performance, moving beyond peak accelerator specifications to measured output under real workloads.

In this round, systems built on NVIDIA Blackwell Ultra GPUs delivered the highest throughput across all submitted models and scenarios. The ecosystem around the platform also expanded, with 14 partners submitting results, including major OEMs, cloud providers, and integrators such as ASUS, Cisco, CoreWeave, Dell Technologies, GigaComputing, Google Cloud, HPE, Lenovo, Nebius, Netweb Technology, QCT, Red Hat, Supermicro, and Lambda.

Expanded Benchmark Coverage Reflects Emerging Workloads

MLPerf Inference v6.0 introduces several new benchmarks to represent current AI deployments better. NVIDIA was the only vendor to submit across all new tests, spanning large language models, multimodal systems, generative video, and recommendation engines.

Key additions include DeepSeek-R1 Interactive, which evaluates higher interactivity with faster token delivery and reduced time to first token compared to prior server scenarios. The suite also adds Qwen3-VL-235B-A22B, marking the first multimodal vision-language model in MLPerf Inference, and GPT-OSS-120B, a mixture-of-experts reasoning model tested across offline, server, and interactive scenarios.

Scenario | DeepSeek-R1 | GPT-OSS-120B | Qwen3-VL | Wan 2.2 | DLRMv3
Offline | 2,494,310 tokens/sec* | 1,046,150 tokens/sec | 79 samples/sec | 0.059 samples/sec | 104,637 samples/sec
Server | 1,555,110 tokens/sec* | 1,096,770 tokens/sec | 68 queries/sec | 21 secs (Single Stream)** | 99,997 queries/sec
Interactive | 250,634 tokens/sec | 677,199 tokens/sec | *** | *** | ***

* Not a new scenario in MLPerf Inference v6.0
** Wan 2.2 features a single stream scenario, which measures end-to-end request latency, instead of a server scenario. Lower is better.
*** Not tested in MLPerf Inference v6.0

Generative media and recommendation workloads are now included. The Wan 2.2 text-to-video model features both latency-sensitive and throughput-focused tests, while DLRMv3 replaces previous recommendation benchmarks with a transformer-based architecture that boosts compute intensity and model complexity.

Software Optimization Drives Measurable Gains

A notable aspect of this submission is the performance uplift achieved on existing hardware through software updates. NVIDIA reports up to 2.7x higher token throughput on the GB300 NVL72 platform for DeepSeek-R1 server scenarios compared to results from six months prior. This improvement translates to materially lower cost per token and higher utilization of deployed infrastructure.
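The cost claim follows from simple inverse scaling: with infrastructure cost held fixed, a 2.7x throughput uplift cuts cost per token to roughly 37% of the prior figure. A sketch with hypothetical cost and throughput numbers (not NVIDIA's):

```python
# Cost per token falls inversely with throughput when infrastructure cost
# is held fixed. All figures below are illustrative, not NVIDIA pricing.
def cost_per_million_tokens(hourly_cost: float, tokens_per_sec: float) -> float:
    """USD per million tokens at a given system cost and throughput."""
    return hourly_cost / (tokens_per_sec * 3600) * 1e6

baseline = cost_per_million_tokens(100.0, 10_000)  # hypothetical baseline
improved = cost_per_million_tokens(100.0, 27_000)  # same system, 2.7x faster

print(f"baseline: ${baseline:.3f}/Mtok, improved: ${improved:.3f}/Mtok")
print(f"cost ratio: {improved / baseline:.3f}")  # 1/2.7, about 0.370
```

The same relationship is why software-only uplifts on deployed hardware are framed as direct reductions in token economics.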

NVIDIA MLPerf v6 graphic

These gains are attributed to updates in the TensorRT-LLM stack and associated frameworks. Kernel-level optimizations and fusion techniques reduce execution overhead, while improved attention data parallelism more effectively balances workloads across GPUs. Additional enhancements in the Dynamo distributed inference framework enable disaggregated serving, allowing independent optimization of prefill and decode phases.

For mixture-of-experts models, techniques like Wide Expert Parallel distribute expert weights across GPUs to reduce memory bottlenecks. Multi-token prediction boosts compute efficiency in low-batch, latency-sensitive scenarios by generating and validating multiple tokens at once. KV-aware routing further enhances scheduling by directing inference requests based on estimated compute costs.

Benchmark | GB300 NVL72 v5.1 | GB300 NVL72 v6.0 | Speedup
DeepSeek-R1 (Server) | 2,907 tokens/sec/gpu | 8,064 tokens/sec/gpu | 2.77x
DeepSeek-R1 (Offline) | 5,842 tokens/sec/gpu | 9,821 tokens/sec/gpu | 1.68x
Llama 3.1 405B (Server) | 170 tokens/sec/gpu | 259 tokens/sec/gpu | 1.52x
Llama 3.1 405B (Offline) | 224 tokens/sec/gpu | 271 tokens/sec/gpu | 1.21x

NVIDIA also demonstrated continued scaling on established models. On Llama 3.1 405B, the GB300 NVL72 platform achieved a 1.5x performance increase in server scenarios, indicating ongoing optimization for dense LLMs alongside newer architectures.

Open Ecosystem and Framework Integration

Submissions across new workloads leveraged a mix of NVIDIA and open-source frameworks. The Qwen3-VL benchmark used the vLLM framework, reflecting the rapid development in multimodal inference optimization. The Wan 2.2 text-to-video results were powered by TensorRT-LLM VisualGen, targeting diffusion-based pipelines on GPUs.

For DLRMv3, NVIDIA combined its recsys-example framework with GPU-accelerated embedding lookup technologies to handle the increased demands of transformer-based recommendation models. These integrations underscore the role of the broader software ecosystem in extracting performance from the underlying hardware.

Scale-Out Performance with InfiniBand

NVIDIA also showcased large-scale inference performance using four GB300 NVL72 systems connected via Quantum-X800 InfiniBand. This setup, with a total of 288 Blackwell Ultra GPUs, marks the largest MLPerf Inference submission to date and achieved system-level throughput of millions of tokens per second on DeepSeek-R1.

DeepSeek-R1 (4x GB300 NVL72) | Tokens/Second
Offline | 2,494,310
Server | 1,555,110

The results highlight the importance of high-performance interconnects in scaling inference workloads, particularly for distributed LLM serving and high-throughput batch processing.

Toward Service-Level Benchmarking

Looking ahead, NVIDIA is helping develop the MLPerf Endpoints within the MLCommons consortium. This upcoming benchmark aims to measure deployed inference services using real API traffic, giving insight into latency, throughput, and efficiency at the service level rather than just at the component level.

As AI workloads develop into agentic systems with longer context windows, benchmarks that measure end-to-end service performance are expected to become more important for both cloud providers and enterprise deployments.

The post NVIDIA Sets MLPerf Inference v6.0 Records with Blackwell Ultra Platform appeared first on StorageReview.com.

Veeam Releases Open-Source MCP Server for Backup and Recovery Intelligence

31 March 2026 at 17:03
Veeam Intelligence MCP Server

Veeam has launched the Veeam Intelligence MCP Server, designed to bring backup, recovery, malware, and compliance information into broader enterprise IT operations. The server is built to give teams a single conversational interface for day-to-day operations, planned infrastructure changes, and incident response, with customer control over deployment, data exposure, and integration with AI clients.

Veeam Intelligence MCP Server

Veeam states that the server is designed for environments where operational signals span backup solutions, monitoring tools, ticketing systems, security platforms, and other systems, and says it reduces the manual work of checking multiple consoles, correlating alerts across platforms, and moving information between teams during an outage or investigation. Instead of requiring staff to work through separate interfaces, the system makes Veeam Intelligence available within broader operational workflows through the Model Context Protocol (MCP).

Through MCP, the server enables organizations to integrate Veeam’s data protection and recovery with information from external systems into a single workflow. That includes the ability to compare Veeam’s protection, recovery, malware, and compliance signals with events from IT service management platforms, cloud environments, security products, storage systems, and monitoring tools. The goal is to make it easier for operators to ask natural-language questions, investigate issues that span multiple systems, and get a consolidated operational view without switching between separate products.

In its current form, the server focuses on read-only access, cross-system visibility, and investigation workflows. No destructive or configuration-changing actions are enabled by default. Queries are described as authenticated, authorized, and fully auditable, with a separation between present-day intelligence features and any future action-based capabilities. It is deployed locally as a Docker container and is fully operated and governed by the customer or their service provider.

Veeam also says customers can choose which MCP-compatible AI clients to use, whether hosted assistants such as ChatGPT and Claude or local and self-hosted large language models, depending on their own security and data sovereignty requirements.
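MCP-compatible clients typically register servers through a JSON configuration entry. As a rough sketch, wiring a client to a locally deployed, containerized MCP server might look like the following; the server key, image name, and environment variable here are hypothetical placeholders, not values Veeam has published:

```json
{
  "mcpServers": {
    "veeam-intelligence": {
      "command": "docker",
      "args": ["run", "-i", "--rm", "example/veeam-intelligence-mcp:latest"],
      "env": {
        "VEEAM_API_URL": "https://vbr.example.internal:9419"
      }
    }
  }
}
```

Given the read-only, audited posture Veeam describes, a client configured this way could query protection and recovery status but not change it.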

Use Cases

Veeam highlights morning health checks, pre-change validation, ransomware triage, and root cause analysis as use cases for the server. These scenarios focus on giving teams a prioritized view of health and recoverability; confirming that backup jobs, repositories, and configuration backups are resilient to upcoming changes; correlating malware events with affected workloads and clean restore points; and consolidating information such as session histories, repository health, and proxy states when jobs fail.

It will launch with support for Veeam Backup & Replication, Veeam ONE, and Veeam Service Provider Console. The server is open source and available on GitHub.

The post Veeam Releases Open-Source MCP Server for Backup and Recovery Intelligence appeared first on StorageReview.com.

Dell PowerEdge R5715 Review: 2U Single-Socket AMD EPYC for Storage-Forward Workloads

31 March 2026 at 14:56

The PowerEdge R5715 is the second part of Dell’s SMB-focused extension to the 17th Generation PowerEdge family, and it starts with different priorities than its 1U sibling. Where the R4715 optimizes for compute density and core-per-rack-unit efficiency, the R5715 is built around storage capacity and I/O expandability in a 2U single-socket footprint. Readers coming from our R4715 review will find the platform fundamentals familiar: the same 5th Generation AMD EPYC processor family, the same 24-slot DDR5 memory architecture, and the same iDRAC 10 management stack. What changes is the task the R5715 is asked to perform.

Dell PowerEdge R5715 front with bezel.

Our review unit was configured with a single AMD EPYC 9015, the 8-core entry in the Turin lineup, paired with 384GB of DDR5 and a BOSS RAID1 boot configuration. The R5715’s 12-bay 3.5-inch storage backplane was the focus of our testing, which is exactly where the 9015 makes sense. Workloads like file serving, backup targets, and retail video surveillance don’t need 32 cores; they need drive density, sustained throughput, and a reliable management story. The 9015 keeps power consumption and licensing costs low, while the platform delivers up to 288TB of raw storage capacity in a single 2U node.
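The capacity math is straightforward: 288TB across 12 bays implies 24TB drives, while our review unit's four 20TB HDDs total 80TB raw. A minimal sketch:

```python
# Raw capacity math for the R5715's 12-bay 3.5-inch backplane.
def raw_capacity_tb(bays: int, drive_tb: int) -> int:
    """Raw (pre-RAID) capacity in TB for a populated bay count."""
    return bays * drive_tb

print(raw_capacity_tb(12, 24))  # 288 TB: the platform maximum, 24 TB HDDs
print(raw_capacity_tb(4, 20))   # 80 TB: our review unit as shipped
```

RAID overhead and filesystem formatting reduce usable capacity below these raw figures, which is why the spec sheet quotes raw numbers.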

The R5715 also increases the PCIe Gen5 slot count to four, up from the R4715’s three, and adds support for an extra OCP 3.0 networking slot, providing more room to grow as I/O demands rise. Both platforms support 100 GbE and 400 GbE via PCIe AIC, making them a capable fit for environments with high-bandwidth networking requirements; however, neither platform officially supports Fibre Channel connectivity, and neither supports GPUs or DPUs. Both run on the same 800W and 1100W PSU options in Platinum or Titanium efficiency grades, with fault-tolerant redundancy supported and air cooling throughout.

Dell PowerEdge R5715 Specifications

The table below highlights the physical and hardware specifications for the Dell PowerEdge R5715 platform.

Specification Dell PowerEdge R5715
Processor
Processor One 5th Generation AMD EPYC 9005 Series processor, up to 32 cores
Form Factor 2U rack server
Memory
DIMM Slots 24 DDR5 DIMM slots
Maximum Memory 1.5 TB (up to 64 GB per DIMM)
Memory Speed Up to 5200 MT/s
Memory Type Registered ECC DDR5 RDIMMs only
Storage
Internal Controllers (RAID) PERC H365i, H965i
Internal Boot BOSS-N1 DC-MHS
External HBAs N/A
Front Drive Bays 12x 3.5-inch SAS/SATA
16x 2.5-inch SAS/SATA
Power
Power Supplies Platinum 800 W, 1100 W
Titanium 800 W, 1100 W
FTR supported
Cooling & Fans
Cooling Options Air cooling
Fans Up to six hot plug fans
Dimensions
Height 86.8 mm (3.41 inches)
Width 482.0 mm (18.97 inches)
Depth (with bezel) 802.4 mm (31.59 inches)
Depth (without bezel) 801.51 mm (31.55 inches)
Bezel Optional metal bezel
Networking & Expansion
OCP Network Options 2x OCP NIC 3.0 (optional), 1GbE, 10GbE, 25GbE
Slot 4: 1×16 Gen5 OCP 3.0
Slot 10: 1×16 Gen5 OCP 3.0
Embedded NIC 1 Gb dedicated BMC Ethernet port
PCIe AIC NIC 100 GbE and 400 GbE; NDR VPI (400 GbE)
PCIe Slots Up to 4 Gen5 PCIe slots (x16 connectors)
Slot 2: 1×16 Gen5 Full Height
Slot 3: 1×16 Gen5 Full Height
Slot 7: 1×16 Gen5 Full Height
Slot 9: 1×16 Gen5 Full Height
GPU Options N/A
Ports
Front Ports 1x USB 2.0 Type-A (optional LCP KVM)
1x USB 2.0 Type-C (HOST/BMC Direct)
1x MiniDisplayPort (optional LCP KVM)
Rear Ports 2x USB 3.1 Type-A
1x VGA
1 Gb dedicated BMC Ethernet port
Internal Ports 1x USB 3.1 Type-A
Management
Embedded Management iDRAC10, iDRAC Direct, iDRAC RESTful API with Redfish, RACADM CLI, Quick Sync 2 wireless module
OpenManage Software OpenManage Enterprise (OME), OME Power Manager, OME Services, OME Update Manager, OME APEX AIOps Observability, OME Integration for VMware vCenter, OME Integration for Microsoft System Center, OpenManage Integration for Windows Admin Center
Tools IPMI
Integrations OpenManage Integrations: Red Hat Ansible Collections, Terraform Providers
Change Management Dell Repository Manager, Dell System Update, Enterprise Catalogs, Server Update Utility (SUU)
Security
Security Features Cryptographically signed firmware, Data at Rest Encryption (SEDs with local or external key mgmt), Secure Boot, Secured Component Verification (hardware integrity check), Secure Erase, Silicon Root of Trust, System Lockdown (requires iDRAC10 Enterprise or Datacenter), TPM 2.0 FIPS/CC-TCG certified, Chassis Intrusion Detection, AMD Secure Encrypted Virtualization (SEV), AMD Secure Memory Encryption (SME)
Operating Systems & Hypervisors
Supported OS / Hypervisors Canonical Ubuntu Server LTS, Microsoft Windows Server with Hyper-V, Red Hat Enterprise Linux, SUSE Linux Enterprise Server, VMware ESXi

The Dell PowerEdge R5715 is a 2U single-socket rack server built around AMD’s 5th Generation EPYC 9005 Series platform. Positioned as a storage-forward platform for organizations that need high capacity and solid I/O without the cost overhead of a dual-socket design, the R5715 targets workloads such as databases, file shares, backup targets, and virtualization, where a single powerful EPYC processor can handle the load more efficiently than a pair of older-generation CPUs. With support for up to 288 TB of raw storage and four PCIe Gen5 expansion slots, the R5715 punches well above its price class.

Exterior and Front Panel

The R5715 ships with an optional metal bezel featuring Dell’s signature hexagonal mesh pattern. The bezel snaps cleanly onto the chassis and exposes the front-panel controls on the right-hand side: a power button, a USB 2.0 Type-C port for direct BMC access, an iDRAC Direct port, and a system ID button. The chassis itself measures 3.41 inches tall, 18.97 inches wide, and 31.55 inches deep without the bezel, fitting standard 2U rack positions. Build quality is enterprise-grade throughout, with tool-less drive-bay latches and blue retention clips used consistently across internal components to enable quick-release access.

Dell PowerEdge R5715 front right ear.

Storage Configuration

The review unit comes with a 12x 3.5-inch SAS/SATA front-bay setup, featuring four bays filled with 20 TB SATA 6 Gb/s 7.2k large-form-factor HDDs and eight bays left empty for future upgrades. An alternative 16x 2.5-inch SAS/SATA backplane configuration is also available, depending on workload requirements. RAID duties are managed by either the PERC H365i or the higher-tier PERC H965i internal controller. Boot is handled separately through a dedicated rear BOSS-N1 DC-MHS module, isolating the OS from the data pool. This clean design choice prevents the common mistake of running OS and workload storage on the same array.

Dell PowerEdge R5715 front drive bays.

Processor and Cooling

The R5715 is a single-socket platform built around AMD’s EPYC 9005 Series, supporting up to 32 cores. The large heatsink is a finned tower cooler with embedded copper heat pipes, mounted via six captive screws to the SP5 socket. Cooling is all-air; up to six hot-plug fans move airflow front-to-back through the chassis. Liquid cooling is not available on this platform.

Dell PowerEdge R5715 top overview lid off.

Memory

The R5715 carries 24 DDR5 DIMM slots arranged in two banks flanking the CPU socket. The platform is RDIMM-only; no support for UDIMMs or LRDIMMs. Maximum capacity tops out at 1.5 TB using 64 GB RDIMMs per slot, running at up to 5200 MT/s. The review unit ships with several slots populated, leveraging EPYC’s multi-channel memory architecture to deliver high aggregate bandwidth across the memory subsystem.

Dell PowerEdge R5715 heatsink and memory.

PCIe Expansion and Networking

The R5715 offers up to four full-height PCIe Gen5 x16 slots across slots 2, 3, 7, and 9, distributed across five labeled riser positions (Risers 1 through 5) visible throughout the chassis interior. Two additional OCP NIC 3.0 slots (slots 4 and 10, Gen5 x16) support 1GbE, 10GbE, or 25GbE OCP network adapters. For high-bandwidth connectivity, PCIe AIC NICs support up to 100 GbE and 400 GbE, with NDR VPI (400 GbE). A dedicated 1 Gb BMC Ethernet port is embedded on the rear panel for out-of-band iDRAC management. There are no GPU options on the R5715; this is a storage and compute platform, not an accelerator chassis.

Dell PowerEdge R5715 riser 3 OCP slot.

iDRAC10 Management

Remote management for the R5715 is handled by iDRAC10, the same platform Dell ships as standard across its entire 17th-generation PowerEdge lineup, including the PowerEdge R770 and PowerEdge R7725 we previously reviewed. The interface is consistent across the portfolio, meaning administrators already familiar with iDRAC on other PowerEdge platforms will feel at home immediately.

The iDRAC10 dashboard provides a full, at-a-glance health summary of every major subsystem: System Health, Processor, Memory, Cooling, Storage, Voltages, Power Supplies, Batteries, and Intrusion Detection. On our review unit, all subsystems reported healthy at the time of testing. System information and firmware version details are displayed directly on the dashboard alongside license status, which on the review unit is Enterprise. The Task Summary panel tracks pending, in-progress, and completed jobs; ours showed completed jobs from an initial provisioning cycle, including a small number with errors and one failure, typical of a fresh deployment.

Drilling into the System Environments section reveals cooling details, including individual fan status, PWM speeds, thermal profile settings, and inlet temperature readings, all in real time. This is especially useful for validating airflow in dense rack configurations or troubleshooting thermal issues without needing physical access to the server.

Power visibility follows the same pattern. The Power Info section breaks down PSU health, current draw, and capacity utilization alongside a rolling historical trend graph. Administrators can quickly see average and peak wattage over time, which is valuable for capacity planning and identifying workload-driven power spikes without needing a separate power monitoring tool.

Together, these views make iDRAC10 a capable out-of-band management solution that covers the full operational lifecycle of the R5715, from initial deployment through day-to-day monitoring, all accessible remotely via browser or the RESTful Redfish API.
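Since iDRAC10 exposes the DMTF Redfish API, the same telemetry shown in the UI can be pulled programmatically. Below is a minimal sketch that parses an illustrative Thermal resource; the field names follow the Redfish schema, but the values are made up for demonstration rather than captured from our unit, and the endpoint in the comment is the typical iDRAC default path:

```python
import json

# Illustrative Redfish Thermal payload. Field names follow the DMTF Redfish
# schema; the readings below are invented for this example.
sample = json.loads("""
{
  "Fans": [
    {"Name": "Fan 1", "Reading": 7080, "ReadingUnits": "RPM", "Status": {"Health": "OK"}},
    {"Name": "Fan 2", "Reading": 7200, "ReadingUnits": "RPM", "Status": {"Health": "OK"}}
  ],
  "Temperatures": [
    {"Name": "Inlet Temp", "ReadingCelsius": 22}
  ]
}
""")

# In practice this JSON would come from an authenticated request such as:
#   GET https://<idrac-ip>/redfish/v1/Chassis/System.Embedded.1/Thermal
unhealthy = [f["Name"] for f in sample["Fans"] if f["Status"]["Health"] != "OK"]
inlet_c = sample["Temperatures"][0]["ReadingCelsius"]
print(f"unhealthy fans: {unhealthy or 'none'}, inlet: {inlet_c} C")
```

The same pattern extends to power, storage, and event-log resources, which is what makes Redfish practical for fleet-wide monitoring scripts.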

Dell PowerEdge R5715 Performance

For performance testing of the Dell PowerEdge R5715, we paired it against its 1U sibling, the Dell PowerEdge R4715. The two platforms share identical memory configurations and the same overall PowerEdge architecture, making them a natural point of comparison. The key differentiator between the two review units is processor selection. The R4715 shipped with an AMD EPYC 9335 32-core processor, while the R5715 arrived with an AMD EPYC 9015 8-core processor.

It is worth noting that both platforms support the same EPYC 9005 Series processor lineup and can be configured with either chip depending on workload requirements. The core count delta between these two units will be reflected in the numbers, but the results reflect how each platform performs as shipped rather than a ceiling comparison between platforms.

Dell PowerEdge R5715 and R4715.

To stress the CPUs across both systems, we used a focused set of compute benchmarks. y-cruncher was used to evaluate raw arithmetic throughput and multithreaded floating point performance. Blender provided a real-world rendering workload that scales with available cores and memory bandwidth. Phoronix Test Suite rounded out the benchmark set with a broader collection of CPU-bound workloads, giving a more complete picture of sustained compute performance across both platforms.

Test System Specifications

  • Platform: Dell PowerEdge R5715
  • CPU: Single AMD EPYC 9015
  • Memory: 384GB DDR5
  • Storage: Boss RAID1

y-cruncher

y-cruncher is a multithreaded, scalable program that can compute Pi and other mathematical constants to trillions of digits. Since its launch in 2009, it has become a popular benchmarking and stress-testing application for overclockers and hardware enthusiasts.

The R5715 tracked predictably against the R4715 across all workload sizes. At 1 billion digits, the R5715 finished in 14.53 seconds against 5.30 seconds on the R4715, and the gap extended consistently from there. At 50 billion digits, the R5715 reached 1,273.73 seconds while the R4715 finished in 445.44 seconds, with the R4715 completing runs roughly 2.8 to 2.9 times faster across the full 1 billion to 50 billion range. Despite running only 8 cores, the EPYC 9015 is purpose-built server silicon with significantly higher memory bandwidth and larger cache than a typical desktop CPU, and it still runs well ahead of what most consumer processors can sustain on the same workloads.

y-cruncher (lower duration is better) Dell PowerEdge R4715 (AMD EPYC 9335 32-Core | 384 GiB RAM) Dell PowerEdge R5715 (AMD EPYC 9015 8-Core | 384 GiB RAM)
25 Million 0.11 seconds 0.25 seconds
50 Million 0.23 seconds 0.51 seconds
100 Million 0.46 seconds 1.08 seconds
250 Million 1.22 seconds 3.00 seconds
500 Million 2.49 seconds 6.60 seconds
1 Billion 5.30 seconds 14.53 seconds
2.5 Billion 14.58 seconds 41.32 seconds
5 Billion 32.38 seconds 92.99 seconds
10 Billion 71.54 seconds 202.87 seconds
25 Billion 203.40 seconds 576.87 seconds
50 Billion 445.44 seconds 1,273.73 seconds
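The scaling behavior in the table above is easy to verify directly. This short calculation, using the reported times at three problem sizes, shows the R4715's speedup holding between roughly 2.7x and 2.9x, well under the 4x core-count ratio, which is consistent with the 9015's stronger per-core memory bandwidth noted earlier.

```python
# Illustrative arithmetic: speedup of the 32-core R4715 over the 8-core
# R5715 at selected y-cruncher sizes, using the times reported above.
r4715_sec = {"1B": 5.30, "10B": 71.54, "50B": 445.44}
r5715_sec = {"1B": 14.53, "10B": 202.87, "50B": 1273.73}

for size in r4715_sec:
    ratio = r5715_sec[size] / r4715_sec[size]
    print(f"{size} digits: R4715 is {ratio:.2f}x faster")
```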

Blender 4.5

Blender 4.5 is an open-source 3D modeling application. This benchmark was run using the Blender Benchmark CLI utility. The score is measured in samples per minute, with higher values indicating better performance.

The Blender results follow a similar pattern to y-cruncher, with the R4715’s core-count advantage translating directly into rendering throughput. On the Monster scene, the R4715 posted 523.29 samples per minute against 135.21 on the R5715. The Junkshop scene came in at 355.43 versus 88.61, and Classroom landed at 264.70 against 68.48 on the R5715. Across all three scenes, the R4715 delivered roughly 3.8 to 4 times the rendering throughput of the R5715, a slightly wider margin than in y-cruncher, reflecting how heavily Blender’s CPU renderer scales with core count when parallelizing ray-tracing workloads across a scene.

Blender 4.5 CPU Benchmark (higher samples per minute is better) Dell PowerEdge R4715 (AMD EPYC 9335 32-Core | 384 GiB RAM) Dell PowerEdge R5715 (AMD EPYC 9015 8-Core | 384 GiB RAM)
Monster 523.29 samples/min 135.21 samples/min
Junkshop 355.43 samples/min 88.61 samples/min
Classroom 264.70 samples/min 68.48 samples/min

Phoronix Benchmarks

Phoronix Test Suite is an open-source, automated benchmarking platform that supports over 450 test profiles and 100+ test suites via OpenBenchmarking.org. It handles everything from installing dependencies to running tests and collecting results, making it ideal for performance comparisons, hardware validation, and continuous integration. We will focus on comparing the R5715 and R4715 using the Stream, 7-Zip, Linux kernel build, Apache, and OpenSSL tests.

In Apache web serving throughput, the R4715 reached 177,839.86 requests per second, compared to 123,710.75 on the R5715, one of the closest results across the entire suite. Apache’s ability to achieve reasonable performance even with a lower core count, given sufficient memory bandwidth, keeps the gap narrower here than in more heavily parallelized workloads.

OpenSSL transfer rate showed a wider margin, with the R4715 posting 533,318,299,283 bytes per second compared to 148,168,050,733 bytes per second on the R5715. Cryptographic throughput is one of the workloads that scales most aggressively with thread count, and the separation clearly reflects that.

The Linux kernel compile test produced one of the most pronounced gaps in the suite, with the R4715 finishing in 379.53 seconds compared to 1,244.86 seconds on the R5715. Kernel compilation is among the more direct measures of how many threads a system can bring to bear simultaneously.

7-Zip compression came in at 260,124 MIPS on the R4715 versus 98,555 MIPS on the R5715, consistent with results across the rest of the suite.

Stream memory throughput was 370,228.9 MB/s on the R4715, compared to 230,123.6 MB/s on the R5715.

Phoronix Benchmarks Dell PowerEdge R4715 (AMD EPYC 9335 32-Core | 384 GiB RAM) Dell PowerEdge R5715 (AMD EPYC 9015 8-Core | 384 GiB RAM)
Apache Requests Per Second 177,839.86 123,710.75
OpenSSL Transfer Rate (byte/s) 533,318,299,283 148,168,050,733
Kernel Compile Time Taken (seconds) (lower is better) 379.531 1,244.86
7-ZIP MIPS 260,124 98,555
Stream Throughput (MB/s) 370,228.9 230,123.6
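Normalizing a few of these results per core makes the shipped-configuration caveat concrete. The calculation below, using the table values and the two core counts, shows the 8-core R5715 actually delivering more throughput per core, since each core gets a larger share of the shared memory and I/O resources; this is a rough illustrative lens only, as Stream in particular is bandwidth-bound rather than core-bound.

```python
# Illustrative arithmetic: per-core normalization of selected Phoronix
# results from the table above (32 cores on the R4715, 8 on the R5715).
results = {
    "Apache req/s": (177_839.86, 123_710.75),
    "7-Zip MIPS":   (260_124,    98_555),
    "Stream MB/s":  (370_228.9,  230_123.6),
}

for name, (r4715, r5715) in results.items():
    print(f"{name}: {r4715 / 32:,.0f}/core (R4715) vs "
          f"{r5715 / 8:,.0f}/core (R5715)")
```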

Conclusion

The Dell PowerEdge R5715 is a well-executed storage-focused 2U platform that makes a clear case for single-socket design in the right workload context. Organizations running file services, backup targets, video surveillance, or database workloads that prioritize drive density and I/O expandability over raw compute headroom will find the R5715 a compelling fit. The 12-bay 3.5-inch backplane supporting up to 288 TB of raw capacity, paired with four PCIe Gen5 slots and dual OCP NIC 3.0 support, gives the platform meaningful room to grow without requiring a move to a more expensive dual-socket chassis.

Dell PowerEdge R5715 with Dell 17th gen servers.

The performance results tell a straightforward story. Tested as shipped with the EPYC 9015, the R5715 trails the 32-core R4715 by a predictable margin across every benchmark, but that comparison is somewhat beside the point. The R5715 is not positioned as a compute workhorse, and the EPYC 9015 is not the processor Dell expects most customers to pair with this chassis. Configuring the R5715 with a higher-core-count EPYC 9005 processor would close much of that gap, and the platform architecture is fully capable of supporting it.

Where the R5715 consistently delivers is in the areas that matter most for its target use cases: storage density, expansion flexibility, power efficiency, and management. iDRAC10 Enterprise provides a mature and consistent out-of-band management experience that carries over directly from the broader 17th-generation PowerEdge portfolio, reducing operational overhead for teams already invested in Dell’s management stack.

For SMB and midmarket buyers looking to consolidate storage workloads into a right-sized single-socket platform without overbuying compute, the R5715 is a strong choice and a natural complement to the R4715 in Dell’s current AMD-based lineup.

Product Page – Dell PowerEdge R5715

The post Dell PowerEdge R5715 Review: 2U Single-Socket AMD EPYC for Storage-Forward Workloads appeared first on StorageReview.com.
