Benchmarking Guide: How to Test New PLC/QLC SSDs for App Workloads
2026-02-25

A practical benchmarking plan to evaluate PLC and disaggregated-cell SSDs for database, logging, and registry workloads — focused on tail latency, endurance, and real workload tests.

Why you must benchmark PLC / disaggregated-cell SSDs before deploying app stacks

If you manage cloud-hosted databases, logging pipelines or container registries, you know storage is the hidden bottleneck: sudden latency spikes, unexplained I/O stalls, or premature drive wear can collapse an otherwise robust platform. Next‑gen PLC (penta-level cell) and disaggregated‑cell SSDs promise much higher density and lower $/GB — but they also change I/O behavior, tail latency and endurance characteristics in ways that matter to apps. This guide gives a practical, step‑by‑step benchmarking plan for evaluating these drives in realistic app workloads in 2026.

TL;DR — What to test first (most important findings up front)

  • Start with workload mapping: model OLTP DBs, append‑only logging, and container registry patterns separately.
  • Measure tail latency: p95/p99/p999 latency matters more than average IOPS for user‑facing services.
  • Include endurance and thermal tests: accelerated P/E cycling and sustained writes reveal PLC weaknesses most clearly.
  • Use fio + nvme‑cli + SMART: combine synthetic and application‑level tests; capture SMART telemetry and APM metrics continuously.
  • Compare at scale: test with mixed concurrency (queue depths) and namespace configurations (including ZNS where available).

The 2026 context: why PLC and disaggregated‑cell drives are appearing in production

In late 2025 and early 2026 vendors accelerated PLC rollouts to relieve global flash supply pressure driven by AI datacenter demand and to reduce $/GB for archival and large‑capacity tiers. Innovations such as cell splitting and disaggregated‑cell architectures (announced by major fabs in 2025) enable denser arrays. That density changes tradeoffs: higher write amplification sensitivity, different error‑management behavior, and more aggressive thermal throttling. As a result, traditional flash assumptions — stable latency under sustained load, predictable endurance curves — no longer hold across all workloads.

Core benchmarking principles (how to think about SSD benchmarking in 2026)

Benchmarking a modern SSD yields not one number but a matrix of workloads, concurrency, persistence models, and environmental factors. Follow these principles:

  • Simulate real app I/O patterns (not just sequential/sustained synthetic loads).
  • Prioritize tails and variability — p99/p999 latencies affect user experience and outage risk.
  • Measure telemetry (temperature, SMART, NVMe logs) while tests run.
  • Run endurance tests to reveal P/E limitations and write‑amplification impacts.
  • Document firmware, namespaces and driver versions — small changes change outcomes.

Key metrics to collect

  • Throughput (MB/s) and IOPS
  • Latency percentiles (avg, p50, p95, p99, p999)
  • Queue depth vs latency curves
  • CPU overhead and context switches
  • SMART attributes: media errors, wear leveling, remaining life
  • Power and temperature profiles
  • Write amplification (measured or inferred via controller telemetry)

Define the workloads — DBs, logging, container registries

Below are focused workload profiles that reflect common cloud app stacks. For each, I provide the fio test pattern, recommended duration, and what to watch.

1) OLTP databases (MySQL/Postgres, Redis persistence)

Characteristics: small random I/O; read-heavy mixes overall, but small-write latency is critical for commit durability. Typical block sizes are 4K–8K. Measure fsync behavior and O_DIRECT latency.

Recommended fio job (example):

[fio_oltp]
ioengine=libaio
rw=randrw
rwmixread=90
bs=8k
numjobs=16
iodepth=32
direct=1
sync=0
runtime=1800
group_reporting
filename=/dev/nvme0n1
  

Notes: run with varying queue depths (iodepth 1, 8, 32) and numjobs to emulate connection counts. For fsync paths, use small write jobs with sync=1 (or fsync=1 to issue an fsync after every write) and measure commit latency at p99.
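To extract the commit-relevant tail latencies from such a run, you can parse fio's JSON output. The sketch below assumes the fio 3.x layout (`jobs[i][op]["clat_ns"]["percentile"]`, with percentile strings as keys and nanoseconds as values) — verify this against your fio version, and the embedded sample is a trimmed-down stand-in for real `--output-format=json` output:

```python
import json

def clat_percentile_us(fio_json, op, pct):
    # Completion-latency percentile for one op type, converted ns -> us.
    return fio_json["jobs"][0][op]["clat_ns"]["percentile"][pct] / 1000.0

# Trimmed-down stand-in for `fio --output-format=json` output:
sample = json.loads("""
{"jobs": [{"write": {"clat_ns": {"percentile":
    {"99.000000": 850000, "99.900000": 4200000}}}}]}
""")
print("write p99 :", clat_percentile_us(sample, "write", "99.000000"), "us")
print("write p999:", clat_percentile_us(sample, "write", "99.900000"), "us")
```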

2) Logging/append workloads

Characteristics: mostly sequential or small sequential appends, frequent fsync or batch commits, and sustained high write pressure. High-density PLC drives are sensitive to sustained sequential writes because of internal compaction and GC behavior.

[fio_logs]
ioengine=libaio
rw=write
bs=128k
numjobs=4
iodepth=16
direct=1
runtime=3600
stonewall
filename=/dev/nvme0n1
  

Notes: include runs with periodic fsyncs (simulate logging frameworks that call fsync every N records), and measure SMART metrics after each run to capture wear and media errors.
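Capturing SMART after each run only helps if you diff the counters across runs. Here is a minimal sketch that does so, assuming the `field : value` text layout printed by `nvme smart-log`; field names and formatting vary by drive and nvme-cli version, and the sample strings below are illustrative:

```python
def parse_smart(text):
    # Parse `nvme smart-log` style "key : value" lines into integer counters.
    counters = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        value = value.strip().replace(",", "")
        if sep and value.isdigit():
            counters[key.strip()] = int(value)
    return counters

before = parse_smart("data_units_written : 1000\nmedia_errors : 0")
after  = parse_smart("data_units_written : 1450\nmedia_errors : 0")
delta  = {k: after[k] - before[k] for k in before}
print(delta)
```

Log the deltas alongside the fio results so wear per run is traceable later.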

3) Container registry and artifact stores

Characteristics: mixed reads of large objects and random metadata operations (small reads/writes for indexes), bursty concurrency when CI/CD triggers pulls. Test both large sequential reads (pulls) and many small metadata operations at moderate concurrency.

[fio_registry_large_reads]
ioengine=libaio
rw=read
bs=1m
numjobs=8
iodepth=32
direct=1
runtime=1200
filename=/dev/nvme0n1

[fio_registry_meta]
ioengine=libaio
rw=randread
bs=4k
numjobs=16
iodepth=64
direct=1
runtime=1200
filename=/dev/nvme0n1
  

Design the test matrix

A thorough matrix varies these axes: workload type, queue depth, concurrency, host filesystem (XFS/ext4/Direct NVMe), NVMe namespaces (single vs multiple), and over‑provisioning (drive vendor OP vs custom). Example matrix rows:

  • OLTP — qdepth 1/8/32 — sync vs async
  • Logging — sustained write 1h/6h/24h — normal temp vs cooled
  • Registry — read burst (8→128 concurrent pulls) — metadata stress
  • Endurance — continuous write at 80% fill until SMART shows wear thresholds
  • Namespace/ZNS — split into small namespaces vs single large namespace
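A matrix like this is easiest to manage as an enumerable cross-product. The sketch below shows one way to generate the run list; the axis values are examples, not a prescribed set — extend them to match your environment:

```python
from itertools import product

workloads   = ["oltp", "logging", "registry"]
qdepths     = [1, 8, 32]
sync_modes  = ["sync", "async"]
filesystems = ["xfs", "ext4", "raw-nvme"]

# One row per (workload, queue depth, sync mode, filesystem) combination.
matrix = list(product(workloads, qdepths, sync_modes, filesystems))
print(f"{len(matrix)} runs planned")  # 3 * 3 * 2 * 3 = 54
for row in matrix[:3]:
    print(row)
```

Feeding each row into a templated fio job file keeps the runs reproducible and the result files consistently named.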

Testbed and environment setup (make results reproducible)

Use consistent, dedicated hardware. Document every component:

  1. Host CPU, memory, kernel version, NVMe driver version.
  2. SSD firmware version and vendor model (capture nvme‑cli output).
  3. Power state settings (disable adaptive power management unless tested).
  4. Thermal conditions and airflow. Use external thermocouples for steady state temp if possible.
  5. Filesystem and mount options (fsync behavior differs between ext4 and XFS; use O_DIRECT tests too).

Example commands to capture environment:

uname -a
nvme list
nvme id-ctrl /dev/nvme0
nvme smart-log /dev/nvme0
cat /etc/os-release
  

Tools and telemetry to run alongside fio

  • fio — the core I/O generator for synthetic and pattern tests.
  • nvme-cli — read SMART attributes, namespace info, firmware logs (nvme smart-log).
  • iostat / sar / pidstat — CPU and I/O wait monitoring.
  • blktrace / btt — when you need block layer traces.
  • prometheus exporters — collect metrics over time for dashboards.
  • power meters / PMBus — to measure power draw under load.

Endurance, write amplification, and destructive testing

Endurance testing is essential for PLC drives. Accelerated P/E cycling reveals controller behavior and wear leveling limits faster than waiting months. Key steps:

  1. Set up a continuous full‑drive write using fio with a pattern similar to production writes.
  2. Monitor SMART attributes (media_wear, estimated_remaining_life) at intervals.
  3. Capture write amplification: if controller exposes total bytes written, compare host writes vs media writes to compute WA.
  4. Include power‑loss tests (if supported) to validate metadata journaling and recovery behavior.

# Continuous write for endurance (careful: destructive)
fio --name=endurance --filename=/dev/nvme0n1 --direct=1 --rw=write --bs=1m --iodepth=64 --numjobs=8 --time_based --runtime=86400
  

Caveat: endurance testing is destructive. Run on pre‑production drives and track vendor RMA policies.
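The write-amplification computation from step 3 is simple arithmetic once you have both counters. A minimal sketch, assuming your drive exposes host and NAND byte counters via telemetry (the exact fields are vendor-specific; the byte values below are illustrative):

```python
def write_amplification(nand_bytes_written, host_bytes_written):
    # WA = bytes written to media / bytes written by the host over the window.
    if host_bytes_written == 0:
        raise ValueError("no host writes recorded")
    return nand_bytes_written / host_bytes_written

# Example: 2.6 TB hit the media for every 1.0 TB the host wrote.
wa = write_amplification(nand_bytes_written=2.6e12, host_bytes_written=1.0e12)
print(f"WA = {wa:.1f}")
```

Compute WA over a window (counter deltas), not from lifetime totals, so each workload's contribution is isolated.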

Latency tails and QoS: measuring p99/p999

Average latency can be misleading. For interactive database transactions and API services, p99 or p999 spikes cause retries and timeouts. Use fio's percentile reporting and run sufficiently long tests (30–60 minutes) to capture rare events. Example fio flags:

--output-format=json+ --write_lat_log=lat --log_avg_msec=1
  

Graph latency histograms and compute distribution. If p99 is >10× average, investigate queue build‑up, thermal throttling or internal GC behavior.
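The >10× rule of thumb is easy to automate against your latency logs. A minimal sketch, taking raw latency samples in any unit (the synthetic sample mixes fast ops with a handful of multi-millisecond stalls):

```python
def tail_ratio(samples):
    # p99 (nearest-index on the sorted samples) divided by the mean.
    mean = sum(samples) / len(samples)
    p99 = sorted(samples)[int(0.99 * (len(samples) - 1))]
    return p99 / mean

# 985 fast ops plus 15 multi-millisecond stalls (microseconds):
samples = [100.0] * 985 + [5000.0] * 15
ratio = tail_ratio(samples)
print(f"p99/mean = {ratio:.1f}",
      "-> investigate GC/thermal" if ratio > 10 else "-> ok")
```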

Interpreting results — what to look for and how to decide

When you have raw numbers, use this checklist to interpret:

  • Throughput vs latency tradeoffs: if throughput increases but p99 latency doubles under the same queue depth, prefer drives with better QoS for latency‑sensitive apps.
  • Consistency: look for stable latency curves over time — repeated runs should not diverge greatly.
  • Endurance signals: rapid drop in SMART estimated life or high media errors after accelerated cycles is a red flag for PLC in write‑heavy roles.
  • Thermal throttling: abrupt throughput drops and rising latency correlated with temperature indicate inadequate cooling or aggressive thermal management.
  • Write amplification: WA ≫ 1 indicates inefficient internal GC — increase overprovisioning or change workload placement.
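For the consistency check above, the coefficient of variation across repeated runs gives a single comparable number. A sketch, with the caveat that the "a few percent" threshold is a working assumption, not a standard, and the p99 values below are illustrative:

```python
import statistics

def cv_percent(values):
    # Coefficient of variation: sample stddev as a percentage of the mean.
    return 100.0 * statistics.stdev(values) / statistics.mean(values)

p99_runs_us = [850, 862, 845]  # p99 from three repeated OLTP runs
print(f"CV = {cv_percent(p99_runs_us):.1f}%")
```

A CV of a few percent or less suggests stable behavior; a large CV points at GC, thermal throttling, or an uncontrolled test environment.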

Advanced strategies and mitigations

If a PLC/disaggregated‑cell drive shows weaknesses, you can often mitigate at the system level:

  • Over‑provisioning: reserve additional space (10–30%) to reduce write amplification.
  • Namespace segmentation or ZNS: use Zoned Namespaces if supported to align app writes with drive GC behavior.
  • Rate limiting and batching: batch small writes or use async fsync where acceptable to reduce fsync storm impact.
  • Cooling and power management: ensure consistent thermal headroom to avoid throttles during sustained writes.
  • Firmware tuning: coordinate with vendors for firmware versions optimized for your workload profile (e.g., DB vs archival).

Case study: hypothetical run that shows the tradeoffs

Example summary from a 2026 lab run comparing a high‑density PLC SSD to an enterprise TLC SSD under OLTP and logging:

  • OLTP (randrw 90/10, iodepth 32): PLC delivered 15% higher average IOPS, but p99 latency was 2× that of the TLC drive.
  • Logging (sustained 6h sequential writes): PLC sustained throughput initially, then dropped 35% after internal GC kicked in; temperature rose 12°C.
  • Endurance: accelerated write test showed estimated life dropping faster on PLC and higher write amplification (2.6 vs 1.8).

Interpretation: PLC drives may be attractive for capacity‑optimized workloads (cold storage, large sequential reads), but for latency‑sensitive OLTP or high‑write logging, either choose TLC/Penta‑optimized firmware or add overprovisioning and cooling.

"Capacity is cheaper, but not free — measure latency tails and endurance before you trust PLC for production DBs."

Reproducibility checklist — runbook for every test

  1. Record host, kernel, nvme‑cli and firmware versions.
  2. Zero‑fill or secure‑erase the drive between major workload types (to avoid residual mapping effects).
  3. Set up Prometheus or other time‑series collection for latency, temperature, SMART.
  4. Run each test at least three times and take median results; include a long‑duration run (30–60 min) for tail capture.
  5. Log raw fio outputs, nvme logs, and system counters for later analysis.

Looking ahead: what changes through 2026

Through 2026 we’ll see broader PLC deployments for capacity tiers and more drives exposing richer telemetry (detailed WA counters, internal GC metrics). Expect greater adoption of ZNS and host‑managed namespaces to improve write amplification and QoS for app workloads. Vendors will ship firmware tuned for mixed workloads, and cloud providers will increasingly offer mixed media tiers (PLC for archive, TLC for transactional) with automated placement policies. Keep testing, as firmware and controller improvements will materially change results quarter to quarter.

Actionable takeaways

  • Map your real app I/O patterns and pick tests that mirror them — don’t rely on synthetic sequential tests alone.
  • Prioritize tail latency (p99/p999) and endurance, not just headline IOPS and MB/s.
  • Include telemetry (SMART, temperature, write amplification) in every run and monitor it continuously.
  • Use overprovisioning, ZNS, or firmware tuning when PLC shows weak behavior for a target workload.
  • Make benchmarking part of procurement — require vendor test artifacts, reproduce in your environment, and include SLOs tied to latency percentiles.

Next steps — run a minimal validated benchmark in one afternoon

  1. Install fio and nvme‑cli on a test host; record system versions.
  2. Run a 30‑minute OLTP fio job (example above) at qdepth 1/8/32; capture p99/p999.
  3. Run a 2‑hour logging sustained write; capture temperature and SMART every 15 minutes.
  4. Analyze results: latency tails, throughput stability, SMART wear metrics.
  5. Decide: safe for capacity tiers, needs mitigations, or unsuitable for transactional workloads.

Final recommendation and call to action

PLC and disaggregated‑cell SSDs are an important part of the 2026 storage landscape — they can deliver dramatic capacity gains and reduce costs, but they also change the performance and endurance profile in non‑trivial ways. Use the practical test matrix in this guide to validate drives against your OLTP, logging and registry workloads before deploying at scale. If you’d like a jump‑start, download our reproducible fio job templates and a one‑page lab checklist (link available on our site) and run a baseline in your environment this week.

Ready to validate a new drive? Run the minimal afternoon benchmark above, capture the outputs, and reach out with your fio logs — we’ll help interpret results and recommend configuration changes or placement strategies tailored to your app stack.
