Methodology

What every subtest measures and how the score is computed.

Every score in Fast&Fast comes from real CPU, GPU, NPU, and memory workloads: no microbenchmarks, and no synthetic shaders that don't match real games. The sections below describe each subtest and show how the final score is computed.

Score formula

subtest_score = round(measure / baseline.measure × baseline.points)
phase_score   = Σ subtest_score
total         = Σ phase_score

The baseline is the Pixel 8 Pro, which scores exactly 284 917 points. Rounding is half-up. Every subtest has a fixed point budget, and on the reference device the subtest scores sum to the baseline total.
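As a rough illustration, the formula above could be written in TypeScript along the following lines. This is a minimal sketch, not the project's canonical implementation: the `Baseline` interface, the function names, and the 20 000-point budget are assumptions made for the example, since the per-subtest budgets are not listed here.

```typescript
// Sketch of the score formula. Names and the point budget are hypothetical.
interface Baseline {
  measure: number; // reference device's measured throughput for this subtest
  points: number;  // fixed point budget for this subtest
}

// Same operation order as the formula: divide first, then multiply.
// For non-negative values Math.round rounds halves up (2.5 -> 3).
function subtestScore(measure: number, baseline: Baseline): number {
  return Math.round((measure / baseline.measure) * baseline.points);
}

// A phase score is the sum of its subtest scores.
function phaseScore(subtestScores: number[]): number {
  return subtestScores.reduce((sum, s) => sum + s, 0);
}

// The total is the sum of the phase scores.
function totalScore(phaseScores: number[]): number {
  return phaseScores.reduce((sum, s) => sum + s, 0);
}

// Hypothetical budget: 20000 points for a subtest whose baseline is 5.8 Gops/s.
const integerBaseline: Baseline = { measure: 5.8, points: 20000 };
const onReference = subtestScore(5.8, integerBaseline); // 20000
```

A device that matches the baseline measurement earns exactly the subtest's budget, which is why the reference device's subtest scores sum to the baseline total.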

CPU — 4 subtests

Each subtest runs in parallel across all cores; the reported ops/s is the aggregate across cores. Source: apps/android/app/src/main/cpp/cpu_bench.cpp.

Integer | 64-bit XXHash-style mix loop | 5.8 Gops/s
Float | Dense 64×64 SGEMM with feedback | 13 Gflops
Crypto | XXHash-style hashing on 64 KiB blocks | 18 Gops/s
Compression | Naive RLE encode + decode | 5.0 Gops/s
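To give a feel for the kind of work the Compression subtest does, here is a minimal sketch of a naive run-length encoder and decoder. The real workload lives in the C++ source named above; the (count, value) byte-pair format and the 255-byte run cap below are assumptions chosen for illustration, not a description of that code.

```typescript
// Naive byte-oriented RLE (illustrative format, not the benchmark's own):
// each run becomes a (count, value) pair, with count capped at 255.
function rleEncode(input: Uint8Array): Uint8Array {
  const out: number[] = [];
  let i = 0;
  while (i < input.length) {
    const value = input[i];
    let run = 1;
    // Extend the run while the next byte matches and the count fits in one byte.
    while (i + run < input.length && input[i + run] === value && run < 255) {
      run++;
    }
    out.push(run, value);
    i += run;
  }
  return Uint8Array.from(out);
}

// Expands each (count, value) pair back into count copies of value.
function rleDecode(encoded: Uint8Array): Uint8Array {
  const out: number[] = [];
  for (let i = 0; i + 1 < encoded.length; i += 2) {
    const run = encoded[i];
    const value = encoded[i + 1];
    for (let k = 0; k < run; k++) out.push(value);
  }
  return Uint8Array.from(out);
}
```

Encoding followed by decoding should reproduce the input byte for byte, which gives the benchmark a cheap correctness check alongside the timing.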

GPU — 3 subtests

Rendered offscreen to a 1024×1024 EGL pbuffer surface with an OpenGL ES 3.0 (or later) context.

Triangles | 250 000 small triangles per frame with non-trivial vertex shader (chain of mat3 multiplies) | 90 Mtri/s
Fillrate | Mandelbrot 96-iter fragment shader on a fullscreen quad; heavily ALU-bound | 0.7 Gpix/s
Compute | GLES 3.1 compute shader, 1 M-element SSBO, 32 inner FMA iterations. Falls back to a second fillrate pass on GLES 3.0-only devices. | 14 Gops/s
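The per-pixel work in the Fillrate subtest can be pictured with a CPU-side sketch of a 96-iteration Mandelbrot loop. This is only an illustration: the actual fragment shader's coordinate mapping, escape radius, and whether it exits early are not described here, so those details below are assumptions.

```typescript
// Counts iterations of z = z^2 + c before |z| exceeds 2, up to maxIter.
// Escape radius 2 and the early exit are assumptions for this sketch.
function mandelbrotIterations(cr: number, ci: number, maxIter = 96): number {
  let zr = 0;
  let zi = 0;
  for (let i = 0; i < maxIter; i++) {
    const nextZr = zr * zr - zi * zi + cr;
    zi = 2 * zr * zi + ci;
    zr = nextZr;
    if (zr * zr + zi * zi > 4) return i; // escaped on iteration i
  }
  return maxIter; // never escaped: treated as inside the set
}
```

Points inside the set run the full 96 iterations of multiply-add arithmetic with no texture fetches, which is consistent with the subtest being described as heavily ALU-bound.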

NPU — 3 subtests

Image classification | MobileNet V3 small (int8) via NNAPI, 5000 inferences | 200 inf/s
Object detection | SSD MobileNet V1 (int8) via NNAPI, 1500 inferences | 80 inf/s
Token generation | 256-step autoregressive proxy via iterative MobileNet inference. No NNAPI-compatible tiny LM exists yet at <1 MB. | 200 tokens/s

Memory — 3 subtests

RAM bandwidth | System.arraycopy on 64 MiB buffers ×5000 | 41 GB/s
Sequential I/O | 1 GiB write to cache + fsync | 740 MB/s
Random 4K | 1 M random 4 KiB reads on a 128 MiB file | 150 000 IOPS
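For the Random 4K subtest, a 128 MiB file holds 32 768 blocks of 4 KiB. The sketch below shows one way read offsets and the IOPS figure could be derived. It assumes block-aligned reads and a caller-supplied random source; neither detail is stated above, and the function names are hypothetical.

```typescript
const FILE_SIZE = 128 * 1024 * 1024;        // 128 MiB test file
const BLOCK_SIZE = 4 * 1024;                // 4 KiB per read
const BLOCK_COUNT = FILE_SIZE / BLOCK_SIZE; // 32768 candidate blocks

// Picks `count` block-aligned offsets (alignment is an assumption).
// `rand` must return values in [0, 1), like Math.random.
function randomReadOffsets(count: number, rand: () => number = Math.random): number[] {
  const offsets: number[] = [];
  for (let i = 0; i < count; i++) {
    offsets.push(Math.floor(rand() * BLOCK_COUNT) * BLOCK_SIZE);
  }
  return offsets;
}

// I/O operations per second: completed reads divided by wall-clock time.
function iops(reads: number, elapsedSeconds: number): number {
  return reads / elapsedSeconds;
}
```

Passing the random source as a parameter keeps the offset sequence reproducible in tests while leaving the production choice of generator open.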

Bench modes

Quick — ¼ iterations, ~20 s total. Stored locally only.
Standard — 1× (default). The only mode aggregated in the public leaderboard.
Long — 3× iterations, ~3 min, useful for throttling observation. Stored locally only.
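The three modes amount to an iteration multiplier plus a flag for leaderboard aggregation. A minimal sketch follows, assuming the multiplier applies to each subtest's Standard iteration count; the names and the rounding of uneven counts are assumptions, not the app's actual code.

```typescript
type BenchMode = "quick" | "standard" | "long";

// Multipliers taken from the mode descriptions above.
const ITERATION_MULTIPLIER: Record<BenchMode, number> = {
  quick: 0.25,
  standard: 1,
  long: 3,
};

// Scales a subtest's Standard iteration count for the chosen mode.
// Rounding and the minimum of 1 are assumptions for counts that do not divide evenly.
function scaledIterations(standardIterations: number, mode: BenchMode): number {
  return Math.max(1, Math.round(standardIterations * ITERATION_MULTIPLIER[mode]));
}

// Only Standard runs are aggregated in the public leaderboard.
function isAggregated(mode: BenchMode): boolean {
  return mode === "standard";
}
```

Under these assumptions, a subtest that runs 5000 iterations in Standard mode would run 1250 in Quick and 15 000 in Long.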

Cross-implementation validation

The score formula has a canonical TypeScript implementation, a Kotlin mirror in the :score-formula module, and a C++ mirror. CI replays score-vectors.json across all three on every push to keep their outputs byte-identical. If a Kotlin or C++ score ever differs from the TypeScript reference by even one point, the build fails.
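A replay check of this kind might look roughly like the sketch below. The schema of score-vectors.json is not described here, so the `ScoreVector` fields and the function names are hypothetical; the point is only to show how a single off-by-one result surfaces as a mismatch.

```typescript
// Hypothetical shape of one test vector; the real file's schema may differ.
interface ScoreVector {
  measure: number;
  baselineMeasure: number;
  baselinePoints: number;
  expected: number; // score the reference implementation produces
}

// Reference computation: divide, multiply, round half-up.
function referenceScore(v: ScoreVector): number {
  return Math.round((v.measure / v.baselineMeasure) * v.baselinePoints);
}

// Returns every vector for which an implementation disagrees with `expected`.
function findMismatches(
  vectors: ScoreVector[],
  implementation: (v: ScoreVector) => number,
): ScoreVector[] {
  return vectors.filter((v) => implementation(v) !== v.expected);
}

const sampleVectors: ScoreVector[] = [
  { measure: 15, baselineMeasure: 10, baselinePoints: 3, expected: 5 },     // 4.5 rounds up to 5
  { measure: 20, baselineMeasure: 10, baselinePoints: 100, expected: 200 }, // exact value
];
```

A mirror that truncates instead of rounding half-up would pass the second vector but fail the first, which is the kind of one-point drift such a replay is meant to catch.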