Benchmark suite that evaluates a differentially private binary aggregation tree inside a Gramine-protected JVM. A TLS client streams floating-point values to the enclave; the server keeps the noisy private sum and exposes the familiar initBinaryAggregation, addToBinaryAggregation, and getBinaryAggregationSum calls. The layout mirrors the companion Java aggregation benchmarks so that results stay comparable while swapping Teaclave SGX for Gramine.
- `src/server/` – TLS enclave server (`com.benchmark.gramine.enclave.BenchServer`) executed inside Gramine. Implements the command protocol (`INIT`, `ADD`, `GET`) and logs each ingestion for traceability.
- `src/server/com/.../dp/BinaryAggregationTree.java` – Gaussian-noise aggregation tree shared with the other benchmark suites to preserve behaviour (the underlying idea is sketched below).
- `src/client/` – Host harness (`com.benchmark.gramine.host.BenchClient`) that prepares workloads, drives weak/strong scaling runs, and prints JSON metrics.
- `tools/run-benchmarks.py` – Automation wrapper that builds each server variant, runs the client once, and writes combined CSV/JSON artifacts under `scaling-results/<timestamp>/`.
- `tools/generate_plots.py` – Turns the CSV/JSON outputs into PNG plots (per-variant throughput, speedup/efficiency, and startup-time comparisons).
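For intuition, the sketch below shows the binary-mechanism idea behind the aggregation tree in Python: every dyadic block of the stream is perturbed with Gaussian noise once, and a running sum is reassembled from at most log2(t) noisy blocks. This is an illustrative sketch only, not the code in `BinaryAggregationTree.java`; the class and method names are made up, and the enclave's `initBinaryAggregation`/`addToBinaryAggregation`/`getBinaryAggregationSum` calls correspond only roughly to the constructor, `add`, and `sum` below.

```python
import random


class BinaryMechanismSketch:
    """Illustrative binary mechanism for a differentially private running sum.

    Each dyadic block of the stream receives Gaussian noise exactly once, so a
    prefix sum is assembled from at most log2(t) noisy blocks rather than from
    t individually noised values.
    """

    def __init__(self, sigma: float):
        self.sigma = sigma   # std-dev of the Gaussian noise added per block
        self.t = 0           # number of values ingested so far
        self.exact = {}      # level -> exact partial sum of the open block
        self.noisy = {}      # level -> noisy partial sum of the closed block

    def add(self, value: float) -> None:
        self.t += 1
        # The block that closes now sits at the level of t's lowest set bit.
        level = (self.t & -self.t).bit_length() - 1
        # It absorbs the incoming value plus all blocks below it.
        block = value + sum(self.exact.get(j, 0.0) for j in range(level))
        for j in range(level):
            self.exact.pop(j, None)
            self.noisy.pop(j, None)
        self.exact[level] = block
        self.noisy[level] = block + random.gauss(0.0, self.sigma)

    def sum(self) -> float:
        # The prefix sum follows the set bits of t, one noisy block per bit.
        return sum(self.noisy[j]
                   for j in range(self.t.bit_length())
                   if (self.t >> j) & 1)
```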
Each benchmark execution follows three phases that align with the companion Teaclave suite:
- Baseline workload – The client uses `GRAMINE_BENCH_DATA_SIZE` with the smallest requested thread count to derive the per-thread workload. A short warmup runs before measurements.
- Weak scaling – The number of worker threads increases while the per-thread workload remains fixed.
- Strong scaling – The total workload stays fixed while the number of worker threads increases.
The client prints JSON summaries for all phases so downstream plotting scripts can consume the output directly.
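To make the weak/strong distinction concrete, the hedged Python sketch below derives per-thread workloads for both modes from a baseline data size. The variable names and the exact formula for the strong-scaling total are assumptions, not the logic in `BenchClient`.

```python
# Illustrative only: how the per-thread workload could differ between modes.
# base_size mirrors GRAMINE_BENCH_DATA_SIZE; the thread counts mirror
# GRAMINE_BENCH_WEAK_SCALES / GRAMINE_BENCH_STRONG_SCALES.
base_size = 1024
thread_counts = [1, 2, 4, 8, 16, 32]

# Weak scaling: per-thread workload fixed, total grows with the thread count.
weak = [(t, base_size, t * base_size) for t in thread_counts]

# Strong scaling: total workload fixed, per-thread share shrinks.
total = base_size * min(thread_counts)
strong = [(t, total // t, total) for t in thread_counts]

for threads, per_thread, total_items in weak + strong:
    print(f"threads={threads:2d} per_thread={per_thread:5d} total={total_items}")
```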
Example dataset: scaling-results/20251030_004543/ (data size = 1024, sigma = 0.5, warmup = 3, measure = 5). The run compares three server variants executed through Gramine: jvm-gramine, native-dynamic, and native-static. Artifacts land in scaling-results/<timestamp>/ and plots in plots/.
Throughput scales from a single client up to 16 clients before Gramine scheduling overheads flatten the gains. Speedup and efficiency are highest for the native variants (native-dynamic, native-static), reaching ~5× speedup at eight clients. Refer to plots/<variant>_strong_throughput.png and plots/<variant>_strong_speedup_efficiency.png for the detailed traces.
Weak scaling stays close to linear: aggregate throughput increases steadily as threads are added, and the native variants show the largest improvements. See plots/<variant>_weak_throughput.png and plots/<variant>_weak_speedup_efficiency.png.
plots/startup_times.png summarises the bootstrap time for each server variant when launched through Gramine. Native images start faster than the JVM-backed variant once the enclave is provisioned.
```bash
python -m venv .venv
source .venv/bin/activate
pip install --upgrade pip matplotlib numpy

python tools/generate_plots.py \
  --results scaling-results/20251030_004543/scaling_results.csv \
  --startup scaling-results/20251030_004543/benchmark_results.json \
  --output plots
```

You can work from the host OS or inside the preconfigured devcontainer.
- Install Docker and either the VS Code Dev Containers extension or the `devcontainer` CLI.
- From the repository root run `task devcontainer` or open the folder in VS Code and choose "Reopen in Container".
- The container provisions GraalVM, Gramine, and the helper scripts automatically.
Convenient go-task entries:
| Task | Description |
|---|---|
| `task devcontainer` | Build, start, and attach to the devcontainer (wrapper around the tasks below). |
| `task devcontainer-up` | Start or reuse the devcontainer without attaching. |
| `task devcontainer-attach` | Exec into the running devcontainer shell. |
| `task devcontainer-down` | Stop and remove the container and volumes. |
| `task devcontainer-recreate` | Rebuild the container from scratch for a clean environment. |
Install the dependencies locally:
- GraalVM (JVM plus `native-image`)
- Gramine (`gramine`, `gramine-sgx`, `gramine-manifest`)
- GNU Make, Python 3.9+, and OpenJDK tooling
Clone the repository and continue with the build instructions below.
From the project root:
```bash
# Compile client/server classes
make server client

# Generate TLS certificates (if not already produced)
./tools/generate-certs.sh

# Optional: build GraalVM native images for the TLS server/client set
make APP_NAME=native-bench-dynamic STATIC_NATIVE=0 SGX=1 all
make APP_NAME=native-bench-static STATIC_NATIVE=1 SGX=1 all
```

`tools/run-benchmarks.py` runs `make clean` before each variant build so the benchmarks always start from a known baseline.
The automation wrapper builds the selected variants, launches the Gramine server, and drives one client run per variant:
```bash
python tools/run-benchmarks.py --variants jvm-gramine native-dynamic native-static
```

Key options:
| Option | Description |
|---|---|
| `--variants <list>` | Limit the run to the specified variants. |
| `--all` | Execute all variants (default when no subset is provided). |
| `--output <dir>` | Override the target directory under `scaling-results/`. |
Each run produces:
- `scaling-results/<timestamp>/benchmark_results.json` – Startup metrics and per-variant summaries.
- `scaling-results/<timestamp>/scaling_results.csv` – Flattened metrics per variant and scaling mode.
- `scaling-results/<timestamp>/logs/*.out` / `*.err` – Raw client output (and stderr when present).
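If you want to inspect the CSV outside the bundled plotting script, a minimal sketch like the one below works; the column names used here (`variant`, `mode`, `threads`, `throughput`) are assumptions, so check the header of your `scaling_results.csv` before reusing it.

```python
# Minimal sketch: group throughput by variant and scaling mode from the CSV.
# The column names below are assumptions; verify them against the real header.
import csv
from collections import defaultdict

series = defaultdict(list)
with open("scaling-results/20251030_004543/scaling_results.csv") as fh:
    for row in csv.DictReader(fh):
        series[(row["variant"], row["mode"])].append(
            (int(row["threads"]), float(row["throughput"]))
        )

for (variant, mode), points in sorted(series.items()):
    print(variant, mode, sorted(points))
```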
Defaults come from environment variables or an .env file at the repository root:
```bash
GRAMINE_BENCH_SIGMA=0.5
GRAMINE_BENCH_WEAK_SCALES=1,2,4,8,16,32
GRAMINE_BENCH_STRONG_SCALES=1,2,4,8,16,32
GRAMINE_BENCH_DATA_SIZE=1024
GRAMINE_BENCH_WARMUP=3
GRAMINE_BENCH_MEASURE=5
GRAMINE_BENCH_NATIVE_PARALLELISM=32
```
Apply them in one shot:
```bash
set -a
source .env
set +a
```

CLI flags override any setting (`java com.benchmark.gramine.host.BenchClient --help` lists all options).
Redirect the client output to capture the workload, weak scaling, and strong scaling summaries for custom post-processing:
```bash
python tools/run-benchmarks.py --variants native-dynamic > scaling-results/latest-run.json
```

The JSON contains per-pass throughput/latency measurements; the CSV is convenient for plotting tools that prefer tabular data.
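As one example of custom post-processing, the hedged sketch below derives speedup and efficiency from the captured strong-scaling passes; the JSON field names (`strong_scaling`, `threads`, `throughput`) are assumptions about the schema, not documented keys, so adapt them to the actual output.

```python
# Illustrative post-processing; the field names used here are assumptions,
# not the documented schema of the client's JSON summary.
import json

with open("scaling-results/latest-run.json") as fh:
    summary = json.load(fh)

passes = sorted(summary["strong_scaling"], key=lambda p: p["threads"])
base_threads = passes[0]["threads"]
base_throughput = passes[0]["throughput"]

for p in passes:
    speedup = p["throughput"] / base_throughput
    efficiency = speedup / (p["threads"] / base_threads)
    print(f"threads={p['threads']:2d} speedup={speedup:.2f} efficiency={efficiency:.2f}")
```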
After collecting metrics, generate the PNG figures:
```bash
python tools/generate_plots.py \
  --results scaling-results/<timestamp>/scaling_results.csv \
  --startup scaling-results/<timestamp>/benchmark_results.json \
  --output plots
```

The script writes:

- `plots/startup_times.png`
- `plots/<variant>_strong_throughput.png`
- `plots/<variant>_strong_speedup_efficiency.png`
- `plots/<variant>_weak_throughput.png`
- `plots/<variant>_weak_speedup_efficiency.png`