Add performance tests
Summary
Add performance tests. Fixes #1777.
Pull Request checklist
- ~[ ] Add/update test to cover these changes~
- [ ] Update documentation
- [ ] Update CHANGELOG file
Walkthrough
Adds a CI benchmarking system: new GitHub Actions workflow, k6/Vegeta benchmark drivers (TypeScript + Ruby), shared helpers and converter, dummy-app production/start scripts and configs, tooling/ignore updates, and docs/contributing updates to run and publish Core and Pro performance tests in CI and locally.
Changes
| Cohort / File(s) | Change Summary |
|---|---|
| CI Benchmark Workflow<br>`.github/workflows/benchmark.yml` | New GitHub Actions workflow to run Core/Pro benchmarks (Rails + Node renderer), install/cache tools, prepare apps, start/stop servers, run k6/Vegeta, validate/convert results, upload artifacts, optionally push data on master, and emit a workflow summary with guarded failures. |
| Benchmark Orchestrators<br>`benchmarks/bench.rb`, `benchmarks/bench-node-renderer.rb` | New Ruby drivers: bench.rb runs k6 per Rails route (discovery, warm‑up, per-route JSON parsing); bench-node-renderer.rb runs Vegeta (HTTP/2) per production bundle, detects RSC bundles, and emits textual JSON summaries. |
| Benchmark Support & Conversion<br>`benchmarks/lib/benchmark_helpers.rb`, `benchmarks/convert_to_benchmark_json.rb`, `benchmarks/k6.ts` | New shared Ruby helpers for env/validation/server readiness/summary; converter to produce unified benchmark JSON; TypeScript k6 script with dynamic scenarios and runtime checks. |
| Dummy App Prod Scripts & Procfiles<br>`react_on_rails/spec/dummy/bin/prod`, `react_on_rails/spec/dummy/bin/prod-assets`, `react_on_rails_pro/spec/dummy/bin/prod`, `react_on_rails_pro/spec/dummy/bin/prod-assets`, `react_on_rails_pro/spec/dummy/Procfile.prod` | Added production start and asset-build scripts with asset freshness checks, environment exports, CI bootsnap precompile hook, and process-supervisor selection (overmind/foreman) with fallbacks. |
| Puma, Shakapacker & Env Configs<br>`react_on_rails/spec/dummy/config/puma.rb`, `react_on_rails/spec/dummy/config/shakapacker.yml`, `react_on_rails_pro/spec/dummy/config/puma.rb`, `react_on_rails_pro/spec/dummy/config/shakapacker.yml`, `react_on_rails_pro/spec/dummy/config/environments/production.rb`, `react_on_rails_pro/spec/dummy/config/database.yml` | Puma: ENV-driven min/max threads, production workers, worker_shutdown_timeout; Shakapacker: cache path, extract_css, and ensure_consistent_versioning; production env tweaks; DB pool uses RAILS_MAX_THREADS. |
| Asset Task Update<br>`react_on_rails_pro/spec/dummy/lib/tasks/assets.rake` | Replaced yarn invocations with pnpm in asset build tasks. |
| Project Tooling & Ignore Files<br>`.gitignore`, `.prettierignore`, `react_on_rails_pro/.prettierignore`, `knip.ts`, `package.json`, `CLAUDE.md` | Ignore bench_results; add recursive `**/vendor` prettier ignore and `bin/.*` to knip ignores; include `benchmarks/k6.ts` in knip entries; add `@types/k6` devDependency; add pre‑commit YAML lint guidance. |
| Docs & Contribution<br>`docs/planning/library-benchmarking.md`, `CONTRIBUTING.md`, `CHANGELOG.md`, `CLAUDE.md` | New benchmarking planning doc, CONTRIBUTING updated with a benchmarking section, CHANGELOG entry for CI benchmarking, and CLAUDE.md YAML lint guidance. |
Sequence Diagram(s)
```mermaid
sequenceDiagram
    participant GH as GitHub Actions
    participant Runner as CI Runner
    participant Tools as Tools (k6, Vegeta, Node, Ruby, pnpm)
    participant App as Dummy App (Rails + Node Renderer)
    participant Store as Artifacts/Registry
    GH->>Runner: trigger (workflow_dispatch / push / labeled PR)
    Runner->>Tools: install & restore caches
    Runner->>App: prepare apps (build assets / generate packs)
    Runner->>App: start prod servers (rails, node-renderer)
    Runner->>App: wait_for_server / health checks
    Runner->>Tools: run Core benchmarks (k6 per route)
    Runner->>Tools: run Pro benchmarks (Vegeta per bundle)
    Tools->>App: send HTTP requests → collect metrics
    Runner->>Runner: parse & convert summaries to benchmark JSON
    Runner->>Store: upload artifacts & publish results
    Runner->>GH: post workflow summary / status
```
Estimated code review effort
🎯 4 (Complex) | ⏱️ ~50 minutes
- Focus review on:
  - `benchmarks/bench.rb` — route discovery, k6 invocation, JSON parsing.
  - `benchmarks/bench-node-renderer.rb` — bundle discovery, RSC detection, Vegeta HTTP/2 usage and aggregation.
  - `.github/workflows/benchmark.yml` — conditional gates, caching, secrets usage, artifact publishing.
  - `benchmarks/lib/benchmark_helpers.rb` & `benchmarks/convert_to_benchmark_json.rb` — validation, failure metrics, and JSON schema correctness.
  - dummy app `bin/prod` / `Procfile.prod` scripts — asset freshness checks and supervisor fallbacks.
Possibly related PRs
- shakacode/react_on_rails#2153 — updates/unifies RuboCop/config for new performance/benchmark files (actionlint/yamllint touches).
- shakacode/react_on_rails#2121 — pnpm/Yarn migration and CI tooling changes that overlap with asset/build/CI steps.
Suggested reviewers
- AbanoubGhadban
- alexeyr-ci
- justin808
Poem
🐰
I hopped through scripts at break of day,
Laid k6 tracks where requests may play.
Rails and Node hum, numbers in sight,
Bundles sprint by under moonlight.
A rabbit cheers — benchmarks take flight!
Pre-merge checks and finishing touches
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title 'Add performance tests' accurately summarizes the main objective of the pull request, which is to introduce performance testing capabilities via GitHub Actions workflows and benchmarking scripts. |
| Linked Issues check | ✅ Passed | The PR implements the requirement from issue #1777 to add performance tests using k6 for the dummy app, delivering benchmarking infrastructure via GitHub Actions workflow, benchmark scripts for Rails and Node Renderer, and helper utilities. |
| Out of Scope Changes check | ✅ Passed | All changes are scoped to performance testing infrastructure. Configuration updates (puma.rb, shakapacker.yml, database.yml, environments/production.rb) support the benchmark environment, and documentation/tooling changes prepare the project for CI benchmarking. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 90.63% which is sufficient. The required threshold is 80.00%. |
✨ Finishing touches
- [ ] 📝 Generate docstrings
🧪 Generate unit tests (beta)
- [ ] Create PR with unit tests
- [ ] Post copyable unit tests in a comment
- [ ] Commit unit tests in branch `alexeyr/performance-tests`
Code Review - PR #1868: Add Performance Tests
Thanks for this contribution! This is a solid implementation of performance testing infrastructure.
✅ Strengths
- Comprehensive Tool Coverage: Using three industry-standard tools (Fortio, Vegeta, k6) provides good cross-validation
- Well-Structured Workflow: Clear separation of concerns with logical step grouping
- Good Error Handling: Proper use of set -euo pipefail and validation steps
- Caching Strategy: Smart use of GitHub Actions cache for benchmark tool binaries
- Security Conscious: SSH access properly gated with warnings and actor-only access
- Flexible Configuration: Parameterized inputs allow customization
🐛 Critical Issues
- Node.js Version Mismatch (benchmark.yml:155)
  - Uses Node 22, but PR #1925 documents Node 22 V8 cache crashes
  - Recommend changing to Node 20 until #1925 is resolved
- Missing Trailing Newlines (CRITICAL per CLAUDE.md)
  - Files spec/dummy/bin/prod and spec/dummy/bin/prod-assets are missing final newlines
  - This will cause CI failures; run `rake autofix` to fix
- Race Condition in Server Startup (benchmark.yml:220)
  - Background server might not be fully initialized when the curl check succeeds
  - Recommend adding a health check endpoint or verifying multiple successful responses
- Incomplete PR Checklist
  - Documentation and CHANGELOG boxes unchecked
  - Per CLAUDE.md: should update CHANGELOG for user-visible features
⚠️ Performance & Quality Issues
- bench.sh:66 - Unrealistic iteration count assumes 10 iterations/sec per VU
- Fortio Optimization Missing - Referenced guide not implemented
- No Resource Monitoring - Missing CPU/memory metrics
- Missing Test Validation - No thresholds to detect performance regressions
- Single Route Testing - Only tests server_side_hello_world_hooks
- Complex Validation Logic - Lines 17-27 hard to read
- Inconsistent Quoting - Variables not consistently quoted
- Hard-coded Timeouts - 30s and 60s timeouts not configurable
📚 Documentation Issues
- Missing Documentation - No README for running benchmarks locally
- Workflow Triggers - Runs on ALL PRs (expensive, should use labels/manual only)
- No Historical Tracking - Results not compared over time
🎯 Before Merging
- Fix Node.js version (22 → 20)
- Add trailing newlines
- Run bundle exec rubocop
- Update CHANGELOG.md
- Add performance thresholds
- Update workflow triggers
📊 Summary
Assessment: Good foundation but needs critical fixes before merge
Effort: Critical issues ~30 mins, High priority ~1-2 hours
Main concerns: Node.js version mismatch and missing trailing newlines will cause CI failures.
Let me know if you need help with any of these!
PR Review: Add Performance Tests (1868)
Summary
This PR adds comprehensive performance testing infrastructure using three industry-standard benchmarking tools (Fortio, Vegeta, and k6). Well-structured implementation addressing issue 1777.
Critical Issues
1. Missing Trailing Newlines
CRITICAL per CLAUDE.md: Files MUST end with newline character or CI will fail:
- spec/dummy/bin/prod (line 4)
- spec/dummy/bin/prod-assets (line 9)
- spec/performance/bench.sh (line 202)
2. RuboCop Required
CRITICAL per CLAUDE.md: Must run bundle exec rubocop and fix ALL violations before commit.
Potential Bugs
K6 JSON Parsing (bench.sh:187-192)
The jq expression uses add which fails on empty arrays. Use add // 0 for safety.
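A minimal sketch of the safer expression (file and JSON path names here are illustrative, not taken from bench.sh):

```bash
# `add` on an empty array yields null, which breaks later arithmetic;
# `add // 0` substitutes 0 in that case.
K6_REQS_OTHER=$(jq '[.root_group.checks[].passes] | add // 0' k6-summary.json)
```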
Server Startup Race (benchmark.yml:254)
Uses exit 0 inside loop which exits the entire step. Should use break instead.
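A sketch of the corrected loop (URL and retry count are illustrative):

```bash
# `break` leaves only the loop; `exit 0` here would end the whole step
# before the benchmark commands that follow it.
for _ in $(seq 1 30); do
  if curl -fsS http://localhost:3001/ > /dev/null; then
    break
  fi
  sleep 1
done
```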
npm vs yarn (benchmark.yml:200)
Uses npm install --global yalc but CLAUDE.md requires yarn-only commands.
Code Quality Strengths
- Excellent tool selection - three benchmarking tools for cross-validation
- Well-structured workflow with clear sections
- Good error handling with set -euo pipefail
- Comprehensive parameterization via workflow inputs
- Smart binary caching for faster CI
- Security-conscious SSH access gating
Suggestions
- Shell portability: Replace bc with awk for float comparisons (see the sketch after this list)
- Hardcoded route: Make server_side_hello_world_hooks configurable
- Duration: 30s may be too short, consider 60s for stable measurements
- Add documentation for local benchmark execution
- Consider performance regression detection vs master
- Add spec/performance/README.md with interpretation guide
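For the `bc` replacement, a minimal sketch (variable names and the threshold are illustrative):

```bash
# awk exits 0 when the comparison holds, so it works directly in `if`;
# no external bc binary is needed for float comparison.
P99_MS="123.4"  # illustrative value
if awk -v a="$P99_MS" -v b=500 'BEGIN { exit !(a > b) }'; then
  echo "p99 exceeded threshold"
fi
```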
Test Coverage
Good: Infrastructure in place, runs on PR/push, results preserved as artifacts
Missing: No bash script tests, no performance thresholds, no local testing docs
Verdict
Solid PR with high code quality. Performance testing infrastructure will be valuable.
Required Before Merge:
- Fix trailing newlines (will cause CI failure)
- Run bundle exec rubocop and fix violations
- Fix K6 empty array bug
Recommended:
- Add basic documentation
- Verify workflow passes on CI
- Consider CHANGELOG entry
Great work overall!
PR Review: Add Performance Tests
Thank you for this comprehensive performance testing implementation! This is a well-structured addition to the repository. Here's my detailed feedback:
✅ Strengths
- Comprehensive Tool Coverage: Great choice using three different load testing tools (Fortio, Vegeta, k6) to get diverse perspectives on performance
- Excellent Error Handling: The bash script has robust parameter validation and error handling throughout
- Smart Caching: GitHub Actions workflow efficiently caches binaries to speed up future runs
- Good Documentation: Clear comments, parameter descriptions, and helpful echo statements
- Configurable Parameters: Extensive workflow_dispatch inputs allow flexible testing scenarios
- Production-Ready Setup: Proper Puma configuration with workers and preloading for production benchmarks
🐛 Potential Issues
1. Script Portability - bc Dependency (bench.sh:21,33)
The script uses bc for floating-point comparisons but bc is not installed in the workflow.
Fix: Add bc to the apt package installation step in the workflow, or use bash-native arithmetic for integer comparisons.
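If only integers are compared, a bash-native sketch (names and threshold are illustrative):

```bash
# Arithmetic evaluation needs no external binary for integer comparisons.
P99_MS=480  # illustrative integer value
if (( P99_MS > 500 )); then
  echo "p99 exceeded threshold"
fi
```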
2. Race Condition in Server Startup (benchmark.yml:248)
The server process is backgrounded but there is no guarantee it has started before the polling loop begins. If the server fails to start immediately, the loop might miss error messages.
Suggestion: Add a brief sleep 2 before the polling loop, or capture the PID and check if the process is still running during the timeout loop.
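One possible shape for that check (port, timeout, and script path are assumptions, not taken from the workflow):

```bash
bin/prod &           # start the server in the background
SERVER_PID=$!
sleep 2              # let an immediate crash surface first
for _ in $(seq 1 30); do
  # kill -0 only probes the process; it fails once the PID is gone
  kill -0 "$SERVER_PID" 2>/dev/null || { echo "server exited early"; exit 1; }
  curl -fsS http://localhost:3001/ > /dev/null && break
  sleep 1
done
```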
3. k6 Status Reporting Edge Case (bench.sh:199-204)
If .root_group.checks is empty, add returns null, causing K6_REQS_OTHER calculation to fail.
Fix: Add null handling by changing the jq expression to use | add // 0
4. Missing Newline at End of Files
Per CLAUDE.md requirements: ALWAYS ensure files end with a newline character
Files needing newlines:
- spec/dummy/bin/prod (line 4)
- spec/dummy/bin/prod-assets (line 9)
- spec/performance/bench.sh (line 214)
Fix: Run rake autofix or add trailing newlines manually.
5. Benchmark Workflow Runs on Every PR (benchmark.yml:57-58)
Running 30-second benchmarks with 3 tools on every PR will consume significant CI minutes and slow down PR feedback.
Suggestion:
- Remove pull_request trigger and only keep workflow_dispatch + push to master
- Or add a condition to only run on PRs with a specific label (e.g., performance)
- Or significantly reduce default duration (e.g., 10s) for PR runs
🔍 Code Quality & Best Practices
Security
- ✅ SSH access is properly gated behind workflow_dispatch input with clear warnings
- ✅ limit-access-to-actor: true prevents unauthorized access
- ⚠️ Consider adding a comment warning maintainers never to merge code while SSH session is active
Performance
- ✅ Binary caching will significantly speed up repeated runs
- ✅ Server warm-up phase (10 requests) is good practice
- ⚠️ REQUEST_TIMEOUT=60s is very generous - consider 30s default to catch timeout issues faster
Maintainability
- ✅ Clear section comments in YAML
- ✅ Parameterized configuration
- ⚠️ Consider extracting the jq parsing logic (lines 167-209) into a separate function for readability
📊 Test Coverage
Missing:
- ❌ No automated tests for the bench.sh script itself
- ❌ No validation that the summary.txt format is correct
- ❌ No performance regression detection (just data collection)
Suggestions:
- Add a unit test for bench.sh parameter validation
- Consider storing benchmark results over time and comparing against baselines
- Add a step to post benchmark results as a PR comment for visibility
🔧 Configuration Issues
Puma Configuration (spec/dummy/config/puma.rb:38)
Workflow sets default to 4 workers (WEB_CONCURRENCY: 4), but Puma defaults to 2 if not set. The workflow default should match.
Consistency check needed: Verify that WEB_CONCURRENCY=4 is intentional for CI runners with 2 CPU cores.
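One way to keep the two in sync is to have puma.rb read the same ENV default the workflow sets; a puma.rb fragment sketching this (values assumed from the review, not verified against the file):

```ruby
# Both the workflow and Puma read WEB_CONCURRENCY, so one default governs both.
workers Integer(ENV.fetch("WEB_CONCURRENCY", 2))
threads_count = Integer(ENV.fetch("RAILS_MAX_THREADS", 3))
threads threads_count, threads_count
```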
📝 Documentation & Changelog
Per CLAUDE.md: Update CHANGELOG.md for user-visible changes only (features, bug fixes, breaking changes, deprecations, performance improvements)
This PR qualifies as it adds a performance testing feature. Needs:
- ✅ Changelog entry format: PR 1868 by alexeyr-ci2: Add performance benchmarking workflow with Fortio, Vegeta, and k6.
Missing:
- ❌ No README/docs update explaining how to run benchmarks locally
- ❌ No documentation on interpreting benchmark results
🎯 Recommendations
Before Merge (Critical):
- ✅ Add trailing newlines to all new files
- ✅ Run bundle exec rubocop and fix any violations
- ✅ Add CHANGELOG entry
- ✅ Fix k6 null handling edge case
- ✅ Reconsider pull_request trigger or reduce duration
Nice to Have:
6. Add bc to dependencies or remove floating-point support
7. Add a brief README section on running benchmarks
8. Handle the server startup race condition
9. Consider performance regression detection
🎓 Overall Assessment
This is a solid, production-ready implementation of performance testing. The code quality is high, with excellent error handling and configurability. The main concerns are:
- CI cost - running on every PR may be expensive
- Missing documentation - users won't know how to use this
- File formatting - needs trailing newlines per repo standards
Recommendation: Request changes for the critical items (newlines, changelog, PR trigger), then approve.
Score: 8/10 - Great implementation, minor issues to address.
Pull Request Review: Add Performance Tests
Thank you for this comprehensive performance testing implementation! This is a valuable addition to the project. Below is my detailed review:
✅ Strengths
- Multi-tool approach: Using Fortio, Vegeta, and K6 provides cross-validation of results and flexibility for different testing scenarios.
- Well-structured workflow: The GitHub Actions workflow is logically organized with clear steps and helpful comments.
- Configurable parameters: Extensive configurability via workflow inputs allows fine-tuning for different performance scenarios.
- Good error handling: The `bench.sh` script includes comprehensive input validation and error messages.
- Caching: Binary caching for tools reduces workflow execution time.
- Production mode testing: Tests against production-like configuration (Puma workers, precompiled assets) for realistic results.
🔍 Issues and Concerns
Critical Issues
- Missing file newlines (`spec/dummy/bin/prod`, `spec/dummy/bin/prod-assets`)
  - Location: spec/dummy/bin/prod:4, spec/dummy/bin/prod-assets:9
  - Impact: CI will fail per CLAUDE.md requirements
  - Fix: Ensure both files end with a newline character
  - Reference: CLAUDE.md states "ALWAYS ensure files end with a newline character"
- RuboCop not run
  - Impact: MANDATORY requirement before commits
  - Required action: Run `bundle exec rubocop` and fix ALL violations
  - Reference: CLAUDE.md: "BEFORE EVERY COMMIT/PUSH: ALWAYS run `bundle exec rubocop` and fix ALL violations"
High Priority Issues
- Workflow runs on every PR/push to master (benchmark.yml:55-58)
  - Impact: Performance tests are resource-intensive and will slow down CI significantly
  - Recommendation: Only run on manual workflow_dispatch, specific labels (e.g., `run-benchmarks`), or scheduled runs (e.g., nightly)
- Server process not properly cleaned up (benchmark.yml:248)
  - Issue: Server started in background with `bin/prod &` but never explicitly killed
  - Impact: May leave orphaned processes, especially if the workflow fails
  - Fix: Add a cleanup step or use a trap/signal handler (see the sketch below)
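A minimal trap sketch for that cleanup (the PID capture is an assumption; script name as used above):

```bash
bin/prod &
SERVER_PID=$!
# The EXIT trap runs whether the step succeeds or fails,
# so the background server cannot outlive the job step.
trap 'kill -TERM "$SERVER_PID" 2>/dev/null || true' EXIT
```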
Medium Priority Issues
- Hardcoded route in bench.sh (line 6)
  - Issue: `ROUTE:-server_side_hello_world_hooks` may not exist or may not be representative
  - Recommendation: Verify this route exists in the dummy app, consider testing multiple routes, and document what this route tests
- No baseline comparison
  - Issue: Results are collected but not compared against previous runs
  - Enhancement: Consider using benchmark-action to track performance over time
- Missing documentation
  - Issue: No README or documentation for how to run benchmarks locally, interpret results, or understand what "good" performance looks like
  - Recommendation: Add `spec/performance/README.md`
🔒 Security Considerations
- SSH access via tmate (benchmark.yml:108-113)
  - Status: Properly gated behind workflow input (default: false)
  - Good: Includes security warnings and `limit-access-to-actor`
🧪 Test Coverage
Missing:
- No unit tests for the bash script logic
- No tests verifying the benchmark workflow succeeds
- No validation that summary output format is correct
Recommendations:
- Add a dry-run mode to `bench.sh` for testing
- Consider shellcheck for bash script validation
📝 Suggested Changes
Must Fix (Blocks Merge)
- [ ] Add trailing newlines to `spec/dummy/bin/prod` and `spec/dummy/bin/prod-assets`
- [ ] Run `bundle exec rubocop` and fix all violations
- [ ] Change workflow triggers to avoid running on every PR/push
Should Fix (High Priority)
- [ ] Add server cleanup/shutdown in workflow
- [ ] Verify the benchmark route exists and is representative
- [ ] Add basic documentation (`spec/performance/README.md`)
Nice to Have
- [ ] Add baseline comparison/tracking
- [ ] Add shellcheck to CI for bash scripts
- [ ] Consider shorter default duration for PR checks
- [ ] Extract K6 test to separate file
✅ Changelog
Per CLAUDE.md, this should be added to CHANGELOG.md as it's a new feature.
🎯 Overall Assessment
This is a solid foundation for performance testing with good multi-tool coverage and configurability. The main blockers are CLAUDE.md compliance issues (trailing newlines, RuboCop). Once those are fixed and the workflow triggers are adjusted to avoid running on every PR, this will be ready to merge.
Estimated effort to address critical issues: ~30 minutes
Great work overall! 🚀
Code Review: Add Performance Tests
Thank you for this comprehensive performance testing addition! This is a well-structured PR that addresses issue #1777.
✅ Strengths
1. Excellent Tool Coverage
- Multiple benchmarking tools (Fortio, Vegeta, k6) provide good cross-validation
- Smart caching strategy for tool binaries reduces CI time
- Flexible configuration via environment variables
2. Robust Error Handling
- Good use of set -euo pipefail in bash scripts
- Comprehensive input validation with clear error messages
- Server health checks with timeout protection
3. Production-Ready Configuration
- Proper Puma clustering setup for production benchmarks
- Preloading and worker configuration aligned with best practices
- Appropriate separation of prod/dev configurations
🔒 Security Concerns - CRITICAL
SSH Access Feature (lines 96-119 in .github/workflows/benchmark.yml)
This poses significant security risks:
Issues:
- Detached mode risk: Using detached: true leaves the SSH session open indefinitely
- Repository exposure: The entire repository is accessible via SSH
- No audit trail: Limited visibility into commands executed during SSH sessions
- Secrets exposure risk: If secrets are added later, they could be exposed
Recommendations:
- Consider removing this feature entirely for production workflows
- If kept: Remove detached: true, add timeout-minutes: 15, and restrict to non-master branches
- Document that NO secrets should ever be added to this workflow
🐛 Potential Bugs
1. Division by Zero Risk (spec/performance/bench.sh:195-197)
If K6_REQS_KNOWN_STATUS is null/empty, this could fail. Add validation.
2. Race Condition in Server Startup (spec/dummy/bin/prod)
The rails s command starts but there is no guarantee assets are fully loaded. Consider adding a readiness check that validates asset serving.
3. Missing Error Context
When benchmarks fail, the error message does not indicate which tool failed. Consider wrapping each tool execution with proper error context.
4. Hardcoded Paths (spec/performance/bench.sh:87)
If run from wrong directory, this could create directories in unexpected locations. Consider using SCRIPT_DIR to make paths relative to the script location.
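The usual idiom looks like this (the results directory name is illustrative):

```bash
# Resolve the directory containing this script, regardless of the
# working directory it was invoked from.
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
mkdir -p "$SCRIPT_DIR/bench_results"
```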
⚡ Performance Considerations
- Workflow runs on every push/PR: For a 20-minute benchmark suite, consider only running on workflow_dispatch and push to master, or add path filters to skip documentation changes.
- Sequential warmup (spec/performance/bench.sh:73-78): Consider parallel warmup for efficiency.
- Triple tool execution: Running all 3 tools provides validation but triples execution time. Consider defaulting to k6 only.
🧪 Test Coverage - Missing
- Unit tests for benchmark script: Input validation and JSON parsing logic should be tested
- Workflow validation: Consider adding actionlint to CI
- Integration test: Should verify benchmark artifacts are generated correctly
📝 Documentation - REQUIRED
Missing per CLAUDE.md requirements:
- CHANGELOG.md update: This is a user-visible feature that developers will use. Action needed: update CHANGELOG.md
- Inline documentation: spec/performance/bench.sh needs a header comment explaining:
  - Purpose
  - Usage examples
  - Required dependencies
  - Expected output format
- Workflow documentation: Add a comment block in .github/workflows/benchmark.yml explaining when to run benchmarks and how to interpret results
🎨 Code Quality Suggestions
Good Practices:
- Consistent error handling
- Input validation with clear messages
- Proper use of environment variables with defaults
- JSON output for machine-readable results
Suggestions:
- YAML Anchors: Lines 7-14 define anchors but only use them once - consider simplifying
- DRY Principle: Lines 165-208 have very similar blocks for each tool - consider refactoring
- Magic Numbers: Several hardcoded values should be configurable (TIMEOUT_SEC=60, warmup count, sleep durations)
- Puma Configuration: Default of 2 workers may be too low - document why this was chosen
📋 Action Items Summary
🔴 High Priority (Before Merge):
- Address SSH security concerns - Remove or significantly restrict
- Add CHANGELOG.md entry (required per project guidelines)
- Fix potential null reference in K6 status parsing
- Add workflow path filters to prevent unnecessary runs
🟡 Medium Priority (Before Merge):
- Add header documentation to bench.sh
- Add error context to benchmark failures
- Fix hardcoded paths in bash script
🟢 Low Priority (Future Enhancement):
- Add unit tests for benchmark script
- Refactor duplicate code in parsing logic
- Make magic numbers configurable
✨ Overall Assessment
This is a solid foundation for performance testing! The multi-tool approach and comprehensive workflow setup show good engineering practices.
Main concerns:
- Security: SSH feature needs significant changes or removal
- Documentation: Missing CHANGELOG and inline docs (required per CLAUDE.md)
- Robustness: A few edge cases in error handling
Recommendation: Request changes for security and documentation issues, then approve after those are addressed.
Great work on this comprehensive testing infrastructure!
Pull Request Review: Add Performance Tests (Part 1/3)
Overview
This PR adds comprehensive benchmarking infrastructure to test React on Rails performance using three industry-standard tools (Fortio, Vegeta, k6). The implementation is well-structured with good error handling and configurability.
✅ Strengths
1. Excellent Tool Selection & Comparison
- Using three different benchmarking tools (Fortio, Vegeta, k6) provides cross-validation of results
- Each tool has different strengths, giving a more complete performance picture
- Smart conditional execution based on TOOLS environment variable
2. Comprehensive GitHub Actions Workflow
- Well-organized with clear step sections and comments
- Smart caching strategy for binaries and dependencies
- Proper error handling with set -e and validation steps
- 2-hour timeout is reasonable for performance testing
- Good use of artifacts for result preservation (30-day retention)
3. Robust Benchmark Script (spec/performance/bench.rb)
- Parameter validation: Validates rate, duration, connections, timeout formats
- Server availability check: Waits up to 60s with detailed logging
- Warm-up phase: 10 requests per route before benchmarking (good practice)
- Auto-route discovery: Extracts routes from Rails app automatically
- Error handling: Each tool run wrapped in rescue blocks, failures don't stop entire suite
- Consistent output: Tab-separated summary file for easy parsing
4. Production-Ready Infrastructure
- bin/prod and bin/prod-assets scripts properly separate asset compilation from server startup
- Asset freshness checks with helpful warnings
- Supports both Core and Pro versions
- Procfile.prod for Pro version includes node-renderer process
Pull Request Review: Add Performance Tests (Part 2/3)
⚠️ Issues & Recommendations
1. Documentation: Missing CHANGELOG Entry (REQUIRED per CLAUDE.md)
Per CLAUDE.md guidelines, user-visible changes should be documented in CHANGELOG.md.
Recommendation: Add entry like:
#### [PR 1868](https://github.com/shakacode/react_on_rails/pull/1868) by [alexeyr-ci2](https://github.com/alexeyr-ci2)
Added performance benchmarking infrastructure using Fortio, Vegeta, and k6 tools. New GitHub Actions workflow allows automated and manual performance testing of both Core and Pro versions.
2. Security: SSH Access Feature (Medium Priority)
Location: .github/workflows/benchmark.yml:110-128
The SSH access feature using action-tmate is documented but could be clearer about risks.
Recommendations:
- ✅ Good: Already requires explicit opt-in via debug_enabled input
- ✅ Good: Limited to workflow initiator with limit-access-to-actor: true
- ⚠️ Consider: Add a time limit to SSH sessions (currently detached: true means indefinite)
- Suggest: Document in PR description when/why developers would use this
3. Resource Management: Server Process Cleanup (Low Priority)
Location: Workflow steps Stop Core/Pro production server
Issue: For Pro version, Procfile.prod also starts node-renderer on port 3800, which may not be cleaned up. If the Rails process spawns child processes, they may not be cleaned up either.
Consider also cleaning up port 3800 for Pro, and killing child processes to ensure complete cleanup.
4. Code Quality: Magic Numbers (Low Priority)
Location: spec/performance/bench.rb:385-388
Consider extracting warm-up constants (10 requests, 0.5s delay) for better documentation.
5. Performance: Workflow Efficiency (Enhancement)
The workflow builds both Core and Pro sequentially. Since they are independent, they could run in parallel jobs for faster CI (trade-off: uses 2x runner minutes but saves wall-clock time).
6. Observability: Missing Baseline Comparison (Enhancement)
Results are uploaded as artifacts but not compared to baseline/previous runs. Future enhancement could add performance regression detection by comparing against stored baselines.
Pull Request Review: Add Performance Tests (Part 3/3)
🔍 Detailed Code Review
spec/performance/bench.rb
Positive Patterns:
- Line 12-15: env_or_default helper properly handles empty strings and 0 from workflow inputs
- Line 38-57: Input validation functions are thorough with clear error messages
- Line 59-67: Proper error handling in parse_json_file with specific error types
- Line 140-150: Server availability check accepts both 2xx and 3xx responses (handles redirects correctly)
- Line 184-186: Validates RATE=max constraint (CONNECTIONS must equal MAX_CONNECTIONS)
Security Notes:
- Line 81: Command injection risk is low - app_dir is hardcoded constant, not user input
- Line 332: k6 script uses string interpolation but target is validated as URI first
🔐 Security Assessment
Workflow Secrets:
- ✅ SECRET_KEY_BASE: Properly set as dummy value (not a real secret)
- ✅ REACT_ON_RAILS_PRO_LICENSE: Correctly uses GitHub secrets
Injection Vectors:
- ✅ All workflow inputs have proper type constraints (number, string, choice)
- ✅ Bash scripts use proper quoting
- ✅ No user input flows into eval or unvalidated shell commands
Resource Limits:
- ✅ Workflow has 120-minute timeout
- ✅ Request timeout configurable (default 60s)
- ⚠️ No memory/CPU limits on benchmark tools (may cause OOM on large loads)
🧪 Test Coverage
Missing tests: This PR adds testing infrastructure but doesn't include tests for the infrastructure itself.
Recommendations for future PRs:
- Add unit tests for spec/performance/bench.rb validation functions, route parsing logic, JSON parsing
- Add integration test that runs benchmark with DURATION=5s to verify workflow
✅ Compliance with CLAUDE.md
- ✅ Uses pnpm (not npm/yarn)
- ✅ Ruby code follows project structure
- ✅ Bash scripts use proper error handling (set -e)
- ❌ Missing CHANGELOG.md entry (REQUIRED)
- ⚠️ Should add newlines to end of all new files (pre-commit hooks will catch this)
🎯 Final Recommendation
Overall: This is a high-quality PR that adds valuable performance testing infrastructure. The code is well-structured, properly error-handled, and follows good practices.
Blocking Issue: Missing CHANGELOG.md entry (required per CLAUDE.md)
Before Merge:
- ✅ Add CHANGELOG.md entry (REQUIRED)
- ✅ Run bundle exec rubocop and fix any violations
- ✅ Ensure all files end with newlines
Consider for Follow-up PRs:
- Add baseline comparison functionality
- Optimize workflow to run Core/Pro in parallel
- Add tests for benchmark script itself
- Document when/how to use SSH debug feature
Approval Status: Ready to merge after addressing CHANGELOG.md requirement. Excellent work! 🚀
PR Review: Add Performance Tests
This PR adds comprehensive performance testing infrastructure using three benchmarking tools (Fortio, Vegeta, k6). Overall, this is a high-quality PR with well-structured code.
✅ Strengths
- Comprehensive Testing: Uses multiple tools for validation, tests both OSS and Pro versions
- Well-Structured Code: Clean separation in bench.rb, good error handling
- Production-Ready Workflow: Proper asset precompilation, server validation, caching
- Good Developer Experience: Clear parameters, retry logic, detailed output
🔧 Code Quality Observations
Ruby Code (spec/performance/bench.rb)
Minor Security Suggestions:
- Line 81 - Command Injection (Low Risk): Consider using Shellwords.escape for the app_dir variable
- Line 121 - Command Injection (Low Risk): Should escape the cmd variable since it comes from the TOOLS env var
GitHub Actions Workflow
Suggestions:
- Frozen Lockfile Inconsistency (Lines 236 vs 258, 396): Root uses --frozen-lockfile but the dummy apps do not. This could lead to non-deterministic builds. Recommend adding it or documenting why it is omitted.
- Timeout (Lines 320, 455): 120 minutes seems generous and could hide hanging processes
- Resource Allocation: Have you validated WEB_CONCURRENCY=4 + RAILS_MAX_THREADS=3 on 2-core GitHub runners?
🔒 Security Review: ✅ No Critical Issues
- Secrets properly handled (only dummy/test values)
- SSH debug access properly gated and secured
- Command injection risks are low (inputs mostly controlled)
🧪 Test Coverage
Missing:
- No RSpec tests for bench.rb itself
- No validation of reasonable benchmark results
Recommendation: Add smoke tests for parsing logic and input validation
📝 Documentation
Good CLAUDE.md additions. Could also add:
- How to run benchmarks locally
- How to interpret results
- Expected performance baselines
🎯 Specific Files
- .rubocop.yml: Good approach inheriting from react_on_rails config
- knip.ts: Correct to ignore system utilities column/tee
- .gitignore: bench_results correctly placed at root
- bin/prod scripts: Well-designed with pre-flight checks
🚀 Recommendations
Before Merge:
- Add shell escaping for security hardening (low priority)
- Document or fix --frozen-lockfile inconsistency
- Consider adding tests for bench.rb
Future PRs:
- Performance regression detection
- Benchmark visualization
- RSpec unit tests
✅ Summary
Overall Assessment: High-quality PR with valuable performance testing infrastructure. Well-structured code following project conventions.
Main action items (all minor):
- Shell escaping for hardening
- Frozen-lockfile documentation
- Consider adding tests
Great work on comprehensive benchmarking! 🎉
Recommendation: ✅ Approve with minor suggestions
Code Review: Performance Testing Infrastructure
Thank you for adding comprehensive performance testing! This PR adds valuable infrastructure for tracking React on Rails performance over time.
Strengths
- Comprehensive Benchmarking: Multi-tool comparison (Fortio, Vegeta, k6) provides validation across different methodologies
- Well-Structured Workflow: Clear separation of concerns with good caching and cleanup
- Production Scripts: bin/prod and bin/prod-assets are well-designed with asset staleness detection
Critical Issues
1. SSH Access Security Risk (Lines 104-130)
The tmate SSH feature could leak REACT_ON_RAILS_PRO_LICENSE secret. Recommend removing it entirely or adding secret-clearing steps.
2. Port Hardcoding
Port 3001 is hardcoded in bench.rb:21, benchmark.yml, and bin/prod scripts. If port is in use, workflow fails mysteriously. Make PORT configurable via environment variable.
3. Server Startup Race Condition (Lines 289-313, 424-448)
The 30-second wait loop does not check if process died. If Puma crashes immediately, it keeps retrying. Add process health check during wait.
4. Missing CHANGELOG
Per PR checklist, CHANGELOG.md needs to be updated with an entry under Unreleased section.
Medium Priority
- RuboCop Violations: bench.rb has large methods with disabled cops (lines 188-369). Extract helper methods to reduce complexity.
- Inconsistent Error Handling: Some errors raise, others return failure_metrics(e). Choose one approach and document it.
Testing Concerns
Per CLAUDE.md requirements, there is no evidence of local testing in the PR description. Please document:
- Testing Core and Pro apps locally
- Testing with different parameter combinations
- Results of running bundle exec rubocop
Documentation Gaps
Add usage documentation for:
- Running benchmarks locally
- Triggering workflow in CI
- Parameter descriptions
- Interpreting results
Action Items Before Merge
Required:
- Remove SSH access or add secret-clearing
- Fix port hardcoding
- Add process health check to server startup
- Update CHANGELOG.md
- Document local testing results
Recommended:
6. Refactor bench.rb methods
7. Add usage documentation
Overall Assessment
Excellent foundation for performance testing! With security and reliability issues addressed, this will be a solid addition. Estimated effort: 2-3 hours.
Great work! 🚀
Code Review: Performance Testing Infrastructure
Thanks for adding performance testing infrastructure! Here is my detailed feedback on PR #1868.
Overall Assessment
Quality: Good - Well-structured benchmarking framework with solid error handling
Scope: Appropriate - Addresses issue #1777 with a flexible, configurable system
Testing: Needs improvement - See security and production concerns below
Strengths
- Excellent multi-tool approach: Testing with Fortio, Vegeta, and k6 provides good coverage
- Well-documented workflow: Clear step separation and helpful comments
- Flexible configuration: Good use of environment variables with sensible defaults
- Error handling: Solid error handling with descriptive messages
- Auto-detection of routes: Smart feature to discover benchmark routes automatically
- Proper warmup: 10-request warmup prevents cold-start skew
Critical Issues
1. Security: SSH Access in CI (HIGH PRIORITY)
Location: .github/workflows/benchmark.yml:105-130
The SSH debug access feature is a significant security risk. SSH sessions can remain open indefinitely and expose the GitHub Actions runner.
Recommendations:
- Add automatic timeout (30 minutes max)
- Consider removing if not actively needed
- Add more prominent warnings
2. Command Injection Vulnerability (HIGH PRIORITY)
Location: spec/performance/bench.rb:81,263,269
Shell commands use string interpolation without sanitization. If ROUTES env var contains shell metacharacters, command injection is possible.
Fix: Use Shellwords.escape for user inputs
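A minimal Ruby sketch of the suggested escaping (variable names and the URL are illustrative, not from bench.rb):

```ruby
require "shellwords"

# Escape anything that originates in ENV before interpolating it
# into a shell command string.
route = Shellwords.escape(ENV.fetch("ROUTES", ""))
output = `curl -s http://localhost:3001/#{route}`
puts output
```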
3. Inconsistent Lock File Handling
Location: react_on_rails/spec/dummy/bin/prod:24-26 and react_on_rails_pro/spec/dummy/bin/prod:40
This project uses pnpm, not yarn (per CLAUDE.md). Scripts check for yarn.lock but should check pnpm-lock.yaml.
Documentation Needs
Per CLAUDE.md requirements:
- Update CHANGELOG.md with entry for PR 1868
- Document the benchmark workflow
- Update CLAUDE.md to mention spec/performance/ directory
Pre-merge Checklist
- Fix command injection vulnerability (use Shellwords.escape)
- Fix yarn.lock references to pnpm-lock.yaml
- Add security warnings or remove SSH access feature
- Run bundle exec rubocop and fix violations
- Update CHANGELOG.md
- Add benchmark documentation
- Add bench_results/ to RuboCop exclusions
Summary
Solid foundation for performance testing! Main concerns:
- Security: Command injection needs fixing before merge
- Consistency: Fix pnpm references
- Documentation: Add changelog entry and usage docs
Once security issue is addressed, this will be a valuable addition. Great work on the comprehensive workflow!
size-limit report 📦
| Path | Size |
|---|---|
| react-on-rails/client bundled (gzip) | 62.5 KB (0%) |
| react-on-rails/client bundled (gzip) (time) | 62.5 KB (0%) |
| react-on-rails/client bundled (brotli) | 53.66 KB (0%) |
| react-on-rails/client bundled (brotli) (time) | 53.66 KB (0%) |
| react-on-rails-pro/client bundled (gzip) | 63.45 KB (0%) |
| react-on-rails-pro/client bundled (gzip) (time) | 63.45 KB (0%) |
| react-on-rails-pro/client bundled (brotli) | 54.53 KB (0%) |
| react-on-rails-pro/client bundled (brotli) (time) | 54.53 KB (0%) |
| registerServerComponent/client bundled (gzip) | 127.35 KB (0%) |
| registerServerComponent/client bundled (gzip) (time) | 127.35 KB (0%) |
| registerServerComponent/client bundled (brotli) | 61.51 KB (+0.2% 🔺) |
| registerServerComponent/client bundled (brotli) (time) | 61.39 KB (0%) |
| wrapServerComponentRenderer/client bundled (gzip) | 121.96 KB (0%) |
| wrapServerComponentRenderer/client bundled (gzip) (time) | 121.96 KB (0%) |
| wrapServerComponentRenderer/client bundled (brotli) | 56.66 KB (0%) |
| wrapServerComponentRenderer/client bundled (brotli) (time) | 56.66 KB (0%) |
Code Review - Performance Testing Infrastructure
I've reviewed PR #1868 which adds comprehensive performance testing infrastructure. This is a substantial and well-structured addition to the project. Here's my detailed feedback:
✅ Strengths
1. Excellent Benchmark Implementation (spec/performance/bench.rb)
- Robust parameter validation: Input validation for RATE, CONNECTIONS, DURATION is thorough and provides clear error messages
- Multi-tool support: Supporting Fortio, Vegeta, and k6 gives flexibility and cross-validation of results
- Smart route discovery: Auto-detecting routes from Rails with proper filtering (pages# and react_router# controllers only)
- Required parameter handling: Correctly skips routes with required parameters like `/rsc_payload/:component_name`
- Good error handling: Each tool's execution is wrapped in error handling with clear failure metrics
- Server warmup: 10 requests per route before benchmarking prevents cold-start bias
2. Well-Designed GitHub Workflow (.github/workflows/benchmark.yml)
- Comprehensive configurability: All key parameters (rate, duration, connections, tools) are configurable via workflow_dispatch
- Proper caching: Binary caching for Fortio, Vegeta, and gem caching reduces CI time
- Security-conscious SSH access: The debug_enabled input with clear warnings is the right approach
- Both core and pro support: Conditional execution based on the `app_version` input is clean
- Artifact retention: 30-day retention for benchmark results enables historical analysis
- Validation steps: Verifying summary.txt exists and displaying results before upload
3. Production Scripts (bin/prod, bin/prod-assets)
- Asset validation: Checking for public/assets and manifest.json before starting server is crucial
- Staleness warnings: Warning when source files are newer than compiled assets helps catch mistakes
- Foreman/Overmind support: Graceful fallback from overmind → foreman → gem install foreman
📝 Issues & Recommendations
CRITICAL: RuboCop Configuration
- Issue: Added root-level `.rubocop.yml` that excludes `spec/performance/**/*`
- Problem: According to CLAUDE.md, "BEFORE EVERY COMMIT: ALWAYS run `bundle exec rubocop` and fix ALL violations"
- Action required: The new `spec/performance/bench.rb` should pass RuboCop. Please run `bundle exec rubocop spec/performance/bench.rb` and fix any violations. Only add exclusions if there's a valid reason (document it in comments).
Documentation Missing
From the PR checklist:
- ❌ Update documentation - No user-facing docs added
- ❌ Update CHANGELOG file - This is a significant feature addition
Recommendations:
- Add a section to README or create `BENCHMARKING.md` covering:
  - How to run benchmarks locally
  - How to interpret results
  - How to use the GitHub Actions workflow
  - Available configuration options
- Update the appropriate CHANGELOG (`/CHANGELOG.md` or `/CHANGELOG_PRO.md`), e.g. under "#### Developer Experience": Add performance benchmarking infrastructure with Fortio, Vegeta, and k6 support. [PR 1868](https://github.com/shakacode/react_on_rails/pull/1868) by [alexeyr-ci2](https://github.com/alexeyr-ci2)
Testing Concerns
Per CLAUDE.md's CRITICAL LOCAL TESTING REQUIREMENTS:
- ⚠️ Have you tested the benchmark.yml workflow end-to-end?
- Can you confirm the Core workflow completes successfully?
- Can you confirm the Pro workflow completes successfully?
- Have you verified the uploaded artifacts contain expected files?
If not tested locally (which is understandable for complex CI workflows), please clearly state:
- "UNTESTED - requires GitHub Actions environment"
- Provide a plan to verify after merge (e.g., run workflow_dispatch manually)
Code Quality Issues
-
spec/performance/bench.rb:220 - Multiple RuboCop disables: `# rubocop:disable Metrics/AbcSize, Metrics/CyclomaticComplexity, Metrics/PerceivedComplexity, Metrics/MethodLength`. While sometimes necessary, consider refactoring:
  - Extract common code from `run_fortio_benchmark`, `run_vegeta_benchmark`, `run_k6_benchmark`
  - Create a `BenchmarkTool` class with a common interface
  - Move percentile extraction logic to a helper method
- Hard-coded port 3001:
  - Multiple locations: bench.rb:21, bin/prod:29, benchmark.yml:303, etc.
  - Consider: `PORT = env_or_default('PORT', 3001)` for flexibility
- bench.rb:106 - Shell command without escaping: ``routes_output = `cd #{app_dir} && bundle exec rails routes 2>&1` ``. While `app_dir` is controlled internally, consider: ``routes_output = `cd #{Shellwords.escape(app_dir)} && bundle exec rails routes 2>&1` ``
- benchmark.yml:194 - Hard-coded Ruby version: `ruby-version: '3.3.7'`. Should this use a matrix strategy or at least match the versions in other workflows?
benchmark.yml:242-258 - TODO comments:
`# TODO only needed while we use --ignore-workspace below`, `# TODO simplify this both here and in integration tests` - create follow-up issues to track these improvements
-
Deleted file:
packages/react-on-rails-pro-node-renderer/script/preinstall.js- Appears unrelated to this PR's purpose
- Should this be in a separate PR? If intentional, explain in PR description
-
Path updates in documentation files:
- Many
.claude/andanalysis/files haveyarn→pnpmreplacements - These seem like cleanup from a previous migration
- Consider separating documentation updates from feature PRs for easier review
- Many
🔒 Security
No security concerns identified:
- ✅ SSH access properly gated behind workflow_dispatch input
- ✅ No secrets exposed in benchmark results
- ✅ Tools are from official releases with pinned versions
⚡ Performance Considerations
-
Workflow caching is excellent:
- Binary caching saves ~30s per run
- Gem caching saves ~2-3 minutes
- Node modules caching saves ~1-2 minutes
-
Benchmark duration defaults:
- 30s default might be short for stable results
- Consider documenting: "Use 60s+ for production comparisons"
-
Parallel tool execution:
- Currently runs tools sequentially (Fortio → Vegeta → k6)
- Could parallelize across routes or tools, but sequential is safer for accurate results
🎯 Final Recommendations
Before merge:
- ✅ Run `bundle exec rubocop spec/performance/bench.rb` and fix violations
- ✅ Update CHANGELOG.md with this feature
- ✅ Verify workflow runs successfully (at least once via workflow_dispatch)
- ✅ Address or create issues for TODO comments
Nice to have:
- Refactor benchmark tool functions to reduce complexity
- Make PORT configurable
- Add Shellwords escaping for shell commands
- Separate unrelated changes (preinstall.js deletion, doc updates) into their own PRs
Overall, this is excellent work - the implementation is solid, well-tested, and adds significant value. The issues are mostly about following project conventions (RuboCop, CHANGELOG) rather than fundamental problems. Great job! 🎉
Pull Request Review: Add Performance Tests
Thanks for adding performance testing infrastructure! This is a valuable addition to the project.
✅ Strengths
Well-Designed Benchmark Suite:
- Multiple tools (Fortio, Vegeta, k6) provide cross-validation of results
- Comprehensive metrics: RPS, p50/p90/p99 latencies, and status codes
- Flexible configuration via environment variables
- Smart route auto-detection from Rails routes
- Proper warm-up phase before benchmarking
Robust CI Integration:
- Conditional execution (master, workflow_dispatch, or labels)
- Artifact upload with 30-day retention
- Smart caching of benchmark tools
- Tests both Core and Pro versions
Good Operational Practices:
- Server health checks with retry logic
- Proper error handling and validation
- Clear logging with indicators
- Graceful cleanup
⚠️ Issues to Address
1. Security: Command Injection Vulnerabilities 🔴 HIGH PRIORITY
Location: spec/performance/bench.rb:106, 251, 301-302, 373
The script uses string interpolation in shell commands without proper sanitization. While ROUTES input is validated, route values aren't sanitized before use in shell commands.
Recommendation: Use array form of system() or Shellwords.escape for filenames.
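Both options, sketched (the flags and filename are illustrative, not taken from bench.rb):

```ruby
require "shellwords"

# Array form: arguments go straight to exec(2); no shell is involved,
# so nothing needs escaping.
system("k6", "run", "--env", "TARGET_URL=http://localhost:3001/", "k6.ts")

# Single-string form: escape each interpolated value explicitly.
path = "bench_results/run 1.json" # illustrative filename with a space
system("cat #{Shellwords.escape(path)}")
```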
2. FIXME Comments in Production Code 🟡 MEDIUM PRIORITY
Location: .github/workflows/benchmark.yml:79, 89
FIXMEs indicate test configuration (only root route, only Fortio) rather than production defaults. Should remove FIXMEs and set proper defaults or create follow-up issue.
3. Missing Changelog Entry 📋
Per CLAUDE.md guidelines, user-visible changes should be documented. This is a significant addition for developers.
4. Aggressive Process Killing ⚠️
Location: .github/workflows/benchmark.yml:374, 523
Kills ALL Ruby/Node processes which could affect other jobs in shared CI. Should store PID and kill specifically.
5. Port Hardcoding 🔧
Workflow hardcodes port 3001 in health checks despite configurable BASE_URL.
6. Validation Order Issue
RATE=max validation happens AFTER route detection. Move all validation to top.
💡 Suggestions
- Parallel warmup for better performance
- Metrics comparison logic across tools
- Usage documentation and result interpretation guide
- Dry run mode for quick validation
- Unit tests for validation and parsing logic
🎯 Summary
Solid foundation for performance testing. Main concerns:
- MUST FIX: Command injection security issues
- SHOULD FIX: FIXME comments and test configuration
- NICE TO HAVE: Changelog, better process management, docs
Overall: Approve with required changes ✋
Pull Request Review: Performance Testing Infrastructure
I've reviewed PR #1868 which adds comprehensive performance testing infrastructure.
Overall Assessment: STRONG - Well-designed with some important considerations
This PR adds valuable performance benchmarking infrastructure using industry-standard tools (Fortio, Vegeta, k6). The implementation is thoughtful, configurable, and follows good practices.
Strengths
Excellent Architecture & Design:
- Multi-tool approach: Using 3 different benchmarking tools provides cross-validation of results
- Configurable parameters: Environment variables allow flexible CI/CD integration
- Max-rate strategy: Smart choice for library benchmarking (documented in docs/planning/library-benchmarking.md)
- Auto-discovery: Automatically detects routes from Rails app - brilliant!
- Production mode testing: Tests with precompiled assets, which is the right approach
Robust Implementation:
- Proper error handling: bench.rb validates inputs, checks for missing tools, and provides clear error messages
- Server health checks: Waits for server readiness before benchmarking (60s timeout with detailed logging)
- Warmup phase: 10 requests per route before benchmarking - essential for accurate results
- Route filtering: Skips routes with required params and test-only routes
- CI caching: Aggressive caching of dependencies, gems, and benchmark tools
Good DevEx:
- Clear documentation: library-benchmarking.md explains the max-rate vs fixed-rate trade-offs
- Readable output: Tab-separated summary with column formatting
- Artifact uploads: 30-day retention of benchmark results
- SSH debugging: Optional tmate access for CI debugging (properly gated)
Security Conscious:
- SSH access requires explicit opt-in via debug_enabled input
- Clear warning messages about security risks
- limit-access-to-actor restricts access appropriately
Issues & Concerns
CRITICAL: Incomplete Pull Request
BLOCKER - Must address before merge:
- Pull request checklist not completed:
  - Update documentation - marked incomplete
  - Update CHANGELOG file - marked incomplete
- Missing changelog entries:
  - This is a significant user-facing feature (benchmark workflow)
  - Should have entries in CHANGELOG.md documenting the new benchmark.yml workflow
  - How to run benchmarks via workflow_dispatch inputs
  - How to interpret results
HIGH PRIORITY: Testing Gaps
- No local testing evidence:
  - Per CLAUDE.md CRITICAL LOCAL TESTING REQUIREMENTS: NEVER claim a test is fixed without running it locally first
  - Has this workflow been tested locally with act, or in a real CI run?
  - Have all 3 benchmark tools been verified to work correctly?
  - Has the Pro app benchmarking been tested with PRO=true?
- No validation of benchmark results:
  - The workflow validates that summary.txt exists, but doesn't validate the data quality
  - Should check for: non-zero RPS, reasonable latencies, no all-failures
  - Consider adding basic sanity checks (e.g., RPS > 1, p99 < 10000ms)
- Missing error scenarios:
  - What happens if all routes have required params? (would fail with "No routes to benchmark")
  - What if the server crashes during a benchmark?
  - What if all 3 tools fail?
MEDIUM PRIORITY: Code Quality
- Magic numbers and unclear defaults (benchmarks/bench.rb:410):
  - 10 warmup requests and a 0.5s delay should be constants or ENV vars
  - Consider: WARMUP_REQUESTS and WARMUP_DELAY_SEC as env vars (see the sketch after this list)
- Hardcoded port in Core app (.github/workflows/benchmark.yml:306):
  - Port 3001 is hardcoded but should match server startup
  - Consider extracting it to an env var
- Inconsistent tool result handling:
  - Fortio/Vegeta/k6 methods return nil if the tool is not enabled
  - Then an `if metrics` check runs before adding to the summary
  - Could be cleaner with early returns or a wrapper method
- RuboCop disable without explanation (benchmarks/bench.rb:226):
  - Metrics/AbcSize, Metrics/CyclomaticComplexity, Metrics/PerceivedComplexity disabled
  - These are legitimate disables for benchmark parsing methods, but complex methods warrant refactoring consideration
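For the magic-numbers item, a sketch (env var names are the ones suggested above; the `env_or_default` helper named earlier in this review is given a stand-in definition so the fragment runs on its own):

```ruby
require "net/http"
require "uri"

# Stand-in for the helper named elsewhere in this review (assumed signature).
def env_or_default(name, default)
  value = ENV[name]
  value.nil? || value.empty? ? default : value
end

# Defaults match the current hard-coded behavior; both become overridable.
WARMUP_REQUESTS  = Integer(env_or_default("WARMUP_REQUESTS", 10))
WARMUP_DELAY_SEC = Float(env_or_default("WARMUP_DELAY_SEC", 0.5))

target_url = "http://localhost:3001/" # illustrative

WARMUP_REQUESTS.times do
  Net::HTTP.get_response(URI(target_url))
  sleep WARMUP_DELAY_SEC
end
```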
MEDIUM PRIORITY: CI/CD Concerns
- Benchmark workflow triggers:
  - Runs on ALL pushes to master (excluding docs)
  - Runs on ALL PR sync/reopen events if labeled
  - Question: Is this too aggressive? Benchmarks can be slow (2+ min per route × tools)
  - Consider: Only run on master, or require a manual trigger/label for PRs
- Port cleanup race condition (.github/workflows/benchmark.yml:368-383):
  - pkill -9 is aggressive (SIGKILL)
  - Consider graceful shutdown first: `pkill -TERM puma && sleep 2 && pkill -9 puma`
- Missing workflow timeout:
  - Individual steps have timeout-minutes: 120
  - But there is no job-level timeout
  - Consider adding timeout-minutes: 180 at the job level
- Artifact naming collision:
  - Uses github.run_number, which could collide if multiple PRs run simultaneously
  - Consider github.run_id instead
Specific File Feedback
benchmarks/bench.rb:
- Lines 59-67: Excellent error handling with descriptive messages ✅
- Lines 79-136: Route discovery is clever! Consider extracting to a method for testability
- Lines 203-210: Love the detailed attempt logging with timing ✅
- Lines 228-279: Fortio implementation solid, good use of failure_metrics helper
.github/workflows/benchmark.yml:
- Lines 71-88: Smart env var defaults using ternary operators ✅
- Lines 142-193: Excellent tool caching strategy ✅
- Lines 278-290: Good asset validation before server start ✅
- Lines 521-537: Nice workflow summary! ✅
benchmarks/k6.ts:
- Lines 21-35: Good error handling for missing TARGET_URL ✅
- Lines 38-56: Smart scenario switching based on rate mode ✅
- Lines 70-76: Status code checks are comprehensive ✅
react_on_rails/spec/dummy/bin/prod & react_on_rails_pro/spec/dummy/bin/prod:
- Asset validation: Excellent checks for manifest.json and public/assets ✅
- Staleness warnings: Great UX for developers! ✅
- Different approaches: Core uses Rails directly, Pro uses foreman/overmind - is this intentional?
Recommendations
Before Merging (MUST DO):
- Complete the PR checklist - Add CHANGELOG.md entry documenting the new benchmark workflow and update documentation
- Verify local testing - Run the workflow manually via workflow_dispatch or document tested workflow run number
- Add basic sanity checks - Validate benchmark results have reasonable values and exit with error if all routes fail
Consider (SHOULD DO):
- Extract magic numbers to constants/env vars (warmup count, delays)
- Add job-level timeout to workflow
- Use graceful shutdown before SIGKILL
- Improve artifact naming to avoid collisions using github.run_id
- Document a "when to worry" threshold for regressions
Future Enhancements (COULD DO):
- Regression detection comparing against baseline from master
- Multiple samples - Run each route 2-3 times, average results
- Performance dashboard uploading results to time-series DB
- Alert on regressions via PR comments if performance drops
Security Review
No security concerns - SSH debugging is properly gated and documented
Verdict
Status: CHANGES REQUESTED
This is excellent work that adds valuable infrastructure! The implementation is solid and well-thought-out. However, it needs:
- Completed PR checklist (CHANGELOG + docs)
- Evidence of testing (workflow run or local validation)
- Basic result validation
Once these are addressed, this will be ready to merge. Great job on the comprehensive approach!
Estimated review time: 45 minutes. Confidence level: High (reviewed all code, workflows, docs, and configs).
Code Review for PR #1868: Add Performance Tests
Thank you for this comprehensive performance testing implementation! This addresses issue #1777 excellently.
✅ Strengths
1. Comprehensive Testing Setup
- Multi-tool approach (Fortio, Vegeta, k6) validates results
- Well-documented max-rate benchmarking decision
- Smart auto-detection of routes from Rails
- Good handling of optional vs required route parameters
2. Robust Implementation
- Input validation for all parameters
- Server readiness checks with timeout
- 10-request warmup per route
- Artifact preservation (30-day retention)
3. CI/CD Integration
- Well-structured workflow with clear steps
- Conditional execution (labels, workflow_dispatch, master)
- Good caching strategy
- Proper process cleanup
4. Developer Experience
- SSH debugging (properly secured)
- Configurable via workflow inputs
- Clear error messages
- bin/prod and bin/prod-assets scripts
⚠️ Issues
1. Port Cleanup (MEDIUM) - .github/workflows/benchmark.yml:368
- Uses pkill -9 (SIGKILL) - no graceful shutdown
- Recommend: Try pkill -15 first, then -9 after a short delay
2. Hardcoded Port 3001 (LOW)
- Multiple files hardcode port 3001
- Recommend: Make configurable via PORT env var
3. Magic Numbers (LOW) - benchmarks/bench.rb:410,412
- Warmup count (10) and delay (0.5) not constants
- Recommend: Extract to WARMUP_REQUESTS, WARMUP_DELAY_SEC
4. Documentation Reference (LOW) - CLAUDE.md:215
- References spec/dummy/TESTING_LOCALLY.md
- Recommend: Verify exists or create it
📋 Required Before Merge (P0)
Per CLAUDE.md requirements:
- ✅ Run bundle exec rubocop and fix all violations
- ✅ Ensure all files end with newline characters
- 📝 Update CHANGELOG.md with this feature
🎉 Conclusion
Excellent implementation! Address the P0 items (especially RuboCop and CHANGELOG per CLAUDE.md) and it's ready to merge.
Great work! 🚀
Review by: Claude Code. Date: 2025-12-12 19:08 UTC.
Code Review: Performance Testing Infrastructure
This PR adds comprehensive performance testing infrastructure. Overall well-structured.
✅ Strengths
- Excellent architecture with clean separation of concerns
- Comprehensive tooling (k6, Fortio, Vegeta)
- Robust error handling and validation
- Production-ready scripts with staleness checks
⚠️ Must Fix Before Merge
1. CRITICAL: Missing CHANGELOG Entry
Per CLAUDE.md, user-visible changes must be documented. Add entry to CHANGELOG.md
2. RuboCop Compliance Required
Per CLAUDE.md: Run bundle exec rubocop benchmarks/ and fix ALL violations
3. Documentation Needed
Add benchmarks/README.md explaining how to run locally and interpret results
Should Fix (Nice to Have)
- Add input validation in k6.ts for numeric parameters
- Consider if SSH access in workflow is truly necessary
Rating: 8.5/10
Would be 9.5/10 with CHANGELOG and docs. Great infrastructure code!
Code Review: Add Performance Tests
Thank you for this comprehensive benchmarking implementation! This is a valuable addition to the project. Below is my detailed review covering code quality, potential issues, security, and best practices.
Overall Assessment
This PR successfully adds performance testing infrastructure with k6 and Vegeta. The implementation is well-structured, follows good practices, and integrates cleanly with the existing CI infrastructure.
Strengths
1. Architecture & Design
- Modular structure: Shared helpers in benchmark_helpers.rb promote code reuse
- Flexible configuration: Environment variables allow easy customization without code changes
- Route auto-detection: Smart filtering of Rails routes eliminates manual maintenance
- Multi-tool approach: k6 for HTTP/1.1 benchmarks, Vegeta for HTTP/2 Node Renderer tests
2. GitHub Actions Workflow
- Conditional execution: Smart use of labels (benchmark, full-ci) prevents unnecessary runs
- Efficient caching: Caches binaries (Vegeta), gems (foreman), and node_modules
- Comprehensive validation: Checks assets are built before server starts
- Artifact retention: Stores results for 30 days with proper naming
- Clean error handling: set -e and explicit error messages throughout
3. Code Quality
- Input validation: All parameters validated before use (rate, duration, connections)
- Graceful degradation: Helpful error messages when tools/files missing
- Documentation: Clear comments explain configuration and trade-offs
Issues & Recommendations
1. CHANGELOG Update Needed (REQUIRED)
Per CLAUDE.md guidelines, this PR adds a user-visible feature and should have a CHANGELOG entry.
Recommendation: Add to CHANGELOG.md:
```md
[Unreleased]
Added
- PR 1868 by alexeyr-ci2: Added performance benchmarking infrastructure with k6 and Vegeta, including GitHub Actions workflow for automated benchmarks
```
2. Security: Process Killing is Too Aggressive (HIGH PRIORITY)
Location: .github/workflows/benchmark.yml:310, :476
Current code: pkill -9 -f "ruby|node|foreman|overmind|puma"
This kills ALL Ruby and Node processes on the CI runner. While this works in isolated CI, it's fragile and could cause issues if GitHub changes runner isolation.
Recommendation: Use more targeted cleanup by killing only processes on port 3001 or saving PIDs when starting servers.
3. Hardcoded Password in Benchmark (MEDIUM)
Location: benchmarks/bench-node-renderer.rb:22-28
Parsing password from config with regex is fragile. Consider using environment variable instead for better maintainability.
4. RuboCop Violations (MEDIUM)
The code disables several RuboCop cops (Metrics/AbcSize, Metrics/CyclomaticComplexity). Consider refactoring complex methods into smaller functions for better maintainability and testability.
5. Missing Unit Tests
No unit tests for benchmark helper functions. Consider adding RSpec tests (a sketch follows this list) for:
- validate_rate, validate_duration, validate_positive_integer
- route_has_required_params?
- sanitize_route_name
- parse_json_file error handling
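A starting-point spec might look like the sketch below, assuming the helpers are module functions on a BenchmarkHelpers module (the module name, file path, and raised error classes are assumptions):

```ruby
# spec/benchmarks/benchmark_helpers_spec.rb (hypothetical layout)
require_relative "../../benchmarks/lib/benchmark_helpers"

RSpec.describe BenchmarkHelpers do
  it "rejects a non-positive rate" do
    expect { described_class.validate_rate("0") }.to raise_error(StandardError)
  end

  it "strips optional segments before checking for required params" do
    expect(described_class.route_has_required_params?("/posts/:id")).to be(true)
    expect(described_class.route_has_required_params?("/posts(/:page)")).to be(false)
  end

  it "fails clearly on a missing JSON file" do
    expect { described_class.parse_json_file("does-not-exist.json") }.to raise_error(StandardError)
  end
end
```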
Security Review
Secrets Handling: Good
- REACT_ON_RAILS_PRO_LICENSE_V2 properly uses GitHub Secrets
- SECRET_KEY_BASE is a dummy value (not used for anything sensitive)
Command Injection: Safe
- All user inputs are validated before use in shell commands
- No direct interpolation of user input into shell commands
Resource Limits: Could Be Better
- Benchmarks run with timeout-minutes: 120 - quite generous
- Consider lowering to 60 minutes to prevent hung jobs
Action Items Summary
Must Fix (Before Merge):
- Add CHANGELOG entry for this user-visible feature
- Consider more targeted process cleanup (or document why aggressive approach is safe)
Should Fix (Nice to Have):
- Refactor complex methods to avoid disabling RuboCop cops
- Add unit tests for benchmark helper functions
- Extract password to environment variable instead of parsing config
Could Fix (Future Work):
- Add dry-run mode for testing
- Create docs/guides/running-benchmarks.md
- Extract server wait logic to reusable script
Conclusion
This is a solid, well-implemented PR that adds valuable performance testing infrastructure. The code quality is high, the GitHub Actions workflow is comprehensive, and the architecture is flexible.
The main concerns are:
- Missing CHANGELOG entry (per project guidelines)
- Aggressive process killing (works but fragile)
- Some methods could be refactored for better maintainability
With the CHANGELOG entry added, this PR is ready to merge. The other suggestions are nice-to-haves that can be addressed in follow-up PRs.
Great work!
Review generated by Claude Code based on project guidelines in CLAUDE.md
Pull Request Review: Performance Testing Infrastructure
This PR adds a comprehensive performance testing framework to React on Rails.
STRENGTHS
- Well-Architected Benchmark Infrastructure
- Modular design: Separate scripts for Rails benchmarks and Node Renderer benchmarks
- Shared utilities: Clean extraction of common functionality
- Multiple tools: k6 for Rails, Vegeta with HTTP/2 for Node Renderer
- Good documentation explaining the max-rate approach
- Robust Error Handling
- Comprehensive validation of all parameters
- Tool availability checks before running
- Server health checks with timeout/retry logic
- Graceful error handling with clear messages
- CI/CD Integration
- Workflow triggers only when needed
- Efficient caching of dependencies
- Proper artifact retention (30 days)
- Clean separation between Core and Pro benchmarks
ISSUES AND CONCERNS
Security:
Dangerous process killing (lines 314, 487): pkill -9 -f "ruby|node|foreman|overmind|puma"
- Broad regex pattern could kill unintended processes
- Recommendation: Use PID files for safer process management; a sketch follows below
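A sketch of the PID-based approach (the server command, PID file path, and timeouts are illustrative):

```ruby
require "timeout"

# Spawn the server in its own process group and remember its PID.
pid = Process.spawn("bundle exec puma -C config/puma.rb", pgroup: true)
File.write("tmp/bench_server.pid", pid)

begin
  # ... run the benchmark tools against the server ...
ensure
  Process.kill("-TERM", pid) # negative signal name targets the whole process group
  begin
    Timeout.timeout(10) { Process.wait(pid) }
  rescue Timeout::Error
    Process.kill("-KILL", pid) # escalate only if graceful shutdown stalls
  end
end
```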
Command injection vulnerability (bench.rb:58)
- Backtick execution with an interpolated variable, without escaping
- Current Risk: Low (APP_DIR is hardcoded), but it's a dangerous pattern
- Recommendation: Use Shellwords.escape, as sketched below
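A minimal sketch of the escaped version (APP_DIR stands in for the constant mentioned above):

```ruby
require "shellwords"

# Escape the interpolated directory before it reaches the shell.
routes_output = `cd #{Shellwords.escape(APP_DIR)} && bundle exec rails routes 2>&1`

# Alternatively, skip the shell entirely with the array form and chdir:
# system("bundle", "exec", "rails", "routes", chdir: APP_DIR)
```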
Secrets handling (bench-node-renderer.rb:32)
- Password read from config file
- Ensure it is test-only and not in artifacts
Code Quality:
RuboCop violations not addressed
- Multiple rubocop:disable comments for complexity metrics
- Recommendation: Refactor methods to reduce complexity
Magic numbers (bench.rb:211)
- Hardcoded values like 10 warmup requests, 0.5s delay
- Recommendation: Extract to named constants
Testing and Documentation:
Missing CHANGELOG entry
- This is a significant new feature that needs changelog documentation per CLAUDE.md
No RSpec tests for benchmark helpers
- Complex validation logic in benchmark_helpers.rb lacks unit tests
Documentation improvements needed
- Missing local usage guide, result interpretation, and troubleshooting docs
RECOMMENDATIONS SUMMARY
Must Fix Before Merge:
- Add CHANGELOG entry
- Fix command injection vulnerability
- Improve process killing to use PID files
Should Fix:
4. Refactor RuboCop-disabled methods
5. Add RSpec tests for benchmark_helpers.rb
6. Create comprehensive documentation
OVERALL ASSESSMENT
This is a high-quality addition with sound architecture and thorough implementation.
Code Quality: 4/5
Security: Minor issues that should be fixed
Testing: No unit tests for helpers
Documentation: 3/5 - Good inline docs, missing user-facing guide
Recommendation: Approve with minor changes. Security issues are low-risk but should be fixed.
Great work! This benchmark infrastructure will be invaluable for catching performance regressions.
@coderabbitai review
✅ Actions performed
Review triggered.
Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.
PR Review: Add Performance Tests
Overview
This PR adds a comprehensive benchmarking framework to React on Rails, including:
- Ruby-based benchmark scripts for Rails routes and Node Renderer
- k6 (HTTP benchmarking tool) integration
- Vegeta integration (HTTP/2 support for Node Renderer)
- GitHub Actions workflow for automated CI benchmarking
- Production server setup scripts for dummy apps
✅ Strengths
Well-structured benchmark architecture:
- Shared helpers in benchmark_helpers.rb promote code reuse
- Clear separation between Rails route benchmarks and Node Renderer benchmarks
- Flexible configuration via environment variables
Robust error handling:
- Input validation for all parameters (rate, duration, connections)
- Proper timeout handling in server health checks
- JSON parsing errors are caught and reported clearly
Good CI/CD integration:
- Conditional workflow execution based on labels (benchmark, full-ci)
- Proper artifact retention (30 days)
- Integration with benchmark-action/github-action-benchmark for tracking trends
Production-ready configuration:
- Proper Puma worker configuration with preload_app!
- ActiveRecord connection management in on_worker_boot
- Asset precompilation setup
CHANGELOG updated: Properly categorized as "Developer (Contributors Only)"
⚠️ Issues & Recommendations
Critical
Security: SSH access in workflow (removed in later commits?)
- Early commits showed SSH access via mxschmitt/action-tmate
- Verify this was removed or properly gated in the final version
- If present, ensure it's only for workflow_dispatch with explicit opt-in
Resource cleanup may be insufficient (line 353 in benchmark.yml):
```sh
pkill -9 -f "ruby|node|foreman|overmind|puma" || true
```
- This is VERY aggressive and could kill unrelated processes
- In CI it's safe (isolated environment), but worth a comment explaining why
- Consider more targeted killing using process group IDs
Missing RuboCop validation:
- Per CLAUDE.md: "ALWAYS run bundle exec rubocop and fix ALL violations"
- Ruby files in benchmarks/ should pass RuboCop
- Some files have rubocop:disable comments - verify these are justified
High Priority
Hardcoded passwords (bench-node-renderer.rb:32):
```ruby
PASSWORD = read_password_from_config
```
- Reads the password from client/node-renderer.js
- Document that this is for LOCAL/CI testing only, not production
- Consider using a well-known test password constant instead
URL encoding vulnerability (bench-node-renderer.rb:133):
```ruby
def url_encode(str)
  URI.encode_www_form_component(str)
end
```
- This is correct, but the calling code concatenates strings instead of using form encoding
- Lines 145-148 manually build the body - this works but is fragile
- Consider using URI.encode_www_form with hash input instead, as sketched below
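For example (the key names here are illustrative, not the renderer's actual protocol fields):

```ruby
require "uri"

# Build the whole body from pairs so every field is form-encoded consistently.
body = URI.encode_www_form(
  "renderingRequest" => rendering_request,
  "password" => password
)
```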
Shell injection risk (bench.rb:58):
```ruby
routes_output = `cd #{Shellwords.escape(app_dir)} && bundle exec rails routes 2>&1`
```
- Good: Uses Shellwords.escape
- Bad: app_dir comes from a constant based on the PRO env var
- If the PRO env var is ever user-controlled, this is exploitable
- Since it's CI-only, probably fine, but worth documenting
No timeout on system calls:
- Line 161 in bench.rb: system("#{k6_command} | tee #{k6_txt}")
- Uses timeout-minutes: 120 in CI, but no timeout in Ruby
- k6/Vegeta have their own timeouts, so probably OK
- Consider adding a Ruby Timeout.timeout wrapper for safety, as sketched below
Medium Priority
Benchmark thresholds seem loose (benchmark.yml:316, 333):
```yaml
alert-threshold: '150%'
fail-on-alert: true
```
- Alerting only at a 150% regression seems high for performance tests
- Consider tightening to 125% or 120% for RPS
- Latency might need a different threshold (e.g., 150% is more reasonable for spikes)
Missing file validation:
- bench-node-renderer.rb finds bundles in .node-renderer-bundles/
- Good error message if the directory is missing, but bundle validity isn't checked
- What if bundles are corrupted or incomplete?
- Add basic validation (check for package.json or specific files); a sketch follows below
k6.ts uses strict status checks (k6.ts:74-79):
```typescript
check(response, {
  status_200: (r) => r.status === 200,
  status_3xx: (r) => r.status >= 300 && r.status < 400,
  status_4xx: (r) => r.status >= 400 && r.status < 500,
  status_5xx: (r) => r.status >= 500,
});
```
- Good: Categorizes status codes
- Issue: Doesn't handle 1xx informational responses (rare but possible)
- Status 0 (connection failed) won't match any check - handle this case
Large file cleanup (bench-node-renderer.rb:210):
```ruby
FileUtils.rm_f(vegeta_bin)
```
- Good practice to delete large binary files
- Consider doing the same for k6 if it generates large files
Low Priority / Style
Magic numbers in configuration:
- Line 211 in bench.rb: 10.times for warmup
- Line 30 in benchmark.yml: default: 10 for connections
- Consider extracting to named constants at the top of files
Inconsistent metric naming:
- bench.rb outputs "p50(ms)", "p90(ms)", etc.
- convert_to_benchmark_json.rb creates "- p50 latency", "- p90 latency"
- Both are fine, but consider consistent capitalization/formatting
Documentation:
- docs/planning/library-benchmarking.md is mentioned in the PR description
- Verify this document explains WHY these benchmarks matter
- Should include: how to interpret results, when to run locally, thresholds
Test coverage:
- No RSpec tests for benchmark scripts themselves
- Consider adding basic tests for validation functions, parsing logic
- At minimum, add smoke test to CI that validates scripts don't crash
Prettier/formatting:
- Verify k6.ts passes ESLint/Prettier
- Run pnpm run format.listDifferent to check
- The /* eslint-disable import/no-unresolved */ is fine since k6 is global
Good Practices:
- Frozen string literals throughout Ruby files
- Descriptive variable names
- Comprehensive comments explaining k6 script configuration
- Proper use of set -e in bash scripts (workflow steps)
Potential Improvements:
- Consider extracting common workflow logic to reusable actions
- The workflow is 589 lines - consider splitting Core/Pro into separate jobs
- Some repeated code in benchmark.yml (server start logic) - could be a script
🧪 Testing Recommendations
Before merging:
- Run RuboCop: bundle exec rubocop benchmarks/
- Test locally:
  - Run ruby benchmarks/bench.rb with the Core dummy app
  - Run ruby benchmarks/bench-node-renderer.rb with Pro
  - Verify all output files are generated correctly
- Test workflow manually: Trigger via workflow_dispatch with various input combinations
- Verify cleanup: Ensure server processes are killed properly after the workflow
- [ ] Add a README in the benchmarks/ directory explaining how to use the scripts
- [ ] Document environment variables in script headers (✅ partially done in k6.ts)
- [ ] Explain benchmark interpretation in docs/planning/library-benchmarking.md
- [ ] Add a troubleshooting section for common issues (server won't start, tools not found)
Performance Considerations
- Benchmarks will run on every push to master - this is expensive
- Consider limiting to:
  - Weekly schedule for full benchmarks
  - Push to master only for a smoke test (1 route, 10s duration)
  - Full benchmarks only on labeled PRs or workflow_dispatch
- Current config seems reasonable, but monitor CI minutes usage
Security Assessment
Overall security posture is good for a CI-only tool:
- ✅ Proper input validation prevents command injection
- ✅ Uses Shellwords.escape where appropriate
- ✅ Secrets properly handled via GitHub Secrets
- ⚠️ Verify SSH access removed/gated properly
- ⚠️ Document that passwords are test-only, not for production
Conclusion
This is a solid addition to the project. The benchmarking framework is well-designed, properly integrated into CI, and follows most project conventions.
Recommendation: Approve with minor changes
Priority fixes before merge:
- Run RuboCop and fix any violations
- Verify SSH access is properly removed/gated
- Add a comment explaining the aggressive pkill usage in cleanup
- Consider tightening alert thresholds from 150% to 125%
Everything else can be addressed in follow-up PRs if needed.
Great work on this comprehensive benchmarking system! 🎉
PR Review: Performance Testing Infrastructure
This is a comprehensive and well-structured PR adding performance benchmarking to the React on Rails project. Overall, the implementation is solid with good attention to detail.
Strengths
Architecture and Design
- Excellent separation of concerns with benchmark helpers extracted into reusable module
- Flexible configuration via environment variables with sensible defaults
- Multi-tool approach: k6 for Rails benchmarks, Vegeta for Node Renderer with HTTP/2 support
- Smart route detection: Automatically discovers benchmarkable routes from Rails with proper filtering
- Well-documented strategy in docs/planning/library-benchmarking.md
Code Quality
- Robust error handling with comprehensive parameter validation
- Safety features: warm-up phase, server health checks with retries, proper cleanup
- Structured output in multiple formats (JSON, text, summary)
- Smart GitHub Actions integration with caching, artifacts, and conditional execution
Production Readiness
- New bin/prod and bin/prod-assets scripts with validation
- Properly configured Puma for benchmarking with worker processes and thread control
- Well-organized CI workflow with clear steps, error handling, and summaries
Issues and Concerns
1. Security: Command Injection Risk (HIGH)
Location: benchmarks/bench.rb:59 and benchmarks/bench-node-renderer.rb:94
The URL construction in bench-node-renderer.rb uses string interpolation without escaping. While url_encode is used for the body, the URL itself should be shell-escaped.
Recommendation: Use system() with array arguments instead of backticks, or ensure all interpolated values are properly escaped with Shellwords.escape
2. Security: Process Killing (MEDIUM)
Location: .github/workflows/benchmark.yml:352-353, 564-565
The workflow uses pkill -9 -f with a broad regex that could kill unrelated processes. While the comment says it is safe in isolated CI, it is a dangerous pattern.
Recommendation: Track PIDs explicitly when starting servers and kill only those specific PIDs.
3. Hardcoded Credentials Reading (LOW)
Location: benchmarks/bench-node-renderer.rb:21-29
Reading passwords from config files with regex is fragile and could break with formatting changes.
Recommendation: Consider exposing the password as an environment variable; a sketch follows below.
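A minimal sketch, keeping the config parsing as a fallback (NODE_RENDERER_PASSWORD is a hypothetical variable name):

```ruby
# Prefer an explicit env var; fall back to the existing regex-based parsing.
PASSWORD = ENV.fetch("NODE_RENDERER_PASSWORD") { read_password_from_config }
```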
Suggestions for Improvement
4. Missing Tests
There are no tests for the benchmark scripts. While these are developer-only tools, basic tests would catch regressions.
Recommendation: Add spec/benchmarks/ with unit tests for helper functions.
5. Workflow Complexity
The benchmark workflow is 588 lines - very large for a single workflow file.
Recommendation: Consider extracting repeated setup steps into reusable actions or splitting into separate workflows.
6. Magic Numbers
Extract magic numbers like warmup count (10) and delay (0.5) to named constants.
7. Error Handling
The workflow could provide more informative errors with additional context about server status when failures occur.
Testing Recommendations
Before merging, verify:
- Manual test the workflow via workflow_dispatch with different parameter combinations
- Test failure scenarios (server fails, invalid parameters, no routes, missing tools)
- Test on a PR with benchmark label to ensure conditional triggers work
- Verify artifact uploads are properly accessible
Overall Assessment
This is high-quality work that adds significant value to the project. The benchmarking infrastructure is well-designed, comprehensive, and production-ready.
Main concerns:
- Security issues (command injection, aggressive process killing) - should be addressed before merge
- Lack of tests - acceptable for MVP but should be added soon
- Workflow complexity - not blocking but consider refactoring for maintainability
Recommendation: Approve with requested changes - fix the security issues, then merge.
Great work on this PR!
PR Review: Add Performance Tests
This PR adds comprehensive benchmarking infrastructure to the React on Rails project. Here's my detailed review:
🎯 Overall Assessment
Strong implementation with well-structured code, good documentation, and appropriate tooling choices. The PR successfully addresses #1777 by adding automated performance testing in CI.
✅ Strengths
1. Excellent Architecture
- Clean separation of concerns with modular helper functions (benchmark_helpers.rb)
- Well-designed workflow with clear steps and error handling
2. Comprehensive Tooling
- k6 for HTTP benchmarking (Core Rails routes) - modern, reliable, good reporting
- Vegeta for HTTP/2 benchmarking (Pro Node Renderer) - appropriate for h2c protocol
- Tool caching to speed up CI runs
- Proper validation of benchmark parameters
3. Good Documentation
- Clear library-benchmarking.md explaining the "max rate" approach and trade-offs
- Inline comments explaining non-obvious logic
- Good workflow parameter documentation
- CHANGELOG entry (in Developer section - appropriate)
4. Strong Error Handling
- Proper timeout checks for server startup
- Validation of all benchmark parameters
- Graceful handling of missing bundles
- Clear error messages with actionable guidance
5. CI Integration
- Smart conditional execution (master, workflow_dispatch, or labeled PRs)
- Artifact uploads for troubleshooting
- Integration with github-action-benchmark for tracking
- Proper cleanup steps in always() blocks
🔍 Code Quality Issues
CRITICAL: Security - Credential Exposure
Location: benchmarks/bench-node-renderer.rb:20-29
```ruby
def read_password_from_config
  config_path = File.expand_path(
    "../react_on_rails_pro/spec/dummy/client/node-renderer.js",
    __dir__
  )
  config_content = File.read(config_path)
  match = config_content.match(/password:\s*['"]([^'"]+)['"]/)
  match ? match[1] : raise("password not found in #{config_path}")
end
```
Issue: The password is hardcoded in the config file and read via regex. While this is a test/benchmark password (not production), this pattern could be problematic if:
- The config file is accidentally committed with a real password
- Someone copies this pattern for production code
Recommendation:
- Add a comment explaining this is ONLY for benchmarking and should never be used in production
- Consider using an environment variable with a fallback to the config file
- Ensure the password in the config file is clearly marked as a test password
MEDIUM: Changelog Placement
Location: CHANGELOG.md:24-26
Per CLAUDE.md guidelines:
Update CHANGELOG.md for user-visible changes only (features, bug fixes, breaking changes, deprecations, performance improvements). Do NOT add entries for: linting, formatting, refactoring, tests, or documentation fixes.
Issue: This is marked as "Developer (Contributors Only)" which is appropriate, but the CLAUDE.md guidance says "Do NOT add entries for... tests". This benchmarking infrastructure is specifically for contributors/CI, not end users.
Recommendation: This is actually fine as-is since it's in the "Developer (Contributors Only)" section, which is the correct place for CI/tooling improvements. Consider clarifying in CLAUDE.md that "Developer" section entries ARE acceptable for significant CI/testing infrastructure.
⚠️ Potential Issues
1. Process Management in CI
Location: .github/workflows/benchmark.yml:349
```sh
pkill -9 -f "ruby|node|foreman|overmind|puma" || true
```
Issue: Using pkill -9 with broad patterns could kill unintended processes in a shared CI environment.
Recommendation: While this is acceptable in isolated GitHub Actions runners, consider:
- Using more specific process identifiers
- Storing PIDs from the startup step
- Using pkill -P (matching children of a known parent PID) instead
2. Hardcoded Port Numbers
Multiple locations:
- Port 3001 for Rails server
- Port 3800 for Node Renderer
Issue: If multiple benchmarks run concurrently (future scaling), ports could conflict.
Recommendation:
- Current approach is fine for single-job workflow
- If parallelization is added later, use dynamic port allocation
- Document the port assumptions clearly (which you've done well)
3. No Baseline Protection
Location: .github/workflows/benchmark.yml:316, 525
```yaml
fail-on-alert: true
alert-threshold: '150%'
```
Issue: 150% threshold is very permissive. A 50% regression is significant.
Recommendation:
- Consider tightening to 125% or 130% once baseline is established
- Add a comment explaining the rationale for the threshold
- Monitor a few runs to see actual variance before tightening
4. Route Filtering Logic
Location: benchmarks/bench.rb:33-46
```ruby
def route_has_required_params?(path)
  path_without_optional = path.gsub(/\([^)]*\)/, "")
  path_without_optional.include?(":")
end
```
Issue: This regex assumes parentheses are only used for optional params. If a route has nested parentheses or literal parentheses in the path (rare but possible), this could fail.
Recommendation:
- Add a comment explaining the assumption
- Consider using the Rails route introspection API instead of regex parsing
- Add a test case for this logic (the calls below illustrate the current behavior)
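For reference, this is how the optional-segment assumption plays out on typical paths:

```ruby
route_has_required_params?("/posts/:id")     # => true  (":id" survives the gsub)
route_has_required_params?("/posts(/:page)") # => false ("(/:page)" is stripped as optional)
```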
5. Warm-up Strategy
Location: benchmarks/bench.rb:211-216
```ruby
10.times do
  server_responding?(target)
  sleep 0.5
end
```
Issue: Fixed 10 requests might not be enough for JIT compilation, DB connection pooling, etc.
Recommendation:
- Consider a time-based warm-up (e.g., 5-10 seconds), as sketched below
- Or an adaptive warm-up (continue until latency stabilizes)
- The current approach is acceptable for an initial implementation
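A time-based variant could look like this (WARMUP_SECONDS is a suggested env var; server_responding? and target come from the loop above):

```ruby
WARMUP_SECONDS = Integer(ENV.fetch("WARMUP_SECONDS", "5"))
deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + WARMUP_SECONDS
while Process.clock_gettime(Process::CLOCK_MONOTONIC) < deadline
  server_responding?(target)
  sleep 0.5
end
```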
🎨 Style & Best Practices
Minor Issues:
RuboCop Disables
- benchmarks/bench.rb:140 and benchmarks/bench-node-renderer.rb:151
- Disabling complexity metrics is acceptable for benchmark scripts
- Consider extracting some logic to reduce complexity instead
Magic Numbers
- 30s duration, 60s timeout, 10 connections
- Well-documented in code and configurable via env vars ✅
- Good defaults chosen
Error Messages
- Clear and actionable throughout ✅
- Good use of emojis for visual scanning in CI logs
🧪 Test Coverage
Missing:
- Unit tests for helper functions (benchmark_helpers.rb)
- Tests for JSON conversion
Recommendation:
- Add RSpec tests for benchmark_helpers.rb functions
- Test edge cases (missing files, invalid JSON, network errors)
- Not blocking for merge, but should be added
🚀 Performance Considerations
Caching Strategy ✅
- Good caching of tools (Vegeta, foreman)
- Good caching of node_modules and gems
- Cache keys look correct
Benchmark Duration
- 30s per route is reasonable
- With ~15-20 routes, total benchmark time ~10-15 minutes
- Consider reducing to 20s if CI time becomes an issue
Parallel Execution
- Currently sequential (one app at a time)
- Could parallelize Core and Pro benchmarks in separate jobs
- Current approach is fine for initial implementation
📝 Documentation
Strengths:
- Clear library-benchmarking.md with trade-off analysis ✅
- Good inline comments ✅
- Clear parameter documentation in workflow ✅
Missing:
- How to run benchmarks locally (beyond env vars)
- How to interpret results and identify regressions
- What to do when benchmarks fail
Recommendation: Add a benchmarks/README.md with:
- Local execution instructions
- How to debug failures
- How to add new benchmark routes
- How to interpret github-action-benchmark results
🔒 Security Considerations
Issues Identified:
- Password in source (covered above under CRITICAL)
- SECRET_KEY_BASE in workflow
  - 'dummy-secret-key-for-ci-testing-not-used-in-production'
  - This is fine for benchmarking, clearly marked as dummy ✅
- Pro License Secret
  - Using GitHub Secrets properly ✅
  - Secret name: REACT_ON_RAILS_PRO_LICENSE_V2
✨ Recommendations Summary
Before Merge (High Priority):
- ✅ Add security comment to password extraction function
- ✅ Verify CHANGELOG.md placement is correct (it is, in Developer section)
- ⚠️ Consider tightening alert threshold from 150% to 125-130% after baseline
Follow-up (Medium Priority):
- Add benchmarks/README.md with usage instructions
- Add unit tests for benchmark_helpers.rb
- Consider using PID tracking instead of pkill -9 with broad patterns
- Monitor initial benchmark runs and adjust thresholds/duration as needed
Future Enhancements (Low Priority):
- Parallelize Core and Pro benchmarks in separate jobs
- Add multiple samples per route to reduce variance
- Consider adaptive warm-up strategy
🎉 Conclusion
This is excellent work with thoughtful design decisions and good engineering practices. The code is clean, well-documented, and properly integrated with CI.
Recommendation: Approve with minor comments addressed.
The identified issues are mostly minor, with the password handling being the only security concern (and it's mitigated by being test-only). The PR is ready to merge once the security comment is added.
Great job on this contribution! 🚀
📊 Stats
- Lines added: 1,742
- Lines removed: 39
- Files changed: 30
- New workflows: 1 (benchmark.yml)
- New benchmarking scripts: 4
- Documentation files: 1
Review performed following CLAUDE.md guidelines and React on Rails best practices.
@coderabbitai Review.