
Add performance tests

alexeyr-ci2 opened this issue 6 months ago • 15 comments

Summary

Add performance tests. Fixes #1777.

Pull Request checklist

  • ~[ ] Add/update test to cover these changes~
  • [ ] Update documentation
  • [ ] Update CHANGELOG file


alexeyr-ci2 avatar Oct 20 '25 14:10 alexeyr-ci2

Walkthrough

Adds a CI benchmarking system: new GitHub Actions workflow, k6/Vegeta benchmark drivers (TypeScript + Ruby), shared helpers and converter, dummy-app production/start scripts and configs, tooling/ignore updates, and docs/contributing updates to run and publish Core and Pro performance tests in CI and locally. (50 words)

Changes

Cohort / File(s) Change Summary
CI Benchmark Workflow
.github/workflows/benchmark.yml
New GitHub Actions workflow to run Core/Pro benchmarks (Rails + Node renderer), install/cache tools, prepare apps, start/stop servers, run k6/Vegeta, validate/convert results, upload artifacts, optionally push data on master, and emit a workflow summary with guarded failures.
Benchmark Orchestrators
benchmarks/bench.rb, benchmarks/bench-node-renderer.rb
New Ruby drivers: bench.rb runs k6 per Rails route (discovery, warm‑up, per-route JSON parsing); bench-node-renderer.rb runs Vegeta (HTTP/2) per production bundle, detects RSC bundles and emits textual JSON summaries.
Benchmark Support & Conversion
benchmarks/lib/benchmark_helpers.rb, benchmarks/convert_to_benchmark_json.rb, benchmarks/k6.ts
New shared Ruby helpers for env/validation/server readiness/summary; converter to produce unified benchmark JSON; TypeScript k6 script with dynamic scenarios and runtime checks.
Dummy App Prod Scripts & Procfiles
react_on_rails/spec/dummy/bin/prod, react_on_rails/spec/dummy/bin/prod-assets, react_on_rails_pro/spec/dummy/bin/prod, react_on_rails_pro/spec/dummy/bin/prod-assets, react_on_rails_pro/spec/dummy/Procfile.prod
Added production start and asset-build scripts with asset freshness checks, environment exports, CI bootsnap precompile hook, and process-supervisor selection (overmind/foreman) with fallbacks.
Puma, Shakapacker & Env Configs
react_on_rails/spec/dummy/config/puma.rb, react_on_rails/spec/dummy/config/shakapacker.yml, react_on_rails_pro/spec/dummy/config/puma.rb, react_on_rails_pro/spec/dummy/config/shakapacker.yml, react_on_rails_pro/spec/dummy/config/environments/production.rb, react_on_rails_pro/spec/dummy/config/database.yml
Puma: ENV-driven min/max threads, production workers, worker_shutdown_timeout; Shakapacker: cache path, extract_css and ensure_consistent_versioning; production env tweaks and DB pool uses RAILS_MAX_THREADS.
Asset Task Update
react_on_rails_pro/spec/dummy/lib/tasks/assets.rake
Replaced yarn invocations with pnpm in asset build tasks.
Project Tooling & Ignore Files
.gitignore, .prettierignore, react_on_rails_pro/.prettierignore, knip.ts, package.json, CLAUDE.md
Ignore bench_results; add recursive **/vendor prettier ignore and bin/.* to knip ignores; include benchmarks/k6.ts in knip entries; add @types/k6 devDependency; add pre‑commit YAML lint guidance.
Docs & Contribution
docs/planning/library-benchmarking.md, CONTRIBUTING.md, CHANGELOG.md, CLAUDE.md
New benchmarking planning doc, CONTRIBUTING updated with benchmarking section, CHANGELOG entry for CI benchmarking, and CLAUDE.md YAML lint guidance.

Sequence Diagram(s)

sequenceDiagram
    participant GH as GitHub Actions
    participant Runner as CI Runner
    participant Tools as Tools (k6, Vegeta, Node, Ruby, pnpm)
    participant App as Dummy App (Rails + Node Renderer)
    participant Store as Artifacts/Registry

    GH->>Runner: trigger (workflow_dispatch / push / labeled PR)
    Runner->>Tools: install & restore caches
    Runner->>App: prepare apps (build assets / generate packs)
    Runner->>App: start prod servers (rails, node-renderer)
    Runner->>App: wait_for_server / health checks
    Runner->>Tools: run Core benchmarks (k6 per route)
    Runner->>Tools: run Pro benchmarks (Vegeta per bundle)
    Tools->>App: send HTTP requests → collect metrics
    Runner->>Runner: parse & convert summaries to benchmark JSON
    Runner->>Store: upload artifacts & publish results
    Runner->>GH: post workflow summary / status

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

  • Focus review on:
    • benchmarks/bench.rb — route discovery, k6 invocation, JSON parsing.
    • benchmarks/bench-node-renderer.rb — bundle discovery, RSC detection, Vegeta HTTP/2 usage and aggregation.
    • .github/workflows/benchmark.yml — conditional gates, caching, secrets usage, artifact publishing.
    • benchmarks/lib/benchmark_helpers.rb & benchmarks/convert_to_benchmark_json.rb — validation, failure metrics and JSON schema correctness.
    • dummy app bin/prod / Procfile.prod scripts — asset freshness checks and supervisor fallbacks.

Possibly related PRs

  • shakacode/react_on_rails#2153 — updates/unifies RuboCop/config for new performance/benchmark files (actionlint/yamllint touches).
  • shakacode/react_on_rails#2121 — pnpm/Yarn migration and CI tooling changes that overlap with asset/build/CI steps.

Suggested reviewers

  • AbanoubGhadban
  • alexeyr-ci
  • justin808

Poem

🐰
I hopped through scripts at break of day,
Laid k6 tracks where requests may play.
Rails and Node hum, numbers in sight,
Bundles sprint by under moonlight.
A rabbit cheers — benchmarks take flight!

Pre-merge checks and finishing touches

✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'Add performance tests' accurately summarizes the main objective of the pull request, which is to introduce performance testing capabilities via GitHub Actions workflows and benchmarking scripts.
Linked Issues check ✅ Passed The PR implements the requirement from issue #1777 to add performance tests using k6 for the dummy app, delivering benchmarking infrastructure via GitHub Actions workflow, benchmark scripts for Rails and Node Renderer, and helper utilities.
Out of Scope Changes check ✅ Passed All changes are scoped to performance testing infrastructure. Configuration updates (puma.rb, shakapacker.yml, database.yml, environments/production.rb) support the benchmark environment, and documentation/tooling changes prepare the project for CI benchmarking.
Docstring Coverage ✅ Passed Docstring coverage is 90.63% which is sufficient. The required threshold is 80.00%.
✨ Finishing touches
  • [ ] 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • [ ] Create PR with unit tests
  • [ ] Post copyable unit tests in a comment
  • [ ] Commit unit tests in branch alexeyr/performance-tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.


Comment @coderabbitai help to get the list of available commands and usage tips.

coderabbitai[bot] avatar Oct 20 '25 14:10 coderabbitai[bot]

Code Review - PR #1868: Add Performance Tests

Thanks for this contribution! This is a solid implementation of performance testing infrastructure.


✅ Strengths

  1. Comprehensive Tool Coverage: Using three industry-standard tools (Fortio, Vegeta, k6) provides good cross-validation
  2. Well-Structured Workflow: Clear separation of concerns with logical step grouping
  3. Good Error Handling: Proper use of set -euo pipefail and validation steps
  4. Caching Strategy: Smart use of GitHub Actions cache for benchmark tool binaries
  5. Security Conscious: SSH access properly gated with warnings and actor-only access
  6. Flexible Configuration: Parameterized inputs allow customization

🐛 Critical Issues

  1. Node.js Version Mismatch (benchmark.yml:155)

    • Uses Node 22, but PR #1925 documents Node 22 V8 cache crashes
    • Recommend changing to Node 20 until #1925 is resolved
  2. Missing Trailing Newlines (CRITICAL per CLAUDE.md)

    • Files spec/dummy/bin/prod and spec/dummy/bin/prod-assets missing final newlines
    • This will cause CI failures; run rake autofix to fix them
  3. Race Condition in Server Startup (benchmark.yml:220)

    • Background server might not be fully initialized when curl check succeeds
    • Recommend adding health check endpoint or verifying multiple successful responses
  4. Incomplete PR Checklist

    • Documentation and CHANGELOG boxes unchecked
    • Per CLAUDE.md: Should update CHANGELOG for user-visible features

⚠️ Performance & Quality Issues

  1. bench.sh:66 - Unrealistic iteration count assumes 10 iterations/sec per VU
  2. Fortio Optimization Missing - Referenced guide not implemented
  3. No Resource Monitoring - Missing CPU/memory metrics
  4. Missing Test Validation - No thresholds to detect performance regressions
  5. Single Route Testing - Only tests server_side_hello_world_hooks
  6. Complex Validation Logic - Lines 17-27 hard to read
  7. Inconsistent Quoting - Variables not consistently quoted
  8. Hard-coded Timeouts - 30s and 60s timeouts not configurable
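Hard-coded values like the timeouts above can be lifted into environment-variable defaults with shell parameter expansion; a minimal sketch (the variable names are illustrative, not necessarily the script's actual ones):

```shell
# Unset first so the demo observes the fallback behavior
unset TIMEOUT_SEC DURATION

# ${VAR:-default} substitutes the default when VAR is unset or empty,
# so CI can override via env while local runs get sensible values
TIMEOUT_SEC="${TIMEOUT_SEC:-30}"
DURATION="${DURATION:-60s}"

echo "timeout=${TIMEOUT_SEC}s duration=${DURATION}"
```

Callers can then tune a run with `TIMEOUT_SEC=10 DURATION=30s ./bench.sh` without editing the script.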

📚 Documentation Issues

  1. Missing Documentation - No README for running benchmarks locally
  2. Workflow Triggers - Runs on ALL PRs (expensive, should use labels/manual only)
  3. No Historical Tracking - Results not compared over time

🎯 Before Merging

  1. Fix Node.js version (22 → 20)
  2. Add trailing newlines
  3. Run bundle exec rubocop
  4. Update CHANGELOG.md
  5. Add performance thresholds
  6. Update workflow triggers

📊 Summary

Assessment: Good foundation but needs critical fixes before merge

Effort: Critical issues ~30 mins, High priority ~1-2 hours

Main concerns: Node.js version mismatch and missing trailing newlines will cause CI failures.

Let me know if you need help with any of these!

claude[bot] avatar Nov 05 '25 12:11 claude[bot]

PR Review: Add Performance Tests (1868)

Summary

This PR adds comprehensive performance testing infrastructure using three industry-standard benchmarking tools (Fortio, Vegeta, and k6). Well-structured implementation addressing issue 1777.

Critical Issues

1. Missing Trailing Newlines

CRITICAL per CLAUDE.md: Files MUST end with newline character or CI will fail:

  • spec/dummy/bin/prod (line 4)
  • spec/dummy/bin/prod-assets (line 9)
  • spec/performance/bench.sh (line 202)

2. RuboCop Required

CRITICAL per CLAUDE.md: Must run bundle exec rubocop and fix ALL violations before commit.

Potential Bugs

K6 JSON Parsing (bench.sh:187-192)

The jq expression uses add which fails on empty arrays. Use add // 0 for safety.
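The failure mode is easy to reproduce with a stub payload (the JSON shape is assumed from the review's description of the k6 summary):

```shell
# `add` on an empty array yields null, which breaks later arithmetic
empty_sum=$(echo '{"root_group":{"checks":[]}}' \
  | jq '[.root_group.checks[].passes] | add')
echo "without fallback: $empty_sum"

# The // alternative operator substitutes 0 when the left side is null
safe_sum=$(echo '{"root_group":{"checks":[]}}' \
  | jq '[.root_group.checks[].passes] | add // 0')
echo "with fallback: $safe_sum"
```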

Server Startup Race (benchmark.yml:254)

Uses exit 0 inside loop which exits the entire step. Should use break instead.
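A minimal sketch of the distinction, with a stubbed probe standing in for the real curl check:

```shell
# Stub probe: the "server" becomes ready on the third poll. In the real
# workflow this would be a curl against the app's URL.
attempts=0
check_server_once() {
  attempts=$((attempts + 1))
  [ "$attempts" -ge 3 ]
}

ready=false
for _ in 1 2 3 4 5; do
  if check_server_once; then
    ready=true
    break   # leaves only the loop; `exit 0` here would end the whole step
  fi
done

echo "ready=$ready after $attempts polls"
```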

npm vs yarn (benchmark.yml:200)

Uses npm install --global yalc but CLAUDE.md requires yarn-only commands.

Code Quality Strengths

  1. Excellent tool selection - three benchmarking tools for cross-validation
  2. Well-structured workflow with clear sections
  3. Good error handling with set -euo pipefail
  4. Comprehensive parameterization via workflow inputs
  5. Smart binary caching for faster CI
  6. Security-conscious SSH access gating

Suggestions

  1. Shell portability: Replace bc with awk for float comparisons
  2. Hardcoded route: Make server_side_hello_world_hooks configurable
  3. Duration: 30s may be too short, consider 60s for stable measurements
  4. Add documentation for local benchmark execution
  5. Consider performance regression detection vs master
  6. Add spec/performance/README.md with interpretation guide

Test Coverage

Good: Infrastructure in place, runs on PR/push, results preserved as artifacts
Missing: No bash script tests, no performance thresholds, no local testing docs

Verdict

Solid PR with high code quality. Performance testing infrastructure will be valuable.

Required Before Merge:

  1. Fix trailing newlines (will cause CI failure)
  2. Run bundle exec rubocop and fix violations
  3. Fix K6 empty array bug

Recommended:

  1. Add basic documentation
  2. Verify workflow passes on CI
  3. Consider CHANGELOG entry

Great work overall!

claude[bot] avatar Nov 05 '25 15:11 claude[bot]

PR Review: Add Performance Tests

Thank you for this comprehensive performance testing implementation! This is a well-structured addition to the repository. Here's my detailed feedback:


Strengths

  1. Comprehensive Tool Coverage: Great choice using three different load testing tools (Fortio, Vegeta, k6) to get diverse perspectives on performance
  2. Excellent Error Handling: The bash script has robust parameter validation and error handling throughout
  3. Smart Caching: GitHub Actions workflow efficiently caches binaries to speed up future runs
  4. Good Documentation: Clear comments, parameter descriptions, and helpful echo statements
  5. Configurable Parameters: Extensive workflow_dispatch inputs allow flexible testing scenarios
  6. Production-Ready Setup: Proper Puma configuration with workers and preloading for production benchmarks

🐛 Potential Issues

1. Script Portability - bc Dependency (bench.sh:21,33)

The script uses bc for floating-point comparisons but bc is not installed in the workflow.

Fix: Add bc to the apt package installation step in the workflow, or use bash-native arithmetic for integer comparisons.
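If removing the bc dependency is preferred, awk (always present on CI runners) can do the float comparison; a sketch with illustrative variable names:

```shell
rate=12.5
threshold=10.0

# awk evaluates the floating-point comparison; no bc needed
if awk -v a="$rate" -v b="$threshold" 'BEGIN { if (a > b) exit 0; exit 1 }'; then
  above=yes
else
  above=no
fi
echo "rate above threshold: $above"
```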

2. Race Condition in Server Startup (benchmark.yml:248)

The server process is backgrounded but there is no guarantee it has started before the polling loop begins. If the server fails to start immediately, the loop might miss error messages.

Suggestion: Add a brief sleep 2 before the polling loop, or capture the PID and check if the process is still running during the timeout loop.
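A sketch of the suggested PID liveness check, with sleep standing in for the real server process:

```shell
# `sleep` stands in for `bin/prod &`; the point is recording the PID and
# verifying the process is still alive before each poll of the endpoint.
sleep 30 &
SERVER_PID=$!

ok=false
for _ in 1 2 3; do
  if ! kill -0 "$SERVER_PID" 2>/dev/null; then
    echo "server process exited early" >&2
    break
  fi
  # real script: curl -fsS "$TARGET" >/dev/null && { ok=true; break; }
  ok=true
  break
done

kill "$SERVER_PID" 2>/dev/null
echo "ok=$ok"
```

This way a crashed server fails the step immediately instead of burning the full timeout on a dead PID.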

3. k6 Status Reporting Edge Case (bench.sh:199-204)

If .root_group.checks is empty, add returns null, causing K6_REQS_OTHER calculation to fail.

Fix: Add null handling by changing the jq expression to use | add // 0

4. Missing Newline at End of Files

Per CLAUDE.md requirements: ALWAYS ensure files end with a newline character

Files needing newlines:

  • spec/dummy/bin/prod (line 4)
  • spec/dummy/bin/prod-assets (line 9)
  • spec/performance/bench.sh (line 214)

Fix: Run rake autofix or add trailing newlines manually.
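The check itself is a one-liner: tail -c1 prints the file's last byte, and command substitution strips a trailing newline, so a non-empty result means the newline is missing. A sketch using a throwaway file:

```shell
f="$(mktemp)"
printf 'echo hello' > "$f"   # deliberately no trailing newline

# Non-empty last byte after substitution means the newline is missing
if [ -n "$(tail -c1 "$f")" ]; then
  echo "missing trailing newline; fixing"
  printf '\n' >> "$f"
fi

last_byte="$(tail -c1 "$f")"
rm -f "$f"
```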

5. Benchmark Workflow Runs on Every PR (benchmark.yml:57-58)

Running 30-second benchmarks with 3 tools on every PR will consume significant CI minutes and slow down PR feedback.

Suggestion:

  • Remove pull_request trigger and only keep workflow_dispatch + push to master
  • Or add a condition to only run on PRs with a specific label (e.g., performance)
  • Or significantly reduce default duration (e.g., 10s) for PR runs

🔍 Code Quality & Best Practices

Security

  • ✅ SSH access is properly gated behind workflow_dispatch input with clear warnings
  • ✅ limit-access-to-actor: true prevents unauthorized access
  • ⚠️ Consider adding a comment warning maintainers never to merge code while SSH session is active

Performance

  • ✅ Binary caching will significantly speed up repeated runs
  • ✅ Server warm-up phase (10 requests) is good practice
  • ⚠️ REQUEST_TIMEOUT=60s is very generous - consider 30s default to catch timeout issues faster

Maintainability

  • ✅ Clear section comments in YAML
  • ✅ Parameterized configuration
  • ⚠️ Consider extracting the jq parsing logic (lines 167-209) into a separate function for readability

📊 Test Coverage

Missing:

  • ❌ No automated tests for the bench.sh script itself
  • ❌ No validation that the summary.txt format is correct
  • ❌ No performance regression detection (just data collection)

Suggestions:

  1. Add a unit test for bench.sh parameter validation
  2. Consider storing benchmark results over time and comparing against baselines
  3. Add a step to post benchmark results as a PR comment for visibility

🔧 Configuration Issues

Puma Configuration (spec/dummy/config/puma.rb:38)

Workflow sets default to 4 workers (WEB_CONCURRENCY: 4), but Puma defaults to 2 if not set. The workflow default should match.

Consistency check needed: Verify that WEB_CONCURRENCY=4 is intentional for CI runners with 2 CPU cores.


📝 Documentation & Changelog

Per CLAUDE.md: Update CHANGELOG.md for user-visible changes only (features, bug fixes, breaking changes, deprecations, performance improvements)

This PR qualifies as it adds a performance testing feature. Needs:

  • ✅ Changelog entry format: PR 1868 by alexeyr-ci2: Add performance benchmarking workflow with Fortio, Vegeta, and k6.

Missing:

  • ❌ No README/docs update explaining how to run benchmarks locally
  • ❌ No documentation on interpreting benchmark results

🎯 Recommendations

Before Merge (Critical):

  1. ✅ Add trailing newlines to all new files
  2. ✅ Run bundle exec rubocop and fix any violations
  3. ✅ Add CHANGELOG entry
  4. ✅ Fix k6 null handling edge case
  5. ✅ Reconsider pull_request trigger or reduce duration

Nice to Have:

  6. Add bc to dependencies or remove floating-point support
  7. Add brief README section on running benchmarks
  8. Handle server startup race condition
  9. Consider performance regression detection


🎓 Overall Assessment

This is a solid, production-ready implementation of performance testing. The code quality is high, with excellent error handling and configurability. The main concerns are:

  1. CI cost - running on every PR may be expensive
  2. Missing documentation - users won't know how to use this
  3. File formatting - needs trailing newlines per repo standards

Recommendation: Request changes for the critical items (newlines, changelog, PR trigger), then approve.


Score: 8/10 - Great implementation, minor issues to address.

claude[bot] avatar Nov 06 '25 10:11 claude[bot]

Pull Request Review: Add Performance Tests

Thank you for this comprehensive performance testing implementation! This is a valuable addition to the project. Below is my detailed review:


Strengths

  1. Multi-tool approach: Using Fortio, Vegeta, and K6 provides cross-validation of results and flexibility for different testing scenarios.

  2. Well-structured workflow: The GitHub Actions workflow is logically organized with clear steps and helpful comments.

  3. Configurable parameters: Extensive configurability via workflow inputs allows fine-tuning for different performance scenarios.

  4. Good error handling: The bench.sh script includes comprehensive input validation and error messages.

  5. Caching: Binary caching for tools reduces workflow execution time.

  6. Production mode testing: Tests against production-like configuration (Puma workers, precompiled assets) for realistic results.


🔍 Issues and Concerns

Critical Issues

  1. Missing file newlines (spec/dummy/bin/prod, spec/dummy/bin/prod-assets)

    • Location: spec/dummy/bin/prod:4, spec/dummy/bin/prod-assets:9
    • Impact: CI will fail per CLAUDE.md requirements
    • Fix: Ensure both files end with a newline character
    • Reference: CLAUDE.md states "ALWAYS ensure files end with a newline character"
  2. RuboCop not run

    • Impact: MANDATORY requirement before commits
    • Required action: Run bundle exec rubocop and fix ALL violations
    • Reference: CLAUDE.md: "BEFORE EVERY COMMIT/PUSH: ALWAYS run bundle exec rubocop and fix ALL violations"

High Priority Issues

  1. Workflow runs on every PR/push to master (benchmark.yml:55-58)

    • Impact: Performance tests are resource-intensive and will slow down CI significantly
    • Recommendation: Only run on manual workflow_dispatch, specific labels (e.g., run-benchmarks), or scheduled runs (e.g., nightly)
  2. Server process not properly cleaned up (benchmark.yml:248)

    • Issue: Server started in background with bin/prod & but never explicitly killed
    • Impact: May leave orphaned processes, especially if workflow fails
    • Fix: Add cleanup step or use a trap/signal handler
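One way to guarantee teardown is a trap registered before anything is backgrounded; a sketch where sleep stands in for bin/prod, and the Pro node-renderer cleanup is left as a comment since the exact process name depends on Procfile.prod:

```shell
cleanup() {
  [ -n "${SERVER_PID:-}" ] && kill "$SERVER_PID" 2>/dev/null
  # Pro: also stop the node-renderer (by PID or port 3800); the pkill
  # pattern depends on how Procfile.prod names the process
  echo "cleanup ran"
}
trap cleanup EXIT   # fires on success, failure, and cancellation alike

sleep 30 &          # stands in for `bin/prod &`
SERVER_PID=$!
echo "server pid: $SERVER_PID"
# ... benchmark steps run here; cleanup fires on any exit path
```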

Medium Priority Issues

  1. Hardcoded route in bench.sh (line 6)

    • Issue: the default route in ${ROUTE:-server_side_hello_world_hooks} may not exist or may not be representative
    • Recommendation: Verify this route exists in the dummy app, consider testing multiple routes, and document what this route tests
  2. No baseline comparison

    • Issue: Results are collected but not compared against previous runs
    • Enhancement: Consider using benchmark-action to track performance over time
  3. Missing documentation

    • Issue: No README or documentation for how to run benchmarks locally, interpret results, or understand what "good" performance looks like
    • Recommendation: Add spec/performance/README.md

🔒 Security Considerations

  1. SSH access via tmate (benchmark.yml:108-113)
    • Status: Properly gated behind workflow input (default: false)
    • Good: Includes security warnings and limit-access-to-actor

🧪 Test Coverage

Missing:

  • No unit tests for the bash script logic
  • No tests verifying the benchmark workflow succeeds
  • No validation that summary output format is correct

Recommendations:

  • Add a dry-run mode to bench.sh for testing
  • Consider shellcheck for bash script validation

📝 Suggested Changes

Must Fix (Blocks Merge)

  • [ ] Add trailing newlines to spec/dummy/bin/prod and spec/dummy/bin/prod-assets
  • [ ] Run bundle exec rubocop and fix all violations
  • [ ] Change workflow triggers to avoid running on every PR/push

Should Fix (High Priority)

  • [ ] Add server cleanup/shutdown in workflow
  • [ ] Verify the benchmark route exists and is representative
  • [ ] Add basic documentation (spec/performance/README.md)

Nice to Have

  • [ ] Add baseline comparison/tracking
  • [ ] Add shellcheck to CI for bash scripts
  • [ ] Consider shorter default duration for PR checks
  • [ ] Extract K6 test to separate file

Changelog

Per CLAUDE.md, this should be added to CHANGELOG.md as it's a new feature.


🎯 Overall Assessment

This is a solid foundation for performance testing with good multi-tool coverage and configurability. The main blockers are CLAUDE.md compliance issues (trailing newlines, RuboCop). Once those are fixed and the workflow triggers are adjusted to avoid running on every PR, this will be ready to merge.

Estimated effort to address critical issues: ~30 minutes

Great work overall! 🚀

claude[bot] avatar Nov 07 '25 13:11 claude[bot]

Code Review: Add Performance Tests

Thank you for this comprehensive performance testing addition! This is a well-structured PR that addresses issue #1777.


✅ Strengths

1. Excellent Tool Coverage

  • Multiple benchmarking tools (Fortio, Vegeta, k6) provide good cross-validation
  • Smart caching strategy for tool binaries reduces CI time
  • Flexible configuration via environment variables

2. Robust Error Handling

  • Good use of set -euo pipefail in bash scripts
  • Comprehensive input validation with clear error messages
  • Server health checks with timeout protection

3. Production-Ready Configuration

  • Proper Puma clustering setup for production benchmarks
  • Preloading and worker configuration aligned with best practices
  • Appropriate separation of prod/dev configurations

🔒 Security Concerns - CRITICAL

SSH Access Feature (lines 96-119 in .github/workflows/benchmark.yml)

This poses significant security risks:

Issues:

  1. Detached mode risk: Using detached: true leaves the SSH session open indefinitely
  2. Repository exposure: The entire repository is accessible via SSH
  3. No audit trail: Limited visibility into commands executed during SSH sessions
  4. Secrets exposure risk: If secrets are added later, they could be exposed

Recommendations:

  • Consider removing this feature entirely for production workflows
  • If kept: Remove detached: true, add timeout-minutes: 15, and restrict to non-master branches
  • Document that NO secrets should ever be added to this workflow

🐛 Potential Bugs

1. Division by Zero Risk (spec/performance/bench.sh:195-197)

If K6_REQS_KNOWN_STATUS is null/empty, this could fail. Add validation.

2. Race Condition in Server Startup (spec/dummy/bin/prod)

The rails s command starts but there is no guarantee assets are fully loaded. Consider adding a readiness check that validates asset serving.

3. Missing Error Context

When benchmarks fail, the error message does not indicate which tool failed. Consider wrapping each tool execution with proper error context.

4. Hardcoded Paths (spec/performance/bench.sh:87)

If run from wrong directory, this could create directories in unexpected locations. Consider using SCRIPT_DIR to make paths relative to the script location.
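The standard idiom resolves paths against the script's own directory; a sketch with an assumed bench_results output directory name:

```shell
# Resolve the script's own directory so output paths do not depend on the
# caller's working directory ($0 works when the script is executed directly)
SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)"
RESULTS_DIR="$SCRIPT_DIR/bench_results"   # illustrative output dir name

mkdir -p "$RESULTS_DIR"
echo "results dir: $RESULTS_DIR"
```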


⚡ Performance Considerations

  1. Workflow runs on every push/PR: For a 20-minute benchmark suite, consider only running on workflow_dispatch and push to master, or add path filters to skip documentation changes.

  2. Sequential warmup (spec/performance/bench.sh:73-78): Consider parallel warmup for efficiency.

  3. Triple tool execution: Running all 3 tools provides validation but triples execution time. Consider defaulting to k6 only.


🧪 Test Coverage - Missing

  1. Unit tests for benchmark script: Input validation and JSON parsing logic should be tested
  2. Workflow validation: Consider adding actionlint to CI
  3. Integration test: Should verify benchmark artifacts are generated correctly

📝 Documentation - REQUIRED

Missing per CLAUDE.md requirements:

  1. CHANGELOG.md update - This is a user-visible feature that developers will use. Action needed: Update CHANGELOG.md

  2. Inline documentation: spec/performance/bench.sh needs header comment explaining:

    • Purpose
    • Usage examples
    • Required dependencies
    • Expected output format
  3. Workflow documentation: Add comment block in .github/workflows/benchmark.yml explaining when to run benchmarks and how to interpret results


🎨 Code Quality Suggestions

Good Practices:

  • Consistent error handling
  • Input validation with clear messages
  • Proper use of environment variables with defaults
  • JSON output for machine-readable results

Suggestions:

  1. YAML Anchors: Lines 7-14 define anchors but only use them once - consider simplifying
  2. DRY Principle: Lines 165-208 have very similar blocks for each tool - consider refactoring
  3. Magic Numbers: Several hardcoded values should be configurable (TIMEOUT_SEC=60, warmup count, sleep durations)
  4. Puma Configuration: Default of 2 workers may be too low - document why this was chosen

📋 Action Items Summary

🔴 High Priority (Before Merge):

  1. Address SSH security concerns - Remove or significantly restrict
  2. Add CHANGELOG.md entry (required per project guidelines)
  3. Fix potential null reference in K6 status parsing
  4. Add workflow path filters to prevent unnecessary runs

🟡 Medium Priority (Before Merge):

  1. Add header documentation to bench.sh
  2. Add error context to benchmark failures
  3. Fix hardcoded paths in bash script

🟢 Low Priority (Future Enhancement):

  1. Add unit tests for benchmark script
  2. Refactor duplicate code in parsing logic
  3. Make magic numbers configurable

✨ Overall Assessment

This is a solid foundation for performance testing! The multi-tool approach and comprehensive workflow setup show good engineering practices.

Main concerns:

  1. Security: SSH feature needs significant changes or removal
  2. Documentation: Missing CHANGELOG and inline docs (required per CLAUDE.md)
  3. Robustness: A few edge cases in error handling

Recommendation: Request changes for security and documentation issues, then approve after those are addressed.

Great work on this comprehensive testing infrastructure!

claude[bot] avatar Nov 07 '25 13:11 claude[bot]

Pull Request Review: Add Performance Tests (Part 1/3)

Overview

This PR adds comprehensive benchmarking infrastructure to test React on Rails performance using three industry-standard tools (Fortio, Vegeta, k6). The implementation is well-structured with good error handling and configurability.

✅ Strengths

1. Excellent Tool Selection & Comparison

  • Using three different benchmarking tools (Fortio, Vegeta, k6) provides cross-validation of results
  • Each tool has different strengths, giving a more complete performance picture
  • Smart conditional execution based on TOOLS environment variable

2. Comprehensive GitHub Actions Workflow

  • Well-organized with clear step sections and comments
  • Smart caching strategy for binaries and dependencies
  • Proper error handling with set -e and validation steps
  • 2-hour timeout is reasonable for performance testing
  • Good use of artifacts for result preservation (30-day retention)

3. Robust Benchmark Script (spec/performance/bench.rb)

  • Parameter validation: Validates rate, duration, connections, timeout formats
  • Server availability check: Waits up to 60s with detailed logging
  • Warm-up phase: 10 requests per route before benchmarking (good practice)
  • Auto-route discovery: Extracts routes from Rails app automatically
  • Error handling: Each tool run wrapped in rescue blocks, failures don't stop entire suite
  • Consistent output: Tab-separated summary file for easy parsing

4. Production-Ready Infrastructure

  • bin/prod and bin/prod-assets scripts properly separate asset compilation from server startup
  • Asset freshness checks with helpful warnings
  • Supports both Core and Pro versions
  • Procfile.prod for Pro version includes node-renderer process

claude[bot] avatar Dec 04 '25 16:12 claude[bot]

Pull Request Review: Add Performance Tests (Part 2/3)

⚠️ Issues & Recommendations

1. Documentation: Missing CHANGELOG Entry (REQUIRED per CLAUDE.md)

Per CLAUDE.md guidelines, user-visible changes should be documented in CHANGELOG.md.

Recommendation: Add entry like:

#### [PR 1868](https://github.com/shakacode/react_on_rails/pull/1868) by [alexeyr-ci2](https://github.com/alexeyr-ci2)

Added performance benchmarking infrastructure using Fortio, Vegeta, and k6 tools. New GitHub Actions workflow allows automated and manual performance testing of both Core and Pro versions.

2. Security: SSH Access Feature (Medium Priority)

Location: .github/workflows/benchmark.yml:110-128

The SSH access feature using action-tmate is documented but could be clearer about risks.

Recommendations:

  • ✅ Good: Already requires explicit opt-in via debug_enabled input
  • ✅ Good: Limited to workflow initiator with limit-access-to-actor: true
  • ⚠️ Consider: Add a time limit to SSH sessions (currently detached: true means indefinite)
  • Suggest: Document in PR description when/why developers would use this

3. Resource Management: Server Process Cleanup (Low Priority)

Location: Workflow steps Stop Core/Pro production server

Issue: For Pro version, Procfile.prod also starts node-renderer on port 3800, which may not be cleaned up. If the Rails process spawns child processes, they may not be cleaned up either.

Consider also cleaning up port 3800 for Pro, and killing child processes to ensure complete cleanup.

4. Code Quality: Magic Numbers (Low Priority)

Location: spec/performance/bench.rb:385-388

Consider extracting warm-up constants (10 requests, 0.5s delay) for better documentation.

5. Performance: Workflow Efficiency (Enhancement)

The workflow builds both Core and Pro sequentially. Since they are independent, they could run in parallel jobs for faster CI (trade-off: uses 2x runner minutes but saves wall-clock time).

6. Observability: Missing Baseline Comparison (Enhancement)

Results are uploaded as artifacts but not compared to baseline/previous runs. Future enhancement could add performance regression detection by comparing against stored baselines.

claude[bot] avatar Dec 04 '25 16:12 claude[bot]

Pull Request Review: Add Performance Tests (Part 3/3)

🔍 Detailed Code Review

spec/performance/bench.rb

Positive Patterns:

  • Line 12-15: env_or_default helper properly handles empty strings and 0 from workflow inputs
  • Line 38-57: Input validation functions are thorough with clear error messages
  • Line 59-67: Proper error handling in parse_json_file with specific error types
  • Line 140-150: Server availability check accepts both 2xx and 3xx responses (handles redirects correctly)
  • Line 184-186: Validates RATE=max constraint (CONNECTIONS must equal MAX_CONNECTIONS)
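The `env_or_default` behavior praised above (treating empty strings from blank workflow inputs as missing) could be sketched roughly like this; the helper name comes from the review, but the exact implementation in `bench.rb` may differ:

```ruby
# Hypothetical sketch of an env_or_default helper: an unset or
# empty-string env var (a blank workflow_dispatch input) falls back
# to the default instead of becoming "".
def env_or_default(name, default)
  value = ENV[name]
  return default if value.nil? || value.strip.empty?

  # Preserve the default's type for numeric defaults.
  default.is_a?(Integer) ? Integer(value) : value
end
```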

Security Notes:

  • Line 81: Command injection risk is low - app_dir is hardcoded constant, not user input
  • Line 332: k6 script uses string interpolation but target is validated as URI first

🔐 Security Assessment

Workflow Secrets:

  • ✅ SECRET_KEY_BASE: Properly set as dummy value (not a real secret)
  • ✅ REACT_ON_RAILS_PRO_LICENSE: Correctly uses GitHub secrets

Injection Vectors:

  • ✅ All workflow inputs have proper type constraints (number, string, choice)
  • ✅ Bash scripts use proper quoting
  • ✅ No user input flows into eval or unvalidated shell commands

Resource Limits:

  • ✅ Workflow has 120-minute timeout
  • ✅ Request timeout configurable (default 60s)
  • ⚠️ No memory/CPU limits on benchmark tools (may cause OOM on large loads)

🧪 Test Coverage

Missing tests: This PR adds testing infrastructure but doesn't include tests for the infrastructure itself.

Recommendations for future PRs:

  1. Add unit tests for spec/performance/bench.rb validation functions, route parsing logic, JSON parsing
  2. Add integration test that runs benchmark with DURATION=5s to verify workflow

✅ Compliance with CLAUDE.md

  • ✅ Uses pnpm (not npm/yarn)
  • ✅ Ruby code follows project structure
  • ✅ Bash scripts use proper error handling (set -e)
  • Missing CHANGELOG.md entry (REQUIRED)
  • ⚠️ Should add newlines to end of all new files (pre-commit hooks will catch this)

🎯 Final Recommendation

Overall: This is a high-quality PR that adds valuable performance testing infrastructure. The code is well-structured, properly error-handled, and follows good practices.

Blocking Issue: Missing CHANGELOG.md entry (required per CLAUDE.md)

Before Merge:

  1. ✅ Add CHANGELOG.md entry (REQUIRED)
  2. ✅ Run bundle exec rubocop and fix any violations
  3. ✅ Ensure all files end with newlines

Consider for Follow-up PRs:

  1. Add baseline comparison functionality
  2. Optimize workflow to run Core/Pro in parallel
  3. Add tests for benchmark script itself
  4. Document when/how to use SSH debug feature

Approval Status: Ready to merge after addressing CHANGELOG.md requirement. Excellent work! 🚀

claude[bot] avatar Dec 04 '25 16:12 claude[bot]

PR Review: Add Performance Tests

This PR adds comprehensive performance testing infrastructure using three benchmarking tools (Fortio, Vegeta, k6). Overall, this is a high-quality PR with well-structured code.

✅ Strengths

  1. Comprehensive Testing: Uses multiple tools for validation, tests both OSS and Pro versions
  2. Well-Structured Code: Clean separation in bench.rb, good error handling
  3. Production-Ready Workflow: Proper asset precompilation, server validation, caching
  4. Good Developer Experience: Clear parameters, retry logic, detailed output

🔧 Code Quality Observations

Ruby Code (spec/performance/bench.rb)

Minor Security Suggestions:

  1. Line 81 - Command Injection (Low Risk) Consider using Shellwords.escape for app_dir variable

  2. Line 121 - Command Injection (Low Risk) Should escape cmd variable since it comes from TOOLS env var
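The escaping suggested in both points could look like the following sketch; `app_dir` and `cmd` are placeholders for the variables referenced above, not the actual code in `bench.rb`:

```ruby
require "shellwords"

# Escape values before interpolating them into a shell command string.
app_dir = "spec/dummy"
routes_cmd = "cd #{Shellwords.escape(app_dir)} && bundle exec rails routes"

# For a command sourced from an env var (e.g. TOOLS), Shellwords.split
# produces an argv array for the injection-safe multi-arg form of system().
cmd = "k6 run script.js"
argv = Shellwords.split(cmd)
# system(*argv) would then run it without invoking a shell.
```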

GitHub Actions Workflow

Suggestions:

  1. Frozen Lockfile Inconsistency (Lines 236 vs 258, 396) Root uses --frozen-lockfile but dummy apps do not. This could lead to non-deterministic builds. Recommend adding it or documenting why omitted.

  2. Timeout (Lines 320, 455): 120 minutes seems generous - could hide hanging processes

  3. Resource Allocation: Have you validated WEB_CONCURRENCY=4 + RAILS_MAX_THREADS=3 on 2-core GitHub runners?

🔒 Security Review: ✅ No Critical Issues

  • Secrets properly handled (only dummy/test values)
  • SSH debug access properly gated and secured
  • Command injection risks are low (inputs mostly controlled)

🧪 Test Coverage

Missing:

  • No RSpec tests for bench.rb itself
  • No validation of reasonable benchmark results

Recommendation: Add smoke tests for parsing logic and input validation

📝 Documentation

Good CLAUDE.md additions. Could also add:

  • How to run benchmarks locally
  • How to interpret results
  • Expected performance baselines

🎯 Specific Files

  • .rubocop.yml: Good approach inheriting from react_on_rails config
  • knip.ts: Correct to ignore system utilities column/tee
  • .gitignore: bench_results correctly placed at root
  • bin/prod scripts: Well-designed with pre-flight checks

🚀 Recommendations

Before Merge:

  1. Add shell escaping for security hardening (low priority)
  2. Document or fix --frozen-lockfile inconsistency
  3. Consider adding tests for bench.rb

Future PRs:

  1. Performance regression detection
  2. Benchmark visualization
  3. RSpec unit tests

✅ Summary

Overall Assessment: High-quality PR with valuable performance testing infrastructure. Well-structured code following project conventions.

Main action items (all minor):

  1. Shell escaping for hardening
  2. Frozen-lockfile documentation
  3. Consider adding tests

Great work on comprehensive benchmarking! 🎉

Recommendation: Approve with minor suggestions

claude[bot] avatar Dec 04 '25 16:12 claude[bot]

Code Review: Performance Testing Infrastructure

Thank you for adding comprehensive performance testing! This PR adds valuable infrastructure for tracking React on Rails performance over time.

Strengths

  1. Comprehensive Benchmarking: Multi-tool comparison (Fortio, Vegeta, k6) provides validation across different methodologies
  2. Well-Structured Workflow: Clear separation of concerns with good caching and cleanup
  3. Production Scripts: bin/prod and bin/prod-assets are well-designed with asset staleness detection

Critical Issues

1. SSH Access Security Risk (Lines 104-130)

The tmate SSH feature could leak REACT_ON_RAILS_PRO_LICENSE secret. Recommend removing it entirely or adding secret-clearing steps.

2. Port Hardcoding

Port 3001 is hardcoded in bench.rb:21, benchmark.yml, and bin/prod scripts. If port is in use, workflow fails mysteriously. Make PORT configurable via environment variable.
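One way to make the port configurable, sketched here with illustrative helper names (not the actual `bench.rb` API):

```ruby
# Resolve the benchmark port from the environment, defaulting to 3001.
def benchmark_port(env = ENV)
  value = env["PORT"]
  value.nil? || value.empty? ? 3001 : Integer(value)
end

# Derive the base URL from the same port unless BASE_URL overrides it.
def base_url(env = ENV)
  env["BASE_URL"] || "http://localhost:#{benchmark_port(env)}"
end
```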

3. Server Startup Race Condition (Lines 289-313, 424-448)

The 30-second wait loop does not check if process died. If Puma crashes immediately, it keeps retrying. Add process health check during wait.
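A process health check like the one suggested could be added to the wait loop; this is a sketch assuming `pid` comes from spawning the server:

```ruby
# Check whether the server process is still alive, so a crashed Puma
# fails fast instead of burning the whole 30-second wait.
def process_alive?(pid)
  Process.kill(0, pid) # signal 0 only checks existence/permission
  true
rescue Errno::ESRCH
  false
rescue Errno::EPERM
  true # exists but owned by another user
end
```

The wait loop would then abort as soon as `process_alive?(server_pid)` returns false, instead of retrying the HTTP check.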

4. Missing CHANGELOG

Per PR checklist, CHANGELOG.md needs to be updated with an entry under Unreleased section.

Medium Priority

  • RuboCop Violations: bench.rb has large methods with disabled cops (lines 188-369). Extract helper methods to reduce complexity.
  • Inconsistent Error Handling: Some errors raise, others return failure_metrics(e). Choose one approach and document it.

Testing Concerns

Per CLAUDE.md requirements, there is no evidence of local testing in the PR description. Please document:

  1. Testing Core and Pro apps locally
  2. Testing with different parameter combinations
  3. Results of running bundle exec rubocop

Documentation Gaps

Add usage documentation for:

  • Running benchmarks locally
  • Triggering workflow in CI
  • Parameter descriptions
  • Interpreting results

Action Items Before Merge

Required:

  1. Remove SSH access or add secret-clearing
  2. Fix port hardcoding
  3. Add process health check to server startup
  4. Update CHANGELOG.md
  5. Document local testing results

Recommended:

  6. Refactor bench.rb methods
  7. Add usage documentation

Overall Assessment

Excellent foundation for performance testing! With security and reliability issues addressed, this will be a solid addition. Estimated effort: 2-3 hours.

Great work! 🚀

claude[bot] avatar Dec 04 '25 16:12 claude[bot]

Code Review: Performance Testing Infrastructure

Thanks for adding performance testing infrastructure! Here is my detailed feedback on PR #1868.

Overall Assessment

Quality: Good - Well-structured benchmarking framework with solid error handling Scope: Appropriate - Addresses issue #1777 with a flexible, configurable system
Testing: Needs improvement - See security and production concerns below

Strengths

  1. Excellent multi-tool approach: Testing with Fortio, Vegeta, and k6 provides good coverage
  2. Well-documented workflow: Clear step separation and helpful comments
  3. Flexible configuration: Good use of environment variables with sensible defaults
  4. Error handling: Solid error handling with descriptive messages
  5. Auto-detection of routes: Smart feature to discover benchmark routes automatically
  6. Proper warmup: 10-request warmup prevents cold-start skew

Critical Issues

1. Security: SSH Access in CI (HIGH PRIORITY)

Location: .github/workflows/benchmark.yml:105-130

The SSH debug access feature is a significant security risk. SSH sessions can remain open indefinitely and expose the GitHub Actions runner.

Recommendations:

  • Add automatic timeout (30 minutes max)
  • Consider removing if not actively needed
  • Add more prominent warnings

2. Command Injection Vulnerability (HIGH PRIORITY)

Location: spec/performance/bench.rb:81,263,269

Shell commands use string interpolation without sanitization. If ROUTES env var contains shell metacharacters, command injection is possible.

Fix: Use Shellwords.escape for user inputs

3. Inconsistent Lock File Handling

Location: react_on_rails/spec/dummy/bin/prod:24-26 and react_on_rails_pro/spec/dummy/bin/prod:40

This project uses pnpm, not yarn (per CLAUDE.md). Scripts check for yarn.lock but should check pnpm-lock.yaml.

Documentation Needs

Per CLAUDE.md requirements:

  1. Update CHANGELOG.md with entry for PR 1868
  2. Document the benchmark workflow
  3. Update CLAUDE.md to mention spec/performance/ directory

Pre-merge Checklist

  • Fix command injection vulnerability (use Shellwords.escape)
  • Fix yarn.lock references to pnpm-lock.yaml
  • Add security warnings or remove SSH access feature
  • Run bundle exec rubocop and fix violations
  • Update CHANGELOG.md
  • Add benchmark documentation
  • Add bench_results/ to RuboCop exclusions

Summary

Solid foundation for performance testing! Main concerns:

  1. Security: Command injection needs fixing before merge
  2. Consistency: Fix pnpm references
  3. Documentation: Add changelog entry and usage docs

Once security issue is addressed, this will be a valuable addition. Great work on the comprehensive workflow!

claude[bot] avatar Dec 04 '25 17:12 claude[bot]

size-limit report 📦

Path Size
react-on-rails/client bundled (gzip) 62.5 KB (0%)
react-on-rails/client bundled (gzip) (time) 62.5 KB (0%)
react-on-rails/client bundled (brotli) 53.66 KB (0%)
react-on-rails/client bundled (brotli) (time) 53.66 KB (0%)
react-on-rails-pro/client bundled (gzip) 63.45 KB (0%)
react-on-rails-pro/client bundled (gzip) (time) 63.45 KB (0%)
react-on-rails-pro/client bundled (brotli) 54.53 KB (0%)
react-on-rails-pro/client bundled (brotli) (time) 54.53 KB (0%)
registerServerComponent/client bundled (gzip) 127.35 KB (0%)
registerServerComponent/client bundled (gzip) (time) 127.35 KB (0%)
registerServerComponent/client bundled (brotli) 61.51 KB (+0.2% 🔺)
registerServerComponent/client bundled (brotli) (time) 61.39 KB (0%)
wrapServerComponentRenderer/client bundled (gzip) 121.96 KB (0%)
wrapServerComponentRenderer/client bundled (gzip) (time) 121.96 KB (0%)
wrapServerComponentRenderer/client bundled (brotli) 56.66 KB (0%)
wrapServerComponentRenderer/client bundled (brotli) (time) 56.66 KB (0%)

github-actions[bot] avatar Dec 08 '25 15:12 github-actions[bot]

Code Review - Performance Testing Infrastructure

I've reviewed PR #1868 which adds comprehensive performance testing infrastructure. This is a substantial and well-structured addition to the project. Here's my detailed feedback:

Strengths

1. Excellent Benchmark Implementation (spec/performance/bench.rb)

  • Robust parameter validation: Input validation for RATE, CONNECTIONS, DURATION is thorough and provides clear error messages
  • Multi-tool support: Supporting Fortio, Vegeta, and k6 gives flexibility and cross-validation of results
  • Smart route discovery: Auto-detecting routes from Rails with proper filtering (pages# and react_router# controllers only)
  • Required parameter handling: Correctly skips routes with required parameters like /rsc_payload/:component_name
  • Good error handling: Each tool's execution is wrapped in error handling with clear failure metrics
  • Server warmup: 10 requests per route before benchmarking prevents cold-start bias

2. Well-Designed GitHub Workflow (.github/workflows/benchmark.yml)

  • Comprehensive configurability: All key parameters (rate, duration, connections, tools) are configurable via workflow_dispatch
  • Proper caching: Binary caching for Fortio, Vegeta, and gem caching reduces CI time
  • Security-conscious SSH access: The debug_enabled input with clear warnings is the right approach
  • Both core and pro support: Conditional execution based on app_version input is clean
  • Artifact retention: 30-day retention for benchmark results enables historical analysis
  • Validation steps: Verifying summary.txt exists and displaying results before upload

3. Production Scripts (bin/prod, bin/prod-assets)

  • Asset validation: Checking for public/assets and manifest.json before starting server is crucial
  • Staleness warnings: Warning when source files are newer than compiled assets helps catch mistakes
  • Foreman/Overmind support: Graceful fallback from overmind → foreman → gem install foreman

📝 Issues & Recommendations

CRITICAL: RuboCop Configuration

  • Issue: Added root-level .rubocop.yml that excludes spec/performance/**/*
  • Problem: According to CLAUDE.md, "BEFORE EVERY COMMIT: ALWAYS run bundle exec rubocop and fix ALL violations"
  • Action required: The new spec/performance/bench.rb should pass RuboCop. Please run:
    bundle exec rubocop spec/performance/bench.rb
    
    And fix any violations. Only add exclusions if there's a valid reason (document it in comments).

Documentation Missing

From the PR checklist:

  • ❌ Update documentation - No user-facing docs added
  • ❌ Update CHANGELOG file - This is a significant feature addition

Recommendations:

  1. Add a section to README or create BENCHMARKING.md covering:
    • How to run benchmarks locally
    • How to interpret results
    • How to use the GitHub Actions workflow
    • Available configuration options
  2. Update appropriate CHANGELOG (/CHANGELOG.md or /CHANGELOG_PRO.md):
    #### Developer Experience
    - Add performance benchmarking infrastructure with Fortio, Vegeta, and k6 support. [PR 1868](https://github.com/shakacode/react_on_rails/pull/1868) by [alexeyr-ci2](https://github.com/alexeyr-ci2)
    

Testing Concerns

Per CLAUDE.md's CRITICAL LOCAL TESTING REQUIREMENTS:

  • ⚠️ Have you tested the benchmark.yml workflow end-to-end?
    • Can you confirm the Core workflow completes successfully?
    • Can you confirm the Pro workflow completes successfully?
    • Have you verified the uploaded artifacts contain expected files?

If not tested locally (which is understandable for complex CI workflows), please clearly state:

  • "UNTESTED - requires GitHub Actions environment"
  • Provide a plan to verify after merge (e.g., run workflow_dispatch manually)

Code Quality Issues

  1. spec/performance/bench.rb:220 - Multiple RuboCop disables:

    # rubocop:disable Metrics/AbcSize, Metrics/CyclomaticComplexity, Metrics/PerceivedComplexity, Metrics/MethodLength
    

    While sometimes necessary, consider refactoring:

    • Extract common code from run_fortio_benchmark, run_vegeta_benchmark, run_k6_benchmark
    • Create a BenchmarkTool class with a common interface
    • Move percentile extraction logic to a helper method
  2. Hard-coded port 3001:

    • Multiple locations: bench.rb:21, bin/prod:29, benchmark.yml:303, etc.
    • Consider: PORT = env_or_default('PORT', 3001) for flexibility
  3. bench.rb:106 - Shell command without escaping:

    routes_output = `cd #{app_dir} && bundle exec rails routes 2>&1`
    

    While app_dir is controlled internally, consider:

    routes_output = `cd #{Shellwords.escape(app_dir)} && bundle exec rails routes 2>&1`
    
  4. benchmark.yml:194 - Hard-coded Ruby version:

    ruby-version: '3.3.7'
    

    Should this use a matrix strategy or at least match the versions in other workflows?

Minor Issues

  1. benchmark.yml:242-258 - TODO comments:

    # TODO only needed while we use --ignore-workspace below
    # TODO simplify this both here and in integration tests
    

    Create follow-up issues to track these improvements

  2. Deleted file: packages/react-on-rails-pro-node-renderer/script/preinstall.js

    • Appears unrelated to this PR's purpose
    • Should this be in a separate PR? If intentional, explain in PR description
  3. Path updates in documentation files:

    • Many .claude/ and analysis/ files have yarn→pnpm replacements
    • These seem like cleanup from a previous migration
    • Consider separating documentation updates from feature PRs for easier review

🔒 Security

No security concerns identified:

  • ✅ SSH access properly gated behind workflow_dispatch input
  • ✅ No secrets exposed in benchmark results
  • ✅ Tools are from official releases with pinned versions

Performance Considerations

  1. Workflow caching is excellent:

    • Binary caching saves ~30s per run
    • Gem caching saves ~2-3 minutes
    • Node modules caching saves ~1-2 minutes
  2. Benchmark duration defaults:

    • 30s default might be short for stable results
    • Consider documenting: "Use 60s+ for production comparisons"
  3. Parallel tool execution:

    • Currently runs tools sequentially (Fortio → Vegeta → k6)
    • Could parallelize across routes or tools, but sequential is safer for accurate results

🎯 Final Recommendations

Before merge:

  1. ✅ Run bundle exec rubocop spec/performance/bench.rb and fix violations
  2. ✅ Add documentation (BENCHMARKING.md or README section)
  3. ✅ Update CHANGELOG.md with this feature
  4. ✅ Verify workflow runs successfully (at least once via workflow_dispatch)
  5. ✅ Address or create issues for TODO comments

Nice to have:

  • Refactor benchmark tool functions to reduce complexity
  • Make PORT configurable
  • Add Shellwords escaping for shell commands
  • Separate unrelated changes (preinstall.js deletion, doc updates) into their own PRs

Overall, this is excellent work - the implementation is solid, well-tested, and adds significant value. The issues are mostly about following project conventions (RuboCop, CHANGELOG) rather than fundamental problems. Great job! 🎉

claude[bot] avatar Dec 10 '25 01:12 claude[bot]

Pull Request Review: Add Performance Tests

Thanks for adding performance testing infrastructure! This is a valuable addition to the project.

✅ Strengths

Well-Designed Benchmark Suite:

  • Multiple tools (Fortio, Vegeta, k6) provide cross-validation of results
  • Comprehensive metrics: RPS, p50/p90/p99 latencies, and status codes
  • Flexible configuration via environment variables
  • Smart route auto-detection from Rails routes
  • Proper warm-up phase before benchmarking

Robust CI Integration:

  • Conditional execution (master, workflow_dispatch, or labels)
  • Artifact upload with 30-day retention
  • Smart caching of benchmark tools
  • Tests both Core and Pro versions

Good Operational Practices:

  • Server health checks with retry logic
  • Proper error handling and validation
  • Clear logging with indicators
  • Graceful cleanup

⚠️ Issues to Address

1. Security: Command Injection Vulnerabilities 🔴 HIGH PRIORITY

Location: spec/performance/bench.rb:106, 251, 301-302, 373

The script uses string interpolation in shell commands without proper sanitization. While ROUTES input is validated, route values aren't sanitized before use in shell commands.

Recommendation: Use array form of system() or Shellwords.escape for filenames.

2. FIXME Comments in Production Code 🟡 MEDIUM PRIORITY

Location: .github/workflows/benchmark.yml:79, 89

FIXMEs indicate test configuration (only root route, only Fortio) rather than production defaults. Should remove FIXMEs and set proper defaults or create follow-up issue.

3. Missing Changelog Entry 📋

Per CLAUDE.md guidelines, user-visible changes should be documented. This is a significant addition for developers.

4. Aggressive Process Killing ⚠️

Location: .github/workflows/benchmark.yml:374, 523

Kills ALL Ruby/Node processes which could affect other jobs in shared CI. Should store PID and kill specifically.

5. Port Hardcoding 🔧

Workflow hardcodes port 3001 in health checks despite configurable BASE_URL.

6. Validation Order Issue

RATE=max validation happens AFTER route detection. Move all validation to top.

💡 Suggestions

  • Parallel warmup for better performance
  • Metrics comparison logic across tools
  • Usage documentation and result interpretation guide
  • Dry run mode for quick validation
  • Unit tests for validation and parsing logic

🎯 Summary

Solid foundation for performance testing. Main concerns:

  1. MUST FIX: Command injection security issues
  2. SHOULD FIX: FIXME comments and test configuration
  3. NICE TO HAVE: Changelog, better process management, docs

Overall: Approve with required changes

claude[bot] avatar Dec 10 '25 11:12 claude[bot]

Pull Request Review: Performance Testing Infrastructure

I've reviewed PR #1868 which adds comprehensive performance testing infrastructure.

Overall Assessment: STRONG - Well-designed with some important considerations

This PR adds valuable performance benchmarking infrastructure using industry-standard tools (Fortio, Vegeta, k6). The implementation is thoughtful, configurable, and follows good practices.

Strengths

Excellent Architecture & Design:

  • Multi-tool approach: Using 3 different benchmarking tools provides cross-validation of results
  • Configurable parameters: Environment variables allow flexible CI/CD integration
  • Max-rate strategy: Smart choice for library benchmarking (documented in docs/planning/library-benchmarking.md)
  • Auto-discovery: Automatically detects routes from Rails app - brilliant!
  • Production mode testing: Tests with precompiled assets, which is the right approach

Robust Implementation:

  • Proper error handling: bench.rb validates inputs, checks for missing tools, and provides clear error messages
  • Server health checks: Waits for server readiness before benchmarking (60s timeout with detailed logging)
  • Warmup phase: 10 requests per route before benchmarking - essential for accurate results
  • Route filtering: Skips routes with required params and test-only routes
  • CI caching: Aggressive caching of dependencies, gems, and benchmark tools

Good DevEx:

  • Clear documentation: library-benchmarking.md explains the max-rate vs fixed-rate trade-offs
  • Readable output: Tab-separated summary with column formatting
  • Artifact uploads: 30-day retention of benchmark results
  • SSH debugging: Optional tmate access for CI debugging (properly gated)

Security Conscious:

  • SSH access requires explicit opt-in via debug_enabled input
  • Clear warning messages about security risks
  • limit-access-to-actor restricts access appropriately

claude[bot] avatar Dec 12 '25 17:12 claude[bot]

Issues & Concerns

CRITICAL: Incomplete Pull Request

BLOCKER - Must address before merge:

  1. Pull request checklist not completed:

    • Update documentation - marked incomplete
    • Update CHANGELOG file - marked incomplete
  2. Missing changelog entries:

    • This is a significant user-facing feature (benchmark workflow)
    • Should have entries in CHANGELOG.md documenting the new benchmark.yml workflow
    • How to run benchmarks via workflow_dispatch inputs
    • How to interpret results

HIGH PRIORITY: Testing Gaps

  1. No local testing evidence:

    • Per CLAUDE.md CRITICAL LOCAL TESTING REQUIREMENTS: NEVER claim a test is fixed without running it locally first
    • Has this workflow been tested locally with act, or in a real CI run?
    • Have all 3 benchmark tools been verified to work correctly?
    • Has the Pro app benchmarking been tested with PRO=true?
  2. No validation of benchmark results:

    • The workflow validates that summary.txt exists, but doesn't validate the data quality
    • Should check for: non-zero RPS, reasonable latencies, no all-failures
    • Consider adding basic sanity checks (e.g., RPS > 1, p99 < 10000ms)
  3. Missing error scenarios:

    • What happens if all routes have required params? (would fail with No routes to benchmark)
    • What if server crashes during benchmark?
    • What if all 3 tools fail?
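The sanity checks suggested in point 2 could be as simple as the following sketch; the metric keys here are assumptions for illustration, not `bench.rb`'s actual result schema:

```ruby
# Reject implausible benchmark results: near-zero throughput, absurd
# tail latency, or a run where every request failed.
def sane_result?(metrics)
  metrics[:rps].to_f > 1 &&
    metrics[:p99_ms].to_f < 10_000 &&
    metrics[:error_rate].to_f < 1.0
end
```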

claude[bot] avatar Dec 12 '25 17:12 claude[bot]

MEDIUM PRIORITY: Code Quality

  1. Magic numbers and unclear defaults (benchmarks/bench.rb:410):

    • 10 warmup requests and 0.5s delay should be constants or ENV vars
    • Consider: WARMUP_REQUESTS and WARMUP_DELAY_SEC as env vars
  2. Hardcoded port in Core app (.github/workflows/benchmark.yml:306):

    • Port 3001 is hardcoded but should match server startup
    • Consider extracting to an env var
  3. Inconsistent tool result handling:

    • Fortio/Vegeta/k6 methods return nil if tool not enabled
    • Then if metrics check before adding to summary
    • Could be cleaner with early returns or wrapper method
  4. RuboCop disable without explanation (benchmarks/bench.rb:226):

    • Metrics/AbcSize, Metrics/CyclomaticComplexity, Metrics/PerceivedComplexity disabled
    • These are legitimate disables for benchmark parsing methods, but complex methods warrant refactoring consideration
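Extracting the warm-up magic numbers from point 1 into overridable env vars might look like this sketch (helper and key names are illustrative):

```ruby
# Read warm-up settings from the environment with documented defaults,
# replacing the hard-coded 10 requests / 0.5s delay.
def warmup_config(env = ENV)
  {
    requests: Integer(env.fetch("WARMUP_REQUESTS", "10")),
    delay_sec: Float(env.fetch("WARMUP_DELAY_SEC", "0.5"))
  }
end
```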

MEDIUM PRIORITY: CI/CD Concerns

  1. Benchmark workflow triggers:

    • Runs on ALL pushes to master (excluding docs)
    • Runs on ALL PR sync/reopen events if labeled
    • Question: Is this too aggressive? Benchmarks can be slow (2+ min per route × tools)
    • Consider: Only run on master, or require manual trigger/label for PRs
  2. Port cleanup race condition (.github/workflows/benchmark.yml:368-383):

    • pkill -9 is aggressive (SIGKILL)
    • Consider graceful shutdown first: pkill -TERM puma && sleep 2 && pkill -9 puma
  3. Missing workflow timeout:

    • Individual steps have timeout-minutes: 120
    • But no job-level timeout
    • Consider adding: timeout-minutes: 180 at job level
  4. Artifact naming collision:

    • Uses github.run_number which could collide if multiple PRs run simultaneously
    • Consider: github.run_id instead
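The graceful-then-forceful shutdown from point 2 can be sketched in Ruby as below (in the workflow itself the shell equivalent would be `pkill -TERM puma; sleep 2; pkill -9 puma`); `pid` is assumed to come from spawning the server:

```ruby
# Send SIGTERM first, give the process a grace period to exit cleanly,
# and only escalate to SIGKILL if it is still running.
def stop_server(pid, grace_seconds: 2)
  Process.kill("TERM", pid)
  (grace_seconds * 10).to_i.times do
    return if Process.waitpid(pid, Process::WNOHANG) # exited and reaped
    sleep 0.1
  end
  Process.kill("KILL", pid)
  Process.wait(pid)
rescue Errno::ESRCH, Errno::ECHILD
  nil # process already gone
end
```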

claude[bot] avatar Dec 12 '25 17:12 claude[bot]

Specific File Feedback

benchmarks/bench.rb:

  • Lines 59-67: Excellent error handling with descriptive messages ✅
  • Lines 79-136: Route discovery is clever! Consider extracting to a method for testability
  • Lines 203-210: Love the detailed attempt logging with timing ✅
  • Lines 228-279: Fortio implementation solid, good use of failure_metrics helper

.github/workflows/benchmark.yml:

  • Lines 71-88: Smart env var defaults using ternary operators ✅
  • Lines 142-193: Excellent tool caching strategy ✅
  • Lines 278-290: Good asset validation before server start ✅
  • Lines 521-537: Nice workflow summary! ✅

benchmarks/k6.ts:

  • Lines 21-35: Good error handling for missing TARGET_URL ✅
  • Lines 38-56: Smart scenario switching based on rate mode ✅
  • Lines 70-76: Status code checks are comprehensive ✅

react_on_rails/spec/dummy/bin/prod & react_on_rails_pro/spec/dummy/bin/prod:

  • Asset validation: Excellent checks for manifest.json and public/assets ✅
  • Staleness warnings: Great UX for developers! ✅
  • Different approaches: Core uses Rails directly, Pro uses foreman/overmind - is this intentional?

claude[bot] avatar Dec 12 '25 17:12 claude[bot]

Recommendations

Before Merging (MUST DO):

  1. Complete the PR checklist - Add CHANGELOG.md entry documenting the new benchmark workflow and update documentation
  2. Verify local testing - Run the workflow manually via workflow_dispatch or document tested workflow run number
  3. Add basic sanity checks - Validate benchmark results have reasonable values and exit with error if all routes fail

Consider (SHOULD DO):

  1. Extract magic numbers to constants/env vars (warmup count, delays)
  2. Add job-level timeout to workflow
  3. Use graceful shutdown before SIGKILL
  4. Improve artifact naming to avoid collisions using github.run_id
  5. Document when to worry threshold for regressions

Future Enhancements (COULD DO):

  1. Regression detection comparing against baseline from master
  2. Multiple samples - Run each route 2-3 times, average results
  3. Performance dashboard uploading results to time-series DB
  4. Alert on regressions via PR comments if performance drops

Security Review

No security concerns - SSH debugging is properly gated and documented


Verdict

Status: CHANGES REQUESTED

This is excellent work that adds valuable infrastructure! The implementation is solid and well-thought-out. However, it needs:

  1. Completed PR checklist (CHANGELOG + docs)
  2. Evidence of testing (workflow run or local validation)
  3. Basic result validation

Once these are addressed, this will be ready to merge. Great job on the comprehensive approach!

Estimated review time: 45 minutes Confidence level: High (reviewed all code, workflows, docs, and configs)

claude[bot] avatar Dec 12 '25 17:12 claude[bot]

Code Review for PR #1868: Add Performance Tests

Thank you for this comprehensive performance testing implementation! This addresses issue #1777 excellently.

✅ Strengths

1. Comprehensive Testing Setup

  • Multi-tool approach (Fortio, Vegeta, k6) validates results
  • Well-documented max-rate benchmarking decision
  • Smart auto-detection of routes from Rails
  • Good handling of optional vs required route parameters

2. Robust Implementation

  • Input validation for all parameters
  • Server readiness checks with timeout
  • 10-request warmup per route
  • Artifact preservation (30-day retention)

3. CI/CD Integration

  • Well-structured workflow with clear steps
  • Conditional execution (labels, workflow_dispatch, master)
  • Good caching strategy
  • Proper process cleanup

4. Developer Experience

  • SSH debugging (properly secured)
  • Configurable via workflow inputs
  • Clear error messages
  • bin/prod and bin/prod-assets scripts

⚠️ Issues

1. Port Cleanup (MEDIUM) - .github/workflows/benchmark.yml:368

  • Uses pkill -9 (SIGKILL) - no graceful shutdown
  • Recommend: Try pkill -15 first, then -9 after delay

2. Hardcoded Port 3001 (LOW)

  • Multiple files hardcode port 3001
  • Recommend: Make configurable via PORT env var

3. Magic Numbers (LOW) - benchmarks/bench.rb:410,412

  • Warmup count (10) and delay (0.5) not constants
  • Recommend: Extract to WARMUP_REQUESTS, WARMUP_DELAY_SEC

4. Documentation Reference (LOW) - CLAUDE.md:215

  • References spec/dummy/TESTING_LOCALLY.md
  • Recommend: Verify exists or create it

📋 Required Before Merge (P0)

Per CLAUDE.md requirements:

  1. ✅ Run bundle exec rubocop and fix all violations
  2. ✅ Ensure all files end with newline characters
  3. 📝 Update CHANGELOG.md with this feature

🎉 Conclusion

Excellent implementation! Address the P0 items (especially RuboCop and CHANGELOG per CLAUDE.md) and it's ready to merge.

Great work! 🚀


Review by: Claude Code Date: 2025-12-12 19:08 UTC

claude[bot] avatar Dec 12 '25 19:12 claude[bot]

Code Review: Performance Testing Infrastructure

This PR adds comprehensive performance testing infrastructure. Overall well-structured.

✅ Strengths

  • Excellent architecture with clean separation of concerns
  • Comprehensive tooling (k6, Fortio, Vegeta)
  • Robust error handling and validation
  • Production-ready scripts with staleness checks

⚠️ Must Fix Before Merge

1. CRITICAL: Missing CHANGELOG Entry

Per CLAUDE.md, user-visible changes must be documented. Add entry to CHANGELOG.md

2. RuboCop Compliance Required

Per CLAUDE.md: Run bundle exec rubocop benchmarks/ and fix ALL violations

3. Documentation Needed

Add benchmarks/README.md explaining how to run locally and interpret results

Should Fix (Nice to Have)

  • Add input validation in k6.ts for numeric parameters
  • Consider if SSH access in workflow is truly necessary

Rating: 8.5/10

Would be 9.5/10 with CHANGELOG and docs. Great infrastructure code!

claude[bot] avatar Dec 14 '25 09:12 claude[bot]

Code Review: Add Performance Tests

Thank you for this comprehensive benchmarking implementation! This is a valuable addition to the project. Below is my detailed review covering code quality, potential issues, security, and best practices.


Overall Assessment

This PR successfully adds performance testing infrastructure with k6 and Vegeta. The implementation is well-structured, follows good practices, and integrates cleanly with the existing CI infrastructure.


Strengths

1. Architecture & Design

  • Modular structure: Shared helpers in benchmark_helpers.rb promote code reuse
  • Flexible configuration: Environment variables allow easy customization without code changes
  • Route auto-detection: Smart filtering of Rails routes eliminates manual maintenance
  • Multi-tool approach: k6 for HTTP/1.1 benchmarks, Vegeta for HTTP/2 Node Renderer tests

2. GitHub Actions Workflow

  • Conditional execution: Smart use of labels (benchmark, full-ci) prevents unnecessary runs
  • Efficient caching: Caches binaries (Vegeta), gems (foreman), and node_modules
  • Comprehensive validation: Checks assets are built before server starts
  • Artifact retention: Stores results for 30 days with proper naming
  • Clean error handling: set -e and explicit error messages throughout

3. Code Quality

  • Input validation: All parameters validated before use (rate, duration, connections)
  • Graceful degradation: Helpful error messages when tools/files missing
  • Documentation: Clear comments explain configuration and trade-offs

Issues & Recommendations

1. CHANGELOG Update Needed (REQUIRED)

Per CLAUDE.md guidelines, this PR adds a user-visible feature and should have a CHANGELOG entry.

Recommendation: Add to CHANGELOG.md:

[Unreleased]

Added
  • PR 1868 by alexeyr-ci2: Added performance benchmarking infrastructure with k6 and Vegeta, including GitHub Actions workflow for automated benchmarks

2. Security: Process Killing is Too Aggressive (HIGH PRIORITY)

Location: .github/workflows/benchmark.yml:310, :476

Current code: pkill -9 -f "ruby|node|foreman|overmind|puma"

This kills ALL Ruby and Node processes on the CI runner. While this works in isolated CI, it's fragile and could cause issues if GitHub changes runner isolation.

Recommendation: Use more targeted cleanup by killing only processes on port 3001 or saving PIDs when starting servers.
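A sketch of the PID-file approach (the server command shown is a stand-in for the real start step):

```shell
#!/bin/sh
# Stand-in for the real server command; in the workflow this would be the
# actual Rails/Node start step.
sleep 300 &
echo $! > server.pid   # remember exactly which process we started

# ...run benchmarks here...

# Targeted cleanup: signal only the recorded PID, not every ruby/node process.
kill -15 "$(cat server.pid)" 2>/dev/null || true
rm -f server.pid
```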

3. Hardcoded Password in Benchmark (MEDIUM)

Location: benchmarks/bench-node-renderer.rb:22-28

Parsing password from config with regex is fragile. Consider using environment variable instead for better maintainability.

4. RuboCop Violations (MEDIUM)

The code disables several RuboCop cops (Metrics/AbcSize, Metrics/CyclomaticComplexity). Consider refactoring complex methods into smaller functions for better maintainability and testability.

5. Missing Unit Tests

No unit tests for benchmark helper functions. Consider adding RSpec tests for:

  • validate_rate, validate_duration, validate_positive_integer
  • route_has_required_params?
  • sanitize_route_name
  • parse_json_file error handling
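A sketch of the kind of check such a spec would cover; the `validate_rate` body here is a plausible stand-in, not the actual helper:

```ruby
# Stand-in for the helper under test; the real implementation lives in
# benchmarks/lib/benchmark_helpers.rb and may differ.
def validate_rate(value)
  rate = Integer(value, exception: false)
  raise ArgumentError, "rate must be a positive integer, got #{value.inspect}" unless rate&.positive?

  rate
end
```

An RSpec file under spec/benchmarks/ would then assert the happy path (`validate_rate("100")` returns `100`) and the error path (`expect { validate_rate("abc") }.to raise_error(ArgumentError)`).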

Security Review

Secrets Handling: Good

  • REACT_ON_RAILS_PRO_LICENSE_V2 properly uses GitHub Secrets
  • SECRET_KEY_BASE is a dummy value (not used for anything sensitive)

Command Injection: Safe

  • All user inputs are validated before use in shell commands
  • No direct interpolation of user input into shell commands

Resource Limits: Could Be Better

  • Benchmarks run with timeout-minutes: 120 - quite generous
  • Consider lowering to 60 minutes to prevent hung jobs

Action Items Summary

Must Fix (Before Merge):

  1. Add CHANGELOG entry for this user-visible feature
  2. Consider more targeted process cleanup (or document why aggressive approach is safe)

Should Fix (Nice to Have):

  1. Refactor complex methods to avoid disabling RuboCop cops
  2. Add unit tests for benchmark helper functions
  3. Extract password to environment variable instead of parsing config

Could Fix (Future Work):

  1. Add dry-run mode for testing
  2. Create docs/guides/running-benchmarks.md
  3. Extract server wait logic to reusable script

Conclusion

This is a solid, well-implemented PR that adds valuable performance testing infrastructure. The code quality is high, the GitHub Actions workflow is comprehensive, and the architecture is flexible.

The main concerns are:

  1. Missing CHANGELOG entry (per project guidelines)
  2. Aggressive process killing (works but fragile)
  3. Some methods could be refactored for better maintainability

With the CHANGELOG entry added, this PR is ready to merge. The other suggestions are nice-to-haves that can be addressed in follow-up PRs.

Great work!


Review generated by Claude Code based on project guidelines in CLAUDE.md

claude[bot] avatar Dec 14 '25 13:12 claude[bot]

Pull Request Review: Performance Testing Infrastructure

This PR adds a comprehensive performance testing framework to React on Rails.

STRENGTHS

  1. Well-Architected Benchmark Infrastructure
  • Modular design: Separate scripts for Rails benchmarks and Node Renderer benchmarks
  • Shared utilities: Clean extraction of common functionality
  • Multiple tools: k6 for Rails, Vegeta with HTTP/2 for Node Renderer
  • Good documentation explaining the max-rate approach
  2. Robust Error Handling
  • Comprehensive validation of all parameters
  • Tool availability checks before running
  • Server health checks with timeout/retry logic
  • Graceful error handling with clear messages
  3. CI/CD Integration
  • Workflow triggers only when needed
  • Efficient caching of dependencies
  • Proper artifact retention (30 days)
  • Clean separation between Core and Pro benchmarks

ISSUES AND CONCERNS

Security:

  1. Dangerous process killing (lines 314, 487)

     pkill -9 -f "ruby|node|foreman|overmind|puma"

    • Broad regex pattern could kill unintended processes
    • Recommendation: Use PID files for safer process management
  2. Command injection vulnerability (bench.rb:58)

    • Backtick execution with interpolated variable without escaping
    • Current Risk: Low (APP_DIR is hardcoded), but dangerous pattern
    • Recommendation: Use Shellwords.escape
  3. Secrets handling (bench-node-renderer.rb:32)

    • Password read from config file
    • Ensure it is test-only and not in artifacts
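For the command-injection point above, a minimal sketch of the `Shellwords.escape` fix (the path is illustrative):

```ruby
require "shellwords"

app_dir = "/path/with spaces/dummy" # illustrative value
# Escaping the interpolated directory makes the backtick command safe even
# if the path ever contains spaces or shell metacharacters.
routes_cmd = "cd #{Shellwords.escape(app_dir)} && bundle exec rails routes"
```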

Code Quality:

  1. RuboCop violations not addressed

    • Multiple rubocop:disable comments for complexity metrics
    • Recommendation: Refactor methods to reduce complexity
  2. Magic numbers (bench.rb:211)

    • Hardcoded values like 10 warmup requests, 0.5s delay
    • Recommendation: Extract to named constants

Testing and Documentation:

  1. Missing CHANGELOG entry

    • This is a significant new feature that needs changelog documentation per CLAUDE.md
  2. No RSpec tests for benchmark helpers

    • Complex validation logic in benchmark_helpers.rb lacks unit tests
  3. Documentation improvements needed

    • Missing local usage guide, result interpretation, and troubleshooting docs

RECOMMENDATIONS SUMMARY

Must Fix Before Merge:

  1. Add CHANGELOG entry
  2. Fix command injection vulnerability
  3. Improve process killing to use PID files

Should Fix:

  4. Refactor RuboCop-disabled methods
  5. Add RSpec tests for benchmark_helpers.rb
  6. Create comprehensive documentation

OVERALL ASSESSMENT

This is a high-quality addition with sound architecture and thorough implementation.

Code Quality: 4/5
Security: Minor issues that should be fixed
Testing: No unit tests for helpers
Documentation: 3/5 - Good inline docs, missing user-facing guide

Recommendation: Approve with minor changes. Security issues are low-risk but should be fixed.

Great work! This benchmark infrastructure will be invaluable for catching performance regressions.

claude[bot] avatar Dec 15 '25 11:12 claude[bot]

@coderabbitai review

alexeyr-ci2 avatar Dec 15 '25 13:12 alexeyr-ci2

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

coderabbitai[bot] avatar Dec 15 '25 13:12 coderabbitai[bot]

PR Review: Add Performance Tests

Overview

This PR adds a comprehensive benchmarking framework to React on Rails, including:

  • Ruby-based benchmark scripts for Rails routes and Node Renderer
  • k6 (HTTP benchmarking tool) integration
  • Vegeta integration (HTTP/2 support for Node Renderer)
  • GitHub Actions workflow for automated CI benchmarking
  • Production server setup scripts for dummy apps

✅ Strengths

  1. Well-structured benchmark architecture:

    • Shared helpers in benchmark_helpers.rb promote code reuse
    • Clear separation between Rails route benchmarks and Node Renderer benchmarks
    • Flexible configuration via environment variables
  2. Robust error handling:

    • Input validation for all parameters (rate, duration, connections)
    • Proper timeout handling in server health checks
    • JSON parsing errors are caught and reported clearly
  3. Good CI/CD integration:

    • Conditional workflow execution based on labels (benchmark, full-ci)
    • Proper artifact retention (30 days)
    • Integration with benchmark-action/github-action-benchmark for tracking trends
  4. Production-ready configuration:

    • Proper Puma worker configuration with preload_app!
    • ActiveRecord connection management in on_worker_boot
    • Asset precompilation setup
  5. CHANGELOG updated: Properly categorized as "Developer (Contributors Only)"

⚠️ Issues & Recommendations

Critical

  1. Security: SSH access in workflow (removed in later commits?)

    • Early commits showed SSH access via mxschmitt/action-tmate
    • Verify this was removed or properly gated in final version
    • If present, ensure it's only for workflow_dispatch with explicit opt-in
  2. Resource cleanup may be insufficient (line 353 in benchmark.yml):

    pkill -9 -f "ruby|node|foreman|overmind|puma" || true
    
    • This is VERY aggressive and could kill unrelated processes
    • In CI it's safe (isolated environment), but worth a comment explaining why
    • Consider more targeted killing using process group IDs
  3. Missing RuboCop validation:

    • Per CLAUDE.md: "ALWAYS run bundle exec rubocop and fix ALL violations"
    • Ruby files in benchmarks/ should pass RuboCop
    • Some files have rubocop:disable comments - verify these are justified

High Priority

  1. Hardcoded passwords (bench-node-renderer.rb:32):

    PASSWORD = read_password_from_config
    
    • Reads password from client/node-renderer.js
    • Document that this is for LOCAL/CI testing only, not production
    • Consider using a well-known test password constant instead
  2. URL encoding vulnerability (bench-node-renderer.rb:133):

    def url_encode(str)
      URI.encode_www_form_component(str)
    end
    
    • This is correct, but the calling code concatenates strings instead of using form encoding
    • Lines 145-148 manually build the body - this works but is fragile
    • Consider using URI.encode_www_form with hash input instead
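    A sketch of that suggestion (field names and values are illustrative, not the real request body):

```ruby
require "uri"

# Instead of manually concatenating url-encoded pieces, build the whole body
# from a hash; encode_www_form handles escaping of every field consistently.
body = URI.encode_www_form(
  "renderingRequest" => "ReactOnRails.serverRenderReactComponent(...)",
  "password" => "test-password" # illustrative value
)
```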
  3. Shell injection risk (bench.rb:58):

    routes_output = `cd #{Shellwords.escape(app_dir)} && bundle exec rails routes 2>&1`
    
    • Good: Uses Shellwords.escape
    • Bad: app_dir comes from a constant based on PRO env var
    • If PRO env var is ever user-controlled, this is exploitable
    • Since it's CI-only, probably fine, but worth documenting
  4. No timeout on system calls:

    • Line 161 in bench.rb: system("#{k6_command} | tee #{k6_txt}")
    • Uses timeout-minutes: 120 in CI, but no timeout in Ruby
    • k6/Vegeta have their own timeouts, so probably OK
    • Consider adding a Ruby Timeout.timeout wrapper for safety
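    A possible wrapper (the timeout ceiling is an assumed value; note that `Timeout.timeout` does not kill the spawned child on timeout, so a `Process.spawn`/`Process.kill` approach would be more robust):

```ruby
require "timeout"

K6_TIMEOUT_SEC = 15 * 60 # assumed ceiling; k6/Vegeta also enforce their own durations

def run_with_timeout(command)
  Timeout.timeout(K6_TIMEOUT_SEC) do
    system(command) # returns true on exit status 0, false otherwise
  end
rescue Timeout::Error
  warn "Benchmark command timed out after #{K6_TIMEOUT_SEC}s: #{command}"
  false
end
```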

Medium Priority

  1. Benchmark thresholds seem loose (benchmark.yml:316, 333):

    alert-threshold: '150%'
    fail-on-alert: true
    
    • 150% regression before alerting seems high for performance tests
    • Consider tightening to 125% or 120% for RPS
    • Latency might need different threshold (e.g., 150% is more reasonable for spikes)
  2. Missing file validation:

    • bench-node-renderer.rb finds bundles in .node-renderer-bundles/
    • Good error message if directory missing, but doesn't check bundle validity
    • What if bundles are corrupted or incomplete?
    • Add basic validation (check for package.json or specific files)
  3. k6.ts uses strict status checks (k6.ts:74-79):

    check(response, {
      status_200: (r) => r.status === 200,
      status_3xx: (r) => r.status >= 300 && r.status < 400,
      status_4xx: (r) => r.status >= 400 && r.status < 500,
      status_5xx: (r) => r.status >= 500,
    });
    
    • Good: Categorizes status codes
    • Issue: Doesn't handle 1xx informational responses (rare but possible)
    • Status 0 (connection failed) won't match any check - handle this case
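    One way to make the buckets exhaustive, sketched as a plain TypeScript helper rather than actual k6 `check` predicates:

```typescript
// Exhaustive status classification, including status 0, which k6 reports
// when the connection itself failed. Sketch only; the real k6.ts uses
// check() with boolean predicates.
type StatusBucket =
  | "connection_failed"
  | "informational"
  | "success"
  | "redirect"
  | "client_error"
  | "server_error";

function classifyStatus(status: number): StatusBucket {
  if (status === 0) return "connection_failed";
  if (status < 200) return "informational";
  if (status < 300) return "success";
  if (status < 400) return "redirect";
  if (status < 500) return "client_error";
  return "server_error";
}
```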
  4. Large file cleanup (bench-node-renderer.rb:210):

    FileUtils.rm_f(vegeta_bin)
    
    • Good practice to delete large binary files
    • Consider doing the same for k6 if it generates large files

Low Priority / Style

  1. Magic numbers in configuration:

    • Line 211 in bench.rb: 10.times for warmup
    • Line 30 in benchmark.yml: default: 10 for connections
    • Consider extracting to named constants at top of files
  2. Inconsistent metric naming:

    • bench.rb outputs "p50(ms)", "p90(ms)", etc.
    • convert_to_benchmark_json.rb creates "- p50 latency", "- p90 latency"
    • Both are fine, but consider consistent capitalization/formatting
  3. Documentation:

    • docs/planning/library-benchmarking.md is mentioned in PR description
    • Verify this document explains WHY these benchmarks matter
    • Should include: how to interpret results, when to run locally, thresholds
  4. Test coverage:

    • No RSpec tests for benchmark scripts themselves
    • Consider adding basic tests for validation functions, parsing logic
    • At minimum, add smoke test to CI that validates scripts don't crash
  5. Prettier/formatting:

    • Verify k6.ts passes ESLint/Prettier
    • Run pnpm run format.listDifferent to check
    • The /* eslint-disable import/no-unresolved */ is fine since k6 is global

🔍 Code Quality Observations

Good Practices:

  • Frozen string literals throughout Ruby files
  • Descriptive variable names
  • Comprehensive comments explaining k6 script configuration
  • Proper use of set -e in bash scripts (workflow steps)

Potential Improvements:

  • Consider extracting common workflow logic to reusable actions
  • The workflow is 589 lines - consider splitting Core/Pro into separate jobs
  • Some repeated code in benchmark.yml (server start logic) - could be a script

🧪 Testing Recommendations

Before merging:

  1. Run RuboCop: bundle exec rubocop benchmarks/
  2. Test locally:
    • Run ruby benchmarks/bench.rb with Core dummy app
    • Run ruby benchmarks/bench-node-renderer.rb with Pro
    • Verify all output files are generated correctly
  3. Test workflow manually: Trigger via workflow_dispatch with various input combinations
  4. Verify cleanup: Ensure server processes are killed properly after workflow

📝 Documentation Checklist

  • [ ] Add README in benchmarks/ directory explaining how to use scripts
  • [ ] Document environment variables in script headers (✅ partially done in k6.ts)
  • [ ] Explain benchmark interpretation in docs/planning/library-benchmarking.md
  • [ ] Add troubleshooting section for common issues (server won't start, tools not found)

Performance Considerations

  • Benchmarks will run on every push to master - this is expensive
  • Consider limiting to:
    • Weekly schedule for full benchmarks
    • Push to master only for smoke test (1 route, 10s duration)
    • Full benchmarks only on labeled PRs or workflow_dispatch
  • Current config seems reasonable, but monitor CI minutes usage

Security Assessment

Overall security posture is good for a CI-only tool:

  • ✅ Proper input validation prevents command injection
  • ✅ Uses Shellwords.escape where appropriate
  • ✅ Secrets properly handled via GitHub Secrets
  • ⚠️ Verify SSH access removed/gated properly
  • ⚠️ Document that passwords are test-only, not for production

Conclusion

This is a solid addition to the project. The benchmarking framework is well-designed, properly integrated into CI, and follows most project conventions.

Recommendation: Approve with minor changes

Priority fixes before merge:

  1. Run RuboCop and fix any violations
  2. Verify SSH access is properly removed/gated
  3. Add a comment explaining the aggressive pkill usage in cleanup
  4. Consider tightening alert thresholds from 150% to 125%

Everything else can be addressed in follow-up PRs if needed.

Great work on this comprehensive benchmarking system! 🎉

claude[bot] avatar Dec 15 '25 13:12 claude[bot]

PR Review: Performance Testing Infrastructure

This is a comprehensive and well-structured PR adding performance benchmarking to the React on Rails project. Overall, the implementation is solid with good attention to detail.

Strengths

Architecture and Design

  • Excellent separation of concerns with benchmark helpers extracted into reusable module
  • Flexible configuration via environment variables with sensible defaults
  • Multi-tool approach: k6 for Rails benchmarks, Vegeta for Node Renderer with HTTP/2 support
  • Smart route detection: Automatically discovers benchmarkable routes from Rails with proper filtering
  • Well-documented strategy in docs/planning/library-benchmarking.md

Code Quality

  • Robust error handling with comprehensive parameter validation
  • Safety features: warm-up phase, server health checks with retries, proper cleanup
  • Structured output in multiple formats (JSON, text, summary)
  • Smart GitHub Actions integration with caching, artifacts, and conditional execution

Production Readiness

  • New bin/prod and bin/prod-assets scripts with validation
  • Properly configured Puma for benchmarking with worker processes and thread control
  • Well-organized CI workflow with clear steps, error handling, and summaries

Issues and Concerns

1. Security: Command Injection Risk (HIGH)

Location: benchmarks/bench.rb:59 and benchmarks/bench-node-renderer.rb:94

The URL construction in bench-node-renderer.rb uses string interpolation without escaping. While url_encode is used for the body, the URL itself should be shell-escaped.

Recommendation: Use system() with array arguments instead of backticks, or ensure all interpolated values are properly escaped with Shellwords.escape

2. Security: Process Killing (MEDIUM)

Location: .github/workflows/benchmark.yml:352-353, 564-565

The workflow uses pkill -9 -f with a broad regex that could kill unrelated processes. While the comment says it is safe in isolated CI, it is a dangerous pattern.

Recommendation: Track PIDs explicitly when starting servers and kill only those specific PIDs.

3. Hardcoded Credentials Reading (LOW)

Location: benchmarks/bench-node-renderer.rb:21-29

Reading passwords from config files with regex is fragile and could break with formatting changes.

Recommendation: Consider exposing the password as an environment variable.

Suggestions for Improvement

4. Missing Tests

There are no tests for the benchmark scripts. While these are developer-only tools, basic tests would catch regressions.

Recommendation: Add spec/benchmarks/ with unit tests for helper functions.

5. Workflow Complexity

The benchmark workflow is 588 lines - very large for a single workflow file.

Recommendation: Consider extracting repeated setup steps into reusable actions or splitting into separate workflows.

6. Magic Numbers

Extract magic numbers like warmup count (10) and delay (0.5) to named constants.

7. Error Handling

The workflow could provide more informative errors with additional context about server status when failures occur.

Testing Recommendations

Before merging, verify:

  1. Manual test the workflow via workflow_dispatch with different parameter combinations
  2. Test failure scenarios (server fails, invalid parameters, no routes, missing tools)
  3. Test on a PR with benchmark label to ensure conditional triggers work
  4. Verify artifact uploads are properly accessible

Overall Assessment

This is high-quality work that adds significant value to the project. The benchmarking infrastructure is well-designed, comprehensive, and production-ready.

Main concerns:

  1. Security issues (command injection, aggressive process killing) - should be addressed before merge
  2. Lack of tests - acceptable for MVP but should be added soon
  3. Workflow complexity - not blocking but consider refactoring for maintainability

Recommendation: Approve with requested changes - fix the security issues, then merge.

Great work on this PR!

claude[bot] avatar Dec 15 '25 14:12 claude[bot]

PR Review: Add Performance Tests

This PR adds comprehensive benchmarking infrastructure to the React on Rails project. Here's my detailed review:


🎯 Overall Assessment

Strong implementation with well-structured code, good documentation, and appropriate tooling choices. The PR successfully addresses #1777 by adding automated performance testing in CI.


✅ Strengths

1. Excellent Architecture

  • Clean separation of concerns with modular helper functions (benchmark_helpers.rb)
  • Reusable benchmarking scripts for both Core and Pro versions
  • Well-designed workflow with clear steps and error handling

2. Comprehensive Tooling

  • k6 for HTTP benchmarking (Core Rails routes) - modern, reliable, good reporting
  • Vegeta for HTTP/2 benchmarking (Pro Node Renderer) - appropriate for h2c protocol
  • Tool caching to speed up CI runs
  • Proper validation of benchmark parameters

3. Good Documentation

  • Clear library-benchmarking.md explaining the "max rate" approach and trade-offs
  • Inline comments explaining non-obvious logic
  • Good workflow parameter documentation
  • CHANGELOG entry (in Developer section - appropriate)

4. Strong Error Handling

  • Proper timeout checks for server startup
  • Validation of all benchmark parameters
  • Graceful handling of missing bundles
  • Clear error messages with actionable guidance

5. CI Integration

  • Smart conditional execution (master, workflow_dispatch, or labeled PRs)
  • Artifact uploads for troubleshooting
  • Integration with github-action-benchmark for tracking
  • Proper cleanup steps in always() blocks

🔍 Code Quality Issues

CRITICAL: Security - Credential Exposure

Location: benchmarks/bench-node-renderer.rb:20-29

def read_password_from_config
  config_path = File.expand_path(
    "../react_on_rails_pro/spec/dummy/client/node-renderer.js",
    __dir__
  )
  config_content = File.read(config_path)
  match = config_content.match(/password:\s*['"]([^'"]+)['"]/)
  match ? match[1] : raise("password not found in #{config_path}")
end

Issue: The password is hardcoded in the config file and read via regex. While this is a test/benchmark password (not production), this pattern could be problematic if:

  • The config file is accidentally committed with a real password
  • Someone copies this pattern for production code

Recommendation:

  • Add a comment explaining this is ONLY for benchmarking and should never be used in production
  • Consider using an environment variable with a fallback to the config file
  • Ensure the password in the config file is clearly marked as a test password
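A hedged sketch of the env-var-with-fallback idea (the variable name `NODE_RENDERER_PASSWORD` is an assumption; `read_password_from_config` is the existing helper):

```ruby
# Benchmark-only credential lookup: prefer an environment variable and fall
# back to the existing config-file parsing. Never reuse this pattern for
# production secrets.
def benchmark_password
  ENV.fetch("NODE_RENDERER_PASSWORD") { read_password_from_config }
end
```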

MEDIUM: Changelog Placement

Location: CHANGELOG.md:24-26

Per CLAUDE.md guidelines:

Update CHANGELOG.md for user-visible changes only (features, bug fixes, breaking changes, deprecations, performance improvements) Do NOT add entries for: linting, formatting, refactoring, tests, or documentation fixes

Issue: This is marked as "Developer (Contributors Only)" which is appropriate, but the CLAUDE.md guidance says "Do NOT add entries for... tests". This benchmarking infrastructure is specifically for contributors/CI, not end users.

Recommendation: This is actually fine as-is since it's in the "Developer (Contributors Only)" section, which is the correct place for CI/tooling improvements. Consider clarifying in CLAUDE.md that "Developer" section entries ARE acceptable for significant CI/testing infrastructure.


⚠️ Potential Issues

1. Process Management in CI

Location: .github/workflows/benchmark.yml:349

pkill -9 -f "ruby|node|foreman|overmind|puma" || true

Issue: Using pkill -9 with broad patterns could kill unintended processes in a shared CI environment.

Recommendation: While this is acceptable in isolated GitHub Actions runners, consider:

  • Using more specific process identifiers
  • Storing PIDs from the startup step
  • Using pkill -P (kill process group) instead

2. Hardcoded Port Numbers

Multiple locations:

  • Port 3001 for Rails server
  • Port 3800 for Node Renderer

Issue: If multiple benchmarks run concurrently (future scaling), ports could conflict.

Recommendation:

  • Current approach is fine for single-job workflow
  • If parallelization is added later, use dynamic port allocation
  • Document the port assumptions clearly (which you've done well)

3. No Baseline Protection

Location: .github/workflows/benchmark.yml:316, 525

fail-on-alert: true
alert-threshold: '150%'

Issue: 150% threshold is very permissive. A 50% regression is significant.

Recommendation:

  • Consider tightening to 125% or 130% once baseline is established
  • Add a comment explaining the rationale for the threshold
  • Monitor a few runs to see actual variance before tightening

4. Route Filtering Logic

Location: benchmarks/bench.rb:33-46

def route_has_required_params?(path)
  path_without_optional = path.gsub(/\([^)]*\)/, "")
  path_without_optional.include?(":")
end

Issue: This regex assumes parentheses are only used for optional params. If a route has nested parentheses or literal parentheses in the path (rare but possible), this could fail.

Recommendation:

  • Add a comment explaining the assumption
  • Consider using Rails route introspection API instead of regex parsing
  • Add a test case for this logic
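The assumption is easy to pin down with a couple of illustrative cases (the helper body is the one quoted from bench.rb; the routes are hypothetical):

```ruby
# Assumes parentheses in a Rails route path denote optional segments,
# e.g. "(.:format)" or "(/:page)".
def route_has_required_params?(path)
  path_without_optional = path.gsub(/\([^)]*\)/, "")
  path_without_optional.include?(":")
end
```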

5. Warm-up Strategy

Location: benchmarks/bench.rb:211-216

10.times do
  server_responding?(target)
  sleep 0.5
end

Issue: Fixed 10 requests might not be enough for JIT compilation, DB connection pooling, etc.

Recommendation:

  • Consider a time-based warm-up (e.g., 5-10 seconds)
  • Or adaptive warm-up (continue until latency stabilizes)
  • Current approach is acceptable for initial implementation
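A time-based variant could look like this (the duration default and the `server_responding?` helper are assumptions):

```ruby
WARMUP_DURATION_SEC = 5 # assumed default from the recommendation

# Time-based warm-up: keep pinging until the window elapses instead of
# issuing a fixed request count.
def warm_up(target, duration: WARMUP_DURATION_SEC)
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + duration
  while Process.clock_gettime(Process::CLOCK_MONOTONIC) < deadline
    server_responding?(target) # hypothetical health-check helper
    sleep 0.5
  end
end
```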

🎨 Style & Best Practices

Minor Issues:

  1. RuboCop Disables

    • benchmarks/bench.rb:140 and benchmarks/bench-node-renderer.rb:151
    • Disabling complexity metrics is acceptable for benchmark scripts
    • Consider extracting some logic to reduce complexity instead
  2. Magic Numbers

    • 30s duration, 60s timeout, 10 connections
    • Well-documented in code and configurable via env vars ✅
    • Good defaults chosen
  3. Error Messages

    • Clear and actionable throughout ✅
    • Good use of emojis for visual scanning in CI logs

🧪 Test Coverage

Missing:

  • Unit tests for helper functions (benchmark_helpers.rb)
  • Tests for route filtering logic
  • Tests for JSON conversion

Recommendation:

  • Add RSpec tests for benchmark_helpers.rb functions
  • Test edge cases (missing files, invalid JSON, network errors)
  • Not blocking for merge, but should be added

🚀 Performance Considerations

  1. Caching Strategy

    • Good caching of tools (Vegeta, foreman)
    • Good caching of node_modules and gems
    • Cache keys look correct
  2. Benchmark Duration

    • 30s per route is reasonable
    • With ~15-20 routes, total benchmark time ~10-15 minutes
    • Consider reducing to 20s if CI time becomes an issue
  3. Parallel Execution

    • Currently sequential (one app at a time)
    • Could parallelize Core and Pro benchmarks in separate jobs
    • Current approach is fine for initial implementation

📝 Documentation

Strengths:

  • Clear library-benchmarking.md with trade-off analysis ✅
  • Good inline comments ✅
  • Clear parameter documentation in workflow ✅

Missing:

  • How to run benchmarks locally (beyond env vars)
  • How to interpret results and identify regressions
  • What to do when benchmarks fail

Recommendation: Add a benchmarks/README.md with:

  • Local execution instructions
  • How to debug failures
  • How to add new benchmark routes
  • How to interpret github-action-benchmark results

🔒 Security Considerations

Issues Identified:

  1. Password in source (covered above under CRITICAL)

  2. SECRET_KEY_BASE in workflow

    • 'dummy-secret-key-for-ci-testing-not-used-in-production'
    • This is fine for benchmarking, clearly marked as dummy ✅
  3. Pro License Secret

    • Using GitHub Secrets properly ✅
    • Secret name: REACT_ON_RAILS_PRO_LICENSE_V2

✨ Recommendations Summary

Before Merge (High Priority):

  1. Add security comment to password extraction function
  2. Verify CHANGELOG.md placement is correct (it is, in Developer section)
  3. ⚠️ Consider tightening alert threshold from 150% to 125-130% after baseline

Follow-up (Medium Priority):

  1. Add benchmarks/README.md with usage instructions
  2. Add unit tests for benchmark_helpers.rb
  3. Consider using PID tracking instead of pkill -9 with broad patterns
  4. Monitor initial benchmark runs and adjust thresholds/duration as needed

Future Enhancements (Low Priority):

  1. Parallelize Core and Pro benchmarks in separate jobs
  2. Add multiple samples per route to reduce variance
  3. Consider adaptive warm-up strategy

🎉 Conclusion

This is excellent work with thoughtful design decisions and good engineering practices. The code is clean, well-documented, and properly integrated with CI.

Recommendation: Approve with minor comments addressed.

The identified issues are mostly minor, with the password handling being the only security concern (and it's mitigated by being test-only). The PR is ready to merge once the security comment is added.

Great job on this contribution! 🚀


📊 Stats

  • Lines added: 1,742
  • Lines removed: 39
  • Files changed: 30
  • New workflows: 1 (benchmark.yml)
  • New benchmarking scripts: 4
  • Documentation files: 1

Review performed following CLAUDE.md guidelines and React on Rails best practices.

claude[bot] avatar Dec 15 '25 18:12 claude[bot]

@coderabbitai Review.

alexeyr-ci2 avatar Dec 15 '25 22:12 alexeyr-ci2