Add performance tests
Summary
Add performance tests. Fixes #1777.
Pull Request checklist
- ~[ ] Add/update test to cover these changes~
- [ ] Update documentation
- [ ] Update CHANGELOG file
Walkthrough
Adds a CI benchmarking system: new GitHub Actions workflow, k6/Vegeta benchmark drivers (TypeScript + Ruby), shared helpers and converter, dummy-app production/start scripts and configs, tooling/ignore updates, and docs/contributing updates to run and publish Core and Pro performance tests in CI and locally.
Changes
| Cohort / File(s) | Change Summary |
|---|---|
| CI Benchmark Workflow<br>`.github/workflows/benchmark.yml` | New GitHub Actions workflow to run Core/Pro benchmarks (Rails + Node renderer), install/cache tools, prepare apps, start/stop servers, run k6/Vegeta, validate/convert results, upload artifacts, optionally push data on master, and emit a workflow summary with guarded failures. |
| Benchmark Orchestrators<br>`benchmarks/bench.rb`, `benchmarks/bench-node-renderer.rb` | New Ruby drivers: bench.rb runs k6 per Rails route (discovery, warm‑up, per-route JSON parsing); bench-node-renderer.rb runs Vegeta (HTTP/2) per production bundle, detects RSC bundles, and emits textual JSON summaries. |
| Benchmark Support & Conversion<br>`benchmarks/lib/benchmark_helpers.rb`, `benchmarks/convert_to_benchmark_json.rb`, `benchmarks/k6.ts` | New shared Ruby helpers for env/validation/server readiness/summary; converter to produce unified benchmark JSON; TypeScript k6 script with dynamic scenarios and runtime checks. |
| Dummy App Prod Scripts & Procfiles<br>`react_on_rails/spec/dummy/bin/prod`, `react_on_rails/spec/dummy/bin/prod-assets`, `react_on_rails_pro/spec/dummy/bin/prod`, `react_on_rails_pro/spec/dummy/bin/prod-assets`, `react_on_rails_pro/spec/dummy/Procfile.prod` | Added production start and asset-build scripts with asset freshness checks, environment exports, CI bootsnap precompile hook, and process-supervisor selection (overmind/foreman) with fallbacks. |
| Puma, Shakapacker & Env Configs<br>`react_on_rails/spec/dummy/config/puma.rb`, `react_on_rails/spec/dummy/config/shakapacker.yml`, `react_on_rails_pro/spec/dummy/config/puma.rb`, `react_on_rails_pro/spec/dummy/config/shakapacker.yml`, `react_on_rails_pro/spec/dummy/config/environments/production.rb`, `react_on_rails_pro/spec/dummy/config/database.yml` | Puma: ENV-driven min/max threads, production workers, worker_shutdown_timeout; Shakapacker: cache path, extract_css, and ensure_consistent_versioning; production env tweaks; DB pool uses RAILS_MAX_THREADS. |
| Asset Task Update<br>`react_on_rails_pro/spec/dummy/lib/tasks/assets.rake` | Replaced yarn invocations with pnpm in asset build tasks. |
| Project Tooling & Ignore Files<br>`.gitignore`, `.prettierignore`, `react_on_rails_pro/.prettierignore`, `knip.ts`, `package.json`, `CLAUDE.md` | Ignore bench_results; add recursive `**/vendor` prettier ignore and `bin/.*` to knip ignores; include `benchmarks/k6.ts` in knip entries; add `@types/k6` devDependency; add pre‑commit YAML lint guidance. |
| Docs & Contribution<br>`docs/planning/library-benchmarking.md`, `CONTRIBUTING.md`, `CHANGELOG.md`, `CLAUDE.md` | New benchmarking planning doc, CONTRIBUTING updated with a benchmarking section, CHANGELOG entry for CI benchmarking, and CLAUDE.md YAML lint guidance. |
Sequence Diagram(s)
```mermaid
sequenceDiagram
    participant GH as GitHub Actions
    participant Runner as CI Runner
    participant Tools as Tools (k6, Vegeta, Node, Ruby, pnpm)
    participant App as Dummy App (Rails + Node Renderer)
    participant Store as Artifacts/Registry
    GH->>Runner: trigger (workflow_dispatch / push / labeled PR)
    Runner->>Tools: install & restore caches
    Runner->>App: prepare apps (build assets / generate packs)
    Runner->>App: start prod servers (rails, node-renderer)
    Runner->>App: wait_for_server / health checks
    Runner->>Tools: run Core benchmarks (k6 per route)
    Runner->>Tools: run Pro benchmarks (Vegeta per bundle)
    Tools->>App: send HTTP requests → collect metrics
    Runner->>Runner: parse & convert summaries to benchmark JSON
    Runner->>Store: upload artifacts & publish results
    Runner->>GH: post workflow summary / status
```
Estimated code review effort
🎯 4 (Complex) | ⏱️ ~50 minutes
- Focus review on:
  - `benchmarks/bench.rb` — route discovery, k6 invocation, JSON parsing.
  - `benchmarks/bench-node-renderer.rb` — bundle discovery, RSC detection, Vegeta HTTP/2 usage and aggregation.
  - `.github/workflows/benchmark.yml` — conditional gates, caching, secrets usage, artifact publishing.
  - `benchmarks/lib/benchmark_helpers.rb` & `benchmarks/convert_to_benchmark_json.rb` — validation, failure metrics, and JSON schema correctness.
  - dummy app `bin/prod` / `Procfile.prod` scripts — asset freshness checks and supervisor fallbacks.
Possibly related PRs
- shakacode/react_on_rails#2153 — updates/unifies RuboCop/config for new performance/benchmark files (actionlint/yamllint touches).
- shakacode/react_on_rails#2121 — pnpm/Yarn migration and CI tooling changes that overlap with asset/build/CI steps.
Suggested reviewers
- AbanoubGhadban
- alexeyr-ci
- justin808
Poem
🐰
I hopped through scripts at break of day,
Laid k6 tracks where requests may play.
Rails and Node hum, numbers in sight,
Bundles sprint by under moonlight.
A rabbit cheers — benchmarks take flight!
Pre-merge checks and finishing touches
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title 'Add performance tests' accurately summarizes the main objective of the pull request, which is to introduce performance testing capabilities via GitHub Actions workflows and benchmarking scripts. |
| Linked Issues check | ✅ Passed | The PR implements the requirement from issue #1777 to add performance tests using k6 for the dummy app, delivering benchmarking infrastructure via GitHub Actions workflow, benchmark scripts for Rails and Node Renderer, and helper utilities. |
| Out of Scope Changes check | ✅ Passed | All changes are scoped to performance testing infrastructure. Configuration updates (puma.rb, shakapacker.yml, database.yml, environments/production.rb) support the benchmark environment, and documentation/tooling changes prepare the project for CI benchmarking. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 90.63% which is sufficient. The required threshold is 80.00%. |
✨ Finishing touches
- [ ] 📝 Generate docstrings
🧪 Generate unit tests (beta)
- [ ] Create PR with unit tests
- [ ] Post copyable unit tests in a comment
- [ ] Commit unit tests in branch `alexeyr/performance-tests`
Code Review - PR #1868: Add Performance Tests
Thanks for this contribution! This is a solid implementation of performance testing infrastructure.
✅ Strengths
- Comprehensive Tool Coverage: Using three industry-standard tools (Fortio, Vegeta, k6) provides good cross-validation
- Well-Structured Workflow: Clear separation of concerns with logical step grouping
- Good Error Handling: Proper use of set -euo pipefail and validation steps
- Caching Strategy: Smart use of GitHub Actions cache for benchmark tool binaries
- Security Conscious: SSH access properly gated with warnings and actor-only access
- Flexible Configuration: Parameterized inputs allow customization
🐛 Critical Issues
- Node.js Version Mismatch (benchmark.yml:155)
  - Uses Node 22, but PR #1925 documents Node 22 V8 cache crashes
  - Recommend changing to Node 20 until #1925 is resolved
- Missing Trailing Newlines (CRITICAL per CLAUDE.md)
  - Files spec/dummy/bin/prod and spec/dummy/bin/prod-assets are missing final newlines
  - This will cause CI failures; run `rake autofix` to fix
- Race Condition in Server Startup (benchmark.yml:220)
  - Background server might not be fully initialized when the curl check succeeds
  - Recommend adding a health check endpoint or verifying multiple successful responses
- Incomplete PR Checklist
  - Documentation and CHANGELOG boxes unchecked
  - Per CLAUDE.md: should update CHANGELOG for user-visible features
⚠️ Performance & Quality Issues
- bench.sh:66 - Unrealistic iteration count assumes 10 iterations/sec per VU
- Fortio Optimization Missing - Referenced guide not implemented
- No Resource Monitoring - Missing CPU/memory metrics
- Missing Test Validation - No thresholds to detect performance regressions
- Single Route Testing - Only tests server_side_hello_world_hooks
- Complex Validation Logic - Lines 17-27 hard to read
- Inconsistent Quoting - Variables not consistently quoted
- Hard-coded Timeouts - 30s and 60s timeouts not configurable
📚 Documentation Issues
- Missing Documentation - No README for running benchmarks locally
- Workflow Triggers - Runs on ALL PRs (expensive, should use labels/manual only)
- No Historical Tracking - Results not compared over time
🎯 Before Merging
- Fix Node.js version (22 → 20)
- Add trailing newlines
- Run bundle exec rubocop
- Update CHANGELOG.md
- Add performance thresholds
- Update workflow triggers
📊 Summary
Assessment: Good foundation but needs critical fixes before merge
Effort: Critical issues ~30 mins, High priority ~1-2 hours
Main concerns: Node.js version mismatch and missing trailing newlines will cause CI failures.
Let me know if you need help with any of these!
PR Review: Add Performance Tests (1868)
Summary
This PR adds comprehensive performance testing infrastructure using three industry-standard benchmarking tools (Fortio, Vegeta, and k6). Well-structured implementation addressing issue 1777.
Critical Issues
1. Missing Trailing Newlines
CRITICAL per CLAUDE.md: Files MUST end with newline character or CI will fail:
- spec/dummy/bin/prod (line 4)
- spec/dummy/bin/prod-assets (line 9)
- spec/performance/bench.sh (line 202)
2. RuboCop Required
CRITICAL per CLAUDE.md: Must run bundle exec rubocop and fix ALL violations before commit.
Potential Bugs
K6 JSON Parsing (bench.sh:187-192)
The jq expression uses add which fails on empty arrays. Use add // 0 for safety.
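A minimal sketch of the safer expression (file and JSON path names here are illustrative, not taken from bench.sh):

```bash
# `add` on an empty array yields null, which breaks later arithmetic;
# `add // 0` substitutes 0 in that case.
K6_REQS_OTHER=$(jq '[.root_group.checks[].passes] | add // 0' k6-summary.json)
```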
Server Startup Race (benchmark.yml:254)
Uses exit 0 inside loop which exits the entire step. Should use break instead.
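A sketch of the corrected loop (URL and retry count are illustrative):

```bash
# `break` leaves only the loop; `exit 0` here would end the whole step
# before the benchmark commands that follow it.
for _ in $(seq 1 30); do
  if curl -fsS http://localhost:3001/ > /dev/null; then
    break
  fi
  sleep 1
done
```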
npm vs yarn (benchmark.yml:200)
Uses npm install --global yalc but CLAUDE.md requires yarn-only commands.
Code Quality Strengths
- Excellent tool selection - three benchmarking tools for cross-validation
- Well-structured workflow with clear sections
- Good error handling with set -euo pipefail
- Comprehensive parameterization via workflow inputs
- Smart binary caching for faster CI
- Security-conscious SSH access gating
Suggestions
- Shell portability: Replace bc with awk for float comparisons (see the sketch after this list)
- Hardcoded route: Make server_side_hello_world_hooks configurable
- Duration: 30s may be too short, consider 60s for stable measurements
- Add documentation for local benchmark execution
- Consider performance regression detection vs master
- Add spec/performance/README.md with interpretation guide
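For the `bc` replacement, a minimal sketch (variable names and the threshold are illustrative):

```bash
# awk exits 0 when the comparison holds, so it works directly in `if`;
# no external bc binary is needed for float comparison.
P99_MS="123.4"  # illustrative value
if awk -v a="$P99_MS" -v b=500 'BEGIN { exit !(a > b) }'; then
  echo "p99 exceeded threshold"
fi
```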
Test Coverage
Good: Infrastructure in place, runs on PR/push, results preserved as artifacts
Missing: No bash script tests, no performance thresholds, no local testing docs
Verdict
Solid PR with high code quality. Performance testing infrastructure will be valuable.
Required Before Merge:
- Fix trailing newlines (will cause CI failure)
- Run bundle exec rubocop and fix violations
- Fix K6 empty array bug
Recommended:
- Add basic documentation
- Verify workflow passes on CI
- Consider CHANGELOG entry
Great work overall!
PR Review: Add Performance Tests
Thank you for this comprehensive performance testing implementation! This is a well-structured addition to the repository. Here's my detailed feedback:
✅ Strengths
- Comprehensive Tool Coverage: Great choice using three different load testing tools (Fortio, Vegeta, k6) to get diverse perspectives on performance
- Excellent Error Handling: The bash script has robust parameter validation and error handling throughout
- Smart Caching: GitHub Actions workflow efficiently caches binaries to speed up future runs
- Good Documentation: Clear comments, parameter descriptions, and helpful echo statements
- Configurable Parameters: Extensive workflow_dispatch inputs allow flexible testing scenarios
- Production-Ready Setup: Proper Puma configuration with workers and preloading for production benchmarks
🐛 Potential Issues
1. Script Portability - bc Dependency (bench.sh:21,33)
The script uses bc for floating-point comparisons but bc is not installed in the workflow.
Fix: Add bc to the apt package installation step in the workflow, or use bash-native arithmetic for integer comparisons.
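If only integers are compared, a bash-native sketch (names and threshold are illustrative):

```bash
# Arithmetic evaluation needs no external binary for integer comparisons.
P99_MS=480  # illustrative integer value
if (( P99_MS > 500 )); then
  echo "p99 exceeded threshold"
fi
```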
2. Race Condition in Server Startup (benchmark.yml:248)
The server process is backgrounded but there is no guarantee it has started before the polling loop begins. If the server fails to start immediately, the loop might miss error messages.
Suggestion: Add a brief sleep 2 before the polling loop, or capture the PID and check if the process is still running during the timeout loop.
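One possible shape for that check (port, timeout, and script path are assumptions, not taken from the workflow):

```bash
bin/prod &           # start the server in the background
SERVER_PID=$!
sleep 2              # let an immediate crash surface first
for _ in $(seq 1 30); do
  # kill -0 only probes the process; it fails once the PID is gone
  kill -0 "$SERVER_PID" 2>/dev/null || { echo "server exited early"; exit 1; }
  curl -fsS http://localhost:3001/ > /dev/null && break
  sleep 1
done
```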
3. k6 Status Reporting Edge Case (bench.sh:199-204)
If .root_group.checks is empty, add returns null, causing K6_REQS_OTHER calculation to fail.
Fix: Add null handling by changing the jq expression to use | add // 0
4. Missing Newline at End of Files
Per CLAUDE.md requirements: ALWAYS ensure files end with a newline character
Files needing newlines:
- spec/dummy/bin/prod (line 4)
- spec/dummy/bin/prod-assets (line 9)
- spec/performance/bench.sh (line 214)
Fix: Run rake autofix or add trailing newlines manually.
5. Benchmark Workflow Runs on Every PR (benchmark.yml:57-58)
Running 30-second benchmarks with 3 tools on every PR will consume significant CI minutes and slow down PR feedback.
Suggestion:
- Remove pull_request trigger and only keep workflow_dispatch + push to master
- Or add a condition to only run on PRs with a specific label (e.g., performance)
- Or significantly reduce default duration (e.g., 10s) for PR runs
🔍 Code Quality & Best Practices
Security
- ✅ SSH access is properly gated behind workflow_dispatch input with clear warnings
- ✅ limit-access-to-actor: true prevents unauthorized access
- ⚠️ Consider adding a comment warning maintainers never to merge code while SSH session is active
Performance
- ✅ Binary caching will significantly speed up repeated runs
- ✅ Server warm-up phase (10 requests) is good practice
- ⚠️ REQUEST_TIMEOUT=60s is very generous - consider 30s default to catch timeout issues faster
Maintainability
- ✅ Clear section comments in YAML
- ✅ Parameterized configuration
- ⚠️ Consider extracting the jq parsing logic (lines 167-209) into a separate function for readability
📊 Test Coverage
Missing:
- ❌ No automated tests for the bench.sh script itself
- ❌ No validation that the summary.txt format is correct
- ❌ No performance regression detection (just data collection)
Suggestions:
- Add a unit test for bench.sh parameter validation
- Consider storing benchmark results over time and comparing against baselines
- Add a step to post benchmark results as a PR comment for visibility
🔧 Configuration Issues
Puma Configuration (spec/dummy/config/puma.rb:38)
Workflow sets default to 4 workers (WEB_CONCURRENCY: 4), but Puma defaults to 2 if not set. The workflow default should match.
Consistency check needed: Verify that WEB_CONCURRENCY=4 is intentional for CI runners with 2 CPU cores.
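One way to keep the two in sync is to have puma.rb read the same ENV default the workflow sets; a puma.rb fragment sketching this (values assumed from the review, not verified against the file):

```ruby
# Both the workflow and Puma read WEB_CONCURRENCY, so one default governs both.
workers Integer(ENV.fetch("WEB_CONCURRENCY", 2))
threads_count = Integer(ENV.fetch("RAILS_MAX_THREADS", 3))
threads threads_count, threads_count
```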
📝 Documentation & Changelog
Per CLAUDE.md: Update CHANGELOG.md for user-visible changes only (features, bug fixes, breaking changes, deprecations, performance improvements)
This PR qualifies as it adds a performance testing feature. Needs:
- ✅ Changelog entry format: PR 1868 by alexeyr-ci2: Add performance benchmarking workflow with Fortio, Vegeta, and k6.
Missing:
- ❌ No README/docs update explaining how to run benchmarks locally
- ❌ No documentation on interpreting benchmark results
🎯 Recommendations
Before Merge (Critical):
- ✅ Add trailing newlines to all new files
- ✅ Run bundle exec rubocop and fix any violations
- ✅ Add CHANGELOG entry
- ✅ Fix k6 null handling edge case
- ✅ Reconsider pull_request trigger or reduce duration
Nice to Have:
6. Add bc to dependencies or remove floating-point support
7. Add a brief README section on running benchmarks
8. Handle the server startup race condition
9. Consider performance regression detection
🎓 Overall Assessment
This is a solid, production-ready implementation of performance testing. The code quality is high, with excellent error handling and configurability. The main concerns are:
- CI cost - running on every PR may be expensive
- Missing documentation - users won't know how to use this
- File formatting - needs trailing newlines per repo standards
Recommendation: Request changes for the critical items (newlines, changelog, PR trigger), then approve.
Score: 8/10 - Great implementation, minor issues to address.
Pull Request Review: Add Performance Tests
Thank you for this comprehensive performance testing implementation! This is a valuable addition to the project. Below is my detailed review:
✅ Strengths
- Multi-tool approach: Using Fortio, Vegeta, and K6 provides cross-validation of results and flexibility for different testing scenarios.
- Well-structured workflow: The GitHub Actions workflow is logically organized with clear steps and helpful comments.
- Configurable parameters: Extensive configurability via workflow inputs allows fine-tuning for different performance scenarios.
- Good error handling: The `bench.sh` script includes comprehensive input validation and error messages.
- Caching: Binary caching for tools reduces workflow execution time.
- Production mode testing: Tests against production-like configuration (Puma workers, precompiled assets) for realistic results.
🔍 Issues and Concerns
Critical Issues
- Missing file newlines (`spec/dummy/bin/prod`, `spec/dummy/bin/prod-assets`)
  - Location: spec/dummy/bin/prod:4, spec/dummy/bin/prod-assets:9
  - Impact: CI will fail per CLAUDE.md requirements
  - Fix: Ensure both files end with a newline character
  - Reference: CLAUDE.md states "ALWAYS ensure files end with a newline character"
- RuboCop not run
  - Impact: MANDATORY requirement before commits
  - Required action: Run `bundle exec rubocop` and fix ALL violations
  - Reference: CLAUDE.md: "BEFORE EVERY COMMIT/PUSH: ALWAYS run `bundle exec rubocop` and fix ALL violations"
High Priority Issues
- Workflow runs on every PR/push to master (benchmark.yml:55-58)
  - Impact: Performance tests are resource-intensive and will slow down CI significantly
  - Recommendation: Only run on manual workflow_dispatch, specific labels (e.g., `run-benchmarks`), or scheduled runs (e.g., nightly)
- Server process not properly cleaned up (benchmark.yml:248)
  - Issue: Server started in background with `bin/prod &` but never explicitly killed
  - Impact: May leave orphaned processes, especially if the workflow fails
  - Fix: Add a cleanup step or use a trap/signal handler (see the sketch below)
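A minimal trap sketch for that cleanup (the PID capture is an assumption; script name as used above):

```bash
bin/prod &
SERVER_PID=$!
# The EXIT trap runs whether the step succeeds or fails,
# so the background server cannot outlive the job step.
trap 'kill -TERM "$SERVER_PID" 2>/dev/null || true' EXIT
```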
Medium Priority Issues
- Hardcoded route in bench.sh (line 6)
  - Issue: `ROUTE:-server_side_hello_world_hooks` may not exist or may not be representative
  - Recommendation: Verify this route exists in the dummy app, consider testing multiple routes, and document what this route tests
- No baseline comparison
  - Issue: Results are collected but not compared against previous runs
  - Enhancement: Consider using benchmark-action to track performance over time
- Missing documentation
  - Issue: No README or documentation for how to run benchmarks locally, interpret results, or understand what "good" performance looks like
  - Recommendation: Add `spec/performance/README.md`
🔒 Security Considerations
- SSH access via tmate (benchmark.yml:108-113)
  - Status: Properly gated behind workflow input (default: false)
  - Good: Includes security warnings and `limit-access-to-actor`
🧪 Test Coverage
Missing:
- No unit tests for the bash script logic
- No tests verifying the benchmark workflow succeeds
- No validation that summary output format is correct
Recommendations:
- Add a dry-run mode to `bench.sh` for testing
- Consider shellcheck for bash script validation
📝 Suggested Changes
Must Fix (Blocks Merge)
- [ ] Add trailing newlines to `spec/dummy/bin/prod` and `spec/dummy/bin/prod-assets`
- [ ] Run `bundle exec rubocop` and fix all violations
- [ ] Change workflow triggers to avoid running on every PR/push
Should Fix (High Priority)
- [ ] Add server cleanup/shutdown in workflow
- [ ] Verify the benchmark route exists and is representative
- [ ] Add basic documentation (`spec/performance/README.md`)
Nice to Have
- [ ] Add baseline comparison/tracking
- [ ] Add shellcheck to CI for bash scripts
- [ ] Consider shorter default duration for PR checks
- [ ] Extract K6 test to separate file
✅ Changelog
Per CLAUDE.md, this should be added to CHANGELOG.md as it's a new feature.
🎯 Overall Assessment
This is a solid foundation for performance testing with good multi-tool coverage and configurability. The main blockers are CLAUDE.md compliance issues (trailing newlines, RuboCop). Once those are fixed and the workflow triggers are adjusted to avoid running on every PR, this will be ready to merge.
Estimated effort to address critical issues: ~30 minutes
Great work overall! 🚀
Code Review: Add Performance Tests
Thank you for this comprehensive performance testing addition! This is a well-structured PR that addresses issue #1777.
✅ Strengths
1. Excellent Tool Coverage
- Multiple benchmarking tools (Fortio, Vegeta, k6) provide good cross-validation
- Smart caching strategy for tool binaries reduces CI time
- Flexible configuration via environment variables
2. Robust Error Handling
- Good use of set -euo pipefail in bash scripts
- Comprehensive input validation with clear error messages
- Server health checks with timeout protection
3. Production-Ready Configuration
- Proper Puma clustering setup for production benchmarks
- Preloading and worker configuration aligned with best practices
- Appropriate separation of prod/dev configurations
🔒 Security Concerns - CRITICAL
SSH Access Feature (lines 96-119 in .github/workflows/benchmark.yml)
This poses significant security risks:
Issues:
- Detached mode risk: Using detached: true leaves the SSH session open indefinitely
- Repository exposure: The entire repository is accessible via SSH
- No audit trail: Limited visibility into commands executed during SSH sessions
- Secrets exposure risk: If secrets are added later, they could be exposed
Recommendations:
- Consider removing this feature entirely for production workflows
- If kept: Remove detached: true, add timeout-minutes: 15, and restrict to non-master branches
- Document that NO secrets should ever be added to this workflow
🐛 Potential Bugs
1. Division by Zero Risk (spec/performance/bench.sh:195-197)
If K6_REQS_KNOWN_STATUS is null/empty, this could fail. Add validation.
2. Race Condition in Server Startup (spec/dummy/bin/prod)
The rails s command starts but there is no guarantee assets are fully loaded. Consider adding a readiness check that validates asset serving.
3. Missing Error Context
When benchmarks fail, the error message does not indicate which tool failed. Consider wrapping each tool execution with proper error context.
4. Hardcoded Paths (spec/performance/bench.sh:87)
If run from wrong directory, this could create directories in unexpected locations. Consider using SCRIPT_DIR to make paths relative to the script location.
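The usual idiom looks like this (the results directory name is illustrative):

```bash
# Resolve the directory containing this script, regardless of the
# working directory it was invoked from.
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
mkdir -p "$SCRIPT_DIR/bench_results"
```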
⚡ Performance Considerations
- Workflow runs on every push/PR: For a 20-minute benchmark suite, consider only running on workflow_dispatch and push to master, or add path filters to skip documentation changes.
- Sequential warmup (spec/performance/bench.sh:73-78): Consider parallel warmup for efficiency.
- Triple tool execution: Running all 3 tools provides validation but triples execution time. Consider defaulting to k6 only.
🧪 Test Coverage - Missing
- Unit tests for benchmark script: Input validation and JSON parsing logic should be tested
- Workflow validation: Consider adding actionlint to CI
- Integration test: Should verify benchmark artifacts are generated correctly
📝 Documentation - REQUIRED
Missing per CLAUDE.md requirements:
- CHANGELOG.md update: This is a user-visible feature that developers will use. Action needed: update CHANGELOG.md
- Inline documentation: spec/performance/bench.sh needs a header comment explaining:
  - Purpose
  - Usage examples
  - Required dependencies
  - Expected output format
- Workflow documentation: Add a comment block in .github/workflows/benchmark.yml explaining when to run benchmarks and how to interpret results
🎨 Code Quality Suggestions
Good Practices:
- Consistent error handling
- Input validation with clear messages
- Proper use of environment variables with defaults
- JSON output for machine-readable results
Suggestions:
- YAML Anchors: Lines 7-14 define anchors but only use them once - consider simplifying
- DRY Principle: Lines 165-208 have very similar blocks for each tool - consider refactoring
- Magic Numbers: Several hardcoded values should be configurable (TIMEOUT_SEC=60, warmup count, sleep durations)
- Puma Configuration: Default of 2 workers may be too low - document why this was chosen
📋 Action Items Summary
🔴 High Priority (Before Merge):
- Address SSH security concerns - Remove or significantly restrict
- Add CHANGELOG.md entry (required per project guidelines)
- Fix potential null reference in K6 status parsing
- Add workflow path filters to prevent unnecessary runs
🟡 Medium Priority (Before Merge):
- Add header documentation to bench.sh
- Add error context to benchmark failures
- Fix hardcoded paths in bash script
🟢 Low Priority (Future Enhancement):
- Add unit tests for benchmark script
- Refactor duplicate code in parsing logic
- Make magic numbers configurable
✨ Overall Assessment
This is a solid foundation for performance testing! The multi-tool approach and comprehensive workflow setup show good engineering practices.
Main concerns:
- Security: SSH feature needs significant changes or removal
- Documentation: Missing CHANGELOG and inline docs (required per CLAUDE.md)
- Robustness: A few edge cases in error handling
Recommendation: Request changes for security and documentation issues, then approve after those are addressed.
Great work on this comprehensive testing infrastructure!
Pull Request Review: Add Performance Tests (Part 1/3)
Overview
This PR adds comprehensive benchmarking infrastructure to test React on Rails performance using three industry-standard tools (Fortio, Vegeta, k6). The implementation is well-structured with good error handling and configurability.
✅ Strengths
1. Excellent Tool Selection & Comparison
- Using three different benchmarking tools (Fortio, Vegeta, k6) provides cross-validation of results
- Each tool has different strengths, giving a more complete performance picture
- Smart conditional execution based on TOOLS environment variable
2. Comprehensive GitHub Actions Workflow
- Well-organized with clear step sections and comments
- Smart caching strategy for binaries and dependencies
- Proper error handling with set -e and validation steps
- 2-hour timeout is reasonable for performance testing
- Good use of artifacts for result preservation (30-day retention)
3. Robust Benchmark Script (spec/performance/bench.rb)
- Parameter validation: Validates rate, duration, connections, timeout formats
- Server availability check: Waits up to 60s with detailed logging
- Warm-up phase: 10 requests per route before benchmarking (good practice)
- Auto-route discovery: Extracts routes from Rails app automatically
- Error handling: Each tool run wrapped in rescue blocks, failures don't stop entire suite
- Consistent output: Tab-separated summary file for easy parsing
4. Production-Ready Infrastructure
- bin/prod and bin/prod-assets scripts properly separate asset compilation from server startup
- Asset freshness checks with helpful warnings
- Supports both Core and Pro versions
- Procfile.prod for Pro version includes node-renderer process
Pull Request Review: Add Performance Tests (Part 2/3)
⚠️ Issues & Recommendations
1. Documentation: Missing CHANGELOG Entry (REQUIRED per CLAUDE.md)
Per CLAUDE.md guidelines, user-visible changes should be documented in CHANGELOG.md.
Recommendation: Add entry like:
#### [PR 1868](https://github.com/shakacode/react_on_rails/pull/1868) by [alexeyr-ci2](https://github.com/alexeyr-ci2)
Added performance benchmarking infrastructure using Fortio, Vegeta, and k6 tools. New GitHub Actions workflow allows automated and manual performance testing of both Core and Pro versions.
2. Security: SSH Access Feature (Medium Priority)
Location: .github/workflows/benchmark.yml:110-128
The SSH access feature using action-tmate is documented but could be clearer about risks.
Recommendations:
- ✅ Good: Already requires explicit opt-in via debug_enabled input
- ✅ Good: Limited to workflow initiator with limit-access-to-actor: true
- ⚠️ Consider: Add a time limit to SSH sessions (currently detached: true means indefinite)
- Suggest: Document in PR description when/why developers would use this
3. Resource Management: Server Process Cleanup (Low Priority)
Location: Workflow steps Stop Core/Pro production server
Issue: For Pro version, Procfile.prod also starts node-renderer on port 3800, which may not be cleaned up. If the Rails process spawns child processes, they may not be cleaned up either.
Consider also cleaning up port 3800 for Pro, and killing child processes to ensure complete cleanup.
4. Code Quality: Magic Numbers (Low Priority)
Location: spec/performance/bench.rb:385-388
Consider extracting warm-up constants (10 requests, 0.5s delay) for better documentation.
5. Performance: Workflow Efficiency (Enhancement)
The workflow builds both Core and Pro sequentially. Since they are independent, they could run in parallel jobs for faster CI (trade-off: uses 2x runner minutes but saves wall-clock time).
6. Observability: Missing Baseline Comparison (Enhancement)
Results are uploaded as artifacts but not compared to baseline/previous runs. Future enhancement could add performance regression detection by comparing against stored baselines.
Pull Request Review: Add Performance Tests (Part 3/3)
🔍 Detailed Code Review
spec/performance/bench.rb
Positive Patterns:
- Line 12-15: env_or_default helper properly handles empty strings and 0 from workflow inputs
- Line 38-57: Input validation functions are thorough with clear error messages
- Line 59-67: Proper error handling in parse_json_file with specific error types
- Line 140-150: Server availability check accepts both 2xx and 3xx responses (handles redirects correctly)
- Line 184-186: Validates RATE=max constraint (CONNECTIONS must equal MAX_CONNECTIONS)
Security Notes:
- Line 81: Command injection risk is low - app_dir is hardcoded constant, not user input
- Line 332: k6 script uses string interpolation but target is validated as URI first
🔐 Security Assessment
Workflow Secrets:
- ✅ SECRET_KEY_BASE: Properly set as dummy value (not a real secret)
- ✅ REACT_ON_RAILS_PRO_LICENSE: Correctly uses GitHub secrets
Injection Vectors:
- ✅ All workflow inputs have proper type constraints (number, string, choice)
- ✅ Bash scripts use proper quoting
- ✅ No user input flows into eval or unvalidated shell commands
Resource Limits:
- ✅ Workflow has 120-minute timeout
- ✅ Request timeout configurable (default 60s)
- ⚠️ No memory/CPU limits on benchmark tools (may cause OOM on large loads)
🧪 Test Coverage
Missing tests: This PR adds testing infrastructure but doesn't include tests for the infrastructure itself.
Recommendations for future PRs:
- Add unit tests for spec/performance/bench.rb validation functions, route parsing logic, JSON parsing
- Add integration test that runs benchmark with DURATION=5s to verify workflow
✅ Compliance with CLAUDE.md
- ✅ Uses pnpm (not npm/yarn)
- ✅ Ruby code follows project structure
- ✅ Bash scripts use proper error handling (set -e)
- ❌ Missing CHANGELOG.md entry (REQUIRED)
- ⚠️ Should add newlines to end of all new files (pre-commit hooks will catch this)
🎯 Final Recommendation
Overall: This is a high-quality PR that adds valuable performance testing infrastructure. The code is well-structured, properly error-handled, and follows good practices.
Blocking Issue: Missing CHANGELOG.md entry (required per CLAUDE.md)
Before Merge:
- ✅ Add CHANGELOG.md entry (REQUIRED)
- ✅ Run bundle exec rubocop and fix any violations
- ✅ Ensure all files end with newlines
Consider for Follow-up PRs:
- Add baseline comparison functionality
- Optimize workflow to run Core/Pro in parallel
- Add tests for benchmark script itself
- Document when/how to use SSH debug feature
Approval Status: Ready to merge after addressing CHANGELOG.md requirement. Excellent work! 🚀
PR Review: Add Performance Tests
This PR adds comprehensive performance testing infrastructure using three benchmarking tools (Fortio, Vegeta, k6). Overall, this is a high-quality PR with well-structured code.
✅ Strengths
- Comprehensive Testing: Uses multiple tools for validation, tests both OSS and Pro versions
- Well-Structured Code: Clean separation in bench.rb, good error handling
- Production-Ready Workflow: Proper asset precompilation, server validation, caching
- Good Developer Experience: Clear parameters, retry logic, detailed output
🔧 Code Quality Observations
Ruby Code (spec/performance/bench.rb)
Minor Security Suggestions:
- Line 81 - Command Injection (Low Risk): Consider using Shellwords.escape for the app_dir variable
- Line 121 - Command Injection (Low Risk): Should escape the cmd variable since it comes from the TOOLS env var
GitHub Actions Workflow
Suggestions:
- Frozen Lockfile Inconsistency (Lines 236 vs 258, 396): Root uses --frozen-lockfile but the dummy apps do not. This could lead to non-deterministic builds. Recommend adding it or documenting why it is omitted.
- Timeout (Lines 320, 455): 120 minutes seems generous and could hide hanging processes
- Resource Allocation: Have you validated WEB_CONCURRENCY=4 + RAILS_MAX_THREADS=3 on 2-core GitHub runners?
🔒 Security Review: ✅ No Critical Issues
- Secrets properly handled (only dummy/test values)
- SSH debug access properly gated and secured
- Command injection risks are low (inputs mostly controlled)
🧪 Test Coverage
Missing:
- No RSpec tests for bench.rb itself
- No validation of reasonable benchmark results
Recommendation: Add smoke tests for parsing logic and input validation
📝 Documentation
Good CLAUDE.md additions. Could also add:
- How to run benchmarks locally
- How to interpret results
- Expected performance baselines
🎯 Specific Files
- .rubocop.yml: Good approach inheriting from react_on_rails config
- knip.ts: Correct to ignore system utilities column/tee
- .gitignore: bench_results correctly placed at root
- bin/prod scripts: Well-designed with pre-flight checks
🚀 Recommendations
Before Merge:
- Add shell escaping for security hardening (low priority)
- Document or fix --frozen-lockfile inconsistency
- Consider adding tests for bench.rb
Future PRs:
- Performance regression detection
- Benchmark visualization
- RSpec unit tests
✅ Summary
Overall Assessment: High-quality PR with valuable performance testing infrastructure. Well-structured code following project conventions.
Main action items (all minor):
- Shell escaping for hardening
- Frozen-lockfile documentation
- Consider adding tests
Great work on comprehensive benchmarking! 🎉
Recommendation: ✅ Approve with minor suggestions
Code Review: Performance Testing Infrastructure
Thank you for adding comprehensive performance testing! This PR adds valuable infrastructure for tracking React on Rails performance over time.
Strengths
- Comprehensive Benchmarking: Multi-tool comparison (Fortio, Vegeta, k6) provides validation across different methodologies
- Well-Structured Workflow: Clear separation of concerns with good caching and cleanup
- Production Scripts: bin/prod and bin/prod-assets are well-designed with asset staleness detection
Critical Issues
1. SSH Access Security Risk (Lines 104-130)
The tmate SSH feature could leak REACT_ON_RAILS_PRO_LICENSE secret. Recommend removing it entirely or adding secret-clearing steps.
2. Port Hardcoding
Port 3001 is hardcoded in bench.rb:21, benchmark.yml, and bin/prod scripts. If port is in use, workflow fails mysteriously. Make PORT configurable via environment variable.
3. Server Startup Race Condition (Lines 289-313, 424-448)
The 30-second wait loop does not check if process died. If Puma crashes immediately, it keeps retrying. Add process health check during wait.
4. Missing CHANGELOG
Per PR checklist, CHANGELOG.md needs to be updated with an entry under Unreleased section.
Medium Priority
- RuboCop Violations: bench.rb has large methods with disabled cops (lines 188-369). Extract helper methods to reduce complexity.
- Inconsistent Error Handling: Some errors raise, others return failure_metrics(e). Choose one approach and document it.
Testing Concerns
Per CLAUDE.md requirements, there is no evidence of local testing in the PR description. Please document:
- Testing Core and Pro apps locally
- Testing with different parameter combinations
- Results of running bundle exec rubocop
Documentation Gaps
Add usage documentation for:
- Running benchmarks locally
- Triggering workflow in CI
- Parameter descriptions
- Interpreting results
Action Items Before Merge
Required:
- Remove SSH access or add secret-clearing
- Fix port hardcoding
- Add process health check to server startup
- Update CHANGELOG.md
- Document local testing results
Recommended:
6. Refactor bench.rb methods
7. Add usage documentation
Overall Assessment
Excellent foundation for performance testing! With security and reliability issues addressed, this will be a solid addition. Estimated effort: 2-3 hours.
Great work! 🚀
Code Review: Performance Testing Infrastructure
Thanks for adding performance testing infrastructure! Here is my detailed feedback on PR #1868.
Overall Assessment
Quality: Good - Well-structured benchmarking framework with solid error handling
Scope: Appropriate - Addresses issue #1777 with a flexible, configurable system
Testing: Needs improvement - See security and production concerns below
Strengths
- Excellent multi-tool approach: Testing with Fortio, Vegeta, and k6 provides good coverage
- Well-documented workflow: Clear step separation and helpful comments
- Flexible configuration: Good use of environment variables with sensible defaults
- Error handling: Solid error handling with descriptive messages
- Auto-detection of routes: Smart feature to discover benchmark routes automatically
- Proper warmup: 10-request warmup prevents cold-start skew
Critical Issues
1. Security: SSH Access in CI (HIGH PRIORITY)
Location: .github/workflows/benchmark.yml:105-130
The SSH debug access feature is a significant security risk. SSH sessions can remain open indefinitely and expose the GitHub Actions runner.
Recommendations:
- Add automatic timeout (30 minutes max)
- Consider removing if not actively needed
- Add more prominent warnings
2. Command Injection Vulnerability (HIGH PRIORITY)
Location: spec/performance/bench.rb:81,263,269
Shell commands use string interpolation without sanitization. If ROUTES env var contains shell metacharacters, command injection is possible.
Fix: Use Shellwords.escape for user inputs
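A minimal Ruby sketch of the suggested escaping (variable names and the URL are illustrative, not from bench.rb):

```ruby
require "shellwords"

# Escape anything that originates in ENV before interpolating it
# into a shell command string.
route = Shellwords.escape(ENV.fetch("ROUTES", ""))
output = `curl -s http://localhost:3001/#{route}`
puts output
```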
3. Inconsistent Lock File Handling
Location: react_on_rails/spec/dummy/bin/prod:24-26 and react_on_rails_pro/spec/dummy/bin/prod:40
This project uses pnpm, not yarn (per CLAUDE.md). Scripts check for yarn.lock but should check pnpm-lock.yaml.
Documentation Needs
Per CLAUDE.md requirements:
- Update CHANGELOG.md with entry for PR 1868
- Document the benchmark workflow
- Update CLAUDE.md to mention spec/performance/ directory
Pre-merge Checklist
- Fix command injection vulnerability (use Shellwords.escape)
- Fix yarn.lock references to pnpm-lock.yaml
- Add security warnings or remove SSH access feature
- Run bundle exec rubocop and fix violations
- Update CHANGELOG.md
- Add benchmark documentation
- Add bench_results/ to RuboCop exclusions
Summary
Solid foundation for performance testing! Main concerns:
- Security: Command injection needs fixing before merge
- Consistency: Fix pnpm references
- Documentation: Add changelog entry and usage docs
Once security issue is addressed, this will be a valuable addition. Great work on the comprehensive workflow!
size-limit report 📦
| Path | Size |
|---|---|
| react-on-rails/client bundled (gzip) | 62.5 KB (0%) |
| react-on-rails/client bundled (gzip) (time) | 62.5 KB (0%) |
| react-on-rails/client bundled (brotli) | 53.66 KB (0%) |
| react-on-rails/client bundled (brotli) (time) | 53.66 KB (0%) |
| react-on-rails-pro/client bundled (gzip) | 63.45 KB (0%) |
| react-on-rails-pro/client bundled (gzip) (time) | 63.45 KB (0%) |
| react-on-rails-pro/client bundled (brotli) | 54.53 KB (0%) |
| react-on-rails-pro/client bundled (brotli) (time) | 54.53 KB (0%) |
| registerServerComponent/client bundled (gzip) | 127.35 KB (0%) |
| registerServerComponent/client bundled (gzip) (time) | 127.35 KB (0%) |
| registerServerComponent/client bundled (brotli) | 61.51 KB (+0.2% 🔺) |
| registerServerComponent/client bundled (brotli) (time) | 61.39 KB (0%) |
| wrapServerComponentRenderer/client bundled (gzip) | 121.96 KB (0%) |
| wrapServerComponentRenderer/client bundled (gzip) (time) | 121.96 KB (0%) |
| wrapServerComponentRenderer/client bundled (brotli) | 56.66 KB (0%) |
| wrapServerComponentRenderer/client bundled (brotli) (time) | 56.66 KB (0%) |
Code Review - Performance Testing Infrastructure
I've reviewed PR #1868 which adds comprehensive performance testing infrastructure. This is a substantial and well-structured addition to the project. Here's my detailed feedback:
✅ Strengths
1. Excellent Benchmark Implementation (spec/performance/bench.rb)
- Robust parameter validation: Input validation for RATE, CONNECTIONS, DURATION is thorough and provides clear error messages
- Multi-tool support: Supporting Fortio, Vegeta, and k6 gives flexibility and cross-validation of results
- Smart route discovery: Auto-detecting routes from Rails with proper filtering (pages# and react_router# controllers only)
- Required parameter handling: Correctly skips routes with required parameters like `/rsc_payload/:component_name`
- Good error handling: Each tool's execution is wrapped in error handling with clear failure metrics
- Server warmup: 10 requests per route before benchmarking prevents cold-start bias
2. Well-Designed GitHub Workflow (.github/workflows/benchmark.yml)
- Comprehensive configurability: All key parameters (rate, duration, connections, tools) are configurable via workflow_dispatch
- Proper caching: Binary caching for Fortio, Vegeta, and gem caching reduces CI time
- Security-conscious SSH access: The debug_enabled input with clear warnings is the right approach
- Both core and pro support: Conditional execution based on the `app_version` input is clean
- Artifact retention: 30-day retention for benchmark results enables historical analysis
- Validation steps: Verifying summary.txt exists and displaying results before upload
3. Production Scripts (bin/prod, bin/prod-assets)
- Asset validation: Checking for public/assets and manifest.json before starting server is crucial
- Staleness warnings: Warning when source files are newer than compiled assets helps catch mistakes
- Foreman/Overmind support: Graceful fallback from overmind → foreman → gem install foreman
📝 Issues & Recommendations
CRITICAL: RuboCop Configuration
- Issue: Added root-level `.rubocop.yml` that excludes `spec/performance/**/*`
- Problem: According to CLAUDE.md, "BEFORE EVERY COMMIT: ALWAYS run `bundle exec rubocop` and fix ALL violations"
- Action required: The new `spec/performance/bench.rb` should pass RuboCop. Please run `bundle exec rubocop spec/performance/bench.rb` and fix any violations. Only add exclusions if there's a valid reason (document it in comments).
Documentation Missing
From the PR checklist:
- ❌ Update documentation - No user-facing docs added
- ❌ Update CHANGELOG file - This is a significant feature addition
Recommendations:
- Add a section to README or create `BENCHMARKING.md` covering:
  - How to run benchmarks locally
  - How to interpret results
  - How to use the GitHub Actions workflow
  - Available configuration options
- Update the appropriate CHANGELOG (`/CHANGELOG.md` or `/CHANGELOG_PRO.md`), e.g. under "#### Developer Experience": Add performance benchmarking infrastructure with Fortio, Vegeta, and k6 support. [PR 1868](https://github.com/shakacode/react_on_rails/pull/1868) by [alexeyr-ci2](https://github.com/alexeyr-ci2)
Testing Concerns
Per CLAUDE.md's CRITICAL LOCAL TESTING REQUIREMENTS:
- ⚠️ Have you tested the benchmark.yml workflow end-to-end?
- Can you confirm the Core workflow completes successfully?
- Can you confirm the Pro workflow completes successfully?
- Have you verified the uploaded artifacts contain expected files?
If not tested locally (which is understandable for complex CI workflows), please clearly state:
- "UNTESTED - requires GitHub Actions environment"
- Provide a plan to verify after merge (e.g., run workflow_dispatch manually)
Code Quality Issues
-
spec/performance/bench.rb:220 - Multiple RuboCop disables: `# rubocop:disable Metrics/AbcSize, Metrics/CyclomaticComplexity, Metrics/PerceivedComplexity, Metrics/MethodLength`. While sometimes necessary, consider refactoring:
  - Extract common code from `run_fortio_benchmark`, `run_vegeta_benchmark`, `run_k6_benchmark`
  - Create a `BenchmarkTool` class with a common interface
  - Move percentile extraction logic to a helper method
- Hard-coded port 3001:
  - Multiple locations: bench.rb:21, bin/prod:29, benchmark.yml:303, etc.
  - Consider: `PORT = env_or_default('PORT', 3001)` for flexibility
- bench.rb:106 - Shell command without escaping: ``routes_output = `cd #{app_dir} && bundle exec rails routes 2>&1` ``. While `app_dir` is controlled internally, consider: ``routes_output = `cd #{Shellwords.escape(app_dir)} && bundle exec rails routes 2>&1` ``
- benchmark.yml:194 - Hard-coded Ruby version: `ruby-version: '3.3.7'`. Should this use a matrix strategy or at least match the versions in other workflows?
benchmark.yml:242-258 - TODO comments:
`# TODO only needed while we use --ignore-workspace below`, `# TODO simplify this both here and in integration tests` - create follow-up issues to track these improvements
-
Deleted file:
packages/react-on-rails-pro-node-renderer/script/preinstall.js- Appears unrelated to this PR's purpose
- Should this be in a separate PR? If intentional, explain in PR description
-
Path updates in documentation files:
- Many
.claude/andanalysis/files haveyarn→pnpmreplacements - These seem like cleanup from a previous migration
- Consider separating documentation updates from feature PRs for easier review
- Many
🔒 Security
No security concerns identified:
- ✅ SSH access properly gated behind workflow_dispatch input
- ✅ No secrets exposed in benchmark results
- ✅ Tools are from official releases with pinned versions
⚡ Performance Considerations
-
Workflow caching is excellent:
- Binary caching saves ~30s per run
- Gem caching saves ~2-3 minutes
- Node modules caching saves ~1-2 minutes
-
Benchmark duration defaults:
- 30s default might be short for stable results
- Consider documenting: "Use 60s+ for production comparisons"
-
Parallel tool execution:
- Currently runs tools sequentially (Fortio → Vegeta → k6)
- Could parallelize across routes or tools, but sequential is safer for accurate results
🎯 Final Recommendations
Before merge:
- ✅ Run `bundle exec rubocop spec/performance/bench.rb` and fix violations
- ✅ Update CHANGELOG.md with this feature
- ✅ Verify workflow runs successfully (at least once via workflow_dispatch)
- ✅ Address or create issues for TODO comments
Nice to have:
- Refactor benchmark tool functions to reduce complexity
- Make PORT configurable
- Add Shellwords escaping for shell commands
- Separate unrelated changes (preinstall.js deletion, doc updates) into their own PRs
Overall, this is excellent work - the implementation is solid, well-tested, and adds significant value. The issues are mostly about following project conventions (RuboCop, CHANGELOG) rather than fundamental problems. Great job! 🎉
Pull Request Review: Add Performance Tests
Thanks for adding performance testing infrastructure! This is a valuable addition to the project.
✅ Strengths
Well-Designed Benchmark Suite:
- Multiple tools (Fortio, Vegeta, k6) provide cross-validation of results
- Comprehensive metrics: RPS, p50/p90/p99 latencies, and status codes
- Flexible configuration via environment variables
- Smart route auto-detection from Rails routes
- Proper warm-up phase before benchmarking
Robust CI Integration:
- Conditional execution (master, workflow_dispatch, or labels)
- Artifact upload with 30-day retention
- Smart caching of benchmark tools
- Tests both Core and Pro versions
Good Operational Practices:
- Server health checks with retry logic
- Proper error handling and validation
- Clear logging with indicators
- Graceful cleanup
⚠️ Issues to Address
1. Security: Command Injection Vulnerabilities 🔴 HIGH PRIORITY
Location: spec/performance/bench.rb:106, 251, 301-302, 373
The script uses string interpolation in shell commands without proper sanitization. While ROUTES input is validated, route values aren't sanitized before use in shell commands.
Recommendation: Use array form of system() or Shellwords.escape for filenames.
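Both options, sketched (the flags and filename are illustrative, not taken from bench.rb):

```ruby
require "shellwords"

# Array form: arguments go straight to exec(2); no shell is involved,
# so nothing needs escaping.
system("k6", "run", "--env", "TARGET_URL=http://localhost:3001/", "k6.ts")

# Single-string form: escape each interpolated value explicitly.
path = "bench_results/run 1.json" # illustrative filename with a space
system("cat #{Shellwords.escape(path)}")
```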
2. FIXME Comments in Production Code 🟡 MEDIUM PRIORITY
Location: .github/workflows/benchmark.yml:79, 89
FIXMEs indicate test configuration (only root route, only Fortio) rather than production defaults. Should remove FIXMEs and set proper defaults or create follow-up issue.
3. Missing Changelog Entry 📋
Per CLAUDE.md guidelines, user-visible changes should be documented. This is a significant addition for developers.
4. Aggressive Process Killing ⚠️
Location: .github/workflows/benchmark.yml:374, 523
Kills ALL Ruby/Node processes which could affect other jobs in shared CI. Should store PID and kill specifically.
5. Port Hardcoding 🔧
Workflow hardcodes port 3001 in health checks despite configurable BASE_URL.
6. Validation Order Issue
RATE=max validation happens AFTER route detection. Move all validation to top.
💡 Suggestions
- Parallel warmup for better performance
- Metrics comparison logic across tools
- Usage documentation and result interpretation guide
- Dry run mode for quick validation
- Unit tests for validation and parsing logic
🎯 Summary
Solid foundation for performance testing. Main concerns:
- MUST FIX: Command injection security issues
- SHOULD FIX: FIXME comments and test configuration
- NICE TO HAVE: Changelog, better process management, docs
Overall: Approve with required changes ✋
Pull Request Review: Performance Testing Infrastructure
I've reviewed PR #1868 which adds comprehensive performance testing infrastructure.
Overall Assessment: STRONG - Well-designed with some important considerations
This PR adds valuable performance benchmarking infrastructure using industry-standard tools (Fortio, Vegeta, k6). The implementation is thoughtful, configurable, and follows good practices.
Strengths
Excellent Architecture & Design:
- Multi-tool approach: Using 3 different benchmarking tools provides cross-validation of results
- Configurable parameters: Environment variables allow flexible CI/CD integration
- Max-rate strategy: Smart choice for library benchmarking (documented in docs/planning/library-benchmarking.md)
- Auto-discovery: Automatically detects routes from Rails app - brilliant!
- Production mode testing: Tests with precompiled assets, which is the right approach
Robust Implementation:
- Proper error handling: bench.rb validates inputs, checks for missing tools, and provides clear error messages
- Server health checks: Waits for server readiness before benchmarking (60s timeout with detailed logging)
- Warmup phase: 10 requests per route before benchmarking - essential for accurate results
- Route filtering: Skips routes with required params and test-only routes
- CI caching: Aggressive caching of dependencies, gems, and benchmark tools
Good DevEx:
- Clear documentation: library-benchmarking.md explains the max-rate vs fixed-rate trade-offs
- Readable output: Tab-separated summary with column formatting
- Artifact uploads: 30-day retention of benchmark results
- SSH debugging: Optional tmate access for CI debugging (properly gated)
Security Conscious:
- SSH access requires explicit opt-in via debug_enabled input
- Clear warning messages about security risks
- limit-access-to-actor restricts access appropriately
Issues & Concerns
CRITICAL: Incomplete Pull Request
BLOCKER - Must address before merge:
- Pull request checklist not completed:
  - Update documentation - marked incomplete
  - Update CHANGELOG file - marked incomplete
- Missing changelog entries:
  - This is a significant user-facing feature (benchmark workflow)
  - Should have entries in CHANGELOG.md documenting the new benchmark.yml workflow
  - How to run benchmarks via workflow_dispatch inputs
  - How to interpret results
HIGH PRIORITY: Testing Gaps
- No local testing evidence:
  - Per CLAUDE.md CRITICAL LOCAL TESTING REQUIREMENTS: NEVER claim a test is fixed without running it locally first
  - Has this workflow been tested locally with act, or in a real CI run?
  - Have all 3 benchmark tools been verified to work correctly?
  - Has the Pro app benchmarking been tested with PRO=true?
- No validation of benchmark results:
  - The workflow validates that summary.txt exists, but doesn't validate the data quality
  - Should check for: non-zero RPS, reasonable latencies, no all-failures
  - Consider adding basic sanity checks (e.g., RPS > 1, p99 < 10000ms)
- Missing error scenarios:
  - What happens if all routes have required params? (would fail with "No routes to benchmark")
  - What if the server crashes during a benchmark?
  - What if all 3 tools fail?
MEDIUM PRIORITY: Code Quality
- Magic numbers and unclear defaults (benchmarks/bench.rb:410):
  - 10 warmup requests and a 0.5s delay should be constants or ENV vars
  - Consider: WARMUP_REQUESTS and WARMUP_DELAY_SEC as env vars (see the sketch after this list)
- Hardcoded port in Core app (.github/workflows/benchmark.yml:306):
  - Port 3001 is hardcoded but should match server startup
  - Consider extracting it to an env var
- Inconsistent tool result handling:
  - Fortio/Vegeta/k6 methods return nil if the tool is not enabled
  - Then an `if metrics` check runs before adding to the summary
  - Could be cleaner with early returns or a wrapper method
- RuboCop disable without explanation (benchmarks/bench.rb:226):
  - Metrics/AbcSize, Metrics/CyclomaticComplexity, Metrics/PerceivedComplexity disabled
  - These are legitimate disables for benchmark parsing methods, but complex methods warrant refactoring consideration
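For the magic-numbers item, a sketch (env var names are the ones suggested above; the `env_or_default` helper named earlier in this review is given a stand-in definition so the fragment runs on its own):

```ruby
require "net/http"
require "uri"

# Stand-in for the helper named elsewhere in this review (assumed signature).
def env_or_default(name, default)
  value = ENV[name]
  value.nil? || value.empty? ? default : value
end

# Defaults match the current hard-coded behavior; both become overridable.
WARMUP_REQUESTS  = Integer(env_or_default("WARMUP_REQUESTS", 10))
WARMUP_DELAY_SEC = Float(env_or_default("WARMUP_DELAY_SEC", 0.5))

target_url = "http://localhost:3001/" # illustrative

WARMUP_REQUESTS.times do
  Net::HTTP.get_response(URI(target_url))
  sleep WARMUP_DELAY_SEC
end
```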
MEDIUM PRIORITY: CI/CD Concerns
- Benchmark workflow triggers:
  - Runs on ALL pushes to master (excluding docs)
  - Runs on ALL PR sync/reopen events if labeled
  - Question: Is this too aggressive? Benchmarks can be slow (2+ min per route × tools)
  - Consider: Only run on master, or require a manual trigger/label for PRs
- Port cleanup race condition (.github/workflows/benchmark.yml:368-383):
  - pkill -9 is aggressive (SIGKILL)
  - Consider graceful shutdown first: `pkill -TERM puma && sleep 2 && pkill -9 puma`
- Missing workflow timeout:
  - Individual steps have timeout-minutes: 120
  - But there is no job-level timeout
  - Consider adding timeout-minutes: 180 at the job level
- Artifact naming collision:
  - Uses github.run_number, which could collide if multiple PRs run simultaneously
  - Consider github.run_id instead
Specific File Feedback
benchmarks/bench.rb:
- Lines 59-67: Excellent error handling with descriptive messages ✅
- Lines 79-136: Route discovery is clever! Consider extracting to a method for testability
- Lines 203-210: Love the detailed attempt logging with timing ✅
- Lines 228-279: Fortio implementation solid, good use of failure_metrics helper
.github/workflows/benchmark.yml:
- Lines 71-88: Smart env var defaults using ternary operators ✅
- Lines 142-193: Excellent tool caching strategy ✅
- Lines 278-290: Good asset validation before server start ✅
- Lines 521-537: Nice workflow summary! ✅
benchmarks/k6.ts:
- Lines 21-35: Good error handling for missing TARGET_URL ✅
- Lines 38-56: Smart scenario switching based on rate mode ✅
- Lines 70-76: Status code checks are comprehensive ✅
react_on_rails/spec/dummy/bin/prod & react_on_rails_pro/spec/dummy/bin/prod:
- Asset validation: Excellent checks for manifest.json and public/assets ✅
- Staleness warnings: Great UX for developers! ✅
- Different approaches: Core uses Rails directly, Pro uses foreman/overmind - is this intentional?
Recommendations
Before Merging (MUST DO):
- Complete the PR checklist - Add CHANGELOG.md entry documenting the new benchmark workflow and update documentation
- Verify local testing - Run the workflow manually via workflow_dispatch or document tested workflow run number
- Add basic sanity checks - Validate benchmark results have reasonable values and exit with error if all routes fail
Consider (SHOULD DO):
- Extract magic numbers to constants/env vars (warmup count, delays)
- Add job-level timeout to workflow
- Use graceful shutdown before SIGKILL
- Improve artifact naming to avoid collisions using github.run_id
- Document a "when to worry" threshold for regressions
Future Enhancements (COULD DO):
- Regression detection comparing against baseline from master
- Multiple samples - Run each route 2-3 times, average results
- Performance dashboard uploading results to time-series DB
- Alert on regressions via PR comments if performance drops
Security Review
No security concerns - SSH debugging is properly gated and documented
Verdict
Status: CHANGES REQUESTED
This is excellent work that adds valuable infrastructure! The implementation is solid and well-thought-out. However, it needs:
- Completed PR checklist (CHANGELOG + docs)
- Evidence of testing (workflow run or local validation)
- Basic result validation
Once these are addressed, this will be ready to merge. Great job on the comprehensive approach!
Estimated review time: 45 minutes. Confidence level: High (reviewed all code, workflows, docs, and configs).
Code Review for PR #1868: Add Performance Tests
Thank you for this comprehensive performance testing implementation! This addresses issue #1777 excellently.
✅ Strengths
1. Comprehensive Testing Setup
- Multi-tool approach (Fortio, Vegeta, k6) validates results
- Well-documented max-rate benchmarking decision
- Smart auto-detection of routes from Rails
- Good handling of optional vs required route parameters
2. Robust Implementation
- Input validation for all parameters
- Server readiness checks with timeout
- 10-request warmup per route
- Artifact preservation (30-day retention)
3. CI/CD Integration
- Well-structured workflow with clear steps
- Conditional execution (labels, workflow_dispatch, master)
- Good caching strategy
- Proper process cleanup
4. Developer Experience
- SSH debugging (properly secured)
- Configurable via workflow inputs
- Clear error messages
- bin/prod and bin/prod-assets scripts
⚠️ Issues
1. Port Cleanup (MEDIUM) - .github/workflows/benchmark.yml:368
- Uses pkill -9 (SIGKILL) - no graceful shutdown
- Recommend: Try pkill -15 first, then -9 after a short delay
2. Hardcoded Port 3001 (LOW)
- Multiple files hardcode port 3001
- Recommend: Make configurable via PORT env var
3. Magic Numbers (LOW) - benchmarks/bench.rb:410,412
- Warmup count (10) and delay (0.5) not constants
- Recommend: Extract to WARMUP_REQUESTS, WARMUP_DELAY_SEC
4. Documentation Reference (LOW) - CLAUDE.md:215
- References spec/dummy/TESTING_LOCALLY.md
- Recommend: Verify exists or create it
📋 Required Before Merge (P0)
Per CLAUDE.md requirements:
- ✅ Run bundle exec rubocop and fix all violations
- ✅ Ensure all files end with newline characters
- 📝 Update CHANGELOG.md with this feature
🎉 Conclusion
Excellent implementation! Address the P0 items (especially RuboCop and CHANGELOG per CLAUDE.md) and it's ready to merge.
Great work! 🚀
Review by: Claude Code. Date: 2025-12-12 19:08 UTC.
Code Review: Performance Testing Infrastructure
This PR adds comprehensive performance testing infrastructure. Overall well-structured.
✅ Strengths
- Excellent architecture with clean separation of concerns
- Comprehensive tooling (k6, Fortio, Vegeta)
- Robust error handling and validation
- Production-ready scripts with staleness checks
⚠️ Must Fix Before Merge
1. CRITICAL: Missing CHANGELOG Entry
Per CLAUDE.md, user-visible changes must be documented. Add entry to CHANGELOG.md
2. RuboCop Compliance Required
Per CLAUDE.md: Run bundle exec rubocop benchmarks/ and fix ALL violations
3. Documentation Needed
Add benchmarks/README.md explaining how to run locally and interpret results
Should Fix (Nice to Have)
- Add input validation in k6.ts for numeric parameters
- Consider if SSH access in workflow is truly necessary
Rating: 8.5/10
Would be 9.5/10 with CHANGELOG and docs. Great infrastructure code!
Code Review: Add Performance Tests
Thank you for this comprehensive benchmarking implementation! This is a valuable addition to the project. Below is my detailed review covering code quality, potential issues, security, and best practices.
Overall Assessment
This PR successfully adds performance testing infrastructure with k6 and Vegeta. The implementation is well-structured, follows good practices, and integrates cleanly with the existing CI infrastructure.
Strengths
1. Architecture & Design
- Modular structure: Shared helpers in benchmark_helpers.rb promote code reuse
- Flexible configuration: Environment variables allow easy customization without code changes
- Route auto-detection: Smart filtering of Rails routes eliminates manual maintenance
- Multi-tool approach: k6 for HTTP/1.1 benchmarks, Vegeta for HTTP/2 Node Renderer tests
2. GitHub Actions Workflow
- Conditional execution: Smart use of labels (benchmark, full-ci) prevents unnecessary runs
- Efficient caching: Caches binaries (Vegeta), gems (foreman), and node_modules
- Comprehensive validation: Checks assets are built before server starts
- Artifact retention: Stores results for 30 days with proper naming
- Clean error handling: set -e and explicit error messages throughout
3. Code Quality
- Input validation: All parameters validated before use (rate, duration, connections)
- Graceful degradation: Helpful error messages when tools/files missing
- Documentation: Clear comments explain configuration and trade-offs
Issues & Recommendations
1. CHANGELOG Update Needed (REQUIRED)
Per CLAUDE.md guidelines, this PR adds a user-visible feature and should have a CHANGELOG entry.
Recommendation: Add to CHANGELOG.md:
```md
[Unreleased]
Added
- PR 1868 by alexeyr-ci2: Added performance benchmarking infrastructure with k6 and Vegeta, including GitHub Actions workflow for automated benchmarks
```
2. Security: Process Killing is Too Aggressive (HIGH PRIORITY)
Location: .github/workflows/benchmark.yml:310, :476
Current code: pkill -9 -f "ruby|node|foreman|overmind|puma"
This kills ALL Ruby and Node processes on the CI runner. While this works in isolated CI, it's fragile and could cause issues if GitHub changes runner isolation.
Recommendation: Use more targeted cleanup by killing only processes on port 3001 or saving PIDs when starting servers.
3. Hardcoded Password in Benchmark (MEDIUM)
Location: benchmarks/bench-node-renderer.rb:22-28
Parsing password from config with regex is fragile. Consider using environment variable instead for better maintainability.
4. RuboCop Violations (MEDIUM)
The code disables several RuboCop cops (Metrics/AbcSize, Metrics/CyclomaticComplexity). Consider refactoring complex methods into smaller functions for better maintainability and testability.
5. Missing Unit Tests
No unit tests for benchmark helper functions. Consider adding RSpec tests (a sketch follows this list) for:
- validate_rate, validate_duration, validate_positive_integer
- route_has_required_params?
- sanitize_route_name
- parse_json_file error handling
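A starting-point spec might look like the sketch below, assuming the helpers are module functions on a BenchmarkHelpers module (the module name, file path, and raised error classes are assumptions):

```ruby
# spec/benchmarks/benchmark_helpers_spec.rb (hypothetical layout)
require_relative "../../benchmarks/lib/benchmark_helpers"

RSpec.describe BenchmarkHelpers do
  it "rejects a non-positive rate" do
    expect { described_class.validate_rate("0") }.to raise_error(StandardError)
  end

  it "strips optional segments before checking for required params" do
    expect(described_class.route_has_required_params?("/posts/:id")).to be(true)
    expect(described_class.route_has_required_params?("/posts(/:page)")).to be(false)
  end

  it "fails clearly on a missing JSON file" do
    expect { described_class.parse_json_file("does-not-exist.json") }.to raise_error(StandardError)
  end
end
```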
Security Review
Secrets Handling: Good
- REACT_ON_RAILS_PRO_LICENSE_V2 properly uses GitHub Secrets
- SECRET_KEY_BASE is a dummy value (not used for anything sensitive)
Command Injection: Safe
- All user inputs are validated before use in shell commands
- No direct interpolation of user input into shell commands
Resource Limits: Could Be Better
- Benchmarks run with timeout-minutes: 120 - quite generous
- Consider lowering to 60 minutes to prevent hung jobs
Action Items Summary
Must Fix (Before Merge):
- Add CHANGELOG entry for this user-visible feature
- Consider more targeted process cleanup (or document why aggressive approach is safe)
Should Fix (Nice to Have):
- Refactor complex methods to avoid disabling RuboCop cops
- Add unit tests for benchmark helper functions
- Extract password to environment variable instead of parsing config
Could Fix (Future Work):
- Add dry-run mode for testing
- Create docs/guides/running-benchmarks.md
- Extract server wait logic to reusable script
Conclusion
This is a solid, well-implemented PR that adds valuable performance testing infrastructure. The code quality is high, the GitHub Actions workflow is comprehensive, and the architecture is flexible.
The main concerns are:
- Missing CHANGELOG entry (per project guidelines)
- Aggressive process killing (works but fragile)
- Some methods could be refactored for better maintainability
With the CHANGELOG entry added, this PR is ready to merge. The other suggestions are nice-to-haves that can be addressed in follow-up PRs.
Great work!
Review generated by Claude Code based on project guidelines in CLAUDE.md
Pull Request Review: Performance Testing Infrastructure
This PR adds a comprehensive performance testing framework to React on Rails.
STRENGTHS
- Well-Architected Benchmark Infrastructure
- Modular design: Separate scripts for Rails benchmarks and Node Renderer benchmarks
- Shared utilities: Clean extraction of common functionality
- Multiple tools: k6 for Rails, Vegeta with HTTP/2 for Node Renderer
- Good documentation explaining the max-rate approach
- Robust Error Handling
- Comprehensive validation of all parameters
- Tool availability checks before running
- Server health checks with timeout/retry logic
- Graceful error handling with clear messages
- CI/CD Integration
- Workflow triggers only when needed
- Efficient caching of dependencies
- Proper artifact retention (30 days)
- Clean separation between Core and Pro benchmarks
ISSUES AND CONCERNS
Security:
Dangerous process killing (lines 314, 487): pkill -9 -f "ruby|node|foreman|overmind|puma"
- Broad regex pattern could kill unintended processes
- Recommendation: Use PID files for safer process management; a sketch follows below
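A sketch of the PID-based approach (the server command, PID file path, and timeouts are illustrative):

```ruby
require "timeout"

# Spawn the server in its own process group and remember its PID.
pid = Process.spawn("bundle exec puma -C config/puma.rb", pgroup: true)
File.write("tmp/bench_server.pid", pid)

begin
  # ... run the benchmark tools against the server ...
ensure
  Process.kill("-TERM", pid) # negative signal name targets the whole process group
  begin
    Timeout.timeout(10) { Process.wait(pid) }
  rescue Timeout::Error
    Process.kill("-KILL", pid) # escalate only if graceful shutdown stalls
  end
end
```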
Command injection vulnerability (bench.rb:58)
- Backtick execution with an interpolated variable, without escaping
- Current Risk: Low (APP_DIR is hardcoded), but it's a dangerous pattern
- Recommendation: Use Shellwords.escape, as sketched below
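A minimal sketch of the escaped version (APP_DIR stands in for the constant mentioned above):

```ruby
require "shellwords"

# Escape the interpolated directory before it reaches the shell.
routes_output = `cd #{Shellwords.escape(APP_DIR)} && bundle exec rails routes 2>&1`

# Alternatively, skip the shell entirely with the array form and chdir:
# system("bundle", "exec", "rails", "routes", chdir: APP_DIR)
```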
Secrets handling (bench-node-renderer.rb:32)
- Password read from config file
- Ensure it is test-only and not in artifacts
Code Quality:
RuboCop violations not addressed
- Multiple rubocop:disable comments for complexity metrics
- Recommendation: Refactor methods to reduce complexity
Magic numbers (bench.rb:211)
- Hardcoded values like 10 warmup requests, 0.5s delay
- Recommendation: Extract to named constants
Testing and Documentation:
Missing CHANGELOG entry
- This is a significant new feature that needs changelog documentation per CLAUDE.md
No RSpec tests for benchmark helpers
- Complex validation logic in benchmark_helpers.rb lacks unit tests
Documentation improvements needed
- Missing local usage guide, result interpretation, and troubleshooting docs
RECOMMENDATIONS SUMMARY
Must Fix Before Merge:
- Add CHANGELOG entry
- Fix command injection vulnerability
- Improve process killing to use PID files
Should Fix:
4. Refactor RuboCop-disabled methods
5. Add RSpec tests for benchmark_helpers.rb
6. Create comprehensive documentation
OVERALL ASSESSMENT
This is a high-quality addition with sound architecture and thorough implementation.
Code Quality: 4/5
Security: Minor issues that should be fixed
Testing: No unit tests for helpers
Documentation: 3/5 - Good inline docs, missing user-facing guide
Recommendation: Approve with minor changes. Security issues are low-risk but should be fixed.
Great work! This benchmark infrastructure will be invaluable for catching performance regressions.
@coderabbitai review
✅ Actions performed
Review triggered.
Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.
PR Review: Add Performance Tests
Overview
This PR adds a comprehensive benchmarking framework to React on Rails, including:
- Ruby-based benchmark scripts for Rails routes and Node Renderer
- k6 (HTTP benchmarking tool) integration
- Vegeta integration (HTTP/2 support for Node Renderer)
- GitHub Actions workflow for automated CI benchmarking
- Production server setup scripts for dummy apps
✅ Strengths
Well-structured benchmark architecture:
- Shared helpers in benchmark_helpers.rb promote code reuse
- Clear separation between Rails route benchmarks and Node Renderer benchmarks
- Flexible configuration via environment variables
Robust error handling:
- Input validation for all parameters (rate, duration, connections)
- Proper timeout handling in server health checks
- JSON parsing errors are caught and reported clearly
Good CI/CD integration:
- Conditional workflow execution based on labels (benchmark, full-ci)
- Proper artifact retention (30 days)
- Integration with benchmark-action/github-action-benchmark for tracking trends
Production-ready configuration:
- Proper Puma worker configuration with preload_app!
- ActiveRecord connection management in on_worker_boot
- Asset precompilation setup
CHANGELOG updated: Properly categorized as "Developer (Contributors Only)"
⚠️ Issues & Recommendations
Critical
Security: SSH access in workflow (removed in later commits?)
- Early commits showed SSH access via mxschmitt/action-tmate
- Verify this was removed or properly gated in the final version
- If present, ensure it's only for workflow_dispatch with explicit opt-in
Resource cleanup may be insufficient (line 353 in benchmark.yml):
```sh
pkill -9 -f "ruby|node|foreman|overmind|puma" || true
```
- This is VERY aggressive and could kill unrelated processes
- In CI it's safe (isolated environment), but worth a comment explaining why
- Consider more targeted killing using process group IDs
Missing RuboCop validation:
- Per CLAUDE.md: "ALWAYS run bundle exec rubocop and fix ALL violations"
- Ruby files in benchmarks/ should pass RuboCop
- Some files have rubocop:disable comments - verify these are justified
High Priority
Hardcoded passwords (bench-node-renderer.rb:32):
```ruby
PASSWORD = read_password_from_config
```
- Reads the password from client/node-renderer.js
- Document that this is for LOCAL/CI testing only, not production
- Consider using a well-known test password constant instead
URL encoding vulnerability (bench-node-renderer.rb:133):
```ruby
def url_encode(str)
  URI.encode_www_form_component(str)
end
```
- This is correct, but the calling code concatenates strings instead of using form encoding
- Lines 145-148 manually build the body - this works but is fragile
- Consider using URI.encode_www_form with hash input instead, as sketched below
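For example (the key names here are illustrative, not the renderer's actual protocol fields):

```ruby
require "uri"

# Build the whole body from pairs so every field is form-encoded consistently.
body = URI.encode_www_form(
  "renderingRequest" => rendering_request,
  "password" => password
)
```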
Shell injection risk (bench.rb:58):
```ruby
routes_output = `cd #{Shellwords.escape(app_dir)} && bundle exec rails routes 2>&1`
```
- Good: Uses Shellwords.escape
- Bad: app_dir comes from a constant based on the PRO env var
- If the PRO env var is ever user-controlled, this is exploitable
- Since it's CI-only, probably fine, but worth documenting
No timeout on system calls:
- Line 161 in bench.rb: system("#{k6_command} | tee #{k6_txt}")
- Uses timeout-minutes: 120 in CI, but no timeout in Ruby
- k6/Vegeta have their own timeouts, so probably OK
- Consider adding a Ruby Timeout.timeout wrapper for safety, as sketched below
Medium Priority
Benchmark thresholds seem loose (benchmark.yml:316, 333):
```yaml
alert-threshold: '150%'
fail-on-alert: true
```
- Alerting only at a 150% regression seems high for performance tests
- Consider tightening to 125% or 120% for RPS
- Latency might need a different threshold (e.g., 150% is more reasonable for spikes)
Missing file validation:
- bench-node-renderer.rb finds bundles in .node-renderer-bundles/
- Good error message if the directory is missing, but bundle validity isn't checked
- What if bundles are corrupted or incomplete?
- Add basic validation (check for package.json or specific files); a sketch follows below
k6.ts uses strict status checks (k6.ts:74-79):
```typescript
check(response, {
  status_200: (r) => r.status === 200,
  status_3xx: (r) => r.status >= 300 && r.status < 400,
  status_4xx: (r) => r.status >= 400 && r.status < 500,
  status_5xx: (r) => r.status >= 500,
});
```
- Good: Categorizes status codes
- Issue: Doesn't handle 1xx informational responses (rare but possible)
- Status 0 (connection failed) won't match any check - handle this case
Large file cleanup (bench-node-renderer.rb:210):
```ruby
FileUtils.rm_f(vegeta_bin)
```
- Good practice to delete large binary files
- Consider doing the same for k6 if it generates large files
Low Priority / Style
Magic numbers in configuration:
- Line 211 in bench.rb: 10.times for warmup
- Line 30 in benchmark.yml: default: 10 for connections
- Consider extracting to named constants at the top of files
Inconsistent metric naming:
- bench.rb outputs "p50(ms)", "p90(ms)", etc.
- convert_to_benchmark_json.rb creates "- p50 latency", "- p90 latency"
- Both are fine, but consider consistent capitalization/formatting
Documentation:
- docs/planning/library-benchmarking.md is mentioned in the PR description
- Verify this document explains WHY these benchmarks matter
- Should include: how to interpret results, when to run locally, thresholds
Test coverage:
- No RSpec tests for benchmark scripts themselves
- Consider adding basic tests for validation functions, parsing logic
- At minimum, add smoke test to CI that validates scripts don't crash
Prettier/formatting:
- Verify k6.ts passes ESLint/Prettier
- Run pnpm run format.listDifferent to check
- The /* eslint-disable import/no-unresolved */ is fine since k6 is global
Good Practices:
- Frozen string literals throughout Ruby files
- Descriptive variable names
- Comprehensive comments explaining k6 script configuration
- Proper use of set -e in bash scripts (workflow steps)
Potential Improvements:
- Consider extracting common workflow logic to reusable actions
- The workflow is 589 lines - consider splitting Core/Pro into separate jobs
- Some repeated code in benchmark.yml (server start logic) - could be a script
🧪 Testing Recommendations
Before merging:
- Run RuboCop: bundle exec rubocop benchmarks/
- Test locally:
  - Run ruby benchmarks/bench.rb with the Core dummy app
  - Run ruby benchmarks/bench-node-renderer.rb with Pro
  - Verify all output files are generated correctly
- Test workflow manually: Trigger via workflow_dispatch with various input combinations
- Verify cleanup: Ensure server processes are killed properly after the workflow
- [ ] Add a README in the benchmarks/ directory explaining how to use the scripts
- [ ] Document environment variables in script headers (✅ partially done in k6.ts)
- [ ] Explain benchmark interpretation in docs/planning/library-benchmarking.md
- [ ] Add a troubleshooting section for common issues (server won't start, tools not found)
Performance Considerations
- Benchmarks will run on every push to master - this is expensive
- Consider limiting to:
  - Weekly schedule for full benchmarks
  - Push to master only for a smoke test (1 route, 10s duration)
  - Full benchmarks only on labeled PRs or workflow_dispatch
- Current config seems reasonable, but monitor CI minutes usage
Security Assessment
Overall security posture is good for a CI-only tool:
- ✅ Proper input validation prevents command injection
- ✅ Uses Shellwords.escape where appropriate
- ✅ Secrets properly handled via GitHub Secrets
- ⚠️ Verify SSH access removed/gated properly
- ⚠️ Document that passwords are test-only, not for production
Conclusion
This is a solid addition to the project. The benchmarking framework is well-designed, properly integrated into CI, and follows most project conventions.
Recommendation: Approve with minor changes
Priority fixes before merge:
- Run RuboCop and fix any violations
- Verify SSH access is properly removed/gated
- Add a comment explaining the aggressive pkill usage in cleanup
- Consider tightening alert thresholds from 150% to 125%
Everything else can be addressed in follow-up PRs if needed.
Great work on this comprehensive benchmarking system! 🎉
PR Review: Performance Testing Infrastructure
This is a comprehensive and well-structured PR adding performance benchmarking to the React on Rails project. Overall, the implementation is solid with good attention to detail.
Strengths
Architecture and Design
- Excellent separation of concerns with benchmark helpers extracted into reusable module
- Flexible configuration via environment variables with sensible defaults
- Multi-tool approach: k6 for Rails benchmarks, Vegeta for Node Renderer with HTTP/2 support
- Smart route detection: Automatically discovers benchmarkable routes from Rails with proper filtering
- Well-documented strategy in docs/planning/library-benchmarking.md
Code Quality
- Robust error handling with comprehensive parameter validation
- Safety features: warm-up phase, server health checks with retries, proper cleanup
- Structured output in multiple formats (JSON, text, summary)
- Smart GitHub Actions integration with caching, artifacts, and conditional execution
Production Readiness
- New bin/prod and bin/prod-assets scripts with validation
- Properly configured Puma for benchmarking with worker processes and thread control
- Well-organized CI workflow with clear steps, error handling, and summaries
Issues and Concerns
1. Security: Command Injection Risk (HIGH)
Location: benchmarks/bench.rb:59 and benchmarks/bench-node-renderer.rb:94
The URL construction in bench-node-renderer.rb uses string interpolation without escaping. While url_encode is used for the body, the URL itself should be shell-escaped.
Recommendation: Use system() with array arguments instead of backticks, or ensure all interpolated values are properly escaped with Shellwords.escape
2. Security: Process Killing (MEDIUM)
Location: .github/workflows/benchmark.yml:352-353, 564-565
The workflow uses pkill -9 -f with a broad regex that could kill unrelated processes. While the comment says it is safe in isolated CI, it is a dangerous pattern.
Recommendation: Track PIDs explicitly when starting servers and kill only those specific PIDs.
3. Hardcoded Credentials Reading (LOW)
Location: benchmarks/bench-node-renderer.rb:21-29
Reading passwords from config files with regex is fragile and could break with formatting changes.
Recommendation: Consider exposing the password as an environment variable; a sketch follows below.
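A minimal sketch, keeping the config parsing as a fallback (NODE_RENDERER_PASSWORD is a hypothetical variable name):

```ruby
# Prefer an explicit env var; fall back to the existing regex-based parsing.
PASSWORD = ENV.fetch("NODE_RENDERER_PASSWORD") { read_password_from_config }
```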
Suggestions for Improvement
4. Missing Tests
There are no tests for the benchmark scripts. While these are developer-only tools, basic tests would catch regressions.
Recommendation: Add spec/benchmarks/ with unit tests for helper functions.
5. Workflow Complexity
The benchmark workflow is 588 lines - very large for a single workflow file.
Recommendation: Consider extracting repeated setup steps into reusable actions or splitting into separate workflows.
6. Magic Numbers
Extract magic numbers like warmup count (10) and delay (0.5) to named constants.
7. Error Handling
The workflow could provide more informative errors with additional context about server status when failures occur.
Testing Recommendations
Before merging, verify:
- Manual test the workflow via workflow_dispatch with different parameter combinations
- Test failure scenarios (server fails, invalid parameters, no routes, missing tools)
- Test on a PR with benchmark label to ensure conditional triggers work
- Verify artifact uploads are properly accessible
Overall Assessment
This is high-quality work that adds significant value to the project. The benchmarking infrastructure is well-designed, comprehensive, and production-ready.
Main concerns:
- Security issues (command injection, aggressive process killing) - should be addressed before merge
- Lack of tests - acceptable for MVP but should be added soon
- Workflow complexity - not blocking but consider refactoring for maintainability
Recommendation: Approve with requested changes - fix the security issues, then merge.
Great work on this PR!
PR Review: Add Performance Tests
This PR adds comprehensive benchmarking infrastructure to the React on Rails project. Here's my detailed review:
🎯 Overall Assessment
Strong implementation with well-structured code, good documentation, and appropriate tooling choices. The PR successfully addresses #1777 by adding automated performance testing in CI.
✅ Strengths
1. Excellent Architecture
- Clean separation of concerns with modular helper functions (benchmark_helpers.rb)
- Well-designed workflow with clear steps and error handling
2. Comprehensive Tooling
- k6 for HTTP benchmarking (Core Rails routes) - modern, reliable, good reporting
- Vegeta for HTTP/2 benchmarking (Pro Node Renderer) - appropriate for h2c protocol
- Tool caching to speed up CI runs
- Proper validation of benchmark parameters
3. Good Documentation
- Clear library-benchmarking.md explaining the "max rate" approach and trade-offs
- Inline comments explaining non-obvious logic
- Good workflow parameter documentation
- CHANGELOG entry (in Developer section - appropriate)
4. Strong Error Handling
- Proper timeout checks for server startup
- Validation of all benchmark parameters
- Graceful handling of missing bundles
- Clear error messages with actionable guidance
5. CI Integration
- Smart conditional execution (master, workflow_dispatch, or labeled PRs)
- Artifact uploads for troubleshooting
- Integration with github-action-benchmark for tracking
- Proper cleanup steps in always() blocks
🔍 Code Quality Issues
CRITICAL: Security - Credential Exposure
Location: benchmarks/bench-node-renderer.rb:20-29
```ruby
def read_password_from_config
  config_path = File.expand_path(
    "../react_on_rails_pro/spec/dummy/client/node-renderer.js",
    __dir__
  )
  config_content = File.read(config_path)
  match = config_content.match(/password:\s*['"]([^'"]+)['"]/)
  match ? match[1] : raise("password not found in #{config_path}")
end
```
Issue: The password is hardcoded in the config file and read via regex. While this is a test/benchmark password (not production), this pattern could be problematic if:
- The config file is accidentally committed with a real password
- Someone copies this pattern for production code
Recommendation:
- Add a comment explaining this is ONLY for benchmarking and should never be used in production
- Consider using an environment variable with a fallback to the config file
- Ensure the password in the config file is clearly marked as a test password
MEDIUM: Changelog Placement
Location: CHANGELOG.md:24-26
Per CLAUDE.md guidelines:
Update CHANGELOG.md for user-visible changes only (features, bug fixes, breaking changes, deprecations, performance improvements). Do NOT add entries for: linting, formatting, refactoring, tests, or documentation fixes.
Issue: This is marked as "Developer (Contributors Only)" which is appropriate, but the CLAUDE.md guidance says "Do NOT add entries for... tests". This benchmarking infrastructure is specifically for contributors/CI, not end users.
Recommendation: This is actually fine as-is since it's in the "Developer (Contributors Only)" section, which is the correct place for CI/tooling improvements. Consider clarifying in CLAUDE.md that "Developer" section entries ARE acceptable for significant CI/testing infrastructure.
⚠️ Potential Issues
1. Process Management in CI
Location: .github/workflows/benchmark.yml:349
```sh
pkill -9 -f "ruby|node|foreman|overmind|puma" || true
```
Issue: Using pkill -9 with broad patterns could kill unintended processes in a shared CI environment.
Recommendation: While this is acceptable in isolated GitHub Actions runners, consider:
- Using more specific process identifiers
- Storing PIDs from the startup step
- Using pkill -P (matching children of a known parent PID) instead
2. Hardcoded Port Numbers
Multiple locations:
- Port 3001 for Rails server
- Port 3800 for Node Renderer
Issue: If multiple benchmarks run concurrently (future scaling), ports could conflict.
Recommendation:
- Current approach is fine for single-job workflow
- If parallelization is added later, use dynamic port allocation
- Document the port assumptions clearly (which you've done well)
3. No Baseline Protection
Location: .github/workflows/benchmark.yml:316, 525
```yaml
fail-on-alert: true
alert-threshold: '150%'
```
Issue: 150% threshold is very permissive. A 50% regression is significant.
Recommendation:
- Consider tightening to 125% or 130% once baseline is established
- Add a comment explaining the rationale for the threshold
- Monitor a few runs to see actual variance before tightening
4. Route Filtering Logic
Location: benchmarks/bench.rb:33-46
```ruby
def route_has_required_params?(path)
  path_without_optional = path.gsub(/\([^)]*\)/, "")
  path_without_optional.include?(":")
end
```
Issue: This regex assumes parentheses are only used for optional params. If a route has nested parentheses or literal parentheses in the path (rare but possible), this could fail.
Recommendation:
- Add a comment explaining the assumption
- Consider using the Rails route introspection API instead of regex parsing
- Add a test case for this logic (the calls below illustrate the current behavior)
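For reference, this is how the optional-segment assumption plays out on typical paths:

```ruby
route_has_required_params?("/posts/:id")     # => true  (":id" survives the gsub)
route_has_required_params?("/posts(/:page)") # => false ("(/:page)" is stripped as optional)
```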
5. Warm-up Strategy
Location: benchmarks/bench.rb:211-216
```ruby
10.times do
  server_responding?(target)
  sleep 0.5
end
```
Issue: Fixed 10 requests might not be enough for JIT compilation, DB connection pooling, etc.
Recommendation:
- Consider a time-based warm-up (e.g., 5-10 seconds), as sketched below
- Or an adaptive warm-up (continue until latency stabilizes)
- The current approach is acceptable for an initial implementation
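A time-based variant could look like this (WARMUP_SECONDS is a suggested env var; server_responding? and target come from the loop above):

```ruby
WARMUP_SECONDS = Integer(ENV.fetch("WARMUP_SECONDS", "5"))
deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + WARMUP_SECONDS
while Process.clock_gettime(Process::CLOCK_MONOTONIC) < deadline
  server_responding?(target)
  sleep 0.5
end
```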
🎨 Style & Best Practices
Minor Issues:
RuboCop Disables
- benchmarks/bench.rb:140 and benchmarks/bench-node-renderer.rb:151
- Disabling complexity metrics is acceptable for benchmark scripts
- Consider extracting some logic to reduce complexity instead
Magic Numbers
- 30s duration, 60s timeout, 10 connections
- Well-documented in code and configurable via env vars ✅
- Good defaults chosen
Error Messages
- Clear and actionable throughout ✅
- Good use of emojis for visual scanning in CI logs
🧪 Test Coverage
Missing:
- Unit tests for helper functions (benchmark_helpers.rb)
- Tests for JSON conversion
Recommendation:
- Add RSpec tests for benchmark_helpers.rb functions
- Test edge cases (missing files, invalid JSON, network errors)
- Not blocking for merge, but should be added
🚀 Performance Considerations
Caching Strategy ✅
- Good caching of tools (Vegeta, foreman)
- Good caching of node_modules and gems
- Cache keys look correct
Benchmark Duration
- 30s per route is reasonable
- With ~15-20 routes, total benchmark time ~10-15 minutes
- Consider reducing to 20s if CI time becomes an issue
Parallel Execution
- Currently sequential (one app at a time)
- Could parallelize Core and Pro benchmarks in separate jobs
- Current approach is fine for initial implementation
📝 Documentation
Strengths:
- Clear library-benchmarking.md with trade-off analysis ✅
- Good inline comments ✅
- Clear parameter documentation in workflow ✅
Missing:
- How to run benchmarks locally (beyond env vars)
- How to interpret results and identify regressions
- What to do when benchmarks fail
Recommendation: Add a benchmarks/README.md with:
- Local execution instructions
- How to debug failures
- How to add new benchmark routes
- How to interpret github-action-benchmark results
🔒 Security Considerations
Issues Identified:
- Password in source (covered above under CRITICAL)
- SECRET_KEY_BASE in workflow
  - 'dummy-secret-key-for-ci-testing-not-used-in-production'
  - This is fine for benchmarking, clearly marked as dummy ✅
- Pro License Secret
  - Using GitHub Secrets properly ✅
  - Secret name: REACT_ON_RAILS_PRO_LICENSE_V2
✨ Recommendations Summary
Before Merge (High Priority):
- ✅ Add security comment to password extraction function
- ✅ Verify CHANGELOG.md placement is correct (it is, in Developer section)
- ⚠️ Consider tightening alert threshold from 150% to 125-130% after baseline
Follow-up (Medium Priority):
- Add benchmarks/README.md with usage instructions
- Add unit tests for benchmark_helpers.rb
- Consider using PID tracking instead of pkill -9 with broad patterns
- Monitor initial benchmark runs and adjust thresholds/duration as needed
Future Enhancements (Low Priority):
- Parallelize Core and Pro benchmarks in separate jobs
- Add multiple samples per route to reduce variance
- Consider adaptive warm-up strategy
🎉 Conclusion
This is excellent work with thoughtful design decisions and good engineering practices. The code is clean, well-documented, and properly integrated with CI.
Recommendation: Approve with minor comments addressed.
The identified issues are mostly minor, with the password handling being the only security concern (and it's mitigated by being test-only). The PR is ready to merge once the security comment is added.
Great job on this contribution! 🚀
📊 Stats
- Lines added: 1,742
- Lines removed: 39
- Files changed: 30
- New workflows: 1 (benchmark.yml)
- New benchmarking scripts: 4
- Documentation files: 1
Review performed following CLAUDE.md guidelines and React on Rails best practices.
@coderabbitai Review.