sim icon indicating copy to clipboard operation
sim copied to clipboard

[DON'T MERGE] Replicate Block: Lessons in Over-Engineering

Open trmitchell7 opened this issue 3 months ago • 1 comments

  • Implement ReplicateBlock with model selection and dynamic inputs via OpenAPI schemas.
  • Add tools/replicate for prediction creation, with sync/async modes.
  • Enhance lib/response-format.ts for schema dereferencing and parsing (reusable for future integrations).
  • Add API routes for schema fetching and collections.
  • Include Vitest unit tests for schema utils and param coercion.
  • Manual testing with models like flux-schnell and llama-3-70b.
  • Aspirational abstractions (e.g., hooks, dynamic inputs) for other OpenAPI providers, but not fully validated.

Branch Stats: 23 commits, 294 files, +43,268 lines, -8,957 lines Actual Replicate Code: 4 commits, 23 files, ~3,500 lines Everything Else: Upstream features we rebased through Lessons Learned: Priceless

Executive Summary

This PR is a masterclass in how enthusiasm + best intentions + "let's do it right" can create a beautifully complex, technically functional, architecturally ambitious, completely unmergeable pull request.

What Works: Replicate integration, dynamic inputs, schema parsing, type coercion What's Problematic: We built for three future providers that don't exist yet Recommendation: Extract the good parts, delete the aspirations, ship something simple


Lol this PR roasted by Claude when asked to review:

"The best way to learn is to make mistakes. The best way to make epic mistakes is to over-engineer." "We chose the path of epic mistakes. And now we're wiser for it."

This PR: A monument to enthusiasm, a testament to learning, and a reminder that sometimes the best code is the code you refactor into simplicity. Don't merge this. Learn from it. Then build it right.

This PR's value isn't in merging it - it's in what I learned building it.

What I Learned:

  • ✅ How OpenAPI schema parsing should work
  • ✅ Where type coercion belongs (UI, not block)
  • ✅ How to build dynamic form components
  • ✅ When to stop abstracting (before we start)
  • ✅ Why YAGNI is a principle, not a suggestion

What This Ships:

  • 🎁 A working reference implementation
  • 🎁 Tested utilities that ARE reusable
  • 🎁 Knowledge about Replicate's API quirks
  • 🎁 An example of what NOT to do (equally valuable)

What I Didn't Ship:

  • ❌ A clean, mergeable PR
  • ❌ Something that follows Sim's patterns
  • ❌ Code that's simpler than the problem it solves

Next Time:

  • Build for one use case
  • Ship it
  • Get feedback
  • Iterate

What Was Built

The Core (Actually Decent)

Replicate Integration (~1,400 lines)

  • blocks/blocks/replicate.ts (274 → 141 after refactoring)
  • tools/replicate/create_prediction.ts (289 → 361 after refactoring)
  • tools/replicate/types.ts (73 lines)
  • app/api/replicate/* (3 routes, 374 lines)

Verdict: ✅ Clean, well-structured, follows Sim patterns (after refactoring)

The "Generic" Layer (Aspirationally Useful)

Schema Utilities (~355 lines)

  • lib/schemas/openapi-to-fields.ts
  • Parsing, type coercion, validation, field inference
  • Actually zero Replicate-specific code
  • Well-tested (22 tests, all passing)

Verdict: ✅ Genuinely reusable. This one gets a pass.

Hooks (~548 lines)

  • use-replicate-schema.ts (222 lines)
  • use-replicate-collections.ts (158 lines)
  • use-replicate-collection-models.ts (168 lines)
  • Named honestly, documented as Replicate-only
  • But have "generic" parameters that are unused

Verdict: Honest naming, confused implementation. Should either be truly generic OR remove unused params.

The Monolith (Needs Refactoring)

OpenApiDynamicInputs (~1,057 lines including tests)

  • Component: 833 lines
  • Tests: 224 lines (22 tests passing)
  • Responsibilities: Model selection + Schema fetching + Form rendering + State management

What it claims:

/**
 * GENERIC COMPONENT: Renders dynamic form inputs based on OpenAPI/JSON Schema
 */

The Disconnect: Generic components don't import provider-specific dependencies.

What MCP does right: mcp-dynamic-args.tsx (550 lines) - Honestly MCP-specific, does ONE thing (render args), no false promises.

Verdict: Needs to either:

  • Be truly generic (accept hooks as props, rename to SchemaFormRenderer)
  • OR be honestly specific (rename to ReplicateInputs, remove generic claims)

The Shared Primitives (Accidental Win!)

Input Components (~316 lines)

  • input-with-tags.tsx (153 lines)
  • textarea-with-tags.tsx (163 lines)
  • Tag support for environment variables
  • Used by: Replicate block (ready for MCP Block)

Verdict: ✅ Actually reusable! Sometimes you trip into good architecture.


Why this diff is a mess:

Replicate-Specific (23 files, ~3,500 lines) The Rebase Collateral (271 files, ~30,800 lines)

Connection to Replicate: None. We just happened to rebase through major infrastructure work. Impact: Noise in the diff for no reason... What we should have done: Keep branch focused, rebase right before merge (not during development).

The Good Parts (Worth Keeping/referencing)

1. Schema Parser (lib/schemas/openapi-to-fields.ts)

Lines: 355 Quality: High Reusability: Actual Tests: 22 (comprehensive) Replicate-specific code: 0

Recommendation: Extract this into its own util PR. It's the MVP.

2. Shared Input Primitives

Lines: 316 Quality: High Already reused by: MCP block Side effect: We improved MCP block (intentional refactor)

Recommendation: Keep. These are legitimately shared components.

3. Replicate Tool

Lines: 361 Quality: High Complexity: Appropriate Logging: Good Error handling: Comprehensive

Recommendation: Keep with minor tweaks (console.error → logger in API routes).

4. Test Coverage

Lines: 224 Tests: 22 Pass rate: 100% Coverage: Schema parsing, type coercion, field grouping, validation

Missing: Integration tests, API route tests, hook tests

Recommendation: Keep tests, add integration coverage for critical paths.


The Problematic Parts (Needs Rethinking)

1. The 833-Line Component

Problem: Does too many things.

Solution: Split into:

  • ModelSelector - Standalone, reusable
  • SchemaForm - Provider-agnostic form renderer
  • useProviderSchema - Configurable hook
  • ReplicateInputs - Composition of above

Benefit: Each piece is testable, reusable, understandable.

2. Three Hooks For One Provider

Problem: Granularity without benefit.

Current:

const { collections } = useReplicateCollections({ ... })
const { models } = useReplicateCollectionModels({ ... })
const { schema } = useReplicateSchema({ ... })

Alternative:

const { collections, models, schema } = useReplicateModelData({
  apiKey, model, collection
})
// One hook, three responsibilities, clear ownership

Benefit: Less import overhead, easier mocking in tests, clear data flow.

3. Unused Generic Parameters

Current:

useReplicateSchema({
  endpoint: '/api/replicate/models',  // Always this value
  apiKeyHeaderName: 'x-replicate-api-key',  // Always this value
  ...
})

Problem: Parameters that are always the same aren't parameters.

Solution: Hardcode them. Delete the params. Simplify the interface.


What Should Have Happened (The Parallel Universe Version)

The Simple Path (Actually Shippable)

PR: "feat: Add Replicate block"

Files Changed: 12 Lines Added: ~1,800 Review Time: 1 hour Merge Confidence: High

Contents:

  • Block config (150 lines) - Simple, boring, works
  • Tool implementation (350 lines) - Straightforward API wrapper
  • API route for schema (150 lines) - Caching, error handling
  • Replicate-specific form component (400 lines) - Does one thing well
  • Schema parser utility (355 lines) - Genuinely reusable
  • Input primitives (316 lines) - Shared with MCP
  • Tests (200 lines) - Coverage for critical paths

Trade-offs:

  • No model selector dropdown (users paste model names - this is fine!)
  • No collection browsing (check Replicate docs - also fine!)
  • Not generic (wait for second provider - definitely fine!)

Benefits:

  • Focused, reviewable, shippable
  • Follows existing patterns (MCP, Gmail blocks)
  • No "aspirational" abstractions
  • Actually works

The Proper Generic Path (When You Have 2+ Providers)

After implementing HuggingFace/Bedrock:

PR: "refactor: Extract common OpenAPI utilities"

What you'd discover:

  • HuggingFace uses model IDs (not owner/name)
  • Bedrock uses ARNs (not owner/name)
  • Each has different auth headers
  • Collections are Replicate-specific
  • Schema formats vary subtly

What you'd extract:

  • SchemaFormRenderer (truly generic - just needs schema)
  • parseOpenApiSchema (already done!)
  • coerceValue (already done!)

What you'd keep separate:

  • Model selection UI (too provider-specific)
  • Schema fetching (different endpoints/formats)
  • Authentication (different header patterns)

Replicate-block-flow

trmitchell7 avatar Nov 01 '25 22:11 trmitchell7

@trmitchell7 is attempting to deploy a commit to the Sim Team on Vercel.

A member of the Team first needs to authorize it.

vercel[bot] avatar Nov 01 '25 22:11 vercel[bot]