From single-model direct connections to dynamic multi-provider routing: what we learned and the architecture decisions we landed on.
Interaction connects to over a dozen model providers. Users never need to know whether a request went to Claude or GPT — the system selects automatically based on task type, latency expectations, and quota state.
The routing layer is built around the ModelInfo table: each model carries attributes like minTier (access threshold), costTier (billing weight), supportsThinking, and useResponsesApi. The router scores and ranks models at runtime using these attributes.
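As a rough illustration of attribute-based ranking, here is a minimal sketch. Only the four attribute names (minTier, costTier, supportsThinking, useResponsesApi) come from the description above. The `id` field, the `RouteRequest` shape, the sample rows, and the "cheapest eligible model first" scoring rule are all assumptions made for the example, not the production scoring function.

```typescript
// Hypothetical row shape: only minTier, costTier, supportsThinking and
// useResponsesApi are named in the text; `id` is added for illustration.
interface ModelInfo {
  id: string;
  minTier: number;           // access threshold: lowest user tier allowed
  costTier: number;          // billing weight: higher means more expensive
  supportsThinking: boolean;
  useResponsesApi: boolean;  // carried along; not used by this sketch's scoring
}

// Hypothetical request descriptor (an assumption, not from the source).
interface RouteRequest {
  userTier: number;
  needsThinking: boolean;
}

// Filter out models the request cannot use, then rank the rest.
// Assumed scoring rule: lower costTier ranks first.
function rankModels(models: ModelInfo[], req: RouteRequest): ModelInfo[] {
  return models
    .filter((m) => m.minTier <= req.userTier)
    .filter((m) => !req.needsThinking || m.supportsThinking)
    .map((m) => ({ model: m, score: -m.costTier }))
    .sort((a, b) => b.score - a.score)
    .map((entry) => entry.model);
}

// Made-up sample rows with placeholder ids.
const exampleModels: ModelInfo[] = [
  { id: "model-a", minTier: 0, costTier: 1, supportsThinking: false, useResponsesApi: false },
  { id: "model-b", minTier: 1, costTier: 3, supportsThinking: true, useResponsesApi: true },
  { id: "model-c", minTier: 2, costTier: 5, supportsThinking: true, useResponsesApi: false },
];

const rankedForThinking = rankModels(exampleModels, { userTier: 1, needsThinking: true });
const rankedForTopTier = rankModels(exampleModels, { userTier: 2, needsThinking: false });
```

A real scorer would presumably also weigh task type, latency expectations, and quota state; those inputs are left out here to keep the sketch small.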
The hard part was not writing the routing logic but balancing silent fallback against user-perceived consistency. We settled on silent downgrade: the router falls back without interrupting the user, and the UI shows a "used model X" hint after the fact.
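The silent-downgrade decision can be sketched as a small selection step that records which model was actually used. This is a hedged illustration: the `Selection` type, the `selectWithFallback` function, and the `isAvailable` callback are hypothetical names, and showing the hint only when a downgrade happened is an assumption about the UI behaviour.

```typescript
// Hypothetical result type: what the UI would need to render the hint.
interface Selection {
  usedModel: string;
  downgraded: boolean;   // true when the first-choice model was skipped
  hint: string | null;   // post-hoc text for the UI, or null if no downgrade
}

// Walk the ranked candidates and take the first one that is available.
// `isAvailable` stands in for whatever quota or health check the router uses.
function selectWithFallback(
  rankedModelIds: string[],
  isAvailable: (modelId: string) => boolean,
): Selection {
  const index = rankedModelIds.findIndex(isAvailable);
  if (index === -1) {
    throw new Error("no available model");
  }
  const usedModel = rankedModelIds[index];
  const downgraded = index > 0;
  return {
    usedModel,
    downgraded,
    // Assumption: the hint is only shown when a fallback actually occurred.
    hint: downgraded ? `used model ${usedModel}` : null,
  };
}

// Example: the preferred model is out of quota, so the router falls back.
const exhausted = new Set(["model-b"]);
const selection = selectWithFallback(
  ["model-b", "model-a"],
  (id) => !exhausted.has(id),
);
```

Returning the hint alongside the selection keeps the request path uninterrupted while still giving the UI enough information to tell the user what happened afterwards.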