Commit Messages

Generative UI in Rails with RubyLLM

2026-05-21T00:00:00+00:00

Chat is a natural interface for LLMs, and a lot of things work well inside it. But visual responses — cards, buttons, choices, inline widgets — have already become a baseline user expectation: tables and charts in ChatGPT answers, quick replies in support bots, forms and confirmations in banking assistants.

I want to give the model that same freedom of form in my Rails app — without letting it generate the HTML directly. Let the LLM choose what to show; the application still decides how to render it.

Plain chat: text

Let’s spin up a chat on RubyLLM and give it one ordinary tool right away — a weather data source:

rails new chat_app --css tailwind
cd chat_app
bundle add ruby_llm
bin/rails generate ruby_llm:install
bin/rails db:migrate
bin/rails generate ruby_llm:chat_ui
bin/rails g ruby_llm:tool Weather

class WeatherTool < RubyLLM::Tool
  description "Get current weather for a location."

  params do
    number :latitude
    number :longitude
    string :location
  end

  def execute(latitude:, longitude:, location:)
    Weather::OpenMeteo.fetch(latitude:, longitude:, location:).to_json
  end
end

Now the chat can fetch data — and by default it answers with text.

The most direct way to make that answer more visual is to render the tool result nicely. That’s how many chat-based agents work: instead of raw JSON the user sees a card with the result of the tool the model called.

<%# app/views/messages/tool_results/_weather.html.erb %>
<% weather = JSON.parse(tool.content).symbolize_keys %>
<%= render "components/weather", **weather %>

Tool-result UI answers the question “what did the agent just do?”. Generative UI answers “what form of response is most useful to the user right now?”

Showing tool results is useful: the user sees which tools the model called and what data it worked with. But the more internal steps an agent takes, the more a detailed log of its work starts to crowd the main interface. Generative UI solves a different problem: instead of painting in the whole execution trail, it lets the model choose the form of the final answer.

What if, instead of visualizing the result of WeatherTool, we wanted the model itself to ask for a weather card to be shown?

Generative UI with Per-Component Tools

For the view layer to render a component, it needs two things: a component name and attributes (often called props). A tool call already has both: the tool name and its arguments. That’s the first important shift — the data for the UI lives not in the tool’s result, but in the arguments of the call itself. Which means we can introduce a dedicated tool for this component: one that computes nothing, and just serves as a render signal.

class WeatherWidgetTool < RubyLLM::Tool
  description "Render a weather widget inline in the chat."

  params do
    string :location
    number :temperature
    string :unit, required: false
    number :wind, required: false
    number :humidity, required: false
  end

  def execute(location:, temperature:, unit: "c", wind: nil, humidity: nil)
    errors = []
    # validate attributes...

    errors.any? ? { status: "invalid", errors: }.to_json : { status: "ok" }.to_json
  end
end
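The `# validate attributes...` step is left open above. As a sketch, assuming the widget only needs a few sanity checks (the rules below are my own illustration, not RubyLLM behavior), the validation can live in a plain Ruby object that is easy to test outside the tool:

```ruby
# Hypothetical validator for WeatherWidgetTool arguments.
# The rules are illustrative assumptions; adapt them to your component.
module WeatherWidgetValidator
  UNITS = %w[c f].freeze

  # Returns an array of human-readable error strings (empty when valid).
  def self.call(location:, temperature:, unit: "c", wind: nil, humidity: nil)
    errors = []
    errors << "location must not be blank" if location.to_s.strip.empty?
    errors << "temperature must be a number" unless temperature.is_a?(Numeric)
    errors << "unit must be one of: #{UNITS.join(', ')}" unless UNITS.include?(unit)
    errors << "wind must be >= 0" if wind && (!wind.is_a?(Numeric) || wind.negative?)
    if humidity && (!humidity.is_a?(Numeric) || !(0..100).cover?(humidity))
      errors << "humidity must be between 0 and 100"
    end
    errors
  end
end
```

Inside `execute`, the placeholder would then become `errors = WeatherWidgetValidator.call(location:, temperature:, unit:, wind:, humidity:)`, and the error strings go back to the model so it can retry with corrected arguments.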

The tool result here is deliberately short. Its job isn’t to carry UI, only to tell the model whether the arguments were accepted. If you put HTML or the full component data into the tool result, the model will see it on the next step and is very likely to start paraphrasing what’s already on screen. RubyLLM’s halt, which ends the turn right after the tool runs, looks tempting, but it trades one problem for another: the history ends up with an “assistant message” that the assistant never actually wrote.

The view’s only job left is to read the arguments of the parent tool call and render an allow-listed component:

<%# app/views/messages/tool_results/_weather_widget.html.erb %>
<% result = JSON.parse(tool.content.to_s) rescue {} %>
<% args = tool.parent_tool_call.arguments.to_h.symbolize_keys %>

<% if result["status"] == "ok" %>
  <%= render "components/weather",
        **args.slice(:location, :temperature, :unit, :wind, :humidity) %>
<% end %>

One small catch: even with a minimal tool result, the model still tends to add a prose reply under the rendered widget. The cleanest fix is to tell it in system instructions that the widget tool call is the answer — we’ll come back to the same trick with generate_ui below.

For a single component this already works well: the UI component lives outside the model, and the model just picks it and fills in the attributes. But the approach hits a ceiling fast.

The Limits of Per-Component Tools

The weather card is useful on its own, but the next step is more interesting: the model can pick on the fly between cards, containers, buttons, forms, and confirmations — and assemble a response shaped to the task. The same primitives can combine into useful scenarios the developer never planned for. For that you need a shared UI vocabulary, not a separate tool for every new scenario.

Ask the model to compare the weather in two cities, and it calls WeatherWidgetTool twice and shows two cards. But comparison as a structure never appears: the cards just stack on top of each other.

Actual: just widgets

Target: composed components

Another familiar case is clarification. A user asks the assistant: “What is the weather in Springfield?” The assistant thinks, calls some tools, and then, instead of answering, writes: “Which Springfield did you mean?” Springfield is ambiguous, so the clarification has to happen. The real question is how the user answers it: by typing the city or state all over again, or by picking from options with a single click. The interface can do better here: show a picker right away, so that the user’s next turn is one tap. The renderer just shows the buttons; turning a click into a new message is the application’s job.

Actual: just text

Target: interactive picker

Both cases can be solved with dedicated tools: WeatherComparisonTool for comparison, CityPickerTool for clarification, and sooner or later ConfirmationTool for confirmations. For a simple product this is a perfectly fine path. The limit shows up later: the catalog starts to grow not by the number of reusable primitives, but by the number of scenarios. Every new UX gesture demands a new top-level tool.

One Tool to Compose Them All

This points to a different move: keep one general-purpose tool for generative UI, and pass the entire response structure as its argument instead of a single component. Let’s call it generate_ui. Same idea — tool call as UI payload — only now the arguments hold not one card, but the whole component tree.

The structure is easiest to pass flat: a single components array, one component with id: "root", and connections between components as plain id references. The component catalog shows up directly in the payload: each node names its own component and carries the attributes declared for it.

{
  "components": [
    {
      "id": "root",
      "component": "Comparison",
      "items": ["novi-sad", "belgrade"]
    },
    {
      "id": "novi-sad",
      "component": "WeatherCard",
      "location": "Novi Sad",
      "temperature": 22,
      "unit": "c"
    },
    {
      "id": "belgrade",
      "component": "WeatherCard",
      "location": "Belgrade",
      "temperature": 24,
      "unit": "c"
    }
  ]
}

This is still a tree in normalized form: one root, references only by id, no cycles, no orphans, no child reused in two places. But now the link between the wire format and the catalog is direct. If a node says component: "WeatherCard", it carries that component’s attribute schema. If Comparison.items is declared as a list of component references, the application can check not just the array type but that the items inside really are weather cards.

With this shape, both earlier cases become compositions rather than new top-level tools: Comparison wraps two WeatherCards, and Picker wraps several QuickReplys.

`Picker` + three `QuickReply`s give a clarification flow without a separate `CityPickerTool`.

`Comparison` + two `WeatherCard`s give a side-by-side comparison of two cities.

The simplest hand-rolled version looks like a single presentation tool. It fetches nothing and has no side effects. It accepts a UI tree, validates it, and returns a short status.

In the example below, params describes the shape of individual components, and UiTree.validate would be the application’s validator for whole-tree invariants: exactly one root, all references resolve, no cycles, no orphans, no child reused in two places. COMPONENT_CATALOG here is the same kind of hand-rolled Ruby structure that lists the available components and their relationships. The code of the validator and that structure isn’t what matters here; what matters is the boundary itself — the schema helps the model hit the expected shape, but the application still treats the result as an external payload before rendering.

class GenerateUiTool < RubyLLM::Tool
  description <<~TEXT
    Render inline UI from the available component catalog.

    Arguments:
    - components is a flat array of component instances.
    - One component must have id="root".
    - Component reference fields point to other component ids.
    - The accepted payload must form one rooted tree.

    Available components:
    - WeatherCard: show current weather for one location.
    - Comparison: compare several weather cards side by side.
    - Picker: ask the user to choose between ambiguous options.
    - QuickReply: clickable option that sends a short reply back into the chat.
  TEXT

  params do
    array :components do
      any_of do
        object do
          string :id
          string :component, enum: ["WeatherCard"]
          string :location
          number :temperature
          string :unit, enum: %w[c f]
        end

        object do
          string :id
          string :component, enum: ["Comparison"]
          array :items, of: :string, min_items: 2
        end

        # ...and the same kind of schema for Picker and QuickReply.
      end
    end
  end

  def execute(**args)
    errors = UiTree.validate(
      args.fetch(:components),
      catalog: COMPONENT_CATALOG # same catalog, now used by the server validator
    )

    errors.any? ? { status: "invalid", errors: }.to_json : { status: "ok" }.to_json
  end
end

If the tree is invalid, the model gets a short list of errors and can correct itself on the next step. If the tree is valid, the application can take the arguments of that call and render them as the response to the user.
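The article deliberately leaves `UiTree.validate` and `COMPONENT_CATALOG` abstract. Here is a minimal sketch of the whole-tree checks, assuming a hypothetical hand-rolled catalog hash that lists, for each component, which fields hold child ids and which component types they may point to, and assuming nodes arrive as string-keyed hashes parsed from JSON. This is not the GenerativeUI gem’s format.

```ruby
# Hypothetical catalog: component name => { reference field => allowed child types }.
COMPONENT_CATALOG = {
  "WeatherCard" => {},
  "Comparison"  => { "items" => ["WeatherCard"] },
  "Picker"      => { "options" => ["QuickReply"] },
  "QuickReply"  => {}
}.freeze

module UiTree
  # Returns error strings; an empty list means the payload forms one rooted tree.
  def self.validate(components, catalog:)
    errors = []
    nodes = {}
    components.each do |node|
      id = node["id"]
      errors << "duplicate id #{id}" if nodes.key?(id)
      errors << "unknown component #{node['component']} (#{id})" unless catalog.key?(node["component"])
      nodes[id] = node
    end
    return errors + ["missing root component"] unless nodes.key?("root")

    # Record each child's single parent; check references resolve and types match.
    parent_of = {}
    nodes.each do |id, node|
      (catalog[node["component"]] || {}).each do |field, allowed|
        Array(node[field]).each do |child_id|
          child = nodes[child_id]
          if child.nil?
            errors << "#{id}.#{field} references unknown id #{child_id}"
          elsif !allowed.include?(child["component"])
            errors << "#{id}.#{field} cannot contain #{child['component']}"
          elsif parent_of.key?(child_id)
            errors << "#{child_id} is used in more than one place"
          else
            parent_of[child_id] = id
          end
        end
      end
    end
    errors << "root must not be referenced" if parent_of.key?("root")

    # Walk down from root; anything not reached is an orphan or sits in a cycle.
    reached = []
    stack = ["root"]
    until stack.empty?
      id = stack.pop
      next if reached.include?(id)
      reached << id
      stack.concat(parent_of.select { |_child, parent| parent == id }.keys)
    end
    (nodes.keys - reached).each { |id| errors << "#{id} is not reachable from root" }
    errors
  end
end
```

Per-attribute rules such as `min_items: 2` stay with the per-component schema; the sketch only covers what a JSON Schema cannot express about the tree as a whole.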

This sketch shows the mechanics — and at the same time exposes what’s missing: a single place where the catalog is assembled. The catalog is needed by several parts of the system at once:

the LLM — so it knows from system instructions which components are available and when to use them;
the provider — to derive the JSON Schema for the generate_ui arguments;
the server — to validate the tree, component references, and attributes before rendering;
the view layer — to know how each component should be rendered.

If these four views live separately, the UI tool quickly turns into a set of conventions you have to keep in sync by hand. So we need a single Ruby catalog from which instructions, schema, validation, and render targets are all derived.
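To make the “single catalog” idea concrete, here is a minimal hand-rolled sketch, assuming a plain hash as the source of truth (the `CATALOG` shape and both helper names are hypothetical). The prompt text and the JSON Schema for the components array are both derived from the same entries, so they cannot drift apart.

```ruby
# Hypothetical single source of truth: descriptions and attribute types in one place.
CATALOG = {
  "WeatherCard" => {
    desc: "Show current weather for one location.",
    attributes: { "location" => "string", "temperature" => "number" }
  },
  "QuickReply" => {
    desc: "Clickable option that sends a short reply back into the chat.",
    attributes: { "label" => "string", "value" => "string" }
  }
}.freeze

# Text for the LLM: one line per component, taken from desc.
def catalog_instructions(catalog)
  catalog.map { |name, spec| "- #{name}: #{spec[:desc]}" }.join("\n")
end

# JSON Schema for the components array: anyOf with one object schema per component.
def catalog_schema(catalog)
  variants = catalog.map do |name, spec|
    properties = {
      "id" => { "type" => "string" },
      "component" => { "type" => "string", "enum" => [name] }
    }
    spec[:attributes].each { |attr, type| properties[attr] = { "type" => type } }
    {
      "type" => "object",
      "properties" => properties,
      "required" => properties.keys,
      "additionalProperties" => false
    }
  end
  { "type" => "array", "items" => { "anyOf" => variants } }
end
```

The same `CATALOG` could also feed the server-side validator and a name-to-partial map; the point is that adding a component means editing one entry, not four places.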

Same UX trick as with the widget tool earlier: the system instructions should tell the model that calling generate_ui is the answer — no final text needed. The wiring is shown in the system prompt below.

The `GenerativeUI` gem

GenerativeUI starts from a single idea: describe UI components once in a catalog, and derive everything else from it. In the current Rails/RubyLLM integration this catalog is wired up through the generate_ui tool, but the heart of the library isn’t the transport — it’s the catalog DSL. The demo app shows everything wired together end-to-end.

class ApplicationGenerativeCatalog < GenerativeUI::Catalog
  component "WeatherCard" do
    desc "Show current weather for one location."

    attributes do
      string :location
      number :temperature
      string :unit, enum: %w[c f]
      string :condition, required: false
      number :wind, required: false
      number :humidity, required: false
    end

    present_with :partial, "generative_ui/weather_card"
  end

  component "Comparison" do
    desc "Compare several weather cards side by side."

    attributes do
      many_components :items, only: "WeatherCard", min_items: 2
    end

    present_with :partial, "generative_ui/comparison"
  end

  component "QuickReply" do
    desc "Clickable option that sends a short reply back into the chat."

    attributes do
      string :label
      string :value
    end

    present_with :partial, "generative_ui/quick_reply"
  end

  component "Picker" do
    desc "Ask the user to choose between ambiguous options."

    attributes do
      string :prompt
      many_components :options, only: "QuickReply"
    end

    present_with :partial, "generative_ui/picker"
  end
end

This single declaration captures everything we used to keep in sync by hand:

the component name in the payload (WeatherCard, Comparison, Picker);
the description for the LLM (desc);
the attribute schema (attributes);
the structural relationships between components (one_component, many_components);
the render target for the application: convention-over-configuration by default, with present_with to point at a specific partial or component.

From this catalog the gem derives component descriptions for the LLM, the schema for generate_ui arguments, tree validation, and render targets.

The catalog can be registered as the default so you don’t have to pass it explicitly to every tool and render call:

# config/initializers/generative_ui.rb
GenerativeUI.configure do |config|
  config.catalog :default, "ApplicationGenerativeCatalog"
end

After that, the chat gets a catalog-bound tool:

tool = GenerativeUI::Tool.new

chat = Chat.create!
chat.with_instructions(<<~PROMPT)
  You are a helpful weather assistant.

  Tool guidance:
  - Use generate_ui when the answer should be shown as UI.
  - IMPORTANT: after calling generate_ui, do not add a final text answer.
    The tool call itself is the user-visible UI response.
PROMPT
chat.with_tool(tool)

chat.ask("Compare the weather in Novi Sad and Belgrade.")

GenerativeUI::Tool takes the selected catalog, merges its description into the tool description, and compiles the provider-facing JSON Schema for the arguments. At call time the tool validates the tree and returns a short result: { "status": "ok" } or { "status": "invalid", "errors": ... }.

In Rails views, the user-facing UI is rendered from the tool call’s arguments:

<%# app/views/messages/tool_calls/_generate_ui.html.erb %>
<% begin %>
  <%= render_generative_ui tool_call.arguments %>
<% rescue GenerativeUI::InvalidComponentTreeError => e %>
  <% ActiveSupport::Notifications.instrument(
       "invalid_tree.generative_ui",
       error: e,
       tool_call: tool_call
     ) %>
<% end %>

The tool’s status result is best hidden from the user:

<%# app/views/messages/tool_results/_generate_ui.html.erb %>
<%# intentionally empty: { "status": "ok" } is control data, not UI %>

By this point the tree has already passed the catalog + validator check. The partial doesn’t decide whether the component can be trusted; it just receives the validated attributes and turns them into HTML.

A leaf component gets plain locals:

<%# app/views/generative_ui/_weather_card.html.erb %>
<div class="rounded-2xl border p-4">
  <div><%= location %></div>
  <div class="text-3xl font-semibold"><%= temperature %>°<%= unit.upcase %></div>

  <% if condition.present? %>
    <div><%= condition %></div>
  <% end %>
</div>

A container component receives children that are already rendered:

<%# app/views/generative_ui/_comparison.html.erb %>
<div class="grid gap-4 md:grid-cols-2">
  <% items.each do |item| %>
    <%= item %>
  <% end %>
</div>

The partial is just one of several render paths. The gem can also render through ViewComponent or return a JSON representation of the tree; if you need something else, the application can register its own renderer.

One caveat worth naming: even with explicit system instructions, models sometimes still add a short prose answer to a turn that already produced a generate_ui or widget tool call. The behavior varies between providers, models, and even individual requests. If the duplication matters for your product, the application can handle it on the view side — for example, by suppressing trailing assistant text on turns that already rendered UI. It’s a small ergonomics layer, not a structural problem with the approach.
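A minimal sketch of that view-side filter, assuming messages are loaded in order and each exposes a role, its text, and the names of the tools it called (the `Msg` struct, the `UI_TOOLS` names, and `visible_messages` are hypothetical stand-ins for your own message model). Once an assistant turn has rendered UI, later assistant prose in the same turn is hidden until the next user message.

```ruby
# Hypothetical stand-in for a persisted chat message.
Msg = Struct.new(:role, :content, :tool_names, keyword_init: true)

# Assumed tool names for the UI tools from this article.
UI_TOOLS = %w[generate_ui weather_widget].freeze

# Returns the messages to display, dropping assistant text that follows a
# UI tool call within the same turn (a turn ends at the next user message).
def visible_messages(messages)
  ui_rendered = false
  messages.reject do |msg|
    case msg.role
    when "user"
      ui_rendered = false
      false
    when "assistant"
      calls = Array(msg.tool_names)
      hide = ui_rendered && !msg.content.to_s.strip.empty? && calls.empty?
      ui_rendered ||= calls.any? { |name| UI_TOOLS.include?(name) }
      hide
    else
      false # tool results and other roles are left to their own partials
    end
  end
end
```

Filtering at display time keeps the stored history intact, so the model still sees exactly what it produced.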

Where this leaves us

Generative UI today means very different things to different people: from polished rendering of tool results to an interface that rebuilds itself in real time around the user’s intent.

I tried to focus on something more down-to-earth: what you can do right now in an ordinary chat-based application — without a custom runtime, without HTML generated by the model, and without rebuilding the product from scratch. All it takes is to give the LLM not the whole screen, but a strictly described yet composable catalog of components.

In this approach the model doesn’t draw the interface directly. It picks from allowed primitives and assembles a response tree out of them. The application validates that tree and renders it with its own renderers. Because of that, the final UI isn’t tied to one platform: the same generative payload can be shown on the web through Rails partials, in a mobile app through native components, and in Telegram or WhatsApp — through their buttons, lists, and messages.
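As an illustration of the messenger case, here is a sketch that renders a validated `Picker` payload into parameters for a Telegram Bot API `sendMessage` call, with an inline keyboard of one button per `QuickReply`. It assumes the flat tree has already passed validation and that each `QuickReply.value` fits Telegram’s 1 to 64 byte `callback_data` limit; the helper name is hypothetical.

```ruby
# Hypothetical renderer: validated flat UI tree -> Telegram sendMessage params.
# Handles only a Picker root; a real renderer would dispatch on every component.
def telegram_params(components, chat_id:)
  by_id = components.to_h { |node| [node["id"], node] }
  root = by_id.fetch("root")
  raise ArgumentError, "unsupported root: #{root['component']}" unless root["component"] == "Picker"

  buttons = root["options"].map do |option_id|
    reply = by_id.fetch(option_id)
    # callback_data must be 1-64 bytes in the Bot API, so keep values short.
    { text: reply["label"], callback_data: reply["value"] }
  end

  {
    chat_id: chat_id,
    text: root["prompt"],
    reply_markup: { inline_keyboard: buttons.map { |button| [button] } } # one button per row
  }
end
```

As on the web, turning the button press (a `callback_query` update) back into a user message is the application’s job.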

Generative UI is no longer just “a chat with widgets,” but it hasn’t settled into one canonical pattern yet either. A good way to find out where its real shape lies is to assemble a small catalog of components and try it out in a live conversation.

Share Extension Auth in iOS 18: Four Approaches Compared

2026-01-14T00:00:00+00:00

For years, the Share icon on mobile was just visual noise to me: something that appeared everywhere but that I never used. Then something clicked: Share is like a Unix pipe. You take content from one app and send it to another in one action. No copy-paste, no saving files, no searching for “import”; just pick the next tool and continue the chain. The only difference is that instead of a stream, Share passes a package (a link, text, a file, or several photos).

share ≈ pipe

The problem is that Unix pipes live inside a single user environment, while Share crosses app boundaries: separate sandboxes, separate processes, separate security rules. So the task “pass data” quickly becomes “pass data in the user’s context”: the receiver needs to know who the current user is and where to put this package. In my case: a user shares a link to my app, but processing happens on the server, so the user needs to be authenticated first. What do you do when a Share request arrives but there’s no session — or the session isn’t available right now?

The naive solution: “just open the app”

The first idea that comes to mind: the extension detects that the user is not logged in and opens the main app. The user logs in, goes back to Safari, taps Share again — and now everything works.

In code, it looks simple:

extensionContext?.open(URL(string: "dropkind://login")!) { success in
    self.extensionContext?.completeRequest(returningItems: nil)
}

Or via Universal Links:

extensionContext?.open(URL(string: "https://dropkind.app/auth")!)

This pattern worked for years. Share Extension acted as a launcher: it detected a problem, passed control to the main app, and closed.

In iOS 18, this stopped working.

What Apple broke (and why it’s not a bug)

When you try to open an app from a Share Extension in iOS 18, you get an error:

LSApplicationWorkspaceErrorDomain Code=115

This is not a bug or a temporary regression. Apple explicitly states that app extensions are not allowed to open URLs directly; runtime workarounds are being blocked. If you need the user’s attention — use a local notification.

Cold start / Warm start

In my experience, extensionContext.open(...) sometimes works when the app is already in memory — but you can’t control or predict that, and it’s not documented. The user might have closed the app an hour ago, and the call will silently fail.

What Apple recommends instead of openURL

On the Apple Developer Forums, Quinn from Apple Developer Technical Support writes:

“If your app extension needs to get the user’s attention, do that by posting a local notification.”

The idea is that an extension should not be a trampoline to the main app — it should handle the task on its own. If it can’t — it should just tell the user via a local notification.

Old hacks no longer work

If you googled this problem before, you probably saw the UIResponder chain “hack”:

// THIS NO LONGER WORKS
var responder: UIResponder? = self
while responder != nil {
    if let application = responder as? UIApplication {
        application.perform(#selector(openURL(_:)), with: url)
        break
    }
    responder = responder?.next
}

Starting with iOS 18, this code throws a sandbox error (NSOSStatusErrorDomain Code=-54). The system checks the call stack and blocks attempts to bypass restrictions.

How Apple’s own apps do it

Notes Share Extension

It’s interesting to see how Apple solves this problem in their own apps. Here’s what I found:

Notes: when sharing a link to Notes, the extension shows a folder picker, you tap “Save” and… you stay in Safari. The note is saved via background sync, and you only find out when you open Notes.

Photos: same approach — the extension saves to a shared container, sync happens in the background.

Messages: if you select a contact from “suggestions” in the Share Sheet, the system itself opens Messages. This is a system-level path, not openURL from an extension.

So Apple’s apps either don’t open the main app at all, or they use privileged system mechanisms that are not available to third-party developers.

Working solutions

Solution A: Shared Keychain — the extension handles auth on its own

Main App saves token to Shared Keychain; Share Extension reads it directly

The best solution is to make the extension autonomous. If the extension has access to the auth token, it can send data to the server by itself, without touching the main app.

The idea is simple: the main app saves the token to Keychain with a shared access group when the user logs in, and the Share Extension reads it and makes the API request itself.

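// In the main app during login:
let tokenQuery: [String: Any] = [
    kSecClass as String: kSecClassGenericPassword,
    kSecAttrAccount as String: "authToken",
    kSecAttrAccessGroup as String: "group.com.dropkind.shared"
]
// Delete any previous token first: SecItemAdd returns errSecDuplicateItem
// if an item with the same account and access group already exists.
SecItemDelete(tokenQuery as CFDictionary)
var addQuery = tokenQuery
addQuery[kSecValueData as String] = Data(token.utf8)
let addStatus = SecItemAdd(addQuery as CFDictionary, nil)
// addStatus == errSecSuccess means the extension can now read the token

// In Share Extension:
let query: [String: Any] = [
    kSecClass as String: kSecClassGenericPassword,
    kSecAttrAccount as String: "authToken",
    kSecAttrAccessGroup as String: "group.com.dropkind.shared",
    kSecReturnData as String: true,
    kSecMatchLimit as String: kSecMatchLimitOne
]
var result: AnyObject?
let status = SecItemCopyMatching(query as CFDictionary, &result)
let savedToken: String? = status == errSecSuccess
    ? (result as? Data).flatMap { String(data: $0, encoding: .utf8) }
    : nil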

The main advantage is perfect UX: the user taps “Save” and stays where they were. You just need to set up Keychain Sharing between targets and make sure the token is available (not expired, not revoked).

Important detail: Keychain can survive app deletion, so you can’t rely on automatic cleanup. Imagine this scenario: a user logged in, deleted the app, created a new account a year later (for example, in the web version of your service), installed the app, and immediately used Share — the extension would find the old token and send data to the wrong user.

The fix: App Group UserDefaults, unlike Keychain, gets deleted with the app. Store currentUserId in UserDefaults and check for it before using the token:

// In Share Extension:
let shared = UserDefaults(suiteName: "group.com.dropkind.shared")
guard shared?.string(forKey: "currentUserId") != nil else {
    // UserDefaults is empty → app was reinstalled → require login
    return
}
// Only now trust the token from Keychain

Solution B: App entry via local notification

Share Extension saves data and schedules a notification; Main App completes the flow when user taps

If the extension can’t open the app programmatically, let the user do it by tapping a local notification:

Extension detects there’s no session
Saves data to temporary storage (App Group UserDefaults)
Shows a local notification: “Tap to log in and save the link”
User taps — this is a legitimate action, the system allows the app to launch

// In Share Extension:
func showLoginNotification(pendingURL: URL) {
    // Save data
    let defaults = UserDefaults(suiteName: "group.com.dropkind.shared")
    defaults?.set(pendingURL.absoluteString, forKey: "pendingShare")

    // Schedule notification
    let content = UNMutableNotificationContent()
    content.title = "Login required"
    content.body = "Tap to log in to DropKind and save your link"
    content.userInfo = ["action": "completeShare"]

    let request = UNNotificationRequest(
        identifier: "loginRequired",
        content: content,
        trigger: nil // Show immediately
    )
    UNUserNotificationCenter.current().add(request)
}

The implementation is simple and works reliably. The downside is an extra step for the user, and you need notification permission.

In practice: robust API, flaky UX

I tried this approach and found it unreliable in practice:

Users deny notification permission reflexively
Focus Mode / Do Not Disturb suppresses notifications silently
Even delivered notifications get dismissed without reading
Too many steps between “tap Share” and “complete the action”

Local notifications are robust from iOS’s standpoint: Apple recommends them, and the API is stable and documented. But they’re flaky from a UX standpoint, with too many points of failure for the user to actually complete the share.

Solution C: OAuth inside the extension

Share Extension handles OAuth flow internally; token saved for future use

The most complex option: implement full OAuth login right in the Share Extension using ASWebAuthenticationSession. The user authenticates without leaving the Share Sheet, the token is saved to Shared Keychain, and future shares work autonomously.

The UX is seamless — but the implementation is a lot of work. Not all OAuth providers play nice with extensions, and ASWebAuthenticationSession has quirks when running outside the main app context.

Solution D: UIWindowScene.open() via responder chain

Share Extension walks the responder chain to find UIWindowScene and calls open()

Remember the old UIResponder chain hack that Apple blocked? It turns out there’s a variation that still works on iOS 18+. Instead of walking up to UIApplication, you walk up to UIWindowScene and call its open(_:options:completionHandler:) method:

static func openViaResponderChain(
    from viewController: UIViewController,
    url: URL,
    completion: ((Bool) -> Void)? = nil
) {
    var responder: UIResponder? = viewController

    while let current = responder {
        if let scene = current as? UIWindowScene {
            scene.open(url, options: nil) { success in
                completion?(success)
            }
            return
        }
        responder = current.next
    }
    completion?(false)
}

This works because UIWindowScene.open() isn’t subject to the same restrictions as UIApplication.open(). The system doesn’t block it — at least not yet.

Caveat: This is undocumented behavior. Apple could block it in a future iOS version, just like they blocked the UIApplication approach. Always have a fallback ready.

What I chose

DropKind is a simple app that sends articles and text to your Kindle. You find something interesting while browsing — share it to DropKind, and it lands on your e-reader. The Share Extension is the main entry point: most users discover content in Safari, not in the app itself. So a broken or clunky share flow means a broken product.

For DropKind, I chose a combination of solutions A and D, with a manual fallback. The main path is Shared Keychain: if the user is already logged in to the app, the extension picks up the token and works autonomously. If there’s no token — we try to open the app via UIWindowScene, and if that fails, we show an in-extension prompt.

Implementation details:

1) Separate share-token in Keychain. The main app gets a separate share-token and saves it to Shared Keychain. Share Extension uses this token for direct POST to the API.

2) user_id in App Group. Along with the token, we save user_id to App Group UserDefaults. This serves two purposes: the server verifies the token owner, and the presence of user_id itself is a marker that the app wasn’t reinstalled (UserDefaults gets deleted on uninstall, unlike Keychain).

3) If there’s no token — the extension saves data to App Group and attempts UIWindowScene.open() via the responder chain. This has worked reliably for me on iOS 18.

4) If UIWindowScene.open() fails — the extension shows an in-extension prompt asking the user to open the app manually. This is the final fallback, no notifications involved.

5) The main app finishes the job. On launch, the app checks for pending share data in the App Group, guides the user through login if needed, and completes the share.
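The order of these fallbacks is the important part, so here it is as a small platform-neutral model. It is written in Ruby to match the rest of this blog rather than Swift; every name is hypothetical, and in the real extension each step is the Swift code shown earlier.

```ruby
# Hypothetical model of the share-extension decision flow (steps 1-5 above).
ShareState = Struct.new(:keychain_token, :app_group_user_id, keyword_init: true)

# scene_open is a callable standing in for UIWindowScene.open(); it returns true or false.
def share_route(state, scene_open:)
  # Steps 1-2: trust the Keychain token only if App Group data survived (no reinstall).
  return :post_directly if state.keychain_token && state.app_group_user_id

  # Step 3: after saving the pending share to the App Group, try to open the app.
  return :opened_app if scene_open.call

  # Step 4: final fallback, ask the user to open the app manually.
  :show_in_extension_prompt
end
```

Step 5 happens in the main app on its next launch, regardless of which branch the extension took.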

Authenticated user

Unauthenticated: prompts to open app

App completes the pending share

This approach gives the best UX for most users (those already logged in), but doesn’t break for new users.

Going back to the pipe analogy: grep doesn’t ask you to configure anything — it just works with what it has. A Share Extension should aim for the same, handling auth silently whenever possible.

Dictionary-Quality Word Pronunciation Without Dictionary APIs

2025-12-19T00:00:00+00:00

When I started building word pronunciation features for my language learning app, the obvious first idea was to pull audio files from “reputable” dictionaries — Oxford, Cambridge, Collins, etc.

But I quickly ran into limitations:

Access. Getting an API key can be a hassle even for testing.
Rate limits. Restrictions on requests and pricing.
Caching. Storing audio locally is often prohibited — which is a dealbreaker for a learning app.
Vendor lock-in. Once you commit to a specific dictionary (its article structure, response formats, definition markup, pronunciation quirks), adding other languages becomes painful. Each language has its own dictionaries and formats, and stitching them together cleanly gets messy fast.

So I ended up with a solution I’d been avoiding: LLM-based Text-to-Speech. I’d tried similar things a year or so ago — back then, the quality wasn’t good enough for “dictionary-grade” pronunciation. But there’s been noticeable progress in both the models and available APIs since then: with the right setup, the results are now quite practical.

System Instructions: A Separate Channel for Style Control

Many LLM-based TTS APIs let you pass system instructions separately from the text to be spoken. This is useful because you can treat them as a “contract” for pronunciation style:

accent variant: British / American;
delivery: clear, neutral, steady pace — closer to a dictionary narrator than a voice actor.

That’s enough to get consistent “educational” audio instead of “theatrical line readings.”
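A sketch of keeping that contract in one place, assuming the user preference is a simple accent code (the `ACCENTS` table, the wording, and the method name are my own illustration, not tied to any particular TTS provider):

```ruby
# Hypothetical builder for the TTS "style contract" passed as system instructions.
ACCENTS = {
  "en-GB" => "Standard Southern British English",
  "en-US" => "General American English"
}.freeze

def tts_instructions(accent:)
  variant = ACCENTS.fetch(accent) { raise ArgumentError, "unsupported accent: #{accent}" }
  <<~TEXT
    Pronounce the text in #{variant}.
    Speak clearly and neutrally at a steady pace, like a dictionary narrator, not a voice actor.
    Do not add words, emotion, or emphasis that are not in the text.
  TEXT
end
```

Because the string depends only on the preference, it stays identical across requests, which also makes it a stable part of a cache key.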

Heteronyms: The Model Will Be Wrong “Sometimes,” but You Need “Never”

Then came a less obvious problem — heteronyms: words spelled the same but pronounced differently depending on context (part of speech, meaning, tense).

The classic example is read:

present: /riːd/
past: /rɛd/

You can try to tweak the system instructions so the model always picks the right variant from context — but reliability will still be hit or miss. And in a learning app, mispronunciation is a bad experience: the user will memorize the wrong pattern.

IPA Substitution: Explicit Phonetics Instead of Guessing

The most practical trick turned out to be simple: replace the ambiguous word with IPA (International Phonetic Alphabet).

Example:

We read (present) → We /riːd/
We read (past) → We /rɛd/

You turn ambiguous spelling into unambiguous phonetics, and TTS no longer “guesses” — it just pronounces exactly what you’ve specified.
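The substitution itself is a small string operation. This is a minimal sketch; the method name and the occurrence: parameter are hypothetical. It targets one specific occurrence because a single sentence can use read in both tenses:

```ruby
# Replace the Nth whole-word occurrence of `word` (case-insensitive)
# with its IPA wrapped in slashes; every other occurrence stays as-is.
def substitute_ipa(text, word, ipa, occurrence: 0)
  pattern = /\b#{Regexp.escape(word)}\b/i
  seen = -1
  text.gsub(pattern) do |match|
    seen += 1
    seen == occurrence ? "/#{ipa}/" : match
  end
end

sentence = "I read it yesterday and I read it daily"
substitute_ipa(sentence, "read", "rɛd")
# => "I /rɛd/ it yesterday and I read it daily"
substitute_ipa(sentence, "read", "riːd", occurrence: 1)
# => "I read it yesterday and I /riːd/ it daily"
```

The \b word boundaries keep longer words such as "Reading" untouched.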

IPA on Demand

You don’t want to generate IPA for all text all the time: it’s more expensive and more complex. So here’s the approach:

Check the text: does it contain any words from a small list of heteronyms (a candidate dictionary)?
If not — send it straight to TTS.
If yes — make an additional request to an LLM tuned for the short task “pick the correct pronunciation”:
- word + sentence/context;
- (optionally) structural hints from your context, e.g.: {pos: "verb", tense: "past"};
- output: IPA or a choice between variants.
Replace the word in the text with /ipa/.
Send the “phonetic-ready” text to TTS.

The key idea: the extra request only fires when there’s a real risk of mispronunciation.
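The on-demand gate can be sketched like this. It assumes the candidate list is a small in-memory set and that the pronunciation picker is injected as a callable. HETERONYMS, phonetic_ready, and the resolver's keyword arguments are hypothetical names, and no real LLM client is called:

```ruby
require "set"

# Hypothetical candidate list; in practice this would be a curated dictionary.
HETERONYMS = Set["read", "lead", "live", "wind", "tear", "bow", "record"].freeze

# Distinct heteronym candidates found in the text (lowercased).
def heteronym_candidates(text)
  text.downcase.scan(/[[:alpha:]]+/).select { |w| HETERONYMS.include?(w) }.uniq
end

# `resolver` is any callable (for example, a wrapper around a short LLM call)
# that receives word:, sentence:, hints: and returns bare IPA such as "rɛd".
def phonetic_ready(text, resolver:, hints: {})
  candidates = heteronym_candidates(text)
  return text if candidates.empty? # no ambiguity, so no extra request

  candidates.reduce(text) do |result, word|
    ipa = resolver.call(word: word, sentence: text, hints: hints)
    # Simplification: every occurrence of the word gets the same IPA.
    result.gsub(/\b#{Regexp.escape(word)}\b/i, "/#{ipa}/")
  end
end
```

Injecting the resolver keeps the gate testable without network access, and the early return is what limits the extra request to texts that actually contain a candidate. A sentence that uses the same heteronym twice with different readings would need per-occurrence substitution instead of a plain gsub.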

The Final Pipeline

Before:

text → TTS LLM → audio

After:

text + context + user-prefs → heteronyms → IPA → system instructions → cache → TTS LLM → audio
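Because the cache sits after the IPA and instruction steps, its key has to cover their output, not just the raw text. Otherwise, audio cached for one accent or for one reading of a heteronym could be served for another. Here is a minimal sketch; tts_cache_key and its parameters are hypothetical:

```ruby
require "digest"
require "json"

# Everything that changes the resulting audio belongs in the key:
# the phonetic-ready text, the style instructions, and the voice/model.
def tts_cache_key(text:, instructions:, voice:, model:)
  payload = JSON.generate([model, voice, instructions, text])
  "tts/#{Digest::SHA256.hexdigest(payload)}"
end
```

In a Rails app, the key could feed Rails.cache.fetch(key) { ... } or a lookup on a stored audio record. Serializing the parts as a JSON array, rather than joining them with a separator, avoids collisions when a field happens to contain that separator.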

More steps, but they give you full control over exactly what makes dictionary APIs seem “more reliable”: style, pronunciation, and caching.

On Quality

The best part: quality turned out better than I expected.

I tested the results on Russian, not exactly a top-tier target language for TTS products. There’s an accent, but it’s barely noticeable: far milder than a typical non-native speaker’s, and even milder than many bilinguals’. For second-language learning, that’s more than good enough.

Diving into Fizzy’s Routes: Rails’ resolve and direct

2025-12-15T00:00:00+00:00

37signals open-sourced their latest product last week. I cloned it and started where I always start when exploring a new Rails app: config/routes.rb.

I think config/routes.rb is the best place to crack open any Rails codebase. It’s the table of contents — you instantly see what resources exist, how they’re nested, and the overall shape of the domain. In Fizzy’s case: Accounts, Boards, Cards, Columns, Comments, Webhooks, Notifications.

But then I spotted something I’d honestly never used in production:

# config/routes.rb
direct :published_board do |board, options|
  route_for :public_board, board.publication.key
end

direct :published_card do |card, options|
  route_for :public_board_card, card.board.publication.key, card
end

resolve "Comment" do |comment, options|
  options[:anchor] = ActionView::RecordIdentifier.dom_id(comment)
  route_for :card, comment.card, options
end

resolve "Mention" do |mention, options|
  polymorphic_url(mention.source, options)
end

resolve "Notification" do |notification, options|
  polymorphic_url(notification.notifiable_target, options)
end

resolve "Event" do |event, options|
  polymorphic_url(event.eventable, options)
end

resolve "Webhook" do |webhook, options|
  route_for :board_webhook, webhook.board, webhook, options
end

What are direct and resolve?

Custom URL Helpers with `direct`

direct creates custom named URL helpers. Fizzy boards can be published publicly with a shareable link — but the public URL uses publication.key instead of the board’s ID. Rather than building this URL manually every time, direct gives you published_board_url(board) and published_card_url(card).

<%# app/views/public/cards/show.html.erb %>
<%= tag.meta property: "og:url", content: published_card_url(@card) %>

You could achieve the same with a helper method. Here’s the comparison:

Using direct in routes.rb:

# config/routes.rb
direct :published_board do |board, options|
  route_for :public_board, board.publication.key, options
end

Traditional helper in app/helpers/:

# app/helpers/boards_helper.rb (hypothetical alternative)
module BoardsHelper
  def published_board_path(board, options = {})
    public_board_path(board.publication.key, options)
  end

  def published_board_url(board, options = {})
    public_board_url(board.publication.key, options)
  end
end

The direct version defines both _path and _url automatically from a single block (though for public shareable links, you’d only ever need _url). Honestly, the helper version looks simpler and more straightforward. The advantage of direct is locality: all URL-generation logic lives in routes.rb.

Another bonus: direct helpers are automatically available in Rails.application.routes.url_helpers, so you can use them in models, background jobs, or anywhere outside controllers and views:

Rails.application.routes.url_helpers.published_board_url(board)

One thing that confused me at first: direct and resolve routes don’t appear in rails routes output. This is by design — they’re URL generation helpers, not HTTP endpoints. A direct can even point to an external URL:

# config/routes.rb
direct :homepage do
  "https://rubyonrails.org"  # Not a route in your app!
end

Customizing Polymorphic URLs with `resolve`

The Rails docs dedicate about two sentences to resolve: “Define custom polymorphic mappings of models to URLs” and a brief example with a Basket model.

You know how link_to @post generates /posts/123? That’s polymorphic routing (polymorphic_url, or polymorphic_path when only a path is needed) under the hood: Rails introspects the model and finds the matching route.

But what happens when a model doesn’t have its own route? Comments in Fizzy don’t live at /comments/:id — they’re displayed on their parent Card. Events are polymorphic wrappers around other actions. Notifications point to something else the user should see.

Without resolve, you’d write helpers like this everywhere:

# app/helpers/comments_helper.rb
def comment_url(comment)
  card_url(comment.card, anchor: dom_id(comment))
end

And then remember to call comment_url(comment) instead of url_for(comment). The resolve DSL fixes this — it teaches Rails how to generate URLs for specific model classes, keeping route logic in routes.rb where you’d naturally look for it.

The block receives:

The model instance
An options hash (anchors, format, etc.)

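It should return something url_for can turn into a URL. Usually that is the string produced by route_for or polymorphic_url, but a literal URL string, a hash of url_for options, an array, or a model also works.

direct blocks follow the same contract. Both methods live in the same module, ActionDispatch::Routing::Mapper::CustomUrls.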

Under the Hood: How resolve Actually Works


The docs are sparse, so let’s read the source. When you write:

# config/routes.rb
resolve "Comment" do |comment, options|
  options[:anchor] = ActionView::RecordIdentifier.dom_id(comment)
  route_for :card, comment.card, options
end

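Here’s what Rails does. The first two steps happen at boot time, when routes.rb is loaded. The last two happen every time a URL is generated.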

Step 1: The DSL method (mapper.rb)

def resolve(*args, &block)
  unless @scope.root?
    raise RuntimeError, "The resolve method can't be used inside a routes scope block"
  end

  options = args.extract_options!
  args = args.flatten(1)

  args.each do |klass|
    @set.add_polymorphic_mapping(klass, options, &block)
  end
end

It validates you’re at the root level (not inside a namespace or scope), then registers your block for each class name.

Step 2: Store the mapping (route_set.rb)

def add_polymorphic_mapping(klass, options, &block)
  @polymorphic_mappings[klass] = CustomUrlHelper.new(klass, options, &block)
end

Your block gets wrapped in a CustomUrlHelper and stored in a hash keyed by class name: { "Comment" => #<CustomUrlHelper>, ... }.

Step 3: The lookup (polymorphic_routes.rb)

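When you call link_to(@comment) or url_for(@comment), Rails ends up in its polymorphic routing code. The polymorphic_url version shows the check most plainly. In a view, url_for(@comment) goes through PolymorphicRoutes::HelperMethodBuilder instead, which performs the same mapping lookup before falling back to the default route: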

def polymorphic_url(record_or_hash_or_array, options = {})
  if mapping = polymorphic_mapping(record_or_hash_or_array)
    return mapping.call(self, [record_or_hash_or_array, options], false)
  end
  # ... default polymorphic resolution
end

def polymorphic_mapping(record)
  _routes.polymorphic_mappings[record.to_model.model_name.name]
end

It checks the hash using the model’s class name. If found, it calls your block instead of the default route resolution.
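To see the lookup rule in isolation, here is a Rails-free toy model of it: a plain hash keyed by class name, consulted before falling back to a default /plural/:id path. ToyRoutes and the Struct stand-ins are hypothetical and only mirror the behavior described above. They are not Rails’ classes. One consequence is that the key is the exact model name, so a subclass (for example, an STI subclass with its own model_name) does not inherit its parent’s mapping. Since resolve accepts several class names, you would list both names in real Rails.

```ruby
# Toy model of the polymorphic mapping lookup (not the real Rails classes).
class ToyRoutes
  def initialize
    @polymorphic_mappings = {} # "ClassName" => block, like resolve stores
  end

  # Like `resolve`: register the same block under each class name given.
  def resolve(*class_names, &block)
    class_names.each { |name| @polymorphic_mappings[name] = block }
  end

  # Exact class-name lookup first (Rails uses model_name.name);
  # otherwise a naive default "/<plural>/<id>" route.
  def polymorphic_path(record, options = {})
    if (mapping = @polymorphic_mappings[record.class.name])
      mapping.call(record, options)
    else
      "/#{record.class.name.downcase}s/#{record.id}"
    end
  end
end

Card = Struct.new(:id)
Comment = Struct.new(:id, :card)
Note = Class.new(Comment) # stands in for a subclass of Comment

routes = ToyRoutes.new
routes.resolve("Comment") { |c, _opts| "/cards/#{c.card.id}#comment_#{c.id}" }

routes.polymorphic_path(Card.new("abc"))                   # => "/cards/abc"
routes.polymorphic_path(Comment.new(123, Card.new("abc"))) # => "/cards/abc#comment_123"
routes.polymorphic_path(Note.new(7, Card.new("abc")))      # => "/notes/7" (no mapping for "Note")
```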

Step 4: Execute the block (route_set.rb)

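class CustomUrlHelper
  def call(t, args, only_path = false)
    options = args.extract_options!
    url = t.full_url_for(eval_block(t, args, options))
    # For *_path, keep everything from the first "/" that isn't part of "//"
    only_path ? "/" + url.partition(%r{(?<!/)/(?!/)}).last : url
  end

  private
    def eval_block(t, args, options)
      # self inside your block is t: the view or controller building the URL
      t.instance_exec(*args, merge_defaults(options), &block)
    end
end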

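The helper runs your block via instance_exec on t, the view or controller generating the URL, so self inside the block is that object. This is why route_for and polymorphic_url are callable there. The block receives the model and the merged options, and whatever it returns is passed to full_url_for to build the final URL string. If only a path was requested, the scheme and host are then stripped off.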
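The path stripping matters because route_for calls the *_url helper, so a resolve block typically returns a full URL even when link_to wants a path. The regex is easy to check outside Rails. This sketch copies it from CustomUrlHelper#call into a hypothetical strip_to_path method. The fizzy.example host is made up.

```ruby
# The only_path regex: the first "/" that is neither preceded nor
# followed by another "/", i.e. the slash right after the host.
SINGLE_SLASH = %r{(?<!/)/(?!/)}

# Hypothetical helper name; Rails does this inline in CustomUrlHelper#call.
def strip_to_path(url)
  "/" + url.partition(SINGLE_SLASH).last
end

strip_to_path("https://fizzy.example/cards/abc#comment_123") # => "/cards/abc#comment_123"
strip_to_path("/cards/abc")                                   # => "/cards/abc"
```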

The complete flow:

link_to(@comment)
  → url_for(@comment)
    → polymorphic mapping lookup: @polymorphic_mappings["Comment"]  # Your CustomUrlHelper
    → helper.call(self, [@comment], true)  # true: views generate paths by default
      → instance_exec(@comment, {}, &block)
        → route_for(:card, comment.card, anchor: "comment_123")
          → card_url(comment.card, anchor: "comment_123")
          → "http://example.com/cards/abc#comment_123"
      → full_url_for(that string)  # a String passes through unchanged
      → only_path: strip the scheme and host
        → "/cards/abc#comment_123"

Result: link_to(@comment) → "/cards/abc#comment_123"

Fizzy’s Patterns

Notification → Whatever It’s About

# config/routes.rb
resolve "Notification" do |notification, options|
  polymorphic_url(notification.notifiable_target, options)
end

Notifications wrap Events or Mentions. Rather than linking to a “notification show page” (boring), this links directly to the thing you’re being notified about. The notifiable_target method is delegated to source:

# notification.rb
delegate :notifiable_target, to: :source

# event.rb
def notifiable_target
  eventable  # Card, Comment, etc.
end

# mention.rb
def notifiable_target
  source  # The Card or Comment containing the @mention
end

# user.rb
def notifiable_target
  self  # "New user joined" → links to their profile
end

Now link_to(@notification) in the notification tray just works:

# notifications_helper.rb
link_to(notification, class: "card card--notification", ...)
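Mappings also compose. The Notification block calls polymorphic_url on its target, and since real polymorphic_url checks the mapping hash first, a target that is a Comment is handed to the Comment mapping, which adds the anchor. Here is a Rails-free toy model of that chain. ToyRouter and the Struct stand-ins are hypothetical, not Fizzy’s classes.

```ruby
# Toy stand-ins for Fizzy's models (hypothetical, not the real classes).
Card = Struct.new(:id)
Comment = Struct.new(:id, :card)

Event = Struct.new(:eventable) do
  def notifiable_target
    eventable
  end
end

Notification = Struct.new(:source) do
  # Mirrors `delegate :notifiable_target, to: :source`
  def notifiable_target
    source.notifiable_target
  end
end

# Mappings keyed by class name and run with instance_exec, so a block can
# call polymorphic_path again; that recursion is what lets mappings chain.
class ToyRouter
  def initialize
    @mappings = {}
  end

  def resolve(class_name, &block)
    @mappings[class_name] = block
  end

  def polymorphic_path(record, options = {})
    if (mapping = @mappings[record.class.name])
      return instance_exec(record, options, &mapping)
    end

    # Default: naive "/<plural>/<id>" route plus an optional #anchor
    path = "/#{record.class.name.downcase}s/#{record.id}"
    options[:anchor] ? "#{path}##{options[:anchor]}" : path
  end
end

router = ToyRouter.new
router.resolve("Comment") do |comment, options|
  polymorphic_path(comment.card, options.merge(anchor: "comment_#{comment.id}"))
end
router.resolve("Event") { |event, options| polymorphic_path(event.eventable, options) }
router.resolve("Notification") do |notification, options|
  polymorphic_path(notification.notifiable_target, options)
end

card = Card.new("abc")
router.polymorphic_path(Notification.new(Event.new(Comment.new(123, card))))
# => "/cards/abc#comment_123"
```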

Why This Matters

Fizzy has many “indirect” models — objects users interact with through their parents:

Comments live on Cards
Events describe actions on Cards/Comments
Notifications wrap Events/Mentions
Mentions point to Cards/Comments

The direct and resolve blocks centralize URL generation logic in routes.rb rather than burying it in helpers. You write link_to(@notification) and trust the router to figure it out. When someone asks “how do URLs work in this app?” — there’s exactly one file to check.

It’s one of those Rails features that’s been there since Rails 5.1, hiding in plain sight. I’ve walked past it a hundred times in the docs. Seeing it used by the Rails creators themselves? Now I get it.

The official docs are sparse, but Fizzy’s config/routes.rb is a good example of real-world use cases.