AI stack

Models

We default to Gemini 3 Flash via AI Gateway. It gives us fast, cheap, high-quality reasoning with native tool calling. For heavier reasoning (multi-page contract review, complex triage) we route to Gemini 2.5 Pro or GPT-5 through the same gateway — no client changes.

Tool calling

Every agent has a small, typed set of tools defined with Zod. Tools do one thing well:

qualify_lead, analyze_sentiment, draft_reply, schedule_appointment, lookup_status.

Multi-step loops are bounded with stopWhen: stepCountIs(50) and observed with OpenTelemetry so a runaway agent shows up as a spike, not a surprise bill.

Grounding

Retrieval-augmented generation over Pinecone (product docs, FAQs, prior tickets). Every response cites its sources; if confidence is below threshold, the agent refuses and hands off to a human.

Guardrails

Structured output schemas (Zod) reject hallucinated shapes at parse time.
Brand voice checker runs before any customer-facing send.
PII redaction on inbound before it hits the model.
Rate limits per tenant, per tool, per model.