    2026-04-13 · 9 min read

    Voice AI Agents for B2B Sales: Replacing Cold Calls with ElevenLabs and GPT-4o

    Voice AI, ElevenLabs, B2B Sales, Automation

    Outbound calling is one of the oldest and most expensive motions in B2B sales. A fully loaded SDR in the US costs $70,000–$110,000 per year, makes roughly 60–100 dials a day, and connects with a human on maybe 5% of those. The cost per connected conversation lands somewhere between $30 and $50. When the economics are that ugly, any technology that can move the needle gets attention, and voice AI is finally that technology.

    The shift happened quietly over 2024 and 2025. Voice synthesis became indistinguishable from a real person for short calls. Latency dropped under 500ms round-trip, low enough that natural conversation feels possible. Models that can actually listen, interrupt, and stay on script became reliable enough for production. By early 2026, several of our clients are running voice agents in production for qualification, appointment setting, and early-stage outbound. The results are uneven — some calls work, some do not — but the economics are no longer theoretical.

    The stack we actually use

    A production voice agent for B2B sales has four layers. None of them are exotic.

    • Telephony: Twilio or Vapi for placing and receiving calls. Vapi is purpose-built for AI voice and cheaper per minute; Twilio is the battle-tested default. Either works.
    • Speech-to-text: Deepgram Nova-3 or OpenAI Whisper for turning the prospect's voice into text. Deepgram is faster in streaming mode, which matters for sub-second latency.
    • Reasoning model: GPT-4o or Claude Sonnet for deciding what to say next. The prompt carries the call script, the qualification criteria, and the handoff rules.
    • Text-to-speech: ElevenLabs for the agent's voice. Their Turbo v2.5 model produces natural-sounding speech at 300–400ms latency. This is the piece that makes the whole thing feel real — a robotic voice kills the call in three seconds.

    Everything is held together by a session manager that streams audio in both directions, handles interruptions (the prospect talks over the agent — the agent needs to stop), and writes structured call outcomes to your CRM at the end. For a typical deployment we write this layer ourselves in Node or Python, or use a framework like Pipecat.
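    The interruption handling described above (often called barge-in) can be sketched in a few lines. This is a minimal simulation, not our production session manager: the `speak` and `simulate_barge_in` names are hypothetical, the audio chunks are placeholder strings, and the sleeps stand in for real playback and real voice-activity detection.

```python
import asyncio


async def speak(chunks, prospect_speaking, played):
    """Play TTS chunks one by one, stopping once the prospect starts talking."""
    for chunk in chunks:
        if prospect_speaking.is_set():
            return False              # barge-in: drop the rest of the utterance
        played.append(chunk)          # stand-in for writing audio to the call
        await asyncio.sleep(0.02)     # stand-in for the chunk's playback time
    return True                       # utterance finished without interruption


async def simulate_barge_in():
    prospect_speaking = asyncio.Event()
    played = []
    chunks = [f"chunk-{i}" for i in range(10)]

    async def prospect_interrupts():
        await asyncio.sleep(0.05)     # prospect starts talking mid-utterance
        prospect_speaking.set()       # in a real system, set by voice-activity detection

    finished, _ = await asyncio.gather(
        speak(chunks, prospect_speaking, played),
        prospect_interrupts(),
    )
    return finished, played


if __name__ == "__main__":
    finished, played = asyncio.run(simulate_barge_in())
    print(f"finished={finished}, chunks played={len(played)} of 10")
```

    The design point is that the agent checks for prospect speech between small audio chunks rather than between whole sentences, so it can stop within one chunk's duration.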

    The economics

    The numbers that matter: a voice AI call costs roughly $0.25–$0.40 per minute in API and telephony fees. A typical qualification call runs 2–4 minutes. So cost per connected conversation is around $0.50–$1.60. Compare that to $30–$50 for a human SDR, and the gap is somewhere between 20x and 100x.
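    The per-conversation range comes from multiplying the per-minute range by the call-length range. A minimal sketch of that arithmetic, using only the figures quoted in this section (the function name is ours, for illustration):

```python
def cost_per_conversation(rate_per_minute, minutes):
    """Cost of one connected call, in dollars."""
    return rate_per_minute * minutes


low = cost_per_conversation(0.25, 2)    # cheapest rate, shortest call
high = cost_per_conversation(0.40, 4)   # priciest rate, longest call
print(f"AI cost per conversation: ${low:.2f} to ${high:.2f}")

# Compare against the $30-$50 human SDR range, worst case to best case.
print(f"Human-to-AI cost ratio: {30 / high:.0f}x to {50 / low:.0f}x")
```

    Note that this counts only minutes on connected calls. Dials that never connect also cost something, so treat the result as a lower bound.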

    Per-call cost is the easy number. The harder number is conversion. In our deployments, voice AI qualification calls convert to booked meetings at 50–70% of the rate of a good human SDR. That is not a small gap. It means for high-value deals, you still want humans. But the cost difference is so large that the math usually still works in the agent's favor — you can run 20x the call volume for the same budget, and even at half the conversion rate, you end up with more meetings.
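    To see why volume can outweigh the conversion gap, here is a back-of-the-envelope sketch. The $10,000 budget, the 10% human booking rate, and the $40 and $2 cost-per-conversation figures are illustrative assumptions (the $2 simply encodes the 20x volume ratio), not measurements from our deployments.

```python
def meetings_booked(budget, cost_per_conversation, booking_rate):
    """Expected meetings from spending the whole budget on conversations."""
    conversations = budget / cost_per_conversation
    return conversations * booking_rate


BUDGET = 10_000            # assumed budget in dollars
HUMAN_COST = 40.0          # assumed: midpoint of the $30-$50 range
AI_COST = HUMAN_COST / 20  # assumed: 20x the call volume for the same spend
HUMAN_RATE = 0.10          # assumed human booking rate, for illustration only
AI_RATE = HUMAN_RATE * 0.5 # half the human conversion rate

human = meetings_booked(BUDGET, HUMAN_COST, HUMAN_RATE)
ai = meetings_booked(BUDGET, AI_COST, AI_RATE)
print(f"human: {human:.0f} meetings, AI: {ai:.0f} meetings, ratio {ai / human:.1f}x")
```

    Under these assumptions the agent books about ten times as many meetings, because 20x the volume at half the rate nets out to 10x. The ratio does not depend on the budget or on the absolute booking rate, only on the cost ratio and the relative conversion.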

    When voice AI wins

    Voice AI is not a replacement for human SDRs across the board. It wins in specific scenarios:

    • High-volume, low-stakes qualification: Calling a list of 10,000 leads to figure out who is actually in-market. A human SDR cannot physically do this economically. A voice agent can.
    • Appointment reminders and confirmations: Outbound calls that do not require selling anything. Almost zero downside.
    • Follow-up on inbound leads: When someone fills out a form at 2am, a voice agent can call them within 60 seconds. Speed-to-lead is one of the strongest predictors of conversion in B2B.
    • Re-engagement campaigns: Calling past customers or cold leads where the opportunity cost of a human is too high.

    For any deal with ACV over $50,000 — where the prospect expects a real relationship with a real salesperson — voice AI should be used only for qualification, never for the actual sales conversation. The gap between "acceptable short call" and "acceptable 30-minute discovery" is still very wide, and pretending otherwise damages the brand.
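    One way to make that rule hard to bypass is to encode it in lead routing rather than leave it to campaign setup. A minimal sketch, assuming a hypothetical `Lead` record and hypothetical route labels; only the $50,000 threshold comes from the rule above.

```python
from dataclasses import dataclass

ACV_HUMAN_THRESHOLD = 50_000  # above this, a human owns the sales conversation


@dataclass
class Lead:
    name: str
    expected_acv: float  # estimated annual contract value in dollars


def route(lead: Lead) -> str:
    """Decide how far the voice agent is allowed to take this lead."""
    if lead.expected_acv > ACV_HUMAN_THRESHOLD:
        # Agent may qualify, then must hand off to a salesperson.
        return "ai_qualification_then_human"
    # Lower-value leads: agent may qualify and book the meeting itself.
    return "ai_qualification_and_booking"


if __name__ == "__main__":
    for lead in (Lead("Acme", 120_000), Lead("Initech", 8_000)):
        print(lead.name, "->", route(lead))
```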

    Legal and ethical constraints

    This is the part most vendors handwave. You cannot. In the US, a February 2024 FCC declaratory ruling under the TCPA (Telephone Consumer Protection Act) classifies AI-generated voices in outbound calls as "artificial or prerecorded voice", which requires prior express written consent for most marketing calls to mobile numbers. Calling mobile numbers without consent using a voice agent is a statutory violation carrying damages of $500 per call, rising to $1,500 for willful or knowing violations. That is not a rounding error.

    In the EU, the AI Act and GDPR add more layers. Calls must disclose that the caller is AI (Article 50 transparency obligation). Call recordings are personal data and need a lawful basis. Profiling based on call content requires additional safeguards.

    The workable pattern we see succeeding: only call leads who have opted in (form submissions, existing customers, opted-in lists), always disclose AI up front, and keep full call logs for audit. This is not legal advice — talk to counsel before going live — but it is the compliance posture that keeps deployments out of trouble.
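    The three-part pattern (opt-in only, disclose AI up front, keep logs) can be enforced as a gate in front of the dialer. The sketch below is illustrative and is not a compliance implementation: the `Lead`, `AuditLog`, and `start_call` names, the consent-source values, and the disclosure wording are all hypothetical, and the in-memory log stands in for durable storage.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional

# Hypothetical wording; the real disclosure text should come from counsel.
AI_DISCLOSURE = "Hi, this is an automated AI assistant calling on behalf of Example Co."


@dataclass
class Lead:
    phone: str
    consent_source: Optional[str]  # e.g. "web_form"; None means no recorded opt-in


@dataclass
class AuditLog:
    entries: list = field(default_factory=list)

    def record(self, phone: str, event: str) -> None:
        """Append a timestamped entry (in-memory stand-in for durable storage)."""
        timestamp = datetime.now(timezone.utc).isoformat()
        self.entries.append({"at": timestamp, "phone": phone, "event": event})


def start_call(lead: Lead, log: AuditLog) -> Optional[str]:
    """Return the agent's opening line, or None if the lead must not be called."""
    if lead.consent_source is None:
        log.record(lead.phone, "blocked_no_consent")
        return None
    log.record(lead.phone, f"call_started consent={lead.consent_source}")
    return AI_DISCLOSURE  # the disclosure is always the first thing said
```

    Blocked attempts are logged as well as placed calls, so the audit trail shows that the consent check actually ran.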

    What we have learned from real deployments

    On our AI Assistant project for an interior design firm, the voice layer was originally meant to be the centerpiece. It ended up as one channel among several — the text chat worked better for the kinds of questions customers actually asked. The voice agent shines in a narrower slot: quick confirmations, appointment scheduling, and the first 90 seconds of outbound qualification. The lesson we took away was not that voice AI does not work, but that it works in specific slots, and trying to force it into the full conversation loop fails.

    The other consistent lesson is interruption handling. A voice agent that keeps talking when the prospect is trying to speak sounds more robotic than one with a slightly worse voice model but better turn-taking. Prioritize latency and interruption over voice quality if you have to choose.

    If you are running an outbound motion where cost-per-connected-conversation is eating your unit economics, and your leads have opted in, voice AI is worth a pilot. We build these pilots at N40 — start a conversation at /contact.