ElevenLabs: The long game
Inside the most ambitious company in voice, where $11bn is just the start.
Mati and Piotr are sleeping in their rented Camry. Back in San Francisco they have clean corporate apartments, but for this weekend in Yosemite they wanted something simple. A long hike, a quiet break, a chance to switch off. So simple that they didn’t even book a room. Except Piotr still has work to do on his NeurIPS paper, and if they’re being honest with themselves, clean breaks have never been part of their arrangement.
So, Mati heads out alone towards Half Dome, and Piotr stays in the passenger seat, editing. Head down, headphones on. The car, already their bedroom, now doubling as a study hall.
Out in the valley, the sun falls faster than expected. On the descent Mati’s phone dies. No signal. No light. No map. By the time he reaches the base, the trailhead is empty. No Piotr.
He boards the last shuttle of the night, a slow loop back to the gravel patch where they slept the night before. A forgettable corner of the valley. A place that feels like a misprint against all the beauty around it. Mati stands there for a moment, doubt creeping in. He weighs his options. Walk back. Wait. Find a motel. Or trust the logic that has carried them through fifteen years of shared problems.
He chooses trust.
Five minutes later headlights cut through the dark. Piotr pulls in beside him. Calm. Unhurried. A drift of printed pages and protein bar wrappers on the passenger seat that Mati brushes aside as he climbs in.
“How’d you know where to go?” he asks.
“You weren’t at the trailhead,” Piotr says. “So I figured you’d double back.”
Not instinct. Not luck. Just reasoning. A Bayes-flavored friendship honed across years of math olympiads, side projects, and the learned practice of thinking in tandem with someone.
That was May 2017. Today, they run ElevenLabs, one of the fastest-growing and most technically ambitious companies in AI. But the old pattern remains intact. One moves outward. One goes inward. One tests the edges. One traces the logic. Both arrive at the same point.
They were lucky to have found each other early. And the world, as it happens, was lucky too.
A perfect marriage
Mati and Piotr have this quiet, practiced way with each other, a mutual understanding at a molecular level. “They’re like an old married couple,” a16z general partner Jennifer Li tells me.
Sitting in Eleven’s London office, Mati pulls up an old photo from high school: the two of them at Poland’s national math olympiad. “We had all the same classes,” he says. “We were competing on everything.”
But the competition was surface. Underneath was recognition. Piotr could sit with a problem until the structure appeared. Mati would push outward, try things, break things, bring back a piece that made the whole thing clearer. A natural split of focus, but with reflexive collaboration.
It carried through everything they did. Officially, they first worked together at Opera in 2015, but unofficially they’d been working together for years. They spent one summer road tripping across the Balkans, once again deciding a car was accommodation enough. “Not optimal,” Mati said, looking back, “but it was us.”
And they built things. A recommendation engine. An early voice analyzer. A crypto analytics engine during the 2021 frenzy. The substance of the projects mattered less than the pattern: they kept choosing each other.
“We used to speak every day. Now it’s not exactly daily, but I usually predict what he’d say,” Mati told me. “Piotr and I know each other so well we even know the weaknesses the other person is trying to hide.”
It’s part married couple, part platonic ideal: two people shaped by the same culture, trained on the same problems, building on decades of shared instinct before a single line of ElevenLabs code existed.
A gap in the market
A movie in a foreign country usually comes in two forms. A dubbed version, where actors are fully replaced by local voice talent. Or subtitles, where you keep the original performance and read your way through it.
Poland took a third path.
Mati and Piotr grew up on lektor films, the uniquely Polish format where every foreign movie is voiced over by one man reading every line in the same flat tone. You hear the original actors murmuring underneath, like ghosts trapped inside the film. Jarring when you first encounter it and invisible if you grow up with it.
“It shapes how you listen,” Mati told me. “You start paying attention to what’s missing.”
So they learned early on that tone isn’t purely decoration; it’s meaning. That delivery shapes understanding. Which is why the rest of the world’s indifference to voice always felt a little strange to them. For most of the 2010s, voice tech plateaued. Siri and Alexa handled weather and kitchen timers, but nothing that required nuance.
Then the generative wave hit. Text exploded. Images followed. Video became the next gravitational center. Voice stayed peripheral. For Mati and Piotr, voice hadn’t been solved. It was the one category that still felt open.
By 2022, when Mati began thinking seriously about building something together, the pieces had aligned: “Piotr came to me and simply said ‘the models are ready.’” The tools had finally caught up to the problem they’d been circling since they were teenagers.
Meanwhile the industry still hadn’t noticed. Voice looked irrelevant. And that disconnect showed up immediately in fundraising.
They pitched. Most VCs passed. But Credo Ventures said yes, leading the pre-seed with support from London-based Concept Ventures. At this stage, Mati and Piotr didn’t even have an office, so the first few months of ElevenLabs were built from Concept’s boardroom.
As Concept’s investment memo put it at the time, audio had been neglected by recent advances in research, and ElevenLabs looked like the team to bring the medium back to life. “It felt like a golden window,” Oliver Kicks told me. “A moment where the technology and the team lined up.”
It wasn’t obvious. Voice wasn’t dead. It had simply been abandoned. And the only people who noticed were two founders who had grown up attuned to absence.
The birth of the company
If you ask Mati about the origins of ElevenLabs, there’s no eureka moment. What he described instead was a narrowing. The sense that the problem they’d been circling for years had finally come into focus, sharp enough that they could step straight into it.
So they fell back into the rhythm they always had. Piotr went inward. He dug into papers, built prototypes, and pushed at the edges of the new architecture. Mati went outward. He talked to early users. He sketched use cases.
Their first product was a wedge: a dubbing tool for creators. The product was small, but the ambition wasn’t: they figured creators alone could reach $100 million in ARR over time. But they never saw that as the destination. They saw voice as infrastructure. As a new interface waiting for its moment.
The first version of ElevenLabs was a simple text-to-speech tool with a tweet-length character limit. They launched it quietly, but users loved it: one author copied and pasted an entire book into it, 240 characters at a time.
From the outside, the rise looked sudden. From the inside, it felt like a long, submerged line finally breaking the surface. Even the structure of the company reflected the balance Mati and Piotr spent years protecting. They split equity evenly. They split their focus the same way: Piotr on research, Mati on go-to-market. They moved differently but always in parallel, each layer reinforcing the other.
This wasn’t two founders discovering a market. It was a market finally catching up to two founders who had been preparing for it their whole lives.
The two-body solution
I didn’t get to meet Piotr. Almost nobody does. The distance can make Piotr sound mythic, but he just loves the work, and it’s hard to pull him away from it. Inside the company, he is deeply present in the work itself, protecting the space where the real thinking happens.
Mati moves differently. He is the surface area of the company, the part that meets the world. He gathers information, translates ambiguity into direction, and pulls the horizon closer for everyone else. Even investors who passed on the early rounds told me they walked out of their first meeting with him convinced he would eventually run something enormous.
Jennifer Li told me about a fundraising meeting with a16z. Both sides had been going back and forth for hours. Then Piotr gestured for Mati. They stepped aside, spoke to each other in Polish for five minutes, came back, and closed the deal.
Piotr’s research team is tiny, but unusually effective. They often hire by artifact, searching GitHub repos. One early researcher came straight from a call center. Another was building voice models in his spare time.
Mati is the counterweight. He built the commercial engine, the partnerships, the teams, the surface of the product. People kept telling me the same thing: he learns at a frightening pace. Introduce him to someone sharp and he absorbs their best instincts within a week. The company compounds because he does.
On paper, ElevenLabs is a high-growth LLC. In practice, it still runs like a partnership. The culture is what forms in the magnetic field between the two.
Early-middle game
Mati is a chess player, so I asked him how he sees the board. “We’re in the early middle game,” he said. Enough clarity to see the lines, enough uncertainty that every move matters.
The first phase of AI voice is over. The primitives are stable. The world has accepted that expressive audio is solvable. Now the real contest begins: latency, reliability, multimodal agents, distribution, and all the invisible seams that determine whether a company becomes infrastructure or an afterthought.
But this is the stage where most companies lose their way. It’s where patterns calcify. It’s where teams start optimizing for the wrong thing, or get distracted by adjacent categories, or confuse motion with progress. “It’s so easy to copy the primitives now,” Mati told me, “that people assume that’s the work.”
So I asked him what the endgame looks like.
There are real risks for ElevenLabs. Their research advantage is powerful but fragile. Fifteen researchers competing with labs that have hundreds. As Mati noted: fall behind on research, and nothing else matters. Product momentum can’t compensate for losing the frontier. Culture is another pressure point. They’re over 400 people and growing. The original rhythm—Piotr deep, Mati wide—gets harder as the company expands. Growth rewards alignment. It punishes noise.
The market is changing too. Voice is no longer an afterthought. The first wave of competitors is arriving. The big labs are now paying attention. Customers want more reliability, more control, more integration. Winning the opening doesn’t guarantee winning the game.
Mati knows all of this. That’s why he says early middle game. The opening gave them shape; the middle game decides whether that shape endures.
Ghosts in the machine
All the chess talk has lit Mati up now, and we end up talking about Richard Feynman. He’s one of Mati’s heroes. Feynman believed that if you truly understood something, you could explain it simply. Strip the idea down until the shape shows itself.
When I asked who else he admired, he mentioned von Neumann. A different creature entirely. The kind of mind that sees the full architecture of a problem all at once. Hearing Feynman and von Neumann side by side, it seemed obvious he was sketching archetypes for himself and Piotr.
So I asked him.
Mati gave a small smile. “Yes,” he said. “That makes sense.” In recent weeks, ElevenLabs announced they now offer a licensed Richard Feynman voice—one can only hope for a future von Neumann.
Another voice on offer is Alan Turing. The man who gave us the cultural shorthand for machine intelligence is now a machine himself. Playing around with the product, I used ElevenLabs to hear myself speak Spanish. Properly speak it. Not the strained, halting version I manage in real life, but a version with flow and timing. The voice sounded like me. I passed my own Turing test.
I had given the model a clip of me doing standup. Partly because standup is what I have the most recordings of, but partly because so much of standup depends on voice. Standup is jokes, sure, but it’s also breath and timing and intention. Delivery is the difference between a joke landing and a joke dying.
This is the frontier Mati and Piotr are pushing. A synthetic voice that holds timing and tone with enough fidelity that you stop thinking about the mechanism and start reacting the way you react to people. You hear something that feels real, and your brain follows.
Maybe that is the real threshold. Not the moment a machine fools you, but the moment you stop thinking in terms of fooling at all.
The business behind it
The company closed 2025 with $330 million in ARR, split close to fifty-fifty between self-serve and enterprise. Last week, it raised a $500m Series D at an $11 billion valuation. That is roughly 33 times ARR, a relatively modest multiple in a frothy AI market.
The business today runs on three lines of revenue. There is the Creative platform used for expressive translations, dubbing, and text-to-speech. There is the developer API that powers thousands of voice, sound, and music applications. And then there is the Agents platform, ElevenLabs’ enterprise layer, which has grown more than three hundred percent in the past twelve months.
That shift in scale came with a shift in footprint. ElevenLabs now works with companies across many sectors, from customer support to entertainment to education, including names like Cisco, Epic Games, Adobe, and Nvidia. A large part of this momentum comes from their work on conversational agents. Developers and enterprises have already built more than two million of these agents on the platform, deployed across web, phone, and apps. Businesses can bring their own logic and knowledge base, connect to existing systems, and rely on low latency under the hood.
What makes all of this possible is the company’s decision to train and run its own models from day one. Most generative AI companies today pay large fees to upstream providers for every request. ElevenLabs does not. That means its margins look more like traditional software than most AI companies. It also means they can expand into new modalities without breaking the economics beneath them. Voice led naturally to sound effects, then music, then multi-speaker scenes, and those will eventually lead into richer studio and enterprise pipelines.
The creator wedge was never just about distribution. It stress tested the product at the highest possible resolution: accents, pacing, emotion, and edge cases. When the Fortune 500-scale customers arrived, the foundation was already set. Extending that original creator focus, ElevenLabs has recently added generative image and video models to the platform.
All of this has been built by a team of roughly four hundred people. Many other frontier AI labs number in the thousands, but ElevenLabs feels very different. Smaller, faster, more opinionated about taste and quality. It is a generative AI company, but also an audio company. A consumer product that grew into enterprise infrastructure. It now feels less like a single model and more like the early architecture of a full audio platform, the layer other companies build on top of.
From the outside, ElevenLabs looks like it clicked into place overnight. From the inside, it was a precise alignment of timing, taste, and two people who’d been circling the same problem for half their lives. Once the models matured, everything else snapped into place.
Voice as interface
Voice technology is a good business, but there’s a bigger question underneath. Because voice isn’t just a feature. It’s the oldest interface we have. Long before writing, before screens, people taught one another, signaled trust, and revealed intention through tone and cadence. Meaning lives in how something is said as much as in the words themselves.
For years, voice technology gave us a flattened version of that. A thin synthetic voice on one end and you yelling ‘operator’ on the other. Expectations fell because imagining anything better felt abstract. The best we could hope for was eventually reaching a human agent. ElevenLabs shifts that baseline to the point where many people may prefer to speak to an agent. And not just for navigating refunds, but for the real stuff.
Mati’s been speaking with his hands, and I notice a bracelet on his wrist: black and white beads spelling out the number eleven. “My niece made it,” he tells me, and I think of the world she’ll grow up in. While Mati and Piotr had math textbooks and lektor films, she’ll grow up in a world of voice.
If she wants to learn chess, she won’t be staring at some dead static board on an app. She’ll be talking to an instructor that can match her pacing, explain a tactic three different ways, slow down when she hesitates, and speed up when she grasps it. And when she gets curious about the people she reads about, she’ll be able to talk to them too. Imagine learning about ancient Egypt with Cleopatra, or early aviation with Amelia Earhart.
Educators have dreamed of this for decades. One-on-one tutoring has been found to raise student achievement by as much as two full standard deviations, enough to take a student from the middle of the pack to roughly the 98th percentile, the top of the class. It’s how we educated royalty, and now it can be for everyone.
Once models can express tone, pacing, hesitation, and emotional contour with fidelity, whole categories move. Education becomes more personal. Accessibility becomes more humane. Entertainment becomes easier to translate across borders. Agents feel present rather than scripted. Translation keeps the performer’s voice instead of replacing it.
None of this replaces human voice. It expands what’s possible. Voice is a medium we left underdeveloped, and what ElevenLabs is building is a world beyond flat assistants and canned prompts. It’s a set of primitives for expressive audio at internet scale.
And in a neat loop, it returns to where Mati and Piotr began. Two kids in Poland listening to flat lektor films, learning to notice what was missing long before they could name it. Now they’re building the infrastructure that restores the range they never heard growing up.
Coda
Later that night, they were back in the same gravel patch. Yosemite was a shadow outside the windows, and they’d split the car in two: front seat for work, back seat for sleep. Mati lay in the back, head against a rolled jacket. Up front, Piotr was still awake, his face lit by the glare of his MacBook. His wallpaper was a picture of the valley outside, the closest he’d come to hiking it that weekend.
Piotr’s paper would be finished in time and accepted into the conference. A clean result for a weekend spent sleeping in a rental car. A few hundred miles away, a group of Google researchers were working on a paper for the same conference. “Attention Is All You Need”, a similar last-minute sprint, would lay the foundations for the field Mati and Piotr would enter a few years later.
None of that felt real in the car. It was just two friends at the end of a long day. One reading. One sinking toward sleep. The valley was silent, save for the clicking of keys and the occasional rustling of pages. Mati closed his eyes, dreams unfolding like a lektor film, voices murmuring underneath.
Thanks to Mati, Luke, Olly, Jennifer, Mark, Nev, Victoria, Maciek, Franek, Hannah, and everyone else who took the time to help with this piece.