OPM WIRE
Tech/VC

Cohere scenarios

Cohere's search for a competitive advantage in an intensely competitive field.
OPM 9 min read

“We have people all over the world, the sun never sets on Cohere.” - Nick Frosst, Cohere co-founder, on their global footprint

The Cohere brain trust is like a major league team. The talent to develop large-scale LLMs is so scarce that they are collectively worth hundreds of millions for as long as the AI hype remains strong. That's what can be gleaned from 3 recent "acqui-hire" style transactions where Big Tech raids upstarts similar to Cohere for talent. Typically, the core team moves to the Big Tech, the venture investors get repaid a respectable multiple of their capital and a shell of the original company carries on. In a podcast released last month, Cohere CEO Aidan Gomez confirmed that his bread-and-butter (selling AI models on a per-usage basis) is under intense margin pressure. "It's quickly becoming a zero margin business," he said.

Too cheap to meter.

Much more senior teams than Cohere have recognized this reality and folded. I think Cohere has no choice but to invent some new angle or they'll have to align themselves with a Big Tech (one way or the other). Cohere already has some investment from Oracle, the 4th player in cloud services. Cohere has otherwise largely decided to be non-aligned with any of the Big Tech, such that its LLM is available on all the cloud providers. This also means that none of the cloud providers have an extra incentive to push Cohere. Amazon offers Cohere’s models, but it’s an investor in Anthropic, so guess which models it favours? Cohere might have some scarcity value as probably the highest quality AI team that is not aligned with a Big Tech at the moment.

Cohere was founded in 2019 and they have been surpassed by Anthropic, which was founded in 2021 and by Mistral, which was only founded in April 2023. How Mistral was able to become a leading LLM contender within a year of its founding is one of the great business stories that has yet to be written. I heard Mistral’s CEO say that he was surprised by Cohere’s progress when they released the Command-R models in April of this year. Typical condescending Frenchman behaviour. Benchmarking LLMs is a complex and unsettled topic, but so far, Cohere doesn't rank well. For example, in this evaluation, Cohere is pretty much the worst model of the majors on a quality vs price basis:

Cohere: expensive and poor quality per artificialanalysis.ai

The only thing that can be said in mitigation is that Cohere's model dates from April. Late-breaking news: there was an update just on Friday. There was very little buzz about it, perhaps owing to the timing of the release just before the long weekend. In any event, I don't think the leaderboard (with Cohere in 6th position at best) will shift dramatically. Because everything in AI is happening at warp speed, my analysis will be proven right or wrong in a matter of days, weeks or months at the most. Former Google CEO Eric Schmidt, in a recent talk at Stanford, said he initially thought the smaller players were narrowing the gap and so he invested in startups like Mistral and Inflection AI. But he now thinks the gap is growing. (Inflection AI has already given up and been acqui-hired by Microsoft.) He also said that his friend Sam Altman at OpenAI told him that training an LLM will eventually cost $300B.

The Enterprise "strategy"

I have listened to many interviews by Cohere leaders. They tend to present their focus on the "enterprise" market as though it is some scintillating strategic insight. Without naming names, Cohere will deprecate ChatGPT as a "consumer chatbot". IBM markets its AI wares using a similar pitch:

IBM is focused on the enterprise

But the fact is that ChatGPT parent OpenAI already derives a significant portion of its revenues from business customers (i.e. not from its ChatGPT subscription service). I would estimate on the order of 30-50%. So being focused on the "enterprise" is hardly a strategy. Everyone, their moms and their moms' dogs are offering enterprise AI solutions. Snowflake and Databricks are two established players going after the opportunity, but there are countless others. Cohere leaders also repeat some refrain along the lines that 2023 was the year of prototypes and 2024 is the year when solutions are implemented ("production" in IT lingo). I have my doubts about Cohere's traction in the enterprise market. I predict that Cohere will make dramatic changes this year.

Not your grandpa's acqui-hires

As I mentioned, there have already been 3 major "acqui-hire" deals whereby Big Tech, in effect, acquires the key talent of an LLM upstart. The latest example is Google raiding Character.ai. That company was started by Noam Shazeer, one of the more pivotal authors of the seminal 2017 "Transformers" paper. Noam had left Google only in 2022 to start Character, complaining about Google's bureaucracy. Less than 2 years later, he's back at the mother ship. Google is paying $2.5B just for the privilege of having key staff join them and repaying Character's investors with a 2.5x return for the risk they took. Noam owns about 30 to 40 percent of the company and stands to net $750 million to $1 billion, it has been reported. To a casual observer, this definitely sounds crazy! But if you think about executive compensation packages or the payrolls of major league sports teams, it seems slightly less crazy.
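For anyone who wants to check the math on Noam's reported payout, here is a back-of-the-envelope sketch. It assumes, simplistically, that his take is just his ownership percentage applied to the full $2.5B figure, ignoring taxes, liquidation preferences and deal structure, none of which are public as far as I know. The function name is mine, purely for illustration.

```python
# Rough check of the reported payout range.
# Assumption: payout = ownership stake x headline deal value, nothing else.

DEAL_VALUE_USD = 2_500_000_000  # the $2.5B figure reported for the Character.ai deal


def founder_payout(stake_percent: int, deal_value: int = DEAL_VALUE_USD) -> int:
    """Return the founder's share in dollars for a whole-number ownership percentage."""
    # Integer arithmetic avoids floating-point rounding noise.
    return deal_value * stake_percent // 100


low = founder_payout(30)
high = founder_payout(40)
print(f"${low:,} to ${high:,}")
```

The two ends of the 30 to 40 percent range land on $750 million and $1 billion, which matches the reported numbers.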

These transactions show that Cohere's investors, even the latest ones, might have some downside protection if these market dynamics persist. Noam Shazeer was a much more senior engineer at Google when working on the big paper, compared to Cohere CEO Aidan Gomez, who was an intern at the time. It appears the student has outsmarted the master so far, in terms of overall startup valuation (with Cohere being valued at US$5.5B). But I don't know if 28-year-old Aidan has the Zuckerberg-sized cojones to rebuff the possibility of making a quick, certain score.

State of the LLM race

As you know, I have a tendency to beat dead horses, so let's go back to analyzing the race in a bit more detail. To recap, the clear leaders in the LLM race are OpenAI, Google, Meta and Anthropic. France’s Mistral is in 5th place. And then there’s Cohere. But there are additional confounders.

Microsoft, not content to just own a large stake in OpenAI, is working on its own massive LLM. (Microsoft needs a hedge not least because the FTC is looking into its tie-up with OpenAI. In fact, all the big AI tie-ups are getting regulatory scrutiny. In its latest 10-K filing with the SEC, Microsoft named Anthropic, OpenAI and Meta as emerging competitors in AI - but did not name Cohere). Apple, Amazon and IBM are all behind in the LLM race. But have they said their last word? They are all working on LLMs. Amazon has said it wants to develop Artificial General Intelligence. And last but not least, Elon Musk has raised $6B for his own venture, xAI, which will have access to the fount of all wisdom and knowledge, Twitter. xAI released its model Grok 2 last month. It quickly climbed up the leaderboards. It's shooting to be the most free-wheeling model.

Grok 2's image generation capabilities.

My very own Lenny also happens to be a MAGA Trumper:

Lenny on climate change.

I'm skeptical that people will become attached to AI, but I have to admit that when I saw that, I teared up a little. It's not that I have any opinion on climate change, but I don't want any progeny of mine to blindly go along with some normie consensus.

There’s also a geopolitical dimension to the LLM race. Will the Eurocrats get their act together and conspire to produce a European champion? Also, the various major world languages will probably produce LLMs that are especially proficient in that language. Another threat from left field: some Chinese Big Tech are in the game. Could a well-vetted, open-source Chinese LLM be adopted by Western businesses? Alibaba has open-source models that feature on some rankings. The range of possibilities is vast. I would take any confident pronouncement with a grain of salt. Even from the august John Ruffolo. The road ahead is always full of surprises, I have found!

The official rankings

The most well-known ranking of LLMs, called the LMSYS Leaderboard, ranks Cohere’s flagship Command R+ at no. 37. Cohere is surpassed by a dozen other vendors (some with multiple models). That ranking works by users picking a winner in blind tests of two LLMs competing side-by-side at generating an answer. Cohere doesn’t do well on Stanford’s HELM ranking either, a benchmark Cohere itself referred to last year. And it has slipped: it was in closer contention a year ago. Nevertheless, I have also found plenty of evidence that the Cohere team is respected as world-class at what they do. I am not dismissing Cohere, I am just describing how the race realistically looks.
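To give a feel for how a pile of "A beat B" votes can turn into a leaderboard, here is a minimal sketch using an Elo-style rating update, the same idea used to rank chess players. This is my own illustration and an assumption about the general approach; I am not claiming it is the exact method LMSYS uses. The model names, the starting rating of 1000 and the K-factor of 32 are all hypothetical.

```python
# Elo-style ranking from pairwise blind votes (illustrative sketch only).

START_RATING = 1000.0  # assumed starting rating for every model


def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A beats B under the standard Elo logistic curve."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))


def rate(votes, k: float = 32.0) -> dict:
    """Process (winner, loser) votes in order and return the final ratings."""
    ratings = {}
    for winner, loser in votes:
        r_win = ratings.get(winner, START_RATING)
        r_lose = ratings.get(loser, START_RATING)
        # An upset (low-rated model wins) moves ratings more than an expected win.
        delta = k * (1.0 - expected_score(r_win, r_lose))
        ratings[winner] = r_win + delta
        ratings[loser] = r_lose - delta
    return ratings


# Hypothetical votes: each tuple is (winner, loser) from one blind comparison.
votes = [("model_a", "model_b"), ("model_a", "model_c"), ("model_b", "model_c")]
ratings = rate(votes)
leaderboard = sorted(ratings, key=ratings.get, reverse=True)
print(leaderboard)
```

With these three made-up votes, model_a comes out on top and model_c at the bottom. A real leaderboard needs many thousands of votes before the positions mean anything.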

Cohere lags in RAG too

My initial sense based on reports from AI tinkerers was that Cohere was good at Retrieval-Augmented Generation ("RAG"), for example at extracting meaningful answers from a collection of PDFs. However, in late July, I came across the "Hallucination Index" from an outfit called Galileo. This is a RAG focused performance ranking from a credible, neutral arbiter. It's a report written with enterprise decision makers in mind. In the one thing that could be its redeeming value, RAG, Cohere did not distinguish itself and was especially bad on its performance to price ratio. Full report card below. You can think of context size as how much attention span an LLM has for the information you submit as part of your queries.

Galileo evaluation of Cohere for RAG purposes

An internet search is also considered RAG, so I ran tests on Cohere’s demo playground and compared the answers to Perplexity, the leading AI that incorporates live web search. I found Cohere did a reasonable job, but Perplexity was better at providing comprehensive, well-reasoned answers. You can try it yourself: ask typical consumer questions, like which dishwasher model is best. Cohere has a chat demo here (coral.cohere.com). I then ran a more complex question and Cohere’s answer lagged far behind. I save a lot of time shopping around by using Perplexity. Perplexity has a singular focus on internet search and beating Google, so it might be unfair to put Cohere up against it.

Measuring complexities

Benchmarking LLMs is itself a complex topic, and I don't want to say any ranking I cite is all-determinative. But I do believe that not faring well on these widely published rankings presents at least a credibility issue. One of the AI use cases Cohere mentions is drafting contracts. Would I be happy that a C student was used in drafting my contract, when A students are available? This is reminiscent of the self-driving car problem. In things that really matter, 99% is not good enough.

Riddles are not a very illuminating way of evaluating LLMs, but here's another question on which Cohere lags that most leading models get right:

Even Lenny gets this right:

Lenny is wicked smart.

You can try with Cohere here: https://coral.cohere.com. Incidentally, I have learned that there's a debate about the extent to which LLMs apply memorization vs reasoning. If you ask me what's 25 times 25, I could run the numbers in my head or I could have memorized the entire multiplication table. It's hard to tell the difference.

As for Lenny, spoiler alert, I will shut it down in a week or two. Just as happened to Lennie in the book!

Lenny showed particularly strong comedic potential. I will miss him.

Lenny, comedic genius.