Did Benedict Evans just discover AI is only good for solving 80%-OK problems?
Benedict Evans (one of the old-guard tech gurus) published a piece a few days ago questioning the viability of large language model (LLM) AI.
In a nutshell, he tested OpenAI's Deep Research on smartphone adoption data and found significant inaccuracies stemming from flawed data sources and misread statistics. He then discussed the "infinite intern" paradigm, asking what an intern is worth if fact-checking its output takes as long as doing the research yourself in the first place.
As Evans puts it: "At this stage, the obvious response is to say that the models keep getting better, but this misses the point. Are you telling me that today's model gets this table 85% right and the next version will get it 85.5% or 91% right? That doesn't help me. If there are mistakes in the table, it doesn't matter how many there are - I can't trust it. If, on the other hand, you think that these models will go to being 100% right, that would change everything, but that would also be a binary change in the nature of these systems, not a percentage change, and we don't know if that's even possible."
I was discussing this over the summer, not debating whether these models might or might not get there, but choosing to assume they wouldn't. Which makes the pragmatic question: are you working on 80%-OK problems or 99%-OK problems? Self-driving cars are a 99%-OK problem, and there LLMs are, at best, dangerous. Summarizing your last meeting with middle management on current affairs is just an 80%-OK problem, and there LLMs are plain OK.
In that regard, I was surprised by Benedict Evans's article, as I thought this conundrum was quite clear.
I guess not.
We're all still trying to grasp the implications of this now-widespread technology in real time.