The new ChatGPT offers a lesson in AI hype

When OpenAI unveiled the latest version of its wildly popular ChatGPT chatbot this month, it had a new voice that possessed human inflections and emotions. The online demonstration also involved the bot teaching a child to solve a geometry problem.

Much to my chagrin, the demo turned out to be essentially a bait and switch. The new ChatGPT was released without most of its new features, including the improved voice (which the company told me it had postponed to make fixes). Even the ability to use a phone camera to get real-time analysis of something like a math problem isn't yet available.

Amid the delay, the company also disabled the ChatGPT voice that some said resembled that of actress Scarlett Johansson, after she threatened legal action, and replaced it with a different female voice.

For now, what has actually been implemented in the new ChatGPT is the ability to upload photos for the bot to analyze. Users can generally expect faster and more lucid responses. The bot can also perform real-time language translation, but ChatGPT will respond in its older, machine-like voice.

Still, this is the leading chatbot that has disrupted the tech industry, so it's worth reviewing. After trying the faster chatbot for two weeks, I had mixed feelings. It excelled at language translation, but struggled with mathematics and physics. All in all, I haven't seen a significant improvement over the last version, ChatGPT-4. I definitely wouldn't let it tutor my son.

This tactic, in which AI companies promise new features and deliver an incomplete product, is becoming a trend that is bound to confuse and frustrate people. The $700 Ai Pin, a talking pin from the start-up Humane, funded by OpenAI CEO Sam Altman, was universally criticized for overheating and spouting nonsense. Meta also recently added an AI chatbot to its apps that did a poor job at most advertised tasks, such as web searches for airline tickets.

Companies are releasing AI products in a premature state, partly because they want people to use the technology to learn how to improve it. In the past, when companies introduced new tech products like phones, what we were shown – features like new cameras and brighter screens – was what we were getting. With artificial intelligence, companies provide a preview of a potential future, demonstrating technologies that are developing and that only work under limited and controlled conditions. A mature and reliable product may arrive, or not.

The lesson to be learned from all of this is that we, as consumers, should resist the hype and take a slow and cautious approach to AI. We shouldn't spend a lot of money on poorly developed technology until we see proof that the tools work as advertised.

The new version of ChatGPT, called GPT-4o (“o” as in “omni”), is now available to try for free on the OpenAI website and app. Non-paying users can make a few requests before reaching a timeout, and those with a $20 monthly subscription can ask the bot more questions.

OpenAI said its iterative approach to updating ChatGPT allowed it to gather feedback to make improvements.

“We believe it is important to preview our advanced models to give people an idea of their capabilities and help us understand their real-world applications,” the company said in a statement.

(Last year the New York Times sued OpenAI and its partner Microsoft for using copyrighted news articles without permission to train chatbots.)

Here's what to know about the latest version of ChatGPT.

To show off ChatGPT-4o's new tricks, OpenAI released a video featuring Sal Khan, CEO of Khan Academy, the educational nonprofit, and his son Imran. With a camera trained on a geometry problem, ChatGPT was able to explain to Imran how to solve it step by step.

Although ChatGPT's video analysis feature has yet to be released, I was able to upload photos of geometry problems. ChatGPT solved some of the simpler problems successfully, but stumbled on more challenging problems.

For a triangle intersection problem I discovered on an SAT prep website, the bot understood the question but gave the wrong answer.

Taylor Nguyen, a high school physics teacher in Orange County, California, uploaded a physics problem involving a man on a swing that is commonly included on Advanced Placement calculus tests. ChatGPT made several logical errors and gave the wrong answer, but was able to correct itself with feedback from Mr. Nguyen.

“I was able to coach it, but I'm a teacher,” he said. “How is a student supposed to spot those mistakes? They're assuming the chatbot is right.”

I noticed that ChatGPT-4o managed to perform some division calculations that its predecessors performed incorrectly, so there are signs of slow improvement. But it also failed at a basic mathematical task that previous versions and other chatbots, including Meta AI and Google's Gemini, have failed at: the ability to count. When I asked ChatGPT-4o for a four-syllable word that began with the letter “W,” it responded, “Wonderful.”

OpenAI said it is constantly working to improve its systems' responses to complex mathematical problems.

Mr. Khan, whose company uses OpenAI technology in its Khanmigo tutoring software, did not respond to a request for comment on whether he would let ChatGPT tutor his son on its own.

OpenAI also highlighted that the new ChatGPT was better at reasoning, or using logic to provide answers. So I ran it through one of my favorite tests: I asked it to generate a Where's Waldo? puzzle. When it showed a picture of a giant Waldo standing in a crowd, I said the point is that he should be hard to find.

The bot then generated an even bigger Waldo.

Subbarao Kambhampati, a professor and AI researcher at Arizona State University, also put the chatbot through some tests and said he didn't see any notable improvement in reasoning compared with the previous version.

He presented ChatGPT with a puzzle involving blocks:

If block C is on top of block A and block B is separately on the table, can you tell me how I can create a stack of blocks with block A on top of block B and block B on top of block C, but without moving block C?

The answer is that it is impossible to arrange the blocks under these conditions, but, just as with previous versions, ChatGPT-4o consistently found a solution that involved moving block C. With this and other reasoning tests, ChatGPT was occasionally able to take feedback to get the right answer, but needing that kind of help is antithetical to how AI is supposed to work, Kambhampati said.

“You can correct it, but when you do that you are using your own intelligence,” he said.
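For readers who want to check the block puzzle themselves, here is a minimal Python sketch. It assumes the usual rules of such puzzles (only a block with nothing on top of it can be picked up, one block moves at a time) and reads the constraint as "block C may never be moved." The function names and the `frozen` parameter are illustrative choices, not anything used by Kambhampati or OpenAI.

```python
from collections import deque

BLOCKS = ("A", "B", "C")

def clear_blocks(state):
    """Return the blocks that have nothing sitting on top of them."""
    supports = set(state.values())
    return [b for b in BLOCKS if b not in supports]

def successors(state, frozen):
    """Yield every state reachable by moving one clear, non-frozen block."""
    clear = clear_blocks(state)
    for block in clear:
        if block in frozen:
            continue  # assumption: frozen blocks may never be picked up
        for dest in ["table"] + [c for c in clear if c != block]:
            if state[block] != dest:
                new_state = dict(state)
                new_state[block] = dest
                yield new_state

def find_plan(start, goal, frozen=frozenset()):
    """Breadth-first search; return a list of (block, destination) moves, or None."""
    key = lambda s: tuple(sorted(s.items()))
    queue = deque([(start, [])])
    seen = {key(start)}
    while queue:
        state, plan = queue.popleft()
        if all(state[b] == on for b, on in goal.items()):
            return plan
        for nxt in successors(state, frozen):
            if key(nxt) not in seen:
                seen.add(key(nxt))
                move = next((b, nxt[b]) for b in BLOCKS if nxt[b] != state[b])
                queue.append((nxt, plan + [move]))
    return None

# Each entry maps a block to whatever it rests on.
start = {"A": "table", "B": "table", "C": "A"}   # C on A, B on the table
goal = {"A": "B", "B": "C"}                       # A on B, B on C

print(find_plan(start, goal, frozen={"C"}))  # no plan exists if C must stay put
print(find_plan(start, goal))                # a plan exists once C may move
```

Under these assumptions the search exhausts every reachable arrangement without finding the target stack when C is frozen, because A sits under C and can never be freed. Once C is allowed to move, a three-step plan appears, which is the kind of answer the chatbot kept giving.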

OpenAI pointed to test results that showed GPT-4o scored about two percentage points higher in answering general knowledge questions than previous versions of ChatGPT, demonstrating that its reasoning abilities had improved slightly.

OpenAI also said that the new ChatGPT could perform real-time language translation, which could help you converse with someone who speaks a foreign language.

I tested ChatGPT with Mandarin and Cantonese and confirmed that it was correctly translating phrases like “I'd like to book a hotel room for next Thursday” and “I want a king-size bed.” But the accents were slightly off. (To be fair, my broken Chinese isn't much better.) OpenAI said it's still working on improving the accents.

ChatGPT-4o also excelled as an editor. When I gave it paragraphs I had written, it was quick and effective at removing excess words and jargon. ChatGPT's decent performance with language translation gives me confidence that this will soon become a more useful feature.

One important thing OpenAI got right with ChatGPT-4o is making the technology free for people to try. Free is the right price: since we are helping train these AI systems with our data to improve them, we shouldn't have to pay for them.

The best of AI is yet to come and one day we might talk to a good math tutor. But we should believe it when we see and hear it.
