OpenAI has released GPT-4, its highly anticipated new language model, which incorporates multi-modal learning. Details are still scarce, but a few key features have been confirmed by OpenAI, while others remain the subject of speculation.
One of the most exciting developments with GPT-4 is the expected increase in performance over its predecessor, GPT-3.5. GPT-3.5 is already known for its impressive ability to generate human-like text and answer questions, so the prospect of an even more advanced language model is enticing. OpenAI has also hinted that GPT-4 could be capable of more nuanced reasoning and decision-making, potentially even demonstrating a form of common sense.
The multi-modal learning feature will allow the model to learn from different types of data inputs, such as images, audio, or videos. This could lead to even more sophisticated text generation, as the model would have a more comprehensive understanding of the world.
The company says GPT-4’s improvements are evident in the system’s performance on a number of tests and benchmarks, including the Uniform Bar Exam, the LSAT, SAT Math, and SAT Evidence-Based Reading & Writing. On those exams, GPT-4 scored in the 88th percentile or higher; a full list of exams and the system’s scores can be seen here.
Speculation about GPT-4 and its capabilities has been rife over the past year, with many suggesting it would be a huge leap over previous systems. Judging from OpenAI’s announcement, however, the improvement is more iterative, as the company previously warned.
“People are begging to be disappointed and they will be,” OpenAI CEO Sam Altman said in an interview about GPT-4 in January. “The hype is just like… We don’t have an actual AGI and that’s sort of what’s expected of us.”
The rumor mill was further energized last week after a Microsoft executive let slip, in an interview with the German press, that the system would launch this week. The executive also suggested the system would be multi-modal, meaning it could generate not only text but other media as well. Many AI researchers believe that multi-modal systems integrating text, audio, and video offer the best path toward building more capable AI systems.
GPT-4 is indeed multimodal, but it supports fewer modalities than some predicted. OpenAI says the system can accept both text and image inputs and emit text outputs. The company says the model’s ability to parse text and images simultaneously allows it to interpret more complex input.
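For developers, this text-plus-image input maps onto a chat-style API call. The sketch below uses the OpenAI Python SDK (v1+) as an illustration only; the model name and image URL are placeholder assumptions, and image input may require specific access on your account rather than being available to everyone at launch.

```python
# Minimal sketch of a combined text + image prompt, assuming access to a
# vision-capable GPT-4 model via the OpenAI Python SDK (v1+).
# The model name and image URL are illustrative placeholders.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4-vision-preview",  # assumption: a GPT-4 model that accepts images
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe what is unusual about this image."},
                {"type": "image_url", "image_url": {"url": "https://example.com/photo.jpg"}},
            ],
        }
    ],
    max_tokens=300,
)

# The model accepts text and image inputs but emits text only.
print(response.choices[0].message.content)
```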
In its announcement of GPT-4, OpenAI stressed that the system had undergone six months of safety training and that in internal tests, it was “82 percent less likely to respond to requests for disallowed content and 40 percent more likely to produce factual responses than GPT-3.5.”
However, that doesn’t mean the system doesn’t make mistakes or output harmful content. For example, Microsoft revealed that its Bing chatbot had been powered by GPT-4 all along, and many users were able to break Bing’s guardrails in all sorts of creative ways, getting the bot to offer dangerous advice, threaten users, and make up information. GPT-4 also still lacks knowledge about events “that have occurred after the vast majority of its data cuts off” in September 2021.
You can try the GPT-4 model in ChatGPT if you’ve subscribed to ChatGPT Plus, which costs $20/month. You can watch OpenAI’s GPT-4 demo video here on YouTube.
Update: Added GPT-4 demo video by OpenAI from YouTube.