GPT-4: A New Milestone in Scaling Up Deep Learning

OpenAI has released its latest natural language processing system, GPT-4, which promises to be even more advanced and capable than its predecessor, GPT-3. GPT-4 is built on the same deep learning approach as the previous models but leverages more data and computation to achieve greater sophistication and accuracy.

OpenAI has recently announced the creation of its latest deep learning model, GPT-4. It is a large multimodal model that accepts image and text inputs and outputs text. While GPT-4 is not as capable as humans in many real-world scenarios, it has shown human-level performance on various professional and academic benchmarks.

Access to GPT-4

GPT-4's text input capability is now available via ChatGPT Plus and the API, with a waitlist. OpenAI is also collaborating closely with a single partner to prepare the image input capability for wider availability. Additionally, OpenAI is open-sourcing OpenAI Evals, their framework for automated evaluation of AI model performance, to allow anyone to report shortcomings in their models to help guide further improvements.

GPT-4's Capabilities

GPT-4 is a multimodal model that can accept both image and text inputs and generate text outputs. While it is not as capable as humans in real-world scenarios, it has achieved human-level performance on several academic and professional benchmarks.

OpenAI tested them on a variety of benchmarks, including simulating exams that were originally designed for humans. They tested GPT-4 using the most recent publicly-available tests (in the case of the Olympiads and AP free response questions) or by purchasing 2022–2023 editions of practice exams. OpenAI did no specific training for these exams. A minority of the problems in the exams were seen by the model during training, but they believe the results to be representative.

GPT-4 has shown better performance than GPT-3.5 on the benchmarks, exhibiting human-level performance on various professional and academic exams, such as passing a simulated bar exam with a score around the top 10% of test takers. In contrast, GPT-3.5's score was around the bottom 10%.

OpenAI has also shared the estimated percentile lower bound among test-takers for each exam. The results showed that GPT-4 performed better than GPT-3.5 in all the tests. For example, GPT-4 scored 298 out of 400 on the Uniform Bar Exam (MBE+MEE+MPT), which is around the 90th percentile, while GPT-3.5 scored 213 out of 400, around the 10th percentile. The results demonstrate the significant improvement that GPT-4 has achieved over its predecessor.

GPT-4 scored high on traditional machine learning benchmarks, outperforming existing large language models.

Safety and alignment

In addition to its advanced language processing capabilities, GPT-4 is designed with safety and alignment in mind. OpenAI spent six months making the model safer and more aligned, resulting in a system that is 82% less likely to respond to requests for disallowed content and 40% more likely to produce factual responses than GPT-3.5 on internal evaluations. OpenAI achieved this by incorporating more human feedback, including feedback from ChatGPT users, and working with over 50 experts in domains such as AI safety and security.

Built with GPT-4

OpenAI has released some information about companies that are already integrating ChatGPT in their product, some of them are:

  • Duolingo
  • Be My Eyes
  • Stripe
  • Morgan Stanley
  • Government of Iceland

Conclusion

GPT-4 is the latest achievement in OpenAI's efforts to scale up deep learning. It is a large multimodal model that has achieved human-level performance on various academic and professional benchmarks. Although it is still far from perfect, it exhibits improved factuality, steerability, and adherence to guardrails, making it more reliable, creative, and able to handle nuanced instructions than its predecessor, GPT-3.5. We are excited to see what GPT4 brings after seeing the hype and incredible amount of products that raised from ChatGPT.

Get up and running with one engineer in one sprint

Guaranteed lift within your first 30 days or your money back

100M+
Users and items
1000+
Queries per second
1B+
Requests

Related Posts

Nina Shenker-Tauris
 | 
February 21, 2023

Do Large Language Models (LLMs) reason?

Daniel Camilleri
 | 
November 1, 2023

Search the way you think: how personalized semantic search is disrupting traditional search

Javier Iranzo-Sanchez
 | 
August 8, 2022

Information Retrieval Systems, the precursors of Recommender Systems