Chinese AI start-up DeepSeek has introduced a new way to improve the reasoning abilities of large language models (LLMs) to deliver better and faster results for general queries compared to its competitors.
DeepSeek sparked excitement when it launched R1, an AI model and chatbot claimed to match the performance of OpenAI’s ChatGPT at a fraction of the cost.
Collaborating with researchers from China’s Tsinghua University, DeepSeek stated in its latest paper released on Friday that it developed a method for self-improving AI models.
The underlying technology, named self-principled critique tuning (SPCT), trains AI to create its own rules for judging content and then applies those rules to provide detailed critiques.
Rather than relying on ever-larger models, the approach achieves better results by sampling several evaluations in parallel and aggregating them. SPCT is applied to generative reward modeling (GRM), in which a model writes out critiques and ratings of AI outputs to keep them aligned with human expectations.
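The parallel-sampling idea can be illustrated with a toy sketch. Here `sample_judgment` is a hypothetical stand-in for one generative reward-model pass (in the real system an LLM writes principles and a critique before scoring); the scores, noise model, and sample count are illustrative assumptions, not details from the paper.

```python
import random
from statistics import mean

def sample_judgment(response: str, seed: int) -> int:
    """Hypothetical stand-in for one GRM pass: returns a 1-10 score.
    A toy quality signal plus noise mimics a stochastic LLM judge."""
    rng = random.Random(seed)
    base = 8 if "cited sources" in response else 4  # toy quality signal
    return max(1, min(10, base + rng.randint(-1, 1)))  # noisy judgment

def grm_score(response: str, n_samples: int = 8) -> float:
    """Run several independent judgments (in parallel in the real system,
    a simple loop here) and aggregate them, instead of trusting one
    larger judge model."""
    return mean(sample_judgment(response, seed=i) for i in range(n_samples))

good = grm_score("Answer with cited sources ...")
weak = grm_score("Terse answer ...")
print(good > weak)  # averaging many noisy judgments separates the two
```

The design point is that averaging many cheap, noisy judgments can be more reliable than a single judgment from a bigger model.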
How does it work?
Improving AI usually involves enlarging models during the training process, which requires significant human effort and computational power. In contrast, DeepSeek has developed a system with an integrated “judge” that assesses the AI’s responses in real-time.
When a question is asked, this judge compares the AI’s planned response against its core principles and the expected quality of the answer.
If there is a close match, the AI receives positive feedback, which steers it toward better answers.
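The feedback loop described above can be sketched in a few lines; the principles, scoring rule, and threshold below are illustrative assumptions, not details from DeepSeek's paper, where an LLM judge generates its own principles and critiques.

```python
# Toy sketch of the self-judging loop: the "judge" checks a candidate
# response against a few stated principles and emits a reward signal.
PRINCIPLES = ["is_accurate", "is_helpful", "is_safe"]  # illustrative only

def critique(response: dict) -> tuple[list[str], float]:
    """Judge pass: list which principles the response satisfies and
    score it as the fraction satisfied (a real system uses an LLM here)."""
    satisfied = [p for p in PRINCIPLES if response.get(p, False)]
    return satisfied, len(satisfied) / len(PRINCIPLES)

def feedback(response: dict, threshold: float = 0.66) -> float:
    """Positive reward when the response closely matches the expected
    quality; this signal is what drives the model's self-improvement."""
    _, score = critique(response)
    return 1.0 if score >= threshold else -1.0

draft = {"is_accurate": True, "is_helpful": True, "is_safe": False}
print(feedback(draft))  # two of three principles met -> positive reward
```

The reward from `feedback` is the training signal: responses that match the judge's principles are reinforced, those that miss them are penalized.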
DeepSeek refers to this self-improving system as “DeepSeek-GRM”. The researchers assert that this will enable models to perform better than competitors such as Google’s Gemini, Meta’s Llama, and OpenAI’s GPT-4o.
DeepSeek intends to make these advanced AI models available as open-source software, although no specific timeline has been provided.
The release of the paper coincides with rumors that DeepSeek is about to unveil its latest R2 chatbot. However, the company has not publicly commented on any planned releases.