After shaking up Silicon Valley with AI models earlier this year, Chinese startup DeepSeek is working on another innovation to help reduce operational costs. The company, led by Liang Wenfeng, has been working with researchers at Tsinghua University on a new approach called generative reward modelling (GRM), which scores an AI model's responses according to how well they align with human preferences.
The new approach was first revealed in a pre-print paper (via Bloomberg), which describes a technique called self-principled critique tuning (SPCT) that makes AI models smarter and more efficient in a self-improving way.
The Chinese startup is calling these new models DeepSeek-GRM and plans to release them on an open source basis, just like its previous models. DeepSeek says its new AI models beat Google’s Gemini 1.5 Pro, Meta’s Llama 3.1 and OpenAI’s GPT-4o in benchmark scores.
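The paper's details go beyond what the coverage describes, but the general shape of a generative reward model with self-generated principles and critiques can be sketched roughly as follows. This is only an illustration of the idea, not DeepSeek's implementation: the helper callables generate_text and extract_score are hypothetical placeholders standing in for whatever model backend and score parser would actually be used.

```python
# Conceptual sketch of generative reward modelling (GRM): the reward model
# first writes its own evaluation principles, then critiques each candidate
# response against them and turns the critique into a numeric score.
# generate_text and extract_score are hypothetical placeholders.

from typing import Callable, List, Tuple


def grade_responses(
    prompt: str,
    candidates: List[str],
    generate_text: Callable[[str], str],    # any text-generation backend
    extract_score: Callable[[str], float],  # parses a "Score: X/10" line
) -> Tuple[str, List[float]]:
    """Score candidates with a generative reward model; return the best one."""
    # Step 1: the model proposes principles tailored to this prompt.
    principles = generate_text(
        f"List the key principles for judging an answer to:\n{prompt}"
    )

    # Step 2: critique each candidate against those principles and
    # extract a scalar reward from the written critique.
    scores = []
    for response in candidates:
        critique = generate_text(
            f"Principles:\n{principles}\n\n"
            f"Question:\n{prompt}\n\nAnswer:\n{response}\n\n"
            "Critique the answer against the principles and end with 'Score: X/10'."
        )
        scores.append(extract_score(critique))

    # Step 3: pick the highest-scoring candidate (best-of-N selection).
    best = candidates[max(range(len(candidates)), key=scores.__getitem__)]
    return best, scores
```

In this framing, the reward signal comes from text the model itself generates (principles plus critiques) rather than from a fixed scalar reward head, which is what allows the grading to improve as the underlying model improves.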
What is DeepSeek? What impact did it have on Western AI chatbots?
DeepSeek is a Chinese AI startup that operates a chatbot of the same name built on two underlying models: DeepSeek V3, a frontier model comparable to GPT-4o and Gemini 2.0, and DeepSeek R1, a reasoning model comparable to OpenAI’s o1.
DeepSeek’s AI models shot to prominence in January when the chatbot overtook ChatGPT as the most popular app on Apple’s and Google’s app stores. As demand for DeepSeek’s chatbot grew, the idea that Western companies were ahead of China in the AI race quickly collapsed, wiping around $1 trillion of value off tech stocks such as Nvidia and Microsoft.
DeepSeek’s AI models, built on a shoestring budget, also defied the notion that building AI models would require billions of dollars in investment.
DeepSeek used a machine learning technique called Mixture of Experts (MoE), in which only a small subset of a model’s “expert” sub-networks is activated for any given input, to make its models more efficient. The same technique was used by Meta when it released its Llama 4 Maverick and Llama 4 Scout models on Saturday.
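The efficiency gain comes from running only a few experts per token. The toy sketch below shows the routing idea with small NumPy layers; the dimensions, router, and experts here are illustrative assumptions, not taken from DeepSeek’s or Meta’s actual architectures.

```python
import numpy as np

# Toy Mixture-of-Experts routing: a router scores all experts for a token,
# but only the top_k experts are actually run. Sizes are illustrative only.
rng = np.random.default_rng(0)
d_model, n_experts, top_k = 16, 8, 2

# One small linear "expert" per slot, plus a router that scores experts.
experts = [rng.standard_normal((d_model, d_model)) * 0.1 for _ in range(n_experts)]
router = rng.standard_normal((d_model, n_experts)) * 0.1


def moe_forward(token: np.ndarray) -> np.ndarray:
    """Route a single token vector through its top_k experts only."""
    logits = token @ router                   # router scores, shape (n_experts,)
    chosen = np.argsort(logits)[-top_k:]      # indices of the top_k experts
    weights = np.exp(logits[chosen])
    weights /= weights.sum()                  # softmax over the chosen experts only
    # Weighted sum of the selected experts' outputs; all other experts are skipped,
    # so most of the model's parameters do no work for this token.
    return sum(w * (token @ experts[i]) for w, i in zip(weights, chosen))


output = moe_forward(rng.standard_normal(d_model))
print(output.shape)  # (16,)
```

Because only 2 of the 8 experts run per token in this sketch, the compute per token stays roughly constant even as more experts (and thus more total parameters) are added, which is the property that keeps large MoE models relatively cheap to serve.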
While DeepSeek hasn’t said when it will release its new AI models, some reports suggest that the DeepSeek R2 model could arrive as early as May, as AI launches from Western rivals become more frequent.