In its research paper, DeepSeek revealed that it bet big on reinforcement learning (RL) to train both of these models.
OpenAI suspects that DeepSeek distilled OpenAI’s advanced models into a smaller, cheaper model without permission. Distillation, in this case, would mean that DeepSeek may have used OpenAI’s outputs as “teacher” data to train ...