December 26, 2024:
DeepSeek-V3 Outshines Rivals with Efficient AI - Chinese AI startup DeepSeek has launched DeepSeek-V3, a 671B-parameter mixture-of-experts model that outperforms open models such as Meta's Llama and Alibaba's Qwen. It introduces a load-balancing strategy and multi-token prediction for improved quality and inference speed. Trained for roughly $5.57 million, DeepSeek-V3 leads open-source models and rivals closed models such as GPT-4o on benchmarks.
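To illustrate the mixture-of-experts idea behind the model, here is a minimal, hedged sketch of a top-k routed MoE layer in PyTorch. This is not DeepSeek's implementation; the class name, expert count, and routing details are assumptions for illustration only, and DeepSeek-V3's actual load-balancing scheme differs.

```python
# Illustrative sketch only (assumed names and sizes, not DeepSeek-V3's code):
# each token is routed to its top-k experts and their outputs are mixed.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    def __init__(self, dim: int, num_experts: int = 8, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        self.gate = nn.Linear(dim, num_experts, bias=False)  # router
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
            for _ in range(num_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (tokens, dim); route each token to its top-k experts
        scores = F.softmax(self.gate(x), dim=-1)         # (tokens, experts)
        weights, idx = scores.topk(self.top_k, dim=-1)   # (tokens, k)
        weights = weights / weights.sum(dim=-1, keepdim=True)
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            mask = (idx == e)                            # which tokens chose expert e
            if mask.any():
                tok = mask.any(dim=-1)
                w = (weights * mask.float()).sum(dim=-1, keepdim=True)[tok]
                out[tok] += w * expert(x[tok])
        return out

# Example: 16 tokens of width 64, 8 experts, 2 active per token
moe = TopKMoE(dim=64)
print(moe(torch.randn(16, 64)).shape)  # torch.Size([16, 64])
```

Because only a few experts run per token, a very large total parameter count (like 671B) can be served while activating only a fraction of it per forward pass, which is the efficiency argument the announcement makes.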
The model is available on GitHub under DeepSeek's license, with an API for commercial use, adding a strong open competitor to the field.