Deepseek (AI)

= DeepSeek :author: The DeepSeek Team :revnumber: R1 :revdate: 2026-06-14 :doctype: article :lang: en

== Overview

DeepSeek is a family of open-source large language models (LLMs) developed by a Chinese nostr:naddr1qvzqqqrcvgpzpdwnfmku0k8crn42tmfh0ddz656wa9yl4cx94va6uzjhd2n5wh9lqqtkzun5d9nxjcmfv9kz66tww3jkcmrfvajkucm9xjg9sw[artificial intelligence] research company of the same name. Headquartered in Hangzhou, China, DeepSeek has rapidly emerged as a disruptive force in the AI industry since its founding in 2023. Its models, including the flagship DeepSeek-V3 and the reasoning-optimized DeepSeek-R1, have achieved performance comparable to industry-leading closed-source systems like OpenAI’s GPT-4 and GPT-o1, while requiring only a fraction of the training cost.

Unlike most commercial AI offerings, DeepSeek makes its model weights publicly available under permissive open-source licenses (such as the MIT License), enabling researchers, developers, and businesses worldwide to freely use, modify, and deploy its technology.

== History

=== Origins: From Quant Fund to AI Lab

DeepSeek’s parent company is High-Flyer, a quantitative hedge fund founded in 2015 in Hangzhou by Liang Wenfeng, a graduate of Zhejiang University who studied machine vision. Between 2019 and 2022, High-Flyer invested approximately $139 million to stockpile 10,000 Nvidia A100 GPUs, building a powerful supercomputer years before the AI boom.

In 2021, as Chinese regulators began cracking down on high-frequency speculative trading, Liang created a separate research team focused on fundamental AI research. This team operated independently from the investment group, hiring top graduates from Chinese universities and collaborating closely with academics.

=== Founding of DeepSeek

In July 2023, High-Flyer spun off this AI research division as DeepSeek (a name combining “deep” from deep learning and “seek” for exploration). Unlike typical AI startups chasing immediate commercialization, DeepSeek was funded directly from High-Flyer’s R&D budget, allowing the company to focus on long-term research goals without pressure for rapid monetization.

By 2025, the company had grown to approximately 160-200 employees—a fraction of the workforce of competitors like OpenAI (3,500+ employees).

=== Rapid Release Cycle

DeepSeek has maintained an aggressive release schedule:

|=== | Date | Release | Key Features

| November 2023 | DeepSeek Coder | First open-weight models for code generation

| December 2023 | DeepSeek-V1 | General-purpose LLM

| May 2024 | DeepSeek-V2 | Introduction of MLA and DeepSeekMoE architectures

| December 27, 2024 | DeepSeek-V3 | 671B parameter MoE model, trained for $5.6M

| January 20, 2025 | DeepSeek-R1 | Reasoning-optimized model with visible chain-of-thought

| January 27, 2025 | Janus-Pro | Image generation model surpassing DALL-E 3 |===

=== Global Impact (January 2025)

The release of DeepSeek-V3 and R1 in January 2025 sent shockwaves through global technology markets. The revelation that a small Chinese startup could train models comparable to GPT-4 for under $6 million (versus estimated $100+ million for OpenAI) caused steep declines in the stock prices of AI infrastructure companies like Nvidia and sparked widespread reassessment of AI development economics.

In March 2025, the Chinese government officially designated DeepSeek a “national treasure,” placing travel restrictions on key employees to protect the technology.

== What DeepSeek Is About

=== Philosophy

DeepSeek’s approach to AI development is defined by several core principles:

Open-source transparency – All major model weights are publicly released under permissive licenses, enabling independent verification, customization, and deployment
Radical efficiency – Prioritizing algorithmic innovation and system optimization over brute-force compute scaling
Reasoning-first design – Developing models that can “show their work” through explicit chain-of-thought reasoning
Long-term research focus – Backed by hedge fund resources rather than venture capital, allowing patient R&D without short-term monetization pressure

=== What Makes DeepSeek Different

Remarkable cost efficiency – DeepSeek-V3 was reportedly trained for approximately $5.6 million in compute costs, compared to estimates of $100 million to $1 billion for comparable models from OpenAI and Meta. This represents a 10-100x reduction in training cost.

Open-source under MIT license – Unlike GPT-4, Claude, or Gemini, DeepSeek’s model weights are freely downloadable and usable for commercial purposes.

Visible reasoning – DeepSeek-R1 pioneered exposing its chain-of-thought reasoning process to users, enclosed in <think> tags, providing unprecedented transparency into how the model arrives at answers.

Bilingual proficiency – DeepSeek models are trained on both English and Chinese corpora, offering strong performance in both languages.

Small team, big impact – With roughly 200 employees, DeepSeek achieved what took organizations 10-20 times larger, demonstrating that algorithmic innovation can substitute for raw scale.

=== Key Technical Innovations

Mixture-of-Experts (MoE) Architecture – DeepSeek-V3 uses 671 billion total parameters, but only 37 billion are activated per token. This sparse activation dramatically reduces computational costs while maintaining massive model capacity. Unlike dense models (like GPT-4’s reported 1.8 trillion parameters activated for every operation), DeepSeek’s approach avoids redundant calculations.

Multi-Head Latent Attention (MLA) – A novel attention mechanism that compresses the key-value (KV) cache into a low-rank latent vector, reducing memory consumption by up to 93% during inference. This enables efficient handling of context windows up to 128,000 tokens.

Auxiliary-Loss-Free Load Balancing – Traditional MoE models require auxiliary losses to balance expert utilization, which degrades performance. DeepSeek pioneered a load balancing strategy that avoids these losses entirely.

Group Relative Policy Optimization (GRPO) – A reinforcement learning algorithm developed by DeepSeek that eliminates the need for a separate “critic model” used in traditional RLHF (Reinforcement Learning from Human Feedback). GRPO uses group-based scoring comparisons, making training more stable and efficient.

FP8 Mixed Precision Training – DeepSeek was the first to successfully validate 8-bit floating-point precision training on an extremely large-scale model (671B parameters), accelerating training and reducing GPU memory usage.

Multi-Token Prediction (MTP) – Rather than predicting tokens one at a time during training, DeepSeek models learn to predict multiple future tokens simultaneously, improving overall performance and enabling speculative decoding for faster inference.

== Example Snippets

=== Interacting with DeepSeek-R1 (Reasoning Demonstration)

When solving a complex problem, DeepSeek-R1 reveals its chain-of-thought reasoning process:

[source,text] –– User: If a bat and a ball cost $1.10 in total, and the bat costs $1.00 more than the ball, how much is the ball?

<think> Let me work through this step by step. Let the ball cost x dollars. Then the bat costs x + $1.00. Together: x + (x + 1.00) = 1.10 2x + 1.00 = 1.10 2x = 0.10 x = 0.05 So the ball costs 5 cents. </think>

The ball costs $0.05 (5 cents). The bat costs $1.05, and together they sum to $1.10. ––

=== Using DeepSeek via API

[source,python] –– # Using DeepSeek’s API (similar to OpenAI’s interface) import requests

api_key = “your-deepseek-api-key” url = “https://api.deepseek.com/v1/chat/completions”

headers = { “Authorization”: f“Bearer {api_key}”, “Content-Type”: “application/json” }

payload = { “model”: “deepseek-reasoner”, # DeepSeek-R1 “messages”: [ “role“: “system”, “content”: “You are a helpful coding assistant.”, “role“: “user”, “content”: “Write a Python function to check if a number is prime.” ], “temperature”: 0.7 }

response = requests.post(url, json=payload, headers=headers) print(response.json()“choices”“message”) ––

== The DeepSeek Model Family

Beyond the flagship V3 and R1 models, DeepSeek has released specialized variants:

DeepSeek-Coder – Models optimized for code generation and programming tasks, supporting 338 programming languages
DeepSeek-Math – Specialized 7B parameter model achieving 51.7% accuracy on the MATH benchmark
DeepSeek-Prover-V2 – A 671B MoE model designed for theorem proving in Lean 4
DeepSeek-VL / VL2 – Multimodal models capable of processing images alongside text (up to 1024×1024 resolution)
Janus-Pro – Image generation model reported to surpass DALL-E 3
Distilled R1 variants – Smaller models (1.5B to 70B parameters) based on Llama and Qwen architectures

== Performance and Cost Comparison

DeepSeek’s pricing is highly competitive compared to other leading models:

| Claude-3.7 | $3.00 / 1M tokens | $15.00 / 1M tokens | 44 | 57 |===

DeepSeek-R1 achieves the second-highest Coding Index (49) and Intelligence Index (60) among all models listed, while maintaining token prices approximately 10-30x lower than GPT-o1.

== Applications

DeepSeek models are being deployed across numerous domains:

Healthcare – Medical literature analysis, clinical decision support
Finance – Algorithmic trading signals, financial document analysis
Education – Personalized tutoring, automated grading, explanation generation
Software development – Code generation, debugging assistance, documentation
Customer service – Intelligent chatbots, ticket triage, FAQ automation
Content creation – Marketing copy, technical writing, translation
Research – Scientific literature summarization, hypothesis generation

== Ethical and Geopolitical Considerations

DeepSeek’s rise has raised several important questions:

Export controls and hardware access – DeepSeek developed its models using Nvidia H800 GPUs (a lower-bandwidth variant of the H100 that remained permissible for export to China after October 2023 restrictions). The company’s ability to achieve world-class results with restricted hardware demonstrates that algorithmic innovation can partially circumvent chip sanctions.

Estimates vs. reality – While DeepSeek claims $5.6 million in training costs, some analysts (including SemiAnalysis) estimate the company had access to 50,000 H800 GPUs and spent over $1.3 billion in total development costs. The true cost picture remains debated.

National security designation – The Chinese government’s classification of DeepSeek as a “national treasure” and imposition of travel restrictions on key employees in March 2025 underscores the technology’s perceived strategic value.

Open-source security – The permissive MIT license allows anyone to use DeepSeek models, raising both democratization benefits and potential misuse concerns.

== Further Resources

Official DeepSeek website: https://deepseek.com
DeepSeek API documentation: https://platform.deepseek.com
GitHub repository: https://github.com/deepseek-ai
Technical papers: https://github.com/deepseek-ai/DeepSeek-V3 (technical report)
Hugging Face model hub: DeepSeek models available for download

== License

DeepSeek models are released under the MIT License, a permissive open-source license that allows commercial use, modification, distribution, and private use with minimal restrictions. The company’s hosted API services are governed by separate terms of service.

Comments

About this entry

Event Id

Raw event

Other authors