Open source DeepSeek R1 Distilled Models now available on watsonx.ai

Authors

Maryam Ashoori

VP of Product and Engineering, watsonx.governance

IBM

Armand Ruiz

Vice President, Product Management - AI Platform (watsonx.ai & watsonx.gov)

IBM

Nisarg Patel

Product Manager, watsonx.ai

IBM

Both the Llama 3.1 8B and Llama 3.3 70B distilled variants of DeepSeek-R1 are now available on watsonx.ai, IBM’s enterprise-grade AI developer studio.

What is DeepSeek-R1?

DeepSeek-R1, the reasoning LLM from DeepSeek AI, is among the world's most powerful open-source models, rivaling the capabilities of even OpenAI’s o1. Released under the MIT License, DeepSeek-R1 was developed primarily through the direct use of reinforcement learning (RL) on the base model DeepSeek-V3—a major innovation in fine-tuning LLMs.

DeepSeek also used a technique called knowledge distillation to fine-tune multiple Llama and Qwen models using the data generated by the much larger R1 model. Users can access DeepSeek distilled models on watsonx.ai in two ways:

  • IBM offers both Llama distilled variants, developed by DeepSeek AI and published on Hugging Face, within watsonx.ai through the Deploy on Demand catalog, which lets users deploy a dedicated instance for secure inferencing.
  • Users can also use the Custom Foundation Models import feature to bring in other variants of DeepSeek-R1, such as the Qwen distilled models.

DeepSeek-R1-Distill-Llama-8B and DeepSeek-R1-Distill-Llama-70B are available in the watsonx.ai Deploy on Demand catalog today.

It is important to note that no additional distillation was performed by IBM. The DeepSeek models are non-IBM models, and their use is governed solely by the terms of the open-source license under which they are released (e.g., no indemnification or warranty). The DeepSeek distilled models available via watsonx.ai do not interact with the DeepSeek mobile application or API.

What kind of use cases are enabled by DeepSeek-R1?

DeepSeek-R1 is an advanced AI model renowned for its exceptional reasoning capabilities, enabling a wide array of applications across various industries:

  • Planning: DeepSeek-R1's focus on chain-of-thought logic allows it to perform tasks requiring step-by-step reasoning, making it ideal for powering agentic applications.
  • Coding: DeepSeek-R1 excels in coding tasks, providing code generation, debugging assistance, and optimization suggestions.
  • Mathematical Problem-Solving: The model's strong reasoning capabilities make it adept at solving complex mathematical problems, which is beneficial in academic research, engineering, and scientific computations.  

Developers can build AI solutions within IBM watsonx.ai using deployed models like DeepSeek-R1, along with capabilities that let them:

  • Test and evaluate model outputs in an easily digestible format and UI
  • Build a RAG pipeline by connecting to various vector DBs and embedding models
  • Work with popular frameworks and connectors like LangChain, CrewAI, and more (see the sketch below)
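
As an illustration of that last point, here is a minimal sketch of wiring a watsonx.ai deployment into LangChain through the langchain-ibm integration. The package and class are real, but the endpoint URL, IDs, and credentials below are placeholders, and exact parameters may vary by package version:

from langchain_ibm import WatsonxLLM  # pip install langchain-ibm

# Point LangChain at an on-demand DeepSeek deployment. Every value in
# angle brackets is a placeholder to replace with your own.
llm = WatsonxLLM(
    deployment_id="<your deployment id>",
    url="https://us-south.ml.cloud.ibm.com",  # your watsonx.ai endpoint
    apikey="<your IBM Cloud API key>",
    space_id="<your deployment space id>",
    params={"max_new_tokens": 512},
)

# The model now plugs into any LangChain pipeline, e.g. as the LLM in a RAG chain.
print(llm.invoke("Outline a step-by-step plan for auditing a data pipeline."))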

Why use DeepSeek Distilled Models on watsonx.ai?

IBM watsonx.ai enables clients to customize their implementation of open-source models like DeepSeek-R1, from full flexibility in deployment environments to intuitive workflows for agent development, fine-tuning, RAG, prompt engineering, and integration with enterprise applications. Users can also take advantage of watsonx.ai’s built-in guardrails to protect their applications.

Of course, data security and AI governance are top concerns for our customers. In addition to guardrails, models deployed on watsonx.ai run as dedicated instances, which means no data is shared beyond the platform. Additionally, seamless integration with IBM watsonx.governance, a powerful governance, risk and compliance (GRC) toolkit, ensures your AI is responsible, transparent and explainable across the entire AI lifecycle.
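
As a minimal sketch of the guardrails in action, here is how they can be switched on when inferencing through the ibm-watsonx-ai Python SDK, assuming a model has already been deployed on demand as described in the next section; every value in angle brackets is a placeholder:

from ibm_watsonx_ai import APIClient, Credentials
from ibm_watsonx_ai.foundation_models import ModelInference

client = APIClient(
    Credentials(url="https://us-south.ml.cloud.ibm.com", api_key="<your IBM Cloud API key>"),
    space_id="<your deployment space id>",
)
model = ModelInference(deployment_id="<your deployment id>", api_client=client)

# guardrails=True applies the platform's hate/abuse/profanity (HAP) filter
# to both the prompt and the generated output.
print(model.generate_text(prompt="Draft a customer apology email.", guardrails=True))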

Getting started with DeepSeek on IBM watsonx.ai

Support for DeepSeek-R1's distilled variants is part of IBM’s commitment to open source innovation in AI. Both DeepSeek Llama distilled models are available as part of the Deploy on Demand catalog on IBM watsonx.ai and can be deployed on an hourly basis on a dedicated GPU.

Deploying R1 models on demand from the watsonx Resource hub

To deploy a foundation model on-demand from the Resource hub, complete the following steps:

1. Open the Resource hub from the Navigation Menu.

2. From the Pay by the hour section, find the DeepSeek model that you want to deploy on demand.

3. From the model details page, click Deploy.

4. Choose the deployment space where you want the foundation model to be deployed.

5. Click Create.

6. Start using the model through the Prompt Lab or through the API/SDK:

In the Prompt Lab, watsonx.ai displays the model's thought process in italics, with the final output in regular text. As you can see, even with a simple prompt, the model reasons through and plans out the various sections to include in its response.
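
For the API/SDK route, the following is a minimal sketch using the ibm-watsonx-ai Python SDK (pip install ibm-watsonx-ai); the endpoint URL, API key, space ID, and deployment ID are placeholders for your own values:

from ibm_watsonx_ai import APIClient, Credentials
from ibm_watsonx_ai.foundation_models import ModelInference

credentials = Credentials(
    url="https://us-south.ml.cloud.ibm.com",  # your watsonx.ai endpoint
    api_key="<your IBM Cloud API key>",
)
client = APIClient(credentials, space_id="<your deployment space id>")

# Reference the deployment created above, rather than a public model id.
deepseek = ModelInference(deployment_id="<your deployment id>", api_client=client)

response = deepseek.generate_text(
    prompt="Write a project plan for migrating a payroll system to the cloud.",
    params={"max_new_tokens": 1000},
)
# The raw output contains the model's reasoning followed by its final answer.
print(response)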

Deploying R1 models on watsonx through the REST API

Alternatively, you can use the REST API to deploy the model:

1. Create a model asset:

curl -X POST "https://<cluster url>/ml/v4/models?version=2024-01-29" \
-H "Authorization: Bearer <replace with your token>" \
-H "content-type: application/json" \
--data '{
    "type": "curated_foundation_model_1.0",
    "version": "1.0",
    "name": "deepseek",
    "space_id": "<Space id for deployment>",
    "foundation_model": {
        "model_id": "deepseek-ai/deepseek-r1-distill-llama-8b-curated"
    }
}'

2. Create a deployment for the on-demand foundation model:

curl -X POST "https://<cluster url>/ml/v4/deployments?version=2024-01-29" \
-H "Authorization: Bearer <replace with your token>" \
-H "content-type: application/json" \
--data '{
    "asset": {
        "id": "<Asset id created>"
    },
    "online": {
        "parameters": {
            "serving_name": "llama"
        }
    },
    "description": "<Description>",
    "name": "deepseek",
    "space_id": "<Space id for deployment>"
}'

Of course, you must replace placeholders like the bearer token and space ID with your own credentials. After the model is deployed, you can prompt the foundation model from the Prompt Lab or through the watsonx.ai API.
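
For example, here is a rough sketch of calling the deployment through the deployments text-generation REST endpoint using Python's requests library; the cluster URL, bearer token, and deployment identifier (or the serving_name set above) are placeholders:

import requests

CLUSTER_URL = "https://<cluster url>"
DEPLOYMENT = "<deployment id or serving name>"  # e.g. the serving_name "llama" set above

response = requests.post(
    f"{CLUSTER_URL}/ml/v1/deployments/{DEPLOYMENT}/text/generation",
    params={"version": "2024-01-29"},
    headers={
        "Authorization": "Bearer <replace with your token>",
        "Content-Type": "application/json",
    },
    json={
        "input": "Explain, step by step, how you would plan a zero-downtime database migration.",
        "parameters": {"max_new_tokens": 500},
    },
)
print(response.json()["results"][0]["generated_text"])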

By providing users with access to best-in-class open models in watsonx.ai, including both third-party models and IBM's Granite models, our goal is to foster a culture of collaboration and knowledge sharing.