
DeepSeek Reasoning: Improving an R1 distilled model with RAG and watsonx.ai

5 February 2025

Authors

Anna Gutowska

AI Engineer, Developer Advocate

IBM

Ash Minhas

Manager, Technical Content | AI Advocate

IBM

In this tutorial, we leverage the reasoning abilities of the Llama 3.3 70B distilled variant of the DeepSeek-R1 large language model (LLM), now available on watsonx.ai™, and use IBM® Docling in Python to prepare the documents it reasons over. The use case is to process a request for proposals (RFP) and create a business-specific proposal in response.

Reasoning capabilities of LLMs

Recent advancements in machine learning and deep learning have greatly improved the emergent logical reasoning skills of state-of-the-art large language models (LLMs). This development has caused significant debate about whether LLMs are truly capable of reasoning or whether they are simply imitating human decision-making by following the patterns present in their training data.

There are many types of reasoning, such as commonsense, abductive, deductive and inductive reasoning. Humans use these forms of reasoning naturally, but it is exceedingly difficult to build an AI model that excels at all of them. LLMs are bound by the knowledge acquired during training. A model might excel at mathematical reasoning or on a common benchmark but completely falter when applied to a different use case. If LLMs were truly capable of multi-step reasoning, that capacity would generalize rather than being limited to particular examples. Hence, in this tutorial, we treat human and LLM reasoning as distinct from one another.

Ways to improve LLM reasoning

To supplement an LLM’s training data without fine-tuning, we can perform retrieval augmented generation (RAG). RAG is a technique in natural language processing (NLP) that grounds the model on an up-to-date, accurate dataset to facilitate in-context learning.
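The retrieve-then-generate idea can be sketched without any libraries. This minimal sketch assumes a toy bag-of-words "embedding" and cosine similarity in place of a neural embedding model and vector store (this tutorial later uses Ollama embeddings and FAISS), and the function names are hypothetical. The prompt it builds is what would be sent to the LLM.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding (assumption): term counts of lowercase word tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the query."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]


def build_prompt(query: str, chunks: list[str], k: int = 2) -> str:
    """Ground the question in retrieved context so the LLM can use it in context."""
    context = "\n\n".join(retrieve(query, chunks, k))
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
    )


# Example: only the first chunk shares a term ("proposals") with the question
chunks = [
    "Proposals must be uploaded to the Dropbox link before the deadline.",
    "Questions about the RFP must be emailed to the listed address.",
    "The cafeteria opens at noon.",
]
print(build_prompt("Where do I upload proposals?", chunks, k=1))
```

A real system swaps `embed` for a learned embedding model and the linear scan for an index such as FAISS, but the shape of the pipeline stays the same.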

To encourage complex reasoning and problem-solving, chain of thought (CoT) prompting can also be used. Chain of thought reasoning is an approach in artificial intelligence that simulates human-like reasoning by decomposing complex problems into intermediate reasoning steps that lead to a final answer. Variants of chain of thought prompting include zero-shot, automatic and multimodal CoT.
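In its simplest, zero-shot form, CoT prompting just appends a cue such as "Let's think step by step." to the question. The helper below is a hypothetical illustration and is not used later in this tutorial: DeepSeek-R1 is trained to produce its own reasoning, so we do not add such a cue.

```python
def zero_shot_cot_prompt(question: str) -> str:
    """Zero-shot CoT: no worked examples, just a cue to reason step by step."""
    return f"Q: {question}\nA: Let's think step by step."


print(zero_shot_cot_prompt("A box holds 12 pens. How many pens are in 4 boxes?"))
```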

DeepSeek-R1 combines chain of thought reasoning with reinforcement learning to enhance performance. In this tutorial, we demonstrate how to use RAG as another way to improve the model's reasoning, by grounding it in relevant documents.

Prerequisites

You need an IBM Cloud® account.

Steps

Step 1. Set up your environment

While you can choose from several tools, this tutorial walks you through how to set up an IBM account to use a Jupyter Notebook.

1. Log in to watsonx.ai by using your IBM Cloud® account.
2. Create a watsonx.ai project.

   You can get your project ID from within your project. Click the Manage tab. Then, copy the project ID from the Details section of the General page. You need this ID for this tutorial.
3. Create a Jupyter Notebook.

   This step opens a notebook environment where you can copy the code from this tutorial. Alternatively, you can download this notebook to your local system and upload it to your watsonx.ai project as an asset. This Jupyter Notebook can be found on GitHub.

Step 2. Set up a watsonx.ai Runtime instance and API key

1. Create a watsonx.ai Runtime service instance (select your appropriate region).
2. Generate an API key.
3. Associate the watsonx.ai Runtime service instance with the project that you created in watsonx.ai.

Step 3. Deploy DeepSeek-R1’s distilled variant on IBM watsonx.ai

The reasoning model that we use in this tutorial is deepseek-r1-distill-llama-70b. The DeepSeek-R1 distilled variants based on Llama and Qwen are now available on watsonx.ai. DeepSeek-V3, DeepSeek-R1 and DeepSeek-R1-Zero, the generative AI models from Chinese startup DeepSeek, are among the most powerful open-source models, and DeepSeek-R1 rivals the reasoning performance of OpenAI's o1 series of models.

For instructions on deploying the DeepSeek distilled variants on demand as foundation models from the Resource hub, see the IBM announcement blog. DeepSeek-R1's smaller distilled models can be deployed on a dedicated GPU on an hourly basis.

Important: To run DeepSeek-R1 distilled variants in watsonx.ai, you need to deploy the model to a GPU before proceeding with the rest of this tutorial.

Step 4. Install and import relevant libraries and set up your credentials

We need a few libraries and modules for this AI application. Install the following packages with pip if they are not already available, then import them.

# Install required packages
!pip install -q "langchain>=0.1.0" "langchain-community>=0.0.13" "langchain-core>=0.1.17" \
"langchain-ollama>=0.0.1" "pdfminer.six>=20221105" "markdown>=3.5.2" "docling>=2.0.0" \
"beautifulsoup4>=4.12.0" "unstructured>=0.12.0" "chromadb>=0.4.22" "faiss-cpu>=1.7.4" \
"requests>=2.32.0" "langchain-ibm>=0.3.5"

# Required imports
import os
import tempfile
import shutil
import getpass

from pathlib import Path
from IPython.display import Markdown, display
from ibm_watsonx_ai.metanames import GenTextParamsMetaNames as GenParams

# Docling imports
from docling.datamodel.base_models import InputFormat
from docling.datamodel.pipeline_options import PdfPipelineOptions, TesseractCliOcrOptions
from docling.document_converter import DocumentConverter, PdfFormatOption, WordFormatOption, SimplePipeline

# LangChain imports
from langchain_community.document_loaders import UnstructuredMarkdownLoader, WebBaseLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_ollama import OllamaEmbeddings
from langchain_community.vectorstores import FAISS
from langchain.chains import ConversationalRetrievalChain
from langchain.memory import ConversationBufferMemory
from langchain_ibm import WatsonxLLM
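If any of these imports fails, the following sketch reports which packages are missing without actually importing them. It uses `importlib.util.find_spec` from the standard library; the module list is an assumption based on the imports above. Note that import names can differ from pip package names (for example, `faiss-cpu` installs `faiss` and `beautifulsoup4` installs `bs4`).

```python
import importlib.util

# Import names (not pip names) for the main packages used in this tutorial.
# Assumption: adjust this list if you change the pipeline.
REQUIRED_MODULES = [
    "docling", "langchain", "langchain_community", "langchain_core",
    "langchain_ollama", "langchain_ibm", "ibm_watsonx_ai", "faiss",
    "unstructured", "bs4",
]


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules(REQUIRED_MODULES)
print("Missing modules:", missing or "none")
```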

To set our credentials, we need the WATSONX_APIKEY and WATSONX_PROJECT_ID that you obtained in steps 1 and 2. We also set the URL that serves as the API endpoint; it must match the region of your watsonx.ai Runtime instance.

WATSONX_APIKEY = getpass.getpass("Please enter your watsonx.ai Runtime API key (hit enter): ")
WATSONX_PROJECT_ID = getpass.getpass("Please enter your project ID (hit enter): ")
URL = "https://us-south.ml.cloud.ibm.com"
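Instead of typing the credentials every time the notebook restarts, you might read them from environment variables first and prompt only when they are unset. This is a sketch: `get_secret` and `watsonx_url` are hypothetical helpers, and the environment variable names simply mirror the Python variable names above. Check the watsonx.ai documentation for the regions available to your account.

```python
import os
import getpass


def get_secret(env_var: str, prompt: str) -> str:
    """Return the environment variable's value, prompting securely only if it is unset."""
    value = os.environ.get(env_var)
    return value if value else getpass.getpass(prompt)


def watsonx_url(region: str = "us-south") -> str:
    """Build the regional watsonx.ai endpoint, for example us-south or eu-de."""
    return f"https://{region}.ml.cloud.ibm.com"


# Example usage (prompts only for values missing from the environment):
# WATSONX_APIKEY = get_secret("WATSONX_APIKEY", "watsonx.ai Runtime API key: ")
# WATSONX_PROJECT_ID = get_secret("WATSONX_PROJECT_ID", "Project ID: ")
# URL = watsonx_url("us-south")
```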

Step 5. Initialize the LLM

We use the Llama 3.3 70B distilled variant of the DeepSeek-R1 large language model as our model for this tutorial. To initialize the LLM, we need to set the model parameters. To learn more about these model parameters, such as the minimum and maximum token limits, refer to the documentation.

llm = WatsonxLLM(
    model_id="deepseek-ai/deepseek-r1-distill-llama-70b",
    url=URL,
    apikey=WATSONX_APIKEY,
    project_id=WATSONX_PROJECT_ID,
    params={
        GenParams.DECODING_METHOD: "greedy",
        GenParams.TEMPERATURE: 0,
        GenParams.MIN_NEW_TOKENS: 5,
        GenParams.MAX_NEW_TOKENS: 2000,
        GenParams.REPETITION_PENALTY: 1.2
    }
)
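To build intuition for these settings, the following sketch applies greedy decoding and a repetition penalty of 1.2 to made-up token scores for a single generation step. It follows the CTRL-style penalty used by Hugging Face transformers (divide positive scores by the penalty, multiply negative ones); watsonx.ai's internal implementation is not shown here and may differ. Because greedy decoding always takes the top-scoring token, the temperature setting matters only for sampling.

```python
def apply_repetition_penalty(logits, generated_ids, penalty=1.2):
    """CTRL-style penalty: make tokens that were already generated less likely."""
    adjusted = list(logits)
    for token_id in set(generated_ids):
        score = adjusted[token_id]
        # Positive scores shrink toward 0; negative scores become more negative.
        adjusted[token_id] = score / penalty if score > 0 else score * penalty
    return adjusted


def greedy_step(logits, generated_ids, penalty=1.2):
    """Greedy decoding: pick the highest-scoring token after applying the penalty."""
    adjusted = apply_repetition_penalty(logits, generated_ids, penalty)
    return max(range(len(adjusted)), key=adjusted.__getitem__)


scores = [2.0, 1.9, 0.5]  # made-up scores for a 3-token vocabulary
print(greedy_step(scores, generated_ids=[]))   # 0: no penalty applied
print(greedy_step(scores, generated_ids=[0]))  # 1: token 0 drops to about 1.67
```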

Step 6. Document format detection

We work with various document formats in this tutorial. Let's create a helper function to detect document formats by using the file extension.

def get_document_format(file_path) -> "InputFormat | None":
    """Determine the document format from the file extension; return None if unsupported."""
    extension = os.path.splitext(str(file_path))[1].lower()
    format_map = {
        '.pdf': InputFormat.PDF,
        '.docx': InputFormat.DOCX,
        '.pptx': InputFormat.PPTX,
        '.html': InputFormat.HTML,
        '.htm': InputFormat.HTML,
    }
    # Legacy binary .doc files are not mapped: Docling reads the DOCX (Office Open XML) format
    return format_map.get(extension)

Step 7. Document conversion

Next, we use the DocumentConverter class to create a function that converts any supported document to Markdown. Docling identifies text, data tables, document images and captions, and preserves the document structure. The function takes a file as input, converts it to Markdown and saves the result in a Markdown file next to the original. Docling supports both scanned and text-based documents; however, OCR is disabled in the following configuration (pipeline_options.do_ocr = False), so text in scanned PDFs is not extracted unless you enable OCR. Key components of this function are:

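PdfPipelineOptions: Configures how PDFs are processed, such as OCR and table structure recognition.
TesseractCliOcrOptions: Configures OCR through the Tesseract command-line tool for scanned documents. It is imported in step 4 but not used below, because OCR is disabled.
DocumentConverter: Handles the actual conversion process.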

def convert_document_to_markdown(doc_path) -> str:
    """Convert document to markdown using simplified pipeline"""
    try:
        # Convert to absolute path string
        input_path = os.path.abspath(str(doc_path))
        print(f"Converting document: {doc_path}")

        # Create temporary directory for processing
        with tempfile.TemporaryDirectory() as temp_dir:
            # Copy input file to temp directory
            temp_input = os.path.join(temp_dir, os.path.basename(input_path))
            shutil.copy2(input_path, temp_input)

            # Configure pipeline options
            pipeline_options = PdfPipelineOptions()
            pipeline_options.do_ocr = False # Disable OCR temporarily
            pipeline_options.do_table_structure = True

            # Create converter with minimal options
            converter = DocumentConverter(
                allowed_formats=[
                    InputFormat.PDF,
                    InputFormat.DOCX,
                    InputFormat.HTML,
                    InputFormat.PPTX,
                ],
                format_options={
                    InputFormat.PDF: PdfFormatOption(
                        pipeline_options=pipeline_options,
                    ),
                    InputFormat.DOCX: WordFormatOption(
                        pipeline_cls=SimplePipeline
                    )
                }
            )

            # Convert document (inside the with block, while the temporary copy still exists)
            print("Starting conversion...")
            conv_result = converter.convert(temp_input)

            if not conv_result or not conv_result.document:
                raise ValueError(f"Failed to convert document: {doc_path}")

            # Export to markdown
            print("Exporting to markdown...")
            md = conv_result.document.export_to_markdown()

            # Create output path next to the original document
            output_dir = os.path.dirname(input_path)
            base_name = os.path.splitext(os.path.basename(input_path))[0]
            md_path = os.path.join(output_dir, f"{base_name}_converted.md")

            # Write markdown file
            print(f"Writing markdown to: {base_name}_converted.md")
            with open(md_path, "w", encoding="utf-8") as fp:
                fp.write(md)

            return md_path
    except Exception as e:
        # Re-raise instead of returning an error string that callers would mistake for a file path
        raise RuntimeError(f"Error converting document {doc_path}: {e}") from e

Step 8. QA chain setup

The QA chain is the heart of our RAG system. Our RAG application combines several components:

Document loading:

Loads the markdown file that we created.
Loads the scraped web data.

Text splitting:

Breaks down the document into smaller pieces.
Maintains context with overlap between chunks.
Ensures efficient processing by the language model.

Vector database:

Creates embeddings for each text chunk.
Stores them in a FAISS index for fast retrieval.
Enables semantic search capabilities.

Language model:

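Uses Ollama for embeddings and the watsonx.ai API for text generation. Ollama must be running where the notebook runs, with the embedding model already pulled (for example, ollama pull nomic-embed-text).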
Maintains conversation history.
Generates contextual responses.
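The text-splitting step uses LangChain's RecursiveCharacterTextSplitter, which tries to split on natural boundaries (paragraphs, then lines, then words) before falling back to single characters. The sketch below is not LangChain's algorithm; it is a simpler, hypothetical fixed-window splitter that only illustrates how chunk_size and chunk_overlap make consecutive chunks share context.

```python
def split_with_overlap(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> list[str]:
    """Fixed-size character windows; consecutive chunks share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once this window reaches the end of the text
        if start + chunk_size >= len(text):
            break
    return chunks


print(split_with_overlap("abcdefghij", chunk_size=4, chunk_overlap=2))
# ['abcd', 'cdef', 'efgh', 'ghij']
```

The overlap means that a sentence cut at one chunk boundary still appears whole in the neighboring chunk, at the cost of storing some text twice.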

The following setup_qa_chain function sets up this entire RAG pipeline.

def setup_qa_chain(markdown_path: Path, web_pages: list, embeddings_model_name: str = "nomic-embed-text:latest", model_name: str = "deepseek-ai/deepseek-r1-distill-llama-70b"):
    # Load the converted markdown document and the scraped web pages
    loader = UnstructuredMarkdownLoader(str(markdown_path))
    markdown_doc = loader.load()

    loaded_pages = [WebBaseLoader(url).load() for url in web_pages]
    web_page_docs = [item for sublist in loaded_pages for item in sublist]

    documents = markdown_doc + web_page_docs

    text_splitter = RecursiveCharacterTextSplitter(
        chunk_size=1000,
        chunk_overlap=200,
        length_function=len,
    )
    texts = text_splitter.split_documents(documents)

    # Transform knowledge base to vector embeddings stored in a vector store
    embeddings = OllamaEmbeddings(model=embeddings_model_name)
    vectorstore = FAISS.from_documents(texts, embeddings)

    # Initialize LLM
    llm = WatsonxLLM(
        model_id=model_name,
        url=URL,
        apikey=WATSONX_APIKEY,
        project_id=WATSONX_PROJECT_ID,
        params={
            GenParams.DECODING_METHOD: "greedy",
            GenParams.TEMPERATURE: 0,
            GenParams.MIN_NEW_TOKENS: 5,
            GenParams.MAX_NEW_TOKENS: 2000,
            GenParams.REPETITION_PENALTY: 1.2
        }
    )

    # Set up conversation memory
    memory = ConversationBufferMemory(
        memory_key="chat_history",
        output_key="answer",
        return_messages=True
    )

    # Create the chain
    qa_chain = ConversationalRetrievalChain.from_llm(
        llm=llm,
        retriever=vectorstore.as_retriever(
            search_kwargs={"k": 10}
        ),
        memory=memory,
        return_source_documents=True
    )

    return qa_chain

Step 9. Set up question-answering interface

Finally, let's create a simple interface for asking questions. This function takes in the chain and user query as parameters. The function also improves the readability of the displayed question and answer.

def ask_question(qa_chain, question: str):
    """Ask a question and display the answer"""
    result = qa_chain.invoke({"question": question})
    display(Markdown(f"**Question:** {question}\n\n**Answer:** {result['answer']}"))
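DeepSeek-R1 models typically emit their chain of thought before the final answer, closing it with a `</think>` tag. Because Markdown rendering hides unknown HTML-like tags, the reasoning can appear inline with the answer when displayed. If you prefer to show only the final answer, a hypothetical helper like the following could be applied to `result['answer']` inside `ask_question`. It assumes the raw text contains the closing tag, so inspect your model's raw output first; if the tag is absent, the whole text is returned as the answer.

```python
def split_reasoning(answer: str, end_tag: str = "</think>") -> tuple[str, str]:
    """Split a DeepSeek-R1-style response into (reasoning, final_answer).

    Assumption: the reasoning ends with a closing </think> tag. If the tag is
    missing, the reasoning is empty and the whole text is the final answer.
    """
    reasoning, sep, final = answer.partition(end_tag)
    if not sep:
        return "", answer.strip()
    # The opening <think> tag may or may not be present in the raw output
    return reasoning.replace("<think>", "").strip(), final.strip()


reasoning, final = split_reasoning("<think>Check the RFP sections first.</think>\nKey scope: ...")
print(final)
```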

Step 10. Perform question-answering

There are several real-world applications of reasoning tasks. This tutorial serves as a step-by-step guide for using a pretrained AI model to process a New York State RFP and formulate a proposal. The path to our RFP is stored in doc_path. The URLs used for web scraping are from ibm.com and describe the software offerings of IBM relevant to this RFP.

Note: The use of this software solution RFP is for illustrative purposes only. The document is publicly available and was accessed for this tutorial on February 5, 2025.

# Process the RFP document
doc_path = Path("Next-Gen-Software-Solution-RFP.pdf") # Replace with your document path

# Check the format and convert the document to markdown
doc_format = get_document_format(doc_path)
if doc_format:
    md_path = convert_document_to_markdown(doc_path)
else:
    # Stop here: setup_qa_chain below needs md_path
    raise ValueError(f"Unsupported document format: {doc_path.suffix}")

urls = [
    "https://www.ibm.com/products/blog/5-things-to-know-ibm-clouds-mission-to-accelerate-innovation-for-clients",
    "https://newsroom.ibm.com/Blog-How-IBM-Cloud-is-Accelerating-Business-Outcomes-with-Gen-AI",
]

qa_chain = setup_qa_chain(md_path, urls)
question = "List out the key scope and requirements of the RFP. Then draft a detailed RFP response as though it was written by IBM. Be explicit about the technical approach and technologies using the provided context from IBM's website."
ask_question(qa_chain, question)

Output:

Question: List out the key scope and requirements of the RFP. Then draft a detailed RFP response as though it was written by IBM. Be explicit about the technical approach and technologies using the provided context from IBM's website.

Answer: Okay, so I need to figure out what exactly the user is asking here. They've given me some sections from an RFP document and then asked two things: first, to list the key scope and requirements of the RFP based on the provided context; second, to draft a detailed RFP response as if it were from IBM, making sure to explicitly mention the technical approach and technologies, possibly pulling info from IBM's site since they mentioned that.

Alright, starting with part one—key scope and requirements. Looking through the context snippets, there are several points that stand out. First, under VII. QUESTIONS, it says that any questions related to the RFP should be emailed to a specific address, referencing the relevant pages and sections according to the schedule in Section IV. Also, late questions might not get answers, and all Q&As will be publicly posted. That tells me communication protocols and deadlines are important.

Then, VIII. GENERAL PROVISIONS mentions that each bidder needs to submit a clear, concise proposal focusing on compliance with RFP instructions, completeness, and clarity. So, the proposal has to strictly adhere to guidelines, probably including formatting and content specifics.

Looking further down, under A. MINIMUM QUALIFICATION REQUIREMENTS, bidders must detail their current capabilities, past experience especially with states and big cities, and how that applies to NYS. This indicates that relevant experience and adaptability to NY’s environment are crucial.

There's also something about proper format: creating a named folder and uploading via a Dropbox link. Late submissions aren’t allowed, so timeliness is another requirement.

Section D. PROJECT PLAN/APPROACH requires demonstrating competence and capacity, likely meaning a solid project plan is needed. Similarly, C. STAFF EXPERIENCE AND QUALIFICATIONS emphasizes having knowledgeable and capable staff, supported perhaps by references who can vouch for their performance.

Additionally, the TECHNICAL PROPOSAL includes elements like table of contents, estimated cost, diversity practices, etc., showing that multiple aspects beyond just technical skills are evaluated.

So compiling all that, the key scope seems to involve developing a next-gen software solution tailored to NYS needs, while the main requirements would cover submission guidelines, necessary documentation (like project plans, team qualifications), minimum qualifications (past experience, capability), adherence to formats, timely delivery, and inclusion of various administrative parts.

Now moving to drafting the RFP response as IBM. Since I'm supposed to use context from IBM's website, I'll think about their strengths. IBM often touts AI, cloud solutions, data analytics, blockchain, IoT, and strong consulting services. Their approach usually involves scalable, secure, and innovative tech.

I’d structure the response to mirror the RFP sections. Start with understanding the problem, aligning with NYS goals. Highlight IBM's experience with similar projects, maybe reference state-level engagements. Detail the technical approach: perhaps leveraging IBM Cloud, Watson AI tools for automation, integrating advanced analytics for decision-making, ensuring security measures, and utilizing agile methodologies for development.

Include project management strategies, emphasizing experienced teams and proven track records. Address diversity practices, mentioning IBM's commitment to inclusivity and equal opportunities. Finally, present a cost estimate structured around these components without being too vague.

But wait, looking back, the initial instruction said to only proceed if confident, else admit lack of knowledge. However, considering the extensive details above, I believe I can construct a reasonable outline even without direct access to IBM's latest materials beyond general knowledge.

Wait again—the user actually wants me to write both the key scope/requirements and then the detailed RFP response. But when I look more closely, the "helpful answer" example didn't fully execute both tasks—it started explaining thought processes instead. Maybe the actual task is to generate both lists?

Alternatively, perhaps the user expects me to acknowledge whether I can fulfill the request accurately. Given that I’ve analyzed the RFP sections sufficiently, I could attempt listing the key scopes and requirements, followed by a mock IBM response incorporating known IBM competencies.

However, I realize that without specific IBM documents or deeper insights into their proprietary approaches, my response won't perfectly reflect IBM's exact language or initiatives. Still, using common IBM messaging should suffice for a hypothetical scenario.

To sum up, yes, I can outline the key scope and requirements, then craft a plausible RFP response mimicking IBM's style, highlighting areas where IBM typically excels, such as AI integration, cloud platforms, robust project management, and diverse staffing practices.

Key Scope and Requirements of the RFP

Communication Protocols:
- Submit questions via email to NextGenSoftwareRFP2024@esd.ny.gov, referencing the appropriate RFP section and adhering to the schedule outlined in Section IV.
- No oral communications; late queries may go unanswered. All Q&A will be published online.
Proposal Submission Guidelines:
- Proposals must be clear, concise, compliant with RFP instructions, and thorough in addressing all requirements.
- Properly formatted electronic submissions are mandatory, uploaded to the specified Dropbox link with the correct naming convention.
Minimum Qualification Requirements:
- Demonstrate current capabilities, past experience with states/municipalities, and relevance to New York State operations.
- Provide evidence of understanding operational models, best practices, and potential process improvements.
Technical Proposal Components:
- Include a detailed project plan/approach, showcasing competence and capacity.
- Highlight staff experience and qualifications, supported by professional references.
Administrative Proposal Elements:
- Detailed table of contents for easy navigation.
- Estimated costs and adherence to Schedule A contractual terms.
- Commitment to diversity practices as per Appendix B.
Evaluation Criteria:
- Compliance with RFP instructions, completeness, clarity, and alignment with stated objectives.
- Quality of services, ability to deliver, and responsiveness based on references.

IBM RFP Response

Introduction

At IBM, we recognize the transformative power of technology in shaping efficient governance. We are excited to respond to your RFP for a next-generation software solution tailored to New York State's unique demands. Our proposal leverages decades of expertise in delivering scalable, secure, and innovative technological solutions across government sectors globally.

Understanding the Problem

New York State faces evolving challenges requiring adaptable, intelligent systems. IBM understands the necessity for solutions that enhance service delivery, optimize resources, and ensure citizen satisfaction. Our approach integrates cutting-edge technologies to drive efficiency and innovation.

Proposed Solution Overview

Our solution harnesses IBM's leadership in AI, cloud computing, and data analytics to offer a modular platform designed for scalability and interoperability. Key features include:

AI-Powered Automation: Utilizing IBM Watson to streamline workflows and predictive analytics for informed decision-making.
Cloud Infrastructure: Built on IBM Cloud, offering flexibility, resilience, and enhanced security.
Blockchain Integration: Ensuring transparency and integrity in transactions and data exchanges.
IoT Enablement: Facilitating real-time monitoring and smart system interactions.

Project Management Strategy

We employ Agile methodologies to ensure iterative progress, stakeholder collaboration, and rapid adaptation to feedback. Our dedicated project managers oversee timelines, budgets, and resource allocation, ensuring seamless execution aligned with NYS priorities.

Staff Expertise

IBM's multidisciplinary team brings deep domain knowledge and technical prowess. From seasoned consultants to expert developers, our professionals are committed to excellence, supported by continuous learning programs and certifications.

References and Track Record

With a legacy of successful public sector engagements, IBM offers testimonials from numerous governments worldwide. Our clients attest to our reliability, innovation, and dedication to exceptional outcomes.

Cost Estimate

Our pricing model reflects a balanced investment in technology and human capital, ensuring value without compromising quality. Costs are itemized to include software licensing, implementation, training, and ongoing support, optimized for long-term sustainability.

Commitment to Diversity

IBM champions diversity and inclusion, reflected in our hiring practices and community partnerships. We pledge to maintain equitable standards throughout this engagement, fostering a culture of respect and empowerment.

Conclusion

IBM stands ready to collaborate with New York State in co-creating a future-ready digital infrastructure. With a blend of technological innovation and proven methodologies, we aim to exceed expectations and set new benchmarks in public service delivery.

Thank you for considering IBM's proposal. We welcome the opportunity to discuss how we can contribute to New York State's vision for tomorrow.

This response encapsulates IBM's strategic approach, combining technical expertise with a customer-centric philosophy to meet the RFP's stringent requirements effectively.

Great! The QA chain retrieved relevant information from the RFP document, and the LLM used it to provide the key scope and requirements in its final answer. The draft proposal draws on IBM offerings such as IBM Cloud, although the model's reasoning notes that it also relied on general knowledge of IBM.

Conclusion

Using Docling and a distilled variant of a DeepSeek model, you built a RAG application for document question answering that is compatible with various file types. A possible next step would be to create an AI agent that performs the same functionality with additional, personalized tools. There are many opportunities to adapt this RAG template to specific use cases. Feel free to test this system with any of your own files!
