Tool calling in Large Language Models (LLMs) is the ability of the LLM to interact with external tools, services, or APIs to perform tasks. This allows LLMs to extend their functionality, enhancing their ability to handle real-world tasks that may require access to external data, real-time information, or specific applications. When an LLM uses a web search tool, for example, it can fetch real-time data that isn't available in the model's training data. Other tools might include Python for calculations, data analysis, or visualization, or a service endpoint that returns data. Tool calling can make a chatbot more dynamic and adaptable, allowing it to provide more accurate, relevant, and detailed responses based on live data or specialized tasks outside its immediate knowledge base. Popular frameworks for tool calling include LangChain and now Ollama.
Ollama is a platform that offers open-source AI models for local use, so users can run LLMs directly on their own computers. Unlike a service like the OpenAI API, there's no need for an account since the model runs on your local machine. Ollama focuses on privacy, performance, and ease of use, enabling users to interact with AI models without sending data to external servers. This can be particularly appealing for those concerned about data privacy or who want to avoid relying on external APIs. Ollama is designed to be easy to set up and use, and it supports various models, giving users a range of tools for natural language processing, code generation, and other AI tasks directly on their own hardware. It is well suited to a tool-calling architecture because it can access all the capabilities of a local environment, including data, programs, and custom software.
In this tutorial you'll learn how to set up tool calling by using Ollama to look through a local filesystem, a task which would be difficult to do with a remote LLM. Many Ollama models, such as Mistral and Llama 3.2, support tool calling and building AI agents; a full list is available on the Ollama website. In this case we'll use IBM Granite 3.2 Dense, which has tool support. The 2B and 8B models are text-only dense LLMs designed to support tool-based use cases and retrieval augmented generation (RAG), streamlining code generation, translation, and bug fixing.
The notebook for this tutorial can be downloaded from GitHub here.
First you’ll download ollama from https://ollama.com/download and install it for your operating system. On macOS this is done via a .dmg file, on Linux via a single shell command, and on Windows with an installer. You may need admin access on your machine in order to run the installer.
You can test that ollama is correctly installed by opening a terminal or command prompt and entering:
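```bash
# Print the installed version to confirm the ollama CLI is on your PATH
ollama --version
```

If the install succeeded, this prints a version number.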
Next, you'll add the initial imports. This demo uses the ollama Python library to communicate with Ollama and the pymupdf library to read PDF files in the file system.
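A minimal import cell might look like this (on older PyMuPDF releases the import name is `fitz` instead of `pymupdf`):

```python
import os        # for walking the local file system
import ollama    # Python client for the local Ollama server
import pymupdf   # PyMuPDF, used to extract text from PDF files
```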
Next you'll pull the model that you'll be using throughout this tutorial. This downloads the model weights from ollama to your local computer and stores them for use without needing to make any remote API calls later on.
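One way to do this from Python is with the library's pull function. The model tag below is an assumption; check the Ollama library page for the exact Granite 3.2 tag you want.

```python
# Download the Granite 3.2 weights into the local Ollama store
# (tag is illustrative; 2b and 8b variants are available)
ollama.pull("granite3.2:8b")
```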
Now you'll define the tools that the ollama tools instance will have access to. Since the intent of the tools is to read files and look through images in the local file system, you'll create two Python functions, one for each of those tasks. The first searches the documents in a folder for a keyword.
You could use simple string matching to see whether the keyword is in the document, but because ollama makes calling local LLMs easy, this tool instead asks the model whether each document contains information about the keyword.
If the model responds 'yes', then the function returns the name of the file that contains the keyword that the user indicated in the prompt. If none of the files seem to contain the information, then the function returns 'None' as a string.
This function may run slowly the first time because ollama will download Granite 3.2 Dense.
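A minimal sketch of what this tool might look like is below. The function name, folder path, and model tag are illustrative assumptions rather than the tutorial's exact code, and it assumes a recent version of the ollama Python library:

```python
def find_file_with_keyword(keyword: str, folder: str = "./files") -> str:
    """Return the name of the first document that appears to mention the
    keyword, or the string 'None' if nothing matches (hypothetical sketch)."""
    for filename in sorted(os.listdir(folder)):
        path = os.path.join(folder, filename)
        # Extract the document text: PDFs via pymupdf, plain text files directly
        if filename.lower().endswith(".pdf"):
            with pymupdf.open(path) as doc:
                text = "".join(page.get_text() for page in doc)
        elif filename.lower().endswith(".txt"):
            with open(path, encoding="utf-8") as f:
                text = f.read()
        else:
            continue
        # Ask the local model whether the document covers the keyword
        response = ollama.chat(
            model="granite3.2:8b",  # assumed tag; use the model you pulled earlier
            messages=[{
                "role": "user",
                "content": (
                    f"Does the following document contain information about "
                    f"'{keyword}'? Answer only yes or no.\n\n{text}"
                ),
            }],
        )
        if "yes" in response.message.content.lower():
            return filename
    return "None"
```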
The second tool looks through the images in the folder. It returns a string: the name of the image file whose description contains the keyword that the user indicated in the prompt.
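A sketch of an image tool along the same lines is below. It assumes a vision-capable model such as granite3.2-vision is available locally; the function name and model tag are illustrative, not necessarily what the tutorial uses:

```python
def find_image_with_keyword(keyword: str, folder: str = "./files") -> str:
    """Describe each image with a local vision model and return the name of the
    first image whose description mentions the keyword (hypothetical sketch)."""
    for filename in sorted(os.listdir(folder)):
        if not filename.lower().endswith((".png", ".jpg", ".jpeg")):
            continue
        path = os.path.join(folder, filename)
        # Ask a local multimodal model to describe the image
        response = ollama.chat(
            model="granite3.2-vision",  # assumed vision-capable model tag
            messages=[{
                "role": "user",
                "content": "Describe this image in a few sentences.",
                "images": [path],
            }],
        )
        if keyword.lower() in response.message.content.lower():
            return filename
    return "None"
```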
Now that the functions for ollama to call have been defined, you'll configure the tool information for ollama itself. The first step is to create an object that maps the name of each tool to the function it invokes, so that the tool calls returned by the model can be dispatched:
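```python
# Dispatch table from tool name to Python function
# (names match the hypothetical sketches above)
available_functions = {
    "find_file_with_keyword": find_file_with_keyword,
    "find_image_with_keyword": find_image_with_keyword,
}
```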
Next, configure a tools array to tell ollama what tools it will have access to and what those tools require. This is an array with one object schema per tool that tells the ollama tool calling framework how to call the tool and what it returns.
In the case of both of the tools that you created earlier, they are functions that require a single string parameter: the keyword to search for.
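A tools array for the two sketched functions might look like the following; it uses the function-schema format that the ollama Python library accepts, with the hypothetical names and descriptions from above:

```python
tools = [
    {
        "type": "function",
        "function": {
            "name": "find_file_with_keyword",
            "description": "Find a text or PDF file that contains information about a keyword",
            "parameters": {
                "type": "object",
                "properties": {
                    "keyword": {
                        "type": "string",
                        "description": "The keyword to search for in the documents",
                    },
                },
                "required": ["keyword"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "find_image_with_keyword",
            "description": "Find an image whose description mentions a keyword",
            "parameters": {
                "type": "object",
                "properties": {
                    "keyword": {
                        "type": "string",
                        "description": "The keyword to look for in the image descriptions",
                    },
                },
                "required": ["keyword"],
            },
        },
    },
]
```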
You'll use this tools definition when you call ollama with user input.
Now it's time to pass user input to ollama and have it return the results of the tool calls. First, make sure that ollama is running on your system:
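```bash
# Query the local Ollama server (assumes the default port, 11434)
curl http://localhost:11434
```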
If Ollama is running, this will return:
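```
Ollama is running
```

If it isn't, start the server with `ollama serve` or by launching the Ollama application.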
Now ask the user for input. You can also hardcode the input or retrieve it from a chat interface, depending on how you configure your application. The cell simply echoes the input so you can confirm what will be sent to the model.
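A minimal version of this cell, with an illustrative prompt string:

```python
# Collect the user's query; the prompt text here is illustrative
user_input = input("What would you like to find in your files? ")
print(user_input)
```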
As an example, if the user enters "Information about dogs" this cell will print:
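```
Information about dogs
```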
Now the user query is passed to ollama itself. The messages need a role for the user and the content from the user input. This is passed to ollama's chat function along with the tools definition.
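A sketch of that call, again assuming the model tag used earlier:

```python
messages = [{"role": "user", "content": user_input}]

# Pass the tools definition so the model can request tool calls
response = ollama.chat(
    model="granite3.2:8b",  # assumed tag; use the model you pulled
    messages=messages,
    tools=tools,
)
print(response.message.tool_calls)
```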
The response contains the tool calls that the model decided to make, along with the arguments it generated for each of them.
Now that the model has generated tool calls in the output, run all of the tool calls with the parameters that the model generated and check the output. In this application Granite 3.2 Dense is used to generate the final output as well, so the results of the tool calls are added to the initial user input and then passed to the model.
Multiple tool calls may return file matches, so the responses are collected in an array which is then passed to Granite 3.2 to generate a response. The prompt that precedes the data instructs the model how to respond:
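```python
# Run each requested tool with the model-generated arguments
tool_results = []
for tool_call in response.message.tool_calls or []:
    fn = available_functions.get(tool_call.function.name)
    if fn is not None:
        tool_results.append(str(fn(**tool_call.function.arguments)))

# Combine the user input and the tool output into a final prompt
# (the wording of this prompt is illustrative, not the tutorial's exact text)
final_prompt = (
    f"The user asked: '{user_input}'. A search of the local files returned: "
    f"{tool_results}. Tell the user which file, if any, contains the information "
    f"they asked for. If the result is 'None', say that no matching file was found."
)

final_response = ollama.chat(
    model="granite3.2:8b",
    messages=[{"role": "user", "content": final_prompt}],
)
print(final_response.message.content)
```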
The final output is then generated using either the returned file names or, if the tools returned no matches, a statement that no relevant file was found.
Using the provided files for this tutorial, the prompt "Information about dogs" will return:
You can see that Granite 3.2 picked the correct keyword from the input, 'dogs', and searched through the files in the folder, finding the keyword in a PDF file. Since LLM results are not purely deterministic, you may get slightly different results with the same prompt or very similar prompts.