Tool calling in Large Language Models (LLMs) is the ability of the LLM to interact with external tools, services, or APIs to perform tasks. This allows LLMs to extend their functionality, enhancing their ability to handle real-world tasks that may require access to external data, real-time information, or specific applications. When an LLM uses a web search tool, for example, it can fetch real-time data that isn't available in the model's training data. Other tools might include Python for calculations, data analysis, or visualization, or a service endpoint that returns data. Tool calling can make a chatbot more dynamic and adaptable, allowing it to provide more accurate, relevant, and detailed responses based on live data or specialized tasks outside its immediate knowledge base. Popular frameworks for tool calling include LangChain and now Ollama.
Ollama is a platform that offers open-source AI models for local use, so users can run LLMs directly on their own computers. Unlike a service like the OpenAI API, there's no need for an account since the model runs on your local machine. Ollama focuses on privacy, performance, and ease of use, enabling users to interact with AI models without sending data to external servers. This can be particularly appealing for those concerned about data privacy or who want to avoid relying on external APIs. Ollama is designed to be easy to set up and use, and it supports various models, giving users a range of tools for natural language processing, code generation, and other AI tasks directly on their own hardware. It is well suited to a tool-calling architecture because it can access all the capabilities of a local environment, including data, programs, and custom software.
In this tutorial you'll learn how to set up tool calling by using Ollama to look through a local filesystem, a task which would be difficult to do with a remote LLM. Many Ollama models, such as Mistral and Llama 3.2, support tool calling and building AI agents; a full list is available on the Ollama website. In this case we'll use IBM Granite 3.2 Dense, which has tool support. The 2B and 8B models are text-only dense LLMs designed to support tool-based use cases and retrieval augmented generation (RAG), streamlining code generation, translation, and bug fixing.
The notebook for this tutorial can be downloaded from GitHub here.
First you’ll download ollama from https://ollama.com/download and install it for your operating system. On macOS this is done via a .dmg file, on Linux via a single shell command, and on Windows with an installer. You may need admin access on your machine in order to run the installer.
You can test that ollama is correctly installed by opening a terminal or command prompt and entering:
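```bash
# Print the installed version to confirm the ollama CLI is on your PATH
ollama --version
```

If the install succeeded, this prints a version number.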
Next, you'll add the initial imports. This demo uses the ollama Python library to communicate with Ollama and the pymupdf library to read PDF files in the file system.
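A minimal import cell might look like this (on older PyMuPDF releases the import name is `fitz` instead of `pymupdf`):

```python
import os        # for walking the local file system
import ollama    # Python client for the local Ollama server
import pymupdf   # PyMuPDF, used to extract text from PDF files
```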
Next you'll pull the model that you'll be using throughout this tutorial. This downloads the model weights from ollama to your local computer and stores them for use without needing to make any remote API calls later on.
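One way to do this from Python is with the library's pull function. The model tag below is an assumption; check the Ollama library page for the exact Granite 3.2 tag you want.

```python
# Download the Granite 3.2 weights into the local Ollama store
# (tag is illustrative; 2b and 8b variants are available)
ollama.pull("granite3.2:8b")
```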
Now you'll define the tools that the ollama tools instance will have access to. Since the intent of the tools is to read files and look through images in the local file system, you'll create two Python functions, one for each of those tasks. The first searches the documents in a folder for a keyword.
You could use simple string matching to see whether the keyword is in the document, but because ollama makes calling local LLMs easy, this tool instead asks the model whether each document contains information about the keyword.
If the model responds 'yes', then the function returns the name of the file that contains the keyword that the user indicated in the prompt. If none of the files seem to contain the information, then the function returns 'None' as a string.
This function may run slowly the first time because ollama will download Granite 3.2 Dense.
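A minimal sketch of what this tool might look like is below. The function name, folder path, and model tag are illustrative assumptions rather than the tutorial's exact code, and it assumes a recent version of the ollama Python library:

```python
def find_file_with_keyword(keyword: str, folder: str = "./files") -> str:
    """Return the name of the first document that appears to mention the
    keyword, or the string 'None' if nothing matches (hypothetical sketch)."""
    for filename in sorted(os.listdir(folder)):
        path = os.path.join(folder, filename)
        # Extract the document text: PDFs via pymupdf, plain text files directly
        if filename.lower().endswith(".pdf"):
            with pymupdf.open(path) as doc:
                text = "".join(page.get_text() for page in doc)
        elif filename.lower().endswith(".txt"):
            with open(path, encoding="utf-8") as f:
                text = f.read()
        else:
            continue
        # Ask the local model whether the document covers the keyword
        response = ollama.chat(
            model="granite3.2:8b",  # assumed tag; use the model you pulled earlier
            messages=[{
                "role": "user",
                "content": (
                    f"Does the following document contain information about "
                    f"'{keyword}'? Answer only yes or no.\n\n{text}"
                ),
            }],
        )
        if "yes" in response.message.content.lower():
            return filename
    return "None"
```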
The second tool looks through the images in the folder. It returns a string: the name of the image file whose description contains the keyword that the user indicated in the prompt.
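A sketch of an image tool along the same lines is below. It assumes a vision-capable model such as granite3.2-vision is available locally; the function name and model tag are illustrative, not necessarily what the tutorial uses:

```python
def find_image_with_keyword(keyword: str, folder: str = "./files") -> str:
    """Describe each image with a local vision model and return the name of the
    first image whose description mentions the keyword (hypothetical sketch)."""
    for filename in sorted(os.listdir(folder)):
        if not filename.lower().endswith((".png", ".jpg", ".jpeg")):
            continue
        path = os.path.join(folder, filename)
        # Ask a local multimodal model to describe the image
        response = ollama.chat(
            model="granite3.2-vision",  # assumed vision-capable model tag
            messages=[{
                "role": "user",
                "content": "Describe this image in a few sentences.",
                "images": [path],
            }],
        )
        if keyword.lower() in response.message.content.lower():
            return filename
    return "None"
```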
Now that the functions for ollama to call have been defined, you'll configure the tool information for ollama itself. The first step is to create an object that maps the name of each tool to the function it invokes, so that the tool calls returned by the model can be dispatched:
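```python
# Dispatch table from tool name to Python function
# (names match the hypothetical sketches above)
available_functions = {
    "find_file_with_keyword": find_file_with_keyword,
    "find_image_with_keyword": find_image_with_keyword,
}
```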
Next, configure a tools array to tell ollama what tools it will have access to and what those tools require. This is an array with one object schema per tool that tells the ollama tool calling framework how to call the tool and what it returns.
In the case of both of the tools that you created earlier, they are functions that require a single string parameter: the keyword to search for.
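A tools array for the two sketched functions might look like the following; it uses the function-schema format that the ollama Python library accepts, with the hypothetical names and descriptions from above:

```python
tools = [
    {
        "type": "function",
        "function": {
            "name": "find_file_with_keyword",
            "description": "Find a text or PDF file that contains information about a keyword",
            "parameters": {
                "type": "object",
                "properties": {
                    "keyword": {
                        "type": "string",
                        "description": "The keyword to search for in the documents",
                    },
                },
                "required": ["keyword"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "find_image_with_keyword",
            "description": "Find an image whose description mentions a keyword",
            "parameters": {
                "type": "object",
                "properties": {
                    "keyword": {
                        "type": "string",
                        "description": "The keyword to look for in the image descriptions",
                    },
                },
                "required": ["keyword"],
            },
        },
    },
]
```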
You'll use this tools definition when you call ollama with user input.
Now it's time to pass user input to ollama and have it return the results of the tool calls. First, make sure that ollama is running on your system:
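```bash
# Query the local Ollama server (assumes the default port, 11434)
curl http://localhost:11434
```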
If Ollama is running, this will return:
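```
Ollama is running
```

If it isn't, start the server with `ollama serve` or by launching the Ollama application.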
Now ask the user for input. You can also hardcode the input or retrieve it from a chat interface, depending on how you configure your application. The cell simply echoes the input so you can confirm what will be sent to the model.
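A minimal version of this cell, with an illustrative prompt string:

```python
# Collect the user's query; the prompt text here is illustrative
user_input = input("What would you like to find in your files? ")
print(user_input)
```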
As an example, if the user enters "Information about dogs" this cell will print:
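```
Information about dogs
```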
Now the user query is passed to ollama itself. The messages need a role for the user and the content from the user input. This is passed to ollama's chat function along with the tools definition.
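A sketch of that call, again assuming the model tag used earlier:

```python
messages = [{"role": "user", "content": user_input}]

# Pass the tools definition so the model can request tool calls
response = ollama.chat(
    model="granite3.2:8b",  # assumed tag; use the model you pulled
    messages=messages,
    tools=tools,
)
print(response.message.tool_calls)
```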
The response contains the tool calls that the model decided to make, along with the arguments it generated for each of them.
Now that the model has generated tool calls in the output, run all of the tool calls with the parameters that the model generated and check the output. In this application Granite 3.2 Dense is used to generate the final output as well, so the results of the tool calls are added to the initial user input and then passed to the model.
Multiple tool calls may return file matches, so the responses are collected in an array which is then passed to Granite 3.2 to generate a response. The prompt that precedes the data instructs the model how to respond:
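```python
# Run each requested tool with the model-generated arguments
tool_results = []
for tool_call in response.message.tool_calls or []:
    fn = available_functions.get(tool_call.function.name)
    if fn is not None:
        tool_results.append(str(fn(**tool_call.function.arguments)))

# Combine the user input and the tool output into a final prompt
# (the wording of this prompt is illustrative, not the tutorial's exact text)
final_prompt = (
    f"The user asked: '{user_input}'. A search of the local files returned: "
    f"{tool_results}. Tell the user which file, if any, contains the information "
    f"they asked for. If the result is 'None', say that no matching file was found."
)

final_response = ollama.chat(
    model="granite3.2:8b",
    messages=[{"role": "user", "content": final_prompt}],
)
print(final_response.message.content)
```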
The final output is then generated using either the returned file names or, if the tools returned no matches, a statement that no relevant file was found.
Using the provided files for this tutorial, the prompt "Information about dogs" will return:
You can see that Granite 3.2 picked the correct keyword from the input, 'dogs', and searched through the files in the folder, finding the keyword in a PDF file. Since LLM results are not purely deterministic, you may get slightly different results with the same prompt or very similar prompts.