Deploying AI services
An AI service is a deployable unit of code that you can use to capture the logic of your generative AI use cases. When your AI services are successfully deployed, you can use the endpoint for inferencing from your application.
Deploying generative AI applications with AI services
While Python functions are the traditional way to deploy machine learning assets, AI services offer a more flexible way to deploy code for generative AI applications, including capabilities such as streaming.
Unlike the standard Python function for deploying a predictive machine learning model, which requires input in a fixed schema, an AI service provides flexibility for multiple inputs and allows for customization.
AI services offer a secure way to deploy your code. For example, credentials such as bearer tokens that are required for authentication are generated from task credentials by the service, and the token is made available to the AI service asset. You can use this token to get connection assets, download data assets, and more.
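The flexible, non-fixed-schema behavior described above can be sketched as a plain Python function that runs setup code once and returns a request handler. This is a minimal illustration only: the names `RuntimeContext`, `get_json`, and `generate_token` are hypothetical stand-ins for the context object that the deployment runtime supplies, not the exact watsonx.ai API.

```python
class RuntimeContext:
    """Hypothetical stand-in for the runtime-provided context object."""

    def __init__(self, payload, token="example-bearer-token"):
        self._payload = payload
        self._token = token

    def get_json(self):
        # Request body with a free-form schema.
        return self._payload

    def generate_token(self):
        # Bearer token derived from task credentials by the service.
        return self._token


def deployable_ai_service(context, **custom):
    # One-time setup at deployment: for example, use the token to fetch
    # connection assets or download data assets.
    setup_token = context.generate_token()

    def generate(context):
        # Unlike a fixed-schema Python function, the payload can hold
        # whatever fields your application defines.
        payload = context.get_json()
        question = payload.get("question", "")
        return {
            "body": {
                "answer": f"echo: {question}",
                "token_used": bool(setup_token),
            }
        }

    # The returned function handles each inference request.
    return generate
```

In this sketch, the outer function is invoked once when the AI service starts, and the nested `generate` function serves each request at the deployment endpoint.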
Deploying AI services visually
You can deploy your AI service directly to a deployment space by following a no-code approach from the user interface. Use this approach to create an online or batch deployment for your use case.
For more information, see Deploying AI services visually.
Deploying AI services with code
When you build your generative AI applications from the ground up, you can use an AI service to capture the programming logic of your application, and then deploy the AI service to get an endpoint for inferencing. For example, if you build a RAG application with a framework such as LangChain or LlamaIndex, you can capture the logic for retrieving answers from the vector index in an AI service and then deploy that AI service.
For more information, see Deploying AI services with code.
Learn more
Parent topic: Deploying foundation model assets