
Build an AI stylist with IBM Granite using watsonx.ai

26 February 2025

Authors

Anna Gutowska

AI Engineer, Developer Advocate

IBM

Ash Minhas

Manager, Technical Content | AI Advocate

IBM

In this tutorial, you will build a generative AI-powered personal stylist. The tutorial uses the IBM Granite™ Vision 3.2 large language model (LLM) to process image input and Granite 3.2, with its latest enhanced reasoning capabilities, to formulate customizable outfit ideas.

Introduction

How often do you find yourself thinking, “What should I wear today? I don’t even know where to start with picking items from my closet!” This dilemma is one that many of us share. With cutting-edge artificial intelligence (AI) models, it no longer needs to be a daunting task.

AI styling: How it works

Our AI-driven solution is composed of the following stages:

  1. The user uploads images of their current wardrobe or even items in their wishlist, one item at a time.

  2. The user selects the following criteria:
     • Occasion: casual or formal.
     • Time of day: morning, afternoon or evening.
     • Season of the year: winter, spring, summer or fall.
     • Location (for example, a coffee shop).

  3. Upon submission of the input, the multimodal Granite Vision 3.2 model iterates over the list of images and returns the following output:
     • Description of the item.
     • Category: shirt, pants or shoes.
     • Occasion: casual or formal.

  4. The Granite 3.2 model with enhanced reasoning then serves as a fashion stylist. The LLM uses the Vision model’s output to provide an outfit recommendation that is suitable for the user’s event.

  5. The outfit suggestion, a data frame of items that the user uploaded and the images in the described personalized recommendation are all returned to the user.
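
The stages above pass two kinds of data between the models: the criteria the user selects and one record per clothing item. The following is a minimal sketch of those shapes. The names StyleRequest, ClosetItem and _check are hypothetical and do not appear in the tutorial code; the allowed values are copied from the lists above.

```python
from dataclasses import dataclass

# Allowed values, copied from the criteria and categories listed above.
OCCASIONS = ("casual", "formal")
TIMES_OF_DAY = ("morning", "afternoon", "evening")
SEASONS = ("winter", "spring", "summer", "fall")
CATEGORIES = ("shirt", "pants", "shoes")


def _check(value, allowed, field):
    """Raise ValueError if value is not one of the allowed options."""
    if value not in allowed:
        raise ValueError(f"{field} must be one of {allowed}, got {value!r}")


@dataclass
class StyleRequest:
    """Criteria the user selects in stage 2 (hypothetical container)."""
    occasion: str
    time_of_day: str
    season: str
    location: str  # free text, for example "coffee shop"

    def __post_init__(self):
        _check(self.occasion, OCCASIONS, "occasion")
        _check(self.time_of_day, TIMES_OF_DAY, "time_of_day")
        _check(self.season, SEASONS, "season")


@dataclass
class ClosetItem:
    """One record the Vision model returns in stage 3 (hypothetical container)."""
    description: str
    category: str
    occasion: str

    def __post_init__(self):
        _check(self.category, CATEGORIES, "category")
        _check(self.occasion, OCCASIONS, "occasion")


request = StyleRequest("casual", "morning", "fall", "coffee shop")
item = ClosetItem("A plain white T-shirt", "shirt", "casual")
```

Validating the values up front is one possible design choice; the tutorial itself passes plain strings and dictionaries between the steps.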

Prerequisites

You need an IBM Cloud® account to create a watsonx.ai™ project.

Steps

To use the watsonx application programming interface (API), complete the following steps. Note that you can also access this tutorial on GitHub.

Step 1. Set up your environment

  1. Log in to watsonx.ai by using your IBM Cloud account.

  2. Create a watsonx.ai project.

    You can get your project ID from within your project. Click the Manage tab. Then, copy the project ID from the Details section of the General page. You need this ID for this tutorial.

Step 2. Set up watsonx.ai Runtime service and API key

  1. Create a watsonx.ai Runtime service instance (choose the Lite plan, which is a free instance).

  2. Generate an API Key.

  3. Associate the watsonx.ai Runtime service to the project that you created in watsonx.ai.

Step 3. Clone the repository (optional)

For a more interactive experience when using this AI tool, clone the GitHub repository and follow the setup instructions in the README.md file within the AI stylist project to launch the Streamlit application on your local machine. Otherwise, if you prefer to follow along step-by-step, create a Jupyter Notebook and continue with this tutorial.

Step 4. Install and import relevant libraries and set up your credentials

We need a few libraries and modules for this tutorial. Make sure to import the following ones; if they're not installed, you can resolve this issue with a quick pip installation.

# Install required packages (Pillow provides the PIL module imported below)
!pip install -q pillow ibm-watsonx-ai

# Required imports
import getpass, os, base64, json
from ibm_watsonx_ai import Credentials
from ibm_watsonx_ai.foundation_models import ModelInference
from PIL import Image

To set our credentials, we need the WATSONX_PROJECT_ID from step 1 and the WATSONX_APIKEY you generated in step 2. We will also set the URL that serves as the API endpoint.

WATSONX_APIKEY = getpass.getpass("Please enter your watsonx.ai Runtime API key (hit enter): ")
WATSONX_PROJECT_ID = getpass.getpass("Please enter your project ID (hit enter): ")
URL = "https://us-south.ml.cloud.ibm.com"

We can use the Credentials class to encapsulate our passed credentials.

credentials = Credentials(
    url=URL,
    api_key=WATSONX_APIKEY
)
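
If you rerun the notebook often, you might prefer to read the credentials from environment variables and fall back to a prompt only when they are missing. The following is a minimal sketch of that idea. The helper name read_secret is hypothetical, and the environment variable names simply mirror the Python variable names used above; nothing in watsonx.ai requires these names.

```python
import getpass
import os


def read_secret(name, prompt, environ=os.environ):
    """Return environ[name] if it is set and non-empty, otherwise prompt for it."""
    value = environ.get(name, "").strip()
    if value:
        return value
    # Fall back to the interactive prompt that the tutorial uses.
    return getpass.getpass(prompt)
```

For example, read_secret("WATSONX_APIKEY", "Please enter your watsonx.ai Runtime API key (hit enter): ") returns the exported value when one exists and prompts otherwise.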

Step 5. Set up the API request for the Granite Vision model

The augment_api_request_body function takes the user query and a Base64-encoded image as parameters and builds the messages list for the body of the API request. We will call this function once per image when we query the Vision model.

def augment_api_request_body(user_query, image):
    messages = [
        {
            "role": "user",
            "content": [{
                "type": "text",
                "text": user_query
            },
            {
                "type": "image_url",
                "image_url": {
                    "url": f"data:image/jpeg;base64,{image}"
                }
            }]
        }
    ]
    return messages
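
The function above labels every image as image/jpeg in the data URL. If your closet mixes JPEG and PNG files, one option is to derive the MIME type from the filename. The following is a minimal sketch that uses the standard library's mimetypes module; the helper name build_data_url is hypothetical and is not part of the tutorial or of the watsonx.ai SDK.

```python
import mimetypes


def build_data_url(filename, encoded_image):
    """Build a data URL whose MIME type is guessed from the file extension."""
    mime_type, _ = mimetypes.guess_type(filename)
    if mime_type is None or not mime_type.startswith("image/"):
        # Unknown or non-image extension: fall back to the tutorial's default.
        mime_type = "image/jpeg"
    return f"data:{mime_type};base64,{encoded_image}"
```

The returned string could replace the hardcoded f-string in the "url" field, provided the filename is passed along with each encoded image.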

We can also instantiate the model interface by using the ModelInference class.

model = ModelInference(
    model_id="ibm/granite-vision-3-2-2b",
    credentials=credentials,
    project_id=WATSONX_PROJECT_ID,
    params={
        "max_tokens": 400,
        "temperature": 0
    }
)

Step 6. Encode images

To pass our images to the LLM, we Base64-encode each file's raw bytes and decode the result to a UTF-8 string. In this case, our images are located in the local images directory. You can find sample images in the AI stylist directory in our GitHub repository.

directory = "images"  # directory name
images = []
filenames = []

for filename in os.listdir(directory):
    if filename.endswith(".jpeg") or filename.endswith(".png"):
        filepath = directory + '/' + filename
        with open(filepath, "rb") as f:
            images.append(base64.b64encode(f.read()).decode('utf-8'))
        filenames.append(filename)
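
The loop above reads files in whatever order the operating system lists them and skips files with a .jpg or uppercase extension. The following is a hedged variant that sorts the files by name and matches extensions case-insensitively. The function name load_encoded_images and the extra .jpg extension are assumptions for illustration, not part of the tutorial.

```python
import base64
from pathlib import Path


def load_encoded_images(directory, extensions=(".jpeg", ".jpg", ".png")):
    """Return (filenames, encoded_images) for matching files, sorted by name."""
    filenames, encoded = [], []
    for path in sorted(Path(directory).iterdir()):
        if path.is_file() and path.suffix.lower() in extensions:
            # Base64-encode the raw bytes, then decode to a UTF-8 string.
            encoded.append(base64.b64encode(path.read_bytes()).decode("utf-8"))
            filenames.append(path.name)
    return filenames, encoded
```

Calling filenames, images = load_encoded_images("images") would produce the same two parallel lists that the rest of the tutorial expects.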

Step 7. Categorize input with the Vision model

Now that we have loaded and encoded our images, we can query the Vision model. Our prompt is specific to our desired output to limit the model's creativity as we seek valid JSON output. We will store the description, category and occasion of each image in a list called image_descriptions.

user_query = """Provide a description, category, and occasion for the clothing item or shoes in this image.
                Classify the category as shirt, pants, or shoes.
                Classify the occasion as casual or formal.
                Ensure the output is valid JSON. Do not create new categories or occasions. Only use the allowed classifications.
                Your response should be in this schema:
                {
                    "description": "<description>",
                    "category": "<category>",
                    "occasion": "<occasion>"
                }
                """

image_descriptions = []

for image in images:
    message = augment_api_request_body(user_query, image)
    response = model.chat(messages=message)
    result = response['choices'][0]['message']['content']
    print(result)
    image_descriptions.append(result)

Output:

{
    "description": "A pair of polished brown leather dress shoes with a brogue detailing on the toe box and a classic oxford design.",
    "category": "shoes",
    "occasion": "formal"
}
{
    "description": "A pair of checkered trousers with a houndstooth pattern, featuring a zippered pocket and a button closure at the waist.",
    "category": "pants",
    "occasion": "casual"
}
{
    "description": "A light blue, button-up shirt with a smooth texture and a classic collar, suitable for casual to semi-formal occasions.",
    "category": "shirt",
    "occasion": "casual"
}
{
    "description": "A pair of khaki pants with a buttoned waistband and a button closure at the front.",
    "category": "pants",
    "occasion": "casual"
}
{
    "description": "A blue plaid shirt with a collar and long sleeves, featuring chest pockets and a button-up front.",
    "category": "shirt",
    "occasion": "casual"
}
{
    "description": "A pair of bright orange, short-sleeved t-shirts with a crew neck and a simple design.",
    "category": "shirt",
    "occasion": "casual"
}
{
    "description": "A pair of blue suede sneakers with white laces and perforations, suitable for casual wear.",
    "category": "shoes",
    "occasion": "casual"
}
{
    "description": "A pair of red canvas sneakers with white laces, isolated on a white background.",
    "category": "shoes",
    "occasion": "casual"
}
{
    "description": "A pair of grey dress pants with a smooth texture and a classic design, suitable for formal occasions.",
    "category": "pants",
    "occasion": "formal"
}
{
    "description": "A plain white T-shirt with short sleeves and a crew neck, displayed from the front and back.",
    "category": "shirt",
    "occasion": "casual"
}
{
    "description": "A black short-sleeved t-shirt with a crew neck and a simple design.",
    "category": "shirt",
    "occasion": "casual"
}
{
    "description": "Black pants with a zippered pocket and a buttoned fly, showing the waistband and pocket details.",
    "category": "pants",
    "occasion": "casual"
}
{
    "description": "A pair of tan leather boots with a chunky sole and a high-top design, suitable for casual wear.",
    "category": "shoes",
    "occasion": "casual"
}
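
The replies above are all valid JSON, but a model can occasionally wrap its answer in extra text or use a label outside the allowed lists. The following is a minimal sketch of a defensive parser that could run before the results are used in the next step. The function name parse_item is hypothetical, and the decision to return None for unusable replies is an assumption rather than the tutorial's behavior.

```python
import json

# Allowed labels, copied from the prompt above.
ALLOWED_CATEGORIES = {"shirt", "pants", "shoes"}
ALLOWED_OCCASIONS = {"casual", "formal"}


def parse_item(raw):
    """Parse one model reply into a dict, or return None if it is unusable."""
    # Keep only the text between the first "{" and the last "}" so that
    # stray prose or Markdown fences around the JSON are ignored.
    start, end = raw.find("{"), raw.rfind("}")
    if start == -1 or end <= start:
        return None
    try:
        item = json.loads(raw[start:end + 1])
    except json.JSONDecodeError:
        return None
    if not isinstance(item, dict):
        return None
    if item.get("category") not in ALLOWED_CATEGORIES:
        return None
    if item.get("occasion") not in ALLOWED_OCCASIONS:
        return None
    return item
```

Replies that come back as None could be retried or skipped, depending on how strict you want the closet to be.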

Step 8. Generate outfits with the reasoning model

Now that we have each clothing and shoe item categorized, it will be much easier for the reasoning model to generate an outfit for the selected occasion. Let's instantiate and query the reasoning model.

reasoning_model = ModelInference(
    model_id="ibm/granite-3-2-8b-instruct",
    credentials=credentials,
    project_id=WATSONX_PROJECT_ID
)

To align the filenames with the image descriptions, we can enumerate the list of image descriptions and create a list of dictionaries in which we store the description, category, occasion and filename of each item in the respective fields.

# Add filenames to the image descriptions
closet = []

for i, desc in enumerate(image_descriptions):
    desc_dict = json.loads(desc)
    desc_dict['filename'] = filenames[i]
    image_descriptions[i] = json.dumps(desc_dict)

closet = [json.loads(js) for js in image_descriptions]
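
Before sending the closet to the reasoning model, it can help to confirm that it contains at least one item in each category, because the prompt asks for one shirt, one pair of pants and one pair of shoes. The following is a minimal sketch of such a check. The function name missing_categories and the optional occasion filter are hypothetical additions, and the sample list only imitates the shape of the closet built above.

```python
REQUIRED_CATEGORIES = ("shirt", "pants", "shoes")


def missing_categories(closet, occasion=None):
    """Return the required categories that have no item in the closet.

    If occasion is given, only items labeled with that occasion count.
    """
    available = {
        item["category"]
        for item in closet
        if occasion is None or item["occasion"] == occasion
    }
    return [category for category in REQUIRED_CATEGORIES if category not in available]


# Illustrative data in the same shape as the closet list.
sample_closet = [
    {"description": "Blue plaid shirt", "category": "shirt", "occasion": "casual", "filename": "a.jpeg"},
    {"description": "Grey dress pants", "category": "pants", "occasion": "formal", "filename": "b.jpeg"},
    {"description": "Tan leather boots", "category": "shoes", "occasion": "casual", "filename": "c.jpeg"},
]
```

An empty result means that a complete outfit is possible; otherwise, the returned list names the categories that need another upload.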

Now, let's query the Granite 3.2 model with reasoning to produce an outfit for our specified criteria using the closet list.

occasion = input("Enter the occasion")        # casual or formal (e.g. "casual")
time_of_day = input("Enter the time of day")  # morning, afternoon or evening (e.g. "morning")
location = input("Enter the location")        # any location (e.g. "park")
season = input("Enter the season")            # spring, summer, fall or winter (e.g. "fall")

prompt = f"""Use the description, category, and occasion of the clothes in my closet to put together an outfit for a {occasion} {time_of_day} at the {location}.
The event takes place in the {season} season. Make sure to return only one shirt, bottoms, and shoes.
Use the description, category, and occasion provided. Do not classify the items yourself.
Include the file name of each image in your output along with the file extension. Here are the items in my closet: {closet}"""

messages = [
    {"role": "control",
     "content": "thinking"},
    {"role": "user",
     "content": [
         {"type": "text",
          "text": f"{prompt}"}
     ]}
]

outfit = reasoning_model.chat(messages=messages)['choices'][0]['message']['content']
print(outfit)

Output:

Here is my thought process:
- The outfit needs to be suitable for a casual morning at the park during fall.
- I will select one shirt, one pair of pants, and one pair of shoes that fit the 'casual' occasion category.
- I will avoid formal or overly dressy items and choose items that are comfortable for park activities.

Here is my response:

For a casual morning at the park in fall, I suggest the following outfit:

1. **Shirt**: A blue plaid shirt with a collar and long sleeves (file: 'image13.jpeg')
- The plaid pattern is classic for fall and goes well with casual park settings. The long sleeves offer some protection against cooler morning temperatures.

2. **Pants**: Khaki pants with a buttoned waistband and a button closure at the front (file: 'image7.jpeg')
- Khaki is a versatile choice that can match the casual vibe and also provide a nice balance with the plaid shirt. It's practical and comfortable for walking around.

3. **Shoes**: A pair of tan leather boots with a chunky sole and high-top design (file: 'image3.jpeg')
- Tan leather boots offer a stylish yet comfortable option. The chunky sole provides good grip and support, ideal for navigating park trails or uneven ground.

This combination provides a relaxed, put-together look suitable for a casual morning outing, while also considering comfort and practicality.
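
In the output above, the model's reasoning and its final answer are separated by the phrases "Here is my thought process:" and "Here is my response:". If you want to show only the recommendation to the user, you can split the text on those phrases. The following is a minimal sketch that assumes the markers appear exactly as in this sample output, which might not hold for every reply; the function name split_reasoning is hypothetical.

```python
# Marker phrases taken from the sample output above (an assumption about format).
THOUGHT_MARKER = "Here is my thought process:"
RESPONSE_MARKER = "Here is my response:"


def split_reasoning(text):
    """Return (thoughts, response); thoughts is "" if the response marker is absent."""
    before, marker, after = text.partition(RESPONSE_MARKER)
    if not marker:
        # No marker found: treat the whole text as the response.
        return "", text.strip()
    thoughts = before.replace(THOUGHT_MARKER, "", 1).strip()
    return thoughts, after.strip()
```

For example, thoughts, response = split_reasoning(outfit) would leave the numbered outfit list in response.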

With this generated outfit description, we can also display the clothing items that the model recommends. To do so, we extract the filenames from the response. Because the model might mention the same filename twice, we check that an item has not already been selected as we iterate over the closet; the selected_items list keeps track of the items chosen so far. Finally, we display the selected items.

selected_items = []

# Extract the images of clothing that the model recommends
for item, uploaded_file in zip(closet, images):
    if item['filename'].lower() in outfit.lower() and not any(key['filename'] == item['filename'] for key in selected_items):
        selected_items.append({
            'image': uploaded_file,
            'category': item['category'],
            'filename': item['filename']
        })

# Display the selected clothing items (display is available by default in Jupyter)
if len(selected_items) > 0:
    for item in selected_items:
        display(Image.open(directory + '/' + item['filename']))
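
The substring check above works for the sample filenames, but it can produce a false match when one filename is contained in another (for example, a hypothetical file named 3.jpeg would match inside image3.jpeg). The following is a minimal sketch of a stricter match that uses regular-expression lookarounds. The function name mentioned_filenames is hypothetical.

```python
import re


def mentioned_filenames(outfit_text, filenames):
    """Return the known filenames that appear in the text as whole tokens.

    The result keeps the order of the filenames list and has no duplicates.
    """
    text = outfit_text.lower()
    found = []
    for name in filenames:
        # The lookbehind rejects a match preceded by a word character, "." or "-";
        # the lookahead rejects a match followed by a word character or "-".
        pattern = r"(?<![\w.-])" + re.escape(name.lower()) + r"(?![\w-])"
        if re.search(pattern, text) and name not in found:
            found.append(name)
    return found
```

The returned list could then drive the display loop in place of the substring test.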

Conclusion

In this tutorial, you built a system that uses AI to provide style advice for a user's specific event. Using photos or screenshots of the user's clothing, the system customizes outfits to meet the specified criteria. The Granite Vision 3.2 model (granite-vision-3-2-2b) was critical for labeling and categorizing each item. The Granite 3.2 model (granite-3-2-8b-instruct) then leveraged its reasoning capabilities to generate personalized outfit ideas.

Possible next steps for building on this application include:

  • Customizing outfits to a user's personal style, body type, preferred color palette and more.
  • Broadening the criteria to include jackets and accessories. For example, the system might propose a blazer for a user attending a formal conference in addition to the selected shirt, pants and shoes.
  • Serving as a personal shopper by providing e-commerce product recommendations and pricing that align with the user's unique style and budget.
  • Adding chatbot functionality to ask the LLM questions about each outfit.
  • Providing a virtual try-on experience that uses a user selfie to simulate the final look.