26 February 2025
In this tutorial, you will build a generative AI-powered personal stylist. The tutorial uses the IBM Granite™ Vision 3.2 large language model (LLM) to process image input and Granite 3.2, with its enhanced reasoning capabilities, to formulate customizable outfit ideas.
How often do you find yourself thinking, “What should I wear today? I don’t even know where to start with picking items from my closet!” This dilemma is one that many of us share. With cutting-edge artificial intelligence (AI) models, choosing an outfit no longer needs to be a daunting task.
Our AI-driven solution is composed of the following stages:
3. Upon submission of the input, the multimodal Granite Vision 3.2 model iterates over the list of images and returns the following output:
4. The Granite 3.2 model with enhanced reasoning then serves as a fashion stylist. The LLM uses the Vision model’s output to provide an outfit recommendation that is suitable for the user’s event.
5. The outfit suggestion, a data frame of the items that the user uploaded, and the images of the items in the personalized recommendation are all returned to the user.
You need an IBM Cloud® account to create a watsonx.ai™ project.
To use the watsonx application programming interface (API), you need to complete the following steps. You can also access this tutorial on GitHub.
Log in to watsonx.ai by using your IBM Cloud account.
Create a watsonx.ai project.
You can get your project ID from within your project. Click the Manage tab. Then, copy the project ID from the Details section of the General page. You need this ID for this tutorial.
Create a watsonx.ai Runtime service instance (choose the Lite plan, which is a free instance).
Generate an API Key.
Associate the watsonx.ai Runtime service to the project that you created in watsonx.ai.
For a more interactive experience when using this AI tool, clone the GitHub repository and follow the setup instructions in the README.md file within the AI stylist project to launch the Streamlit application on your local machine. Otherwise, if you prefer to follow along step-by-step, create a Jupyter Notebook and continue with this tutorial.
We need a few libraries and modules for this tutorial. Make sure to import the following ones; if they're not installed, you can resolve this issue with a quick pip installation.
To set our credentials, we need the
We can use the
The
We can also instantiate the model interface by using the
To encode our images in a way that is digestible for the LLM, we will encode them to bytes that we then decode to UTF-8 representation. In this case, our images are located in the local images directory. You can find sample images in the AI stylist directory in our GitHub repository.
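The encoding step described above can be sketched as follows. This is a minimal sketch, not necessarily the tutorial's own code: it assumes Base64 as the byte encoding, and the helper name `encode_images` and the extension filter are hypothetical choices made for illustration.

```python
import base64
from pathlib import Path


def encode_images(directory, extensions=(".jpeg", ".jpg", ".png")):
    """Return (filenames, encoded_images) for the image files in `directory`.

    Hypothetical helper: Base64 and the extension filter are assumptions.
    Files are processed in sorted order so both lists stay aligned.
    """
    filenames, encoded_images = [], []
    for path in sorted(Path(directory).iterdir()):
        if path.is_file() and path.suffix.lower() in extensions:
            raw = path.read_bytes()
            # b64encode returns bytes; decode them to a UTF-8 string so the
            # image can be embedded in a JSON request body.
            encoded_images.append(base64.b64encode(raw).decode("utf-8"))
            filenames.append(path.name)
    return filenames, encoded_images
```

Returning the filenames alongside the encoded strings keeps the two lists in the same order, which is useful later when each description has to be matched back to its image file.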
Now that we have loaded and encoded our images, we can query the Vision model. Our prompt is specific to our desired output to limit the model's creativity as we seek valid JSON output. We will store the description, category and occasion of each image in a list called
Output:
{
"description": "A pair of polished brown leather dress shoes with a brogue detailing on the toe box and a classic oxford design.",
"category": "shoes",
"occasion": "formal"
}
{
"description": "A pair of checkered trousers with a houndstooth pattern, featuring a zippered pocket and a button closure at the waist.",
"category": "pants",
"occasion": "casual"
}
{
"description": "A light blue, button-up shirt with a smooth texture and a classic collar, suitable for casual to semi-formal occasions.",
"category": "shirt",
"occasion": "casual"
}
{
"description": "A pair of khaki pants with a buttoned waistband and a button closure at the front.",
"category": "pants",
"occasion": "casual"
}
{
"description": "A blue plaid shirt with a collar and long sleeves, featuring chest pockets and a button-up front.",
"category": "shirt",
"occasion": "casual"
}
{
"description": "A pair of bright orange, short-sleeved t-shirts with a crew neck and a simple design.",
"category": "shirt",
"occasion": "casual"
}
{
"description": "A pair of blue suede sneakers with white laces and perforations, suitable for casual wear.",
"category": "shoes",
"occasion": "casual"
}
{
"description": "A pair of red canvas sneakers with white laces, isolated on a white background.",
"category": "shoes",
"occasion": "casual"
}
{
"description": "A pair of grey dress pants with a smooth texture and a classic design, suitable for formal occasions.",
"category": "pants",
"occasion": "formal"
}
{
"description": "A plain white T-shirt with short sleeves and a crew neck, displayed from the front and back.",
"category": "shirt",
"occasion": "casual"
}
{
"description": "A black short-sleeved t-shirt with a crew neck and a simple design.",
"category": "shirt",
"occasion": "casual"
}
{
"description": "Black pants with a zippered pocket and a buttoned fly, showing the waistband and pocket details.",
"category": "pants",
"occasion": "casual"
}
{
"description": "A pair of tan leather boots with a chunky sole and a high-top design, suitable for casual wear.",
"category": "shoes",
"occasion": "casual"
}
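Because the Vision model returns text rather than structured objects, it can help to validate each response before passing it on. The following is a minimal sketch, assuming each response is a string that contains one JSON object with the three keys shown in the output above. The helper name `parse_item` is hypothetical.

```python
import json

REQUIRED_KEYS = ("description", "category", "occasion")


def parse_item(response_text):
    """Parse one model response into a dict, or return None if it is unusable."""
    # A model may wrap the JSON in extra text, so keep only the span between
    # the first opening brace and the last closing brace.
    start, end = response_text.find("{"), response_text.rfind("}")
    if start == -1 or end <= start:
        return None
    try:
        item = json.loads(response_text[start:end + 1])
    except json.JSONDecodeError:
        return None
    # Reject anything that is not an object with all three expected fields.
    if not isinstance(item, dict) or any(key not in item for key in REQUIRED_KEYS):
        return None
    return item
```

A response that fails validation returns `None`, so the caller can decide whether to retry the query for that image or skip the item.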
Now that we have each clothing and shoe item categorized, it will be much easier for the reasoning model to generate an outfit for the selected occasion. Let's instantiate and query the reasoning model.
To align the filenames with the image descriptions, we can enumerate the list of image descriptions and create a list of dictionaries in which we store the description, category, occasion and filename of each item in the respective fields.
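The alignment step described above can be sketched as follows. This is a minimal sketch, assuming the descriptions are already parsed into dictionaries and are in the same order as the filenames. The helper name `attach_filenames` and the `"file"` key are hypothetical.

```python
def attach_filenames(image_descriptions, filenames):
    """Combine each parsed description with the filename at the same index."""
    if len(image_descriptions) != len(filenames):
        raise ValueError("Expected one description per image file.")
    closet = []
    for idx, item in enumerate(image_descriptions):
        closet.append(
            {
                "description": item["description"],
                "category": item["category"],
                "occasion": item["occasion"],
                "file": filenames[idx],  # assumed key name for the filename
            }
        )
    return closet
```

The length check guards against a silent misalignment, for example when one image failed to produce a valid description and was dropped from the list.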
Now, let's query the Granite 3.2 model with reasoning to produce an outfit for our specified criteria using the
Output:
Here is my thought process:
- The outfit needs to be suitable for a casual morning at the park during fall.
- I will select one shirt, one pair of pants, and one pair of shoes that fit the 'casual' occasion category.
- I will avoid formal or overly dressy items and choose items that are comfortable for park activities.
Here is my response:
For a casual morning at the park in fall, I suggest the following outfit:
1. **Shirt**: A blue plaid shirt with a collar and long sleeves (file: 'image13.jpeg')
- The plaid pattern is classic for fall and goes well with casual park settings. The long sleeves offer some protection against cooler morning temperatures.
2. **Pants**: Khaki pants with a buttoned waistband and a button closure at the front (file: 'image7.jpeg')
- Khaki is a versatile choice that can match the casual vibe and also provide a nice balance with the plaid shirt. It's practical and comfortable for walking around.
3. **Shoes**: A pair of tan leather boots with a chunky sole and high-top design (file: 'image3.jpeg')
- Tan leather boots offer a stylish yet comfortable option. The chunky sole provides good grip and support, ideal for navigating park trails or uneven ground.
This combination provides a relaxed, put-together look suitable for a casual morning outing, while also considering comfort and practicality.
With this generated outfit description, we can also display the clothing items that the model recommends. To do so, we extract the filenames from the response. Because the model might mention the same filename twice, we check that an image has not already been displayed as we iterate over the list of images. We can do so by storing displayed images in the
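The filename extraction and duplicate check can be sketched as follows. This is a minimal sketch, assuming the model cites filenames in the form shown in the output above (for example, `image13.jpeg`). The helper name `extract_filenames` and the regular expression are hypothetical choices, not the tutorial's own code.

```python
import re


def extract_filenames(outfit_text, known_filenames):
    """Return recommended filenames in order of first mention, without repeats."""
    known = set(known_filenames)
    seen = set()  # filenames that were already selected for display
    ordered = []
    # Assumed pattern: a name made of word characters, dots or hyphens,
    # ending in a common image extension.
    pattern = r"[\w.-]+\.(?:jpe?g|png)"
    for name in re.findall(pattern, outfit_text, flags=re.IGNORECASE):
        # Ignore names the user never uploaded and names handled earlier.
        if name in known and name not in seen:
            seen.add(name)
            ordered.append(name)
    return ordered
```

Filtering against the known filenames also protects against a model that invents a file that was never uploaded. The returned list can then be passed to whichever image display utility the notebook or application uses.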
In this tutorial, you built a system that uses AI to provide style advice tailored to a user's specific event. Using photos or screenshots of the user's clothing, the system customizes outfits to meet the specified criteria. The Granite-Vision-3-2-2b model was critical for labeling and categorizing each item. Additionally, the Granite-3-2-8B-instruct model used its reasoning capabilities to generate personalized outfit ideas.
Some next steps for building off this application can include: