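Build an AI stylist with IBM Granite using watsonx.ai

This tutorial walks you through building a generative AI-powered personal stylist. It uses the IBM Granite Vision 3.2 large language model (LLM) to process image input and Granite 3.2, with its latest enhanced reasoning capabilities, to generate customized outfit ideas.

Introduction

How often do you find yourself thinking, “What should I wear today? I have no idea what to pick from my closet!” Many people share this dilemma. With state-of-the-art artificial intelligence (AI) models, it no longer has to be a daunting task.

AI styling: How it works

IBM's AI-powered solution consists of the following steps.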

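1. The user uploads images of items from their current wardrobe or wishlist, one item at a time.
2. The user selects the following criteria:

Occasion: casual or formal.
Time of day: morning, afternoon, or evening.
Season: winter, spring, summer, or fall.
Location (for example, a coffee shop).

3. Once the input is submitted, the multimodal Granite Vision 3.2 model iterates over the list of images and returns the following output for each one:

Item description.
Category: shirt, pants, or shoes.
Occasion: casual or formal.

4. The Granite 3.2 model with enhanced reasoning then acts as a fashion stylist. The LLM uses the Vision model's output to recommend an outfit suited to the user's event.

5. The outfit suggestion, a data frame of the items the user uploaded, and the images of the items in the described personalized recommendation are all returned to the user.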
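
The selection criteria in this workflow are a small, fixed set of choices plus one free-text location, so they can be validated before any model is called. The following is a minimal sketch of that idea; the `StylingCriteria` class and its constants are hypothetical helpers for illustration and are not part of the tutorial's code.

```python
from dataclasses import dataclass

# Allowed values mirror the criteria listed in the workflow above.
OCCASIONS = {"casual", "formal"}
TIMES_OF_DAY = {"morning", "afternoon", "evening"}
SEASONS = {"winter", "spring", "summer", "fall"}


@dataclass(frozen=True)
class StylingCriteria:  # hypothetical helper, not part of the tutorial code
    occasion: str
    time_of_day: str
    season: str
    location: str  # free text, for example "coffee shop"

    def __post_init__(self):
        # Reject values outside the fixed choices before any model call.
        if self.occasion not in OCCASIONS:
            raise ValueError(f"occasion must be one of {sorted(OCCASIONS)}")
        if self.time_of_day not in TIMES_OF_DAY:
            raise ValueError(f"time_of_day must be one of {sorted(TIMES_OF_DAY)}")
        if self.season not in SEASONS:
            raise ValueError(f"season must be one of {sorted(SEASONS)}")
        if not self.location.strip():
            raise ValueError("location must not be empty")


criteria = StylingCriteria("casual", "morning", "fall", "park")
```

Validating up front keeps typos such as "autumn" out of the prompt, where they would otherwise pass through silently.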

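Prerequisites

You need an IBM Cloud account to create a watsonx.ai project.

Steps

To use the watsonx application programming interface (API), complete the following steps. Note that you can also view this tutorial on GitHub.

Step 1. Set up your environment

Log in to watsonx.ai by using your IBM® Cloud account.
Create a watsonx.ai project.

You can get your project ID from within your project. Click the Manage tab. Then copy the project ID from the Details section of the General page. You need this ID for this tutorial.

Step 2. Set up a watsonx.ai Runtime service and API key

Create a watsonx.ai Runtime service instance (choose the Lite plan, which is a free instance).
Generate an API key.
Associate the watsonx.ai Runtime service with the project that you created in watsonx.ai.

Step 3. Clone the repository (optional)

For a more interactive experience with this AI tool, clone the GitHub repository and follow the setup instructions in the README.md file in the AI stylist project to launch the Streamlit application on your local machine. Otherwise, if you prefer to follow along step by step, create a Jupyter Notebook and continue with this tutorial.

Step 4. Install and import relevant libraries and set up your credentials

This tutorial needs a few libraries and modules. Make sure to import the following ones. If they are not installed, a quick pip installation resolves the problem.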

# Install required packages (the PIL module is provided by the Pillow package)
!pip install -q pillow ibm-watsonx-ai

# Required imports
import getpass, os, base64, json
from ibm_watsonx_ai import Credentials
from ibm_watsonx_ai.foundation_models import ModelInference
from PIL import Image

To set your credentials, you need the WATSONX_PROJECT_ID from step 1 and the WATSONX_APIKEY that you generated in step 2. You also set the URL that serves as the API endpoint.

WATSONX_APIKEY = getpass.getpass("Please enter your watsonx.ai Runtime API key (hit enter): ")
WATSONX_PROJECT_ID = getpass.getpass("Please enter your project ID (hit enter): ")
URL = "https://us-south.ml.cloud.ibm.com"

Next, use the Credentials class to encapsulate the credentials that you passed in.

credentials = Credentials(
    url=URL,
    api_key=WATSONX_APIKEY
)
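
When the notebook is rerun often, typing the key each time gets tedious. One common alternative is to read the values from environment variables and fall back to an interactive prompt only when they are unset. The sketch below assumes that approach; `read_secret` is a hypothetical helper, and using the names WATSONX_APIKEY and WATSONX_PROJECT_ID as environment variable names is an assumption, not something the watsonx.ai SDK requires.

```python
import getpass
import os


def read_secret(env_name, prompt):
    """Return the value of env_name, or prompt for it when it is unset."""
    value = os.environ.get(env_name)
    if value:
        return value
    # Fall back to a hidden interactive prompt, as in the cell above.
    return getpass.getpass(prompt)


# Usage in the notebook (prompts only if the variable is not set):
# WATSONX_APIKEY = read_secret("WATSONX_APIKEY", "API key: ")
# WATSONX_PROJECT_ID = read_secret("WATSONX_PROJECT_ID", "Project ID: ")
```

This keeps secrets out of the notebook file itself while leaving the rest of the tutorial code unchanged.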

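Step 5. Set up the API request for the Granite Vision model

The augment_api_request_body function takes the user query and an image as parameters and augments the body of the API request. We use this function in each iteration of inferencing the Vision model.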

def augment_api_request_body(user_query, image):
    messages = [
        {
            "role": "user",
            "content": [{
                "type": "text",
                "text": user_query
            },
            {
                "type": "image_url",
                "image_url": {
                    "url": f"data:image/jpeg;base64,{image}"
                }
            }]
        }
    ]
    return messages
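
Note that the data URL above always declares image/jpeg, while step 6 also accepts .png files. Whether the service is sensitive to a mismatched MIME type is not stated in this tutorial, but deriving the type from the file name is cheap. The following sketch shows one way to do it with the standard library; `build_data_url` is a hypothetical helper, not part of the tutorial's code.

```python
import mimetypes


def build_data_url(filename, encoded_image):
    """Build a data URL whose MIME type matches the file extension."""
    mime_type, _ = mimetypes.guess_type(filename)
    if mime_type is None or not mime_type.startswith("image/"):
        raise ValueError(f"unsupported image file: {filename}")
    return f"data:{mime_type};base64,{encoded_image}"


# "QUJD" is the Base64 encoding of the placeholder bytes b"ABC".
png_url = build_data_url("shirt.png", "QUJD")
jpeg_url = build_data_url("shoes.jpeg", "QUJD")
```

To use it, the request builder would need to receive the file name as well and place the returned string in the "url" field.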

You can also instantiate the model interface by using the ModelInference class.

model = ModelInference(
    model_id="ibm/granite-vision-3-2-2b",
    credentials=credentials,
    project_id=WATSONX_PROJECT_ID,
    params={
        "max_tokens": 400,
        "temperature": 0
    }
)

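Step 6. Encode images

To encode the images in a way that the LLM can digest, we first encode the raw image data to Base64 bytes and then decode those bytes to a UTF-8 string. In this case, the images are located in a local images directory. You can find sample images in the AI stylist directory of the GitHub repository.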

directory = "images" #directory name
images = []
filenames = []
for filename in os.listdir(directory):
    if filename.endswith(".jpeg") or filename.endswith(".png"):
        filepath = directory + '/' + filename
        with open(filepath, "rb") as f:
            images.append(base64.b64encode(f.read()).decode('utf-8'))
        filenames.append(filename)
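
The loop above relies on os.listdir, which returns entries in arbitrary order, and it skips files that end in .jpg. The following sketch is an alternative under two assumptions that are not in the original: that a stable, sorted order is desirable, and that .jpg files should be accepted too. `load_encoded_images` is a hypothetical helper name.

```python
import base64
from pathlib import Path

SUPPORTED_SUFFIXES = {".jpeg", ".jpg", ".png"}  # ".jpg" is an added assumption


def load_encoded_images(directory):
    """Return (filenames, base64_strings) for image files, sorted by name."""
    filenames, encoded = [], []
    for path in sorted(Path(directory).iterdir()):
        if path.is_file() and path.suffix.lower() in SUPPORTED_SUFFIXES:
            # Base64-encode the raw bytes, then decode to a plain str.
            encoded.append(base64.b64encode(path.read_bytes()).decode("utf-8"))
            filenames.append(path.name)
    return filenames, encoded


# Usage in the notebook:
# filenames, images = load_encoded_images("images")
```

Because both lists are filled in the same iteration, the entry at a given index of the file name list always corresponds to the same index of the encoded list.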

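Step 7. Categorize input with the vision model

Now that we have loaded and encoded our images, we can query the Vision model. The prompt is specific about the desired output, which limits the model's creativity as it works toward valid JSON output. We store each image's description, category, and occasion in a list called image_descriptions.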

user_query = """Provide a description, category, and occasion for the clothing item or shoes in this image.
                Classify the category as shirt, pants, or shoes.
                Classify the occasion as casual or formal.
                Ensure the output is valid JSON. Do not create new categories or occasions. Only use the allowed classifications.
                Your response should be in this schema:
                {
                    "description": "<description>",
                    "category": "<category>",
                    "occasion": "<occasion>"
                }
                """

image_descriptions = []
# Query the vision model once per encoded image and collect the raw replies
for image in images:
    message = augment_api_request_body(user_query, image)
    response = model.chat(messages=message)
    result = response['choices'][0]['message']['content']
    print(result)
    image_descriptions.append(result)

Output:

{
    "description": "A pair of polished brown leather dress shoes with a brogue detailing on the toe box and a classic oxford design.",
    "category": "shoes",
    "occasion": "formal"
}
{
    "description": "A pair of checkered trousers with a houndstooth pattern, featuring a zippered pocket and a button closure at the waist.",
    "category": "pants",
    "occasion": "casual"
}
{
    "description": "A light blue, button-up shirt with a smooth texture and a classic collar, suitable for casual to semi-formal occasions.",
    "category": "shirt",
    "occasion": "casual"
}
{
    "description": "A pair of khaki pants with a buttoned waistband and a button closure at the front.",
    "category": "pants",
    "occasion": "casual"
}
{
    "description": "A blue plaid shirt with a collar and long sleeves, featuring chest pockets and a button-up front.",
    "category": "shirt",
    "occasion": "casual"
}
{
    "description": "A pair of bright orange, short-sleeved t-shirts with a crew neck and a simple design.",
    "category": "shirt",
    "occasion": "casual"
}
{
    "description": "A pair of blue suede sneakers with white laces and perforations, suitable for casual wear.",
    "category": "shoes",
    "occasion": "casual"
}

{
    "description": "A pair of red canvas sneakers with white laces, isolated on a white background.",
    "category": "shoes",
    "occasion": "casual"
}
{
    "description": "A pair of grey dress pants with a smooth texture and a classic design, suitable for formal occasions.",
    "category": "pants",
    "occasion": "formal"
}
{
    "description": "A plain white T-shirt with short sleeves and a crew neck, displayed from the front and back.",
    "category": "shirt",
    "occasion": "casual"
}
{
    "description": "A black short-sleeved t-shirt with a crew neck and a simple design.",
    "category": "shirt",
    "occasion": "casual"
}
{
    "description": "Black pants with a zippered pocket and a buttoned fly, showing the waistband and pocket details.",
    "category": "pants",
    "occasion": "casual"
}
{
    "description": "A pair of tan leather boots with a chunky sole and a high-top design, suitable for casual wear.",
    "category": "shoes",
    "occasion": "casual"
}
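
The replies shown above are all valid JSON with allowed labels, but nothing guarantees this for every image. The following sketch shows one defensive way to parse a reply before it is used in step 8. `parse_item` is a hypothetical helper, and dropping unusable replies by returning None is a design assumption; retrying the request would be another reasonable option.

```python
import json

ALLOWED_CATEGORIES = {"shirt", "pants", "shoes"}
ALLOWED_OCCASIONS = {"casual", "formal"}


def parse_item(raw_text):
    """Parse one model reply into a dict, or return None if it is unusable."""
    # Keep only the span from the first "{" to the last "}" so that extra
    # text around the JSON object (for example a Markdown fence) is ignored.
    start, end = raw_text.find("{"), raw_text.rfind("}")
    if start == -1 or end <= start:
        return None
    try:
        item = json.loads(raw_text[start:end + 1])
    except json.JSONDecodeError:
        return None
    # Enforce the classifications that the prompt allows.
    if item.get("category") not in ALLOWED_CATEGORIES:
        return None
    if item.get("occasion") not in ALLOWED_OCCASIONS:
        return None
    return item


fenced = '```json\n{"description": "x", "category": "shoes", "occasion": "formal"}\n```'
parsed = parse_item(fenced)
```

If replies are filtered this way, the matching file names must be filtered in the same pass so that the two lists stay aligned.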

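Step 8. Generate outfits with the reasoning model

Now that each clothing and shoe item is categorized, it is much easier for the reasoning model to generate an outfit for the selected occasion. Let's instantiate and query the reasoning model.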

reasoning_model = ModelInference(
    model_id="ibm/granite-3-2-8b-instruct",
    credentials=credentials,
    project_id=WATSONX_PROJECT_ID
)

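To align the file names with the image descriptions, we can enumerate the list of image descriptions and build a list of dictionaries that store each item's description, category, occasion, and file name in their respective fields.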

# Add filenames to the image descriptions
closet = []
for i, desc in enumerate(image_descriptions):
    desc_dict = json.loads(desc)
    desc_dict['filename'] = filenames[i]
    image_descriptions[i] = json.dumps(desc_dict)

closet = [json.loads(js) for js in image_descriptions]
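
The same result can be reached in a single pass without rewriting image_descriptions in place. The following is a minimal sketch of that variant; `build_closet` is a hypothetical helper, and the two sample entries are placeholders rather than real model output.

```python
import json


def build_closet(image_descriptions, filenames):
    """Attach each file name to its parsed description."""
    if len(image_descriptions) != len(filenames):
        raise ValueError("descriptions and filenames must have the same length")
    closet = []
    for desc, filename in zip(image_descriptions, filenames):
        item = json.loads(desc)       # raw JSON string -> dict
        item["filename"] = filename   # add the matching file name
        closet.append(item)
    return closet


# Placeholder inputs for illustration only.
sample_descriptions = [
    '{"description": "red sneakers", "category": "shoes", "occasion": "casual"}',
    '{"description": "grey pants", "category": "pants", "occasion": "formal"}',
]
sample_closet = build_closet(sample_descriptions, ["a.jpeg", "b.png"])
```

The length check surfaces a mismatch early, which zip alone would hide by silently truncating to the shorter list.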

Now, let's query the Granite 3.2 model with reasoning to produce an outfit that meets the specified criteria by using the closet list.

occasion = input("Enter the occasion") #casual or formal (e.g. "casual")
time_of_day = input("Enter the time of day") #morning, afternoon or evening (e.g. "morning")
location = input("Enter the location") #any location (e.g. "park")
season = input("Enter the season") #spring, summer, fall or winter (e.g. "fall")

prompt = f"""Use the description, category, and occasion of the clothes in my closet to put together an outfit for a {occasion} {time_of_day} at the {location}. The event takes place in the {season} season. Make sure to return only one shirt, bottoms, and shoes. Use the description, category, and occasion provided. Do not classify the items yourself. Include the file name of each image in your output along with the file extension. Here are the items in my closet: {closet}"""

messages = [
    {"role": "control",
     "content": "thinking"},
    {"role": "user",
     "content": [
         {"type": "text",
          "text": prompt}
     ]}
]
outfit = reasoning_model.chat(messages=messages)['choices'][0]['message']['content']
print(outfit)

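Output:

Here is my thought process:
- The outfit should be suitable for a casual morning at the park during the fall.
- I will select one shirt, one pair of pants, and one pair of shoes that fit the 'casual' occasion category.
- I will avoid formal or overly dressy items and choose items that are comfortable for park activities.

Here is my response:

For a casual morning at the park in the fall, I suggest the following outfit:

1. **Shirt**: A blue plaid shirt with a collar and long sleeves (file: 'image13.jpeg')
- The plaid pattern is a classic for the fall season and goes well with a casual park setting. The long sleeves offer some protection against the cool morning temperatures.

2. **Pants**: Khaki pants with a buttoned waistband and a button closure at the front (file: 'image7.jpeg')
- Khaki is a versatile choice that suits the casual vibe and also coordinates well with the plaid shirt. The pants are practical and comfortable for walking.

3. **Shoes**: Tan leather boots with a chunky sole and a high-top design (file: 'image3.jpeg')
- Tan leather boots are a stylish yet comfortable option. The chunky sole provides good grip and support, which suits walking on park trails or uneven ground.

This combination provides a relaxed yet put-together look for a casual morning outing, with comfort and practicality in mind.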
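
With the thinking control message enabled, the reply contains both the model's reasoning and its final answer. If only the answer should be shown to the user, the two parts can be separated. The sketch below assumes that the reply uses the English marker lines "Here is my thought process:" and "Here is my response:"; if the model's actual wording differs, the constants need to be adjusted. `split_reasoning` is a hypothetical helper.

```python
THOUGHT_MARKER = "Here is my thought process:"  # assumed marker text
RESPONSE_MARKER = "Here is my response:"        # assumed marker text


def split_reasoning(text):
    """Return (thoughts, answer); thoughts is "" when no marker is found."""
    before, marker, after = text.partition(RESPONSE_MARKER)
    if not marker:
        # No response marker: treat the whole reply as the answer.
        return "", text.strip()
    thoughts = before.replace(THOUGHT_MARKER, "", 1).strip()
    return thoughts, after.strip()


sample_reply = (
    "Here is my thought process:\n- pick casual items\n\n"
    "Here is my response:\n\nWear the plaid shirt."
)
thoughts, answer = split_reasoning(sample_reply)
```

Falling back to the full text when the marker is missing means a reply without visible reasoning is still displayed rather than lost.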

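With this generated outfit description, we can also display the clothing items that the model recommends. To do so, we simply extract the file names. In case the model mentions the same file name twice, it is important to check whether the image has already been displayed as we iterate over the list of images. We can do so by storing the displayed images in the selected_items list. Finally, we can display the selected items.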

selected_items = []
#extract the images of clothing that the model recommends
for item, uploaded_file in zip(closet, images):
    if item['filename'].lower() in outfit.lower() and not any(key['filename'] == item['filename'] for key in selected_items):
        selected_items.append({
            'image': uploaded_file,
            'category': item['category'],
            'filename': item['filename']
        })

#display the selected clothing items
if len(selected_items) > 0:
    for item in selected_items:
        display(Image.open(directory + '/' + item['filename']))
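
The substring check above works for the sample file names, but it can produce false positives when one file name is contained in another: "shirt.jpeg" is a substring of "tshirt.jpeg". The following sketch shows a stricter match with a regular expression; `mentions_file` is a hypothetical helper, and the choice of which neighboring characters count as part of a file name is an assumption.

```python
import re


def mentions_file(filename, text):
    """True if filename appears in text as a whole token, ignoring case."""
    # The lookarounds reject matches glued to other file-name characters,
    # so "shirt.jpeg" does not match inside "tshirt.jpeg".
    pattern = r"(?<![\w.-])" + re.escape(filename) + r"(?![\w-])"
    return re.search(pattern, text, flags=re.IGNORECASE) is not None


sample_outfit = "1. **Shirt**: Orange tee (file: 'tshirt.jpeg')"
naive_match = "shirt.jpeg" in sample_outfit.lower()
strict_match = mentions_file("shirt.jpeg", sample_outfit)
```

In the selection loop, the condition `item['filename'].lower() in outfit.lower()` could be swapped for `mentions_file(item['filename'], outfit)` without changing anything else.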

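Conclusion

In this tutorial, you built a system that uses AI to provide styling advice tailored to a user's specific event. Using photos or screenshots of the user's clothing, outfits are customized to meet the specified criteria. The Granite-Vision-3-2-2b model was critical for labeling and categorizing each item. Additionally, the Granite-3-2-8B-Instruct model applied its reasoning capabilities to generate personalized outfit ideas.

Some next steps for building on this application include:

Customizing outfits to a user's personal style, body type, preferred color palette, and more.
Broadening the criteria to include jackets and accessories. For example, the system might propose a blazer for a user attending a formal meeting, in addition to the selected shirt, pants, and shoes.
Serving as a personal shopping guide by providing e-commerce product recommendations and pricing that align with the user's style and budget.
Adding chatbot functionality so that the user can ask the LLM questions about each outfit.
Providing a virtual try-on experience that uses a user selfie to simulate the final look.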