Build an AI stylist with IBM Granite using watsonx.ai

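In this tutorial, you learn how to build a generative AI-powered personal stylist. The tutorial uses the IBM® Granite™ Vision 3.2 large language model (LLM) to process image inputs and Granite 3.2, with its latest enhanced reasoning capabilities, to formulate customizable outfit ideas.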

Introduction

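How often do you find yourself thinking, "What should I wear today? I don't even know where to start with my closet!" This dilemma is one that many of us share. With cutting-edge artificial intelligence (AI) models, it no longer has to be a daunting task.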

AI styling: How it works

Our AI-driven solution is composed of the following stages:

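1. The user uploads images of clothing they currently own, or even items from their wishlist, one item at a time.
2. The user selects the following criteria:

Occasion: casual or formal.
Time of day: morning, afternoon, or evening.
Season: winter, spring, summer, or fall.
Location (for example, a coffee shop).

3. Once the inputs are submitted, the multimodal Granite Vision 3.2 model iterates over the list of images and returns the following output for each item:

Description of the item.
Category: shirt, pants, or shoes.
Occasion: casual or formal.

4. The Granite 3.2 model, with its enhanced reasoning, then acts as a fashion stylist. The LLM uses the Vision model's output to provide an outfit recommendation suited to the user's event.

5. The outfit suggestion, a data frame of the items the user uploaded, and the images in the described personalized recommendation are all returned to the user.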
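The criteria the user selects are a small, fixed set of choices plus a free-text location. As a minimal sketch (the StylingCriteria class and the ALLOWED_* names are hypothetical, not part of the tutorial's code), they can be captured in a dataclass that rejects values outside those choices before anything is sent to a model:

```python
from dataclasses import dataclass

# Allowed values copied from the criteria listed above (hypothetical names).
ALLOWED_OCCASIONS = {"casual", "formal"}
ALLOWED_TIMES = {"morning", "afternoon", "evening"}
ALLOWED_SEASONS = {"winter", "spring", "summer", "fall"}


@dataclass(frozen=True)
class StylingCriteria:
    """User-selected criteria for the reasoning model (hypothetical helper)."""
    occasion: str
    time_of_day: str
    season: str
    location: str  # free text, for example "coffee shop"

    def __post_init__(self):
        # Reject values outside the fixed choices so the prompt stays predictable.
        checks = [
            ("occasion", self.occasion, ALLOWED_OCCASIONS),
            ("time_of_day", self.time_of_day, ALLOWED_TIMES),
            ("season", self.season, ALLOWED_SEASONS),
        ]
        for name, value, allowed in checks:
            if value not in allowed:
                raise ValueError(f"{name} must be one of {sorted(allowed)}, got {value!r}")
        if not self.location.strip():
            raise ValueError("location must not be empty")


criteria = StylingCriteria(occasion="casual", time_of_day="morning", season="fall", location="park")
```

Validating up front means a typo such as "evenning" fails immediately instead of silently producing a prompt the model has to interpret.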

Prerequisites

You need an IBM Cloud® account to create a watsonx.ai™ project.

Steps

To use the watsonx application programming interface (API), you need to complete the following steps. You can also access this tutorial on GitHub.

Step 1. Set up your environment

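Log in to watsonx.ai by using your IBM Cloud account.
Create a watsonx.ai project.

You can get your project ID from within your project. Click the Manage tab, then copy the project ID from the Details section of the General page. You need this ID for this tutorial.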

Step 2. Set up a watsonx.ai Runtime service and API key

Create a watsonx.ai Runtime service instance (choose the Lite plan, which is a free instance).
Generate an API key.
Associate the watsonx.ai Runtime service with the project that you created in watsonx.ai.

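Step 3. Clone the repository (optional)

For a more interactive experience with this AI tool, clone the GitHub repository and follow the setup instructions in the README.md file of the AI stylist project to launch the Streamlit application on your local machine. If you prefer to follow along step by step, create a Jupyter Notebook and continue with this tutorial.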

Step 4. Install and import relevant libraries and set up your credentials

This tutorial requires a few libraries and modules. Make sure to import the following ones; if any are not installed, a quick pip install resolves the problem.

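# Install required packages (Pillow provides the PIL module)
!pip install -q pillow ibm-watsonx-ai

# Required imports
import getpass, os, base64, json
from ibm_watsonx_ai import Credentials
from ibm_watsonx_ai.foundation_models import ModelInference
from PIL import Image
from IPython.display import display  # used in Step 8; already available inside Jupyter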

To set your credentials, you need the WATSONX_APIKEY that you generated in Step 2 and the WATSONX_PROJECT_ID from Step 1. You also set the URL, which serves as the API endpoint.

WATSONX_APIKEY = getpass.getpass("Please enter your watsonx.ai Runtime API key (hit enter): ")
WATSONX_PROJECT_ID = getpass.getpass("Please enter your project ID (hit enter): ")
URL = "https://us-south.ml.cloud.ibm.com"
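Entering secrets interactively works in a notebook but gets repetitive when you rerun it. A minimal sketch, assuming you are willing to export the values as environment variables (the variable names and the read_setting and load_watsonx_settings helpers are hypothetical choices, not something watsonx.ai requires), reads them from the environment and falls back to getpass only when a value is missing:

```python
import getpass
import os

# Endpoint used in this tutorial (us-south region).
DEFAULT_URL = "https://us-south.ml.cloud.ibm.com"


def read_setting(env_var: str, prompt: str) -> str:
    """Return env_var from the environment, or prompt for it without echoing."""
    value = os.environ.get(env_var)
    if value:
        return value
    return getpass.getpass(prompt)


def load_watsonx_settings() -> tuple[str, str, str]:
    """Return (api_key, project_id, url); the env var names are hypothetical."""
    api_key = read_setting("WATSONX_APIKEY", "Please enter your watsonx.ai Runtime API key (hit enter): ")
    project_id = read_setting("WATSONX_PROJECT_ID", "Please enter your project ID (hit enter): ")
    url = os.environ.get("WATSONX_URL", DEFAULT_URL)
    return api_key, project_id, url
```

You would then write WATSONX_APIKEY, WATSONX_PROJECT_ID, URL = load_watsonx_settings() in place of the three assignments above.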

To encapsulate the credentials you pass in, you can use the Credentials class.

credentials = Credentials(
    url=URL,
    api_key=WATSONX_APIKEY
)

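Step 5. Set up the API request for the Granite Vision model

The augment_api_request_body function takes the user query and an image as parameters and augments the body of the API request. You use this function in each iteration of inference with the Vision model.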

def augment_api_request_body(user_query, image):
    messages = [
        {
            "role": "user",
            "content": [{
                "type": "text",
                "text": user_query
            },
            {
                "type": "image_url",
                "image_url": {
                    "url": f"data:image/jpeg;base64,{image}"
                }
            }]
        }
    ]
    return messages

You can also instantiate the model interface by using the ModelInference class.

model = ModelInference(
    model_id="ibm/granite-vision-3-2-2b",
    credentials=credentials,
    project_id=WATSONX_PROJECT_ID,
    params={
        "max_tokens": 400,
        "temperature": 0
    }
)
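augment_api_request_body labels every image as image/jpeg, while Step 6 also loads .png files. If you want the label to match the actual file type, a minimal sketch (the to_data_url helper is hypothetical) guesses the MIME type from the filename with Python's standard mimetypes module and falls back to image/jpeg:

```python
import base64
import mimetypes


def to_data_url(filename: str, raw_bytes: bytes) -> str:
    """Build a base64 data URL whose MIME type follows the file extension.

    Hypothetical helper: falls back to image/jpeg, the type hard-coded in
    augment_api_request_body, when the extension is unknown or not an image.
    """
    mime_type, _ = mimetypes.guess_type(filename)
    if mime_type is None or not mime_type.startswith("image/"):
        mime_type = "image/jpeg"
    encoded = base64.b64encode(raw_bytes).decode("utf-8")
    return f"data:{mime_type};base64,{encoded}"
```

One way to use it is to build the full data URL for each file when loading images and pass that string as the url value, instead of the fixed f"data:image/jpeg;base64,{image}" prefix.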

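Step 6. Encode images

To encode images in a form the LLM can read, encode each image to Base64 bytes and then decode the result into a UTF-8 string. In this case, the images are in a local images directory. You can find sample images in the AI stylist directory of the GitHub repository.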

directory = "images" #directory name
images = []
filenames = []
for filename in os.listdir(directory):
    if filename.endswith(".jpeg") or filename.endswith(".png"):
        filepath = directory + '/' + filename
        with open(filepath, "rb") as f:
            images.append(base64.b64encode(f.read()).decode('utf-8'))
        filenames.append(filename)
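The loop above relies on os.listdir, which returns entries in arbitrary order, and its extension check is case-sensitive, so a file named IMG_01.PNG would be skipped. A minimal sketch using pathlib (the load_images helper is hypothetical, and accepting .jpg is an assumption beyond the tutorial's .jpeg and .png) sorts files by name and compares extensions case-insensitively:

```python
import base64
from pathlib import Path

# .jpg is an added assumption; the tutorial itself only checks .jpeg and .png.
IMAGE_EXTENSIONS = {".jpeg", ".jpg", ".png"}


def load_images(directory: str) -> tuple[list[str], list[str]]:
    """Return (filenames, base64 strings) for the images in directory.

    Files are sorted by name for a repeatable order, and the extension
    check ignores case.
    """
    filenames, images = [], []
    for path in sorted(Path(directory).iterdir()):
        if path.is_file() and path.suffix.lower() in IMAGE_EXTENSIONS:
            filenames.append(path.name)
            images.append(base64.b64encode(path.read_bytes()).decode("utf-8"))
    return filenames, images
```

You would call it as filenames, images = load_images("images"). Sorting is lexicographic, so image10.jpeg comes before image2.jpeg; that only matters if you depend on a particular order.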

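Step 7. Classify inputs with the Vision model

Now that the images are loaded and encoded, you can query the Vision model. The prompt is specific about the desired output, which limits the model's creativity because we need valid JSON. The description, category, and occasion of each image are stored in a list called image_descriptions.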

user_query = """Provide a description, category, and occasion for the clothing item or shoes in this image.
                Classify the category as shirt, pants, or shoes.
                Classify the occasion as casual or formal.
                Ensure the output is valid JSON. Do not create new categories or occasions. Only use the allowed classifications.
                Your response should be in this schema:
                {
                    "description": "<description>",
                    "category": "<category>",
                    "occasion": "<occasion>"
                }
                """

image_descriptions = []
for image in images:
    # Build the multimodal request for this image and query the Vision model
    message = augment_api_request_body(user_query, image)
    response = model.chat(messages=message)
    result = response['choices'][0]['message']['content']
    print(result)
    image_descriptions.append(result)

Output:

{
    "description": "A pair of polished brown leather dress shoes with a brogue detailing on the toe box and a classic oxford design.",
    "category": "shoes",
    "occasion": "formal"
}
{
    "description": "A pair of checkered trousers with a houndstooth pattern, featuring a zippered pocket and a button closure at the waist.",
    "category": "pants",
    "occasion": "casual"
}
{
    "description": "A light blue, button-up shirt with a smooth texture and a classic collar, suitable for casual to semi-formal occasions.",
    "category": "shirt",
    "occasion": "casual"
}
{
    "description": "A pair of khaki pants with a buttoned waistband and a button closure at the front.",
    "category": "pants",
    "occasion": "casual"
}
{
    "description": "A blue plaid shirt with a collar and long sleeves, featuring chest pockets and a button-up front.",
    "category": "shirt",
    "occasion": "casual"
}
{
    "description": "A pair of bright orange, short-sleeved t-shirts with a crew neck and a simple design.",
    "category": "shirt",
    "occasion": "casual"
}
{
    "description": "A pair of blue suede sneakers with white laces and perforations, suitable for casual wear.",
    "category": "shoes",
    "occasion": "casual"
}

{
    "description": "A pair of red canvas sneakers with white laces, isolated on a white background.",
    "category": "shoes",
    "occasion": "casual"
}
{
    "description": "A pair of grey dress pants with a smooth texture and a classic design, suitable for formal occasions.",
    "category": "pants",
    "occasion": "formal"
}
{
    "description": "A plain white T-shirt with short sleeves and a crew neck, displayed from the front and back.",
    "category": "shirt",
    "occasion": "casual"
}
{
    "description": "A black short-sleeved t-shirt with a crew neck and a simple design.",
    "category": "shirt",
    "occasion": "casual"
}
{
    "description": "Black pants with a zippered pocket and a buttoned fly, showing the waistband and pocket details.",
    "category": "pants",
    "occasion": "casual"
}
{
    "description": "A pair of tan leather boots with a chunky sole and a high-top design, suitable for casual wear.",
    "category": "shoes",
    "occasion": "casual"
}
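Even with a tightly scoped prompt, json.loads in Step 8 fails if a reply is not valid JSON, and nothing prevents the model from returning a label outside the allowed set. A minimal sketch of a validation step (parse_item is a hypothetical helper; handling a reply wrapped in a Markdown code fence is an assumption, not behavior seen in the output above) checks each reply before it goes into the closet:

```python
import json
import re

# Allowed labels mirror the ones requested in user_query.
ALLOWED_CATEGORIES = {"shirt", "pants", "shoes"}
ALLOWED_OCCASIONS = {"casual", "formal"}
REQUIRED_FIELDS = {"description", "category", "occasion"}

# Optional Markdown fence: three backticks, an optional "json" tag, the body.
FENCE_PATTERN = re.compile(r"^`{3}(?:json)?\s*(.*?)\s*`{3}$", re.DOTALL)


def parse_item(raw: str) -> dict:
    """Parse one Vision-model reply into a dict and check its labels."""
    text = raw.strip()
    fenced = FENCE_PATTERN.match(text)
    if fenced:
        text = fenced.group(1)
    item = json.loads(text)  # raises json.JSONDecodeError, a ValueError subclass
    if not isinstance(item, dict):
        raise ValueError("expected a JSON object")
    missing = REQUIRED_FIELDS - set(item)
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    # Normalize case and whitespace before checking against the allowed labels.
    item["category"] = str(item["category"]).strip().lower()
    item["occasion"] = str(item["occasion"]).strip().lower()
    if item["category"] not in ALLOWED_CATEGORIES:
        raise ValueError(f"unexpected category: {item['category']!r}")
    if item["occasion"] not in ALLOWED_OCCASIONS:
        raise ValueError(f"unexpected occasion: {item['occasion']!r}")
    return item
```

In the Step 7 loop you could call parse_item(result) and skip or retry any image whose reply raises ValueError.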

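Step 8. Generate outfits with the reasoning model

Now that each clothing and shoe item is categorized, it is much easier for the reasoning model to generate an outfit for the selected occasion. Let's instantiate and query the reasoning model.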

reasoning_model = ModelInference(
    model_id="ibm/granite-3-2-8b-instruct",
    credentials=credentials,
    project_id=WATSONX_PROJECT_ID
)

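To match the filenames to the image descriptions, you can enumerate the list of image descriptions and create a list of dictionaries that stores each item's description, category, occasion, and filename in their respective fields.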

# Add filenames to the image descriptions
closet = []
for i, desc in enumerate(image_descriptions):
    desc_dict = json.loads(desc)
    desc_dict['filename'] = filenames[i]
    image_descriptions[i] = json.dumps(desc_dict)

closet = [json.loads(js) for js in image_descriptions]
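The loop above serializes each dictionary back into a string only to parse it again. A minimal sketch (build_closet and group_by_category are hypothetical helpers) pairs descriptions and filenames directly with zip and then groups the closet by category, which makes it easy to confirm that at least one shirt, one pair of pants, and one pair of shoes match the chosen occasion before you query the reasoning model:

```python
import json
from collections import defaultdict
from typing import Optional


def build_closet(image_descriptions: list[str], filenames: list[str]) -> list[dict]:
    """Pair each JSON description with its filename (same order as Step 6)."""
    if len(image_descriptions) != len(filenames):
        raise ValueError("each description needs exactly one filename")
    closet = []
    for desc, filename in zip(image_descriptions, filenames):
        item = json.loads(desc)
        item["filename"] = filename
        closet.append(item)
    return closet


def group_by_category(closet: list[dict], occasion: Optional[str] = None) -> dict[str, list[dict]]:
    """Group items by category, optionally keeping only one occasion."""
    groups = defaultdict(list)
    for item in closet:
        if occasion is None or item["occasion"] == occasion:
            groups[item["category"]].append(item)
    return dict(groups)
```

For example, if group_by_category(closet, "formal") has no "shoes" key, you know before prompting that no formal outfit can be complete.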

Now, let's query the Granite 3.2 model with reasoning enabled to put together an outfit that meets the specified criteria, using the closet list.

occasion = input("Enter the occasion") #casual or formal (e.g. "casual")
time_of_day = input("Enter the time of day") #morning, afternoon or evening (e.g. "morning")
location = input("Enter the location") #any location (e.g. "park")
season = input("Enter the season") #spring, summer, fall or winter (e.g. "fall")

prompt = f"""Use the description, category, and occasion of the clothes in my closet to put together an outfit for a {occasion} {time_of_day} at the {location}. The event takes place in the {season} season. Make sure to return only one shirt, bottoms, and shoes. Use the description, category, and occasion provided. Do not classify the items yourself. Include the file name of each image in your output along with the file extension. Here are the items in my closet: {closet}"""

messages = [
    {"role": "control",
     "content": "thinking"},
    {"role": "user",
     "content": [
         {"type": "text",
          "text": prompt}
     ]}
]
outfit = reasoning_model.chat(messages=messages)['choices'][0]['message']['content']
print(outfit)

Output:

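Here is my thought process:
- The outfit needs to be suitable for a casual morning at the park in the fall.
- I will select one shirt, one pair of pants, and one pair of shoes that fit the "casual" occasion category.
- I will avoid formal or overly dressy items and choose items that are comfortable for park activities.

Here is my response:

For a casual morning at the park in the fall, I suggest the following outfit:

1. **Shirt**: A blue plaid shirt with a collar and long sleeves (file: 'image13.jpeg')
- The plaid pattern is a fall classic and a perfect fit for a casual park setting. The long sleeves offer some protection against chilly morning temperatures.

2. **Pants**: Khaki pants with a buttoned waistband and a button closure at the front (file: 'image7.jpeg')
- Khaki is a versatile color that matches the casual vibe and balances well with the plaid shirt. It is practical and comfortable for walking.

3. **Shoes**: Tan leather boots with a chunky sole and a high-top design (file: 'image3.jpeg')
- Tan leather boots are a stylish yet comfortable choice. The chunky sole provides excellent grip and support, ideal for walking on park trails or uneven ground.

This combination creates a relaxed, put-together look suitable for a casual morning outing, while taking comfort and practicality into account.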
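The messages list in this step turns on Granite 3.2's reasoning through the control message, and the reply contains both the thought process and the answer. A minimal sketch (build_outfit_messages and split_reasoning are hypothetical helpers) makes the control message optional and separates the two parts of the reply. It assumes that leaving the control message out gives a standard, non-reasoning reply, and that the answer is introduced by a phrase such as "Here is my response:"; that phrase is an assumption, so adjust marker to whatever your run actually prints:

```python
def build_outfit_messages(prompt: str, thinking: bool = True) -> list[dict]:
    """Assemble chat messages for the reasoning model.

    The control message mirrors the one used in this step; it is only
    added when thinking is True.
    """
    messages = []
    if thinking:
        messages.append({"role": "control", "content": "thinking"})
    messages.append({"role": "user", "content": [{"type": "text", "text": prompt}]})
    return messages


def split_reasoning(output: str, marker: str = "Here is my response:") -> tuple[str, str]:
    """Split model output into (thought process, answer) at a marker phrase.

    If the marker is missing, the whole text is returned as the answer.
    """
    head, sep, tail = output.partition(marker)
    if not sep:
        return "", output.strip()
    return head.strip(), tail.strip()
```

With these, the query becomes reasoning_model.chat(messages=build_outfit_messages(prompt)), and split_reasoning(outfit) lets you show only the recommendation to the user while keeping the reasoning for debugging.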

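Using the generated outfit description, you can also display the clothing items the model recommends. To do this, you only need to extract the filenames. Because the model might mention the same filename twice, it is important to check whether an image has already been selected as you iterate over the image list. You can do this by storing selected images in the selected_items list. Finally, you can display the selected items.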

selected_items = []
#extract the images of clothing that the model recommends
for item, uploaded_file in zip(closet, images):
    if item['filename'].lower() in outfit.lower() and not any(key['filename'] == item['filename'] for key in selected_items):
        selected_items.append({
            'image': uploaded_file,
            'category': item['category'],
            'filename': item['filename']
        })

#display the selected clothing items
if len(selected_items) > 0:
    for item in selected_items:
        display(Image.open(directory + '/' + item['filename']))
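The substring test above (item['filename'].lower() in outfit.lower()) can match one filename inside another, for example shirt.png inside tshirt.png. A minimal sketch (FILENAME_PATTERN and the recommended_filenames helper are hypothetical, and the pattern assumes ASCII filenames ending in .jpeg, .jpg, or .png) extracts candidate names with a regular expression and keeps only exact matches against the closet, each at most once:

```python
import re

# ASCII filenames such as "image13.jpeg" or "red_sneakers.png" (case-insensitive).
FILENAME_PATTERN = re.compile(r"[A-Za-z0-9_-]+\.(?:jpe?g|png)\b", re.IGNORECASE)


def recommended_filenames(outfit_text: str, known_filenames: list[str]) -> list[str]:
    """Return closet filenames mentioned in outfit_text, in order of first mention."""
    # Map lowercase names back to the original spelling used on disk.
    by_lower = {name.lower(): name for name in known_filenames}
    found = []
    for match in FILENAME_PATTERN.finditer(outfit_text):
        name = by_lower.get(match.group(0).lower())
        if name is not None and name not in found:
            found.append(name)
    return found
```

With the names in hand, you could display each one with display(Image.open(os.path.join(directory, name))).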

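Summary

In this tutorial, you built a system that uses AI to provide styling advice for a user's specific event. Using photos or screenshots of the user's clothing, outfits are customized to meet the specified criteria. The Granite-Vision-3-2-2b model was critical for labeling and classifying each item. The Granite-3-2-8B-instruct model then used its reasoning capabilities to generate personalized outfit ideas.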

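Possible next steps for building on this application include:

Customizing outfits to the user's personal style, body type, preferred color palette, and more.
Broadening the criteria to include jackets and accessories. For example, the system might suggest a blazer, in addition to the selected shirt, pants, and shoes, for a user attending a formal conference.
Serving as a personal shopper by providing e-commerce product recommendations and pricing that fit the user's unique style and budget.
Adding chatbot functionality so that users can ask the LLM questions about each outfit.
Providing a virtual try-on experience that uses the user's selfie to simulate the final look.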