Author

Data Scientist

Fine-tuning Granite with LoRA

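Low-rank adaptation (LoRA) is an efficient fine-tuning method that reduces the number of trainable parameters, which speeds up training and lowers resource usage while largely preserving output quality. Instead of updating all of a neural network's parameters during fine-tuning, LoRA freezes the original pretrained weights and adds small, trainable low-rank matrices that approximate the changes needed for the new task. The approach rests on the hypothesis that the weight updates made during adaptation have a low "intrinsic rank".

Another advantage of LoRA is that, because the pretrained weights stay frozen, the resulting adapter is lightweight, portable, and easy to store.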
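
To make the savings concrete, the following is a minimal sketch that counts trainable parameters for a single weight matrix under full fine-tuning and under LoRA. The matrix sizes and the rank are illustrative assumptions and are not taken from the Granite architecture.

```python
def lora_param_counts(d_out: int, d_in: int, rank: int) -> tuple[int, int]:
    """Return (full, lora) trainable-parameter counts for one weight matrix.

    Full fine-tuning updates every entry of the d_out x d_in matrix W.
    LoRA keeps W frozen and trains B (d_out x rank) and A (rank x d_in),
    so the effective weight becomes W + B @ A.
    """
    full = d_out * d_in
    lora = rank * (d_out + d_in)
    return full, lora


# Illustrative sizes only (assumption), not the real Granite layer shapes.
full, lora = lora_param_counts(d_out=2048, d_in=2048, rank=8)
print(f"full fine-tuning: {full:,} parameters")
print(f"LoRA (rank 8):    {lora:,} parameters ({lora / full:.2%} of full)")
```

Because only the two small matrices are saved, the adapter file stays small compared with the base model checkpoint.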

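In this tutorial, you will use LLaMA Factory. LLaMA Factory is a low-code and no-code platform for training and fine-tuning large language models (LLMs). It lets users fine-tune LLMs on custom datasets, evaluate performance, and serve models. It offers both an easy-to-use web UI and a CLI, and supports more than 100 LLMs. The platform supports datasets in Alpaca and ShareGPT formats. LLaMA Factory is not the only way to fine-tune LLMs; the PEFT library for parameter-efficient fine-tuning is another option for updating large models. PEFT can perform quantized LoRA (QLoRA) to further compress the fine-tuned model. In this tutorial, you will use a non-quantized version of Granite 3.3.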

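Although LLaMA Factory can run without extensive compute resources, it does require a GPU and a substantial amount of memory. In this tutorial, you will run LLaMA Factory on watsonx®, which provides the GPU resources and the storage for the resulting adapter.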

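Configuration

Watson Studio configuration

a. Log in to watsonx.ai® with your IBM Cloud® account.

b. Create a watsonx.ai project. Take note of your project ID, found in the project under Manage > General > Project ID.
You need this ID for this tutorial.

c. Create a watsonx.ai Runtime service instance. For this tutorial, you need a paid instance to access a GPU.

d. Generate a watsonx application programming interface (API) key.

e. Associate the watsonx.ai Runtime service with the project that you created in watsonx.ai.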

Cloud Object Storage

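a. To create Cloud Object Storage for your notebook, go to https://cloud.ibm.com/ and select "Create instance".

b. This opens a creation dialog where you can select a pricing plan. For this tutorial, the Standard plan is sufficient.

c. Then give your Cloud Object Storage instance a name.

d. After the instance is created, go back to the project, select "New asset", and then select "Connect to a data source".

Figure: Configuring the data connection for Cloud Object Storage

e. Select "Cloud Object Storage".

f. In the next dialog, select by name the instance that you created in steps a-d.

g. Select "Create".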

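Create a Jupyter Notebook

Create a Jupyter Notebook in your project:

a. Select the "Assets" tab in your project environment.

b. Click "New asset".

c. Select the "Work with models" option in the left panel.

d. Click "Work with data and models using Python and R notebooks".

e. Enter a name for your notebook in the "Name" field. Choose Runtime 23.1 on Python (4 vCPU, 16 GB RAM) to define the configuration.

f. Select "Create".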

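Setup

Next, install the dependencies into the runtime. First install LLaMA Factory, which generates the low-rank adapter, and then Pandas, which is used to format the dataset into Alpaca format.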

!pip install -q llamafactory 2>/dev/null
# pandas needed to format the dataset
!pip install -q --upgrade pandas 2>/dev/null

Check the GPU environment

Next, make sure that your watsonx environment provides a PyTorch-compatible GPU, which is required to use LLaMA Factory.

import torch

# LLaMA Factory needs a CUDA-capable GPU that PyTorch can see
if not torch.cuda.is_available():
    print("No GPU found, please set up a GPU before using LLaMA Factory.")

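If the previous snippet does not print "No GPU found", the environment is set up correctly and you can continue.

Next, import the libraries used to process the data and to create the LLaMA Factory configuration file for training.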

# Import libraries
import pandas as pd
import json
import yaml

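Download and process the MedReason dataset

In this tutorial, you will use part of the MedReason (https://github.com/UCSC-VLAA/MedReason) dataset. MedReason is a large-scale, high-quality medical reasoning dataset designed to enable explainable medical problem-solving in LLMs. MedReason focuses on a model's reasoning and on validating the chains of thought that the model uses. In this case it has a second benefit: the data is too recent to be part of the original training data for the IBM® Granite 3.3 model, which makes it a good fit for fine-tuning.

Granite 3.3 models are designed to be fine-tuned, and in this tutorial the fine-tuning on the dataset runs entirely through LLaMA Factory. Granite models can be fine-tuned efficiently even with limited compute resources. You will load a selection of the MedReason dataset from GitHub: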

# Read the medbullets_op4 file (JSON Lines) from the MedReason repository into a DataFrame
training = pd.read_json("https://raw.githubusercontent.com/UCSC-VLAA/MedReason/refs/heads/main/eval_data/medbullets_op4.jsonl", lines=True)

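LLaMA Factory requires the dataset to be pre-formatted in Alpaca or ShareGPT format. You therefore reformat the question and answer fields of the original medical dataset so that it contains instruction, input, and output fields, following the Alpaca format.

Alpaca is a JSON format that represents an instruction, user input, and system output, as shown in the following example: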

{
    "instruction": "user instruction (required)",
    "input": "user input (optional)",
    "output": "model response (required)",
    "system": "system prompt (optional)"
}

Because MedReason is not in Alpaca format, you create an Alpaca-formatted dataset in the next cell:

!mkdir -p data

# Format Med Dataset to Alpaca Format
formatted_data = [
    {
        "instruction": row["question"] + str(row["options"]),
        "input": "",
        "output": row["answer"]
    }
    for _, row in training.iterrows()
]

# output formatted MedReason dataset
with open("data/med.json", "w", encoding="utf-8") as f:
  json.dump(formatted_data, f, indent=2, ensure_ascii=False)
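
Before training, it can help to confirm that every record has the fields the Alpaca format marks as required. The following is a minimal sketch of such a check; the helper name invalid_alpaca_records and the sample records are hypothetical and are not part of LLaMA Factory.

```python
# Fields marked "required" in the Alpaca example shown earlier in this tutorial.
REQUIRED_FIELDS = ("instruction", "output")


def invalid_alpaca_records(records):
    """Return (index, reason) pairs for records with a missing or empty required field."""
    problems = []
    for i, record in enumerate(records):
        for field in REQUIRED_FIELDS:
            value = record.get(field)
            if not isinstance(value, str) or not value.strip():
                problems.append((i, f"missing or empty '{field}'"))
    return problems


# Hypothetical sample: the second record has an empty instruction.
sample = [
    {"instruction": "Which option is correct?", "input": "", "output": "A"},
    {"instruction": "", "input": "", "output": "B"},
]
print(invalid_alpaca_records(sample))
```

In the notebook, the same function could be applied to formatted_data; an empty result means no record is missing a required field.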

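LLaMA Factory needs a specific file to know how to load datasets for training. This file must exist at the path data/dataset_info.json. You therefore create a dataset_info.json file that contains the path to the newly formatted medical dataset so that the LLaMA Factory command-line interface (CLI) can access it. For details on the dataset_info.json file, see the official documentation. The LLaMA Factory repository ships with several ready-to-use datasets, but because this tutorial uses a custom dataset, you must add it to this JSON file.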

# "med" will be the identifier for the dataset 
# which points to the local file that contains the dataset
dataset_info = {
  "med": {
    "file_name": "med.json",
  }
}

# Create dataset_info.json with the medical dataset so that LLaMA Factory can reference it
with open("data/dataset_info.json", "w", encoding="utf-8") as f:
  json.dump(dataset_info, f, indent=2, ensure_ascii=False)
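
A misspelled file name in dataset_info.json typically only surfaces once training starts. The following is a small sketch that checks up front whether each file_name entry exists next to dataset_info.json. The helper name missing_dataset_files is hypothetical, and the demo runs against a temporary directory rather than the tutorial's data folder.

```python
import json
import tempfile
from pathlib import Path


def missing_dataset_files(data_dir="data"):
    """Return dataset names in dataset_info.json whose file_name is not on disk."""
    data_dir = Path(data_dir)
    info = json.loads((data_dir / "dataset_info.json").read_text(encoding="utf-8"))
    return [
        name
        for name, entry in info.items()
        if "file_name" in entry and not (data_dir / entry["file_name"]).is_file()
    ]


# Self-contained demo: "med" points at an existing file, "other" does not.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "med.json").write_text("[]", encoding="utf-8")
    demo_info = {"med": {"file_name": "med.json"}, "other": {"file_name": "absent.json"}}
    Path(tmp, "dataset_info.json").write_text(json.dumps(demo_info), encoding="utf-8")
    demo_missing = missing_dataset_files(tmp)

print(demo_missing)
```

In the notebook, calling missing_dataset_files("data") should return an empty list once med.json and dataset_info.json have both been written.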

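With the Alpaca-formatted JSON object saved to the environment, you are ready to start training.

Fine-tuning

The next step is to set up the training configuration and write it to a YAML file that LLaMA Factory uses to run training.

You will now run supervised fine-tuning (SFT) on a subset of the MedReason dataset. LLaMA Factory supports several types of training. Some of the most commonly used are: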

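Pre-training: The model undergoes initial training on an extensive dataset so that it can generate responses grounded in basic language and ideas.
Supervised fine-tuning (SFT): The model receives additional training on annotated data to improve accuracy for a particular function or topic.
Reward modeling: The model learns how to achieve a specific incentive or reward, which informs its outputs.
Proximal policy optimization (PPO) training: A reinforcement learning (RL) technique in which the model is further refined through policy gradient methods to improve its effectiveness in a specific setting.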

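There are many settings used to configure LoRA, but the most important and commonly used are:

Learning rate (LR): The learning rate determines how much each model parameter is updated in each training iteration. A higher LR allows larger updates and can speed up convergence, but risks overshooting or oscillating around the optimum. A lower LR converges more slowly but more stably, which reduces the risk of instability near the optimum.
loraplus_lr_ratio: This setting is the ratio between the learning rates of the two LoRA matrices, as used by the LoRA+ method. It should generally be > 1, but the best value depends on the model and the task. As a rule of thumb, loraplus_lr_ratio should be larger when the task is harder and the model needs to update its features to learn well. In that case, it helps to make the learning rate slightly smaller (for example, by a factor of 2) than a typical LoRA learning rate.
Effective batch size: Configuring the batch size correctly is critical for balancing training stability against the VRAM limits of your GPU. The effective batch size is the product per_device_train_batch_size * gradient_accumulation_steps. A larger effective batch size generally gives smoother, more stable training, but might require more VRAM than your GPU has. A smaller effective batch size introduces more variance.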
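
As a worked example of these settings, the sketch below computes the effective batch size and the two learning rates implied by a LoRA+ ratio, using the values from this tutorial's training configuration (batch size 4, 2 accumulation steps, learning rate 1e-4, ratio 16.0). It assumes a single GPU and the LoRA+ convention that the B matrices train at the base rate multiplied by the ratio. Both function names are hypothetical.

```python
import math


def effective_batch_size(per_device_train_batch_size, gradient_accumulation_steps, num_devices=1):
    """Number of samples that contribute to one optimizer update."""
    return per_device_train_batch_size * gradient_accumulation_steps * num_devices


def loraplus_learning_rates(learning_rate, loraplus_lr_ratio):
    """Return (lr_A, lr_B), assuming LoRA+ scales the B-matrix rate by the ratio."""
    return learning_rate, learning_rate * loraplus_lr_ratio


batch = effective_batch_size(per_device_train_batch_size=4, gradient_accumulation_steps=2)
lr_a, lr_b = loraplus_learning_rates(learning_rate=1e-4, loraplus_lr_ratio=16.0)

print(f"effective batch size: {batch}")
print(f"LoRA A learning rate: {lr_a:g}, LoRA B learning rate: {lr_b:g}")
```

If the GPU runs out of memory, lowering per_device_train_batch_size while raising gradient_accumulation_steps keeps the same effective batch size at a lower peak memory cost.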

Here is the code that configures training:

# setup training configurations
args = dict(
  stage="sft",  # do supervised fine-tuning
  do_train=True,  # we're actually training
  model_name_or_path="ibm-granite/granite-3.3-2b-instruct",  # use IBM Granite 3.3 2b instruct model
  dataset="med",  # use medical datasets we created
  template="granite3",   # use granite3 prompt template
  finetuning_type="lora", # use LoRA adapters to save memory
  lora_target="all",  # attach LoRA adapters to all linear layers
  loraplus_lr_ratio=16.0,  # use LoRA+ algorithm with lambda=16.0
  output_dir="granite3_lora",  # the path to save LoRA adapters
  per_device_train_batch_size=4,  # the batch size
  gradient_accumulation_steps=2,  # the gradient accumulation steps
  learning_rate=1e-4,  # the learning rate
  num_train_epochs=3.0, # the epochs of training
  max_samples=500,  # use 500 examples in each dataset
  fp16=True,  # use float16 mixed precision training
  report_to="none", # disable wandb logging
)

# create training config file to run with llama factory
with open("train_granite3_lora_med.yaml", "w", encoding="utf-8") as file:
  yaml.dump(args, file, indent=2)

The next cell trains the model, which can take up to 10 minutes to run:

!llamafactory-cli train train_granite3_lora_med.yaml;
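
After training finishes, a quick check that the adapter files were written can save a failed inference run later. The following is a sketch under an assumption: the two file names are the ones the PEFT library typically writes when saving a LoRA adapter, and your output directory might contain different or additional files. The helper name missing_adapter_files is hypothetical.

```python
from pathlib import Path

# Assumption: file names that PEFT typically writes for a saved LoRA adapter.
EXPECTED_ADAPTER_FILES = ("adapter_config.json", "adapter_model.safetensors")


def missing_adapter_files(output_dir="granite3_lora"):
    """Return the expected adapter files that are not present in output_dir."""
    out = Path(output_dir)
    return [name for name in EXPECTED_ADAPTER_FILES if not (out / name).is_file()]


missing = missing_adapter_files("granite3_lora")
print("adapter looks complete" if not missing else f"missing: {missing}")
```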

Using Cloud Object Storage

Next, create two functions: one that uploads data to IBM Cloud Object Storage and one that downloads data from it:

from ibm_botocore.client import Config
import ibm_boto3


def _cos_client(credentials):
    """Build an S3-compatible client for IBM Cloud Object Storage."""
    return ibm_boto3.client(
        service_name='s3',
        ibm_api_key_id=credentials['IBM_API_KEY_ID'],
        ibm_service_instance_id=credentials['IAM_SERVICE_ID'],
        ibm_auth_endpoint=credentials['IBM_AUTH_ENDPOINT'],
        config=Config(signature_version='oauth'),
        endpoint_url=credentials['ENDPOINT'],
    )


def upload_file_cos(credentials, local_file_name, key):
    """Upload a local file to the bucket in credentials under the given key."""
    cos = _cos_client(credentials)
    try:
        cos.upload_file(Filename=local_file_name, Bucket=credentials['BUCKET'], Key=key)
    except Exception as e:
        # Report the error type and message instead of raising
        print(type(e).__name__, e)
    else:
        print('File Uploaded')


def download_file_cos(credentials, local_file_name, key):
    """Download the object stored under key in the bucket to a local file."""
    cos = _cos_client(credentials)
    try:
        cos.download_file(Bucket=credentials['BUCKET'], Key=key, Filename=local_file_name)
    except Exception as e:
        # Report the error type and message instead of raising
        print(type(e).__name__, e)
    else:
        print('File Downloaded')

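The next cell contains the Cloud Object Storage credentials.

In your notebook, click the Code Snippets tab in the upper-right corner. This opens a menu with several options for generated code snippets. Select "Read Data":

Figure: Using a prepared code snippet in Watson Studio

This step opens a menu for selecting a data file. If you have not uploaded anything to your Cloud Object Storage instance yet, you need to upload something to generate credentials. This can be a classic dataset such as wine.csv.

Figure: Selecting a data asset in Watson Studio

After clicking "Select", you can generate the credentials snippet under the "Load as" option. Select "Insert code to cell":

Figure: Inserting a generated code snippet in Watson Studio

This step generates a code cell like the following, with the correct IDs and endpoint credentials filled in: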

# @hidden_cell
# The following code contains metadata for a file in your project storage.
# You might want to remove secret properties before you share your notebook.

storage_metadata = {
    'IAM_SERVICE_ID': '',
    'IBM_API_KEY_ID': '',
    'ENDPOINT': '',
    'IBM_AUTH_ENDPOINT': '',
    'BUCKET': '',
    'FILE': ''
}
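
Because the upload and download functions read several keys from this dictionary, an empty value typically only shows up as an authentication error later. The following sketch checks the dictionary before any call to Cloud Object Storage is made. The helper name missing_cos_credentials is hypothetical, and the list of keys is taken from the fields that the two functions above read.

```python
# Keys that the upload and download functions in this tutorial read.
REQUIRED_COS_KEYS = (
    "IBM_API_KEY_ID",
    "IAM_SERVICE_ID",
    "IBM_AUTH_ENDPOINT",
    "ENDPOINT",
    "BUCKET",
)


def missing_cos_credentials(credentials):
    """Return the required keys that are absent or empty in the credentials dict."""
    missing = []
    for key in REQUIRED_COS_KEYS:
        value = credentials.get(key)
        if not isinstance(value, str) or not value.strip():
            missing.append(key)
    return missing


# The blank template from the generated cell fails every check.
blank_template = {
    "IAM_SERVICE_ID": "",
    "IBM_API_KEY_ID": "",
    "ENDPOINT": "",
    "IBM_AUTH_ENDPOINT": "",
    "BUCKET": "",
    "FILE": "",
}
print(missing_cos_credentials(blank_template))
```

In the notebook, passing storage_metadata to this function should return an empty list when the generated snippet was inserted correctly.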

Now create a zip archive that contains the adapter files and the adapter metadata:

!zip -r "granite3_lora.zip" "granite3_lora"

Verify that the archive was created:

!ls
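
Listing the directory shows that the archive exists but not that it is readable. As a complementary check, the sketch below uses Python's standard zipfile module to count the files in the archive and run its built-in integrity test. The helper name archive_summary is hypothetical.

```python
import zipfile
from pathlib import Path


def archive_summary(zip_path):
    """Return (file_count, first_bad_member) for a zip archive.

    testzip() re-reads every member and returns the name of the first
    corrupt one, or None when all CRC checks pass.
    """
    with zipfile.ZipFile(zip_path) as archive:
        names = [n for n in archive.namelist() if not n.endswith("/")]
        return len(names), archive.testzip()


# Only runs when the archive from the previous step is present.
if Path("granite3_lora.zip").is_file():
    count, bad = archive_summary("granite3_lora.zip")
    print(f"{count} files, first corrupt member: {bad}")
```

Once the archive checks out, it can presumably be stored with the upload function defined earlier, for example upload_file_cos(storage_metadata, "granite3_lora.zip", "granite3_lora.zip"), so that a later session can download it instead of retraining.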

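Inference

Now it is time to run inference. The inference is backed by Hugging Face generation, which provides a model.generate() method for text generation with PyTorch.

This tutorial asks the base model a medical question taken from the MedReason dataset. Because the base model is a general-purpose model trained on large, diverse datasets, it is not surprising if it cannot answer this question.

First, set up the inference configuration: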

# setup inference configurations
args = dict(
  model_name_or_path="ibm-granite/granite-3.3-2b-instruct",  # use IBM Granite 3.3 2b instruct model
  template="granite3",  # set to the same one used in training, template for constructing prompts
  infer_backend="huggingface"  # choices: [huggingface, vllm]
)

# create inference config file to run with llama factory
with open("inference_config.yaml", "w", encoding="utf-8") as file:
  yaml.dump(args, file, indent=2)

Now you can ask the chatbot one of the questions from the MedReason dataset:

from llamafactory.chat import ChatModel
chat_model = ChatModel(args)
messages = []

# run inference chatbot
question = '''
A 1-year-old girl is brought to a neurologist due to increasing seizure frequency over the past 2 months. 
She recently underwent a neurology evaluation which revealed hypsarrhythmia on electroencephalography (EEG) with a mix of slow waves, multifocal spikes, and asynchrony. 
Her parents have noticed the patient occasionally stiffens and spreads her arms at home. She was born at 38-weeks gestational age without complications. 
She has no other medical problems. Her medications consist of lamotrigine and valproic acid. Her temperature is 98.3\u00b0F (36.8\u00b0C), blood pressure is 90/75 mmHg, pulse is 94/min, and respirations are 22/min. 
Physical exam reveals innumerable hypopigmented macules on the skin and an irregularly shaped, thickened, and elevated plaque on the lower back. 
Which of the following is most strongly associated with this patient's condition?"
"A": "Cardiac rhabdomyoma", "B": "Glaucoma", "C": "Optic glioma", "D": "Polyostotic fibrous dysplasia"
'''

Create a new message from the question and pass it to the base model:

messages.append({"role": "user", "content": question})

response = ""
for new_text in chat_model.stream_chat(messages):
    response += new_text

print(response)
messages.append({"role": "assistant", "content": response})

Here is the sample output from the base Granite 3.3 model:

User:

A 1-year-old girl is brought to a neurologist due to increasing seizure frequency over the past 2 months. 
She recently underwent a neurology evaluation which revealed hypsarrhythmia on electroencephalography (EEG) with a mix of slow waves, multifocal spikes, and asynchrony. 
Her parents have noticed the patient occasionally stiffens and spreads her arms at home. She was born at 38-weeks gestational age without complications. 
She has no other medical problems. Her medications consist of lamotrigine and valproic acid. Her temperature is 98.3°F (36.8°C), blood pressure is 90/75 mmHg, pulse is 94/min, and respirations are 22/min. 
Physical exam reveals innumerable hypopigmented macules on the skin and an irregularly shaped, thickened, and elevated plaque on the lower back. 
Which of the following is most strongly associated with this patient's condition?"
"A": "Cardiac rhabdomyoma", "B": "Glaucoma", "C": "Optic glioma", "D": "Polyostotic fibrous dysplasia"

Response:

The most strongly associated condition with this patient's condition is "C": "Optic glioma".

The patient's symptoms of hypsarrhythmia on EEG, seizure frequency increase, and the presence of hypopigmented macules and a thickened plaque on the lower back are indicative of a neurological disorder. Optic glioma is a type of brain tumor that can present with these symptoms, including seizures and visual disturbances.

Option A, "Cardiac rhabdomyoma", typically presents with cardiac involvement and is not associated with the described EEG findings or skin manifestations.

Option B, "Glaucoma", is an eye disease that can lead to vision loss but is not associated with the EEG findings or skin lesions described.

Option D, "Polyostotic fibrous dysplasia", is a bone disorder characterized by multiple bone lesions and is not associated with the neurological symptoms and EEG findings presented.

Therefore, based on the clinical presentation, the most likely diagnosis is an optic glioma.

The correct response from the dataset is:

answer: Cardiac rhabdomyoma

So the base model does not generate the correct answer.
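
Comparing answers by eye works for one question, but a naive substring check would be misleading here: the base model's response also mentions "Cardiac rhabdomyoma" while rejecting it. The following sketch uses a simple heuristic instead and treats the option that appears first in the response as the model's answer. The helper name first_mentioned_option is hypothetical, the heuristic is an assumption that will not hold for every response style, and the response text is a shortened excerpt of the sample output above.

```python
def first_mentioned_option(response, options):
    """Return the option text that appears earliest in the response, or None."""
    lowered = response.lower()
    positions = {text: lowered.find(text.lower()) for text in options.values()}
    found = {text: pos for text, pos in positions.items() if pos != -1}
    return min(found, key=found.get) if found else None


options = {
    "A": "Cardiac rhabdomyoma",
    "B": "Glaucoma",
    "C": "Optic glioma",
    "D": "Polyostotic fibrous dysplasia",
}

# Shortened excerpt of the base model's sample output.
base_response = (
    'The most strongly associated condition with this patient\'s condition is "C": "Optic glioma". '
    'Option A, "Cardiac rhabdomyoma", typically presents with cardiac involvement.'
)

predicted = first_mentioned_option(base_response, options)
print(f"predicted: {predicted}, correct: {predicted == 'Cardiac rhabdomyoma'}")
```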

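Inference with the LoRA fine-tuned adapter

We compare the results by running inference first with the base model and then with the LoRA fine-tuned adapter. We ask the same question to see whether fine-tuning on the medical dataset improved the model's ability to understand and answer medical questions.

If you have performed LoRA fine-tuning in the current session, you do not need to run the next cell. However, if you are reopening the Jupyter Notebook and do not want to retrain, you can download the fine-tuned adapter from your COS instance.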

# storage_metadata is the credentials dictionary generated in the Cloud Object Storage step
download_file_cos(storage_metadata, "granite3_lora.zip", "granite3_lora.zip")
!unzip granite3_lora.zip

Now configure the chat model options so that it incorporates the adapter.

# setup inference configurations
args = dict(
  model_name_or_path="ibm-granite/granite-3.3-2b-instruct",  # use IBM Granite 3.3 2b instruct model
  adapter_name_or_path="granite3_lora", # load the saved LoRA adapters
  template="granite3", # set to the same one used in training, template for constructing prompts
  finetuning_type="lora", # which fine-tuning technique used in training
  infer_backend="huggingface" # choices: [huggingface, vllm]
)

# create inference config file to run with llama factory
with open("inference_config.yaml", "w", encoding="utf-8") as file:
  yaml.dump(args, file, indent=2)


from llamafactory.chat import ChatModel
chat_model = ChatModel(args)

Now we can run the same inference test against the fine-tuned model:

messages = []

# run inference chatbot
question = '''
A 1-year-old girl is brought to a neurologist due to increasing seizure frequency over the past 2 months. 
She recently underwent a neurology evaluation which revealed hypsarrhythmia on electroencephalography (EEG) with a mix of slow waves, multifocal spikes, and asynchrony. 
Her parents have noticed the patient occasionally stiffens and spreads her arms at home. She was born at 38-weeks gestational age without complications. 
She has no other medical problems. Her medications consist of lamotrigine and valproic acid. Her temperature is 98.3\u00b0F (36.8\u00b0C), blood pressure is 90/75 mmHg, pulse is 94/min, and respirations are 22/min. 
Physical exam reveals innumerable hypopigmented macules on the skin and an irregularly shaped, thickened, and elevated plaque on the lower back. 
Which of the following is most strongly associated with this patient's condition?"
"A": "Cardiac rhabdomyoma", "B": "Glaucoma", "C": "Optic glioma", "D": "Polyostotic fibrous dysplasia"
'''

Create a new message from the question and pass it to the fine-tuned model:

messages.append({"role": "user", "content": question})

response = ""
for new_text in chat_model.stream_chat(messages):
    response += new_text

print(response)
messages.append({"role": "assistant", "content": response})

Sample output from the fine-tuned model:

User:

A 1-year-old girl is brought to a neurologist due to increasing seizure frequency over the past 2 months. 
She recently underwent a neurology evaluation which revealed hypsarrhythmia on electroencephalography (EEG) with a mix of slow waves, multifocal spikes, and asynchrony. 
Her parents have noticed the patient occasionally stiffens and spreads her arms at home. She was born at 38-weeks gestational age without complications. 
She has no other medical problems. Her medications consist of lamotrigine and valproic acid. Her temperature is 98.3°F (36.8°C), blood pressure is 90/75 mmHg, pulse is 94/min, and respirations are 22/min. 
Physical exam reveals innumerable hypopigmented macules on the skin and an irregularly shaped, thickened, and elevated plaque on the lower back. 
Which of the following is most strongly associated with this patient's condition?"
"A": "Cardiac rhabdomyoma", "B": "Glaucoma", "C": "Optic glioma", "D": "Polyostotic fibrous dysplasia"

Response:

Cardiac rhabdomyoma

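This time, thanks to the trained adapter, the model generated the correct answer.

One thing to note is that the model no longer outputs its reasoning. This is because the dataset used for LoRA fine-tuning has only the correct answer as the expected model output. Fine-tuning can be used not only to give a model new information but also to teach it how to respond.

Summary

In this tutorial, you used LoRA to fine-tune the IBM Granite-3.3-2b-Instruct model with new medical knowledge and a detailed template for how to respond. You saw how well Granite 3.3 learns, even with a small model and a limited number of samples from the dataset.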
