Now it's time to run inference. Inference is backed by Hugging Face generation, which provides a model.generate() method for generating text with PyTorch.

This tutorial asks the base model a medical question taken from the MedReason dataset. The base model might not answer this question correctly, because it is a general-purpose model trained on large, diverse datasets.

First, set up the inference configurations:

import yaml

# set up inference configurations
args = dict(
    model_name_or_path="ibm-granite/granite-3.3-2b-instruct",  # use IBM Granite 3.3 2b instruct model
    template="granite3",  # set to the same one used in training, template for constructing prompts
    infer_backend="huggingface",  # choices: [huggingface, vllm]
)

# create inference config file to run with llama factory
with open("inference_config.yaml", "w", encoding="utf-8") as file:
    yaml.dump(args, file, indent=2)

Now you’ll ask the chatbot one of the questions from the MedReason dataset:

from llamafactory.chat import ChatModel

chat_model = ChatModel(args)
messages = []

# run inference chatbot
question = '''
A 1-year-old girl is brought to a neurologist due to increasing seizure frequency over the past 2 months. She recently underwent a neurology evaluation which revealed hypsarrhythmia on electroencephalography (EEG) with a mix of slow waves, multifocal spikes, and asynchrony. Her parents have noticed the patient occasionally stiffens and spreads her arms at home. She was born at 38-weeks gestational age without complications. She has no other medical problems. Her medications consist of lamotrigine and valproic acid. Her temperature is 98.3\u00b0F (36.8\u00b0C), blood pressure is 90/75 mmHg, pulse is 94/min, and respirations are 22/min. Physical exam reveals innumerable hypopigmented macules on the skin and an irregularly shaped, thickened, and elevated plaque on the lower back. Which of the following is most strongly associated with this patient's condition?" "A": "Cardiac rhabdomyoma", "B": "Glaucoma", "C": "Optic glioma", "D": "Polyostotic fibrous dysplasia"
'''
messages.append({"role": "user", "content": question})

# collect the streamed chunks into one response string
response = ""
for new_text in chat_model.stream_chat(messages):
    response += new_text

print(response)
messages.append({"role": "assistant", "content": response})
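The same append, stream, and append pattern is used again later for the fine-tuned model, so it can be convenient to wrap it in a small helper. The following is only a sketch: `ask`, `StreamingChat`, and `CannedModel` are hypothetical names that are not part of LLaMA Factory, and `CannedModel` is a stand-in that returns fixed text so the example runs without loading a model. The only assumption about the real `ChatModel` is what the cell above already uses, namely that `stream_chat(messages)` yields text chunks.

```python
from typing import Dict, Iterator, List, Protocol


class StreamingChat(Protocol):
    """Anything that exposes stream_chat(messages) and yields text chunks."""

    def stream_chat(self, messages: List[Dict[str, str]]) -> Iterator[str]: ...


def ask(chat_model: StreamingChat, history: List[Dict[str, str]], question: str) -> str:
    """Send one user turn, join the streamed chunks, and record both turns."""
    history.append({"role": "user", "content": question})
    response = "".join(chat_model.stream_chat(history))
    history.append({"role": "assistant", "content": response})
    return response


class CannedModel:
    """Hypothetical stand-in for ChatModel; it streams a fixed reply."""

    def stream_chat(self, messages: List[Dict[str, str]]) -> Iterator[str]:
        yield from ["Cardiac ", "rhabdomyoma"]


history: List[Dict[str, str]] = []
reply = ask(CannedModel(), history, "Which option is correct?")
print(reply)
```

With a real model, the call would be `ask(chat_model, messages, question)`, and `messages` would then hold both the user turn and the assistant turn.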

Here's the sample output from the base Granite 3.3 model:

User:

A 1-year-old girl is brought to a neurologist due to increasing seizure frequency over the past 2 months. She recently underwent a neurology evaluation which revealed hypsarrhythmia on electroencephalography (EEG) with a mix of slow waves, multifocal spikes, and asynchrony. Her parents have noticed the patient occasionally stiffens and spreads her arms at home. She was born at 38-weeks gestational age without complications. She has no other medical problems. Her medications consist of lamotrigine and valproic acid. Her temperature is 98.3°F (36.8°C), blood pressure is 90/75 mmHg, pulse is 94/min, and respirations are 22/min. Physical exam reveals innumerable hypopigmented macules on the skin and an irregularly shaped, thickened, and elevated plaque on the lower back. Which of the following is most strongly associated with this patient's condition?" "A": "Cardiac rhabdomyoma", "B": "Glaucoma", "C": "Optic glioma", "D": "Polyostotic fibrous dysplasia"

Response:

The most strongly associated condition with this patient's condition is "C": "Optic glioma". The patient's symptoms of hypsarrhythmia on EEG, seizure frequency increase, and the presence of hypopigmented macules and a thickened plaque on the lower back are indicative of a neurological disorder. Optic glioma is a type of brain tumor that can present with these symptoms, including seizures and visual disturbances. Option A, "Cardiac rhabdomyoma", typically presents with cardiac involvement and is not associated with the described EEG findings or skin manifestations. Option B, "Glaucoma", is an eye disease that can lead to vision loss but is not associated with the EEG findings or skin lesions described. Option D, "Polyostotic fibrous dysplasia", is a bone disorder characterized by multiple bone lesions and is not associated with the neurological symptoms and EEG findings presented. Therefore, based on the clinical presentation, the most likely diagnosis is an optic glioma.

The correct response from the dataset is:

answer: Cardiac rhabdomyoma

So the base model doesn't generate the correct answer.
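When you compare many responses, checking them by eye gets tedious. The sketch below shows one rough way to automate the comparison for a free-text reply like the one above: it picks the option whose text appears earliest in the response. This is a heuristic chosen for illustration, not something the tutorial or LLaMA Factory prescribes, and `first_option_mentioned` is a hypothetical helper. The reply string is abridged from the sample output.

```python
from typing import Dict, Optional

# Answer options copied from the question above.
OPTIONS: Dict[str, str] = {
    "A": "Cardiac rhabdomyoma",
    "B": "Glaucoma",
    "C": "Optic glioma",
    "D": "Polyostotic fibrous dysplasia",
}


def first_option_mentioned(response: str, options: Dict[str, str]) -> Optional[str]:
    """Return the letter of the option whose text appears earliest, or None."""
    lowered = response.lower()
    positions = {letter: lowered.find(text.lower()) for letter, text in options.items()}
    found = {letter: pos for letter, pos in positions.items() if pos >= 0}
    return min(found, key=found.get) if found else None


# Abridged from the base model's sample output above.
base_reply = (
    'The most strongly associated condition with this patient\'s condition is '
    '"C": "Optic glioma". Option A, "Cardiac rhabdomyoma", typically presents '
    'with cardiac involvement.'
)

expected = "A"  # the dataset answer is Cardiac rhabdomyoma
predicted = first_option_mentioned(base_reply, OPTIONS)
print(f"predicted={predicted} expected={expected} correct={predicted == expected}")
```

A reply that discusses a rejected option before naming its choice would fool this heuristic, so treat it as a first-pass filter rather than a scoring method.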

Infer with the LoRA fine-tuned adapter

We now compare the base model against the LoRA-tuned adapter. We ask the same question to see how tuning with the medical dataset allowed the model to better understand and answer medical questions.

The following cell isn't necessary if you've performed LoRA fine-tuning in the same session. However, if you're coming back to the Jupyter Notebook and don't want to retrain, you can download the fine-tuned adapters from your COS instance.

download_file_cos(credentials, "granite3_lora.zip", "granite3_lora.zip")
!unzip granite3_lora.zip
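To decide whether the download cell needs to run at all, you can first check for the adapter on disk. This is a minimal sketch with a hypothetical helper, `adapter_is_present`. It assumes the archive unpacks into a directory named `granite3_lora`, which matches the `adapter_name_or_path` value used in the next cell. Adjust the path if your archive is laid out differently.

```python
from pathlib import Path


def adapter_is_present(adapter_dir: str = "granite3_lora") -> bool:
    """Return True if the adapter directory exists and is not empty."""
    path = Path(adapter_dir)
    return path.is_dir() and any(path.iterdir())


if adapter_is_present():
    print("Adapter found locally; the download cell can be skipped.")
else:
    print("Adapter not found; run the download and unzip cell.")
```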

Now you'll configure the options for the ChatModel so that it will incorporate the adapters.

# set up inference configurations
args = dict(
    model_name_or_path="ibm-granite/granite-3.3-2b-instruct",  # use IBM Granite 3.3 2b instruct model
    adapter_name_or_path="granite3_lora",  # load the saved LoRA adapters
    template="granite3",  # set to the same one used in training, template for constructing prompts
    finetuning_type="lora",  # which fine-tuning technique used in training
    infer_backend="huggingface",  # choices: [huggingface, vllm]
)

# create inference config file to run with llama factory
with open("inference_config.yaml", "w", encoding="utf-8") as file:
    yaml.dump(args, file, indent=2)

from llamafactory.chat import ChatModel

chat_model = ChatModel(args)

Now we can pose the same reasoning challenge to the fine-tuned model:

messages = []

# run inference chatbot
question = '''
A 1-year-old girl is brought to a neurologist due to increasing seizure frequency over the past 2 months. She recently underwent a neurology evaluation which revealed hypsarrhythmia on electroencephalography (EEG) with a mix of slow waves, multifocal spikes, and asynchrony. Her parents have noticed the patient occasionally stiffens and spreads her arms at home. She was born at 38-weeks gestational age without complications. She has no other medical problems. Her medications consist of lamotrigine and valproic acid. Her temperature is 98.3\u00b0F (36.8\u00b0C), blood pressure is 90/75 mmHg, pulse is 94/min, and respirations are 22/min. Physical exam reveals innumerable hypopigmented macules on the skin and an irregularly shaped, thickened, and elevated plaque on the lower back. Which of the following is most strongly associated with this patient's condition?" "A": "Cardiac rhabdomyoma", "B": "Glaucoma", "C": "Optic glioma", "D": "Polyostotic fibrous dysplasia"
'''
messages.append({"role": "user", "content": question})

# collect the streamed chunks into one response string
response = ""
for new_text in chat_model.stream_chat(messages):
    response += new_text

print(response)
messages.append({"role": "assistant", "content": response})

Here's the sample output from the fine-tuned model:

User:

A 1-year-old girl is brought to a neurologist due to increasing seizure frequency over the past 2 months. She recently underwent a neurology evaluation which revealed hypsarrhythmia on electroencephalography (EEG) with a mix of slow waves, multifocal spikes, and asynchrony. Her parents have noticed the patient occasionally stiffens and spreads her arms at home. She was born at 38-weeks gestational age without complications. She has no other medical problems. Her medications consist of lamotrigine and valproic acid. Her temperature is 98.3°F (36.8°C), blood pressure is 90/75 mmHg, pulse is 94/min, and respirations are 22/min. Physical exam reveals innumerable hypopigmented macules on the skin and an irregularly shaped, thickened, and elevated plaque on the lower back. Which of the following is most strongly associated with this patient's condition?" "A": "Cardiac rhabdomyoma", "B": "Glaucoma", "C": "Optic glioma", "D": "Polyostotic fibrous dysplasia"

Response:

Cardiac rhabdomyoma

This time the model did generate the correct answer, thanks to the trained adapter.

One thing to note is that the model no longer responds with its reasoning. This is because the dataset used for LoRA fine-tuning has only the correct answer as the expected model output. LoRA fine-tuning can be used both to provide new information and to instruct the model how to respond.
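To make that point concrete, the sketch below contrasts two possible shapes for a training record. Both are hypothetical illustrations, not actual MedReason entries. They use the Alpaca-style `instruction`/`input`/`output` fields that LLaMA Factory supports, and whether this tutorial's dataset uses exactly those fields is an assumption. The reasoning text is left as a placeholder rather than invented.

```python
import json

question = "Which of the following is most strongly associated with this patient's condition?"

# Answer-only target: a model tuned on records like this learns to reply
# with just the option text, as seen in the fine-tuned output above.
answer_only = {
    "instruction": question,
    "input": "",
    "output": "Cardiac rhabdomyoma",
}

# Hypothetical alternative: the target also carries an explanation, so the
# tuned model would be trained to produce reasoning before the answer.
with_reasoning = {
    "instruction": question,
    "input": "",
    "output": "<reasoning text goes here>\nAnswer: Cardiac rhabdomyoma",
}

for name, record in [("answer_only", answer_only), ("with_reasoning", with_reasoning)]:
    print(name, "->", json.dumps(record["output"]))
```

The model weights and LoRA settings are the same in both cases. Only the `output` field changes, and that field sets the response style the adapter learns.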