Date: 2024-09-25

Now we can loop through our images to see the text descriptions the model produces in response to the query "What is happening in this image?"

for i in range(len(encoded_images)):
    image = encoded_images[i]
    user_query = "What is happening in this image?"
    request_body = augment_api_request_body(user_query, image)

    response = requests.post(
        credentials.get("url"),
        headers=headers,
        json=request_body
    )

    if response.status_code != 200:
        raise Exception("Non-200 response: " + str(response.text))

    data = response.json()
    print(data['choices'][0]['message']['content'])

Output:

The image depicts a bustling city street, with a busy road and sidewalks lined with tall buildings, trees, and streetlights. The street is filled with cars, taxis, and pedestrians, creating a vibrant and dynamic atmosphere. The scene is set against a backdrop of towering skyscrapers and bustling city life, capturing the energy and activity of urban living.

This image shows a woman running in the street. The woman is wearing a yellow hoodie, black capri leggings, and black sneakers. She has a white headphone around her neck and her brown hair is in a ponytail. The woman appears to be running in the street, with her right leg extended behind her and her left leg bent in front of her. Her arms are bent at the elbows, with her right arm extended behind her and her left arm extended in front of her. In the background, there is a large white building with a row of windows and doors. The building appears to be an industrial or commercial structure, possibly a warehouse or office building. The street in front of the building is empty, with no other people or vehicles visible. The overall atmosphere of the image suggests that the woman is engaged in some form of physical activity or exercise, possibly jogging or running for fitness or recreation.

The image depicts a flooded area, with water covering the ground and surrounding buildings. The water is dark brown and appears to be deep, with some areas reaching up to the roofs of the buildings. There are several buildings visible in the image, including what appears to be a house, a barn, and some smaller structures. The buildings are all partially submerged in the water, with some of them appearing to be damaged or destroyed. In the background, there are fields and crops that are also flooded. The fields are covered in water, and the crops are bent over or lying flat on the ground. There are also some trees and other vegetation visible in the background, but they appear to be struggling to survive in the flooded conditions. Overall, the image suggests that a severe flood has occurred in this area, causing significant damage to the buildings and crops. The floodwaters appear to be deep and widespread, and it is likely that the area will take some time to recover from the disaster.

This image shows a close-up of a nutrition label on a food product, with a person's finger pointing to the label. The label is white with black text and lists various nutritional information, including serving size, calories, fat content, cholesterol, sodium, carbohydrates, dietary fiber, and vitamins. The label also includes a table with nutritional values based on a 2,000 calorie diet. The background of the image is dark gray, suggesting that it may be a product photo or advertisement for the food item. Overall, the image appears to be intended to inform consumers about the nutritional content of the product and help them make informed purchasing decisions.

The Llama 3.2-90b-vision-instruct model successfully captioned each of the four images in significant detail.
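The loop above indexes straight into `data['choices'][0]['message']['content']`, which raises a bare `KeyError` or `IndexError` if the response body has a different shape. As a minimal sketch, the lookup could be wrapped in a small helper that fails with a clearer message. The name `extract_caption` and the sample payload are hypothetical and not part of the original tutorial. The only assumption about the response is the nesting already used in the loop.

```python
from typing import Any


def extract_caption(data: dict[str, Any]) -> str:
    """Return the text of the first choice in a chat-style response body.

    Hypothetical helper: it assumes only the nesting used in the loop,
    data['choices'][0]['message']['content'].
    """
    choices = data.get("choices") or []
    if not choices:
        raise ValueError("response contains no choices")

    content = choices[0].get("message", {}).get("content")
    if not isinstance(content, str):
        raise ValueError("first choice has no text content")

    # Trim leading/trailing whitespace around the generated caption.
    return content.strip()


# Illustrative payload (made up), shaped like the responses read in the loop.
sample = {"choices": [{"message": {"content": " A city street. "}}]}
caption = extract_caption(sample)
print(caption)
```

Inside the loop, `print(extract_caption(data))` would then replace the direct indexing, so a malformed response surfaces as a descriptive `ValueError`.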