Image Understanding
When sending images, it is advisable not to store request/response history on the server; otherwise, the request may fail. See Disable storing previous request/response on server.
Some models accept images as input and take the image content into account when generating a response.
Constructing the message body: difference from a text-only prompt
A request for image understanding is similar to a text-only prompt. The main difference is in the content field. A text-only prompt sends it as a plain string:
```json
[
  {
    "role": "user",
    "content": "What is in this image?"
  }
]
```
For image understanding, content is instead sent as a list of objects, one per image or text part:
```json
[
  {
    "role": "user",
    "content": [
      {
        "type": "input_image",
        "image_url": "data:image/jpeg;base64,<base64_image_string>",
        "detail": "high"
      },
      {
        "type": "input_text",
        "text": "What is in this image?"
      }
    ]
  }
]
```
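The list-of-objects body above can be assembled in Python as sketched below. The helper name build_image_message is hypothetical and not part of the xAI SDK; it only builds the JSON-compatible structure and does not send a request.

```python
from typing import Any


def build_image_message(
    image_url: str, question: str, detail: str = "high"
) -> list[dict[str, Any]]:
    """Hypothetical helper: return a single-turn message list pairing one image with a text prompt."""
    return [
        {
            "role": "user",
            # For image understanding, "content" is a list of typed parts
            # rather than a plain string.
            "content": [
                {"type": "input_image", "image_url": image_url, "detail": detail},
                {"type": "input_text", "text": question},
            ],
        }
    ]


messages = build_image_message(
    "data:image/jpeg;base64,<base64_image_string>",
    "What is in this image?",
)
```

The result can be serialized with json.dumps to obtain exactly the body shown above.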
Instead of a base64-encoded data URL, the image_url value can also be the URL of an image hosted on the Internet.
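Both forms of image_url can be produced by a small helper, sketched here under stated assumptions: the name to_image_url is hypothetical, and the MIME type is guessed from the file extension with Python's standard mimetypes module rather than from the file contents.

```python
import base64
import mimetypes
from pathlib import Path


def to_image_url(source: str) -> str:
    """Hypothetical helper: web URLs pass through unchanged; local files become base64 data URLs."""
    if source.startswith(("http://", "https://")):
        # A publicly reachable URL can be sent as-is.
        return source
    # Assumption: the extension reflects the real image type.
    mime_type, _ = mimetypes.guess_type(source)
    if mime_type is None or not mime_type.startswith("image/"):
        raise ValueError(f"cannot determine an image MIME type for {source!r}")
    # Embed the raw file bytes as a base64 data URL.
    encoded = base64.b64encode(Path(source).read_bytes()).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"
```

For example, to_image_url("photo.jpg") returns a string of the form data:image/jpeg;base64,..., matching the placeholder in the JSON body.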
Image understanding example
```python
import os

from xai_sdk import Client
from xai_sdk.chat import image, user

client = Client(
    api_key=os.getenv("XAI_API_KEY"),
    management_api_key=os.getenv("XAI_MANAGEMENT_API_KEY"),
    timeout=3600,  # long timeout (seconds); reasoning models can take a while
)

image_url = "https://science.nasa.gov/wp-content/uploads/2023/09/web-first-images-release.png"

chat = client.chat.create(model="grok-4-1-fast-reasoning")
# One user turn can combine a text prompt and an image part.
chat.append(
    user(
        "What's in this image?",
        image(image_url=image_url, detail="high"),
    )
)

response = chat.sample()
print(response.content)  # the model's text answer

# The response ID that can be used to continue the conversation later
print(response.id)
```
Image input general limits
- Maximum image size: 20 MiB
- Maximum number of images: no limit
- Supported image file types: jpg/jpeg or png
- Any image/text input order is accepted (e.g., the text prompt can precede the image prompt).
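A client-side pre-check against these limits might look like the following. It is a sketch rather than an official validator: check_image_limits is a hypothetical name, it assumes the 20 MiB limit applies to the raw file size (a base64-encoded payload is roughly a third larger), and it judges the file type by extension only.

```python
from pathlib import Path

# Limits as listed above (assumed to apply to the raw file).
MAX_IMAGE_BYTES = 20 * 1024 * 1024  # 20 MiB
SUPPORTED_SUFFIXES = {".jpg", ".jpeg", ".png"}


def check_image_limits(path: str) -> None:
    """Hypothetical helper: raise ValueError if the file breaks the size or type limits."""
    file = Path(path)
    # Type check by extension only; file contents are not inspected.
    if file.suffix.lower() not in SUPPORTED_SUFFIXES:
        raise ValueError(f"{file.name}: only jpg/jpeg or png images are supported")
    size = file.stat().st_size
    if size > MAX_IMAGE_BYTES:
        raise ValueError(f"{file.name}: {size} bytes exceeds the 20 MiB limit")
```

No check on the number of images is needed, since the list above states there is no limit.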