Multimodal Input

Some models, such as image-recognition models, support multimodal input, allowing you to combine text with media files in a single request. Images can be passed as public URLs or as base64-encoded data URLs, and a single user message may include multiple images. The following example provides two images alongside a text prompt:

POST /v1/chat/completions

curl https://api-platform.ope.ai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $YOUR_API_KEY" \
  -d '{
    "model": "$MODEL_ID",
    "messages": [
      {
        "role": "system",
        "content": "You are a helpful assistant."
      },
      {
        "role": "user",
        "content": [
          {
            "type": "image_url",
            "image_url": {"url": "$URL"}
          },
          {
            "type": "image_url",
            "image_url": {"url": "$URL"}
          },
          {
            "type": "text",
            "text": "Explain the meaning of these images."
          }
        ]
      }
    ]
  }'
# First, install the OpenAI library:
# pip install openai
from openai import OpenAI

client = OpenAI(
    api_key="$YOUR_API_KEY",
    base_url="https://api-platform.ope.ai/v1/"
)

completion = client.chat.completions.create(
    model="$MODEL_ID",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": [
            {
                "type": "image_url",
                "image_url": {"url": "$URL"}   # Image URL or Data URL (base64)
            },
            {
                "type": "image_url",
                "image_url": {"url": "$URL"}   # Supports multiple inputs
            },
            {
                "type": "text",
                "text": "Explain the meaning of these images."
            }
        ]}
    ]
)
print(completion.choices[0].message)
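As the comments note, the `$URL` placeholder accepts either a public image URL or a base64 data URL. As a rough sketch of the latter (the file path and `image/png` MIME type here are assumptions; substitute your own), a local image can be inlined like this:

```python
import base64

# Stand-in bytes for a real image read, e.g. open("image.png", "rb").read().
# (These are just the PNG magic bytes, used so the sketch is self-contained.)
image_bytes = b"\x89PNG\r\n\x1a\n"

# Base64-encode the raw bytes and prepend the data URL header.
b64 = base64.b64encode(image_bytes).decode("utf-8")
data_url = f"data:image/png;base64,{b64}"

# Pass data_url wherever the examples above use $URL:
# {"type": "image_url", "image_url": {"url": data_url}}
print(data_url[:22])  # → data:image/png;base64,
```

This keeps the request self-contained, at the cost of a larger payload than linking to a hosted image.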