
Image generation with Gemini

Source: https://ai.google.dev/gemini-api/docs/image-generation


Gemini can generate and process images conversationally. You can prompt Gemini with text, images, or a combination of both to achieve various image-related tasks, such as image generation and editing. All generated images include a SynthID watermark.

Image generation may not be available in all regions and countries. Review the Gemini models page for more information.

Note: You can also generate images with Imagen, our specialized image generation model. See the When to use Imagen section for details on how to choose between Gemini and Imagen.

Image generation (text-to-image)

The following code demonstrates how to generate an image based on a descriptive prompt. You must include responseModalities: ["TEXT", "IMAGE"] in your configuration (response_modalities=['TEXT', 'IMAGE'] in the Python SDK). Image-only output is not supported with these models.

from io import BytesIO

from google import genai
from google.genai import types
from PIL import Image

client = genai.Client()

contents = ('Hi, can you create a 3d rendered image of a pig '
            'with wings and a top hat flying over a happy '
            'futuristic scifi city with lots of greenery?')

# Request both text and image output; image-only output is not supported.
response = client.models.generate_content(
    model="gemini-2.0-flash-preview-image-generation",
    contents=contents,
    config=types.GenerateContentConfig(
        response_modalities=['TEXT', 'IMAGE']
    )
)

# The response interleaves text parts and image parts.
for part in response.candidates[0].content.parts:
    if part.text is not None:
        print(part.text)
    elif part.inline_data is not None:
        # inline_data.data holds the raw image bytes.
        image = Image.open(BytesIO(part.inline_data.data))
        image.save('gemini-native-image.png')
        image.show()

[Image: AI-generated image of a fantastical flying pig]
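A response can contain several text and image parts, so it can be convenient to collect them in one pass. The following is a minimal sketch, not part of the original example. The helper name `save_parts` and its file-naming scheme are hypothetical. It assumes each part exposes `text` and `inline_data` as in the loop above, and that `inline_data` carries raw bytes in `data` plus an optional `mime_type`.

```python
import mimetypes
from pathlib import Path


def save_parts(parts, out_dir, stem="gemini-image"):
    """Collect text parts and write each image part to its own file.

    Hypothetical helper: assumes every part has `text` and `inline_data`
    attributes, and that `inline_data.data` holds raw image bytes.
    Returns (list_of_texts, list_of_written_paths).
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    texts, image_paths = [], []
    for part in parts:
        if getattr(part, "text", None) is not None:
            texts.append(part.text)
            continue
        blob = getattr(part, "inline_data", None)
        if blob is None or blob.data is None:
            continue
        # Fall back to PNG when the MIME type is missing or unknown.
        mime = getattr(blob, "mime_type", None) or "image/png"
        ext = mimetypes.guess_extension(mime) or ".png"
        path = out_dir / f"{stem}-{len(image_paths)}{ext}"
        path.write_bytes(blob.data)
        image_paths.append(path)
    return texts, image_paths
```

With the response from the example above, this could be called as `texts, paths = save_parts(response.candidates[0].content.parts, "out")`.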

Image editing (text-and-image-to-image)

To perform image editing, add an image as input. The following example loads a local image with PIL and passes it to the model together with a text prompt. For multiple images and larger payloads, check the image input section.

from io import BytesIO

from google import genai
from google.genai import types
from PIL import Image

# Load the image to edit.
image = Image.open('/path/to/image.png')

client = genai.Client()

# Adjacent string literals are concatenated into a single prompt string.
text_input = ('Hi, this is a picture of me. '
              'Can you add a llama next to me?')

response = client.models.generate_content(
    model="gemini-2.0-flash-preview-image-generation",
    contents=[text_input, image],
    config=types.GenerateContentConfig(
        response_modalities=['TEXT', 'IMAGE']
    )
)

for part in response.candidates[0].content.parts:
    if part.text is not None:
        print(part.text)
    elif part.inline_data is not None:
        # inline_data.data holds the raw bytes of the edited image.
        edited_image = Image.open(BytesIO(part.inline_data.data))
        edited_image.show()
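When the image is already available as raw bytes (for example, read from disk or received over the network), it can be sent as an inline part instead of a PIL object. The sketch below assumes `types.Part.from_bytes(data=..., mime_type=...)` from the `google-genai` SDK. The helper names `read_image` and `edit_image_from_file` are hypothetical and not part of the original example.

```python
import mimetypes
from pathlib import Path


def read_image(path):
    """Return (raw_bytes, mime_type) for a local image file."""
    path = Path(path)
    mime_type, _ = mimetypes.guess_type(path.name)
    if mime_type is None or not mime_type.startswith("image/"):
        raise ValueError(f"Not a recognized image type: {path.name}")
    return path.read_bytes(), mime_type


def edit_image_from_file(path, instruction):
    """Send an image as inline bytes together with an edit instruction."""
    # Imported lazily so read_image() works without the SDK installed.
    from google import genai
    from google.genai import types

    data, mime_type = read_image(path)
    client = genai.Client()
    return client.models.generate_content(
        model="gemini-2.0-flash-preview-image-generation",
        contents=[
            instruction,
            types.Part.from_bytes(data=data, mime_type=mime_type),
        ],
        config=types.GenerateContentConfig(
            response_modalities=["TEXT", "IMAGE"]
        ),
    )
```

The MIME type is guessed from the file extension only, so a misnamed file would be sent with the wrong type. Inspecting the file contents would be more robust.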

Other image generation modes

Gemini supports other image interaction modes based on prompt structure and context, including:

  • Text to image(s) and text (interleaved): Outputs images with related text.
    • Example prompt: "Generate an illustrated recipe for a paella."
  • Image(s) and text to image(s) and text (interleaved): Uses input images and text to create new related images and text.
    • Example prompt: (With an image of a furnished room) "What other color sofas would work in my space? Can you update the image?"
  • Multi-turn image editing (chat): Keep generating or editing images conversationally.
    • Example prompts: [Upload an image of a blue car.], "Turn this car into a convertible.", "Now change the color to yellow."
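The multi-turn mode can be sketched with the SDK's chat interface. This is an illustration under stated assumptions, not the documented recipe: it assumes `client.chats.create(...)` and `chat.send_message(...)` from the `google-genai` SDK, and the function names `run_edit_session` and `start_car_edit_session` are hypothetical.

```python
def run_edit_session(chat, turns):
    """Send each turn to the chat and collect what comes back.

    `chat` is any object with a `send_message(message)` method whose
    responses are shaped like those in the examples above.
    Returns one (texts, images) tuple per turn, where `images` is a
    list of raw image bytes.
    """
    results = []
    for message in turns:
        response = chat.send_message(message)
        texts, images = [], []
        for part in response.candidates[0].content.parts:
            if part.text is not None:
                texts.append(part.text)
            elif part.inline_data is not None:
                images.append(part.inline_data.data)
        results.append((texts, images))
    return results


def start_car_edit_session(image_path):
    """Hypothetical entry point mirroring the example prompts above."""
    # Imported lazily so run_edit_session() can be used on its own.
    from google import genai
    from google.genai import types
    from PIL import Image

    client = genai.Client()
    chat = client.chats.create(
        model="gemini-2.0-flash-preview-image-generation",
        config=types.GenerateContentConfig(
            response_modalities=["TEXT", "IMAGE"]
        ),
    )
    turns = [
        # First turn: the uploaded image plus the first instruction.
        [Image.open(image_path), "Turn this car into a convertible."],
        # Later turns rely on the chat history for context.
        "Now change the color to yellow.",
    ]
    return run_edit_session(chat, turns)
```

Keeping the turn loop separate from the client setup makes it possible to exercise the loop with a stand-in chat object, without calling the API.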

Limitations

  • For best performance, use the following languages: EN, es-MX, ja-JP, zh-CN, hi-IN.
  • Image generation does not support audio or video inputs.
  • Image generation may not always trigger:
    • The model may output text only. Try asking for image outputs explicitly (e.g. "generate an image", "provide images as you go along", "update the image").
    • The model may stop generating partway through. Try again or try a different prompt.
  • When generating text for an image, Gemini works best if you first generate the text and then ask for an image with the text.
  • Image generation is not available in some regions and countries. See Models for more information.
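The advice above for responses that come back without an image (ask explicitly, or try again) can be wrapped in a small retry loop. This is a sketch only: `has_image`, `generate_until_image`, the retry count, and the wording of the nudge are hypothetical choices, and `generate` stands for any callable that takes a prompt string and returns a response shaped like those in the examples above.

```python
def has_image(response):
    """Return True if any part of any candidate carries inline image data."""
    for candidate in getattr(response, "candidates", None) or []:
        content = getattr(candidate, "content", None)
        for part in getattr(content, "parts", None) or []:
            if getattr(part, "inline_data", None) is not None:
                return True
    return False


def generate_until_image(generate, prompt, max_attempts=3,
                         nudge=" Please generate an image."):
    """Call `generate` until a response contains an image.

    After a response without an image, later attempts append an explicit
    request for image output to the prompt.
    Returns (last_response, found_image).
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    current = prompt
    response = None
    for _ in range(max_attempts):
        response = generate(current)
        if has_image(response):
            return response, True
        # No image came back: ask for one explicitly on the next attempt.
        current = prompt + nudge
    return response, False
```

A caller could pass `lambda p: client.models.generate_content(model=..., contents=p, config=...)` as `generate`, reusing the model and configuration from the earlier examples.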

When to use Imagen

In addition to using Gemini's built-in image generation capabilities, you can also access Imagen, our specialized image generation model, through the Gemini API.

Choose Gemini when:

  • You need contextually relevant images that leverage world knowledge and reasoning.
  • Seamlessly blending text and images is important.
  • You want accurate visuals embedded within long text sequences.
  • You want to edit images conversationally while maintaining context.

Choose Imagen when:

  • Image quality, photorealism, artistic detail, or specific styles (e.g., impressionism, anime) are top priorities.
  • Performing specialized editing tasks like product background updates or image upscaling.
  • Infusing branding, style, or generating logos and product designs.

Imagen 4 should be your go-to model when you start generating images with Imagen. Choose Imagen 4 Ultra for advanced use cases or when you need the best image quality. Note that Imagen 4 Ultra can only generate one image at a time.
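For comparison, a request to Imagen through the same SDK might look like the sketch below. It assumes `client.models.generate_images(...)` and `types.GenerateImagesConfig(number_of_images=...)` from the `google-genai` SDK. The default model ID is an assumption and should be checked against the models page, and the helper names `clamp_image_count` and `generate_with_imagen` are hypothetical.

```python
def clamp_image_count(model_id, requested):
    """Limit the image count to 1 for Ultra models, per the note above.

    Hypothetical helper: treats any model ID containing "ultra" as
    Imagen 4 Ultra.
    """
    if requested < 1:
        raise ValueError("requested must be at least 1")
    return 1 if "ultra" in model_id.lower() else requested


def generate_with_imagen(prompt, model_id="imagen-4.0-generate-001", count=1):
    """Generate images with Imagen and return them as raw bytes.

    The default `model_id` is an assumption; check the models page for
    the IDs that are currently available.
    """
    # Imported lazily so clamp_image_count() works without the SDK.
    from google import genai
    from google.genai import types

    client = genai.Client()
    response = client.models.generate_images(
        model=model_id,
        prompt=prompt,
        config=types.GenerateImagesConfig(
            number_of_images=clamp_image_count(model_id, count)
        ),
    )
    return [generated.image.image_bytes
            for generated in response.generated_images]
```

Unlike the Gemini examples, this call returns images only, so there are no text parts to handle.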

What's next