# Image generation with Gemini

Source: <https://ai.google.dev/gemini-api/docs/image-generation>

---

Gemini can generate and process images conversationally. You can prompt Gemini with text, images, or a combination of both to achieve various image-related tasks, such as image generation and editing. All generated images include a [SynthID watermark](/responsible/docs/safeguards/synthid).

Image generation may not be available in all regions and countries. Review the [Gemini models](/gemini-api/docs/models#gemini-2.0-flash-preview-image-generation) page for more information.

**Note:** You can also generate images with [Imagen](/gemini-api/docs/imagen), our specialized image generation model. See the When to use Imagen section for details on how to choose between Gemini and Imagen.

## Image generation (text-to-image)

The following code demonstrates how to generate an image based on a descriptive prompt. You must include `responseModalities`: `["TEXT", "IMAGE"]` in your configuration. Image-only output is not supported with these models.
    
    
    from io import BytesIO
    
    from google import genai
    from google.genai import types
    from PIL import Image
    
    # The client picks up the API key from the environment.
    client = genai.Client()
    
    contents = ('Hi, can you create a 3d rendered image of a pig '
                'with wings and a top hat flying over a happy '
                'futuristic scifi city with lots of greenery?')
    
    response = client.models.generate_content(
        model="gemini-2.0-flash-preview-image-generation",
        contents=contents,
        config=types.GenerateContentConfig(
            # Both modalities are required; image-only output is not supported.
            response_modalities=['TEXT', 'IMAGE']
        )
    )
    
    # The response interleaves text parts and image parts.
    for part in response.candidates[0].content.parts:
        if part.text is not None:
            print(part.text)
        elif part.inline_data is not None:
            # inline_data.data holds the raw image bytes.
            image = Image.open(BytesIO(part.inline_data.data))
            image.save('gemini-native-image.png')
            image.show()
    

![AI-generated image of a fantastical flying pig](/static/gemini-api/docs/images/flying-pig.png) AI-generated image of a fantastical flying pig

## Image editing (text-and-image-to-image)

To perform image editing, add an image as input. The following example adds a local image, loaded with PIL, to the request contents alongside a text prompt. For multiple images and larger payloads, check the [image input](/gemini-api/docs/image-understanding#image-input) section.
    
    
    from io import BytesIO
    
    from google import genai
    from google.genai import types
    from PIL import Image
    
    # Load the image to edit from disk.
    source_image = Image.open('/path/to/image.png')
    
    client = genai.Client()
    
    # Adjacent string literals are joined into a single prompt string.
    text_input = ('Hi, this is a picture of me. '
                  'Can you add a llama next to me?')
    
    response = client.models.generate_content(
        model="gemini-2.0-flash-preview-image-generation",
        contents=[text_input, source_image],
        config=types.GenerateContentConfig(
            response_modalities=['TEXT', 'IMAGE']
        )
    )
    
    for part in response.candidates[0].content.parts:
        if part.text is not None:
            print(part.text)
        elif part.inline_data is not None:
            # inline_data.data holds the raw bytes of the edited image.
            edited_image = Image.open(BytesIO(part.inline_data.data))
            edited_image.show()
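
If you call the API over HTTP rather than through the SDK, the image is typically embedded in the JSON body as a base64 string. The following is a minimal sketch of building such a body with the standard library only. The helper name `build_edit_request` is hypothetical, and the field names (`inline_data`, `mime_type`, `generationConfig`, `responseModalities`) reflect the REST request shape as assumed here, so check them against the API reference before relying on them.

```python
import base64
import json


def build_edit_request(prompt: str, image_bytes: bytes,
                       mime_type: str = "image/png") -> dict:
    """Build a request body that pairs a text prompt with an inline image.

    Hypothetical helper: the JSON field names are assumptions about the
    REST payload and should be verified against the API reference.
    """
    # Raw bytes cannot go into JSON, so encode them as a base64 ASCII string.
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "contents": [{
            "parts": [
                {"text": prompt},
                {"inline_data": {"mime_type": mime_type, "data": encoded}},
            ]
        }],
        # Both modalities are requested, matching the SDK examples.
        "generationConfig": {"responseModalities": ["TEXT", "IMAGE"]},
    }


# Placeholder bytes stand in for a real file read with open(path, "rb").
body = build_edit_request("Can you add a llama next to me?", b"placeholder")
print(json.dumps(body, indent=2))
```

Because the helper only builds a dictionary, it can be inspected or unit-tested without network access before the body is sent with an HTTP client of your choice.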
    

## Other image generation modes

Gemini supports other image interaction modes based on prompt structure and context, including:

  * **Text to image(s) and text (interleaved):** Outputs images with related text.
    * Example prompt: "Generate an illustrated recipe for a paella."
  * **Image(s) and text to image(s) and text (interleaved):** Uses input images and text to create new related images and text.
    * Example prompt: (With an image of a furnished room) "What other color sofas would work in my space? Can you update the image?"
  * **Multi-turn image editing (chat):** Keep generating and editing images conversationally.
    * Example prompts: (Upload an image of a blue car.) "Turn this car into a convertible.", "Now change the color to yellow."
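
In the interleaved modes, a single response can carry several text and image parts in sequence. The sketch below is one way to walk such a response while preserving order, saving each image under a numbered filename. The helper `save_interleaved` and the `step-NN` naming are hypothetical; the code only assumes that each part exposes `text` and `inline_data` (with `data` and `mime_type`), as in the examples above, and uses stand-in objects instead of a real API call.

```python
import mimetypes
import tempfile
from pathlib import Path
from types import SimpleNamespace


def save_interleaved(parts, out_dir="."):
    """Return [("text", str) | ("image", path)] in the order the parts arrived."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    sequence = []
    image_count = 0
    for part in parts:
        text = getattr(part, "text", None)
        inline = getattr(part, "inline_data", None)
        if text is not None:
            sequence.append(("text", text))
        elif inline is not None:
            image_count += 1
            # Derive the extension from the MIME type, falling back to .bin.
            mime = getattr(inline, "mime_type", None) or ""
            suffix = mimetypes.guess_extension(mime) or ".bin"
            path = out / f"step-{image_count:02d}{suffix}"
            path.write_bytes(inline.data)
            sequence.append(("image", str(path)))
    return sequence


# Stand-in parts that mimic an illustrated recipe; no API call is made.
demo_parts = [
    SimpleNamespace(text="Step 1: Heat the oil.", inline_data=None),
    SimpleNamespace(text=None, inline_data=SimpleNamespace(
        data=b"fake image bytes", mime_type="image/png")),
    SimpleNamespace(text="Step 2: Add the rice.", inline_data=None),
]

with tempfile.TemporaryDirectory() as tmp:
    for kind, value in save_interleaved(demo_parts, tmp):
        print(kind, value)
```

With a real response, you would pass `response.candidates[0].content.parts` in place of `demo_parts`.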



## Limitations

  * For best performance, use the following languages: EN, es-MX, ja-JP, zh-CN, hi-IN.
  * Image generation does not support audio or video inputs.
  * Image generation may not always trigger: 
    * The model may output text only. Try asking for image outputs explicitly (e.g. "generate an image", "provide images as you go along", "update the image").
    * The model may stop generating partway through. Try again or try a different prompt.
  * When generating text for an image, Gemini works best if you first generate the text and then ask for an image with the text.
  * Image generation is not available in some regions and countries. See [Models](/gemini-api/docs/models) for more information.
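
Since the model may return text only or stop partway through, one possible approach is to check the response for an image part and retry with a more explicit request. The following is a minimal sketch under stated assumptions: `generate_with_retry`, `has_image`, and the `nudge` wording are hypothetical, and `generate` is any callable you supply that takes a prompt string and returns a response shaped like the SDK responses above. A fake backend is used here so the example runs without an API key.

```python
from types import SimpleNamespace


def has_image(response) -> bool:
    """Return True if the first candidate contains at least one image part."""
    candidates = getattr(response, "candidates", None) or []
    if not candidates:
        return False
    content = getattr(candidates[0], "content", None)
    parts = getattr(content, "parts", None) or []
    return any(getattr(part, "inline_data", None) is not None for part in parts)


def generate_with_retry(generate, prompt, max_attempts=3,
                        nudge=" Please generate an image."):
    """Call generate(prompt) until an image is returned or attempts run out.

    After a text-only response, the prompt is extended with an explicit
    request for an image. The last response is returned either way.
    """
    attempt_prompt = prompt
    response = None
    for _ in range(max_attempts):
        response = generate(attempt_prompt)
        if has_image(response):
            break
        attempt_prompt = prompt + nudge
    return response


# Fake backend for illustration: text only on the first call, image on the second.
calls = []


def fake_generate(prompt):
    calls.append(prompt)
    text = SimpleNamespace(text="Here is a description.", inline_data=None)
    image = SimpleNamespace(text=None, inline_data=SimpleNamespace(data=b"png"))
    parts = [text] if len(calls) == 1 else [text, image]
    content = SimpleNamespace(parts=parts)
    return SimpleNamespace(candidates=[SimpleNamespace(content=content)])


result = generate_with_retry(fake_generate, "A pig with wings and a top hat.")
print(len(calls), has_image(result))
```

In real use, `generate` could wrap `client.models.generate_content` with the model and config fixed. Keep `max_attempts` small, because every attempt is a billable request.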



## When to use Imagen

In addition to using Gemini's built-in image generation capabilities, you can also access [Imagen](/gemini-api/docs/imagen), our specialized image generation model, through the Gemini API.

Choose **Gemini** when:

  * You need contextually relevant images that leverage world knowledge and reasoning.
  * Seamlessly blending text and images is important.
  * You want accurate visuals embedded within long text sequences.
  * You want to edit images conversationally while maintaining context.



Choose **Imagen** when:

  * Image quality, photorealism, artistic detail, or specific styles (e.g., impressionism, anime) are top priorities.
  * You are performing specialized editing tasks like product background updates or image upscaling.
  * You are infusing branding or style, or generating logos and product designs.



Imagen 4 should be your go-to model when you start generating images with Imagen. Choose Imagen 4 Ultra for advanced use cases or when you need the best image quality. Note that Imagen 4 Ultra can only generate one image at a time.

## What's next

  * Check out the [Veo guide](/gemini-api/docs/video) to learn how to generate videos with the Gemini API.
  * To learn more about Gemini models, see [Gemini models](/gemini-api/docs/models/gemini) and [Experimental models](/gemini-api/docs/models/experimental-models).