# Image generation with Gemini
Source: <https://ai.google.dev/gemini-api/docs/image-generation>
---
Gemini can generate and process images conversationally. You can prompt Gemini with text, images, or a combination of both to achieve various image-related tasks, such as image generation and editing. All generated images include a [SynthID watermark](/responsible/docs/safeguards/synthid).
Image generation may not be available in all regions and countries; review our [Gemini models](/gemini-api/docs/models#gemini-2.0-flash-preview-image-generation) page for more information.
**Note:** You can also generate images with [Imagen](/gemini-api/docs/imagen), our specialized image generation model. See the When to use Imagen section for details on how to choose between Gemini and Imagen.
## Image generation (text-to-image)
The following code demonstrates how to generate an image from a descriptive prompt. You must include `responseModalities: ["TEXT", "IMAGE"]` in your configuration; image-only output is not supported with these models.
```python
from google import genai
from google.genai import types
from PIL import Image
from io import BytesIO
import base64

client = genai.Client()

contents = ('Hi, can you create a 3d rendered image of a pig '
            'with wings and a top hat flying over a happy '
            'futuristic scifi city with lots of greenery?')

response = client.models.generate_content(
    model="gemini-2.0-flash-preview-image-generation",
    contents=contents,
    config=types.GenerateContentConfig(
        response_modalities=['TEXT', 'IMAGE']
    )
)

for part in response.candidates[0].content.parts:
    if part.text is not None:
        print(part.text)
    elif part.inline_data is not None:
        image = Image.open(BytesIO(part.inline_data.data))
        image.save('gemini-native-image.png')
        image.show()
```
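The `base64` import is relevant when you call the REST API directly: raw JSON responses carry the generated image bytes base64-encoded in the `inlineData.data` field, while the Python SDK decodes this for you. A minimal sketch of the decode step, using stand-in data rather than a real response:

```python
import base64

def decode_inline_data(b64_payload: str) -> bytes:
    """Decode the base64-encoded `inlineData.data` field from a raw REST response."""
    return base64.b64decode(b64_payload)

# Stand-in payload: a real response would carry PNG bytes here.
payload = base64.b64encode(b"\x89PNG fake image bytes").decode("ascii")
print(decode_inline_data(payload))
```

The decoded bytes can then be written straight to a `.png` file or opened with `PIL.Image.open(BytesIO(...))` as in the example above.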
*AI-generated image of a fantastical flying pig*
## Image editing (text-and-image-to-image)
To perform image editing, add an image as input. The following example demonstrates uploading base64 encoded images. For multiple images and larger payloads, check the [image input](/gemini-api/docs/image-understanding#image-input) section.
```python
from google import genai
from google.genai import types
from PIL import Image
from io import BytesIO

image = Image.open('/path/to/image.png')

client = genai.Client()

text_input = ('Hi, this is a picture of me. '
              'Can you add a llama next to me?')

response = client.models.generate_content(
    model="gemini-2.0-flash-preview-image-generation",
    contents=[text_input, image],
    config=types.GenerateContentConfig(
        response_modalities=['TEXT', 'IMAGE']
    )
)

for part in response.candidates[0].content.parts:
    if part.text is not None:
        print(part.text)
    elif part.inline_data is not None:
        image = Image.open(BytesIO(part.inline_data.data))
        image.show()
```
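Instead of a PIL image, you can also pass raw bytes with an explicit MIME type. A minimal sketch, assuming the `google-genai` SDK's `types.Part.from_bytes` helper and a configured client (the API call itself is commented out; only the MIME-type helper runs here):

```python
import mimetypes

def guess_image_mime(path: str) -> str:
    """Pick the MIME type to send alongside raw image bytes."""
    mime, _ = mimetypes.guess_type(path)
    return mime or "application/octet-stream"

# With raw bytes instead of a PIL image (requires a configured client):
#
# with open('/path/to/image.png', 'rb') as f:
#     part = types.Part.from_bytes(data=f.read(),
#                                  mime_type=guess_image_mime('/path/to/image.png'))
# response = client.models.generate_content(
#     model="gemini-2.0-flash-preview-image-generation",
#     contents=[text_input, part],
#     config=types.GenerateContentConfig(response_modalities=['TEXT', 'IMAGE']))

print(guess_image_mime('photo.png'))  # image/png
```

For multiple images and larger payloads, the [image input](/gemini-api/docs/image-understanding#image-input) section covers the File API, which avoids resending bytes on every request.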
## Other image generation modes
Gemini supports other image interaction modes based on prompt structure and context, including:
* **Text to image(s) and text (interleaved):** Outputs images with related text.
  * Example prompt: "Generate an illustrated recipe for a paella."
* **Image(s) and text to image(s) and text (interleaved):** Uses input images and text to create new related images and text.
  * Example prompt: (With an image of a furnished room) "What other color sofas would work in my space? Can you update the image?"
* **Multi-turn image editing (chat):** Keep generating and editing images conversationally.
  * Example prompts: [upload an image of a blue car] "Turn this car into a convertible." "Now change the color to yellow."
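Multi-turn editing maps naturally onto the SDK's chat interface. A minimal sketch, assuming the `google-genai` SDK's `client.chats.create` and `send_message` APIs plus a valid API key (the network calls are commented out; only the part-filtering helper is exercised, against stand-in objects):

```python
from types import SimpleNamespace

def first_image_part(parts):
    """Return the first response part carrying inline image data, or None."""
    return next((p for p in parts if getattr(p, "inline_data", None) is not None),
                None)

# Sketch of the conversational flow (requires a configured client):
#
# chat = client.chats.create(
#     model="gemini-2.0-flash-preview-image-generation",
#     config=types.GenerateContentConfig(response_modalities=['TEXT', 'IMAGE']))
# r1 = chat.send_message([car_image, "Turn this car into a convertible."])
# r2 = chat.send_message("Now change the color to yellow.")
# part = first_image_part(r2.candidates[0].content.parts)

# Demonstration with stand-in parts:
parts = [SimpleNamespace(text="Here you go!", inline_data=None),
         SimpleNamespace(text=None, inline_data=b"\x89PNG...")]
print(first_image_part(parts).inline_data)
```

Because the chat object carries the conversation history, each edit request can refer back to earlier turns ("this car", "the color") without resending the image.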
## Limitations
* For best performance, use the following languages: EN, es-MX, ja-JP, zh-CN, hi-IN.
* Image generation does not support audio or video inputs.
* Image generation may not always trigger:
  * The model may output text only. Try asking for image outputs explicitly (e.g., "generate an image", "provide images as you go along", "update the image").
  * The model may stop generating partway through. Try again or try a different prompt.
* When generating text for an image, Gemini works best if you first generate the text and then ask for an image containing that text.
* Image generation is not available in some regions and countries; see [Models](/gemini-api/docs/models) for more information.
## When to use Imagen
In addition to using Gemini's built-in image generation capabilities, you can also access [Imagen](/gemini-api/docs/imagen), our specialized image generation model, through the Gemini API.
Choose **Gemini** when:
* You need contextually relevant images that leverage world knowledge and reasoning.
* Seamlessly blending text and images is important.
* You want accurate visuals embedded within long text sequences.
* You want to edit images conversationally while maintaining context.
Choose **Imagen** when:
* Image quality, photorealism, artistic detail, or specific styles (e.g., impressionism, anime) are top priorities.
* Performing specialized editing tasks like product background updates or image upscaling.
* Infusing branding, style, or generating logos and product designs.
Imagen 4 should be your go-to model when you start generating images with Imagen. Choose Imagen 4 Ultra for advanced use cases or when you need the best image quality. Note that Imagen 4 Ultra can only generate one image at a time.
## What's next
* Check out the [Veo guide](/gemini-api/docs/video) to learn how to generate videos with the Gemini API.
* To learn more about Gemini models, see [Gemini models](/gemini-api/docs/models/gemini) and [Experimental models](/gemini-api/docs/models/experimental-models).