# Image generation with Gemini
Source: <https://ai.google.dev/gemini-api/docs/image-generation>
---
Gemini can generate and process images conversationally. You can prompt Gemini with text, images, or a combination of both to achieve various image-related tasks, such as image generation and editing. All generated images include a [SynthID watermark](/responsible/docs/safeguards/synthid).
Image generation may not be available in all regions and countries. Review our [Gemini models](/gemini-api/docs/models#gemini-2.0-flash-preview-image-generation) page for more information.
**Note:** You can also generate images with [Imagen](/gemini-api/docs/imagen), our specialized image generation model. See the [When to use Imagen](#when-to-use-imagen) section for details on how to choose between Gemini and Imagen.
## Image generation (text-to-image)
The following code demonstrates how to generate an image based on a descriptive prompt. You must include `responseModalities`: `["TEXT", "IMAGE"]` in your configuration (`response_modalities` in the Python SDK, as in the code that follows). Image-only output is not supported with these models.

    from io import BytesIO

    from google import genai
    from google.genai import types
    from PIL import Image

    # The client reads the API key from the environment.
    client = genai.Client()

    contents = ('Hi, can you create a 3d rendered image of a pig '
                'with wings and a top hat flying over a happy '
                'futuristic scifi city with lots of greenery?')

    response = client.models.generate_content(
        model="gemini-2.0-flash-preview-image-generation",
        contents=contents,
        config=types.GenerateContentConfig(
            # Both modalities are required; image-only output is not supported.
            response_modalities=['TEXT', 'IMAGE']
        )
    )

    # The response interleaves text parts and inline image parts.
    for part in response.candidates[0].content.parts:
        if part.text is not None:
            print(part.text)
        elif part.inline_data is not None:
            image = Image.open(BytesIO(part.inline_data.data))
            image.save('gemini-native-image.png')
            image.show()

![AI-generated image of a fantastical flying pig](/static/gemini-api/docs/images/flying-pig.png) AI-generated image of a fantastical flying pig
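A response can carry more than one image part, and the loop above overwrites the same file each time. The following is a minimal sketch, not part of the SDK, of a helper that writes every inline image to its own file. The function name `save_image_parts`, the `EXTENSIONS` mapping, and the file naming scheme are assumptions made for illustration; the helper only relies on each part exposing `text` and `inline_data` (with `data` and `mime_type`), as the parts in the example above do.

```python
from pathlib import Path

# Assumed mapping from MIME type to file extension; extend as needed.
EXTENSIONS = {"image/png": ".png", "image/jpeg": ".jpg", "image/webp": ".webp"}


def save_image_parts(parts, out_dir=".", stem="gemini-image"):
    """Write every inline image in `parts` to disk.

    Returns a (paths, texts) tuple: the files written and the text parts seen.
    """
    out_path = Path(out_dir)
    out_path.mkdir(parents=True, exist_ok=True)
    paths, texts = [], []
    for part in parts:
        if getattr(part, "text", None) is not None:
            texts.append(part.text)
            continue
        blob = getattr(part, "inline_data", None)
        if blob is None:
            continue
        # Fall back to a generic extension for MIME types not in the mapping.
        ext = EXTENSIONS.get(getattr(blob, "mime_type", None), ".bin")
        target = out_path / f"{stem}-{len(paths)}{ext}"
        target.write_bytes(blob.data)
        paths.append(target)
    return paths, texts
```

With a real response, this would be called as `save_image_parts(response.candidates[0].content.parts)`.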
## Image editing (text-and-image-to-image)
To perform image editing, add an image as input. The following example opens a local image with PIL and passes it to the model alongside the text prompt. For multiple images and larger payloads, check the [image input](/gemini-api/docs/image-understanding#image-input) section.

    from io import BytesIO

    from google import genai
    from google.genai import types
    from PIL import Image

    # Load the image to edit from disk.
    source_image = Image.open('/path/to/image.png')

    client = genai.Client()

    # A single string; note the space between the two sentences.
    text_input = ('Hi, this is a picture of me. '
                  'Can you add a llama next to me?')

    response = client.models.generate_content(
        model="gemini-2.0-flash-preview-image-generation",
        contents=[text_input, source_image],
        config=types.GenerateContentConfig(
            response_modalities=['TEXT', 'IMAGE']
        )
    )

    for part in response.candidates[0].content.parts:
        if part.text is not None:
            print(part.text)
        elif part.inline_data is not None:
            edited_image = Image.open(BytesIO(part.inline_data.data))
            edited_image.show()

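When calling the REST API directly instead of the Python SDK, the input image is sent as base64-encoded inline data. The following is a rough sketch of how such a request body could be assembled with the standard library only. The helper name `build_edit_request` is hypothetical, and the field names (`inline_data`, `mime_type`, `generationConfig`, `responseModalities`) reflect my understanding of the REST request shape, so check them against the API reference before relying on them.

```python
import base64
import json


def build_edit_request(prompt, image_bytes, mime_type="image/png"):
    """Return a JSON-serializable body for a text-and-image-to-image request."""
    # JSON cannot carry raw bytes, so the image is base64-encoded as text.
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "contents": [{
            "parts": [
                {"text": prompt},
                {"inline_data": {"mime_type": mime_type, "data": encoded}},
            ]
        }],
        # Both modalities are requested; image-only output is not supported.
        "generationConfig": {"responseModalities": ["TEXT", "IMAGE"]},
    }


# Stand-in bytes for illustration; a real call would read an image file.
body = build_edit_request("Can you add a llama next to me?", b"\x89PNG\r\n")
payload = json.dumps(body)
```

The resulting `payload` string is what would be posted to the model's `generateContent` endpoint along with an API key.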
## Other image generation modes
Gemini supports other image interaction modes based on prompt structure and context, including:
* **Text to image(s) and text (interleaved):** Outputs images with related text.
    * Example prompt: "Generate an illustrated recipe for a paella."
* **Image(s) and text to image(s) and text (interleaved):** Uses input images and text to create new related images and text.
    * Example prompt: (With an image of a furnished room) "What other color sofas would work in my space? Can you update the image?"
* **Multi-turn image editing (chat):** Keep generating and editing images conversationally.
    * Example prompts: [Upload an image of a blue car.], "Turn this car into a convertible.", "Now change the color to yellow."
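The multi-turn mode can be sketched with the SDK's chat interface, which keeps earlier turns as context so that a follow-up such as "Now change the color to yellow." applies to the previously edited image. This is an illustrative sketch under stated assumptions: `run_edit_turns` and `demo` are hypothetical helper names, the image path is a placeholder, and `demo` needs the `google-genai` and Pillow packages plus a configured API key, so it is defined but not called here.

```python
def run_edit_turns(chat, first_message, follow_ups):
    """Send each message on one chat session; return the image bytes per turn."""
    images_per_turn = []
    for message in [first_message, *follow_ups]:
        response = chat.send_message(message)
        parts = response.candidates[0].content.parts or []
        images_per_turn.append(
            [p.inline_data.data for p in parts
             if getattr(p, "inline_data", None) is not None]
        )
    return images_per_turn


def demo():
    """Run the blue-car example against the live API (not called here)."""
    from google import genai
    from google.genai import types
    from PIL import Image

    client = genai.Client()
    chat = client.chats.create(
        model="gemini-2.0-flash-preview-image-generation",
        config=types.GenerateContentConfig(
            response_modalities=["TEXT", "IMAGE"]
        ),
    )
    car = Image.open("/path/to/blue-car.png")  # hypothetical path
    return run_edit_turns(
        chat,
        [car, "Turn this car into a convertible."],
        ["Now change the color to yellow."],
    )
```

Keeping the loop separate from the client setup makes it easy to exercise with a stand-in chat object before spending API calls.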
## Limitations
* For best performance, use the following languages: EN, es-MX, ja-JP, zh-CN, hi-IN.
* Image generation does not support audio or video inputs.
* Image generation may not always trigger:
    * The model may output text only. Try asking for image outputs explicitly (e.g. "generate an image", "provide images as you go along", "update the image").
    * The model may stop generating partway through. Try again or try a different prompt.
* When generating text for an image, Gemini works best if you first generate the text and then ask for an image with the text.
* Image generation is not available in some regions and countries. See [Models](/gemini-api/docs/models) for more information.
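The advice above for text-only or truncated responses (ask for an image explicitly, or try again) can be wrapped in a small retry loop. This is a sketch under assumptions rather than an SDK feature: `has_image` and `generate_with_retry` are hypothetical helpers, `generate` stands for any callable that takes a prompt and returns a response (for example a thin wrapper around `client.models.generate_content`), and the wording of `nudge` is only an example.

```python
def has_image(response):
    """Return True if the first candidate contains at least one image part."""
    candidates = getattr(response, "candidates", None) or []
    if not candidates:
        return False
    content = getattr(candidates[0], "content", None)
    parts = getattr(content, "parts", None) or []
    return any(getattr(p, "inline_data", None) is not None for p in parts)


def generate_with_retry(generate, prompt, max_attempts=3,
                        nudge=" Please generate an image."):
    """Call `generate` until an image comes back or attempts run out.

    Returns (response, attempts_used), or (None, max_attempts) on failure.
    """
    attempt_prompt = prompt
    for attempt in range(1, max_attempts + 1):
        response = generate(attempt_prompt)
        if has_image(response):
            return response, attempt
        # No image came back: ask for one explicitly on the next attempt.
        attempt_prompt = prompt + nudge
    return None, max_attempts
```

Capping the number of attempts keeps a prompt that never yields an image from looping indefinitely.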
## When to use Imagen
In addition to using Gemini's built-in image generation capabilities, you can also access [Imagen](/gemini-api/docs/imagen), our specialized image generation model, through the Gemini API.
Choose **Gemini** when:
* You need contextually relevant images that leverage world knowledge and reasoning.
* Seamlessly blending text and images is important.
* You want accurate visuals embedded within long text sequences.
* You want to edit images conversationally while maintaining context.
Choose **Imagen** when:
* Image quality, photorealism, artistic detail, or specific styles (e.g., impressionism, anime) are top priorities.
* You need to perform specialized editing tasks, such as product background updates or image upscaling.
* You want to infuse branding or style, or to generate logos and product designs.
Imagen 4 should be your go-to model when you start generating images with Imagen. Choose Imagen 4 Ultra for advanced use cases or when you need the best image quality. Note that Imagen 4 Ultra can only generate one image at a time.
## What's next
* Check out the [Veo guide](/gemini-api/docs/video) to learn how to generate videos with the Gemini API.
* To learn more about Gemini models, see [Gemini models](/gemini-api/docs/models/gemini) and [Experimental models](/gemini-api/docs/models/experimental-models).