	# Image generation with Gemini

	Source: <https://ai.google.dev/gemini-api/docs/image-generation>

	---

	Gemini can generate and process images conversationally. You can prompt Gemini with text, images, or a combination of both to achieve various image-related tasks, such as image generation and editing. All generated images include a [SynthID watermark](/responsible/docs/safeguards/synthid).

	Image generation may not be available in all regions and countries. Review the [Gemini models](/gemini-api/docs/models#gemini-2.0-flash-preview-image-generation) page for more information.

	Note: You can also generate images with [Imagen](/gemini-api/docs/imagen), our specialized image generation model. See the [When to use Imagen](#when-to-use-imagen) section for details on how to choose between Gemini and Imagen.

	## Image generation (text-to-image)

	The following code demonstrates how to generate an image from a descriptive prompt. You must include `responseModalities`: `["TEXT", "IMAGE"]` in your configuration (`response_modalities` in the Python SDK). Image-only output is not supported with these models.


	from io import BytesIO

	from google import genai
	from google.genai import types
	from PIL import Image

	# With no arguments, the client reads the API key from the environment.
	client = genai.Client()

	contents = ('Hi, can you create a 3d rendered image of a pig '
	            'with wings and a top hat flying over a happy '
	            'futuristic scifi city with lots of greenery?')

	response = client.models.generate_content(
	    model="gemini-2.0-flash-preview-image-generation",
	    contents=contents,
	    config=types.GenerateContentConfig(
	        # Both modalities are required; image-only output is not supported.
	        response_modalities=['TEXT', 'IMAGE']
	    )
	)

	# The response can contain both text parts and image parts.
	for part in response.candidates[0].content.parts:
	    if part.text is not None:
	        print(part.text)
	    elif part.inline_data is not None:
	        # inline_data.data holds the image bytes.
	        image = Image.open(BytesIO(part.inline_data.data))
	        image.save('gemini-native-image.png')
	        image.show()


	![AI-generated image of a fantastical flying pig](/static/gemini-api/docs/images/flying-pig.png)
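A response can interleave several text and image parts, so it can help to collect them before saving anything. The following is a minimal sketch, not part of the SDK: `split_parts` and `image_filename` are hypothetical helpers, and the code assumes each part exposes `text` and `inline_data` as in the example above. The `mime_type` attribute on `inline_data` is an assumption that the original example does not show; the sketch falls back to PNG when it is missing.

```python
import mimetypes
from types import SimpleNamespace


def split_parts(parts):
    """Separate response parts into text strings and (mime_type, bytes) images."""
    texts, images = [], []
    for part in parts:
        if getattr(part, "text", None) is not None:
            texts.append(part.text)
        elif getattr(part, "inline_data", None) is not None:
            blob = part.inline_data
            # Assumption: the blob carries a mime_type; default to PNG if absent.
            mime_type = getattr(blob, "mime_type", None) or "image/png"
            images.append((mime_type, blob.data))
    return texts, images


def image_filename(index, mime_type, stem="gemini-image"):
    """Build a file name such as 'gemini-image-0.png' from a MIME type."""
    extension = mimetypes.guess_extension(mime_type) or ".bin"
    return f"{stem}-{index}{extension}"


# Stand-in parts so the sketch runs without an API call.
demo_parts = [
    SimpleNamespace(text="Here is your image.", inline_data=None),
    SimpleNamespace(
        text=None,
        inline_data=SimpleNamespace(mime_type="image/png", data=b"\x89PNG"),
    ),
]
texts, images = split_parts(demo_parts)
```

With a real response, pass `response.candidates[0].content.parts` to `split_parts` and write each image's bytes to the name returned by `image_filename`.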

	## Image editing (text-and-image-to-image)

	To edit an image, include it in the request alongside your text prompt. The following example loads a local image with Pillow and passes it in `contents`, where it is sent inline with the request as base64-encoded data. For multiple images and larger payloads, see the [image input](/gemini-api/docs/image-understanding#image-input) section.


	from io import BytesIO

	from google import genai
	from google.genai import types
	from PIL import Image

	# Load the image to edit; replace the path with your own file.
	source_image = Image.open('/path/to/image.png')

	client = genai.Client()

	# The prompt is a single string built from adjacent literals.
	text_input = ('Hi, this is a picture of me. '
	              'Can you add a llama next to me?')

	response = client.models.generate_content(
	    model="gemini-2.0-flash-preview-image-generation",
	    contents=[text_input, source_image],
	    config=types.GenerateContentConfig(
	        response_modalities=['TEXT', 'IMAGE']
	    )
	)

	# Print any text the model returns and display the edited image.
	for part in response.candidates[0].content.parts:
	    if part.text is not None:
	        print(part.text)
	    elif part.inline_data is not None:
	        edited_image = Image.open(BytesIO(part.inline_data.data))
	        edited_image.show()
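When calling the REST endpoint directly rather than through the SDK, the input image travels as base64 text inside the JSON body. The sketch below builds such a body using only the standard library. `build_edit_request` is a hypothetical helper, and the field names (`contents`, `parts`, `inline_data`, `mime_type`, `data`, `generationConfig`) are assumptions about the REST request shape that should be checked against the API reference before use.

```python
import base64
import json


def build_edit_request(prompt, image_bytes, mime_type="image/png"):
    """Return a JSON-serializable body with a text part and an inline image part."""
    # JSON cannot carry raw bytes, so the image is base64-encoded to ASCII text.
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "contents": [{
            "parts": [
                {"text": prompt},
                # Assumed field names; verify against the REST reference.
                {"inline_data": {"mime_type": mime_type, "data": encoded}},
            ]
        }],
        "generationConfig": {"responseModalities": ["TEXT", "IMAGE"]},
    }


# Placeholder bytes stand in for the contents of a real image file.
body = build_edit_request("Can you add a llama next to me?", b"fake-image-bytes")
payload = json.dumps(body)
```

The base64 step is what makes the payload larger than the original file, which is one reason to prefer a different input path for multiple or large images.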


	## Other image generation modes

	Gemini supports other image interaction modes based on prompt structure and context, including:

	* Text to image(s) and text (interleaved): Outputs images with related text.
	    * Example prompt: "Generate an illustrated recipe for a paella."
	* Image(s) and text to image(s) and text (interleaved): Uses input images and text to create new related images and text.
	    * Example prompt: (With an image of a furnished room) "What other color sofas would work in my space? Can you update the image?"
	* Multi-turn image editing (chat): Keep generating and editing images conversationally.
	    * Example prompts: (After uploading an image of a blue car) "Turn this car into a convertible.", "Now change the color to yellow."
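Multi-turn editing can be sketched as a loop that sends one instruction per turn over the same chat session, so each edit builds on the previous result. The helper below is hypothetical and accepts any object with a `send_message` method. With the `google-genai` SDK, such an object would presumably come from `client.chats.create(...)` configured with the same model and `response_modalities` as in the earlier examples; treat that as an assumption to verify against the SDK reference. A stand-in chat class keeps the sketch runnable offline.

```python
def run_edit_turns(chat, prompts):
    """Send each prompt over one chat session, in order.

    Returns a list of (prompt, response) pairs so every turn can be inspected.
    """
    history = []
    for prompt in prompts:
        response = chat.send_message(prompt)
        history.append((prompt, response))
    return history


class EchoChat:
    """Offline stand-in that records messages instead of calling the API."""

    def __init__(self):
        self.sent = []

    def send_message(self, message):
        self.sent.append(message)
        return f"turn {len(self.sent)}: {message}"


chat = EchoChat()
turns = run_edit_turns(
    chat,
    ["Turn this car into a convertible.", "Now change the color to yellow."],
)
```

Keeping all turns in one session is the point of the design: the second instruction never restates which car or which image it refers to.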



	## Limitations

	* For best performance, use the following languages: EN, es-MX, ja-JP, zh-CN, hi-IN.
	* Image generation does not support audio or video inputs.
	* Image generation may not always trigger:
	    * The model may output text only. Try asking for image outputs explicitly (e.g. "generate an image", "provide images as you go along", "update the image").
	    * The model may stop generating partway through. Try again or try a different prompt.
	* When generating text for an image, Gemini works best if you first generate the text and then ask for an image with the text.
	* Image generation is not available in some regions and countries. See [Models](/gemini-api/docs/models) for more information.



	## When to use Imagen

	In addition to using Gemini's built-in image generation capabilities, you can also access [Imagen](/gemini-api/docs/imagen), our specialized image generation model, through the Gemini API.

	Choose Gemini when:

	* You need contextually relevant images that leverage world knowledge and reasoning.
	* Seamlessly blending text and images is important.
	* You want accurate visuals embedded within long text sequences.
	* You want to edit images conversationally while maintaining context.



	Choose Imagen when:

	* Image quality, photorealism, artistic detail, or specific styles (e.g., impressionism, anime) are top priorities.
	* Performing specialized editing tasks like product background updates or image upscaling.
	* Infusing branding, style, or generating logos and product designs.



	Imagen 4 should be your go-to model when you start generating images with Imagen. Choose Imagen 4 Ultra for advanced use cases or when you need the best image quality. Note that Imagen 4 Ultra can only generate one image at a time.

	## What's next

	* Check out the [Veo guide](/gemini-api/docs/video) to learn how to generate videos with the Gemini API.
	* To learn more about Gemini models, see [Gemini models](/gemini-api/docs/models/gemini) and [Experimental models](/gemini-api/docs/models/experimental-models).