When this was up yesterday I complained that the refusal rate was super high, especially on government- and military-shaped tasks, and that this would only push contractors toward CN-developed open-source models for work that could then be compromised.
Today I'm discovering there is a tier of API access with virtually no content moderation available to companies working in that space. I have no idea how to go about requesting that tier of access, but have spoken to 4 different defense contractors in the last day who seem to already be using it.
Turns out AI alignment just means "align to the customer's current subscription plan", not protecting the world. Classic.
"Alignment with who?" has always been a problem. An AI is a proxy for a reward function, a reward function is a proxy for what the coder was trying to express, what the coder was trying to express is a proxy for what the PM put on the ticket, what the PM put on the ticket is a proxy for what the CEO said, what the CEO said is a proxy for shareholder interests, shareholder interests are a proxy for economic growth, economic growth is a proxy for government interests.
("There was an old lady who swallowed a fly, …")
Each of those proxies can have an alignment failure with the adjacent level(s).
And RLHF involves training one AI to learn human preferences, as a proxy for what "good" is, in order to be the reward function that trains the actual LLM (or other model, though I've only heard of RLHF being used to train LLMs).
Ethics “concerns” from for-profit companies are 100% marketing and 0% real.
Do people actually fall for these, lol? Yes they do, and it works to raise interest and get additional funding.
More accurate to call it “alignment for plebes and not for the masters of the plebes”. Which I think we all kind of expect coming from the leaders of our society. That’s the way human societies have always worked.
I’m sure access to military grade tech is only one small slice in the set of advantages the masters get over the mastered in any human society.
That’s ahistorical; see Dawn of Humanity for a rebuttal to the naturalness of imposed hierarchy.
I mean, obviously? AI alignment has always meant alignment with the creator of the model.
Trying to align OpenAI etc. with the rest of humanity is a completely different problem.
I've always thought that if a corporate lab achieves AGI and it starts spitting out crazy ideas such as "corporations should be taxed," we won't be hearing about AGI for a while longer due to "alignment issues."
It's "tier 5". I've had an account since the 3.0 days, so I can't be positive I'm not grandfathered in, but my understanding is that as long as you have a non-trivial amount of spend for a few months, you'll get that access.
(FWIW, for anyone curious how to implement it: it's the 'moderation' parameter in the JSON request you'll send. I missed it for a few hours because it wasn't in DALL·E 3.)
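For anyone wiring this up, here's a sketch of where that parameter sits in the request body (the prompt and other values here are just placeholders; `moderation` accepts `auto` or `low`):

```json
{
  "model": "gpt-image-1",
  "prompt": "concept art of a cargo ship, cutaway diagram style",
  "size": "1024x1024",
  "quality": "medium",
  "moderation": "low"
}
```

POSTed to `https://api.openai.com/v1/images/generations` with your usual auth headers.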
API shows either auto or low available. Is there another secret value with even lower restrictions?
Not that I know of.
I just took any indication that the parent post meant absolutely zero moderation as them being a bit loose with their words and excitable in how they understand things. There were some signs:
1. It's unlikely they completed an API integration quickly enough to have an opinion on military/defense image-generation moderation yesterday, so they're almost certainly speaking about ChatGPT. (This is additionally confirmed by image generation requiring tier 5 anyway, which they would have been aware of had they integrated.)
2. The military / defense use cases for image generation are not provided (and the steelman'd version in other comments is nonsensical, i.e. we can quickly validate you can still generate kanban boards or wireframes of ships)
3. The poster passively disclaims being in military / defense themself (grep "in that space")
4. It is hard to envision cases of #2 that do not require universal moderation for OpenAI's sake. I.e., let's say their thought process runs along the lines of: defense/military ~= what I think of as CIA ~= black ops ~= image manipulation on social media; thus, the time I said "please edit this photo of the ayatollah to have him eating pig and say I hate allah" means it's over-moderated for defense use cases.
5. It's unlikely OpenAI wants to be anywhere near the PR resulting from #4. Assuming there is a super-secret defense tier that allows this, it's at the very least unlikely that the poster's defense-contractor friends were blabbing about the exclusive, completely unmoderated access they had, to the poster, within hours of release. They're pretty serious about that secrecy stuff!
6. It is unlikely the lack of ability to generate images using GPT Image 1 would drive the military to Chinese models (there aren't Chinese LLMs that do this! And even if there were, there are plenty of good ol' American diffusion models!)
What's a good use case for a defense contractor to generate AI images besides to include in presentations?
Fabricating evidence of weapons of mass destruction in some developing nation.
I kid, more real world use cases would be for concept images for a new product or marketing campaigns.
...you can do that with a pencil, though.
What an impossibly weird thing to "need" an LLM for.
Think of all the trivial ways an image generator could be used in business, and there is likely a similar use-case among the DoD and its contractors (e.g. create a cartoon image of a ship for a naval training aid; make a data dashboard wireframe concept for a decision aid).
Input one image of a known military installation and one of a civilian building. Prompt it to generate a similar _civilian_ building, but resembling that military installation in some way: similar structure, similar colors, similar lighting.
Then include this image in the training set of another network, labeled "civilian". That network can then be trained to a lower false-positive rate when asked "is this target military?"
You'll never get promoted thinking like that! Mark them all "military", munitions sales will soar!
The very simple use case is generating mock targets. In movies they make it seem like they use mannequin-style targets or traditional concentric circles, but those are infeasible and unrealistic, respectively. There's an entire modeling industry here, and being able to replace that with infinitely diverse AI-generated targets is valuable!
> 4 different defense contractors in the last day
Now I'm just wondering what the hell defense contractors need image generation for that isn't obviously horrifying...
“Generate me a crowd of civilians with one terrorist in.”
“Please move them to some desert, not the empire state building.”
“The civilians are supposed to have turbans, not ballcaps.”
That's very outdated, they're absolutely supposed to be at the Empire State Building with baseball caps now. See: ICE arrests and Trump's comment on needing more El Salvadoran prison space for "the homegrowns"
Show me a tunnel underneath a building in the desert filled with small arms weapons with a poster on the wall with a map of the United States and a label written with sharpie saying “Bad guys here”. Also add various Arabic lettering on the weapons.
They make presentations. Most of their work is presentations with diagrams. Icons.
I wanted to try this in the image playground, but I was told I have to add a payment method. When adding this, I was told I would also have to pay a minimum of $5. Did this. Then when trying to generate an image, I was told I would have to do "verification" of my organization (?). OK, I chose 'personal'. I was then told I have to complete the verification through some third-party partner of OpenAI, which included giving permission to process my biometric information. Yeah, I don't want to try this that badly, but now I've already paid and have to struggle to figure out how to get my money back. Horrible UX.
Chargeback. Yes, this may result in your being banned from purchasing any OpenAI services in the future; I would see this as an added benefit to prevent making the same mistake again.
Be aware that OpenAI API credits expire after a year. I added $5 a year ago expecting to use the API, but only consumed $0.02 or something. The API started throwing "Too many requests" HTTP errors when I needed it again and oops!.. there was nothing left. All the credit was gone.
Wouldn't have expected that from an honest player.
Big thanks for the heads up, I had no idea about this.
It looks like I will not be able to get any prepaid money back [0] so I will be careful not to put any further money on it.
I guess I better start using some of the more expensive APIs to make it worth the $20 I prepaid.
[0] https://openai.com/policies/service-credit-terms/
4. "All sales of Services, including sales of prepaid Services, are final. Service Credits are not refundable and expire one year after the date of purchase or issuance if not used, unless otherwise specified at the time of purchase."
For the curious I generated the same prompt for each of the quality types. ‘Auto’, ‘low’, ‘medium’, ‘high’.
Prompt: “a cute dog hugs a cute cat”
https://x.com/terrylurie/status/1915161141489136095
I also then showed a couple of DALL·E 3 images for comparison in a comment.
> a cute dog hugs a cute cat
This prompt is best served by Midjourney, Flux, Stable Diffusion. It'll be far cheaper, and chances are it'll also look a lot better.
The place where gpt-image-1 shines is if you want to do a prompt like:
"a cute dog hugs a cute cat, they're both standing on top of an algebra equation (y=\(2x^{2}-3x-2\)). Use the first reference image I uploaded as a source for the style of the dog. Same breed, same markings. The cat can contrast in fur color. Use the second reference image I uploaded as a guide for the background, but change the lighting to sunset. Also, solve the equation for x."
gpt-image-1 doesn't make the best images, and it isn't cheap, and it isn't fast, but it's incredibly -- almost insanely -- powerful. It feels like ComfyUI got packed up into an LLM and provided as a natural language service.
I wonder if we can use gpt-image-1 outputs, with some noise, as inputs to diffusion models, so GPT takes care of adherence and the diffusion model improves the quality. Does anyone know whether that's at all possible?
Sure. I suppose with API support 3 hours ago someone probably made a Comfy node all of 2 hours ago. From there you can either just do a low denoise or use one of the many IP-Adapter type things out there.
Yes, it's what a lot of people have been doing with newer models that have better prompt adherence: passing them through older models with better aesthetics.
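For the curious, the noise-injection idea upthread can be sketched in a few lines of numpy. This is the standard img2img trick: blend the gpt-image-1 output with Gaussian noise, then hand the result to a diffusion model for a partial denoise. The `noise_image` helper and the blend formula here are illustrative, not any particular pipeline's internals:

```python
import numpy as np

def noise_image(img, strength, rng=None):
    """Blend an image with Gaussian noise, mimicking the partial
    forward-diffusion step an img2img pipeline applies before denoising.
    img: float array scaled to [0, 1]; strength: 0 (untouched) to 1 (pure noise)."""
    rng = np.random.default_rng(0) if rng is None else rng
    eps = rng.standard_normal(img.shape)
    return np.sqrt(1.0 - strength) * img + np.sqrt(strength) * eps

# Stand-in for a decoded gpt-image-1 output.
img = np.full((64, 64, 3), 0.5)
# A low strength keeps the composition (prompt adherence) and lets the
# diffusion model repaint surface detail (aesthetics).
noisy = noise_image(img, strength=0.3)
```

In ComfyUI terms this corresponds to sampling with denoise set to roughly the same strength, using the gpt-image-1 output as the source image.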
Not bad. Photo forums will soon be full of them, slightly edited to remove metadata and make them look human-made.
> the same prompt for each of the quality types. ‘Auto’, ‘low’, ‘medium’, ‘high’.
“Auto” is just whatever the best quality is for a model. So in this case it’s the same as “high”.
Crazy even photos have the OpenAI yellow color grade
I built a local playground for it if anyone is interested (your OpenAI org needs to be verified, btw):
https://github.com/Alasano/gpt-image-1-playground
Openai's Playground doesn't expose all the API options.
Mine covers all options, has built in mask creation and cost tracking as well.
I generated 5 images in the playground. One using a text-only prompt and 4 using images from my phone. I spent $0.85 which isn't bad for a fun round of Studio Ghibli portraits for the family group chat, but too expensive to be used in a customer facing product.
> but too expensive to be used in a customer facing product.
Enhance headshots for putting on Linkedin.
It doesn't keep facial details in the generation. The generated person resembles you but is definitely not you.
Yeah, it's very eerie. Though sometimes it's very close, dangerously close I feel. I tried it once myself and the background was unrealistic (the prompt was me giving a keynote speech, for a vision board), but I looked like... me.
Can't wait to meet people in person who look nothing like their profile pictures on linkedin :)
I already did. Looked in the mirror just an hour ago. Strange guy, very tired, never seen him before.
is it good?
No, it can't do detail well, AFAIK the images are produced at a lower resolution and then upscaled. This might be specific to the ChatGPT version, however, for cost cutting.
I'm curious what the applications are where people need to generate hundreds or thousands of these images. I like making Ghibli-esque versions of family photos as much as the next person, but I don't need to make them in volume. As far as I can recall, every time I've used image generation, it's been one-off things that I'm happy to do in the ChatGPT UI.
As usual for AI startups nowadays, using this API you can create a downstream wrapper for image generation with bespoke prompts.
A pro/con of the multimodal image generation approach (with an actually good text encoder) is that it rewards intense prompt engineering moreso than others, and if there is a use case that can generate more than $0.17/image in revenue, that's positive marginal profit.
An obvious one is for video games, interactive fiction, that sort of thing. AI dungeon with visuals could be pretty interesting.
It's too expensive for that unless you had a pretty generous subscription fee. I think local models are probably best suited for gaming where a decent GPU is already likely present.
I think there is a niche for both. Local LLMs are orders of magnitude smaller, so you could imagine cloud bursting for the difficult/important work like generating character portraits.
That said it’ll be 10-20x cheaper in a year at which point I don’t think you care about price for this workflow in 2D games.
I use the api because i don’t use chatgpt enough to justify the cost of their UI offering.
I’ve built a daily image-based puzzle that’s fully automated, and have been using flux to generate images. I’ve found sometimes they’re just not good enough, so have been doing some manual curation. But, with this new API, I’ll see if it can run by itself again.
Pricing-wise, this API is going to be hard to justify on value unless you really can get value out of providing references. A generated `medium` 1024x1024 is $0.04/image, which is in the same cost class as Imagen 3 and Flux 1.1 Pro. Testing from their new playground (https://platform.openai.com/playground/images), the medium images are indeed lower quality than either of the two competitor models and still take 15+ seconds to generate: https://x.com/minimaxir/status/1915114021466017830
Prompting the model is also substantially different from, and more difficult than, traditional models, which is unsurprising given the way the model works. The traditional image tricks don't work out of the box, and I'm struggling to get something that works without significant prompt augmentation (which is what I suspect was used for the ChatGPT image generations).
ChatGPT's prompt adherence is light-years ahead of all the others. I won't even call Flux/Midjourney its competitors. ChatGPT image gen is practically a one-of-its-kind product on the market: the only usable AI image editor for people without image-editing experience.
I think in terms of image generation, ChatGPT is the biggest leap since Stable Diffusion's release. LoRA/ControlNet/Flux are forgettable in comparison.
Well, there's also gemini-2.0-flash-exp-image-generation. Also autoregressive/transfusion based.
gemini-2.0-flash-exp-image-generation doesn’t perform as well as GPT-4o's image generation, as mentioned in section 5.1 of this paper: https://arxiv.org/pdf/2504.02782. However based on my test, for certain types of images such as realistic recipe images, the results are quite good. You can see some examples here: https://github.com/Yiling-J/tablepilot/tree/main/examples/10...
Such a good name....
It's quite bad now, but I have no doubt that Google will catch up.
The AI field looks awfully like {OpenAI, Google, The Irrelevant}.
It's also good but clearly not close still. Maybe Gemini 2.5 or 3 will have better image gen.
I'd go out on a limb and say that even your praise of gpt-image-1 is underselling its true potential. This model is as remarkable as when ChatGPT first entered the market. People are sleeping on its capabilities. It's a replacement for ComfyUI and potentially most of Adobe in time.
Now for the bad part: I don't think Black Forest Labs, StabilityAI, MidJourney, or any of the others can compete with this. They probably don't have the money to train something this large and sophisticated. We might be stuck with OpenAI and Google (soon) for providing advanced multimodal image models.
Maybe we'll get lucky and one of the large Chinese tech companies will drop a model with this power. But I doubt it.
This might be the first OpenAI product with an extreme moat.
> Now for the bad part: I don't think Black Forest Labs, StabilityAI, MidJourney, or any of the others can compete with this.
Yeah. I'm a tad sad about it. I once thought the SD ecosystem proves open-source won when it comes to image gen (a naive idea, I know). It turns out big corps won hard in this regard.
This is a take so incredulous it doesn’t seem credible.
I can confirm, ChatGPT's prompt adherence is so incredibly good, it gets even really small details right, to a level that diffusion-based generators couldn't even dream of.
It is correct, the shift from diffusion to transformers is a very, very big difference.
Also chiming in to say you're wrong, I mean they're correct
It's 100% the correct take.
So, I've long dreamed of building an AI-powered https://iconfinder.com.
I started Accomplice v1 back in 2021 with this goal in mind and raised some VC money but it was too early.
Now, with these latest imagen-3.0-generate-002 (Gemini) and gpt-image-1 (OpenAI) models – especially this API release from OpenAI – I've been able to resurrect Accomplice as a little side project.
Accomplice v2 (https://accomplice.ai) is just getting started back up again – I honestly decided to rebuild it only a couple of weeks ago, in preparation for today, once I saw ChatGPT's new image model – but so far there are 1,000s of free-to-download PNGs (and any SVGs that have already been vectorized are free too; it costs a credit to vectorize).
I generate new icons every few minutes from a huge list of "useful icons" I've built. It will be 100% pay-as-you-go. And for a credit, paid users can vectorize any PNGs they like, tweak them using AI, upload their own images to vectorize and download, or create their own icons (with my prompt injections baked in to get you good icon results).
Do multi-modal models make something like this obsolete? I honestly am not sure. In my experience with Accomplice v1, a lot of users didn't know what to do with a blank textarea, so the thinking here is there's value in doing some of the work for them upfront with a large searchable archive. Would love to hear others' thoughts.
But I'm having fun again either way.
That looks interesting, but I don't know how useful single icons can be. For me, the really useful part would be to get a suite of icons that all have a consistent visual style. Bonus points if I can prompt the model to generate more icons with that same style.
Recraft has a style feature where you give some images. I wonder if that would work for icons. You can also try giving an image of a bunch of icons to ChatGPT and have it generate more, then vectorize them.
It seems to me like this is a new hybrid product for vibe coders, because otherwise the wrapping of prompting/improving a prompt with an LLM before hitting the text2image model can certainly, as you say, be done cheaper if you just run it yourself.
Maybe OpenAI thinks the model business is over and they need to start sherlocking all the way from the top down to final apps (thus their interest in buying out Cursor, finally ending up with Windsurf).
Idk, this feels like a new offering between a full raw API and a final product, where they abstract some of it for a few cents, and they're basically bundling their SOTA LLM models with their image models for extra margin.
> It seems to me like this is a new hybrid product for vibe coders, because otherwise the wrapping of prompting/improving a prompt with an LLM before hitting the text2image model can certainly be done as you say cheaper if you just run it yourself.
In case you didn’t know, it’s not just wrapping in an LLM. The image model they’re referencing is a model that’s directly integrated into the LLM for functionality. It’s not possible to extract, because the LLM outputs tokens which are part of the image itself.
That said, they’re definitely trying to focus on building products over raw models now. They want to be a consumer subscription instead of commodity model provider.
Right! I forgot the new model is a multi-modal one generating image outputs from both image and text inputs. I guess this is good, and the price will come down eventually.
waiting for some FOSS multi-modal model to come out eventually too
great to see openAI expanding into making actual usable products i guess
yeah, the integration is the real shift here. by embedding image generation into the LLM’s token stream, it’s no longer a pipeline of separate systems but a single unified model interface. that unlocks new use cases where you can reason, plan, and render all in one flow. it’s not just about replacing diffusion models, it’s about making generation part of a broader agentic loop. pricing will drop over time, but the shift in how you build with this is the more interesting part.
I find prompting the model substantially easier than traditional models, is it really more difficult or are you just used to traditional models?
I suspect what I'll do with the API is iterate at medium quality and then generate a high quality image when I'm done.
Usage of gpt-image-1 is priced per token, with separate pricing for text and image tokens:
- Text input tokens (prompt text): $5 per 1M tokens
- Image input tokens (input images): $10 per 1M tokens
- Image output tokens (generated images): $40 per 1M tokens
In practice, this translates to roughly $0.02, $0.07, and $0.19 per generated image for low, medium, and high-quality square images, respectively.
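As a sanity check, those per-image figures fall out of the $40/1M output-token rate. The token counts below are back-solved from the quoted prices for illustration, not official per-quality counts:

```python
# Rough per-image cost at gpt-image-1's $40 per 1M image-output-token rate.
# Token counts are approximations implied by the quoted prices above.
PRICE_PER_OUTPUT_TOKEN = 40 / 1_000_000  # USD

def image_cost(output_tokens: int) -> float:
    return output_tokens * PRICE_PER_OUTPUT_TOKEN

approx_tokens = {"low": 500, "medium": 1_750, "high": 4_750}
for quality, tokens in approx_tokens.items():
    print(f"{quality}: ~${image_cost(tokens):.2f}")
```

Text input tokens add a little on top of this, but at $5/1M they're negligible next to the image output.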
That's a bit pricey for a startup.
Isn't there also a cost per image? The pricing page shows $0.25 for a high quality 1536x1024 image. 25 cents per image is ... steep lol
Cost per image is based on output tokens (because the generated image itself is billed as output tokens).