In April 2022, DALL-E 2 was released, followed by Imagen, Parti, Midjourney, and Stable Diffusion. OpenAI’s DALL-E 2 signed up a million users in just 2.5 months. In a cascading race, Meta announced its “Make-a-Video” program, and Google soon followed with Phenaki. The entire creative arts industry was given a systemic electroshock, provoking ethical convulsions and a sustainability coma.
In October 2022 another set of collaborative text-to-video tools was released, and in April 2023 the video-to-video tool Runway came to iOS devices. Ever since, creative multi-modal combinations have been proliferating at high speed. Disarmingly easy and fast to produce, images and videos are the results of effortless multi-prompting. The outputs represent an overproduction of fantastic, gothic and dystopian imagery on the one hand, and disastrous carbon emissions and energy consumption on the other, both evolving at exponential rates.
The Gothic Imagery Boom
Over the past two years, an astronomical amount of genAI imagery has been generated. We are not addressing the looming content disaster of 90% of internet content being artificially generated by bots or AI systems within a few years, nor copyright infringements; we turn instead to the types of images and movies fabricated by AI digital art tools.
For 2025, Gartner (2022) estimates “that 10% of all data produced, and 30% of all outbound marketing messages from large brands, will be from Generative AI.”
And the numbers are mind-blowing: the AI game-asset tool Scenario produces 100,000 images daily using generative AI on AWS (Amazon), and the AI image generator Craiyon generates over 10 million images a day. These are only two randomly chosen examples…
Synthetic content production is also shockingly cost-effective: an AI image can be generated with DALL-E 2 for $0.03, in seconds.
Beyond this fast-paced productivity boom, we decided to concentrate on the kind of images produced.
Our research revealed a clear proliferation of sci-fi, surrealist, gothic, cyberpunk, ethereal, futuristic, otherworldly, and hence dystopian, apocalyptic and post-apocalyptic imagery.
To track popular genAI images, we analyzed the most popular prompt categories in order to identify the leading digital imagery genres.
In this context it is important to recall that Midjourney’s algorithm relies on diffusion models. These models work by adding random “noise” to training data and then learning to recover the data by removing that noise. At generation time, the model repeats this denoising process until it arrives at an image that matches the prompt.
This is different from large language models such as ChatGPT. GPT models are transformer models trained on unlabelled text data, which they analyze to identify language patterns and produce “plausible” human-like responses to prompts.
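The denoising loop behind diffusion models can be sketched in a few lines. This is a deliberately toy example on a one-dimensional array, not Midjourney’s actual implementation: where a real system uses a trained neural network to predict the noise, we cheat and return the true noise, so the loop visibly recovers the original data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "image": a 1-D array standing in for pixel values.
x0 = np.linspace(-1.0, 1.0, 8)

T = 50
betas = np.linspace(1e-4, 0.05, T)   # noise schedule
alphas = 1.0 - betas
alpha_bar = np.cumprod(alphas)       # cumulative signal-retention factor

# Forward process: corrupt the data with Gaussian noise in one jump.
eps = rng.standard_normal(x0.shape)
xt = np.sqrt(alpha_bar[-1]) * x0 + np.sqrt(1 - alpha_bar[-1]) * eps

def predicted_noise(x, step):
    # Stand-in for the trained network; we return the true noise.
    return eps

# Reverse process: iteratively strip the predicted noise back out.
x = xt
for step in reversed(range(T)):
    e = predicted_noise(x, step)
    x0_est = (x - np.sqrt(1 - alpha_bar[step]) * e) / np.sqrt(alpha_bar[step])
    if step > 0:
        x = np.sqrt(alpha_bar[step - 1]) * x0_est + np.sqrt(1 - alpha_bar[step - 1]) * e
    else:
        x = x0_est

print(np.max(np.abs(x - x0)))  # ≈ 0: the loop recovers the clean data
```

The point of the sketch is the shape of the computation: dozens of full passes over the data per generated image, which is what makes diffusion-based generation computationally heavy.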
This distinction is important for understanding the massive production of images and videos with generative AI tools, and their energy consumption and carbon footprint.
In prompts we trust?
To identify categories and types among the zillions of images produced, we decided to turn to prompts, since prompts are key to this type of digital art generation.
These tools run on two separate networks: the first neural network is trained to pair images with text, and the second generates an image that matches the prompt.
Output images span a wide range of imagery styles, from naturalistic, hyperreal, surreal, fantasy, fantastic and gothic to dystopian, futuristic, apocalyptic or alien. A first indication can be found in featured headlines on websites and in the mainstream press, whose qualifiers range from “terrifying and insane” to “weird and mind-bending” – to cite just a few:
- Which AI created the most terrifying art?
- The insane potential of AI-Generated Art and design.
- Fantasy, Victorian Goth. Weird wonderful art.
- Mind-bending art
Beyond showing a clear trend in genAI imagery production, we suspect it to be more than a trend.
Could it be the expression, the mirroring, of the ambient Zeitgeist?
Discover for yourself the most popular prompts as featured by DALL-E 2 or Midjourney.
Where imagery meets imagination: Expression of the Zeitgeist?
After several years of pandemic fears and restrictions, ongoing wars and a lingering climate crisis, the global mood of international societies seems worn out, showing signs of post-traumatic stress disorder. Living in a risk-bathed environment, people’s imagination looks gloomier than ever. And the risk-porous societal mind is nurtured by risk-enhancing technology and its ubiquitous rhetoric.
The paper “Typology of Risks of Generative Text-to-Image Models” by Charlotte Bird, Eddie Ungless, and Atoosa Kasirzadeh outlines major innovations of genAI tools but equally points to looming risks and harms associated with modern text-to-image generative models.
The authors developed a stunning “taxonomy of risks across six key stakeholder groups, and identified 22 distinct risk types, spanning issues from data bias to malicious use.”
The typology addresses ethical risks and harms grouped in three categories:
I. Discrimination and Exclusion;
II. Harmful Misuse;
III. Misinformation and Disinformation.
Of course, the apocalyptic rhetoric surrounding AI is nothing new. As David Gunkel, author of “The Machine Question: Critical Perspectives on AI, Robots and Ethics” (MIT Press, 2012), states in his article “Apocalyptic rhetoric about AI distracts from more immediate, pressing concerns”:
“The problem is not the impending ‘robot apocalypse’ that has been a staple of science fiction since the middle of the twentieth century, but the fact that we – and especially some of our leading scientists and technology innovators – understand and pose the challenge of AI in these rather stark and extreme terms.”
So the question remains: is the proliferation of genre-specific imagery the result of a wider contamination and nudging by ubiquitous apocalyptic media reporting, or the expression of the ambient Zeitgeist?
The following table lists textual prompts built around evocative, genre-oriented adjectives:
| S.No. | Midjourney Image Prompts |
| --- | --- |
| 1. | “beautiful pale cyberpunk female with heavy black eyeliner, blue eyes, shaved side haircut, hyper detail, cinematic lighting, magic neon, dark red city –ar 9:16 –testp –upbeta” |
| 2. | “earth reviving after human extinction, a new beginning, nature taking over buildings, animal kingdom, harmony, peace, earth balanced –version 3 –s 1250 –uplight –ar 4:3 –no text, blur” |
| 3. | “photo of a ultra realistic sailing ship, dramatic light, pale sunrise, cinematic lighting, battered, low angle, trending on artstation, 4k, hyper realistic, focused, extreme details, unreal engine 5, cinematic, masterpiece, art by studio ghibli, intricate artwork by john william turner” |
| 4. | “ethereal Bohemian Waxwing bird, Bombycilla garrulus :: intricate details, ornate, detailed illustration, octane render :: Johanna Rupprecht style, William Morris style :: trending on artstation –ar 9:16” |
| 5. | “Create an image of a futuristic city skyline at night, with a focus on advanced transportation systems and towering skyscrapers. Incorporate elements such as drones, flying cars, and high-speed trains.” |
| 6. | “Generate an image of a post-apocalyptic landscape, featuring a mix of natural and man-made elements. Include crumbling buildings, overgrown vegetation, and a sense of danger and abandonment.” |
| 7. | “Create a surrealist image of a dreamlike forest, featuring twisted trees, floating islands, and otherworldly creatures. Use vibrant colors and an ethereal atmosphere to evoke a sense of mystery and wonder.” |
| 8. | “Generate an image of a cyberpunk-inspired metropolis, featuring neon lights, holographic advertisements, and a bustling, futuristic atmosphere. Incorporate elements such as augmented reality interfaces, advanced robotics, and cybernetic enhancements.” |
| 9. | “space suit with boots, futuristic, character design, cinematic lightning, epic fantasy, hyper realistic, detail 8k –ar 9:16” |
| 10. | “Create an image of a surreal underwater world, featuring strange sea creatures, coral reefs, and otherworldly landscapes. Use vibrant colors and an ethereal atmosphere to evoke a sense of mystery and wonder.” |
| 11. | “Generate an image of a post-apocalyptic city, featuring crumbling buildings, overgrown vegetation, and a sense of danger and abandonment. Incorporate elements such as advanced robotics, cybernetic enhancements, and advanced transportation systems.” |
| 12. | “Create an image of a futuristic city skyline at night, featuring advanced transportation systems, towering skyscrapers, and a sense of rapid technological advancement. Incorporate elements such as drones, flying cars, and high-speed trains.” |
| 13. | “Generate an image of a surreal otherworldly landscape, featuring twisted trees, floating islands, and otherworldly creatures. Use vibrant colors and an ethereal atmosphere to evoke a sense of mystery and wonder.” |
| 14. | “Create an image of a cyberpunk-inspired metropolis, featuring neon lights, holographic advertisements, and a bustling, futuristic atmosphere. Incorporate elements such as augmented reality interfaces, advanced robotics, and cybernetic enhancements.” |
| 15. | “Generate an image of a post-apocalyptic landscape, featuring a mix of natural and man-made elements. Include crumbling buildings, overgrown vegetation, and a sense of danger and abandonment. Incorporate elements such as advanced robotics, cybernetic enhancements, and advanced transportation systems.” |
| 16. | “ossuary cemetary segmented shelves overgrown, graveyard, vertical shelves, zdzisław beksiński, hr giger, mystical occult symbol in real life, high detail, green fog –ar 9:16 –iw 1” |
The Rendering Energy Doom
Beyond the energy consumed in training, there is the rendering and data-storage energy consumption of multi-modal, diffusion-based genAI art tools gone mainstream. Here we undoubtedly find both ethical and sustainability concerns.
The fact is that each image and video can exist in thousands of versions differentiated only by minimal colour, size, shape, tone and voice alterations, with maximal and exponentially growing impacts on our ecosystem and carbon emissions. This endless data production boom clearly raises a red flag.
However, when investigating the “green data” world, it is important to underline how extremely difficult it is to find publicly available data on the energy consumption and carbon emissions of diffusion-based models.
A recent Wired article, “The Generative AI Race Has a Dirty Secret”, addresses the energy consumption and carbon emissions of large language models like ChatGPT:
“Integrating large language models into search engines could mean a fivefold increase in computing power and huge carbon emissions.”
A third-party analysis by researchers estimated that the training of GPT-3, which ChatGPT is partly based on, led to emissions of more than 550 tons of carbon dioxide – the same amount as a single person taking 550 round trips between New York and San Francisco.
According to Sam Altman, CEO of OpenAI, a single prompt costs “probably single-digit cents”, and in the worst case €0.09 per request, which is three times the cost of generating a DALL-E 2 image ($0.03).
A single ChatGPT request equates to 60 smartphone charges of 5 Wh each.
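These per-request figures are easy to sanity-check with simple arithmetic. The charge figures come from the comparison above; the ten-million-requests-per-day volume is purely our illustrative assumption, chosen to show how quickly per-request energy compounds at scale:

```python
# Per-request energy, from the smartphone-charge comparison above.
WH_PER_CHARGE = 5          # one smartphone charge, in watt-hours
CHARGES_PER_REQUEST = 60   # ChatGPT request equivalent, as stated

kwh_per_request = WH_PER_CHARGE * CHARGES_PER_REQUEST / 1000
print(kwh_per_request)  # 0.3 kWh per request

# Illustrative (assumed) daily volume to show how this compounds.
requests_per_day = 10_000_000
gwh_per_year = kwh_per_request * requests_per_day * 365 / 1e6
print(round(gwh_per_year, 1))  # ≈ 1095 GWh per year
```

Even at a conservatively assumed volume, a 0.3 kWh request adds up to gigawatt-hours per year, which is why the per-query cost of search integration worries the researchers quoted above.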
Nevertheless, most available calculations focus only on the prompting or training processes, and neglect the energy-intensive storage of data, mostly dark data, in data centers.
In the HBR article “How to Make Generative AI Greener”, Ajay Kumar and Tom Davenport explain how generative AI models are run by “hyperscale” (very large) cloud providers with thousands of servers that produce major carbon footprints, and how these data centers account for around 7% of Denmark’s and 2.8% of the United States’ electricity use.
They also explain why art-generating diffusion models are worse in energy consumption than large language models. Hardware is the reason: diffusion models run on graphics processing unit (GPU) chips, which “require 10 to 15 times the energy a traditional CPU needs”.
The reluctance of major players to disclose energy-sensitive data is well known, hence the opaqueness and black-boxing around these numbers.
Although emission reports on DALL-E 2 and Midjourney were extremely hard to find, we did locate one paper dealing explicitly with diffusion models:
“Climate Implications of Diffusion-based Generative Visual AI Systems and their Mass Adoption” by Vanessa Utz and Steve DiPaola, School of Interactive Arts and Technology (SIAT), Simon Fraser University, Canada.
According to the authors, four main genAI art tools (DALL-E 2, Midjourney, Stable Diffusion and Artbreeder) alone produce over 20 million images per day (Kelly 2022; Pennington 2022).
They make ad-hoc calculations of energy consumption, first concentrating on Stable Diffusion and then extrapolating to several models, while underscoring that their estimates are likely substantial underestimations.
For Stable Diffusion: “Based on our assumption that the 10 million users of Stable Diffusion (as confirmed by StabilityAI) run the system for approximately 1.5 hours per day on a RTX 3090, will lead to a yearly energy consumption of approximately 1.92 TWh.” This is similar to the total electricity consumption of the West African nation Mauritania in 2021.
Extrapolated to five popular systems (Stable Diffusion including DreamStudio, Midjourney, DALL-E 2, LensaAI, and Dream) with approximately 48.5 million users, the energy consumption would approach the total electricity consumption of Kenya in 2021: 9.1 TWh.
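The paper’s headline figure can be reproduced with back-of-envelope arithmetic. The user count and daily hours are the authors’ stated assumptions; the 350 W power draw is our own assumption for an RTX 3090 (its rated board power), not a figure from the paper:

```python
# Reproducing the paper's estimate under its stated assumptions.
USERS = 10_000_000      # Stable Diffusion users, per StabilityAI
HOURS_PER_DAY = 1.5     # assumed usage per user per day
GPU_WATTS = 350         # our assumption: RTX 3090 rated board power

kwh_per_day = USERS * HOURS_PER_DAY * GPU_WATTS / 1000
twh_per_year = kwh_per_day * 365 / 1e9
print(round(twh_per_year, 2))  # ≈ 1.92 TWh, matching the paper

# Same arithmetic scaled to ~48.5 million users across five systems.
print(round(twh_per_year * 48.5 / 10, 1))  # ≈ 9.3 TWh, on the order of Kenya's 9.1 TWh
```

That the simple product of users, hours and wattage lands on the published 1.92 TWh suggests the authors’ estimate is exactly this kind of linear extrapolation, which is also why they flag it as a likely underestimate: it ignores training, storage and data-center overhead entirely.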
However, there are limitations to these calculations, given the huge lack of transparency in the data made available to researchers and the general public.
The fact is that these models are in no regard whatsoever green. So can these power-hungry models be made more environmentally friendly?
Can eco-friendly images be produced?
There are different approaches to processing images, and they affect the energy and carbon emission results. DALL-E 2’s diffusion model, for example, works on full-size images. Stable Diffusion, on the other hand, uses a technique called “latent diffusion”, running the diffusion process on a compressed representation of the image, which means it requires less computing power. Unlike DALL-E 2, which runs on OpenAI’s powerful servers, Stable Diffusion can run on personal computers.
However, making images more eco-friendly should not be a one-sided responsibility of developers and vendors.
We all need to recognize that we are responsible too. Overconsumption by users needs to be addressed and scrutinized. Each and every one of us has to take on individual responsibility and contribute to reducing data waste.