Getting Creative with AI - By Elizabeth Shubov
Advances in artificial intelligence (AI) have prompted innovations in fields ranging from DNA sequencing to chess. Recently, much commentary has focused on progress in AI-driven artistic creation. AI can now produce photo-realistic images capable of bending reality, as well as artistic content that is impossible to distinguish from human-created works. Soon, we may find that AI has created or inspired much of the content we consume, including news stories, literature, and even music. These innovations, which warrant both excitement and trepidation, affirm that technologies are not just engines of productivity for rote tasks but have the power to expand our creative and problem-solving capacities, and literally to depict and "see" things in new ways.

To understand the technology behind the state of AI image art today, and to learn something about its viability, I set out to create my own AI-powered images. I've documented my experience and tried to explain the technology in simple terms to break down the technological barriers.

OpenAI took the internet by storm a few months ago by releasing its groundbreaking art-generating model, DALL-E 2. Last week, Google released Imagen, a model it claims is even more advanced. Both models produce photo-realistic images with little to no evidence of machine creation. Where prior models typically had trouble forming things like human and animal faces, these new models are a profound technological leap forward.

This image was created using DALL-E 2 with the text prompt "A Shiba Inu dog wearing a beret and black turtleneck." The features of the dog's face are remarkably realistic and uniform. There is no smudging or smearing around the transitional areas, as was typical with earlier AI art. One could easily assume this is a real dog - that is, unless you have ever tried to take a picture of a real dog in a hat.
Not to be outdone, Google released the image below last week showing a sunglasses-wearing, bike-riding dog generated from a text prompt. Since Imagen has not yet been released broadly, it is unclear how consistent the results are, but the images are quite impressive nonetheless.

These images are so realistic, powerful, and intriguing that, for some, they stoke fears over the ways AI technology might replace human image artists, confuse the consuming public about what is "real" content and what is "synthetic" content, or lead to new and immersive ways for people to exploit and harm each other. All of these are critical questions. For purposes of this piece, however, I am focused on how the technology works and how to work with it. From there, we can discuss policy and governance, but playing with the technology and learning how it works was an important journey too, with important lessons.

How Does AI Generate Image Art?

There are several different ways AI can generate image art. Much of the AI image art available over the last five years has been generated using deep learning systems called generative adversarial networks (GANs). In this type of deep learning network, the system is trained using two competing models that learn from each other. In simple terms, using real image data fed into the system, a "generator" creates candidate images - from user input, random input, or input generated with language models such as GPT-3 - while a "discriminator" judges whether those images look real. As training continues, the discriminator learns to reject images of poor quality or inaccurate subject matter. The networks take negative feedback from the bad images and positive feedback from the better ones, and they continue to learn from this process to create higher-fidelity images.
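To make that adversarial loop concrete, here is a minimal, hypothetical sketch in PyTorch. It trains a toy generator and discriminator on two-dimensional points rather than images (real image GANs use large convolutional networks), and every name and number in it is illustrative rather than drawn from any specific system.

```python
import torch
import torch.nn as nn

# Toy GAN: the generator learns to mimic a simple 2-D "real data" distribution.
generator = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 2))
discriminator = nn.Sequential(nn.Linear(2, 32), nn.ReLU(), nn.Linear(32, 1))

g_opt = torch.optim.Adam(generator.parameters(), lr=1e-3)
d_opt = torch.optim.Adam(discriminator.parameters(), lr=1e-3)
bce = nn.BCEWithLogitsLoss()

for step in range(2000):
    real = torch.randn(64, 2) * 0.5 + torch.tensor([2.0, -1.0])  # "real" samples
    fake = generator(torch.randn(64, 8))                          # generated samples

    # Discriminator step: label real samples 1 and generated samples 0.
    d_loss = (bce(discriminator(real), torch.ones(64, 1))
              + bce(discriminator(fake.detach()), torch.zeros(64, 1)))
    d_opt.zero_grad()
    d_loss.backward()
    d_opt.step()

    # Generator step: try to make the discriminator label its samples as real.
    g_loss = bce(discriminator(fake), torch.ones(64, 1))
    g_opt.zero_grad()
    g_loss.backward()
    g_opt.step()
```

The dynamic is exactly the one described above: the discriminator gets better at spotting fakes, and the generator improves by learning to fool it.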
Moving beyond GANs, popular AI image art systems such as DALL-E 2, Imagen, Disco Diffusion, and Midjourney are built on diffusion models. In a diffusion model, the AI is given an image, and noise or static is added until the image is unrecognizable. The training network then takes the noisy image and tries to remove the noise to recreate the original. Through each step of this process, it learns how to create more cohesive images. These models are typically slower because of the many steps in the denoising process, but they are promising in their stability and in their ability to produce ultra-realistic images.
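As a rough, hypothetical illustration of that training objective, the sketch below noises toy two-dimensional points (standing in for images) according to a schedule and trains a small network to predict the noise that was added. Real diffusion models do the same thing at vastly larger scale with image-shaped tensors; the sizes and schedule here are illustrative assumptions.

```python
import torch
import torch.nn as nn

# Toy diffusion trainer: learn to predict the noise added to clean samples.
T = 100
betas = torch.linspace(1e-4, 0.02, T)          # noise schedule
alphas_bar = torch.cumprod(1.0 - betas, dim=0)  # cumulative signal retained at step t

denoiser = nn.Sequential(nn.Linear(2 + 1, 64), nn.ReLU(), nn.Linear(64, 2))
opt = torch.optim.Adam(denoiser.parameters(), lr=1e-3)

for step in range(2000):
    x0 = torch.randn(64, 2) * 0.5 + 1.0              # stand-in for clean images
    t = torch.randint(0, T, (64,))                    # a random noise level per sample
    noise = torch.randn_like(x0)
    a = alphas_bar[t].unsqueeze(1)
    xt = a.sqrt() * x0 + (1 - a).sqrt() * noise       # the "static-covered" sample

    pred = denoiser(torch.cat([xt, t.unsqueeze(1) / T], dim=1))
    loss = ((pred - noise) ** 2).mean()               # learn to predict the added noise
    opt.zero_grad()
    loss.backward()
    opt.step()
```

At generation time, a trained denoiser is applied step by step to pure noise, gradually removing the static until a coherent image emerges.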
Along with these generative networks, deep learning systems built to classify images may be used to double-check that the image output matches the original text prompt. Guided by natural language processing (NLP), one such network, CLIP (Contrastive Language-Image Pre-Training), acts as a bridge between text and images. CLIP "views" the image, compares it to other images, comes up with descriptors for it, and then compares those descriptors to the original text input used to create the image. If they do not match, the system uses this negative feedback to reform the image until it matches the text, or it discards the errant image.

The results of these combined neural networks are mind-blowing machine-created images, such as those released by OpenAI showing a raccoon astronaut and the now Twitter-famous avocado armchair. Images similarly created and released by Google include a raccoon in a space helmet and a cactus wearing sunglasses in the desert.

Accessing and Generating AI Art

Some larger models that create hyper-realistic images are restricted in access, citing proprietary issues and concerns over the generation of salacious or malicious content. OpenAI, for example, states in its content policy that DALL-E 2 is an experimental research platform and that the images and content created must remain G-rated, non-deceptive, not targeted at individuals, and not used for commercial purposes. (Cue questions about how well those policies can be enforced; where the line will fall between good clean fun and harm or deception; when and how consumers should be informed that they are being served AI-generated images; how AI-generated images may (or may not) be protected as intellectual property; when and how consumers can opt into or out of more challenging content; and myriad other challenges.)

Other image-generating algorithms are widely available online, either through websites that charge a small platform fee or in open-source repositories. Varying degrees of coding and math knowledge are necessary depending on the type of generator one is using. However, on some sites, one can create images based solely on a text input and a setting for the number of iterations used to evolve the image.
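For readers who want a sense of what running an open-source generator looks like in code, here is a minimal sketch using the Hugging Face diffusers library. This is not one of the tools discussed in this article; the model checkpoint, prompt, and settings below are illustrative assumptions, and a GPU is assumed.

```python
import torch
from diffusers import StableDiffusionPipeline

# Load an assumed, publicly hosted text-to-image diffusion checkpoint.
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    torch_dtype=torch.float16,
)
pipe = pipe.to("cuda")  # a GPU is assumed here

image = pipe(
    "a robotic long stem rose, oil painting, steampunk",  # the text prompt
    num_inference_steps=50,   # more denoising steps: slower, often cleaner
    guidance_scale=7.5,       # how strongly to follow the prompt
).images[0]
image.save("rose.png")
```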
Exploring Creative Potential Using AI

In January, I started experimenting with AI-generated images in order to create art. Some of the images I created using a GAN-based model known as VQGAN+CLIP are up in a public gallery on Spatial.io, which you can check out on any device or using a VR headset. Since it was shortly before Valentine's Day, I used a heart theme to learn about the creation process. Most of the early images went directly into the recycling bin. However, in time, I discovered how to work with the algorithms to produce better art. As illustrated by the differences in the images below, the order and type of descriptors used made a huge difference in the quality of the work produced.
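For the curious, here is a simplified, hypothetical sketch of the idea behind CLIP-guided generation such as VQGAN+CLIP: optimize an image so that CLIP scores it as increasingly similar to the text prompt. To keep the sketch self-contained it optimizes raw pixels directly, whereas real VQGAN+CLIP optimizes the latent codes of a VQGAN generator; the prompt and step count are illustrative.

```python
import torch
from transformers import CLIPModel, CLIPTokenizer

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32").eval()
model.requires_grad_(False)  # only the image is optimized, not CLIP itself
tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-base-patch32")

# Encode the text prompt once; it stays fixed during optimization.
text_inputs = tokenizer(["a robotic long stem rose"], return_tensors="pt", padding=True)
with torch.no_grad():
    text_emb = model.get_text_features(**text_inputs)
    text_emb = text_emb / text_emb.norm(dim=-1, keepdim=True)

# Optimize raw pixels (a stand-in for VQGAN latent codes) to match the text.
pixels = torch.nn.Parameter(torch.rand(1, 3, 224, 224))
optimizer = torch.optim.Adam([pixels], lr=0.05)

# CLIP's image normalization constants.
mean = torch.tensor([0.48145466, 0.4578275, 0.40821073]).view(1, 3, 1, 1)
std = torch.tensor([0.26862954, 0.26130258, 0.27577711]).view(1, 3, 1, 1)

for step in range(200):  # the "iterations" setting on hosted tools maps to loops like this
    optimizer.zero_grad()
    img_emb = model.get_image_features(pixel_values=(pixels.clamp(0, 1) - mean) / std)
    img_emb = img_emb / img_emb.norm(dim=-1, keepdim=True)
    loss = -(img_emb * text_emb).sum()  # maximize cosine similarity to the prompt
    loss.backward()
    optimizer.step()
```

Each pass nudges the image toward whatever CLIP associates with the prompt, which is also why the choice and order of descriptors matters so much.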
This is one of the earlier images, created with the prompt "valentine overjoyed." While it is interesting, it lacks cohesion, is disjointed, and does not look very polished. Another, using the prompt "broken heart," is fractured and includes both valentine hearts and real anatomical hearts.

In a later image, I used the prompt "robotic long stem rose coming out of the road with an oil painting in steampunk engine." The image has a steampunk feel, and you can identify a rose in the image along with its robotic counterpart. Improvements are also noticeable in "abstract lotus as a deep neural network in the water with bright colors" and "fireworks over a boat on the ocean with northern lights."
Using the prompt "earth on fire in Unreal Engine 3D shading shadow depth," you see more photorealistic molten lava burning on the earth in the foreground and an earthly room in the background. The AI is trained on databases of images and styles, so incorporating those into the keywords drastically improved the outputs.

Through the process of learning to work with AI to create recognizable and cohesive images, I found that my idea of creativity changed. Learning how to work with the AI became a crucial part of the creative process. Despite many frustrating failed attempts, this exercise opened up new creative outlets and pushed my boundaries in unexpected ways. Rather than replace my creativity, AI helped supplement and expand it.

What does this mean for the future?

As we look at processes in the workplace and in our lives that can be improved with AI, we should not discount the potential this technology has to unleash creativity in people. It is not just art that can be autonomously created, but also music, literature, and news. As AI gets better at generating content, we can continue to look for ways to use it to increase productivity or spur our own creativity. Healthy skepticism is normal and necessary when examining new technologies, and some of these questions and risks are serious and essential to sort out. At the same time, we can have a touch of optimism and appreciate the possibilities of the future. After spending this time creating with AI, I remain hopeful that human artists will continue to evolve with and without AI, and that AI art capabilities will help push humans to new heights.

Written by Elizabeth Shubov. Elizabeth is an Emerging Technology Consultant, Attorney, and an Advisor with The Cantellus Group.

These blogs by TCG Advisors express their views and insights. The strength and beauty of our team is that we encompass many opinions and perspectives, some of which will align, and some of which may not. These pieces are selected for their thoughtfulness, clarity, and humor. We hope you enjoy them and that they start conversations!