If we look beyond the eye-popping headlines ChatGPT has made, generative AI is still in its infancy. You can see why: recently, ChatGPT told a user that it takes “9 women 1 month to make 1 baby”. It got the math all mixed up, and that is just one of the scores of booboos it has made. Agreed, generative AI has unprecedented potential to transform the world, but the data it is trained on, sourced from the Internet, is poisoned with biases, flaws, and false information. Generative AI will make mistakes, and the consequences could be disastrous. For example, it could produce convincing-sounding misinformation for a high school student’s essay; when the teacher discovers it, the student fails. The technology also raises ethical questions about using information, data, and creativity that belong to someone else, much of it protected by copyright laws.
Today, Generative Pre-Trained Transformers (GPT), Large Language Models (LLMs), and text-to-image models such as Stable Diffusion and DALL-E 2 are generating art from text; AudioLDM and Moûsai are using latent diffusion models to generate music from text; and Opus AI is generating video games from text. In one stunning development, researchers Yu Takagi and Shinji Nishimoto from the Graduate School of Frontier Biosciences at Osaka University reconstructed high-resolution visual images from functional MRI (fMRI) scan data using Stable Diffusion. This is image-to-image generation, one step beyond text-to-text generation. Microsoft’s Kosmos-1, which is being pegged as a multimodal model, uses text, audio, images, and video as inputs to solve visual puzzles. In its paper, Language Is Not All You Need: Aligning Perception with Language Models, the Microsoft team shows several examples of the power of Multimodal Large Language Models (MLLMs). As an example, the model can look at a picture of two women tennis players, one of them blond, and answer the question, “What is the hairstyle of the blond called?” Answer: Ponytail (correct!).
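To make the text-to-image step concrete, here is a minimal sketch of how an open model such as Stable Diffusion is typically driven from code, using Hugging Face’s diffusers library. The checkpoint name, prompt, and settings are illustrative assumptions, not a reference to any particular product mentioned above.

```python
# A minimal, hedged sketch of text-to-image generation with an open
# Stable Diffusion checkpoint via Hugging Face's `diffusers` library.
# The checkpoint name and settings below are illustrative assumptions.
import torch
from diffusers import StableDiffusionPipeline

# Load a pretrained text-to-image pipeline (downloads weights on first run).
pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
if torch.cuda.is_available():
    pipe = pipe.to("cuda")  # a GPU is strongly recommended for reasonable speed

# Turn a short text prompt into an image.
prompt = "a watercolor painting of a lighthouse at dawn"
image = pipe(prompt, num_inference_steps=30).images[0]
image.save("lighthouse.png")
```

The interface is the same prompt-in, media-out pattern that the audio and video generators mentioned above expose; only the model and the type of output change.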
The problem is that the training data for these models comes from somewhere, and using it without permission poses serious ethical questions. It is the equivalent of stealing personal property. Artists are, rightly, up in arms against models that use their work as inputs. Designer, illustrator, and educator Steven Zapata discusses the elusive accountability of such models in his video, The End of Art: An Argument Against Image AIs. Says Zapata, “The performance of the model would not be possible without all of the data fed into it — much of it copyrighted.” Don’t miss his video because it presents some of modern technology’s most challenging questions. He also uses the opportunity to dive deep into how some technology companies, valued at billions of dollars, use “complex shells of for-profit and non-profit companies making it difficult to pinpoint where any wrongdoing occurred.”
This presents three clear problems:
1. The dangers of using biased, flawed, and false information as input
2. The hazard of using AI-generated content and passing it off as your own
3. The ethics of data laundering, or using someone else’s data and creativity to manufacture, run, and maintain AI products
The technology company that first solves these three challenges will emerge as the winner. Ethical AI that places integrity above all else is the area about to attract the next major investments. Keep a lookout for where venture capital is headed next in AI. My bet is on ethical AI.
OpenAI, the company that created ChatGPT and DALL-E 2, is on its way to solving at least one of these problems. In 2021, the company worked with Georgetown University’s Center for Security and Emerging Technology and the Stanford Internet Observatory to study how LLMs can be misused. OpenAI has since released an AI classifier designed to identify when text is written by AI. This, says OpenAI, can mitigate the effects of an “automated misinformation campaign,” of “using AI for academic dishonesty,” and of trying to pass off an “AI chatbot as a human.”
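For a sense of how such a detector is used in practice, here is a hedged sketch that calls an earlier, openly released model (the RoBERTa-based GPT-2 output detector that OpenAI published on Hugging Face), not the newer classifier described above. The model name and its labels are assumptions to verify against the model card.

```python
# A hedged sketch of flagging machine-written text with an openly released
# detector (the RoBERTa-based GPT-2 output detector), not OpenAI's newer
# classifier discussed above. The model name and label meanings are
# assumptions; treat the score as a signal, not proof.
from transformers import pipeline

detector = pipeline(
    "text-classification",
    model="openai-community/roberta-base-openai-detector",
)

sample = "Here is an essay paragraph whose provenance we want to check."
result = detector(sample)[0]
print(f"label={result['label']}, score={result['score']:.2f}")
```

Detectors like this are probabilistic and can be fooled by light editing, so their output is best treated as one signal among many rather than a verdict.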
There are several levels at which AI researchers and practitioners will need to operate. First, they must build models that filter out misinformation or are trained only on ethically sourced, non-biased data (preventing the over-representation of some variables). Second, generative AI models must be released with more care, even restricting usage, if necessary, to screened, authorized, and identifiable entities. Finally, governments and the media must play a more profound role in educating users. This need not be the last step; rather, it should be the first step in generative AI adoption.
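As one concrete illustration of the first level, here is a sketch of a simple pre-training audit that flags over-represented groups in a dataset. The column name, toy data, and threshold are hypothetical, and real bias audits go much further than counting rows.

```python
# A minimal sketch of a pre-training representation check: flag groups that
# are over-represented relative to a uniform share. The column name, toy
# data, and the 2x threshold are illustrative assumptions, not a standard.
import pandas as pd

def representation_report(df: pd.DataFrame, column: str, max_ratio: float = 2.0) -> None:
    """Print each group's share of the data and flag groups whose share
    exceeds max_ratio times a uniform share across all groups."""
    shares = df[column].value_counts(normalize=True)
    uniform_share = 1.0 / len(shares)
    for group, share in shares.items():
        flag = "  <-- over-represented" if share > max_ratio * uniform_share else ""
        print(f"{group:<10} {share:6.1%}{flag}")

# Toy example: one age group dominates the sample.
data = pd.DataFrame({"age_group": ["18-30"] * 70 + ["31-60"] * 25 + ["60+"] * 5})
representation_report(data, "age_group")
```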
We already see some of these changes; they are just not happening quickly enough. Labeled Faces in the Wild (LFW), a data set of thousands of faces used by ML and data scientists to train facial recognition models, now comes with a clear disclaimer: “Many groups are not well represented in LFW. For example, there are very few children, no babies, very few people over the age of 80, and a relatively small proportion of women. In addition, many ethnicities have very minor representation or none at all.” In the coming months, AI that uses ethically sourced, transparently shareable data and does not adversely impact society will rise to the top.