WASHINGTON, February 3, 2023 — Just two months after its viral launch, ChatGPT reached 100 million monthly users in January, reportedly making it the fastest-growing consumer application in history — and raising concerns, both internal and external, about the lack of regulation for generative artificial intelligence.
Many of the potential problems with generative AI models stem from the datasets used to train them. The models will reflect whatever biases, inaccuracies and otherwise harmful content were present in their training data, but too much dataset filtering can detract from performance.
OpenAI has grappled with these concerns for years while developing powerful, publicly available tools such as DALL·E, an AI system that generates realistic images and original art from text descriptions, said Anna Makanju, OpenAI’s head of public policy, at a Federal Communications Bar Association event on Friday.
“We knew right off the bat that nonconsensual sexual imagery was going to be a problem, so we thought, ‘Why don’t we just try to go through the dataset and remove any sexual imagery so people can’t generate it,’” Makanju said. “And when we did that, the model could no longer generate women, because it turns out most of the visual images that are available to train a dataset on women are sexual in nature.”
Despite rigorous testing before ChatGPT’s release, early users quickly discovered ways to evade some of the guardrails intended to prevent harmful uses.
The model would not generate offensive content in response to direct requests, but one user found a loophole by asking it to write from the perspective of someone holding racist views — resulting in several paragraphs of explicitly racist text. When some users asked ChatGPT to write code using race and gender to determine whether someone would be a good scientist, the bot replied with a function that only selected white men. Still others were able to use the tool to generate phishing emails and malicious code.
OpenAI quickly responded with adjustments to the model’s filtering algorithms, as well as increased monitoring.
“So far, the approach we’ve taken is we just try to stay away from areas that can be controversial, and we ask the model not to speak to those areas,” Makanju said.
The company has also attempted to limit certain high-impact uses, such as automated hiring. “We don’t feel like at this point we know enough about how our systems function and biases that may impact employment, or if there’s enough accuracy for there to be an automated decision about hiring without a human in the loop,” Makanju explained.
However, Makanju noted that future generative language models will likely reach a point where users can significantly customize them based on personal worldviews. At that point, strong guardrails will need to be in place to prevent the model from behaving in certain harmful ways — for example, encouraging self-harm or giving incorrect medical advice.
Those guardrails should probably be established by external bodies or government agencies, Makanju said. “We recognize that we — a pretty small company in Silicon Valley — are not the best place to make a decision of how this will be used in every single domain, as hard as we try to think about it.”
Little AI regulation currently exists
So far, the U.S. has very little legislation governing the use of AI, although some states regulate automated hiring tools. On Jan. 26, the National Institute of Standards and Technology released the first version of its voluntary AI risk management framework, developed at the direction of Congress.
This regulatory crawl is being rapidly outpaced by the speed of generative AI research. Google reportedly declared a “code red” in response to ChatGPT’s release, speeding the development of multiple AI tools. Chinese tech company Baidu is planning to launch its own AI chatbot in March.
Not every company will respond to harmful uses as quickly as OpenAI, and some may not even attempt to stop them, said Claire Leibowicz, head of AI and media integrity at the Partnership on AI. PAI is a nonprofit coalition that develops tools and recommendations for AI governance.
Various private organizations, including PAI, have laid out their own ethical frameworks and policy recommendations. There is ongoing discussion about the extent to which these organizations, government agencies and tech companies should be determining AI regulation, Leibowicz said.
“What I’m interested in is, who’s involved in that risk calculus?” she asked. “How are we making those decisions? What types of actual affected communities are we talking to in order to make that calculus? Or is it a group of engineers sitting in a room trying to forecast for the whole world?”
Leibowicz advocated for transparency measures such as requiring standardized “nutrition labels” that would disclose the training dataset for any given AI model — a proposal similar to the label mandate announced in November for internet service providers.
A regulatory framework should be implemented while these technologies are still being created, rather than in response to a future crisis, Makanju said. “It’s very clear that this technology is going to be incorporated into every industry in some way in the coming years, and I worry a little bit about where we are right now in getting there.”
Originally posted on February 3, 2023 @ 4:50 pm