If the screwdriver were invented by the tech industry today, it would be widely deployed for a variety of tasks, including hammering nails. Since the debut of ChatGPT, there has been growing fervor for, and backlash against, large language models (LLMs). Indeed, many applications of the technology seem misguided, and its capabilities are overhyped, given how often its output is simply wrong. That is not to say there aren't many great uses for LLMs, but you should answer some key questions before going full bore.
Is an LLM going to be better than, or at least equal to, human responses?
Does anyone like those customer service chatbots that can't answer any question that isn't already on the website's front page? On the other hand, talking to a customer service representative who just reads a script and isn't empowered to help is equally frustrating. Any deployment of an LLM should test whether it is equal to or better than the chatbot or human responses it is replacing.
What is the liability exposure?
In our litigious society, any new process or technology should be evaluated for its potential legal exposure. There are obvious areas for caution, like medicine, law, or finance, but what about an LLM-generated answer that directs people to a policy or to advice that is misleading, inappropriate, or worse? Bad company policies often result in class action lawsuits. By increasing the scale of customer interactions, an improperly trained or constrained LLM could create even greater unintended liability.
Is an LLM actually cheaper?
Sure, it is easy to measure what you spend on a subscription to, and usage of, a general-purpose LLM like ChatGPT, but more specific custom systems can carry costs beyond just compute power. What about the staff and other infrastructure needed to maintain and debug the system? You can hire quite a few customer service reps for the price of one AI expert. Additionally, ChatGPT and similar services seem to be subsidized by investor money at the moment. Presumably at some point their providers will want to turn a profit, and then your costs could go up. Is that LLM actually cheaper, and will it stay that way for the life of your system?
How will you maintain it?
Most enterprise LLM systems will be custom-trained on specific data sets. A disadvantage of the neural networks on which LLMs rely is that they are notoriously difficult to debug. As the technology progresses, LLMs may develop the ability to revise, erase, or "unlearn" something false that they have learned. But for now, unlearning can be quite difficult. What is your process for regularly updating the LLM and eliminating bad responses?
What is your testing process?
A key benefit of an LLM is that you don't have to anticipate every possible permutation of a question for the model to provide a credible answer. However, "credible" doesn't mean correct. At a minimum, the most common questions and their various permutations should be tested. If your LLM will be replacing a human or an existing automated process, the questions people are asking today would be a good data set to start with.
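To make that concrete, here is a minimal Python sketch of such a regression test, assuming you have a log of the questions customers ask today and can write down the facts a correct answer must contain. The `TestCase` class, the `fake_llm` stand-in, and the sample questions are hypothetical placeholders, and the keyword check is a deliberately crude substitute for real grading, which may well need human review. The point is that a fluent, credible-sounding answer still fails if it leaves out the facts that make it correct.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class TestCase:
    question: str            # a question customers ask today (hypothetical sample)
    must_mention: list[str]  # facts a correct answer must contain


def run_regression(ask: Callable[[str], str],
                   cases: list[TestCase]) -> list[tuple[str, list[str]]]:
    """Return (question, missing_facts) for every case the model fails."""
    failures = []
    for case in cases:
        answer = ask(case.question).lower()
        # Crude check: each required fact must appear verbatim (case-insensitive).
        missing = [fact for fact in case.must_mention if fact.lower() not in answer]
        if missing:
            failures.append((case.question, missing))
    return failures


def fake_llm(question: str) -> str:
    # Hypothetical stand-in for a real model call; swap in your provider's client.
    if "refund" in question.lower():
        return "Refunds are available within 30 days with a receipt."
    return "I'm not sure, please check our website."


# Common questions plus permutations of them, as the text suggests.
cases = [
    TestCase("How do I get a refund?", ["30 days", "receipt"]),
    TestCase("Can I return an item for a refund after a month?", ["30 days"]),
    TestCase("What are your store hours?", ["9 a.m."]),
]

failures = run_regression(fake_llm, cases)
for question, missing in failures:
    print(f"FAIL: {question!r} missing {missing}")
```

Rerunning the same cases after every retraining or prompt change turns "credible" into something you can measure, and the failures become a concrete list of bad responses to fix.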
There is an old proverb of dubious provenance that translates roughly to "slow down, I'm in a hurry." Not everything will be a great use case for LLMs, and there is ample evidence that enthusiasm is outstripping capabilities. However, by measuring quality and cost, and by putting sound maintenance and testing procedures in place, you can make LLMs a valuable tool in many different use cases.
Copyright © 2024 IDG Communications, Inc.