How will ChatGPT impact my job or agency mission? That question is top of mind for many of our government clients, and for good reason: generative AI tools like ChatGPT, powered by large language models (LLMs), promise to revolutionize the way we interact with vast amounts of data, ranging from text and audio to images and video. With potential mission applications such as augmenting customer service and streamlining data analysis, the allure is understandable.
But like all revolutionary technologies, the reality is more nuanced. In our work helping federal agencies explore the potential of generative AI, we recommend a measured approach to incorporating this tech into your operations. Here are four observations from the front lines.
4 reasons to approach LLM government use cases with caution
LLMs are not built to give the right answer. They’re built to give you an answer that sounds right. If you ask one, “What is the capital of the United States?” it won’t tell you “Washington, D.C.” because it knows that’s the answer. It tells you that because it sounds right. LLMs are only as good as the data they were trained on and the context and prompt they’re given, and sometimes they’ll give an incorrect answer just for the sake of providing one. These made-up answers are called “hallucinations.” For example, if you ask an LLM, “Who are the first six members of this team?” after having it analyze an “About Us” page that lists only five team members, the LLM could make up a sixth name in response. These limitations have to be mitigated with checks, balances, and a human to validate the answer. (Editor’s Note: For more on this topic, learn how we used an LLM to support the mission of HIV.gov.)
Never put information from an LLM out into the world without looking at it, because there is a chance that something’s not right.
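One way to make that review routine is to flag any answer that names something the source material never mentions, then route it to a person. The sketch below is a minimal illustration under stated assumptions, not a production safeguard: the function name `unsupported_names`, the sample team names, and the simple "Firstname Lastname" pattern are all invented for this example, and no real LLM is called.

```python
import re

def unsupported_names(answer: str, source_text: str) -> list[str]:
    """Return two-word capitalized names in the answer that are absent from the source."""
    # Assumption for this sketch: a name looks like "Firstname Lastname".
    candidates = re.findall(r"\b[A-Z][a-z]+ [A-Z][a-z]+\b", answer)
    flagged = []
    for name in candidates:
        # Keep each unsupported name once, in the order it appears.
        if name not in source_text and name not in flagged:
            flagged.append(name)
    return flagged

# Hypothetical "About Us" text listing five people.
about_us = "Our team: Ana Lopez, Ben Carter, Chen Wei, Dana Smith, Eli Brown."

# Hypothetical LLM answer that invents a sixth member.
llm_answer = (
    "The first six members are Ana Lopez, Ben Carter, Chen Wei, "
    "Dana Smith, Eli Brown, and Frank Miller."
)

flagged = unsupported_names(llm_answer, about_us)
needs_human_review = bool(flagged)
print(flagged, needs_human_review)
```

A check like this only catches one narrow kind of hallucination. It can narrow down what a reviewer looks at first, but it does not replace the reviewer.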
It is not a zero-sum game. LLMs are augmentation technology; they are not designed to replace humans. If you try to replace humans with them, you'll run into problems. Some will be technical, like getting the wrong answers, and some will be emotional, like anxiety in the workplace.
LLMs won’t take jobs away, but leaders need to be aware of the anxiety and its cause so it can be addressed quickly. Workforce training and upskilling can help.
LLMs have bias and ethical blind spots. For example, if an LLM is analyzing survey responses, it could give more prominence to responses that are worded well and downplay those that aren’t.
Bias and blind spots can be mitigated: First, assume the LLM is giving slanted responses and write prompts that counteract the slant; second, try to select an LLM that has built-in safeguards against bias; and third, never let AI-generated answers be the final draft. Always have them reviewed by an expert.
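The first and third of those steps can be sketched in a few lines of code. What follows is a hedged illustration only: the preamble wording, the names `build_prompt`, `Draft`, and `publish`, and the sample survey responses are all hypothetical, and the sketch does not call any particular LLM service.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical counter-bias instruction; tune the wording for your model and task.
FAIRNESS_PREAMBLE = (
    "Weigh every survey response equally, regardless of grammar, spelling, "
    "or writing style. Summarize what respondents mean, not how well they write."
)

def build_prompt(task: str, responses: list[str]) -> str:
    """Prepend the counter-bias instruction and number each response."""
    numbered = "\n".join(f"{i}. {text}" for i, text in enumerate(responses, start=1))
    return f"{FAIRNESS_PREAMBLE}\n\nTask: {task}\n\nResponses:\n{numbered}"

@dataclass
class Draft:
    text: str
    reviewed_by: Optional[str] = None  # expert who signed off, if any

def publish(draft: Draft) -> str:
    """Refuse to release AI-generated text that no expert has reviewed."""
    if not draft.reviewed_by:
        raise PermissionError("Draft needs expert review before release.")
    return draft.text

prompt = build_prompt(
    "Summarize the main themes.",
    ["the wait was to long", "Service exceeded my expectations."],
)
print(prompt)
```

Making the review gate a hard failure rather than a warning is a deliberate choice in this sketch: an unreviewed draft cannot be published by accident.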
Security and sustainability are still key. All LLMs use the same fundamental technology, but the infrastructure and security differ. Instead of using a public service that stores data on external servers, like ChatGPT, agencies can use private deployments on cloud platforms like Azure or AWS to access an LLM from within their secure virtual network. There’s even an option to host an LLM within an organization’s own physical space.
Organizations should consult their legal departments, or even form a fusion team with legal, the CIO, and CTO, before trying an LLM—especially since the user agreements are not always clear.
Furthermore, agencies need to be mindful of the environmental impact that comes with training LLMs. These models are resource-intensive, and agency leaders should carefully evaluate the sustainability of their large-scale deployment—aiming to strike a balance between using existing models and striving for new innovations.
LLMs bring technology, cyber, legal, ethics, and sustainability stakeholders to the same table, which can make it difficult to agree on how to move forward. Some organizations may consider banning LLMs completely, but the technology is here to stay. Instead of restricting LLMs, organizations should focus on educating people to use them responsibly and ethically.
New technologies will always disrupt, but the speed of innovation with LLMs is unprecedented: It took Netflix about three and a half years to reach 1 million users, whereas ChatGPT took only five days, and we’ve already seen several iterations since its launch in November 2022. That's a lot to keep up with, so it’s important to be empathetic as you educate employees and explore the art of the possible.