Can LLMs Provide References for Their Work? An Experiment


Summary

We know that LLMs hallucinate. And we know that we can overcome some of that problem with better prompts. But can LLMs provide references for their work? I conducted an experiment.

[Image: Older Man Talking to Robot Librarian]

Recently my friend Scott Mace asked this on X in response to a post about the evolution of software and how LLMs will likely follow that same trajectory:

Yes, but how trustworthy will it be? Can I at least see the links it used to create its response?
From X
Referenced 2024-05-21T18:09:06-0400

The short answer is "no", but the real answer is a bit more nuanced because of how LLMs like ChatGPT work. First the short answer: LLMs don't understand what they're saying, so they can't cite the material they used to create a response. They don't work like that. They don't go out, do a bunch of research, and then compose a response. Instead, lots (and I mean LOTS) of material is fed into the program that creates the model, a multi-dimensional matrix of word relationships. Then when you ask a question ("give it a prompt" in LLM-speak), it processes the question and generates text based on the probabilities of words in its model1. There's no way it can satisfy Scott directly because the model doesn't (and, as far as I know, can't) include the references used to create the text; it generates the text on the fly from material that has been thoroughly blended together.
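To make "probabilities of words" concrete, here's a toy sketch of that generation loop. Everything here is illustrative: the next_token_probs table stands in for the billions of learned parameters in a real model, but the shape of the loop is the point. Notice that nothing in it records where the probabilities came from, which is exactly why the model can't cite its sources.

    import random

    def next_token_probs(context):
        # Stand-in for the model: map a context to a probability
        # distribution over possible next tokens. A real LLM computes
        # this with billions of learned parameters, not a lookup table.
        table = {
            ("zero", "trust"): {"is": 0.6, "assumes": 0.3, "<end>": 0.1},
            ("trust", "is"): {"a": 0.7, "the": 0.2, "<end>": 0.1},
            ("trust", "assumes"): {"breach": 0.8, "<end>": 0.2},
        }
        return table.get(context, {"<end>": 1.0})

    def generate(prompt_tokens, max_tokens=10):
        tokens = list(prompt_tokens)
        for _ in range(max_tokens):
            probs = next_token_probs(tuple(tokens[-2:]))  # tiny context window
            choices, weights = zip(*probs.items())
            token = random.choices(choices, weights=weights)[0]  # sample one token
            if token == "<end>":
                break
            tokens.append(token)
        return " ".join(tokens)

    print(generate(["zero", "trust"]))  # e.g. "zero trust is a" or "zero trust assumes breach"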

But, in fairness, that's not what people do most of the time either. If you asked me over lunch or at the end of a tech presentation to describe zero trust and its pros and cons, I'd start talking. I wouldn't start looking up things on the web and somehow weave them into text complete with references. I'd go look for references if you asked me to back up something I said. So, let's try that with ChatGPT 4o. I gave it the following prompt:

You are a writer for a website focusing on technology. Your specialty is security. Write a 150 word explanation of zero trust with three pros and three cons.

ChatGPT answered with this text:

Zero Trust is a security model based on the principle of "never trust, always verify." Unlike traditional security models that assume everything inside the network is safe, Zero Trust assumes that threats can exist both inside and outside the network. It requires strict identity verification for every user and device attempting to access resources, regardless of their location.

Pros:

  1. Enhanced Security: By continuously verifying users and devices, Zero Trust reduces the risk of breaches and unauthorized access.
  2. Minimized Attack Surface: It limits access to only what is necessary, reducing potential entry points for attackers.
  3. Improved Compliance: Helps meet regulatory requirements by ensuring strict access controls and logging all access attempts.

Cons:

  1. Complex Implementation: Deploying Zero Trust can be complex and time-consuming, requiring a significant overhaul of existing systems and processes.
  2. Increased Costs: The initial investment in technology and training can be high.
  3. User Experience Impact: Continuous verification can lead to friction and inconvenience for users if not implemented smoothly.

Zero Trust is essential for modern cybersecurity, but it requires careful planning and resources to implement effectively.

That's pretty good. But what if we need some references to other material to back up the pros and cons? I asked ChatGPT to supply them:

Now repeat that task, but this time include references as hyperlinks to back up each pro and con.

ChatGPT answered with this text:

Zero Trust is a security model based on the principle of "never trust, always verify." Unlike traditional security models that assume everything inside the network is safe, Zero Trust assumes that threats can exist both inside and outside the network. It requires strict identity verification for every user and device attempting to access resources, regardless of their location.

Pros:

  1. Enhanced Security: By continuously verifying users and devices, Zero Trust reduces the risk of breaches and unauthorized access (CSO Online).
  2. Minimized Attack Surface: It limits access to only what is necessary, reducing potential entry points for attackers (NIST).
  3. Improved Compliance: Helps meet regulatory requirements by ensuring strict access controls and logging all access attempts (Gartner).

Cons:

  1. Complex Implementation: Deploying Zero Trust can be complex and time-consuming, requiring a significant overhaul of existing systems and processes (Forrester).
  2. Increased Costs: The initial investment in technology and training can be high (ZDNet).
  3. User Experience Impact: Continuous verification can lead to friction and inconvenience for users if not implemented smoothly (TechRepublic).

Zero Trust is essential for modern cybersecurity, but it requires careful planning and resources to implement effectively.

That looks great! Nice references for each bullet. There's just one problem: all the links except the NIST reference are dead. I don't know if they're wrong, made up, or stale. I tried searching a few of the cited sites for the content and came up dry. As I heard Johannes Ernst say once, ChatGPT is a "large language model," not a "large fact model," and this experiment bears that out.
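Checking links like these by hand gets tedious, but it's easy to automate. Here's a rough Python sketch using the requests library; the URLs are placeholders standing in for whatever links ChatGPT gives you in your own session.

    import requests

    # Placeholders: substitute the URLs from your own ChatGPT session.
    urls = [
        "https://www.nist.gov/publications/zero-trust-architecture",
        "https://www.example.com/zero-trust-explained",
    ]

    for url in urls:
        try:
            # HEAD is cheap; some servers reject it, so fall back to GET.
            resp = requests.head(url, allow_redirects=True, timeout=10)
            if resp.status_code >= 400:
                resp = requests.get(url, allow_redirects=True, timeout=10)
            ok = resp.status_code < 400
        except requests.RequestException:
            ok = False
        print(("OK  " if ok else "DEAD") + "  " + url)

Of course, a reachable page isn't the same as a supporting source; you still have to read what's behind the link.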

Still, asking for references or evidence to back up statements is something people are going to want. So I tried a different tack and asked this:

Why do you say complex implementation is a problem for zero trust? Can you give me a few references on that I can read myself?

ChatGPT responded with this:

Implementing a Zero Trust security model can be challenging for several reasons:

  1. Overall Complexity: Zero Trust requires a comprehensive understanding of your network's intricacies, including users, devices, applications, and data. This can involve significant changes to existing infrastructure, which can be disruptive and costly. Organizations need to be prepared for the extensive planning and resources required for successful implementation (Help Net Security) (Enterprise Networking Planet).
  2. Legacy Systems: Many organizations have legacy systems that are not designed to operate within a Zero Trust framework. These systems might require different security measures or even complete replacement, adding to the complexity and cost (Help Net Security) (Enterprise Networking Planet).
  3. Cultural Shift: Zero Trust is not just a technological shift but also a cultural one. It requires continuous verification and strict access controls, which can lead to user frustration and resistance if not managed properly. Ensuring user buy-in and minimizing disruption to workflows through training and communication is essential (Deloitte United States) (Guidehouse).
  4. Resource Intensive: Implementing and maintaining Zero Trust can put a strain on IT resources due to the need for continuous monitoring and management. The need for advanced security tools and ongoing updates further adds to the resource demands (Help Net Security) (Enterprise Networking Planet).
  5. Stakeholder Alignment: Achieving buy-in from all stakeholders and aligning them with the Zero Trust principles is critical. This requires a coordinated effort to educate stakeholders, prioritize gaps, and ensure consistent understanding and application of Zero Trust principles across the organization (Deloitte United States) (Guidehouse).

These challenges highlight the importance of a phased and well-planned approach to Zero Trust implementation, considering both technological and human factors. For more detailed insights, you can refer to articles from Help Net Security and Guidehouse.

Much better. The links in the bullets all work, but the ones at the end don't. The articles it references aren’t very detailed, so I wouldn’t say they’re stellar references, but at least there’s a pointer to outside material that discusses similar challenges. So, partial success.

I've found that when you use ChatGPT for research, the best way to gain confidence that the material is correct, and to find supporting sources, is to treat it like a librarian and ask follow-up questions, just as I did above. I'm sure you could wrap helper functions around an LLM to make getting references easier; a sketch of that idea follows. But for now, interrogation through chat is the best way to get comfortable with the results.
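As a sketch of what such a helper might look like, here's a minimal follow-up loop using the OpenAI Python client (the openai package). The model name, prompts, and two-step structure are my assumptions; the point is that each follow-up question is sent along with the full conversation history, the same way the chat UI does it.

    from openai import OpenAI

    client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

    def ask(history, prompt, model="gpt-4o"):
        """Send a prompt as the next turn of an ongoing conversation."""
        history.append({"role": "user", "content": prompt})
        resp = client.chat.completions.create(model=model, messages=history)
        answer = resp.choices[0].message.content
        history.append({"role": "assistant", "content": answer})
        return answer

    history = [{"role": "system",
                "content": "You are a writer for a technology website. "
                           "Your specialty is security."}]

    draft = ask(history, "Write a 150 word explanation of zero trust "
                         "with three pros and three cons.")

    # Interrogate the answer like a librarian: ask it to back up a claim.
    print(ask(history, "Why do you say complex implementation is a problem "
                       "for zero trust? Give me a few references I can read."))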


Notes

  1. Yeah, this is really simplified and thus wrong, but it's directionally correct. For a better explanation, see Stephen Wolfram's What Is ChatGPT Doing … and Why Does It Work?

Photo Credit: Older Man Talking to Robot Librarian from DALL-E (public domain) Prompt: draw me a rectangular picture of an older man talking with a robot librarian. Use earth tones.


