ChatGPT unreliability

jcarucci · June 12, 2023, 1:54pm

I’ve been running an experiment: for queries I would normally do on Google, I go to ChatGPT first. I’m surprised at how often ChatGPT is wrong.

I’m a big baseball fan, and I was talking with a friend about .400 hitters and how long it has been since anyone did it (Ted Williams, 1941). So I asked ChatGPT for the list: “How many hitters hit over .400 in a season?” Its response was a list of 4:

I knew the list was longer than that, so I ran the same query on Google and got the real answer, which is 42:

I’m also a software developer, so I’ve asked ChatGPT to write some code. It’s impressive how quickly it can spit out boilerplate, but the code is often wrong. You can’t really trust ChatGPT; it still takes expertise to figure out whether it is giving you a BS answer or not.

What is all the hype about? Why are people worried about an AI that can’t get simple facts straight? How is that going to replace humans when it can’t even replace Google?

ryanjamurphy · June 12, 2023, 2:15pm

These kinds of queries are not what chatbots should be used for, at least not yet.

These machines understand nothing about the world. They are simply probabilistic generators of text, seeded by a bit of randomness.
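As a rough illustration of "probabilistic generator, seeded by randomness", here is a toy sketch. The `NEXT_WORD` table and its probabilities are made up for this example; a real model learns its probabilities over tokens from a huge corpus and is far more elaborate.

```python
import random

# Made-up next-word probabilities (assumption for illustration only).
NEXT_WORD = {
    "<start>": {"four": 0.6, "many": 0.4},
    "four": {"hitters": 1.0},
    "many": {"hitters": 1.0},
    "hitters": {"hit": 1.0},
    "hit": {"over": 1.0},
    "over": {".400": 1.0},
    ".400": {"<end>": 1.0},
}

def generate(rng: random.Random) -> str:
    """Pick each next word at random, weighted by the table above."""
    word, words = "<start>", []
    while True:
        options = NEXT_WORD[word]
        word = rng.choices(list(options), weights=list(options.values()), k=1)[0]
        if word == "<end>":
            return " ".join(words)
        words.append(word)

# Different seeds can give different, equally fluent sentences.
for seed in range(3):
    print(seed, generate(random.Random(seed)))
```

Every output has the shape of an answer, and nothing in the loop checks whether it is true.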

When asked a question, they know what an answer looks like.

In some kinds of questions, the shape of a right answer is meaningless if the contents of the shape are wrong.

However, there are many kinds of questions where there is no right answer, and the shape of the answer is all that matters. That’s what these things are (currently) for.

The above in a flowchart:

rkaplan · June 12, 2023, 2:57pm

You will get much better results using Bing AI Chat for those sorts of questions rather than ChatGPT.

ChatGPT is better for editing suggestions.

jcarucci · June 12, 2023, 3:52pm

How useful is that? I manage a team of developers. When someone first starts, I check all of their work. But as time goes on, trust builds, and the best developers are the ones I can assign a task to and know they will do it right. The worst are the ones I never trust and whose work I always have to check. It sounds like ChatGPT is the second kind of developer, the kind where you’d rather do the work yourself because you never know when it will just make stuff up.

webwalrus · June 12, 2023, 4:04pm

In many situations, the second part of your sentence (“you never know when it will just make stuff up”) can be true without the first part (“you’d rather do the work yourself”) being true.

Asking ChatGPT for a dozen points regarding a book you’re working on could be useful. If you’re a subject matter expert, you might realize one of them isn’t important - but numbers 9 and 11 might be something you hadn’t considered.

Asking ChatGPT for an argument against the position you’re taking in an op-ed might give you additional insight to work from in your article.

It’s a tool for augmenting your work, not a person to replace a worker. Just like a drill doesn’t turn me into a professional carpenter - but a drill saves a professional carpenter a ton of time if used properly.

Bmosbacker · June 12, 2023, 4:22pm

I’ve found that ChatGPT can be useful as a catalyst for articles and presentations. If I provide enough detail in my query, it will often generate a useful outline or topic ideas as a starting point. It will sometimes offer points I’d not considered. It certainly cannot do the work for me, nor would I want it to.

webwalrus · June 12, 2023, 4:56pm

ChatGPT Prompt: give me six bullet points for an article about why chatgpt is unreliable

Response:

Lack of Contextual Understanding: One of the major reasons why ChatGPT can be unreliable is its limited ability to grasp the context of a conversation. It often generates responses based on patterns it has learned from training data, rather than fully understanding the nuances of the conversation at hand. This can lead to inaccurate or irrelevant responses.
Inconsistent and Contradictory Answers: ChatGPT’s responses can vary widely depending on how a question is phrased or the order in which information is provided. This inconsistency can make it difficult to rely on the model for accurate and coherent answers. Users may receive different responses to the same question, leading to confusion and mistrust.
Propensity for Biased or Inappropriate Content: Since ChatGPT learns from vast amounts of text data available on the internet, it can sometimes generate biased or inappropriate responses. This is particularly true when it comes to sensitive topics such as race, gender, or politics. Without careful monitoring and filtering, the model’s responses can reinforce stereotypes or propagate misinformation.
Limited Fact-Checking Capabilities: While ChatGPT can provide information on a wide range of topics, it lacks the ability to fact-check its responses. It may inadvertently present inaccurate or outdated information as factual, without the ability to verify the correctness of its statements. Users must independently fact-check any information provided by the model.
Vulnerability to Adversarial Inputs: ChatGPT is susceptible to manipulation through carefully crafted inputs. Adversaries can exploit the model’s weaknesses and biases by intentionally phrasing questions or providing information in a way that leads to misleading or harmful responses. This vulnerability makes it unreliable in situations where the model’s output can have significant consequences.
Lack of Accountability and Transparency: As an AI model, ChatGPT lacks transparency in its decision-making process. It can be challenging to understand how the model arrived at a specific response or to identify any underlying biases or errors. Additionally, there is limited accountability for the model’s actions, making it difficult to address or rectify any issues that arise from its unreliability.

jcarucci · June 12, 2023, 5:06pm

This is fantastic!

(side note) I originally wrote “This is great!” but it was below the 20 char minimum. So I asked ChatGPT to give me a more than 20 character alternative to: this is great. It responded “This is fantastic!”, which if you notice is only 18 characters. So not being able to count is another flaw of ChatGPT!
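The count is easy to confirm. A minimal check, assuming the forum minimum is the 20 characters mentioned above (`MIN_CHARS` and `meets_minimum` are names invented for this sketch):

```python
MIN_CHARS = 20  # forum minimum as described in the post (assumption)

def meets_minimum(text: str, minimum: int = MIN_CHARS) -> bool:
    """Return True if the text is at least `minimum` characters long."""
    return len(text) >= minimum

for candidate in ("This is great!", "This is fantastic!"):
    print(f"{candidate!r}: {len(candidate)} chars, ok={meets_minimum(candidate)}")
```

Both phrases fall short of the minimum, so the suggested replacement would not have passed on its own.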

Ulli · June 12, 2023, 5:10pm

I am surprised that you are surprised!
I thought it was no secret, at least in this community, that ChatGPT is simply not able, not constructed, and not intended to give you correct answers to your questions!?

A hammer is a very useful tool, but you could not use it to saw wood…!

jcarucci · June 12, 2023, 5:15pm

Yeah, I guess I have a different definition of intelligence. “Making stuff up if I don’t know the answer” is not how I would describe a person I consider intelligent.

Ulli · June 12, 2023, 5:17pm

Yes, that is why I already wrote in another thread that there is no AI out there!
ChatGPT is no AI, it is an LLM!
Simply a cleverly written piece of software with a single purpose…

jcarucci · June 12, 2023, 5:28pm

You are right, Bing works much better. It even gives you options like “More Creative” and “More Precise”. I chose More Precise and asked my over .400 question there, and it gave the right answer, plus links to verify. This is cool!

rkaplan · June 12, 2023, 5:29pm

@webwalrus

Or alternatively:

give me three bullet points for an article about why chatgpt is unreliable and three more about why chatgpt is reliable

ChatGPT

Why ChatGPT is Unreliable:

Inherent Bias: ChatGPT, like any other AI language model, is trained on large datasets that reflect the biases and prejudices present in the data. This can result in biased responses that perpetuate stereotypes, promote misinformation, or exhibit discriminatory behavior. The model lacks the ability to critically analyze information and may provide unreliable or misleading answers as a result.
Lack of Contextual Understanding: While ChatGPT can generate coherent responses, it often struggles with comprehending the context and nuances of a conversation. It may misinterpret queries or fail to grasp the underlying intent of the user, leading to inaccurate or irrelevant responses. This limitation makes the model unreliable in situations that require a deep understanding of complex topics or require precise and accurate information.
Generation of False Information: ChatGPT does not possess fact-checking capabilities, and it can generate plausible-sounding but false information. It relies solely on patterns and associations learned from its training data, without external verification mechanisms. Consequently, it may provide unreliable answers when confronted with ambiguous or unverified queries, potentially spreading misinformation.

Why ChatGPT is Reliable:

Vast Knowledge Base: ChatGPT is trained on a vast corpus of text from various sources, allowing it to draw from a wealth of information. It can provide accurate and detailed responses on a wide range of topics, making it a valuable tool for retrieving factual information, historical data, or general knowledge. This breadth of knowledge enhances its reliability as an information resource.
Continuous Learning and Improvement: OpenAI regularly updates and refines ChatGPT based on user feedback, addressing its limitations and enhancing its capabilities. By continually learning from its interactions with users, the model becomes more accurate, reliable, and responsive over time. This commitment to improvement ensures that ChatGPT evolves into a more trustworthy and dependable conversational partner.
Flexibility and Adaptability: ChatGPT’s ability to generate coherent and contextually appropriate responses, even in open-ended conversations, contributes to its reliability. It can engage in meaningful discussions, understand and respond to user prompts, and adapt its tone and style to match the conversation’s context. This adaptability allows it to maintain consistency and generate reliable responses across a wide range of conversational scenarios.

Bmosbacker · June 12, 2023, 5:37pm

Other than that, it’s pretty useful!

tf2 · June 12, 2023, 5:54pm

Do the sources always check out? I haven’t had a chance to try Bing seriously, but for a while when ChatGPT blew up and people were posting their exchanges (including asking for sources), a lot of them were either invented links or they linked to something not particularly related.

jcarucci · June 12, 2023, 5:59pm

I tried it today and it had good links to Wikipedia and Baseball Almanac.

tf2 · June 12, 2023, 6:07pm

Remember when everyone fretted that Wikipedia was unreliable, too easily manipulated and a danger to learning and society? Ah, quaint times.

pantulis · June 12, 2023, 6:22pm

Bing is different from ChatGPT. Bing GPT uses the Bing index: it basically runs a Bing search, finds the most relevant pages, and then feeds those pages (and your original query) to the GPT engine. While ChatGPT can hallucinate URLs, Bing probably gets them right.
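A minimal sketch of that search-then-generate flow, not Bing's actual implementation. Everything here is hypothetical: the in-memory `INDEX`, the example.com URLs, the naive word-overlap ranking in `search`, and the prompt wording in `build_prompt`. The call to the language model itself is left out.

```python
from dataclasses import dataclass

@dataclass
class Page:
    url: str
    text: str

# Hypothetical stand-in for a search index; real engines rank very differently.
INDEX = [
    Page("https://example.com/400-hitters", "List of players who hit over .400 in a season"),
    Page("https://example.com/ted-williams", "Ted Williams hit .406 in 1941"),
    Page("https://example.com/carpentry", "How to use a drill and a saw"),
]

def search(query: str, index: list[Page], top_k: int = 2) -> list[Page]:
    """Rank pages by naive word overlap with the query (illustration only)."""
    terms = set(query.lower().split())
    scored = [(len(terms & set(p.text.lower().split())), p) for p in index]
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    return [p for score, p in ranked if score > 0][:top_k]

def build_prompt(query: str, pages: list[Page]) -> str:
    """Put the retrieved pages and the original query into one prompt."""
    sources = "\n".join(f"[{i}] {p.url}\n{p.text}" for i, p in enumerate(pages, start=1))
    return (
        "Answer using only these sources and cite them by number.\n\n"
        f"{sources}\n\nQuestion: {query}"
    )

query = "who hit over .400 in a season"
pages = search(query, INDEX)
# In a real system this prompt would now be sent to the language model.
print(build_prompt(query, pages))
```

Because the URLs come from the search results rather than from the model's own text generation, the links shown alongside the answer can be real pages, although the model can still misread or misquote them.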

JKoopmans · June 12, 2023, 6:35pm

Curious who wrote that piece in Wikipedia…

I’m always amazed at the value people attribute to Wikipedia as a valid source of information, while at the same time asking where ChatGPT got its sources.

From wikipedia:

Wikipedia:Wikipedia is not a reliable source - Wikipedia.

Wikipedia is only as reliable as the competence of the people taking the time to check the articles and willing to go through the process of correcting mistakes (and there are many, many mistakes).
Anyone can put something in, or edit something, and if it is not picked up then it will apparently become “truth”.

jcarucci · June 12, 2023, 6:46pm

Which is why I mentioned Baseball Almanac too. From my limited experience today, Bing GPT usually gives 5 sources: one from Wikipedia, plus some others. For the over .400 query it also gave mlb.com, forbes.com, sportsmanist.com, and bing.com, the Bing one just being a straight search.

One of the things I really like about Wikipedia is that it always has a list of References at the end. The one on over .400 hitters has 43 links to things like Baseball-Reference.com, newspaper articles, etc. So I feel like even if the article has mistakes, I can always dig in to the primary sources to verify. Not sure why this isn’t cited more often in arguments over Wikipedia’s reliability. Isn’t this what a traditional encyclopedia does to establish its usefulness? I always follow advice I got from a professor a long time ago: consider the encyclopedia as step 0 in your research. You would never use it as your primary source, but it points you in the right direction to get to the good sources.