The answer was 90% correct, but one part was wrong (IMHO).
I asked a follow-up: “Can you check other sources? My understanding is that blah is true, but you say it isn’t?”
I then got confirmation: “Yes, blah is true, contrary to earlier information,” and it went on to give me a detailed description along with citations.
So, is ChatGPT learning on the fly, going back and doing more detailed LLM analysis, or faking it by taking my assertion and re-working it into the answer?
I don’t know the answer to your question, but I’ve encountered similar issues. I’ll get an answer that I know is mostly correct, but I also know there is an error. I then tell ChatGPT, “Your response about x is not correct.” I get a reply saying, “You are correct; x is …”
I’m not sure this is self-correction; otherwise, one would not have to tell the AI it is wrong. Perhaps it is like a tutor saying to a student, “Your answer is almost correct. Go back and check how you worked the equation.” The student reexamines their work and then returns having found and corrected the error in their calculations.
My “guess” is that the AI is doing a quick LLM analysis or web search.
This is roughly what RAG (Retrieval-Augmented Generation) does: instead of relying only on what the model memorized during training, it retrieves relevant passages from an external dataset and includes them in the prompt so the answer is grounded in that data. It’s not perfect and has a long way to go, though, in my trials at least.
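To make that concrete, here is a minimal, self-contained sketch of the retrieval step in a RAG pipeline. The documents, the word-overlap scoring, and the prompt template are all illustrative assumptions, not any particular library’s API; real systems use embeddings and a vector store, but the shape is the same: fetch relevant passages first, then ask the model to answer from them.

```python
# Minimal sketch of RAG's retrieval step (illustrative only).
# Real systems score with embeddings and a vector store; here we use
# naive word overlap so the example runs with no external services.

def retrieve(query, documents, k=2):
    """Rank documents by word overlap with the query and return the top k."""
    query_words = set(query.lower().split())
    scored = [(len(query_words & set(doc.lower().split())), doc) for doc in documents]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for score, doc in scored[:k] if score > 0]

def build_prompt(query, documents):
    """Prepend retrieved passages so the model answers from them, not from memory."""
    context = "\n".join(retrieve(query, documents))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

docs = [
    "The Texas Rangers won the 2023 World Series.",
    "The Houston Astros won the 2022 World Series.",
]
print(build_prompt("Who won the 2023 World Series?", docs))
```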
It’s not self-correcting, I can tell you that for sure.
For kicks and giggles, last week I gave it a reference and asked for a theme. In case you’re wondering, I was curious how it would interpret a Bible passage. It gave me a theme (within reason), and I had it re-generate to be sure. Then I started correcting it, first with a more specific theme, which it agreed with and reworked its answer to fit. The next time I gave it a polar-opposite, fully incorrect theme, and it just did the whole “you are correct, blah blah” routine.
Not sure if you’d call that actual self-correction, or if it has just learned that when it hears a phrase like “not correct,” it should rework the results.
In general, an LLM is trained on a set of data and ‘knows’ nothing beyond that cutoff. It does not learn on the fly.
For example, I just asked ChatGPT who won the 2023 World Series of baseball. This was the answer:
As of now, the 2023 World Series of baseball has not yet occurred. The event is typically held in October, so the winner of the 2023 World Series is yet to be determined.
We then had this exchange:
Me: The Texas Rangers won the 2023 world series of baseball.
ChatGPT: I’m sorry for the confusion, but as of now, the 2023 World Series of baseball has not taken place yet. Therefore, the winner of the 2023 World Series is yet to be determined.
Now, this was GPT-3.5-turbo, and newer models may handle this better.
Just for fun, I asked:
Who won the 2022 world series of baseball?
The response:
The Atlanta Braves won the 2021 World Series of baseball by defeating the Houston Astros.
When I repeated the question, I was told the 2022 World Series, like the 2023 series, has not yet occurred.
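That matches how the chat API works: a “correction” only exists in the list of messages you send back with each request; nothing is written into the model’s weights. Here is a minimal sketch with the OpenAI Python SDK (the model name and wording are placeholders, and you need your own API key):

```python
# Sketch: a "correction" only exists inside the conversation you send back.
# Requires the openai package and an API key; the model name is illustrative.
from openai import OpenAI

client = OpenAI()

messages = [
    {"role": "user", "content": "Who won the 2023 World Series of baseball?"},
]
first = client.chat.completions.create(model="gpt-3.5-turbo", messages=messages)
print(first.choices[0].message.content)

# The user's "correction" is just another message appended to the context.
messages.append({"role": "assistant", "content": first.choices[0].message.content})
messages.append({"role": "user", "content": "The Texas Rangers won the 2023 World Series."})
second = client.chat.completions.create(model="gpt-3.5-turbo", messages=messages)
print(second.choices[0].message.content)

# Start a fresh conversation and the model has "forgotten" the correction,
# because nothing was ever written back into its weights.
fresh = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Who won the 2023 World Series of baseball?"}],
)
print(fresh.choices[0].message.content)
```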
I use the paid ChatGPT and I definitely see a big difference from the older / free version.
I think I have also stumbled onto the divide-and-conquer rule. If I ask too much at once, or too broad a question, I get much better results by breaking it down into multiple queries and asking each one individually, with more specificity.
Totally conjecture, but assuming ChatGPT and similar tools have a limit on processing time or number of transforms per query, or similar “rate limits,” does a broader query mean a shallower but wider search of the LLM’s data?
So asking smaller, more specific questions allows each “run” to go deeper into that subject area of its training data and weight matrices?
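Here is a rough sketch of that divide-and-conquer pattern, again using the OpenAI Python SDK; the model name and the sub-questions are just illustrative assumptions. Each narrow question gets its own request, and you stitch the answers together yourself:

```python
# Sketch: ask several narrow questions instead of one broad one,
# then combine the answers yourself. Model name and questions are
# illustrative; any chat completion API works the same way.
from openai import OpenAI

client = OpenAI()

def ask(question: str) -> str:
    """Send a single focused question as its own conversation."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": question}],
    )
    return response.choices[0].message.content

broad_topic = "the 2023 World Series"
sub_questions = [
    f"Which teams played in {broad_topic}?",
    f"Who won {broad_topic}, and in how many games?",
    f"Who was named MVP of {broad_topic}?",
]

answers = {q: ask(q) for q in sub_questions}
for question, answer in answers.items():
    print(f"Q: {question}\nA: {answer}\n")
```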
That is also my experience. I usually have a conversation with ChatGPT (I am using the o1-Preview model now) and ask a series of focused questions. Answers are usually more accurate if the topic area is well known and documented – topics in science, for example. Asking about cultural topics such as books, film, etc., usually drifts off into hallucinated answers.
OTOH, I’ve noticed that the quality of Perplexity’s answers seems to be deteriorating. I cannot put my finger on it, but it seems that Perplexity gives very few in-depth answers, and the answers are padded out with a lot more text than is needed.