It’s literally a composite of other people’s work, whatever texts were used as training material for the LLM. There is no way for an LLM to output something that isn’t, in essence, a tapestry of other people’s words. It creates nothing.
This displays a common but fundamental misunderstanding of LLMs.
There are problems with the current state of the art, as the hallucinations show.
But if used correctly, LLMs can create.
One way to test whether you are plagiarising is to run your text through a plagiarism checker. Just paste your text from ChatGPT into the checker to see how it scores - Grammarly have a free one:
My experience is my ChatGPT outputs consistently fail plagiarism checks.
… That said, I just asked ChatGPT to write me a short story and it passed the plagiarism checker! I have also just asked ChatGPT whether ChatGPT plagiarises:
When using ChatGPT or any other language model, the output it generates is typically a combination of information it has been trained on, which includes publicly available text from a wide variety of sources. It does not directly copy from those sources but can generate text that is influenced by the data it has seen.
As a result, while the output from ChatGPT might not be a direct copy of any specific text, it can still inadvertently produce text that closely resembles or mirrors existing content. To avoid plagiarism, it is always recommended to thoroughly check any text generated by language models and, if necessary, to rephrase or cite any parts that may be too similar to existing content.
In academic or professional settings, it’s essential to give proper credit to original authors and to adhere to any relevant guidelines for attribution and citation.
I’ve received that reply from ChatGPT. It really seems “weasel-worded” to me. It reads like a short instruction manual on how to succeed at plagiarizing. And I particularly like the last sentence, saying one needs to give proper credit to original authors, which is something ChatGPT never does.
And yet, isn’t that exactly how much of what is purely human-generated is done?
When I write an article, blog, or email to a client, I’m drawing on the large corpus of everyone else’s words I have read, listened to, watched, or heard.
When I “add my spin” and interpretation, that really isn’t that much different from LLMs using multi-terabyte n-dimensional tables of probabilities to calculate each next word in the sentence fragment they are constructing.
Not in total, but probably XX% or more of human content creation is really the same thing?
(I’ve used xx instead of a number to keep this from being an argument about quantitative versus qualitative methodologies)
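To make the “table of probabilities” idea concrete, here is a toy sketch in Python. It is only an illustration under loud assumptions: real LLMs use neural networks with learned weights rather than a literal lookup table, and the corpus, function names, and greedy word choice below are all made up for this example.

```python
from collections import Counter, defaultdict

def train_bigram(text):
    """Count, for each word, how often every other word follows it."""
    words = text.lower().split()
    counts = defaultdict(Counter)
    for current, following in zip(words, words[1:]):
        counts[current][following] += 1
    return counts

def next_word_probabilities(counts, word):
    """Turn the raw follow-counts for `word` into probabilities."""
    followers = counts.get(word, Counter())
    total = sum(followers.values())
    return {w: c / total for w, c in followers.items()} if total else {}

def generate(counts, start, length=5):
    """Greedily append the most probable next word, at most `length` times."""
    output = [start]
    for _ in range(length):
        probs = next_word_probabilities(counts, output[-1])
        if not probs:
            break  # no word ever followed this one in the corpus
        output.append(max(probs, key=probs.get))
    return " ".join(output)

# Hypothetical ten-word "training corpus", chosen only for illustration.
corpus = "the cat sat on the mat and the cat slept"
model = train_bigram(corpus)
print(next_word_probabilities(model, "the"))
print(generate(model, "the", length=3))
```

Everything this toy can say is stitched from its ten-word corpus, which is the “tapestry” point in miniature. Whether scaling the same idea up to billions of parameters changes the picture is exactly what this thread is arguing about.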
A “tapestry” - I wish I’d written this!
No. The way LLMs work is absolutely nothing like how individual humans generate language.
Do you go and read a massive number of articles and then create something that probably says the same thing? If you do, I’d question the value of your piece.
An LLM cannot have experiences or an opinion, yet wouldn’t your article be an expression of your opinion and experiences?
It’s a mistake to conflate machine learning and human learning. They are not at all comparable processes. Machines copy and can never make a choice they were not programmed to make. This is not the case with human learning.
I understand LLMs pretty well. The problem tends to be in the use of “generative” in generative AI and a common misunderstanding of how they actually produce their output, which may be abstracted but is still a composite of their training materials.
Which is true of humans as well.
I stand by my “fundamental misunderstanding” statement.
No it isn’t.
Humans do not just stitch together things they already know. This is reductive at best, even if given the benefit of the doubt. Invention exists. Shakespeare exists. ChatGPT could not produce Shakespeare without first dissecting Shakespeare. Shakespeare is unlike any work that came before it.
LLMs do not function like human brains function. They don’t generate language in the same way as a human mind does. They do not learn in the way that human minds learn. They do not accurately mimic the way that brains function. They cannot create anything new. This is categorically not true of humans.
Esperanto exists. Enochian exists. LLMs cannot create new languages. They can’t create language from nothing - which is literally the whole evolution of the human mind. People not exposed to language can and have created languages, however simplistic and rudimentary.
You’re wrong. The fundamental misunderstanding here is yours.
Here’s some easy reading that’ll hopefully go some way to help explain why comparing what an LLM and an actual brain do is not even wrong.
What it really boils down to, though, is that there is no working theory for how biological neural networks actually work, so no software written by humans can possibly simulate them. What we now know as neural networks (in computing) and machine learning is largely based on a decades-old and long-disproven but useful modelling theory that does in fact create some really cool software.
This assertion has absolutely no merit. That’s not how humans work. The human brain doesn’t even store data for recall, never mind for reuse.
We will agree to disagree.
Not to get too far into this, but you are right. This is all hype, really. There is some chunking, as it were, but basically nearly all of this is just prediction using models so large that, on scale alone, the human or animal mind cannot be similar. Nothing like language or ‘mind’ can possibly be doing what these systems do: as you say, the mind is clearly not a linguistic output prediction machine. It is a ‘prediction by understanding how things fit together or work’ machine. The interchange with MevetS, for example, is not based on him ‘predicting’ in some sense what you are going to say next.
There is no actual construction of sentences by these machines, not as the human brain does it, though, as you say, we don’t understand the details or how ‘meaning’ is achieved. It is ‘computational’ in a sense, ironically and in a loose way: that is, infinite results from finite sources; generative, roughly.
What the ‘accuracy’ of these programs shows is how predictable a lot of our discourse and knowledge really is. It is all hype, as is much of current tech.
In fact the latest iOS update, following a weird support call I had to make regarding my iOS 17 update, which fried my cellular data (I think, or there is some bug they won’t acknowledge), makes me wonder how much of this stuff I actually need! I have my books, professional papers in DEVONthink 3, some useful apps and professional logins, some lectures in ‘music’ and some podcasts, all of which I can play or use on my computer alone.
I could even manage now with a MacBook Air, not the rapidly devolving, over-specced 4K MacBook Pro I splashed out on. Never again. Oh, and the watch: if, when I need to get a new one, the band doesn’t fit it… I go back to a G-Shock or whatever I have in the drawer! ’Scuse the rant, it sort of follows on from what you say. Hype on top of some very useful and amazing computing power that I am coming to think could be better directed?
Just for your interest, some researchers and scholars are turning, in the hope of finding some principles at least that will unify as it were our model of brains, to animal models. Of course it turns out that pigs, monkeys and dogs are way too sophisticated and so like ours in relevant ways as to push back the inquiry onto insect brains hoping some clearer clues emerge.
The million-neuron bee brain is the best, in my own view. Bee behavior turns out to be astonishingly sophisticated. They even have feelings, as it were, and personalities, it now turns out on some accounts.
Again, as you point out, nothing like a chip based computer.
As you say, though it is a tricky point, there is no real ‘storage’ capacity anywhere for memory. So it must work in some other way. Synapses have, we have found out, a huge potential for protein variety and hence plasticity… Same for bees, and some of the biological equipment seems to be present in bacteria. That is three billion years old. Predates the iPod even!! Who knew!