Is your contention that the new Spotlight creates the risk by indexing more, or is it that it creates the risk by indexing differently?
Just as an example, the existing Spotlight creates a reasonably-complete keyword index of all files on all hard drives, in addition to app-specific data. Just in my user-level Library folder, there are almost 30,000 files. And this isn’t a human-readable index either. We have no idea what’s actually in there, other than trusting Apple’s say-so.
So if we currently have a reasonably-complete, non-human-readable index of all data on our computers, does that also create standing exposure risk?
If I grant for a moment that the risks are as bad as you’re saying, and the “master off-switch” is the only protection, I would agree with this 100%. A master off-switch could disappear at any time, whether in the public release, in a point update, etc. And it’s the sort of thing the average person would be very unlikely to notice until after they’d updated.
AI typically preserves manual filing but adds a semantic layer of indexing on top of it. That is very much a feature, not a bug.
Sorry, I don’t understand the increased risk presented by the existence of the index. The index (if anyone actually knows what it is) is derived from the data on the machine, so the “exposure risk” you mention applies to the data that already exists in our files and on the machine as a whole, and is no different from where we are today. I don’t understand how a different kind of index on the same machine as the data is an increased risk. It seems we just need to protect the index with the same care as we protect the machine and data today.
An AI semantic index holds information with context and intent; a standard keyword index does not. With a standard index, a malicious actor or an advertising bot has to fish for exact keywords across dozens of separate, sandboxed app databases. With a cross-app semantic index, all that information is already pre-digested and contextually mapped.
Someone gaining access to that data doesn’t need to guess your keywords. They can simply ask:
“Extract all text related to political protests.”
“Find all medical record information.”
“List all purchases in the last 6 months.”
This will immediately surface relevant data across every app the AI touches. This is why I argue we are moving away from passive, fragmented data storage and significantly expanding the exposure vector. A practical example of this risk is at border crossings, where government officials can legally demand access to search a device. A semantic index allows them to audit a person’s entire life conceptually in seconds.
I’m not arguing that this architecture is inherently evil, but rather that the risks are fundamentally changing. As the manual sorting and contextualizing of our data is handed over to AI, we lose direct visibility. The data layer becomes opaque, and we become entirely dependent on the AI to retrieve it. If that system glitches, maladapts, or is exploited by bad actors, the negative impact is drastically magnified, simply because the AI has already built them a perfect conceptual map of our private lives.
Sorry, my ignorance defeats me. If someone gets access to the semantic index – “how” is unspecified – why would they bother running queries against it, since it sits on the Mac right next to the original data? I am no worse off.
The semantic index would make it more efficient for somebody to find the original data.
Do you mean “compartmentalized” instead of “sandboxed”? There’s no real “sandboxing” at the foundational OS level. It can access everything.
I would argue that we’ve already lost direct visibility into almost every iOS app, as they don’t use the standard file model. But I’d agree with @rkaplan above that AI on desktop isn’t going to prevent us from using manual filing; it’s just going to index the documents we’ve manually filed.
People can already be lazy and store all of their data in a single folder. I know a number of people who have only one or two folders in their email in total, and figure they’ll just search for whatever they need. AI isn’t any different from that, except it’s better at searching.
This is a potentially valid concern. Given their existing search/copy/seizure powers, though, I would suggest that this isn’t a new risk. It’s making an existing power more efficient for them to exercise. I would be significantly more worried that they’d have software which could rip through common locations on one’s computer and build its own AI index.
But it seems to me Apple may have actually done us a favor by making it clearer to everyone how much information is in our phones. Governments have had the ability to do semantic search for quite some time.
It seems to me the solution is not to worry about AI on our iPhones but rather to be sure the data on our phones is encrypted and secure. Toward that end, Apple has long been way ahead of competitors in advocating for customer privacy.
As for the situation of a user being required by force or by law to provide passwords to a device - that is not a situation for which I would blame Apple.