It all started with an email issue. My own domain on iCloud+ kept sending all Gmail messages to junk, where I always discovered them hours or days later. After many hours of troubleshooting I gave up and decided to migrate away. My first attempt was MailCast.io, but even after more hours of troubleshooting with the owner I could not send email to Yahoo users, one of whom was the Treasurer of the non-profit I volunteer for.
So I moved the email domain to MS 365 Business Standard. It was a PITA, but it is working. To justify the extra expense, I decided to migrate my data and photos from Google One over to OneDrive. And that's where I screwed up.
Actually, I was already migrating away from Google Drive Desktop to other services that would sync to Google Drive, mainly because Drive keeps separate folders for each device, which made finding things online a PITA. Search is great IF you remember what you named something. I ended up using rclone and rsync to sync and deduplicate. But somewhere in all of this I managed to lose the latest copies of at least two spreadsheets I use to track financial things. They weren't critical, but now I'm wondering what else was lost in the process.
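For the "what else was lost" question, it can help to find true duplicates by content rather than by name. Below is a minimal Python sketch of the general idea behind many dedup tools: group files by size, then hash only the files that share a size. The function names are made up for illustration. One caveat: hashing reads every byte, so running it on OneDrive or iCloud placeholders would trigger downloads.

```python
import hashlib
import os
from collections import defaultdict


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file's contents in chunks so large videos don't exhaust memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(root):
    """Return sorted lists of paths under root whose contents are byte-identical."""
    # Pass 1: group by size; a file with a unique size cannot have a duplicate.
    by_size = defaultdict(list)
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue  # skip symlinks so nothing is counted twice
            by_size[os.path.getsize(path)].append(path)

    # Pass 2: hash only the files that share a size with at least one other file.
    by_hash = defaultdict(list)
    for paths in by_size.values():
        if len(paths) > 1:
            for path in paths:
                by_hash[sha256_of(path)].append(path)

    return [sorted(paths) for paths in by_hash.values() if len(paths) > 1]
```

Hashing only same-size candidates keeps the expensive step small. It only reports duplicates and deletes nothing, which leaves the "which copy is the latest" decision to a human.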
OneDrive has its own quirks. I moved most of my photos over using the online service MultCloud, but unfortunately it seems to have changed the photos' and videos' dates from their originals. I foresee a long series of sessions with exiftool to reset them. Again.
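exiftool remains the tool for the EXIF dates embedded inside the photos themselves. If it is the filesystem dates that changed, and an untouched original copy still exists (say, the local Google Drive folder), the modification times can be copied back from it. Here is a minimal Python sketch, assuming both copies use the same relative paths. restore_mtimes is a hypothetical helper, not part of any tool. It only sets access and modification times: it does not touch the macOS creation date or any EXIF fields. It defaults to a dry run.

```python
import os


def restore_mtimes(reference_root, target_root, dry_run=True):
    """Copy access/modification times from files under reference_root onto the
    files at the same relative path under target_root.
    Returns the sorted relative paths whose whole-second mtime differed."""
    changed = []
    for dirpath, _dirs, files in os.walk(reference_root):
        for name in files:
            ref_path = os.path.join(dirpath, name)
            rel = os.path.relpath(ref_path, reference_root)
            target_path = os.path.join(target_root, rel)
            if not os.path.isfile(target_path):
                continue  # missing from the target copy; nothing to fix here
            ref_stat = os.stat(ref_path)
            if int(ref_stat.st_mtime) != int(os.stat(target_path).st_mtime):
                changed.append(rel)
                if not dry_run:
                    # Nanosecond form keeps full timestamp precision.
                    os.utime(target_path, ns=(ref_stat.st_atime_ns, ref_stat.st_mtime_ns))
    return sorted(changed)
```

Running it once with dry_run=True shows how many files would change before anything is written.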
And here I am, staring at an ungodly mess on my drives. I have my original local copy of Google Drive, plus another copy I manually downloaded from Google Drive that is 5GB smaller. I have another local folder as my local OneDrive copy (1.3GB smaller than Google Drive), but I supposedly have OneDrive set to keep everything in the cloud (for now), and that seems to be the copy in the .CloudStorage/data folder on my external drive.
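To see where a gap like that 5GB actually comes from, the copies can be compared by relative path and size without opening any files. Here is a minimal, read-only Python sketch (snapshot and compare_trees are illustrative names). It uses lstat, so it should not trigger downloads of cloud placeholders.

```python
import os


def snapshot(root):
    """Map each file's path relative to root to its logical size in bytes."""
    result = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            # lstat reads metadata only; it does not follow symlinks.
            result[os.path.relpath(path, root)] = os.lstat(path).st_size
    return result


def compare_trees(left_root, right_root):
    """Return (only_left, only_right, size_differs) as sorted lists of relative paths."""
    left, right = snapshot(left_root), snapshot(right_root)
    only_left = sorted(left.keys() - right.keys())
    only_right = sorted(right.keys() - left.keys())
    size_differs = sorted(
        rel for rel in left.keys() & right.keys() if left[rel] != right[rel]
    )
    return only_left, only_right, size_differs
```

Pointed at the original Google Drive folder and the manually downloaded copy, the only_left list should name the files behind the difference. Expect some noise from files like .DS_Store.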
So I have a lot of folder merging and calculating to do before this is usable and my confidence in it is restored. I mostly blame myself though.
I am among the least technically qualified on this forum to offer advice, but I will be the fool who rushes in where angels fear to tread. 
If I were in your situation, I would consider letting an AI agent help with the cleanup, but with important guardrails. Claude Cowork (or a similar tool) can scan a folder, inventory its contents, identify likely duplicates, rename files, and reorganize by content rather than filename alone.
I believe my approach would be:
- Make a verified backup first. Then copy (do not move) everything to a temporary working folder, and keep the originals untouched until you are confident the cleanup is correct.
- Use a deduplication tool, perhaps something like Gemini 2. Once the obvious duplicates are gone, AI can probably handle what to keep, how to name files, and how to organize them.
- Run a “pilot” before having AI work on all the files. I frequently run “pilot” projects at the school for any major initiative or restructuring of systems, e.g., 1:1 programs. I'd test a small batch of files before going any further.
- Have AI design the above process and its prompts. I have found that AI is better at writing its own instructions than I am.
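The "copy, don't move" and "pilot" steps could look something like the following minimal Python sketch. It copies a random sample of files into a separate pilot folder, keeps their relative paths, and verifies each copy by checksum. The folder names, sample size, and function names are assumptions for illustration. The pilot folder should live outside the source tree.

```python
import hashlib
import os
import random
import shutil


def file_sha256(path):
    """Checksum a file in 1 MiB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def copy_pilot_sample(source_root, pilot_root, sample_size=50, seed=0):
    """Copy (never move) a random sample of files into pilot_root, keeping
    relative paths, and verify each copy. Returns the sorted copied paths."""
    all_files = []
    for dirpath, _dirs, files in os.walk(source_root):
        for name in files:
            all_files.append(os.path.relpath(os.path.join(dirpath, name), source_root))
    all_files.sort()  # fixed order so the same seed picks the same sample
    sample = random.Random(seed).sample(all_files, min(sample_size, len(all_files)))

    for rel in sample:
        src = os.path.join(source_root, rel)
        dst = os.path.join(pilot_root, rel)
        os.makedirs(os.path.dirname(dst), exist_ok=True)
        shutil.copy2(src, dst)  # copy2 also tries to preserve timestamps
        if file_sha256(src) != file_sha256(dst):
            raise RuntimeError(f"checksum mismatch after copying {rel}")
    return sorted(sample)
```

Because the originals are only ever read, a pilot that goes badly can simply be deleted and rerun.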
Again, I am hardly qualified to give technical advice, but I would at least experiment with this approach. I would only do so after making a complete backup, and even then only by copying files to a temporary folder (or folders) for AI to work on.
You have nothing to lose if your pilot using a temporary folder does not work. It may be worth a try.
My two cents, which is what it may be worth. 
An update: I ended up resetting, unlinking/relinking, and generally futzing with OneDrive, and somehow created multiple copies in various states of downloaded/not downloaded. I also nearly ran out of local drive space. At least, I am guessing I'm almost out. It is hard to tell because OneDrive files that are not actually downloaded still report their full size as if they were taking up space. So there is a lot of misleading drive-space reporting.
In the process I learned a lot about how Apple introduced the FileProvider API in 2022 with Monterey, then brought iCloud onto it in 2023. Files that are in the cloud but not downloaded carry a “dataless” flag: they are technically pure metadata and occupy zero bytes on disk. But Finder shows the logical size, the idea being that you can see how much space a file will consume if you download a local copy. Only in “Get Info” can you see the difference between the logical and “on disk” sizes. Finder's status bar does show the correct available space. So a drive can show free space even though the folder sizes Finder reports add up to far more than the drive's capacity (almost double), depending on the cloud download state.
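To check which files are placeholders without opening them, the dataless flag can be read from the file's BSD flags. Here is a minimal Python sketch. It assumes SF_DATALESS is 0x40000000 as defined in macOS's sys/stat.h, and it prefers stat.SF_DATALESS, which newer Python versions expose, whenever that constant is available. On non-BSD systems st_flags does not exist, so everything reports as not dataless.

```python
import os
import stat

# Flag for a file whose contents live only in the cloud. The fallback value is
# an assumption taken from macOS <sys/stat.h>; newer Pythons provide the constant.
SF_DATALESS = getattr(stat, "SF_DATALESS", 0x40000000)


def is_dataless(path):
    """True if the file is a cloud placeholder (macOS/BSD only; False elsewhere)."""
    st = os.lstat(path)  # metadata only; no file contents are read
    flags = getattr(st, "st_flags", 0)  # st_flags exists only on BSD/macOS
    return bool(flags & SF_DATALESS)


def list_dataless(root):
    """Return sorted relative paths of dataless files under root."""
    found = []
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            if is_dataless(path):
                found.append(os.path.relpath(path, root))
    return sorted(found)
```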
In Terminal this gets tricky: “ls” reports the full logical size; “du” queries the physical block allocations and shows 0 or just a few KB; and “cat” triggers hydration/materialization (a download).
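The ls-versus-du difference can be measured for a whole folder from file metadata alone. In this minimal Python sketch, st_size is the logical size that ls and Finder show, and st_blocks × 512 is the allocated size that du works from. It assumes the 512-byte block unit used on macOS and Linux. Unlike du, it does not de-duplicate hard links.

```python
import os


def size_report(root):
    """Sum logical sizes (like ls/Finder) and allocated sizes (roughly like du)
    for all files under root. Uses lstat only, so no contents are downloaded."""
    logical = allocated = count = 0
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            st = os.lstat(os.path.join(dirpath, name))
            logical += st.st_size
            allocated += st.st_blocks * 512  # st_blocks counts 512-byte units
            count += 1
    return {"files": count, "logical_bytes": logical, "allocated_bytes": allocated}
```

On a folder full of dataless files, logical_bytes could be hundreds of GB while allocated_bytes stays near zero. A large gap between the two is a quick sign of how much of a copy is still cloud-only.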
Anyway, I know much more about this than I ever wanted to learn. I'm now copying (via rsync) some of these extra copies to an older backup drive so I can restart my OneDrive sync and proceed from there without running out of space on my 1TB SSD. I think this copy has almost all of my recent file reorganization and the metadata changes I made via multiple shell scripts. It is just over 120k files, although OneDrive is currently reporting over 190k active files, whatever that means. It also shows almost 800GB, even though I'm pretty sure it should be more like 500GB. That's what it was on Google Drive and Photos.
Anyway, fun fun fun!
More fun! My 4TB backup drive is failing!
Keeping my fingers crossed that one folder (older, not critical, but “nice to have”) can finish copying back out before the drive dies.
Sorry, this is turning into a sad tech journal entry. lol
Narrator voice
It could NOT, in fact, finish copying back out before the drive died…
Nothing humbles a person faster than a migration project that starts with a simple email problem and somehow turns into three cloud providers, missing spreadsheets, and metadata chaos. I'd probably keep the original copies untouched until everything is verified. A similar mindset helps with data tracking in Phonexa too, because one wrong sync can create a mess that's much harder to untangle later.