Email exporting / archiving / backup. (DEVONthink? File Folders? Backups?)

I have probably 100,000 emails in Mail. It seems a little clunky when dealing with some folders because they’re just huge. And some of these emails are 15+ years old, but I’d like to have (most of) them around for archival purposes.

I’d like these to live somewhere other than my IMAP account, and I’d like robust backups of both these and current emails.

(a) For backup, can I just back up the entire mail directory so I have a copy of the .emlx files? I would assume I could always restore those to an “On My Mac” folder if necessary?

(b) For better long-term archiving, does anybody use the “files in folders” method? If so, do you do creative naming and such to make them easier to find (e.g. “2018-01-05 - Sender - Subject”)?

(c) For anybody using DEVONthink, do you find it to be a very robust program? Can everything be exported quickly if I just wanted to dump my data and go?
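
On the naming idea in (b), here is a minimal Python sketch of how a “date - sender - subject” filename could be built from a message's headers. The function name `archive_filename`, the 120-character cap, the `0000-00-00` placeholder for a missing date, and the `.eml` extension are all illustrative assumptions, not something any of the tools in this thread do.

```python
import re
from email import message_from_string, policy
from email.utils import parseaddr, parsedate_to_datetime


def archive_filename(raw_message: str, max_len: int = 120) -> str:
    """Build a 'YYYY-MM-DD - Sender - Subject.eml' name from raw message text."""
    # policy.default decodes RFC 2047 encoded headers (e.g. non-ASCII subjects).
    msg = message_from_string(raw_message, policy=policy.default)

    # Date: fall back to a placeholder when the header is missing or malformed.
    date_part = "0000-00-00"
    date_header = msg.get("Date")
    if date_header is not None:
        try:
            date_part = parsedate_to_datetime(str(date_header)).strftime("%Y-%m-%d")
        except (TypeError, ValueError):
            pass

    # Sender: prefer the display name, then the bare address.
    name, address = parseaddr(str(msg.get("From", "")))
    sender = name or address or "Unknown"

    subject = str(msg.get("Subject", "")) or "No subject"

    stem = f"{date_part} - {sender} - {subject}"
    # Replace characters that are awkward in file names, then tidy whitespace.
    stem = re.sub(r'[/\\:*?"<>|\r\n\t]+', " ", stem)
    stem = re.sub(r"\s+", " ", stem).strip()
    return stem[:max_len].rstrip() + ".eml"
```

Putting the date first in ISO format means a plain alphabetical sort in Finder is also a chronological sort.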

Any advice is, as always, greatly appreciated!

I have used and can recommend MailSteward (whose database front-end isn’t terribly Mac-like), and EagleFiler.

https://mailsteward.com

https://c-command.com/eaglefiler/

https://c-command.com/eaglefiler/help/importing-mail

DEVONthink is good and robust, but importing 100K emails would take a very long time, and I would question if it is worthwhile.

Would it significantly slow down DEVONthink, or do you think it would just be a bad expenditure of time?

I’m going to pare down the total amount, but I’m thinking I might wind up with two categories - “stuff that I’m likely to need more frequently” and “stuff I need to hang onto a copy of” - and put the first category into DEVONthink.

Thoughts?

Not sure that’s an either / or situation. Yes, DEVONthink doesn’t do much of anything else while it is importing email, so it’s a good idea to break the job down into chunks. How big the chunks are depends on the machine, memory, and the nature of the email: messages with more attachments take more time to archive.

Is it a bad expenditure of time? If all you want to do with the archive in DEVONthink (or somewhere else) is to be able to search and read the mail in it, then maybe consider the alternative. If the main thing is to get the mail out of the IMAP account and offline, then create folder(s) in the “On My Mac” section of Apple Mail and move the email there. Mail moved to “On My Mac” is removed from the IMAP account.

More info here.

One option could be to use a dedicated email archiving programme to convert your emails into PDF archives, and then have DEVONthink index that location. You would then have an archive, but with DEVONthink’s search functionality and tools included in the mix.

By “significantly slow down DEVONthink”, I’m talking about long-term. Like an “if I do this, am I going to be kicking myself in a month because my DEVONthink database search is horribly, horribly slow?” sort of situation.

I know some programs are well-written as far as indexing, speed, etc., and others start getting a little wobbly when you throw large data sets at them.

The answer is not black and white. It depends on your data, your machine, and whatever else is happening on the machine. If you import 10,000 emails that each have multi-page PDFs as attachments, then a slowdown is possible. If the emails are mainly short, then there will probably be no slowdown. One can have multiple databases in DEVONthink, and multiple databases open at any time. So if the email archive is for occasional rather than continual use, you can keep it in its own database and open it only when needed, which lets you manage the load on the machine.

I don’t mean to imply that DEVONthink is not robust or that it becomes a slog with large datasets. Neither is the case.

This is very helpful. Thanks!

I use MailSteward. It archives the emails and the attachments.

I’ve imported over 15 years of personal email and two business accounts and it has not slowed down DEVONthink at all. I import every week and it takes only a few minutes, although the initial import was slow (I’m not sure exactly how long it took as I left it overnight).

I started doing this after Google lost 3 years of my business emails on a GSuite account I was using for work, due to a “software bug” in their system. Their backups were “corrupted” according to their support and they couldn’t recover any of the emails, so I decided it was a good idea to start making backups.

Dumping the data can be done by dragging all the emails to Finder.

I tried to have DT do a full import, and I only had about 80K emails. It could not finish the task. DT Support suggested breaking the large mailboxes up into smaller segments and trying that. I didn’t bother. Instead I switched to a curated mail import by hand. I’m still working through the backlog, but once I get it under control I’ll import the messages I keep monthly so that it can work properly.
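
For anyone who does want to try the “smaller segments” route, here is a rough Python sketch of splitting one large mbox file into numbered chunks using the standard-library `mailbox` module. It assumes the mail has already been exported to a plain mbox file. The function name `split_mbox`, the default chunk size of 5,000 messages, and the output naming pattern are illustrative assumptions, not anything DT Support specified.

```python
import mailbox
from pathlib import Path


def split_mbox(source: str, out_dir: str, chunk_size: int = 5000) -> list[Path]:
    """Copy messages from one mbox file into several smaller mbox files."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)

    src = mailbox.mbox(source, create=False)  # fail if the source is missing
    written: list[Path] = []
    chunk = None
    try:
        for index, message in enumerate(src):
            # Start a new output file every `chunk_size` messages.
            if index % chunk_size == 0:
                if chunk is not None:
                    chunk.close()  # close() also flushes pending writes
                number = index // chunk_size + 1
                path = out / f"{Path(source).stem}-{number:03d}.mbox"
                chunk = mailbox.mbox(str(path))
                written.append(path)
            chunk.add(message)
    finally:
        if chunk is not None:
            chunk.close()
        src.close()
    return written
```

The source file is only read, never modified, so the original export stays intact as a fallback. A call like `split_mbox("Archive.mbox", "chunks", chunk_size=5000)` (hypothetical file names) would produce `Archive-001.mbox`, `Archive-002.mbox`, and so on.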

Have you ever tried to get stuff OUT of MailSteward? I have been totally unable to actually use search to find anything so I gave up on it.

Both MailSteward and EagleFiler work with the mbox format. It’s the standard for EagleFiler, while MailSteward can export e-mails to a tab-delimited text file, a new MailSteward database file, an mbox file, or an SQL file.

https://mailsteward.com/manual.html#a-10_exporting

I have not had any problems with searching in MailSteward. I usually search by date range, subject, sender, or text within the body. If I want to get an email out, I usually print it as a PDF.
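
Since mbox is a plain, documented format, an exported archive can also be searched without any of these apps. Below is a minimal Python sketch of the same kind of search (sender, subject, date range) over an mbox file, using the standard-library `mailbox` module. The function name `search_mbox` and its parameters are illustrative assumptions. It has nothing to do with how MailSteward or EagleFiler search internally, and it does not look inside message bodies.

```python
import mailbox
from datetime import timezone
from email.utils import parsedate_to_datetime


def search_mbox(path, sender=None, subject=None, start=None, end=None):
    """Return (date, from, subject) tuples for messages matching all filters.

    `start` and `end`, if given, must be timezone-aware datetimes.
    Text filters are case-insensitive substring matches on the headers.
    """
    results = []
    box = mailbox.mbox(path, create=False)
    try:
        for message in box:
            from_header = str(message.get("From", ""))
            subject_header = str(message.get("Subject", ""))
            date_header = message.get("Date")

            if sender and sender.lower() not in from_header.lower():
                continue
            if subject and subject.lower() not in subject_header.lower():
                continue

            if start or end:
                # Messages without a usable Date cannot match a date filter.
                if date_header is None:
                    continue
                try:
                    sent = parsedate_to_datetime(str(date_header))
                except (TypeError, ValueError):
                    continue
                if sent.tzinfo is None:
                    sent = sent.replace(tzinfo=timezone.utc)  # assume UTC
                if start and sent < start:
                    continue
                if end and sent > end:
                    continue

            results.append((str(date_header), from_header, subject_header))
    finally:
        box.close()
    return results
```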