Musings: How to archive email

Next in the series: Thinking about archiving emails.

Some time ago, there was a thread talking about how to save and potentially export emails to files. That got me thinking about the idea of exporting my emails for future saving.

Sometime thereafter, I was looking through Stephen Wolfram’s blog and found a posting where he noted that he saved basically every email he has ever received, all of which are readily searchable via his custom search engine (which I would assume was written in the Wolfram Language, but I digress).

In any case, since that point I have been mulling over the idea of saving emails. I do have a huge archive of emails from work, because our email is in Gmail so pretty much everything gets saved, but for my personal email I do not have any real archive - once a message is deleted and erased from Trash, it is pretty much gone.

I have found that I am always searching back through email because I cannot find where I made a note of something, or perhaps I never remembered to record the content of an important email. As a result, it seems to me that it would be beneficial to set up a system to essentially save all my emails in an accessible and searchable format. The question is, how to do this?

I have been toying with two ideas, and would appreciate thoughts.

  1. Import all emails into DevonThink using the DT plug-in to Mail.app.
    Advantages:
    – It’s easy
    – DT provides an easy enough way to organize things
    – I could probably find some way to automate sorting the emails into personal and work related automatically
    – DT displays the email in an easily readable format
    – Attachments can be double-clicked to view, and/or could be dragged out of the email but still within DT to make them separate entries

Disadvantages:
– Readily searchable only inside DT
– I do not know if DT’s search will encompass only the email text or if it will magically OCR attachments (Word documents, PDFs) so they are searchable as well. (I know DT can OCR and search a PDF, but that’s a PDF document in DT, not an attachment in an email document - does anyone know the answer to this?)
– Syncing the data across computers requires setting up a DT sync store (not hard).

  2. Export emails to regular files. I threw together a quick script that takes a file containing a single email message (saved from Mail in .eml format) and parses it. The text component of the email is saved to a text file with all the email headers intact. The HTML component, if present, is saved to an HTML file. All attachments are saved to separate files. Right now the filename is .txt or .html, and for attachments the same filename with the attachment filename appended.

Advantages:
– Email files are readily copyable and shareable, sorted on disk by date and message ID, and (if I have Hazel, for example, automatically OCR the PDFs) indexed and searchable via Spotlight or at the command line with grep.
– Individual attachment files can be easily accessed transparently.
– Sync happens automatically if I put the messages in a folder in my Dropbox-like cloud folder

Disadvantages:
– Yet another script I need to maintain
– Tons of disk files may be harder to readily search vs searching in DT
– Lots of cruft. For example, email messages that have an embedded image (such as a company logo) get those images saved as separate files, which is just messy.
– The email message can be easily viewed via QuickLook, or in BBEdit, but isn’t pretty.
– The filenames are basically horrible. The date prefix is OK, but an email subject often contains too many characters and, while convenient for searching, is just a mess (I do convert spaces and periods to underscores). Including the message ID makes sure that every file has a unique name, BUT makes the filenames incredibly long, impossible to reasonably type, and just basically ugly.
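
For anyone who wants to try approach (2), here is a minimal sketch of that kind of script. It is not the actual script, just an illustration using Python's standard `email` package. The function names (`base_name`, `explode`) and the exact naming scheme (date, then subject, then message ID) are assumptions made for the example.

```python
import re
from email import policy
from email.parser import BytesParser
from pathlib import Path


def base_name(msg):
    """Build a base filename from date, subject and message ID (assumed scheme)."""
    date_hdr = msg["Date"]
    if date_hdr is not None and date_hdr.datetime is not None:
        stamp = date_hdr.datetime.strftime("%Y%m%d-%H%M%S")
    else:
        stamp = "undated"
    # Spaces and periods become underscores; other awkward characters are dropped.
    subject = re.sub(r"[\s.]+", "_", str(msg["Subject"] or "no_subject"))
    subject = re.sub(r"[^\w-]", "", subject)[:60]
    msg_id = re.sub(r"[^\w.@-]", "", str(msg["Message-ID"] or ""))
    return "_".join(p for p in (stamp, subject, msg_id) if p)


def explode(eml_path, out_dir):
    """Split one .eml file into .txt, .html and attachment files."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    with open(eml_path, "rb") as fh:
        msg = BytesParser(policy=policy.default).parse(fh)
    base = base_name(msg)
    written = []

    # Plain-text body, saved with all headers in front of it.
    text_part = msg.get_body(preferencelist=("plain",))
    if text_part is not None:
        headers = "".join(f"{k}: {v}\n" for k, v in msg.items())
        path = out_dir / f"{base}.txt"
        path.write_text(headers + "\n" + text_part.get_content(), encoding="utf-8")
        written.append(path)

    # HTML body, if the message has one.
    html_part = msg.get_body(preferencelist=("html",))
    if html_part is not None:
        path = out_dir / f"{base}.html"
        path.write_text(html_part.get_content(), encoding="utf-8")
        written.append(path)

    # Attachments: base name with the attachment's own filename appended.
    for part in msg.iter_attachments():
        data = part.get_payload(decode=True)  # undoes base64 and similar encodings
        if data is None:  # e.g. an attached message/rfc822; skipped in this sketch
            continue
        name = Path(part.get_filename() or "attachment.bin").name
        path = out_dir / f"{base}_{name}"
        path.write_bytes(data)
        written.append(path)
    return written
```

Calling `explode("message.eml", "archive")` would write the pieces into an `archive` folder and return the list of paths. Note that `iter_attachments()` only looks at the top-level parts, so inline images nested inside the HTML body are not extracted by this sketch.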

  3. Save the email as a single .eml file exported from Mail.
    Advantages:
    – same advantage as in (2) for files on disk, but only one file per message

Disadvantages:
– Since attachments remain in BASE64 encoded form in the email, they are not viewable or searchable at all unless I then decode the email with the script from (2), which makes this pretty much useless.
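
To illustrate the BASE64 point, here is a small sketch using Python's standard `email` package. The sample message and the helper name `attachment_hits` are made up for the example. It shows that a plain byte search of the raw message misses text inside an attachment, while decoding the attachment first finds it.

```python
from email import policy
from email.message import EmailMessage
from email.parser import BytesParser

# Build a throwaway message with one binary attachment (hypothetical content).
msg = EmailMessage()
msg["Subject"] = "Invoice"
msg.set_content("See attached.")
msg.add_attachment(
    b"Total due: 42 widgets",
    maintype="application",
    subtype="octet-stream",
    filename="invoice.dat",
)
raw = msg.as_bytes()  # what would sit on disk as an .eml file


def attachment_hits(raw_eml, needle):
    """Return filenames of attachments whose decoded bytes contain needle."""
    parsed = BytesParser(policy=policy.default).parsebytes(raw_eml)
    hits = []
    for part in parsed.iter_attachments():
        data = part.get_payload(decode=True) or b""  # decode base64 to raw bytes
        if needle in data:
            hits.append(part.get_filename())
    return hits


# A grep-style search of the raw file fails; searching decoded payloads works.
found_in_raw = b"Total due" in raw
found_decoded = attachment_hits(raw, b"Total due")
```

This only handles attachments whose decoded bytes are readable text; PDFs or Word documents would still need their own text extraction or OCR.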

So, what do you do to save your emails? Something like what I am considering in (1) or (2), or something else entirely?

I don’t care to save my email, but if I did, I would look at EagleFiler:

https://c-command.com/eaglefiler/

Use a tool built to solve the problem you want to solve.

At one time I used Mailsteward to archive all email going through my company’s email server. This was probably 8 to 10 years ago, perhaps longer. Everything was kept in a MySQL database. I think I had around 2 million messages in the system when we purchased our first Barracuda Message Archiver. Mailsteward is still around and you can use it for free up to 15,000 emails.

These days I leave everything in Gmail. Using search in webmail or the Gmail iOS client I can normally find what I’m looking for on the first try. About every 60 days I export everything except photos (email, calendars, contacts, etc.) via takeout.google.com and store the file on S3 for 6 months or so.

Prior to that I would use the Export function in Mail.app to export the Archive folder and store that file for safekeeping. The file that Mail.app creates when exporting is in standard MBOX format, which many (most?) email clients can import.
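
Because MBOX is a standard format, an export like this can also be read outside of an email client. Below is a minimal sketch using Python's standard `mailbox` module that splits an mbox file into one .eml file per message. The function name `split_mbox` and the numbered output filenames are assumptions for the example, and depending on the Mail version the export may be a folder containing the actual mbox file, in which case the path should point at that inner file.

```python
import mailbox
from pathlib import Path


def split_mbox(mbox_path, out_dir):
    """Write every message in an mbox file to its own numbered .eml file."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    # create=False: fail instead of silently creating an empty mailbox.
    box = mailbox.mbox(str(mbox_path), create=False)
    written = []
    try:
        for count, key in enumerate(box.iterkeys(), start=1):
            path = out_dir / f"{count:06d}.eml"
            path.write_bytes(box.get_bytes(key))  # raw message bytes, no "From " line
            written.append(path)
    finally:
        box.close()
    return written
```

Something like `split_mbox("Archive.mbox", "archive_eml")` would then produce `000001.eml`, `000002.eml`, and so on, which can be opened in a mail client or processed further.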

Several years ago, I “stress tested” Apple Mail just for fun. Performance was reasonably good with around 800,000 emails in the account. For all its limitations, Mail.app is a pretty decent client.

https://mailsteward.com

I took a quick look at EagleFiler. I’ve seen it before, and I may even have tried it out at one point - I cannot recall for sure, and if I did try it, I don’t know why I didn’t stick with it, so it bears a relook.

It appears, however, that EagleFiler stores emails in either mbox or .eml format, which would presumably mean that attachments remain part of the raw email message, which means BASE64 encoded and hence not searchable. I will have to look into that further.

There are a lot of options like these:

https://www.mothsoftware.com/

Haven’t had a need to do this in a while – which makes me think I should give it a go again – but when I did, it worked a charm.

I agree with @Wolfie; most email should be permanently deleted. The few emails that need to be saved can be exported to a pdf file and stored appropriately in a standard filing system, not in an email database file.

My problem is that Gmail’s emails handled through Apple’s Mail app are placed in something called “All Mail”. These emails are hard to delete. The iMac Mail app bogs down and sometimes refuses to batch delete these Gmail emails. Even though I delete the emails from the inbox, they still appear in “All Mail”.

@Arthur and @Wolfie: Obviously, I do not fully agree (or I wouldn’t have embarked on this project).

I do agree that not every email needs to be saved; I think Mr. Wolfram has perhaps exaggerated the need with his protocol of having saved every email forever.

Of course I don’t plan to save spam; there’s only so much Viagra one person can take!

However, a lot of my email contains things that I do need to retain, whether instructions for a work-related process, confirmation of a work or personal transaction that I need to keep, etc. In the former case I generally try to create a note (in Notes) with the necessary information, while in the latter the actual email (and sometimes the entire thread) needs to be kept for later reference and possibly confirmation. It is not that uncommon that I have to go back through old emails to find something, and so I want a better system in which I control the organization.

Printing to PDF is a fine idea, but does not address the issue of attachments effectively. Yes, I could print to PDF and also save the attachments in files, which is what the script I wrote does, except that it stores the message in a text file rather than a PDF - which is much less pretty, but far more searchable without the need for OCR.

So essentially I am doing what you have suggested, but making sure I still have the attachments. I could, I suppose, come up with a way to save the email text as a PDF instead of as a text file (my Python script won’t do that very well, but I could certainly use AppleScript to have Mail write the message to a PDF and save the attachments as well).

The downside to saving as PDF is that all the header information is lost, but I cannot recall a time when I needed to go back to the header information in an old email message - although losing the email addresses themselves could be an issue.

Still, the question remains: Save them in DT, EagleFiler, or just in the file system?

I use Evernote for emails I want to keep.

I keep all the email I save in DT. I routinely refer to emails that are 10-20 or more yrs old so it’s important to have the robust searching I get with DT. (Some of my oldest are old CompuServe messages and some from really early, when domain names had to be 8 characters or less. Which is why we still own dsrtweyr.com as a domain name.)

CompuServe, $6/hr. Good times.

I ran TAPCIS on a Bondwell laptop (one 3.5in floppy for DOS, another for programs & data, 2400 baud modem, no hard drive).

I’m in the “save them as a PDF in the file system” camp. Works well enough for my personal needs. Being retired, I have fewer requirements than others. Spotlight does fine for searching.

One issue not discussed is document retention policies. If the emails are for a business, the archiving should follow your retention policy. If you don’t have one, I suggest consulting with an attorney regarding the legal implications.