566: Paperless Strikes Back

GaryP · December 28, 2020, 10:54am

Listened to the podcast, bought the Paperless Field Guide, bought a ScanSnap, bought DEVONthink Pro, and am now enjoying getting all my bills and invoices digitised.

One question: in the field guide you show us how to organise docs using both Hazel and DEVONthink. Which is your preferred method?

rms · December 28, 2020, 12:53pm

They are two completely different things.

GaryP · December 28, 2020, 4:44pm

I know, but both can rename files according to their content, and Dave shows us both methods. I was wondering which of the two people prefer.

rms · December 28, 2020, 4:59pm

I’ve used both for years. I try to use Hazel for things that are limited to the operating system, and DEVONthink rules for things that are related to DEVONthink. Each works in its own world terribly well, but less well when working in the other’s world.

I wrote up a short summary of one aspect of using Hazel to get files into DEVONthink. See https://rmschneider.wordpress.com/2020/12/04/devonthink-and-hazel/

tomalmy · December 28, 2020, 5:17pm

I use Hazel but not Devonthink. The reason is that I like to keep documents in the OS file system hierarchy so that they can be easily accessed by anything anywhere.

rms · December 28, 2020, 6:58pm

DEVONthink does not preclude what you want to do. Easy to accomplish.

MacSparky · December 29, 2020, 5:45am

I’ve been doing a rather strange experiment over the past few years and defining folder structures around roles in my life. It isn’t as big of a deal for paperless records but makes a ton of sense with things like task and project management, PKM, and other digital structures.

Lars · December 29, 2020, 9:05am

Same here. But this can also be done with DEVONthink (indexed folders).

OogieM · December 29, 2020, 1:41pm

That’s why I use my standard filing system in Finder but ALSO have a separate DEVONthink database of the file cabinet that indexes that folder. That way I get the best of both worlds.

OogieM · December 29, 2020, 1:43pm

How do you handle things that are part of more than one role?

Lampornis · December 30, 2020, 3:12am

Well, I bought the Paperless Field Guide and have completed it. I have been paperless for many years (except for my wife’s medical records, which she likes to hoard), but it has never been very automated. After finishing the field guide I got Hazel (@MacSparky field guides always cost me money) and started setting up bunches of rules to make my life easier.

One issue I am having is that Hazel’s ability to read PDF contents has been inconsistent. For example, the rules I set up for my bank statements work most of the time but fail on a handful of files. Perhaps the OCR on the bank PDFs is not equally good, but I seem to be able to copy and paste from them just fine.

Also, I bank with Wells Fargo, and it seems the only way I can download statements is manually, one at a time. Does anyone have a solution for this? It kills much of my time savings.

There are many other things I am working on, but I will save them for later. Thanks!
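One way to check whether a statement’s text layer really contains what a Hazel rule matches on is to extract the text outside Hazel and search it. This is a minimal sketch, assuming Poppler’s pdftotext is installed (e.g. brew install poppler); the folder path and the account-number pattern in the example are hypothetical, and Hazel may extract PDF text differently, so a match here does not guarantee Hazel will match too.

```shell
#!/bin/bash
# Sketch: list PDFs whose extracted text does NOT match a pattern.
# Assumes Poppler's pdftotext is on PATH.

# Read text on stdin; succeed if it matches the extended regex in $1.
text_has_pattern() {
  grep -Eq -- "$1"
}

# Check every PDF in directory $1 against the regex in $2.
check_folder() {
  local dir="$1" pattern="$2" f
  for f in "$dir"/*.pdf; do
    # With no PDFs the glob stays literal, so skip anything that doesn't exist.
    [ -e "$f" ] || continue
    # An output file of "-" makes pdftotext write the text to stdout.
    if ! pdftotext "$f" - 2>/dev/null | text_has_pattern "$pattern"; then
      echo "NO MATCH: $f"
    fi
  done
}

# Hypothetical example: statements that should mention an account ending in 1234.
# check_folder "$HOME/Documents/Statements" '[Aa]ccount.*1234'
```

Files flagged as NO MATCH are the ones worth opening. Note that pdftotext may break lines differently from what you see on screen, so a pattern that spans two printed lines can fail even when the text is there.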

tomalmy · December 30, 2020, 5:51pm

Some sites offer two download choices: one is OCRed and the other is just images of the statements. To handle that, I’ve got several Hazel rules at the bottom of the list that catch documents that can’t be identified because they aren’t OCRed. I got this process from Katie Floyd’s website, which based it on a less powerful version by David Sparks, and I added some tweaks of my own. It uses PDFpenPro to do the OCR.

If the file is a PDF, hasn’t already been OCRed by this step (this prevents endless recursion!), and contains neither an “e” nor an “a” (which probably means it has no text layer), then the rule runs this script:

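-- Hazel passes the matched file to an embedded AppleScript as theFile
tell application "PDFpenPro"
	open theFile as alias
	tell document 1
		ocr
		-- wait until PDFpenPro reports that OCR has finished
		repeat while performing ocr
			delay 1
		end repeat
		delay 1
		close with saving
	end tell
	-- already inside the PDFpenPro tell block, so no nested tell is needed
	quit
end tell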

After the script runs, the rule marks the file as OCRed and continues matching rules, so Hazel tries again and the file should now match an earlier rule.
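The “contains neither e nor a” test can also be sketched in shell. This is an illustration, not the rule itself: it swaps in OCRmyPDF (mentioned later in this thread) for PDFpenPro, assumes Poppler’s pdftotext and ocrmypdf are installed, ignores case (a choice made for this sketch), and writes the result to a hypothetical “_ocr” copy rather than modifying the original.

```shell
#!/bin/bash
# Sketch of the "no e or a means probably not OCRed" heuristic.
# Assumes pdftotext (Poppler) and ocrmypdf are installed.

# Read extracted text on stdin; succeed if it contains no "e" or "a"
# (either case), suggesting the PDF has no usable text layer.
looks_unocred() {
  ! grep -qi '[ea]'
}

# OCR the PDF in $1 into a new "_ocr" copy, but only if it looks un-OCRed.
ocr_if_needed() {
  local f="$1"
  if pdftotext "$f" - 2>/dev/null | looks_unocred; then
    # --skip-text leaves any page that already has text untouched
    ocrmypdf --skip-text "$f" "${f%.*}_ocr.pdf"
  fi
}
```

Checking for two very common letters is cheap and rarely wrong on English statements; a page containing only numbers would still be treated as un-OCRed, which is the safe direction to err.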

Lampornis · December 30, 2020, 11:38pm

Thanks for sharing! I do actually have a use for something like this, although I am not sure it addresses the issue I described, since all the documents seem to be OCRed. After spending a little more time with this, it does seem that Hazel just stalls once in a while, particularly when I throw a whole bunch of files at it at once. Anyway, I figure I will learn how to deal with these sorts of things over time.

scottdellar · January 10, 2021, 4:16am

Those looking to automate OCR should look at this Automators post: if you are willing to install OCRmyPDF (brew install ocrmypdf), it is a free OCR solution.

I’ve completed the Paperless Field Guide and had previously set up Hazel to use this script to OCR anything I drop into a particular folder. This keeps my paperless workflow a lot simpler: the ‘capture’ phase needs no fancy OCR app, and the Hazel OCR rule just OCRs (and then compresses) any PDF it comes across. I get a small PDF with text included, ready for the next organisation step. Note that you will also need Ghostscript, which is used for the PDF compression (brew install ghostscript).

As with any script, use it with care: it creates a new copy of the file (beware of losing tags, etc.), first OCRing it and then compressing the OCRed version. Note that the Hazel rule ignores those temporary files. The script also logs to an output.log file; this was useful during development, so feel free to remove the logging.

# Hazel passes the matched file as $1.
# Exit at the first failing command, so the original file is never
# deleted if OCR or compression fails.
set -e

# Get elements of filename: path, filename, extension
a="$1"
xpath=${a%/*}
xbase=${a##*/}
xfext=${xbase##*.}
xpref=${xbase%.*}

# Output Filename for OCR my PDF
inputOcrFilename="$1"
outputOcrFilename="${xpath}/${xpref}_ocr.${xfext}"

# Process OCR my PDF (--skip-text leaves pages that already have text alone)
# 2>&1 also captures messages written to stderr in the log
echo "Process OCR my PDF" > output.log
ocrmypdf "${inputOcrFilename}" "${outputOcrFilename}" --skip-text >> output.log 2>&1

# Output filename for Compress PDF
inputCompressFilename="${outputOcrFilename}"
outputCompressFilename="${xpath}/${xpref}_ocr_compressed.${xfext}"

# Process Compress PDF (Ghostscript's /ebook preset downsamples images to about 150 dpi)
echo "Process Compress PDF" >> output.log
gs -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 -dPDFSETTINGS=/ebook -dNOPAUSE -dQUIET -dBATCH -sOutputFile="${outputCompressFilename}" "${inputCompressFilename}" >> output.log 2>&1

# Delete Original File
echo "Delete Original File" >> output.log
rm "$1" >> output.log 2>&1

# Delete OCR my PDF temp file
echo "Delete OCR my PDF temp file" >> output.log
rm "${outputOcrFilename}" >> output.log 2>&1

# Move the new file to the Ready folder (the "PDF Prep Ready" subfolder must already exist)
echo "Move the new file to the Ready folder" >> output.log
readyFilename="${xpath}/PDF Prep Ready/${xpref}.${xfext}"
mv "${outputCompressFilename}" "${readyFilename}" >> output.log 2>&1
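The parameter expansions at the top of the script do all the filename handling, so it can help to see what they produce. A small worked example, using a hypothetical path:

```shell
#!/bin/bash
# Worked example of the filename expansions used in the script above.
# The sample path is hypothetical.
a="/Users/me/Scans/2020-12 Statement.pdf"

xpath=${a%/*}      # remove shortest "/*" suffix  -> directory
xbase=${a##*/}     # remove longest "*/" prefix   -> file name
xfext=${xbase##*.} # remove longest "*." prefix   -> extension
xpref=${xbase%.*}  # remove shortest ".*" suffix  -> name without extension

echo "$xpath"   # /Users/me/Scans
echo "$xbase"   # 2020-12 Statement.pdf
echo "$xfext"   # pdf
echo "$xpref"   # 2020-12 Statement

# Temporary file names built from these pieces:
echo "${xpath}/${xpref}_ocr.${xfext}"             # /Users/me/Scans/2020-12 Statement_ocr.pdf
echo "${xpath}/${xpref}_ocr_compressed.${xfext}"  # /Users/me/Scans/2020-12 Statement_ocr_compressed.pdf
```

Spaces in the path are safe here because assignments are not word-split; the quotes around each expansion in the ocrmypdf, gs, rm, and mv commands are what keep such names intact when they are passed as arguments. A file with no extension would get its whole name as xfext, which doesn’t matter as long as the Hazel rule only passes PDFs.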

BrettFL · January 30, 2021, 4:24pm

Did anyone see that Fujitsu has introduced a couple of new ScanSnap models? I believe one is called the iX1600.

mcginnie · January 31, 2021, 6:45pm

Thank you for this! I’ve been frustrated lately because the key items my Hazel rules look for in many of my downloaded docs, like account numbers, don’t seem to be recognized anymore. Maybe this will solve that.

mcginnie · January 31, 2021, 6:52pm

Stephen (I think) mentioned using sparse images for sensitive info, like taxes, and not keeping those folders in the cloud. I’d like to hear how he backs them up. I’d worry that I’m not following the 3-2-1 backup rule and might lose them in a local catastrophe. I went paperless using David’s book many years ago, and I’m excited to explore the new version.

tomalmy · January 31, 2021, 10:09pm

Apparently the iX1600 hardware is identical to the iX1500’s except that it scans at 40 ppm instead of 30 ppm. They claim some software improvements, but nothing struck me as earthshaking.

There is also an iX1400, which bears an amazing resemblance to the iX500: no LCD display, USB only (no Wi-Fi), and a lower price, but the same speed as the iX1600.

It looks like all models work with either ScanSnap Home or the original ScanSnap Manager. The iX1400, like the iX500, only includes a ScanSnap Home license for one computer.

pilotgt · February 21, 2021, 2:47pm

@MacSparky Curious to know your opinion of Genius Scan compared to the other scanning apps you reviewed? I find that I eventually return to Genius Scan after giving other apps a test drive.

Katie · February 23, 2021, 6:19am

The only viable way to manage that is whatever works best for you. I like to get as much as possible into a folder’s name, so that I’m essentially describing what’s inside, until I get the hang of it and it becomes practically second nature.

You could also keep a key list, and you could use ticklers to remind yourself that the info is filed elsewhere, e.g. “Bank accounts: see under (child’s) name.”