566: Paperless Strikes Back

Listened to the podcast, bought the Paperless Field Guide, bought a ScanSnap, bought DEVONthink Pro, and am now enjoying getting all my bills and invoices digitised.

One question: in the field guide you show us how to organise docs using both Hazel and DEVONthink. Which is your preferred method?

They are two completely different things.

I know, but both can rename the files according to the file content and Dave shows us both methods. I was wondering which of the two people prefer to use.

I’ve used both for years. I try to use Hazel for things that are limited to the operating system, and DEVONthink rules for things that are related to DEVONthink. Each works in its own world terribly well, but less well when working in the other’s world.

I wrote up a short summary of one aspect of using Hazel to get files into DEVONthink. See https://rmschneider.wordpress.com/2020/12/04/devonthink-and-hazel/


I use Hazel but not DEVONthink. The reason is that I like to keep documents in the OS file system hierarchy so that they can be easily accessed by anything, anywhere.


DEVONthink does not preclude what you want to do. Easy to accomplish.

I’ve been doing a rather strange experiment over the past few years and defining folder structures around roles in my life. It isn’t as big of a deal for paperless records but makes a ton of sense with things like task and project management, PKM, and other digital structures.


Same here. But that can be done with DEVONthink too (index folders).


That’s why I use my standard filing system in Finder but ALSO have a separate DEVONthink database of the file cabinet that is an index of that folder. That way I get the best of both worlds.


How do you handle things that are part of more than one role?


Well, bought the Paperless Field Guide and have completed it. I have been paperless for many years (except for my wife’s medical records, which she likes to hoard) but it has never been very automated. After finishing the field guide I got Hazel (@MacSparky field guides always cost me money) and started setting up bunches of rules to make my life easier. One issue I am having is that Hazel’s ability to read PDF contents has been inconsistent. For example, rules I set up for my bank statements work most of the time but fail for a handful of files. Perhaps the OCR for the bank PDFs is not equally good, but I seem to be able to copy and paste from them just fine. Also, I bank with Wells Fargo. It seems the only way that I can download statements is to do it one at a time manually. Anyone have a solution for this? This kills much of my time savings. I am working on many other things but will save them for later. Thanks!

Some sites have two download choices, where one is OCRed and the other is just images of the statements. To handle that I’ve got several Hazel rules at the bottom of the list to catch documents that cannot be identified because they aren’t OCRed. I got this process from Katie Floyd’s website, based on a less powerful version by David Sparks. I added some tweaks of my own. It uses PDFpenPro to do the OCR.

![Banners_and_Alerts|690x360](upload://pQ0iq6Sqh0grFbQFyBVeu8PvqXz.jpeg)

If the file is a PDF and hasn’t been OCRed by this step (prevents endless recursion!) and does not contain an “e” or “a”, which probably means it has not been OCRed, then it runs the script:

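-- Hazel passes the matched file to the script as theFile
tell application "PDFpenPro"
	open theFile as alias
	tell document 1
		ocr
		-- Wait until PDFpenPro reports that OCR has finished
		repeat while performing ocr
			delay 1
		end repeat
		delay 1
		close with saving
	end tell
	-- Already inside the PDFpenPro tell block, so no nested tell is needed
	quit
end tell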

The rule then marks the file as OCRed and continues matching rules, so it tries again and should match an earlier rule.
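If you want to sanity-check a file outside Hazel, here is a rough shell sketch of the same "no e or a means it probably isn't OCRed" test. It assumes `pdftotext` from poppler is installed, which is my assumption and not something mentioned in this thread, and the function names `looks_ocred` and `pdf_needs_ocr` are made up for illustration.

```shell
# Hypothetical helper: reads extracted text on stdin and succeeds if it
# contains at least one "e" or "a" (upper or lower case).
looks_ocred() {
    grep -qi '[ea]'
}

# Hypothetical wrapper: pulls the text layer out of a PDF with pdftotext
# (poppler, assumed installed) and reports whether the file still needs OCR.
# If pdftotext is missing or fails, no text is found and the file is
# reported as needing OCR.
pdf_needs_ocr() {
    if pdftotext "$1" - 2>/dev/null | looks_ocred; then
        return 1   # text layer found, no OCR needed
    fi
    return 0       # no recognisable text, worth OCRing
}
```

Something like `pdf_needs_ocr statement.pdf && echo "needs OCR"` in Terminal should show which of the stubborn statements really lack a text layer.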

Thanks for sharing! I actually do have use for something like this, although I am not sure it addresses the issue I described since all the documents seem to be OCRed. After spending a little more time with this, it does seem that Hazel just gets stalled once in a while… particularly when I throw a whole bunch of files at it at one time. Anyway, I figure I will learn how to deal with these sorts of things with time.

Those looking at automating OCR should look at this Automators post - if you are willing to brew install ocrmypdf to use OCRmyPDF then it is a free OCR solution.


I’ve completed the Paperless Field Guide and had previously set up Hazel to use this script to OCR stuff that I drop into a particular folder. My paperless workflow is a lot simpler as the ‘capture’ phase can be quite simple (no fancy OCR app required) and the Hazel OCR rule just OCRs (and then compresses) any PDF it comes across. I get a small PDF with text included, ready for the next organisation step. Note that you will also need to brew install ghostscript to install Ghostscript, which is used for PDF compression.
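Before wiring this into Hazel, it may be worth checking that both tools can actually be found. This is only a sketch: `missing_tools` is a made-up helper name, and it is my assumption (not something stated in this thread) that Hazel may run scripts with a shorter PATH than Terminal, so a tool that works in Terminal could still show up as missing there.

```shell
# Hypothetical helper: prints the name of every listed command that
# cannot be found on the current PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# ocrmypdf does the OCR, gs (Ghostscript) does the compression.
missing=$(missing_tools ocrmypdf gs)
if [ -n "$missing" ]; then
    echo "Missing tools:" $missing "(try: brew install ocrmypdf ghostscript)"
fi
```

If the check passes in Terminal but fails inside a Hazel rule, calling the tools by their full path in the script is one way around it.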


As with any script, use with care, but this creates a new instance of the file (beware of losing tags, etc) first OCRing, then compressing the OCR version. Note that the rule ignores those temporary files. It also logs to an output.log file. This was useful during development, feel free to remove logging.

# Stop at the first failed step so the original is never deleted
# after a failed OCR or compression run
set -e

# Get elements of filename: path, filename, extension
a="$1"
xpath="${a%/*}"
xbase="${a##*/}"
xfext="${xbase##*.}"
xpref="${xbase%.*}"

# Output Filename for OCR my PDF
inputOcrFilename="$1"
outputOcrFilename="${xpath}/${xpref}_ocr.${xfext}"

# Process OCR my PDF (--skip-text leaves pages that already contain text alone)
# 2>&1 also captures error messages in the log
echo "Process OCR my PDF" > output.log
ocrmypdf "${inputOcrFilename}" "${outputOcrFilename}" --skip-text >> output.log 2>&1

# Output filename for Compress PDF
inputCompressFilename="${outputOcrFilename}"
outputCompressFilename="${xpath}/${xpref}_ocr_compressed.${xfext}"

# Process Compress PDF
echo "Process Compress PDF" >> output.log
gs -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 -dPDFSETTINGS=/ebook -dNOPAUSE -dQUIET -dBATCH -sOutputFile="${outputCompressFilename}" "${inputCompressFilename}" >> output.log 2>&1

# Delete Original File
echo "Delete Original File" >> output.log
rm "$1" >> output.log 2>&1

# Delete OCR my PDF temp file
echo "Delete OCR my PDF temp file" >> output.log
rm "${outputOcrFilename}" >> output.log 2>&1

# Move the new file to the Ready folder (create the folder if it does not exist yet)
echo "Move the new file to the Ready folder" >> output.log
mkdir -p "${xpath}/PDF Prep Ready"
readyFilename="${xpath}/PDF Prep Ready/${xpref}.${xfext}"
mv "${outputCompressFilename}" "${readyFilename}" >> output.log 2>&1
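For anyone puzzled by the `${a%/*}` style lines at the top of that script, here is a small sketch of what each expansion pulls out of a path. The function name `split_path` and the sample path are made up for illustration; the expansions themselves are the ones the script uses.

```shell
# Hypothetical helper: splits a full path the same way the script above does
# and prints each piece on its own line.
split_path() {
    a="$1"
    xpath="${a%/*}"       # everything before the last slash (the folder)
    xbase="${a##*/}"      # everything after the last slash (the file name)
    xfext="${xbase##*.}"  # everything after the last dot (the extension)
    xpref="${xbase%.*}"   # file name without the extension
    printf 'path=%s\nbase=%s\next=%s\nprefix=%s\n' "$xpath" "$xbase" "$xfext" "$xpref"
}

# Example path, purely illustrative
split_path "/Users/me/Inbox/statement.2021-01.pdf"
```

Only the last dot counts as the start of the extension, so a name with dots in it (like the dated statement above) keeps its full prefix.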

Did anyone see that Fujitsu has introduced a couple of new ScanSnap models? I believe one is called the iX1600.

Thank you for this! I’ve been frustrated lately because the key items I use in Hazel for many of my downloaded docs, like account numbers, don’t seem to be recognized anymore. Maybe this will solve that.

Stephen (I think) mentioned using sparse images for sensitive info, like taxes, and not keeping the folders in the cloud. I’d like to hear about how he backs them up. I’d be afraid I’m not utilizing the 1-2-3 method and might lose them if there is a local catastrophe. I went paperless using David’s book many years ago, and I’m excited to explore the new version.

Apparently the iX1600 hardware is identical to the iX1500 but for scanning 40ppm instead of 30ppm. They claim some software improvements, but nothing struck me as earthshaking.

There is also an iX1400, which bears an amazing resemblance to the iX500: no LCD display. It is also USB only (no WiFi). Lower price. Same speed as the iX1600.

Looks like all models work with ScanSnap Home or the original ScanSnap Manager. The iX1400 like the iX500 only has one computer license for use with ScanSnap Home.

@MacSparky Curious to know your opinion of Genius Scan compared to the other scanning apps you reviewed? I find that I eventually return to Genius Scan after giving other apps a test drive.

The only viable way to manage that is whatever works best for you. I like to get as much as possible into a folder’s name, so that I’m essentially describing what’s inside, until I get the hang of it and it’s practically second nature.

Also, you could generate a key list. And you could use ticklers to remind yourself that the info is elsewhere, like “Bank accounts – see under (child’s) name.”
