Paperless-ngx (a community-supported open-source document management system)

This Mastodon post from Casey Liss made me aware of Paperless-ngx:

Paperless-ngx screenshot

Search on this forum returns a few older posts that mention this software.

Is anyone using this right now and willing to share her/his experience?

(I’m wondering whether I should put investigating this on my to-do list.)

Yes, I’m using this on my Synology. I struggled initially because it didn’t work and I couldn’t figure out why - it turned out the firewall was blocking access between the Docker containers. I’ve got the firewall off now and it works fine.

I’ve set Hazel to upload downloaded PDFs straight to it. Likewise, I have a Shortcut that lets me use the share sheet to send PDFs straight to it using its built-in API.
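
For anyone curious what an upload through the built-in API might look like, here is a minimal Python sketch. It is not my actual Hazel rule or Shortcut. It assumes the REST endpoint is `/api/documents/post_document/`, that the file goes in a form field named `document`, and that authentication uses an `Authorization: Token ...` header, so check the API docs for your version. The server URL, token and function name are made up for illustration.

```python
import uuid
import urllib.request
from pathlib import Path


def build_upload_request(base_url: str, token: str, pdf_path: str) -> urllib.request.Request:
    """Build (but do not send) a multipart POST that uploads one PDF.

    Assumptions: the endpoint path, the "document" field name and the
    token header follow my reading of the Paperless-ngx REST API docs.
    """
    path = Path(pdf_path)
    boundary = uuid.uuid4().hex  # random separator for the multipart body

    # Multipart body containing a single file field named "document".
    body = b"".join([
        f"--{boundary}\r\n".encode(),
        f'Content-Disposition: form-data; name="document"; filename="{path.name}"\r\n'.encode(),
        b"Content-Type: application/pdf\r\n\r\n",
        path.read_bytes(),
        f"\r\n--{boundary}--\r\n".encode(),
    ])

    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/documents/post_document/",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Token {token}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )
```

To actually send it, you would pass the result to `urllib.request.urlopen(...)`, with your own server address and an API token created in the web interface. A Hazel rule could run a script like this on each newly downloaded PDF.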

Overall, it seems to work nicely - better than just filing into folders, which is what I’d been doing previously. The search seems to work well and the OCR isn’t bad - I use its built-in OCR rather than the OCR of any scanners I might use (one of the scanners I use doesn’t do OCR). The files also remain accessible if you ever want to move away from it.

It is not clear where the documents are stored, but it seems to store them somewhere on the Internet. If it can run locally then perhaps it could be useful, provided that the original documents are not moved from their current locations. It needs to be local because many of the documents I would store in it contain private information about other people (especially those deemed by UK law to be “vulnerable” adults) that cannot be uploaded to an external application.

That’s an important requirement for me as well, but if I understood this FAQ correctly, documents are stored (unmodified) in a Docker volume. So if that Docker instance runs on a local device (instead of a hosted machine), we should be good?

No idea. I neither understand nor use Docker for anything.

If I could host paperless-ngx on a machine on my intranet then I might be interested.

It stores documents on whatever device you run the server on. So if you run it on your laptop, it stores them on your laptop. If you run it on a headless Mac Mini or a Synology, it would store them there.

I just watched a couple of videos online on how to install it, and it was over my head since I have never used Docker before. So it might be a good project for me. Not sure I need it though, because DEVONthink basically does the same thing, doesn’t it? It OCRs everything and then makes it easy to find with search.

It creates a copy of the file in its “database” (though that’s just a folder). It’s like Calibre, iTunes or anything else that has a database - you should use it to find the files and “ignore” what it’s doing in the background, as long as it doesn’t modify them and you can get the files out again. However, it does let you set a filename formatting option for the saved files, so you could also access them from the file browser.
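
To illustrate the filename formatting option: as far as I recall, the setting is called `PAPERLESS_FILENAME_FORMAT` and takes a template with placeholders such as `{created_year}`, `{correspondent}` and `{title}`. Treat those names as assumptions and check the docs for your version. The Python sketch below only mimics the idea with `str.format`. It is not Paperless’s own code, and `render_storage_path` and the sample metadata are made up.

```python
from pathlib import PurePosixPath


def render_storage_path(template: str, metadata: dict[str, str], suffix: str = ".pdf") -> PurePosixPath:
    """Expand a placeholder template into a relative file path (illustration only)."""
    # Each {placeholder} in the template is replaced by the matching metadata value.
    return PurePosixPath(template.format(**metadata) + suffix)


# Hypothetical document metadata.
path = render_storage_path(
    "{created_year}/{correspondent}/{title}",
    {"created_year": "2024", "correspondent": "Electric Co", "title": "March invoice"},
)
print(path)  # 2024/Electric Co/March invoice.pdf
```

The point is that the folder layout on disk stays human-readable, so you can still browse the files without the web interface.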

No, from what I know of DEVONthink, you can probably achieve the same with that (OCR and tagging).

EDIT: Reread my post to @RunningBoris - I meant that no, you probably don’t need Paperless if you’re already a DEVONthink user, as that should do the same (OCR and tagging of documents).

I saw Casey’s post, had a quick look, and my initial thought was “now there’s a solution to wasting time on a problem I already have solved.”

If my goal was to set up a server to hold my documents, then I might be interested. It’s not. My goal is to capture documents and have them available to me. KeepIt does that.

  • It OCRs automatically.
  • It stores everything in iCloud so I have access from anywhere/any device.
  • The documents are accessible outside of KeepIt if needed.

:man_shrugging:

Here’s my guide to setting up KeepIt.

  1. Buy it.
  2. Use it.

I guess that’s the same with any software! I have EagleFiler, which I believe should do something similar to Paperless. But I don’t use it often - not entirely sure why, but Paperless clicked for me (and it’s cross-platform, as it’s self-hosted, so it doesn’t matter if I’m on my Windows PC - which won’t be an issue for some).

Some of the documents I would add to such a document management system are PDF files containing (non-OCR) scans of documents that have been amended by hand: tick marks, rulings, additional lines. Some of those things are not much more than scribbles (and in some cases ambiguously positioned), but they throw off OCR scans. Oh, and they are two-column listings. One of the AI chat systems makes an attempt at scanning the documents but its accuracy is below 70%; other AI chats mess it up.

Associates working on similar documents have tried Tesseract, but it too is not good at extracting the data from these scanned images.

A good point, but as soon as I see the word “hosted” these days, I read it as “hard work” and just don’t go any further with it. If I was 30 years younger, maybe. I’ve moved past the point in my life where “cool to tinker with” was a positive.

Fair play.

I followed a guide on the internet and it worked straight off the bat, but it is something else to learn. It’s interesting to see various projects online that attempt to make self-hosting software like this easier for the masses - CasaOS, for example.

Don’t forget that “self hosting” is a phrase carrying a lot of baggage. That “host” takes significant work on its own.

That was my thought when I checked paperless-ngx out. My needs for file management are easily covered by Notebooks, which is somewhat similar to KeepIt. But I am sure I will be tempted in the future by something :sweat_smile:
