Homelab Journey Part 6: Document Management with Paperless-ngx

Posted on 26/06/14 in Homelab

Paperless-ngx is one of those services that sounds boring until you need it.

Nobody gets excited about scanning paperwork. That is probably why the piles build up. A letter arrives, it gets put somewhere “safe”, and then six months later you need the exact thing you were sure you kept.

I wanted a better answer than drawers, folders, and hope.

Paperless-ngx gives me a simple workflow: capture the document, let OCR make it searchable, tag it properly, and then keep the original somewhere backed up. It turns paperwork into data I can actually find again.

What I wanted from it

I did not want a perfect archive.

That matters. Perfect systems tend to collapse because they need perfect habits. I wanted something good enough that I would keep using it.

The goals were:

scan letters before they become piles
ingest useful email attachments
search by text, sender, type, or date
keep originals safe
avoid a folder structure that needs constant gardening

That last point is important. A rigid folder structure always feels sensible at the start. Then real documents arrive and refuse to fit neatly. Tags and correspondents are more forgiving.

Running it on the homelab

Paperless runs as part of the Swarm like the rest of the services.

The state lives on shared storage, not on an SD card. That includes the media directory, the consume folder, the database storage, and anything else Paperless needs to survive a redeploy. The stack is deployed through the normal process, so I am not hand-editing a special snowflake service.

The Pi hardware can run it, but OCR is not free. It is one of the workloads where the Pi's limits show up. A small batch of documents is fine. A large import needs patience. That is acceptable for me because this is a home document system, not a company scanning department.

The trick is to make the daily path easy and avoid huge backlogs.
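
One way to keep a large import from swamping the Pi is to feed the consume folder in small batches. The sketch below is only an illustration, not how this stack does it. The function name, the backlog directory, and the batch size are all hypothetical, and it assumes Paperless removes files from the consume folder once it has processed them.

```python
import shutil
from pathlib import Path


def feed_next_batch(backlog_dir: Path, consume_dir: Path, batch_size: int = 5) -> list[Path]:
    """Move up to batch_size files from a backlog into the consume folder.

    Does nothing while the consume folder still holds files, on the
    assumption that the previous batch is still being processed.
    """
    if any(p.is_file() for p in consume_dir.iterdir()):
        return []

    # Sort by name so the order of the import is predictable.
    pending = sorted(p for p in backlog_dir.iterdir() if p.is_file())

    moved = []
    for src in pending[:batch_size]:
        dest = shutil.move(str(src), str(consume_dir / src.name))
        moved.append(Path(dest))
    return moved
```

Run from a timer every few minutes, something like this would drip a backlog through at a pace the hardware can handle.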

The capture workflow

The workflow is deliberately plain.

If it arrives on paper, it gets scanned and dropped into the consume folder. If it arrives by email and matters, it goes through email ingestion or gets saved into the same path. Paperless picks it up, runs OCR, and leaves it ready for review.
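
The "dropped into the consume folder" step can be sketched in a few lines. This is a minimal example under stated assumptions, not a description of the actual setup. The function name and the staging directory are hypothetical, and the staging directory is assumed to sit on the same filesystem as the consume folder so that the final rename is atomic and Paperless never sees a half-written file.

```python
import os
import shutil
from pathlib import Path


def drop_into_consume(scan: Path, staging_dir: Path, consume_dir: Path) -> Path:
    """Copy a finished scan into the consume folder in two steps."""
    staging_dir.mkdir(parents=True, exist_ok=True)
    consume_dir.mkdir(parents=True, exist_ok=True)

    # Step 1: copy into staging, which Paperless is not watching.
    staged = staging_dir / scan.name
    shutil.copyfile(scan, staged)

    # Step 2: rename into place. On the same filesystem this is a single
    # atomic operation, so the file appears complete or not at all.
    target = consume_dir / scan.name
    os.replace(staged, target)
    return target
```

The original scan is left where it was, so it can be deleted only after the document shows up in Paperless.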

Then I do the human bit:

check the title
confirm the correspondent
set the document type
add tags if they help
archive the physical copy only if I no longer need it

That review step is worth keeping. Automation can get close, but paperwork is full of awkward edge cases. I would rather spend a few seconds checking the result than build a complicated rule system that still gets things wrong.

Tags beat folders

Folders force one answer. Tags allow several.

A document can be about the house, insurance, and a specific supplier at the same time. Trying to pick one folder for that is silly. Paperless lets me find documents the way I remember them later.

Usually I remember one of these:

who sent it
roughly when it arrived
what it was about
a word inside the document

That is enough. OCR and metadata do the rest.
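
The same search is available outside the web interface. As a rough sketch, Paperless-ngx exposes a REST API with a documents endpoint that accepts a full-text `query` parameter and token authentication. The base URL, token, and function name below are placeholders, and the exact response fields are worth checking against the API documentation for the installed version.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_search_request(base_url: str, token: str, text: str) -> Request:
    """Build (but do not send) a full-text search request for Paperless-ngx."""
    query = urlencode({"query": text})
    url = f"{base_url.rstrip('/')}/api/documents/?{query}"
    # Token auth: the token value here is a placeholder, not a real credential.
    return Request(url, headers={"Authorization": f"Token {token}"})


# Hypothetical host and token, for illustration only.
request = build_search_request("http://paperless.example", "changeme", "boiler service")
print(request.full_url)
```

Passing the request to `urllib.request.urlopen` and decoding the JSON body should return the matching documents.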

What needs backing up

Paperless is only useful if I trust the data to survive.

That means the originals, database, configuration, and media files need to be part of the backup set. It is not enough to back up the container definition. The container is replaceable. The documents are not.

This is the same lesson as the rest of the homelab. A service is not just the thing running in Docker. It is the state behind it.
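
To make "the state behind it" concrete, here is a minimal sketch of bundling the state directories into one archive. The directory names and the function are hypothetical, and this is not a complete backup strategy. In particular, a live database directory should be dumped or stopped before it is copied, and Paperless-ngx also ships a `document_exporter` command that is worth considering alongside a file-level copy.

```python
import tarfile
from pathlib import Path


def archive_state(state_dirs: list[Path], archive: Path) -> list[str]:
    """Write the given state directories into a gzip-compressed tar archive.

    Refuses to run if any directory is missing, so a broken mount does not
    quietly produce an empty backup. Returns the archived member names.
    """
    missing = [str(d) for d in state_dirs if not d.is_dir()]
    if missing:
        raise FileNotFoundError(f"state directories missing: {missing}")

    with tarfile.open(archive, "w:gz") as tar:
        for d in state_dirs:
            # Store each directory under its own name, not its full path.
            tar.add(d, arcname=d.name)

    # Re-open the archive to confirm it is readable and list its contents.
    with tarfile.open(archive, "r:gz") as tar:
        return tar.getnames()
```

Failing loudly on a missing directory is the design choice that matters here. A backup that silently skips the media folder is worse than one that refuses to run.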

What is still manual

There is still manual work.

Some scans are poor. Some letters have odd layouts. Some documents need better titles. Email attachments can be noisy. Paperless makes the job smaller, but it does not remove judgement.

That is fine. I do not need it to be magic. I need it to be better than a pile of paper and a vague memory.

The payoff

The payoff is not dramatic. It is quieter than that.

When I need a document, I search instead of rummaging. When a letter arrives, there is a place for it to go. When I think “I should keep this”, I have a system that does not depend on me inventing a folder name.

That is enough to make Paperless-ngx a core service.

Not exciting. Very useful.