In-house development

From the chaos of invoice emails, a clean accounting folder emerges automatically: every PDF with the right name, in the right monthly directory, without anyone opening, renaming, or moving a single file.

Pattern: Email & newsletter workflows

Python 3.11
litellm (Anthropic / OpenAI / Azure / Ollama / OpenRouter)
IMAP / POP3 / Exchange

THE STARTING POINT

Incoming invoices arrive by email today: mobile carriers, hosting, cloud services, advertising, credit card statements, software subscriptions. Every month the same routine — open the email, download the PDF, rename the file, move it to the right supplier and month folder. For credit card statements there's an extra step: open the PDF, read the amount, build it into the filename so accounting can later see what it's about without opening it.

At two dozen invoices a month, that isn't "just a quick task" — it's the recurring friction point at the start of the month, where typos creep in, invoices accidentally get filed twice, or whole receipts end up lost in the inbox.

WHAT WE BUILT

A command-line tool that systematically scans the mailbox for invoice senders, extracts the matching PDF attachments, renames them by configurable rules, and files them into the correct folder hierarchy — with a dry-run before every real write, and a state database that reliably prevents double-processing.

The flow

Fetch emails: the application connects to the mail server (IMAP, POP3, or Exchange) and reads the emails in the chosen time range. A simple CSV file controls which senders are processed at all. Emails that were already processed are recognised by their Message-ID and skipped.
Pick the right attachment — when several PDFs are attached to the same email (e.g. invoice and receipt), the tool decides by clear rules: "Receipt" beats "Invoice", and in unclear cases it asks — the person clarifies the exception once, the tool remembers the decision.
Understand the amount on credit-card statements — for senders whose receipts conventionally carry the amount in the filename (American Express, certain cloud providers), the application reads the PDF and sends the text to an AI model that returns amount and currency in structured form. If that fails, the tool asks — it doesn't guess.
Name cleanly — from receipt date, sender short name, and (where needed) amount, a consistent scheme is produced: 2026-02-15_amex_142.50€.pdf. Other senders keep their original filename — what the banking software or the supplier already used is usually good enough.
Move into the right place — path templates with placeholders like {year} and {month} resolve from the receipt date: /Invoices/amex/2026/2026-02/. Missing directories are created where needed.
Prevent double-processing — every processed email is logged in a local state database with its Message-ID. Running the same job twice doesn't produce two copies — it produces a summary of what was already done.
Report clearly what happened — a summary appears in the terminal at the end: per email, status, sender, subject, final filename, target path, detected amount. Anything that went wrong is visible at a glance.

Dry-run before every write

With --dry-run the tool shows exactly what it would do — without writing a single file or marking an email as processed. Configuration changes can be tested safely before being unleashed on the real mailbox.
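
One way to get this behaviour is to route every write through a single function that takes the flag. The sketch below assumes that design; `PlannedWrite`, `file_attachment`, and `build_parser` are hypothetical names, and only the `--dry-run` flag itself comes from the tool.

```python
import argparse
from dataclasses import dataclass
from pathlib import Path


@dataclass
class PlannedWrite:
    target: Path
    written: bool  # False when the run was a dry-run


def file_attachment(payload: bytes, target: Path, dry_run: bool) -> PlannedWrite:
    """Write payload to target, or only report the plan when dry_run is set."""
    if dry_run:
        # Nothing is touched: no directory is created, no file is written.
        return PlannedWrite(target=target, written=False)
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_bytes(payload)
    return PlannedWrite(target=target, written=True)


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="File invoice PDFs from a mailbox")
    parser.add_argument("--dry-run", action="store_true",
                        help="show what would happen without writing anything")
    return parser
```

Because the dry-run branch returns the same record type as the real write, the end-of-run summary can be built from one code path in both modes.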

Multiple models, one codebase

Which AI model extracts the amount from the PDF is a configuration question — Anthropic Claude, OpenAI, Azure, OpenRouter, or a local model running on Ollama. The choice between cloud and on-premise stays configurable without touching code.

Configuration and credentials separated

Mail server, path templates, and sender mappings live in readable text files (YAML and CSV). Passwords and API keys come from environment variables or a .env file, so the configuration can be maintained on site without secrets ending up in it in plain text.
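
The split can be illustrated with the standard library alone. In this sketch the CSV column names `address` and `short_name`, the variable name `MAIL_PASSWORD`, and both function names are hypothetical. Loading a .env file into the environment would need an extra step that is omitted here.

```python
import csv
import io
import os
from collections.abc import Mapping


def load_senders(csv_file) -> dict[str, str]:
    """Map lower-cased sender address to short name; expects a header row."""
    reader = csv.DictReader(csv_file)
    return {
        row["address"].strip().lower(): row["short_name"].strip()
        for row in reader
    }


def require_secret(name: str, env: Mapping[str, str] = os.environ) -> str:
    """Read a secret from the environment and fail loudly when it is missing."""
    value = env.get(name)
    if not value:
        raise RuntimeError(f"missing secret: set {name} in the environment or .env")
    return value


# Usage with an in-memory CSV standing in for the sender file.
senders = load_senders(io.StringIO("address,short_name\nbilling@example.com,amex\n"))
```

The sender file can be edited by anyone who maintains the setup, while the password only ever exists in the process environment.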

WHAT IT GIVES THEM

Routine clicking becomes one command. What used to be email-by-email work is a single invocation with a date range — the rest happens.
Credit-card receipts carry the amount in the filename. Accounting sees what each one is about while scrolling, with no need to open every PDF.
Double-processed receipts are gone. The Message-ID-based state database is more reliable than any "have I done this already?" email marker.
Dry-run makes configuration errors visible — before the damage. Before a real file is written or an email is marked as done, the full run can be played through harmlessly.
Supplier-name sprawl gets structured. Instead of a different filing convention for every supplier, there is one consistent archive, as navigable in two years as it is today.

WHAT WE DELIBERATELY DID NOT AUTOMATE

The choice of senders. Which email addresses count as invoice sources is decided by a person — once, in the CSV. So not every marketing email lands in the accounting folder, only what belongs there.
The booking itself. Cleanly named and correctly filed files are produced — no journal entries, no DATEV handover, no creditor records. The downstream accounting system stays as it is.
Disputed cases. When several PDFs are attached and the rule doesn't match, the tool asks — it doesn't guess. The few unclear receipts stay visible.
The mailbox itself. Emails are read but not deleted or moved. The inbox stays the inbox — what gets archived is still decided by the email workflow.

WHY THIS PATTERN TRANSFERS

The setup works wherever structured documents arrive regularly by email and need to land in a consistent archive, while handling of the content itself stays with a person or a downstream system: supplier invoices, reminders, dispatch notices, order confirmations, contract attachments, application documents, official notices.

The pattern: mailbox → sender filter → extract attachment → understand content with AI where needed → name → file → mark as done.
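
The first three stages of the pattern (mailbox, sender filter, attachment extraction) map closely onto Python's standard `email` package. The following is a sketch of that front half only, working on one raw message as a mail server would return it. The function name and the use of in-memory sets for allowed senders and seen Message-IDs are assumptions for illustration.

```python
from email import message_from_bytes, policy
from email.message import EmailMessage
from email.utils import parseaddr


def pdf_attachments(raw: bytes, allowed_senders: set[str],
                    seen_ids: set[str]) -> list[tuple[str, bytes]]:
    """Return (filename, content) for each PDF of a new message from a known sender."""
    msg = message_from_bytes(raw, policy=policy.default)
    message_id = (msg["Message-ID"] or "").strip()
    sender = parseaddr(msg["From"] or "")[1].lower()
    # Skip anything without an ID, already processed, or from an unlisted sender.
    if not message_id or message_id in seen_ids or sender not in allowed_senders:
        return []
    return [
        (part.get_filename(), part.get_payload(decode=True))
        for part in msg.iter_attachments()
        if part.get_content_type() == "application/pdf"
    ]


# Build a sample message in memory instead of talking to a real mailbox.
sample = EmailMessage()
sample["From"] = "Billing <billing@example.com>"
sample["Message-ID"] = "<1@example.com>"
sample["Subject"] = "Your invoice"
sample.set_content("See attachment.")
sample.add_attachment(b"%PDF-1.4", maintype="application",
                      subtype="pdf", filename="invoice.pdf")
found = pdf_attachments(sample.as_bytes(), {"billing@example.com"}, set())
```

The same function works for any of the document types listed above, because nothing in it is specific to invoices.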

AI steps in precisely where a document needs to become structured information — for example, the invoice amount. The routine mechanics (download, rename, file, no duplicates) are deterministic and therefore reliable. Responsibility for what is processed at all stays with the person — where it belongs.