Eigenentwicklung

From an overflowing newsletter inbox, a searchable, tagged, and (where needed) translated knowledge base in Obsidian emerges automatically — with the actual content, without tracking pixels, ad banners, and nested footers.

Pattern Email & newsletter workflows

Python 3.11
litellm (OpenAI / Anthropic / Azure / Ollama)
BeautifulSoup

THE STARTING POINT

The starting point

If you want to stay current professionally, you subscribe to newsletters. If you're serious, several — industry news, product updates, technical deep-dives, tools-of-the-week. After a few weeks the inbox is a graveyard: unread, unsorted, eventually deleted. The content was valuable; the form makes it unusable. Layout tables, tracking pixels, "click here" buttons, triple-nested footers with unsubscribe links — and somewhere in the middle, the paragraph that was the actual point.

Honestly: who really searches their newsletter archive in the mail client two years later?

WHAT WE BUILT

What we built

A command-line tool that collects newsletter emails, strips the HTML clutter, summarises with AI, optionally translates, and files them as clean Obsidian notes in your own knowledge base — complete with tags, short summary, and source reference.

The flow

Fetch the newsletter — the application connects to the mail server (IMAP, POP3, or Exchange On-Premises) and reads new emails. Which senders are eligible at all is configured; already-processed emails are reliably skipped.
Separate signal from noise — not every email from a newsletter sender is content. A two-stage classification — first rule-based, then a cheap AI model — recognises ads, dispatch confirmations, and order notifications and filters them out before expensive processing starts.
Clean up HTML — a five-stage pipeline removes tracking pixels, ad banners, layout tables, and footer remnants. It detects the dispatch platform (Mailchimp, Substack, beehiiv, and friends) and extracts the actual content — data tables are deliberately kept, layout tables are dissolved.
Take the images along — embedded graphics and externally referenced images are downloaded in parallel, placed in a daily folder, and linked correctly in the markdown. Even if the sender shuts down their image CDN in two years — the note stays complete.
HTML becomes Markdown — a tailored conversion produces Obsidian-compliant Markdown with ATX headings, properly formatted data tables, and clean links — no HTML residue, no broken lists.
Understand the content — an AI model reads the newsletter and returns a short summary (TLDR), matching tags, and a meaningful title — structured, not as free text.
Translate where needed — English newsletters are translated to German. A placeholder mechanism reliably protects URLs, image references, and code blocks from being "co-translated".
File in Obsidian — every note gets structured metadata (YAML frontmatter), a TLDR box up front, the prepared content, and a reference to the source email. Filenames follow a configurable pattern; German translations sit next to the original with a _de suffix.

Multiple models, one codebase

Which AI model does the work is a configuration question — OpenAI, Anthropic, Azure OpenAI, OpenRouter, or a local model running on Ollama. Different models can be chosen for the cheap classification and the more demanding content analysis; per run, which model was used when is logged.

Delivery as a self-contained application

The tool ships as a standalone program (Windows or macOS) that runs without a separate Python install. Configuration and credentials live in two files alongside it — what can be maintained on site can be maintained on site.

WHAT IT GIVES THEM

What it gives them

Inbox clutter becomes a knowledge base. What used to gather dust in the mailbox and eventually got deleted is now searchable, linkable, and part of your own research.
Tracking pixels and ad footers end up where they belong — in the bin. What's left is the content.
English isn't a barrier any more. Read the original if you prefer English; have the German translation alongside if you don't — at the press of a key.
Images stay available offline. Content is still fully readable in two years, regardless of whether the sender still hosts the newsletter.
Tags emerge automatically and consistently. What used to be tedious manual work sorts itself — and makes the archive actually navigable.

WHAT WE DELIBERATELY DID NOT AUTOMATE

What we deliberately did not automate

The choice of sources. Which senders are processed at all is decided by a person — once, in the configuration. That way the whole inbox doesn't drift into the knowledge archive, only what belongs there.
The reading, linking, commenting. The tool prepares — reading, connecting to your own notes, condensing into cross-references happens in Obsidian, as before.
The inbox. Local Markdown files are produced. No cloud sync of email content, no extra data flows — the inbox stays the inbox.

WHY THIS PATTERN TRANSFERS

Why this pattern transfers

The setup works wherever HTML, PDF, or web content from emails or feeds regularly needs to flow into a structured knowledge or document base — and the content-side handling stays with a person: tender mailings, supplier and product updates, compliance and regulatory bulletins, market-research briefings, industry press digests, competitor monitoring.

The pattern: source → automatic clean-up → AI understanding → structured filing in your own knowledge base.

AI turns noise into content. The filing structure and the reading workflow stay where they belong — with the person who actually uses the content.

Talk to us

Two doors, one address.

Specific bottleneck?

Let us talk for 30 minutes about your use case.

No obligation, no cost, with concrete next steps at the end.

Book a 30-minute call

Your own AI platform?

See CompanyWizard live in action.

Demo with your own data is possible. We bring the pseudonymisation set up and ready.

Request a demo