
Link Triage Pipeline

Automated knowledge extraction pipeline processing hundreds of links weekly.

The Challenge

My “Read Later” queue was a graveyard of 5,000+ URLs. I needed a way not just to archive links, but to extract value from them without human intervention.

The Solution

I built a semi-autonomous pipeline that acts as a dedicated research assistant. It doesn’t just summarize; it triages each link against my specific taxonomy.

The 5-Phase Pipeline

  1. Fetch: Pulls new bookmarks from Raindrop.io via its REST API (see the fetch/extract sketch below).
  2. Extract: Uses Jina and Trafilatura to get clean, readable text (no ads or navigation).
  3. Classify: Sends the content to Claude 3.5 Sonnet with a specific prompt: “Is this a Tool, a Tutorial, or Strategy? Tag it against my personal taxonomy.” (see the classify sketch below)
  4. Route: Updates the Raindrop bookmark with the new tags and an AI-generated note (see the route/digest sketch below).
  5. Digest: Generates a weekly Markdown summary of “High Value” reads.
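A minimal sketch of the Fetch and Extract phases in Python, assuming a Raindrop API token in a RAINDROP_TOKEN environment variable. The endpoint path and the r.jina.ai prefix follow the public Raindrop.io and Jina Reader docs, while the function names, collection ID 0, and the Trafilatura-first fallback order are illustrative assumptions, not the exact production code:

```python
import os

import requests
import trafilatura

RAINDROP_API = "https://api.raindrop.io/rest/v1"
HEADERS = {"Authorization": f"Bearer {os.environ['RAINDROP_TOKEN']}"}  # assumed env var

def fetch_new_bookmarks(collection_id: int = 0) -> list[dict]:
    """Phase 1: pull recent bookmarks; collection 0 means 'all bookmarks'."""
    resp = requests.get(
        f"{RAINDROP_API}/raindrops/{collection_id}",
        headers=HEADERS,
        params={"sort": "-created"},  # newest first
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()["items"]

def extract_text(url: str) -> str | None:
    """Phase 2: try Trafilatura first, fall back to the Jina Reader."""
    downloaded = trafilatura.fetch_url(url)
    if downloaded:
        text = trafilatura.extract(downloaded)  # strips ads, nav, and boilerplate
        if text:
            return text
    # Jina Reader returns LLM-ready text for any URL prefixed with r.jina.ai
    resp = requests.get(f"https://r.jina.ai/{url}", timeout=60)
    return resp.text if resp.ok else None
```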
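Classification might look like this with the Anthropic Python SDK. The prompt mirrors the one quoted above, but the tag list, truncation limit, and function name are stand-ins, and the model ID is one published Claude 3.5 Sonnet snapshot:

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

TAXONOMY = ["tool", "tutorial", "strategy"]  # illustrative stand-in for my taxonomy

def classify(text: str) -> str:
    """Phase 3: ask Claude to triage the article against the taxonomy."""
    message = client.messages.create(
        model="claude-3-5-sonnet-20241022",  # one published 3.5 Sonnet snapshot
        max_tokens=256,
        messages=[{
            "role": "user",
            "content": (
                "Is this a Tool, a Tutorial, or Strategy? "
                f"Tag it against this taxonomy: {', '.join(TAXONOMY)}. "
                "Reply with comma-separated tags and a one-line note.\n\n"
                + text[:8000]  # truncate to keep token spend predictable
            ),
        }],
    )
    return message.content[0].text
```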
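Routing then writes the results back to Raindrop, and the digest is plain Markdown generation. This sketch assumes the PUT update endpoint with tags and note fields from the public Raindrop API; the tag-merge behavior and digest format are my own guesses:

```python
import os

import requests

RAINDROP_API = "https://api.raindrop.io/rest/v1"
HEADERS = {"Authorization": f"Bearer {os.environ['RAINDROP_TOKEN']}"}

def route(bookmark: dict, tags: list[str], note: str) -> None:
    """Phase 4: write the AI-generated tags and note back onto the bookmark."""
    merged = sorted(set(bookmark.get("tags", [])) | set(tags))  # keep existing tags
    resp = requests.put(
        f"{RAINDROP_API}/raindrop/{bookmark['_id']}",
        headers=HEADERS,
        json={"tags": merged, "note": note},
        timeout=30,
    )
    resp.raise_for_status()

def weekly_digest(items: list[dict], path: str = "digest.md") -> None:
    """Phase 5: emit a Markdown summary of the week's 'High Value' reads."""
    lines = ["# Weekly Digest", ""]
    for item in items:
        lines.append(f"- [{item['title']}]({item['link']}): {item.get('note', '')}")
    with open(path, "w", encoding="utf-8") as f:
        f.write("\n".join(lines) + "\n")
```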

Key Engineering Decision: Semi-Autonomy

Instead of a fully autonomous loop that might burn API credits churning through junk, I implemented a “Human-in-the-Loop” Orchestrator.

The execute-plan CLI runs each phase but pauses at a “quality gate” check between phases. I can run /execute-plan --start-phase 3 to resume from the Classify step. This balances automation speed with human architectural oversight.
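A sketch of that quality-gate loop, with lambda stand-ins for the five phase functions. The pause prompt and internals are illustrative, though the --start-phase flag matches the CLI described above:

```python
import argparse

# Lambda stand-ins for the five phase functions sketched earlier.
PHASES = {
    1: lambda: print("fetch"),
    2: lambda: print("extract"),
    3: lambda: print("classify"),
    4: lambda: print("route"),
    5: lambda: print("digest"),
}

def run(start_phase: int) -> None:
    """Run each phase in order, pausing at a human quality gate between phases."""
    for number in sorted(PHASES):
        if number < start_phase:
            continue  # lets execute-plan --start-phase 3 resume at Classify
        PHASES[number]()
        if number == max(PHASES):
            break
        if input(f"Phase {number} passed its quality gate? [y/N] ").lower() != "y":
            print(f"Stopping. Resume with --start-phase {number + 1}.")
            return

if __name__ == "__main__":
    parser = argparse.ArgumentParser(prog="execute-plan")
    parser.add_argument("--start-phase", type=int, default=1)
    args = parser.parse_args()
    run(args.start_phase)
```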

Impact

  • Zero Backlog: The inbox is processed automatically.
  • High Signal: I only read what the AI has pre-qualified as relevant to my current projects.
  • Knowledge Graph: My bookmark collection is now a structured, tagged dataset ready for RAG (Retrieval-Augmented Generation).