most people who hear "second brain" think of obsidian, plain markdown, and a forest of links. that's roughly where i started. what i have now is a little more involved — the same markdown vault, but with an llm agent attached to it that handles the work i'd otherwise never do: reading every source carefully, summarising it consistently, cross-linking, and keeping an index that doesn't rot.
this post is the architecture, the rules i run it by, and what's actually worked.
the three layers
the vault has exactly three top-level folders. conflating them is the failure mode i've watched second brains die from.
SecondBrain/
├── raw/ # immutable source documents
├── .cache/ # disposable working area
└── wiki/ # everything the agent writes
raw/ is the inbox and the archive. pdfs, articles, screenshots, course notes, exported chats, github repo stubs. anything i want to remember, the agent has read. read-only by rule: nothing in raw/ is ever modified or deleted.
.cache/ is throwaway. cloned repos land here under .cache/repos/<owner>-<name>/. safe to nuke whenever. the wiki never cites from cache — citations point at the wiki page or the stub in raw/repos/ that wraps the cache entry.
wiki/ is the entire output surface. source summaries, entity pages, concept pages, syntheses, an index, and a chronological log. this is the artefact that compounds.
the discipline of keeping these three layers separate is what makes the system survive contact with use. when raw/ and wiki/ blur, the immutable source eventually gets edited, and now you can't trust the chain.
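the read-only rule for raw/ can also be checked mechanically rather than trusted. a minimal sketch, assuming a snapshot of file hashes is taken before a session and compared after it; the function names are hypothetical and this is not the tooling behind the vault described here.

```python
import hashlib
from pathlib import Path


def snapshot(raw_dir: Path) -> dict[str, str]:
    """Map each file under raw_dir (by relative path) to its SHA-256 digest."""
    return {
        p.relative_to(raw_dir).as_posix(): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(raw_dir.rglob("*"))
        if p.is_file()
    }


def violations(old: dict[str, str], new: dict[str, str]) -> list[str]:
    """List files that were modified or deleted since the old snapshot.

    Newly added files are not violations: raw/ accepts new sources,
    it just never changes or loses existing ones.
    """
    problems = []
    for path, digest in old.items():
        if path not in new:
            problems.append(f"deleted: {path}")
        elif new[path] != digest:
            problems.append(f"modified: {path}")
    return problems
```

an empty list from violations() means the chain from wiki page back to source is still intact.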
page conventions
every wiki page (except the index and the log) starts with frontmatter:
---
title: page title
type: source | entity | concept | synthesis
created: YYYY-MM-DD
updated: YYYY-MM-DD
sources: [list of source page slugs cited]
tags: [optional]
---
cross-links are obsidian-style: [[entities/some-person]], [[concepts/some-idea]]. external links use standard markdown. claims that originate from a specific source are cited inline:
bcrypt is intentionally slow, which is the property that makes it a good password hash. ([[sources/some-paper]])
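because links and citations share one syntax, they are easy to pull out of a page with a regular expression. a rough sketch, assuming citations always point under sources/ as in the example above; the alias ([[target|alias]]) and heading ([[target#heading]]) forms are standard obsidian syntax, and the helper names are made up for illustration.

```python
import re

# Matches [[target]], [[target|alias]] and [[target#heading]]; captures the target only.
WIKILINK = re.compile(r"\[\[([^\]|#]+)(?:#[^\]|]*)?(?:\|[^\]]*)?\]\]")


def wikilinks(text: str) -> list[str]:
    """Return link targets in order of appearance, without aliases or heading anchors."""
    return [m.group(1).strip() for m in WIKILINK.finditer(text)]


def cited_sources(text: str) -> list[str]:
    """Return only the targets under sources/, i.e. the inline citations."""
    return [t for t in wikilinks(text) if t.startswith("sources/")]
```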
one concept per page. if a page passes ~500 lines or covers two distinct things, it gets split. filenames are kebab-case slugs of the title. no spaces, no capitals, no dates in filenames.
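the filename rule is simple enough to automate. a small sketch of a slug function and the ~500-line split check, assuming ascii-only filenames are acceptable; both function names are hypothetical, and the "no dates" rule is not enforced here.

```python
import re
import unicodedata


def slugify(title: str) -> str:
    """Kebab-case slug: lowercase ASCII letters and digits joined by single hyphens."""
    # Decompose accented characters, then drop anything that is not ASCII.
    ascii_title = unicodedata.normalize("NFKD", title).encode("ascii", "ignore").decode("ascii")
    return re.sub(r"[^a-z0-9]+", "-", ascii_title.lower()).strip("-")


def needs_split(text: str, max_lines: int = 500) -> bool:
    """True when a page has grown past the line budget and should be reviewed for splitting."""
    return len(text.splitlines()) > max_lines
```

for example, slugify("Password Hashing with bcrypt") gives password-hashing-with-bcrypt.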
the ingest workflow
when i drop a source in raw/ and ask the agent to ingest it:
1. read the full source. for images, view them. for long pdfs, multiple passes.
2. discuss the takeaways with me before writing. this is the rule i keep tightening. the agent's draft of "what matters" is almost always close-but-not-quite, and the five-minute conversation before writing saves an hour of re-editing after.
3. create the source page: frontmatter, one-paragraph summary, key claims bulleted, short attributed quotes, open questions raised.
4. update entity and concept pages for every notable thing mentioned. new page or new dated section on an existing page, always cited back to today's source.
5. cross-link. every new page is reachable from at least one other page.
6. update the index. the index is a content-oriented catalog: sources, entities, concepts, synthesis, each entry a link and a one-line summary.
7. append to the log. chronological, append-only, one entry per session.
steps 4–7 are the bookkeeping. they are also the steps i would never have done by hand. the agent does them every time, identically. that's the whole multiplier.
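the cross-link rule (every page reachable from at least one other page) is the kind of bookkeeping that can be verified after the fact. a minimal sketch, assuming pages are keyed by their link target (for example "concepts/some-idea") and that the index and log are exempt entry points; the names "index" and "log" and the function itself are assumptions, not the agent's actual procedure.

```python
import re

# Captures the target of [[target]], [[target|alias]] or [[target#heading]].
WIKILINK = re.compile(r"\[\[([^\]|#]+)")


def orphans(
    pages: dict[str, str],
    roots: frozenset[str] = frozenset({"index", "log"}),
) -> list[str]:
    """Return pages that no other page links to.

    pages maps a link target to that page's text. roots are exempt
    entry points that do not need inbound links.
    """
    linked: set[str] = set()
    for name, text in pages.items():
        for match in WIKILINK.finditer(text):
            target = match.group(1).strip()
            if target != name:  # a self-link does not make a page reachable
                linked.add(target)
    return sorted(p for p in pages if p not in linked and p not in roots)
```

an empty result means step 5 held for the whole wiki, not just for today's pages.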
the repo ingest
a useful specialisation. stubs in raw/repos/<owner>-<name>.md look like this:
---
type: repo
url: https://github.com/owner/repo
private: true
default_branch: main
last_synced: null
focus: "auth flow, admin panel"
---
why this repo is in the wiki.
the focus field bounds the ingest: for a large repo, "everything" is the wrong answer. the agent clones into .cache/, surveys structure with README and package.json, then drills into the directories named by focus. it produces a source page, entity pages for major modules, and concept pages for patterns the repo embodies.
after the ingest, the stub gets updated with last_synced: <today> and a note on what scope was covered. future re-ingests know what's already documented.
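the stub update is a small text edit. a sketch of what it might look like, assuming the stub keeps last_synced on its own line as in the example above; the function name and the format of the appended scope note are hypothetical.

```python
import re
from datetime import date


def mark_synced(stub_text: str, today: date, scope_note: str) -> str:
    """Set last_synced in the stub and append a dated note on what was covered."""
    updated, count = re.subn(
        r"^last_synced:.*$",
        f"last_synced: {today.isoformat()}",
        stub_text,
        count=1,  # only the first match, which sits in the frontmatter
        flags=re.MULTILINE,
    )
    if count == 0:
        raise ValueError("stub has no last_synced field")
    return f"{updated.rstrip()}\n\n{today.isoformat()}: covered {scope_note}\n"
```

raising on a missing field, rather than silently adding one, keeps a malformed stub from passing as synced.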
the hard rules
a few rules i've learned the hard way:
- never modify or delete anything in raw/. once it's the source of truth, it has to actually be the source of truth.
- never invent claims. if a claim is unsourced, mark it that way. if it needs to be sourced, run a web search with permission and cite the search result.
- prefer updating an existing page over creating a duplicate. search the index before creating.
- flag contradictions instead of silently overwriting. new content that disagrees with an existing claim gets a > ⚠ Conflict: callout citing both sources. the wiki tracks disagreement honestly.
- no long verbatim quotes. original wording. short attributed quotes only.
these aren't aesthetic preferences. each one closes a specific failure mode i have personally watched a knowledge base fall into.
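to make the contradiction rule concrete, here is one way a conflict callout could be rendered. only the > ⚠ Conflict: prefix comes from the rules above; the two-bullet layout, the sources/ link form, and the function name are assumptions for illustration.

```python
def conflict_callout(
    existing_claim: str,
    existing_source: str,
    new_claim: str,
    new_source: str,
) -> str:
    """Render a blockquote callout that keeps both claims, each with its citation."""
    lines = [
        "> ⚠ Conflict:",
        f"> - {existing_claim} ([[sources/{existing_source}]])",
        f"> - {new_claim} ([[sources/{new_source}]])",
    ]
    return "\n".join(lines)
```

neither claim is deleted; the reader sees both and the sources they came from.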
what it's actually for
two things.
the first is retention. i read better when i know something downstream depends on me reading well. a page i'll cite tomorrow is a page i'll read carefully today.
the second is compounding context for the llm itself. every conversation with the agent pulls from the entire wiki. the questions i ask in may benefit from the sources i ingested in february. the portfolio i'm building right now is partly assembled from project descriptions that reference wiki pages that reference source repos that reference course material from two years ago. the chain holds, and the chain is the value.
what it isn't
it's not a published artefact. the wiki is private — a github repo synced over PAT auth, mirrored across my devices. nothing in it is meant to be read by anyone but me and the agent. when something from it deserves to be public — like the post you're reading — it gets rewritten as standalone prose. the wiki is the workshop. the blog is the gallery.
it's also not a substitute for memory or for thinking. it is a tool that lowers the cost of being careful with sources, so i'm careful more often.
what i'd do differently if starting again
the flat-folder structure took me a few weeks to commit to. i started with wiki/projects/cybersec/ and wiki/projects/full-stack/, which felt natural and was wrong. the right shape is flat, link-based — one folder per type (sources, entities, concepts, synthesis), and let cross-links do the rest. karpathy was right.
the agent's frontmatter was inconsistent in the first month — i'd write some pages, the agent would write others, and the schemas drifted. writing a CLAUDE.md at the root of the vault that documents the schema as the source of truth fixed this immediately. the agent reads it on every session. new conventions get added to that file, and they hold from then on.
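schema drift of this kind can also be caught with a small linter that checks each page against the documented fields. a minimal sketch, assuming the flat frontmatter shown under page conventions; the parser is deliberately naive (no nesting, no multi-line values), the function names are hypothetical, and a real yaml parser would be the safer choice for anything richer.

```python
REQUIRED = ("title", "type", "created", "updated", "sources")  # tags is optional
PAGE_TYPES = {"source", "entity", "concept", "synthesis"}


def frontmatter(text: str) -> dict[str, str]:
    """Parse flat `key: value` lines between the leading --- fences."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    fields: dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return fields
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return {}  # the opening fence was never closed


def lint(text: str) -> list[str]:
    """Return a list of schema problems; an empty list means the page conforms."""
    fields = frontmatter(text)
    if not fields:
        return ["missing or unterminated frontmatter"]
    problems = [f"missing field: {key}" for key in REQUIRED if key not in fields]
    if "type" in fields and fields["type"] not in PAGE_TYPES:
        problems.append(f"unknown type: {fields['type']}")
    return problems
```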
if i were starting today: flat folders from day one, CLAUDE.md from day one, three-layer separation from day one. everything else is downstream of those three.