TheModernSoftware NotebookLM
Overview
Run an agent-first workflow where Codex is the controller. Crawl weekly course materials from themodernsoftware.dev, ingest them into NotebookLM, and produce deep-study outputs without requiring a custom standalone agent service.
The workflow is deterministic because it is driven by state files and templates:
- Use templates/course.yaml, templates/weeks.json, and templates/week-notebooks.json as the single source of run state.
- Use agent-browser for page crawling and link extraction.
- Use notebooklm for source ingestion and output generation.
Core principle (read first): references/principles.md
Workflow
Step 0: Initialize runtime workspace
Create a runtime folder and copy templates:
./themodernsoftware-notebooklm/scripts/init-workspace.sh ./runtime
This creates:
- ./runtime/course.yaml
- ./runtime/weeks.json
- ./runtime/week-notebooks.json
- ./runtime/prompts/*.prompt.md
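To show what initialization amounts to, here is a minimal Python sketch. It assumes init-workspace.sh simply copies the three state templates and the prompt templates into the runtime folder. The skip-if-exists behavior and the names init_workspace and TEMPLATE_FILES are illustrative assumptions, not the script's documented behavior.

```python
import shutil
from pathlib import Path

# Hypothetical list of state templates; mirrors the files named in the text.
TEMPLATE_FILES = ["course.yaml", "weeks.json", "week-notebooks.json"]


def init_workspace(templates: Path, runtime: Path) -> list[Path]:
    """Copy templates into `runtime`, never overwriting existing files.

    Returns the list of files that were newly created.
    """
    created: list[Path] = []
    (runtime / "prompts").mkdir(parents=True, exist_ok=True)
    sources = [templates / name for name in TEMPLATE_FILES]
    sources += sorted((templates / "prompts").glob("*.prompt.md"))
    for src in sources:
        # Keep the prompts/ subfolder structure in the runtime copy.
        dst = runtime / src.relative_to(templates)
        if not dst.exists():  # preserve state left by earlier runs
            shutil.copy2(src, dst)
            created.append(dst)
    return created
```

Refusing to overwrite means a second initialization cannot wipe ingest evidence that an earlier run recorded in weeks.json.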
Step 1: Preflight checks
Verify required tools before crawling:
agent-browser --help
notebooklm status --json
If notebooklm status --json fails, run notebooklm login.
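The preflight can also be scripted. This sketch checks that both CLIs are on PATH and treats a non-zero exit status from notebooklm status --json as "not logged in". That exit-code interpretation and the helper names are assumptions.

```python
import shutil
import subprocess

REQUIRED_TOOLS = ["agent-browser", "notebooklm"]


def missing_tools(names: list[str]) -> list[str]:
    """Return the names that cannot be found on PATH."""
    return [name for name in names if shutil.which(name) is None]


def notebooklm_authenticated() -> bool:
    """Assumption: exit status 0 from `notebooklm status --json` means a usable session."""
    try:
        result = subprocess.run(
            ["notebooklm", "status", "--json"],
            capture_output=True,
            text=True,
            timeout=60,
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0


def preflight() -> list[str]:
    """Return a list of problems; an empty list means it is safe to start crawling."""
    problems = [f"{tool} not found on PATH" for tool in missing_tools(REQUIRED_TOOLS)]
    if not problems and not notebooklm_authenticated():
        problems.append("notebooklm status --json failed: run `notebooklm login`")
    return problems
```

A controller would print each entry of preflight() and stop before crawling if the list is non-empty.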
Step 2: Crawl weekly materials (Curriculum-native)
Extract week-scoped sources from the authoritative course page:
./themodernsoftware-notebooklm/scripts/crawl-extract.sh "https://themodernsoftware.dev/" ./runtime/weeks-extracted.json
./themodernsoftware-notebooklm/scripts/merge-weeks.py ./runtime/weeks-extracted.json ./runtime/weeks.json ./runtime/weeks-diff.json
./themodernsoftware-notebooklm/scripts/verify-week-state.sh ./runtime/weeks.json ./runtime/week-notebooks.json
Notes:
- Crawl rules: references/crawl-rules.md
- Merge is incremental: it preserves prior ingest evidence per URL, reclassifies new inputs, and emits a week-level diff (weeks-diff.json).
- Non-course assets with provenance=manual|external-search are preserved across merges.
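The incremental merge can be pictured as follows. This is a hypothetical sketch, not merge-weeks.py: it assumes state shaped as {week_id: [asset, ...]} with a url on each asset, keys evidence by (week, URL), takes the crawler's week assignment as authoritative, and reports only added and removed URLs in the diff. The real weeks.json layout is defined by templates/weeks.schema.json.

```python
EVIDENCE_KEYS = ("ingest_status", "source_id", "ingest_attempts", "last_error")
PRESERVED_PROVENANCE = {"manual", "external-search"}


def merge_weeks(previous: dict, extracted: dict) -> tuple[dict, dict]:
    """Merge a fresh crawl into prior state.

    Both inputs map week_id -> list of asset dicts, each with a "url".
    Returns (merged_state, diff), where diff lists only weeks whose URL set changed.
    """
    merged: dict = {}
    diff: dict = {}
    for week_id in sorted(set(previous) | set(extracted)):
        old_by_url = {a["url"]: a for a in previous.get(week_id, [])}
        assets = []
        for asset in extracted.get(week_id, []):
            prior = old_by_url.get(asset["url"], {})
            # Carry over ingest evidence so already-ingested URLs are not redone.
            evidence = {k: prior[k] for k in EVIDENCE_KEYS if k in prior}
            assets.append({**asset, **evidence})
        seen = {a["url"] for a in assets}
        # Manually added or externally found assets never appear in a crawl; keep them.
        for url, old in old_by_url.items():
            if url not in seen and old.get("provenance") in PRESERVED_PROVENANCE:
                assets.append(old)
                seen.add(url)
        added = sorted(seen - set(old_by_url))
        removed = sorted(set(old_by_url) - seen)
        if added or removed:
            diff[week_id] = {"added": added, "removed": removed}
        merged[week_id] = assets
    return merged, diff
```

Keying evidence per week rather than globally is a deliberate choice here: with one notebook per week, a source_id from one week's notebook is meaningless in another.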
Step 3: Ingest into NotebookLM (Week-scoped + evidence-gated)
By default, each week gets its own notebook, which prevents cross-week context pollution.
./themodernsoftware-notebooklm/scripts/notebooklm-per-week.py ./runtime --diff ./runtime/weeks-diff.json
./themodernsoftware-notebooklm/scripts/verify-week-state.sh ./runtime/weeks.json ./runtime/week-notebooks.json
What this does:
- Creates or reuses a per-week notebook and stores the mapping in week-notebooks.json.
- Prefers direct URL ingest (web / YouTube).
- Waits until each source is ready (evidence gate) and confirms that its source_id appears in the source list.
- Retries failures with backoff (course.yaml.retry.*).
- If direct ingest fails and a download_url is available, falls back to download + verify (scripts/verify-file.sh), then uploads the file with --type file.
- Records ingest_status, source_id, ingest_attempts, and last_error per asset.
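The evidence gate and retry loop can be sketched as below. The callables add_source, get_status, and list_source_ids are hypothetical stand-ins for whatever notebooklm invocations the real script makes, the ingest_status values are assumptions, and the doubling delay is one common backoff choice; the actual parameters come from course.yaml.retry.*.

```python
import time
from typing import Callable


def evidence_gate(
    source_id: str,
    get_status: Callable[[str], str],
    list_source_ids: Callable[[], list],
    timeout: float = 600.0,
    poll: float = 10.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """True once the source reports "ready" AND is listed in the notebook; False on timeout."""
    deadline = clock() + timeout
    while True:
        if get_status(source_id) == "ready" and source_id in list_source_ids():
            return True
        if clock() >= deadline:
            return False
        sleep(poll)


def ingest_asset(
    asset: dict,
    add_source: Callable[[str], str],
    passes_gate: Callable[[str], bool],
    max_attempts: int = 3,
    base_delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Ingest asset["url"] with exponential backoff, recording evidence on the asset."""
    for attempt in range(1, max_attempts + 1):
        asset["ingest_attempts"] = asset.get("ingest_attempts", 0) + 1
        try:
            source_id = add_source(asset["url"])
            if not passes_gate(source_id):
                raise RuntimeError(f"source {source_id} failed the evidence gate")
        except Exception as exc:  # record the failure, then back off and retry
            asset["last_error"] = str(exc)
            if attempt < max_attempts:
                sleep(base_delay * 2 ** (attempt - 1))  # 2s, 4s, ... with the defaults
            continue
        asset.update(ingest_status="ingested", source_id=source_id, last_error=None)
        return True
    asset["ingest_status"] = "failed"
    return False
```

Injecting clock and sleep keeps the gate testable without real waiting; in a real run, passes_gate would wrap evidence_gate with the notebook's status and listing calls.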
To retry a specific week:
./themodernsoftware-notebooklm/scripts/notebooklm-per-week.py ./runtime --weeks week-01
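Retrying one week relies on two pieces of persisted state: which weeks to process, and which notebook each week already owns. A minimal sketch with hypothetical helper names and an assumed precedence (an explicit --weeks list overrides the diff); the notebook title format is also an assumption.

```python
from typing import Callable, Optional


def select_weeks(
    all_week_ids: list[str],
    diff: Optional[dict] = None,
    only: Optional[list[str]] = None,
) -> list[str]:
    """Choose weeks to process, preserving course order.

    Assumed precedence: an explicit --weeks list, then the merge diff, then all weeks.
    """
    if only:
        unknown = sorted(set(only) - set(all_week_ids))
        if unknown:
            raise ValueError(f"unknown week ids: {unknown}")
        return [w for w in all_week_ids if w in only]
    if diff is not None:
        return [w for w in all_week_ids if w in diff]
    return list(all_week_ids)


def notebook_for_week(
    mapping: dict, week_id: str, create_notebook: Callable[[str], str]
) -> str:
    """Reuse the week's notebook from week-notebooks.json; create one only if unmapped."""
    if week_id not in mapping:
        # Hypothetical title format; persist `mapping` right after this call.
        mapping[week_id] = create_notebook(f"TheModernSoftware {week_id}")
    return mapping[week_id]
```

Because the mapping is consulted first, a retry lands in the same notebook as the original attempt instead of creating a duplicate.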
Step 4: Generate deep outputs
Generate outputs for every week whose status is ingested:
./themodernsoftware-notebooklm/scripts/generate-week.py ./runtime --diff ./runtime/weeks-diff.json
./themodernsoftware-notebooklm/scripts/verify-week-state.sh ./runtime/weeks.json ./runtime/week-notebooks.json
Outputs are written under ./runtime/outputs/<week_id>/ and paths are recorded back into weeks.json:
- lesson-plan.zh-en.md
- lecture-notes.zh-en.md
- video-script.zh-en.md (default)
Optional: to also generate and download the NotebookLM video artifact, set course.yaml.outputs.video.mode to artifact or both.
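How a week's output paths follow from the layout above can be sketched as below. The assumptions: week records carry a status field, the valid video modes are script, artifact, and both, and script is the default. Artifact-mode files are left out because their names are not specified here.

```python
from pathlib import Path

VIDEO_MODES = {"script", "artifact", "both"}  # assumed set of valid modes


def weeks_to_generate(weeks: list[dict]) -> list[dict]:
    """Only weeks whose sources passed ingestion are eligible for generation."""
    return [w for w in weeks if w.get("status") == "ingested"]


def planned_outputs(runtime: Path, week_id: str, video_mode: str = "script") -> dict:
    """Map weeks.json fields to the markdown files generated for one week."""
    if video_mode not in VIDEO_MODES:
        raise ValueError(f"unsupported outputs.video.mode: {video_mode!r}")
    out_dir = runtime / "outputs" / week_id
    paths = {
        "lesson_plan_path": out_dir / "lesson-plan.zh-en.md",
        "lecture_notes_path": out_dir / "lecture-notes.zh-en.md",
    }
    if video_mode in ("script", "both"):
        paths["video_script_path"] = out_dir / "video-script.zh-en.md"
    # Artifact-mode fields (video_artifact_id / video_local_path) are tracked separately.
    return paths
```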
Step 5: Failure policy
Apply the strict policy from references/failure-handling.md:
- If themodernsoftware.dev is unreachable, fail the current run immediately.
- Do not fall back to any other source automatically.
- Record the failure reason in weeks.json.error and in the per-asset last_error for the affected week or run.
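A sketch of the "fail fast, no fallback" rule. The fetch hook, the RunAborted exception, and storing the reason under a top-level error key of the loaded weeks.json data are assumptions for illustration.

```python
import urllib.request
from typing import Callable, Optional

COURSE_URL = "https://themodernsoftware.dev/"


class RunAborted(RuntimeError):
    """Stops the whole run; by policy there is no fallback source to try."""


def _default_fetch(url: str, timeout: float = 15.0) -> None:
    # URLError, HTTPError, and socket timeouts are all OSError subclasses.
    with urllib.request.urlopen(url, timeout=timeout):
        pass


def require_course_site(
    state: dict, url: str = COURSE_URL, fetch: Optional[Callable[[str], None]] = None
) -> None:
    """Abort the run immediately if the course site cannot be fetched."""
    try:
        (fetch or _default_fetch)(url)
    except OSError as exc:
        state["error"] = f"course site unreachable: {url}: {exc}"
        raise RunAborted(state["error"]) from exc
```

Raising instead of returning a flag makes it hard for a caller to continue by accident, which matches the "no automatic fallback" rule.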
Output Contract
Each completed week must include:
- lesson_plan_path
- lecture_notes_path
- video_script_path (when outputs.video.mode includes script)
- video_artifact_id / video_local_path (when outputs.video.mode includes artifact)
Do not mark a week completed while its artifact-mode video download is unfinished.
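The completion check can be expressed as a small gate. This sketch assumes the week record stores the fields listed above and treats a path as done only if the file actually exists on disk, which is one way to catch an unfinished video download. The helper names are hypothetical.

```python
from pathlib import Path


def completion_blockers(week: dict, video_mode: str) -> list[str]:
    """List reasons a week may not be marked completed; empty means it may."""
    required = ["lesson_plan_path", "lecture_notes_path"]
    if video_mode in ("script", "both"):
        required.append("video_script_path")
    if video_mode in ("artifact", "both"):
        required += ["video_artifact_id", "video_local_path"]
    blockers = []
    for key in required:
        value = week.get(key)
        if not value:
            blockers.append(f"{key} missing")
        elif key.endswith("_path") and not Path(value).is_file():
            # A recorded path without a file usually means an unfinished download.
            blockers.append(f"{key} not on disk: {value}")
    return blockers


def mark_completed(week: dict, video_mode: str) -> bool:
    """Set status to completed only when nothing blocks it."""
    if completion_blockers(week, video_mode):
        return False
    week["status"] = "completed"
    return True
```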
Resources
- references/crawl-rules.md: URL discovery and week classification rules.
- references/notebooklm-deep-mode.md: source ingestion, output generation, language and quality requirements.
- references/failure-handling.md: stop/retry/error recording policy.
- references/execution-checklist.md: runbook checklist.
- references/principles.md: abstract operating principle (week-scoped + evidence gated + incremental).
- templates/course.yaml: runtime config template.
- templates/weeks.json: runtime state template.
- templates/weeks.schema.json: structure contract for state validation.
- templates/week-notebooks.json: week_id -> notebook_id mapping template.
- templates/prompts/*.prompt.md: prompt templates.
- scripts/init-workspace.sh: initialize runtime workspace.
- scripts/crawl-extract.sh: extract week items via agent-browser.
- scripts/merge-weeks.py: merge extracted data into runtime state and emit diffs.
- scripts/verify-week-state.sh: validate weeks.json structure.
- scripts/verify-file.sh: verify that a downloaded file is a real document (not an HTML/login/error page).
- scripts/notebooklm-per-week.py: ingest per-week sources with retry + evidence gate.
- scripts/generate-week.py: generate per-week learning outputs.
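verify-file.sh is a shell script; the Python sketch below only illustrates the kind of check it describes. The magic-byte table is a small, non-exhaustive assumption, and treating unrecognized formats as failures is a deliberately conservative choice.

```python
from pathlib import Path

# Non-exhaustive table of leading bytes for document types we expect to upload.
DOCUMENT_MAGIC = {
    b"%PDF-": "pdf",
    b"PK\x03\x04": "zip container (docx, pptx, ...)",
}
HTML_PREFIXES = (b"<!doctype html", b"<html", b"<head", b"<body")


def check_download(path: Path) -> tuple[bool, str]:
    """Return (ok, reason); reject empty files, HTML pages, and unknown formats."""
    data = Path(path).read_bytes()
    if not data:
        return False, "empty file"
    # Login walls and error pages are HTML even when saved under a .pdf name.
    head = data[:4096].lstrip(b"\xef\xbb\xbf \t\r\n").lower()
    if head.startswith(HTML_PREFIXES):
        return False, "HTML page (login wall or error page?)"
    for magic, kind in DOCUMENT_MAGIC.items():
        if data.startswith(magic):
            return True, kind
    return False, "unrecognized format"
```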
Common mistakes
- Running generation before source processing finishes.
- Marking a week as completed before the artifact-mode video download succeeds.
- Losing state by not persisting source_ids and the per-week notebook mapping.
- Mixing unrelated links into week assets.