<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">

 <title>Claudia Website!</title>
 <link href="/atom.xml" rel="self"/>
 <link href="https://w.laudiacay.cool/"/>
 <updated>2026-02-13T00:25:51+00:00</updated>
 <id>https://w.laudiacay.cool</id>
 <author>
   <name></name>
   <email></email>
 </author>

 
 <entry>
   <title>How We Escaped Dev Environment Hell (And Made It Agent-Friendly)</title>
   <link href="https://w.laudiacay.cool/2026/01/02/dev-environment-escape.html"/>
   <updated>2026-01-02T00:00:00+00:00</updated>
   <id>https://w.laudiacay.cool/2026/01/02/dev-environment-escape</id>
   <content type="html">&lt;p&gt;&lt;em&gt;Building a multi-worktree development platform that humans and Claude can both navigate&lt;/em&gt;&lt;/p&gt;

&lt;hr /&gt;

&lt;h2 id=&quot;the-problem-death-by-a-thousand-paper-cuts&quot;&gt;The Problem: Death by a Thousand Paper Cuts&lt;/h2&gt;

&lt;p&gt;Every startup has dev environment debt. Ours was getting bad.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The symptoms:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;A &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Makefile&lt;/code&gt; with 47 targets, half undocumented, some broken&lt;/li&gt;
  &lt;li&gt;Three different &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;setup-env.sh&lt;/code&gt; scripts that contradicted each other&lt;/li&gt;
  &lt;li&gt;Devs running with mock API keys, then wondering why integrations “worked locally but not in staging”&lt;/li&gt;
  &lt;li&gt;“How do I see the logs?” asked weekly in Slack&lt;/li&gt;
  &lt;li&gt;“Which port is the API on again?” followed by someone sharing a screenshot of their terminal&lt;/li&gt;
  &lt;li&gt;Postgres running on 5432… unless you’d changed it… and forgot&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Then we added Claude Code to the mix:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Every dev now had 2-5 Claude worktrees running simultaneously. Claude would spin up the dev environment, start working, and then… another Claude instance would start the same services. Port conflicts everywhere. Database migrations stomping on each other. One agent’s test run nuking another agent’s seed data.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Then there’s just… the sheer number of services:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Standing up the dev environment means: Postgres (seeded with test data), Elasticsearch (synced from Postgres), Redis, Celery workers, Celery beat, the API, the frontend, email testing, log aggregation, and monitoring. Miss one and something fails silently. Get the startup order wrong and migrations break.&lt;/p&gt;
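&lt;p&gt;The usual duct tape for startup ordering is a readiness loop in each service’s entrypoint. A generic sketch, not our actual scripts:&lt;/p&gt;

```shell
# wait_for: retry a readiness check until it succeeds or we give up.
# Usage: wait_for "pg_isready -h localhost -p 5432" 30
wait_for() {
  check="$1"
  retries="${2:-30}"
  i=0
  until sh -c "$check"; do
    i=$((i + 1))
    if [ "$i" -ge "$retries" ]; then
      echo "gave up waiting on: $check"
      return 1
    fi
    sleep 1
  done
}

# e.g. block migrations until Postgres answers:
#   wait_for "pg_isready -h postgres -p 5432" 60
```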

&lt;p&gt;And then you have to replicate all of that in CI. And make it work on everyone’s machine - M1 Macs, Intel Macs, the one guy on Linux. I wanted to cry constantly.&lt;/p&gt;

&lt;p&gt;We used to use Supabase. Endless nightmares. Docker + Supabase local dev was a constant battle. Opaque errors, magic auth flows, platform-specific behaviors that worked in their cloud but broke locally.&lt;/p&gt;

&lt;p&gt;Here’s the thing I’ve learned: &lt;strong&gt;pre-AI, heavy platforms made sense.&lt;/strong&gt; Supabase, Firebase, Railway - they handled the complexity you couldn’t. Deploys, migrations, dev environments - too much to build yourself.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Post-AI, lightweight composable tools win.&lt;/strong&gt; Now I can just use:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Postgres with SQLAlchemy and Alembic (no platform magic, just SQL)&lt;/li&gt;
  &lt;li&gt;Celery with state graphs for complex jobs (stored in the database, fully inspectable)&lt;/li&gt;
  &lt;li&gt;OTEL for everything (old, boring, standardized)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;No vendor lock-in. No platform-specific behaviors. Just well-documented open source tools that Claude actually understands because they’ve been around forever.&lt;/p&gt;

&lt;p&gt;Building our own deploys? Claude writes the Terraform. Building our own dev environment? Claude helps debug the docker-compose. The platforms were training wheels. With AI, I don’t need training wheels - I need composable primitives.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Then we moved to VPC:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;We put RDS behind a private subnet (correctly!). But now nobody could connect to the database locally. The bastion host existed but nobody knew how to use it. Someone wrote a script, it got lost, someone rewrote it differently.&lt;/p&gt;

&lt;p&gt;I spent more time debugging dev environments than building features. Something had to change.&lt;/p&gt;

&lt;h3 id=&quot;the-human-cost&quot;&gt;The Human Cost&lt;/h3&gt;

&lt;p&gt;Here’s a subset of the people on my team:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The frontend dev.&lt;/strong&gt; Great at React, ships UI fast, doesn’t want to know what an OTEL collector is. Shouldn’t have to.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The cofounder.&lt;/strong&gt; His main job is sales and customer-facing work. When he codes, it’s rapid in-and-out - a customer reports a bug, he dives in, fixes it, deploys, back to calls. Not deep work. He does not have time to debug why Docker isn’t finding a volume mount.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Me.&lt;/strong&gt; I built the infrastructure. I understand the Dockerfiles, the Terraform, the VPC topology. Which means every env question lands on me.&lt;/p&gt;

&lt;p&gt;The pattern was brutal:&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;I’d make an infrastructure improvement (good!)&lt;/li&gt;
  &lt;li&gt;Someone would pull main&lt;/li&gt;
  &lt;li&gt;“Hey, the API won’t start anymore”&lt;/li&gt;
  &lt;li&gt;I’d context-switch from feature work to debug their env&lt;/li&gt;
  &lt;li&gt;Find the issue (missing env var, stale container, whatever)&lt;/li&gt;
  &lt;li&gt;Fix it for them&lt;/li&gt;
  &lt;li&gt;Different person, same issue: “Hey, the API won’t start anymore”&lt;/li&gt;
  &lt;li&gt;Repeat for a week&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Every change to the dev environment triggered a support avalanche. I became the bottleneck. I was scared to improve anything because of the support cost.&lt;/p&gt;

&lt;p&gt;And these weren’t dumb questions. “How do I see the Celery logs?” is reasonable. “What port is Elasticsearch on?” is reasonable. “How do I connect to the dev database?” is reasonable. The problem was that the answers weren’t obvious, so every reasonable question came to me.&lt;/p&gt;

&lt;p&gt;The frontend dev shouldn’t need to understand Docker networking to see why their API call is failing. The cofounder doing a quick bug fix shouldn’t need to remember the bastion host SSH command. They should be able to be productive immediately and get back to their actual jobs.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The real cost wasn’t my time - it was their momentum.&lt;/strong&gt; Every “hey quick question” interruption for them was a context switch away from the customer problem they were solving. The dev environment was actively slowing down customer response time.&lt;/p&gt;

&lt;hr /&gt;

&lt;h2 id=&quot;the-insight-dev-environment-as-product&quot;&gt;The Insight: Dev Environment as Product&lt;/h2&gt;

&lt;p&gt;I realized our dev environment wasn’t a utility - it was a product. And like any product, it needed:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;strong&gt;Discoverability&lt;/strong&gt;: New devs (and Claude) should find everything from one place&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Isolation&lt;/strong&gt;: Multiple instances shouldn’t interfere with each other&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Observability&lt;/strong&gt;: Logs, traces, and debugging tools should be obvious&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Documentation&lt;/strong&gt;: Not READMEs that rot, but integrated docs&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;What if spinning up a dev environment was as simple as &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;cd project &amp;amp;&amp;amp; make up&lt;/code&gt;? What if every worktree got its own URL? What if Claude could navigate it all without asking me?&lt;/p&gt;

&lt;hr /&gt;

&lt;h2 id=&quot;the-architecture&quot;&gt;The Architecture&lt;/h2&gt;

&lt;h3 id=&quot;traefik-one-port-many-services&quot;&gt;Traefik: One Port, Many Services&lt;/h3&gt;

&lt;p&gt;The core insight: use a reverse proxy to route by hostname, not port.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;http://supplyco-dev.localhost/api      → FastAPI container
http://supplyco-dev.localhost/frontend → Vite dev server
http://supplyco-dev.localhost/docs     → MkDocs documentation
http://supplyco-dev.localhost/logs     → Grafana/Loki
http://supplyco-dev.localhost/flower   → Celery task monitor
http://supplyco-dev.localhost/mailpit  → Email testing UI
http://supplyco-dev.localhost/kibana   → Elasticsearch UI
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Different worktree? Different hostname:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;http://supplyco-feature-x.localhost/api
http://supplyco-feature-y.localhost/api
http://supplyco-bugfix.localhost/api
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;All running simultaneously. No port conflicts. The directory name becomes the hostname prefix automatically via &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;COMPOSE_PROJECT_NAME&lt;/code&gt;.&lt;/p&gt;
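&lt;p&gt;Roughly how that name is derived - a sketch assuming Compose’s normalization rules (lowercase, keep only safe characters):&lt;/p&gt;

```shell
# Approximate Compose's default project-name normalization:
# lowercase the directory name, then keep only [a-z0-9_-].
project_name() {
  printf '%s' "$1" | tr '[:upper:]' '[:lower:]' | sed 's/[^a-z0-9_-]//g'
}

# Each worktree then answers on:
#   http://$(project_name "$(basename "$PWD")").localhost/
```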

&lt;p&gt;&lt;strong&gt;How it works:&lt;/strong&gt;&lt;/p&gt;

&lt;div class=&quot;language-yaml highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;c1&quot;&gt;# docker-compose.dev.yml (simplified)&lt;/span&gt;
&lt;span class=&quot;na&quot;&gt;services&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt;
  &lt;span class=&quot;na&quot;&gt;api&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt;
    &lt;span class=&quot;na&quot;&gt;labels&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt;
      &lt;span class=&quot;pi&quot;&gt;-&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;traefik.http.routers.${COMPOSE_PROJECT_NAME}-api.rule=Host(`${COMPOSE_PROJECT_NAME}.localhost`)&lt;/span&gt;&lt;span class=&quot;nv&quot;&gt; &lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;amp;&amp;amp;&lt;/span&gt;&lt;span class=&quot;nv&quot;&gt; &lt;/span&gt;&lt;span class=&quot;s&quot;&gt;PathPrefix(`/api`)&quot;&lt;/span&gt;
      &lt;span class=&quot;pi&quot;&gt;-&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;traefik.http.middlewares.${COMPOSE_PROJECT_NAME}-api-strip.stripprefix.prefixes=/api&quot;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Traefik reads Docker labels, discovers services automatically, routes based on hostname + path. Zero config per worktree.&lt;/p&gt;

&lt;h3 id=&quot;the-landing-page-everything-in-one-place&quot;&gt;The Landing Page: Everything in One Place&lt;/h3&gt;

&lt;p&gt;When you visit &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;http://supplyco-dev.localhost/&lt;/code&gt;, you get a landing page with:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Links to every service (API docs, frontend, logs, etc.)&lt;/li&gt;
  &lt;li&gt;A &lt;strong&gt;worktree dropdown&lt;/strong&gt; that auto-discovers all running instances from Traefik&lt;/li&gt;
  &lt;li&gt;Service health indicators (FastAPI response time, Celery task counts)&lt;/li&gt;
  &lt;li&gt;Quick links to documentation&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;img src=&quot;/assets/images/devenv-post/landing-page.png&quot; alt=&quot;The SupplyCo Dev landing page&quot; /&gt;&lt;/p&gt;

&lt;p&gt;The dropdown queries Traefik’s API to find all projects with running services. Switch between worktrees without remembering URLs.&lt;/p&gt;
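&lt;p&gt;The discovery itself is one HTTP call: Traefik’s API (enabled in dev) lists every router and its rule, and the hostnames can be scraped out of the rules. A sketch - the port and parsing details here are illustrative:&lt;/p&gt;

```shell
# Pull the hostnames out of Traefik's router rules.
# Each rule looks like: Host(`supplyco-dev.localhost`) ...
extract_hosts() {
  grep -o 'Host(`[^`]*`)' | sed 's/Host(`//; s/`)//' | sort -u
}

# In dev, Traefik's API is typically on :8080:
#   curl -s http://localhost:8080/api/http/routers | extract_hosts
```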

&lt;p&gt;There’s also a &lt;strong&gt;Remote Environment Tools&lt;/strong&gt; section for live incident debugging. DEV/PROD toggle, and pre-written prompts you can paste directly into Claude:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;“Query Logfire for ERROR level logs in the last hour”&lt;/li&gt;
  &lt;li&gt;“Query Logfire for API requests over 5s in the last hour”&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;img src=&quot;/assets/images/devenv-post/prod-debugging.png&quot; alt=&quot;Production debugging tools&quot; /&gt;&lt;/p&gt;

&lt;p&gt;One click to copy the database tunnel command, one click for the password from Secrets Manager. Production incident? Open the landing page, toggle to PROD, paste the commands into Claude, start debugging. No scrambling to remember how to connect.&lt;/p&gt;

&lt;p&gt;Claude loves this. Instead of “what port is the API on?” it just goes to the landing page.&lt;/p&gt;

&lt;h3 id=&quot;shared-observability-stack&quot;&gt;Shared Observability Stack&lt;/h3&gt;

&lt;p&gt;Here’s the thing about logs: you need them when things break. Which is exactly when you don’t want to be setting up logging.&lt;/p&gt;

&lt;p&gt;We run a single observability stack shared across all worktrees:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;Alloy (log collector)
    ↓
Loki (log storage) ← labeled by project + service
    ↓
Grafana (UI) ← accessible at /logs on every worktree

Tempo (traces) ← for distributed tracing, also queried from Grafana
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Every container gets labels:&lt;/p&gt;
&lt;div class=&quot;language-yaml highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;na&quot;&gt;labels&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt;
  &lt;span class=&quot;pi&quot;&gt;-&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;project=${COMPOSE_PROJECT_NAME}&quot;&lt;/span&gt;
  &lt;span class=&quot;pi&quot;&gt;-&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;service=api&quot;&lt;/span&gt;  &lt;span class=&quot;c1&quot;&gt;# or celery-worker, postgres, etc.&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Alloy scrapes Docker logs and tags them automatically.&lt;/p&gt;

&lt;p&gt;The key: &lt;strong&gt;the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;/logs&lt;/code&gt; link on each worktree’s landing page goes directly to Grafana pre-filtered for that project.&lt;/strong&gt; Click the link from &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;supplyco-dev.localhost&lt;/code&gt;, you see &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;supplyco-dev&lt;/code&gt; logs. No query writing required.&lt;/p&gt;

&lt;p&gt;No thoughts, only logs.&lt;/p&gt;

&lt;p&gt;If you &lt;em&gt;want&lt;/em&gt; to filter further, LogQL is there:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;{project=&quot;supplyco-dev&quot;, service=&quot;api&quot;} |= &quot;error&quot;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;But most people never need it. They click the link, see their logs, find the error. The frontend dev doesn’t need to know what LogQL is. The cofounder definitely doesn’t. They click, they see, they fix, they leave.&lt;/p&gt;

&lt;p&gt;One dashboard, all worktrees, all services. No more “which container was that in?” or &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;docker logs | grep&lt;/code&gt;.&lt;/p&gt;

&lt;h3 id=&quot;multi-worktree-isolation&quot;&gt;Multi-Worktree Isolation&lt;/h3&gt;

&lt;p&gt;The hard problem: multiple worktrees need to run simultaneously without stomping on each other.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Database isolation:&lt;/strong&gt; Each worktree gets its own Postgres container with its own volume. No shared state.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Port isolation:&lt;/strong&gt; Services bind to internal Docker network ports, not host ports. Traefik handles external access.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;But what about GUI tools?&lt;/strong&gt; DataGrip, pgAdmin, TablePlus - they need a TCP port to connect. You can’t route Postgres through HTTP.&lt;/p&gt;

&lt;p&gt;For non-HTTP services, you have two real options:&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;TLS with SNI routing&lt;/strong&gt; - Traefik can route TCP connections based on the TLS server name. Each worktree gets &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;supplyco-dev.localhost:5432&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;supplyco-feature.localhost:5432&lt;/code&gt;, etc. Proper isolation, but requires TLS setup and your tools need to support it.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;Port stealing&lt;/strong&gt; - One worktree “claims” the standard port at a time. Simpler, works with any tool, but only one active worktree per port.&lt;/p&gt;
  &lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;I was lazy. I went with option 2:&lt;/p&gt;

&lt;div class=&quot;language-bash highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;c&quot;&gt;# In worktree supplyco-dev&lt;/span&gt;
make expose-this-db
&lt;span class=&quot;c&quot;&gt;# Now localhost:5432 points to supplyco-dev&apos;s Postgres&lt;/span&gt;

&lt;span class=&quot;c&quot;&gt;# Switch to worktree supplyco-feature-x&lt;/span&gt;
&lt;span class=&quot;nb&quot;&gt;cd&lt;/span&gt; ../supplyco-feature-x
make expose-this-db
&lt;span class=&quot;c&quot;&gt;# Now localhost:5432 points to feature-x&apos;s Postgres&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Instant switching. DataGrip config stays the same (&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;localhost:5432&lt;/code&gt;). Claude can use &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;psql&lt;/code&gt; without port gymnastics. It’s a socat container that forwards the port to whichever worktree ran the command last.&lt;/p&gt;

&lt;p&gt;Could I do TLS? Sure. Might I just add pgAdmin or CloudBeaver to the docker-compose and skip all this? Also yes. The port-stealing approach is “good enough for now” - the whole point is I can improve it later without anyone noticing.&lt;/p&gt;
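&lt;p&gt;For the curious, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;expose-this-db&lt;/code&gt; boils down to something like this. The container name, image, and service name are illustrative:&lt;/p&gt;

```shell
# Sketch of a port-stealing target: print the commands that kill any
# previous forwarder, then run socat publishing host port 5432 into
# this project's network. Pipe the output to sh to actually run them.
expose_db() {
  project="$1"
  printf '%s\n' \
    "docker rm -f db-expose" \
    "docker run -d --name db-expose --network ${project}_default -p 5432:5432 alpine/socat tcp-listen:5432,fork,reuseaddr tcp-connect:db:5432"
}

# expose_db supplyco-dev | sh
```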

&lt;hr /&gt;

&lt;h2 id=&quot;vpc-tunneling-the-rds-problem&quot;&gt;VPC Tunneling: The RDS Problem&lt;/h2&gt;

&lt;p&gt;Moving RDS into a private VPC was the right security call. But it broke local development.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The old way (bad):&lt;/strong&gt; RDS in public subnet, security group allows your IP. Works until your IP changes, or you’re on coffee shop WiFi, or AWS throttles your connection.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The new way:&lt;/strong&gt; RDS in private subnet, bastion host in public subnet, tunnel through.&lt;/p&gt;

&lt;p&gt;But “tunnel through” meant:&lt;/p&gt;
&lt;ol&gt;
  &lt;li&gt;Find the bastion instance ID&lt;/li&gt;
  &lt;li&gt;Get the RDS endpoint from… somewhere (Terraform outputs, technically - but I’ve banned the team from touching Terraform, and none of them have ever used it or entirely know what it does, which is correct and good)&lt;/li&gt;
  &lt;li&gt;Run an AWS SSM command with the right parameters&lt;/li&gt;
  &lt;li&gt;Hope you got the port right&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Nobody could remember it. Nobody &lt;em&gt;should&lt;/em&gt; have to remember it. Scripts got written, lost, rewritten.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The solution:&lt;/strong&gt; One script, multiple environments, automatic discovery:&lt;/p&gt;

&lt;div class=&quot;language-bash highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;make db-forward-dev   &lt;span class=&quot;c&quot;&gt;# localhost:5433 → dev RDS&lt;/span&gt;
make db-forward-prod  &lt;span class=&quot;c&quot;&gt;# localhost:5434 → prod RDS&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The script:&lt;/p&gt;
&lt;ol&gt;
  &lt;li&gt;Gets bastion instance ID from Terraform outputs&lt;/li&gt;
  &lt;li&gt;Gets RDS endpoint from AWS Secrets Manager&lt;/li&gt;
  &lt;li&gt;Starts SSM port forwarding session&lt;/li&gt;
  &lt;li&gt;Outputs connection string for easy copy-paste&lt;/li&gt;
&lt;/ol&gt;
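&lt;p&gt;Step 3 is the only fiddly part: SSM port forwarding to a remote host takes a JSON parameters blob. A sketch of how a script might assemble it - the SSM document name is AWS’s real one, everything else is whatever the script looked up:&lt;/p&gt;

```shell
# Build the --parameters JSON for AWS-StartPortForwardingSessionToRemoteHost.
# Args: remote host, remote port, local port.
ssm_params() {
  printf '{"host":["%s"],"portNumber":["%s"],"localPortNumber":["%s"]}' "$1" "$2" "$3"
}

# The tunnel itself is then:
#   aws ssm start-session --target "$BASTION_ID" \
#     --document-name AWS-StartPortForwardingSessionToRemoteHost \
#     --parameters "$(ssm_params "$RDS_HOST" 5432 5433)"
```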

&lt;p&gt;Different local ports for dev vs prod means you can have both tunnels running simultaneously. Compare data between environments without reconnecting.&lt;/p&gt;

&lt;div class=&quot;language-bash highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;c&quot;&gt;# Connect to dev&lt;/span&gt;
make db-connect-dev
&lt;span class=&quot;c&quot;&gt;# Opens psql session directly&lt;/span&gt;

&lt;span class=&quot;c&quot;&gt;# Or just forward and use your GUI tool&lt;/span&gt;
make db-forward-dev
&lt;span class=&quot;c&quot;&gt;# &quot;Forwarding localhost:5433 → supplyco-dev-db.cluster-xxx.us-east-1.rds.amazonaws.com:5432&quot;&lt;/span&gt;
&lt;span class=&quot;c&quot;&gt;# &quot;Connection string: postgresql://app_user:xxx@localhost:5433/supplyco&quot;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Claude can run these too. Production incident? Claude forwards the port, runs a query, reports back.&lt;/p&gt;

&lt;hr /&gt;

&lt;h2 id=&quot;the-makefile-one-command-for-everything&quot;&gt;The Makefile: One Command for Everything&lt;/h2&gt;

&lt;p&gt;The Makefile became the API for the dev environment. Every action has one command:&lt;/p&gt;

&lt;div class=&quot;language-bash highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;c&quot;&gt;# Lifecycle&lt;/span&gt;
make docker-up              &lt;span class=&quot;c&quot;&gt;# Start everything&lt;/span&gt;
make docker-up &lt;span class=&quot;nv&quot;&gt;LITE&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;1       &lt;span class=&quot;c&quot;&gt;# Skip heavy services (saves RAM)&lt;/span&gt;
make docker-down            &lt;span class=&quot;c&quot;&gt;# Stop everything&lt;/span&gt;
make docker-down-v          &lt;span class=&quot;c&quot;&gt;# Stop and wipe data&lt;/span&gt;

&lt;span class=&quot;c&quot;&gt;# Logs&lt;/span&gt;
make docker-logs            &lt;span class=&quot;c&quot;&gt;# All services&lt;/span&gt;
make docker-logs-api        &lt;span class=&quot;c&quot;&gt;# Just FastAPI&lt;/span&gt;
make docker-logs-celery     &lt;span class=&quot;c&quot;&gt;# Just workers&lt;/span&gt;

&lt;span class=&quot;c&quot;&gt;# Database&lt;/span&gt;
make db-local-psql          &lt;span class=&quot;c&quot;&gt;# Connect to local Postgres&lt;/span&gt;
make db-local-migrate       &lt;span class=&quot;c&quot;&gt;# Run migrations&lt;/span&gt;
make db-local-reset         &lt;span class=&quot;c&quot;&gt;# Drop and recreate&lt;/span&gt;
make db-forward-dev         &lt;span class=&quot;c&quot;&gt;# Tunnel to dev RDS&lt;/span&gt;
make db-forward-prod        &lt;span class=&quot;c&quot;&gt;# Tunnel to prod RDS&lt;/span&gt;

&lt;span class=&quot;c&quot;&gt;# Quality&lt;/span&gt;
make lint                   &lt;span class=&quot;c&quot;&gt;# Fix linting issues&lt;/span&gt;
make check                  &lt;span class=&quot;c&quot;&gt;# Check without fixing (for CI)&lt;/span&gt;
make &lt;span class=&quot;nb&quot;&gt;test&lt;/span&gt;                   &lt;span class=&quot;c&quot;&gt;# Run all tests&lt;/span&gt;
make &lt;span class=&quot;nb&quot;&gt;test &lt;/span&gt;&lt;span class=&quot;nv&quot;&gt;ARGS&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s1&quot;&gt;&apos;-k login&apos;&lt;/span&gt;   &lt;span class=&quot;c&quot;&gt;# Run tests matching pattern&lt;/span&gt;

&lt;span class=&quot;c&quot;&gt;# Types &amp;amp; API&lt;/span&gt;
make api-types              &lt;span class=&quot;c&quot;&gt;# Regenerate TypeScript client from OpenAPI&lt;/span&gt;
make check-api-types        &lt;span class=&quot;c&quot;&gt;# Verify types are current&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Every target is documented with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;## comment&lt;/code&gt; so &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;make help&lt;/code&gt; produces a useful reference.&lt;/p&gt;
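&lt;p&gt;The &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;make help&lt;/code&gt; trick is the standard grep-the-Makefile one - a sketch:&lt;/p&gt;

```shell
# Self-documenting Makefile: scan for targets annotated with
# "## description" and tabulate them.
help() {
  grep -E '^[a-zA-Z0-9_-]+:.*## ' "$1" | awk -F':.*## ' '{printf "%-24s %s\n", $1, $2}'
}

# help Makefile
```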

&lt;p&gt;No more “let me check the README” or “I think it’s &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;npm run dev&lt;/code&gt; or maybe &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;yarn start&lt;/code&gt;?” The answer is always &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;make &amp;lt;something&amp;gt;&lt;/code&gt;.&lt;/p&gt;

&lt;hr /&gt;

&lt;h2 id=&quot;documentation-that-lives-with-the-code&quot;&gt;Documentation That Lives With the Code&lt;/h2&gt;

&lt;p&gt;READMEs rot. Wiki pages get lost. Notion docs go stale.&lt;/p&gt;

&lt;p&gt;We put everything in &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;docs/&lt;/code&gt; and serve it with MkDocs. And here’s the forcing function: &lt;strong&gt;CI blocks your PR unless you’ve updated the docs, or you explicitly add a “skip-docs-check” label.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;No “I’ll document it later.” Later never comes. Either the docs get updated with the code, or you have to publicly declare “I’m skipping docs” on your PR. Shame-driven documentation.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;docs/
├── architecture/
│   ├── overview.md           # System diagram
│   ├── domain.md             # Entity relationships
│   └── code-organization.md  # Where stuff goes
├── code-style/
│   ├── python.md             # Python conventions
│   ├── fastapi.md            # API patterns
│   ├── typescript.md         # Frontend conventions
│   └── terraform.md          # IaC patterns
├── guides/
│   ├── local-development.md  # This whole setup
│   ├── database.md           # SQLAlchemy, Alembic
│   ├── testing.md            # Test utilities
│   └── logging.md            # Logfire integration
└── operations/
    ├── deploys.md            # Release workflow
    └── incidents.md          # Runbook
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;It’s accessible at &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;http://{project}.localhost/docs&lt;/code&gt; - same hostname, different path.&lt;/p&gt;

&lt;p&gt;The CLAUDE.md file points here: “Before non-trivial work, read the relevant doc.” Claude actually does this. It’s remarkable how much better the code is when Claude reads the style guide first.&lt;/p&gt;

&lt;hr /&gt;

&lt;h2 id=&quot;making-it-agent-friendly&quot;&gt;Making It Agent-Friendly&lt;/h2&gt;

&lt;p&gt;Here’s the thing about Claude worktrees: they’re not just “another developer.” They’re parallel processes that need to be isolated, discoverable, and debuggable.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What Claude needs:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;&lt;strong&gt;Discoverability&lt;/strong&gt;: Where’s the API? Where are the logs? What commands exist?&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Isolation&lt;/strong&gt;: Don’t step on other Claude instances&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Debugging tools&lt;/strong&gt;: When something fails, how to investigate&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Context&lt;/strong&gt;: What patterns does this codebase use?&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;What we built:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Landing page as entry point:&lt;/strong&gt; Claude can visit &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;http://{project}.localhost/&lt;/code&gt; and see everything available.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Makefile as API:&lt;/strong&gt; Every action is a documented make target. Claude runs &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;make help&lt;/code&gt; and knows what’s possible.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Logs accessible by URL:&lt;/strong&gt; Claude can check &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;http://{project}.localhost/logs&lt;/code&gt; in a browser, or query Loki directly:&lt;/p&gt;

&lt;div class=&quot;language-bash highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;curl &lt;span class=&quot;nt&quot;&gt;-sG&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;http://localhost:3100/loki/api/v1/query_range&apos;&lt;/span&gt; &lt;span class=&quot;se&quot;&gt;\&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;--data-urlencode&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;query={project=&quot;supplyco-dev&quot;, service=&quot;api&quot;} |= &quot;error&quot;&apos;&lt;/span&gt; &lt;span class=&quot;se&quot;&gt;\&lt;/span&gt;
  | jq &lt;span class=&quot;nt&quot;&gt;-r&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;.data.result[].values[][1]&apos;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;CLAUDE.md with explicit guidance:&lt;/strong&gt;&lt;/p&gt;
&lt;div class=&quot;language-markdown highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;gu&quot;&gt;## Local Development&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;-&lt;/span&gt; Start: &lt;span class=&quot;sb&quot;&gt;`make docker-up`&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;-&lt;/span&gt; API docs: http://{project}.localhost/api/docs
&lt;span class=&quot;p&quot;&gt;-&lt;/span&gt; Logs: http://{project}.localhost/logs or &lt;span class=&quot;sb&quot;&gt;`make docker-logs-api`&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;-&lt;/span&gt; Database: &lt;span class=&quot;sb&quot;&gt;`make db-local-psql`&lt;/span&gt;

&lt;span class=&quot;gu&quot;&gt;## Before You Code&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;-&lt;/span&gt; Read docs/code-style/python.md for Python work
&lt;span class=&quot;p&quot;&gt;-&lt;/span&gt; Read docs/code-style/fastapi.md for API work
&lt;span class=&quot;p&quot;&gt;-&lt;/span&gt; Read docs/guides/testing.md before writing tests
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Hookify rules prevent common mistakes:&lt;/strong&gt;&lt;/p&gt;
&lt;div class=&quot;language-markdown highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;gh&quot;&gt;# .claude/hookify-rules.local.md&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;-&lt;/span&gt; Don&apos;t import from supabase (we migrated away)
&lt;span class=&quot;p&quot;&gt;-&lt;/span&gt; Use uv, not pip
&lt;span class=&quot;p&quot;&gt;-&lt;/span&gt; Use pnpm, not npm
&lt;span class=&quot;p&quot;&gt;-&lt;/span&gt; Run make lint before committing
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Work-in-progress tracking:&lt;/strong&gt;
We use &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;.claudetext/&lt;/code&gt; for tracking what different Claude instances are working on. Convention: &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;[ ]&lt;/code&gt; unclaimed, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;[C]&lt;/code&gt; claimed by Claude, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;[x]&lt;/code&gt; done. Prevents duplicate work.&lt;/p&gt;
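&lt;p&gt;Claiming a task is just a text edit. A sketch, assuming GNU sed (BSD sed needs a different incantation):&lt;/p&gt;

```shell
# Flip the first unclaimed "[ ]" in a .claudetext file to "[C]".
# GNU sed's 0,/regex/ address applies the substitution to the first match only.
claim_next() {
  sed -i '0,/\[ \]/s//[C]/' "$1"
}
```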

&lt;hr /&gt;

&lt;h2 id=&quot;the-result&quot;&gt;The Result&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Before:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;“How do I run the API?” (Slack, weekly)&lt;/li&gt;
  &lt;li&gt;“My migrations are conflicting with someone else’s” (constant)&lt;/li&gt;
  &lt;li&gt;“The tests pass locally but fail in CI” (environment drift)&lt;/li&gt;
  &lt;li&gt;“I can’t connect to the dev database” (VPC confusion)&lt;/li&gt;
  &lt;li&gt;Claude agents fighting over ports&lt;/li&gt;
  &lt;li&gt;Me: scared to touch infrastructure because of the support cost&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;After:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;make docker-up&lt;/code&gt; and you’re running&lt;/li&gt;
  &lt;li&gt;Each worktree isolated by hostname&lt;/li&gt;
  &lt;li&gt;Logs aggregated and queryable from one place&lt;/li&gt;
  &lt;li&gt;Database tunnels are one command&lt;/li&gt;
  &lt;li&gt;Claude reads the docs, follows the patterns, doesn’t conflict with other agents&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;For the frontend dev:&lt;/strong&gt; He runs &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;make docker-up&lt;/code&gt;, goes to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;project.localhost/frontend&lt;/code&gt;, never thinks about Docker. When something’s weird, he checks &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;/logs&lt;/code&gt; in his browser. No Slack DM to me required.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;For the cofounder:&lt;/strong&gt; Customer reports a bug. He pulls main, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;make docker-up&lt;/code&gt;, reproduces it, fixes it, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;make db-forward-dev&lt;/code&gt; to check prod data if needed, deploys. Back on a sales call in 30 minutes. Never asked me what port anything is on.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;For me:&lt;/strong&gt; I can improve infrastructure without fear. Last week I changed how we handle env vars. Nobody noticed. It just worked. That’s the goal.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;For Claude:&lt;/strong&gt; Agents spin up isolated environments, check the landing page for URLs, query Loki for logs, read the docs before coding. They’re better at using the dev environment than most humans were before.&lt;/p&gt;

&lt;p&gt;The dev environment went from “source of constant interruptions” to “invisible infrastructure that just works.”&lt;/p&gt;

&lt;hr /&gt;

&lt;h2 id=&quot;the-technical-details&quot;&gt;The Technical Details&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Stack:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;strong&gt;Traefik&lt;/strong&gt; - Reverse proxy with Docker provider, label-based routing&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Docker Compose&lt;/strong&gt; - Service orchestration per worktree&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Loki + Alloy&lt;/strong&gt; - Log aggregation with Docker label scraping&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Grafana&lt;/strong&gt; - Log visualization&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Tempo&lt;/strong&gt; - Distributed tracing&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;MkDocs&lt;/strong&gt; - Documentation served locally&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;AWS SSM&lt;/strong&gt; - Bastion tunneling for RDS access&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Makefile&lt;/strong&gt; - Unified command interface&lt;/li&gt;
&lt;/ul&gt;
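&lt;p&gt;The Loki + Alloy piece is smaller than it sounds. A minimal Alloy pipeline shaped roughly like this (component wiring per Alloy’s Docker integration; the Loki URL depends on your compose setup) tails every container over the Docker socket and ships the logs:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;discovery.docker &quot;containers&quot; {
  host = &quot;unix:///var/run/docker.sock&quot;
}

loki.source.docker &quot;logs&quot; {
  host       = &quot;unix:///var/run/docker.sock&quot;
  targets    = discovery.docker.containers.targets
  forward_to = [loki.write.local.receiver]
}

loki.write &quot;local&quot; {
  endpoint {
    url = &quot;http://loki:3100/loki/api/v1/push&quot;
  }
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;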

&lt;p&gt;&lt;strong&gt;Key files:&lt;/strong&gt;&lt;/p&gt;
&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;docker-compose.dev.yml          # Main dev environment
docker/traefik/docker-compose.yml   # Shared Traefik + observability
docker/port-forwarder/          # Port claiming for GUI tools
scripts/port-forward-db.sh      # RDS tunneling
docs/                           # All documentation
Makefile                        # Command interface
CLAUDE.md                       # Agent context
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;The magic:&lt;/strong&gt; &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;COMPOSE_PROJECT_NAME&lt;/code&gt; defaults to the directory name. Everything keys off that. Rename the directory, get a new isolated environment.&lt;/p&gt;
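&lt;p&gt;Concretely, a service’s Traefik labels can key the hostname off the project name - a sketch (service names and ports here are illustrative):&lt;/p&gt;

&lt;div class=&quot;language-yaml highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;services:
  api:
    build: .
    labels:
      - &quot;traefik.enable=true&quot;
      # COMPOSE_PROJECT_NAME defaults to the directory name, so each
      # worktree gets its own router and hostname automatically.
      - &quot;traefik.http.routers.${COMPOSE_PROJECT_NAME}-api.rule=Host(`${COMPOSE_PROJECT_NAME}.localhost`) &amp;&amp; PathPrefix(`/api`)&quot;
      - &quot;traefik.http.services.${COMPOSE_PROJECT_NAME}-api.loadbalancer.server.port=8000&quot;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;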

&lt;hr /&gt;

&lt;h2 id=&quot;what-i-learned&quot;&gt;What I Learned&lt;/h2&gt;

&lt;ol&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;Dev environments are products.&lt;/strong&gt; Treat them like one. They need UX, documentation, and maintenance.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;Discoverability beats documentation.&lt;/strong&gt; A landing page that links to everything is worth more than a README that explains everything.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;Isolation is non-negotiable.&lt;/strong&gt; Especially with AI agents spinning up parallel environments. Design for multiple simultaneous instances from the start.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;Observability isn’t optional.&lt;/strong&gt; You will need logs when things break. Make them easy to access before things break.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;One command per action.&lt;/strong&gt; &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;make docker-up&lt;/code&gt; is infinitely better than “run this, then that, then set this env var, then…”&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;Agents need the same things humans need&lt;/strong&gt; - just more explicitly documented. CLAUDE.md isn’t extra work; it’s documentation you should have written anyway.&lt;/p&gt;
  &lt;/li&gt;
&lt;/ol&gt;

&lt;hr /&gt;

&lt;h2 id=&quot;getting-started&quot;&gt;Getting Started&lt;/h2&gt;

&lt;p&gt;If your dev environment is in the “47 Makefile targets” phase:&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;Add a reverse proxy.&lt;/strong&gt; Traefik with Docker labels is ~50 lines of config. Suddenly you have URLs instead of ports.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;Add a landing page.&lt;/strong&gt; One HTML file that links to all your services. Put it at the root route.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;Aggregate your logs.&lt;/strong&gt; Loki + Alloy is free and runs locally. One dashboard instead of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;docker logs&lt;/code&gt; across 10 terminals.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;Document commands, not concepts.&lt;/strong&gt; Your Makefile should be the entire interface. &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;make help&lt;/code&gt; should answer most questions.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;Write CLAUDE.md.&lt;/strong&gt; Not because you’re using AI, but because explaining your environment to an agent forces you to make it logical.&lt;/p&gt;
  &lt;/li&gt;
&lt;/ol&gt;
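&lt;p&gt;For step 4, the usual trick is a self-documenting Makefile: annotate targets with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;##&lt;/code&gt; comments and have &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;make help&lt;/code&gt; grep them out. A sketch:&lt;/p&gt;

&lt;div class=&quot;language-makefile highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;.PHONY: help docker-up
.DEFAULT_GOAL := help

help:  ## List available commands
	@grep -E &apos;^[a-zA-Z_-]+:.*?## &apos; $(MAKEFILE_LIST) | \
		awk &apos;BEGIN {FS = &quot;:.*?## &quot;}; {printf &quot;  %-18s %s\n&quot;, $$1, $$2}&apos;

docker-up:  ## Start the full dev environment
	docker compose -f docker-compose.dev.yml up -d
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;New targets document themselves, so the help text never drifts from the Makefile.&lt;/p&gt;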

&lt;p&gt;The best dev environment is one that makes the next person (or the next Claude) productive immediately. Everything else is just complexity.&lt;/p&gt;

&lt;hr /&gt;

&lt;p&gt;&lt;em&gt;Our setup is specific to our stack (Python/FastAPI/React/AWS), but the patterns generalize. The goal is: clone repo, run one command, be productive. If your env requires more than that, you have work to do.&lt;/em&gt;&lt;/p&gt;
</content>
 </entry>
 
 <entry>
   <title>How I Used AI to Outsource My Executive Function</title>
   <link href="https://w.laudiacay.cool/2026/01/02/ai-executive-function.html"/>
   <updated>2026-01-02T00:00:00+00:00</updated>
   <id>https://w.laudiacay.cool/2026/01/02/ai-executive-function</id>
   <content type="html">&lt;p&gt;&lt;em&gt;Building a personal operating system with Claude Code when your brain fights you&lt;/em&gt;&lt;/p&gt;

&lt;hr /&gt;

&lt;h2 id=&quot;the-problem-context-scatter-as-a-founder&quot;&gt;The Problem: Context Scatter as a Founder&lt;/h2&gt;

&lt;p&gt;I have ADHD. I run a company. I also have a body that needs PT exercises, meal planning, and regular gym sessions. Every morning I’d wake up to:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Linear inbox with issues&lt;/li&gt;
  &lt;li&gt;Slack with @mentions and DMs&lt;/li&gt;
  &lt;li&gt;GitHub with PRs needing review&lt;/li&gt;
  &lt;li&gt;Email (personal AND work)&lt;/li&gt;
  &lt;li&gt;TickTick with half-finished todo items&lt;/li&gt;
  &lt;li&gt;A nutrition plan I’d inevitably forget&lt;/li&gt;
  &lt;li&gt;A workout routine I’d modify based on which body part hurt that day&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The cognitive load of &lt;em&gt;just figuring out what to do&lt;/em&gt; was exhausting before I’d done anything. Context switching between 8+ apps, each with their own notification system, each requiring different mental models. For someone with executive function challenges, this is death by a thousand cuts.&lt;/p&gt;

&lt;hr /&gt;

&lt;h2 id=&quot;the-insight-what-if-ai-could-be-my-prefrontal-cortex&quot;&gt;The Insight: What If AI Could Be My Prefrontal Cortex?&lt;/h2&gt;

&lt;p&gt;I started using Claude Code for work stuff - code reviews, debugging, writing features. But then I noticed something: Claude was really good at structured decision-making. The exact thing my brain sucks at.&lt;/p&gt;

&lt;p&gt;What if I could program Claude to:&lt;/p&gt;
&lt;ol&gt;
  &lt;li&gt;&lt;strong&gt;Aggregate&lt;/strong&gt; all my contexts into one place&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Enforce&lt;/strong&gt; good habits through methodology (GTD)&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Automate&lt;/strong&gt; the boring stuff entirely&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Guide&lt;/strong&gt; me through decisions I’d otherwise avoid&lt;/li&gt;
&lt;/ol&gt;

&lt;hr /&gt;

&lt;h2 id=&quot;the-system-i-built&quot;&gt;The System I Built&lt;/h2&gt;

&lt;h3 id=&quot;morning---the-daily-aggregator&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;/morning&lt;/code&gt; - The Daily Aggregator&lt;/h3&gt;

&lt;p&gt;Every morning, I type &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;/morning&lt;/code&gt; and Claude:&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;Opens my Linear, Graphite, and Slack dashboards&lt;/li&gt;
  &lt;li&gt;Asks me to paste whatever I see (Claude can read screenshots!)&lt;/li&gt;
  &lt;li&gt;Fetches via API:
    &lt;ul&gt;
      &lt;li&gt;GitHub PRs needing my review&lt;/li&gt;
      &lt;li&gt;My PRs that are stuck/approved/need fixes&lt;/li&gt;
      &lt;li&gt;Linear issues assigned to me&lt;/li&gt;
      &lt;li&gt;Email counts from both accounts&lt;/li&gt;
      &lt;li&gt;TickTick tasks due today/overdue&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;Presents a unified dashboard&lt;/li&gt;
  &lt;li&gt;Walks me through building today’s plan interactively&lt;/li&gt;
&lt;/ol&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;## Morning Dashboard

### PRs Needing Your Review (3)
- supplyco#1739 - &quot;Add RLS policies&quot; by @vlad
- supplyco#1725 - &quot;Fix auth flow&quot; by @jannik (CHANGES_REQUESTED by you)

### Your PRs - Status Check
- #1698 &quot;Metrics dashboard&quot; - APPROVED ← merge this?
- #1702 &quot;API refactor&quot; - waiting on reviewers

### Linear Issues (5)
- [SUP-1542] Blocked: dependency on Jannik&apos;s PR
- [SUP-1589] Due today: write migration

### Email: Personal 12, Work 8
### TickTick: 4 due today, 2 overdue
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Then it asks: “What should go on today’s plan?” and creates TickTick tasks with proper due dates.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The key insight&lt;/strong&gt;: I don’t have to context-switch. I don’t have to remember which app has what. I just answer questions.&lt;/p&gt;

&lt;h3 id=&quot;the-hybrid-approach-dashboards-that-wont-automate&quot;&gt;The Hybrid Approach: Dashboards That Won’t Automate&lt;/h3&gt;

&lt;p&gt;Here’s the thing: not everything has a nice API. Linear has an MCP, but it’s clunky for inbox-style browsing. Slack’s API doesn’t capture the gestalt of “what’s blowing up right now.” Graphite (our PR tool) has no API at all. LinkedIn DMs? Forget it.&lt;/p&gt;

&lt;p&gt;So I built a hybrid system. Claude runs:&lt;/p&gt;

&lt;div class=&quot;language-bash highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;open &lt;span class=&quot;s2&quot;&gt;&quot;https://linear.app/mycompany/inbox&quot;&lt;/span&gt;
open &lt;span class=&quot;s2&quot;&gt;&quot;https://app.graphite.dev/#needs-your-review&quot;&lt;/span&gt;
open &lt;span class=&quot;s2&quot;&gt;&quot;https://app.slack.com/client&quot;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Three browser tabs pop up. Claude then asks me to paste what I see - and crucially, &lt;strong&gt;Claude can read screenshots&lt;/strong&gt;. So I:&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;Cmd+Shift+4 to screenshot the Linear inbox&lt;/li&gt;
  &lt;li&gt;Paste it into the terminal&lt;/li&gt;
  &lt;li&gt;Claude parses it: “I see 5 issues assigned to you, 2 are marked urgent…”&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Or I just Cmd+A, Cmd+C the text and paste it. Either works.&lt;/p&gt;

&lt;p&gt;I’ve also gotten automated scrolling screenshots working with Apple’s &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;screencapture&lt;/code&gt; and osascript - it can capture an entire scrollable page. I occasionally use these for long Slack threads or Linear backlogs. Next step is programming those into the workflows so Claude can just grab them automatically.&lt;/p&gt;
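&lt;p&gt;The capture loop is roughly this (macOS-only; key code 121 is Page Down in the frontmost window, and the delay and iteration count are things you tune per app):&lt;/p&gt;

&lt;div class=&quot;language-bash highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;# Capture a scrolling page as a series of screenshots (macOS)
for i in $(seq 1 5); do
  screencapture -x &quot;capture-$i.png&quot;   # -x: no shutter sound
  osascript -e &apos;tell application &quot;System Events&quot; to key code 121&apos;  # Page Down
  sleep 1                             # let the page settle before the next shot
done
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;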

&lt;p&gt;The key is: &lt;strong&gt;my focus immediately returns to the terminal.&lt;/strong&gt; The browser tabs are open, but I’m not there. I grabbed what I needed and came back. Claude becomes the home base that I always return to after quick excursions into distracting apps.&lt;/p&gt;

&lt;p&gt;This matters because my failure mode isn’t “I don’t check Slack.” It’s “I check Slack and then it’s 2 hours later.” The open → grab → return flow keeps the terminal as my anchor.&lt;/p&gt;

&lt;hr /&gt;

&lt;h3 id=&quot;why-i-could-never-do-gtd-before-and-why-claude-changed-that&quot;&gt;Why I Could Never Do GTD Before (And Why Claude Changed That)&lt;/h3&gt;

&lt;p&gt;I’ve read &lt;em&gt;Getting Things Done&lt;/em&gt; three times. I’ve tried OmniFocus, Things, Todoist, Notion, paper systems. I understand the methodology intellectually. Inbox zero, weekly reviews, next actions, contexts - I get it.&lt;/p&gt;

&lt;p&gt;I just couldn’t &lt;em&gt;do&lt;/em&gt; it.&lt;/p&gt;

&lt;p&gt;The problem with GTD for ADHD brains is that it requires sustained executive function at exactly the moment you have none. You’re supposed to:&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;Look at your inbox&lt;/li&gt;
  &lt;li&gt;Pick ONE item&lt;/li&gt;
  &lt;li&gt;Ask “is it actionable?”&lt;/li&gt;
  &lt;li&gt;If yes, ask “what’s the next action?”&lt;/li&gt;
  &lt;li&gt;Route it appropriately&lt;/li&gt;
  &lt;li&gt;Repeat&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Simple, right? Except my brain does this:&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;Look at inbox&lt;/li&gt;
  &lt;li&gt;See 47 items&lt;/li&gt;
  &lt;li&gt;Feel overwhelmed&lt;/li&gt;
  &lt;li&gt;Notice an interesting link&lt;/li&gt;
  &lt;li&gt;Open 12 browser tabs&lt;/li&gt;
  &lt;li&gt;Forget I was processing inbox&lt;/li&gt;
  &lt;li&gt;Add “process inbox” to inbox&lt;/li&gt;
  &lt;li&gt;Close laptop&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The methodology isn’t the problem. The &lt;em&gt;execution&lt;/em&gt; is the problem.&lt;/p&gt;

&lt;h3 id=&quot;askuserquestion-as-adhd-medication&quot;&gt;AskUserQuestion as ADHD Medication&lt;/h3&gt;

&lt;p&gt;Here’s what &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;/gtd-process&lt;/code&gt; actually looks like:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;Claude: Processing inbox item 1 of 23:
&quot;interesting article about distributed systems&quot;

Is this actionable?
○ Yes - there&apos;s something I need to DO
○ No - it&apos;s reference/inspiration
○ Trash - I don&apos;t need this

Me: [clicks &quot;No&quot;]

Claude: Where should this go?
○ Someday/Maybe - might read later
○ Reference - save with a tag
○ Trash - actually I don&apos;t need it

Me: [clicks &quot;Trash&quot;]

Claude: ✓ Deleted.

Processing item 2 of 23:
&quot;Call dentist&quot;
...
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Notice what’s happening:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;strong&gt;One item at a time.&lt;/strong&gt; I literally cannot see item 3 until I’ve decided on item 2.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Forced binary choices.&lt;/strong&gt; Not “what do you want to do?” but “pick A, B, or C.”&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Claude proposes, I dispose.&lt;/strong&gt; The cognitive load of &lt;em&gt;generating options&lt;/em&gt; is removed.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Progress is visible.&lt;/strong&gt; “Item 7 of 23” creates momentum.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;No escape.&lt;/strong&gt; I can’t open a new tab because I’m in the terminal and Claude is waiting.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is the external structure my brain lacks internally.&lt;/p&gt;

&lt;h3 id=&quot;the-brainrotted-zoomer-interface&quot;&gt;The “Brainrotted Zoomer” Interface&lt;/h3&gt;

&lt;p&gt;I call it the “brainrotted zoomer” interface because it’s designed for someone with the attention span of a TikTok scroll:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;strong&gt;Micro-decisions only.&lt;/strong&gt; Never “plan your week.” Always “is THIS one thing actionable, yes or no?”&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Immediate feedback.&lt;/strong&gt; Task moved, checkmark shown, next item appears.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;No dead ends.&lt;/strong&gt; If I say “unsure,” Claude asks a clarifying question. I’m never stuck staring at something.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Suggested actions.&lt;/strong&gt; Instead of “what’s the next action?” Claude says:&lt;/li&gt;
&lt;/ul&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;What&apos;s the next physical action? I suggest:
○ &quot;Call Dr. Smith to schedule appointment&quot;
○ &quot;Text Sarah about Saturday plans&quot;
○ &quot;Research best options online&quot;
○ Other - I&apos;ll type it
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;I can just tap an option. The verb is already there. The specificity is already there. I’m not generating, I’m selecting.&lt;/p&gt;

&lt;h3 id=&quot;why-this-finally-works&quot;&gt;Why This Finally Works&lt;/h3&gt;

&lt;p&gt;Traditional productivity systems assume you can:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Hold context in working memory&lt;/li&gt;
  &lt;li&gt;Generate options from scratch&lt;/li&gt;
  &lt;li&gt;Maintain focus through a list&lt;/li&gt;
  &lt;li&gt;Self-direct without external structure&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;ADHD means I can’t do any of that reliably. But I &lt;em&gt;can&lt;/em&gt;:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Answer a direct question&lt;/li&gt;
  &lt;li&gt;Pick from multiple choice&lt;/li&gt;
  &lt;li&gt;Follow along when someone else leads&lt;/li&gt;
  &lt;li&gt;Stay engaged when there’s immediate feedback&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Claude becomes the external executive function. It holds the context (“you’re on item 7 of 23, you’ve trashed 3, moved 2 to someday”). It generates the options (“here are three possible next actions”). It maintains focus (“okay, now here’s item 8”). It provides structure (“is this actionable, yes or no?”).&lt;/p&gt;

&lt;p&gt;I’m not doing GTD. Claude is doing GTD &lt;em&gt;to&lt;/em&gt; me. And that’s why it finally works.&lt;/p&gt;

&lt;hr /&gt;

&lt;h3 id=&quot;inbox---email-triage-without-the-dread&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;/inbox&lt;/code&gt; - Email Triage Without the Dread&lt;/h3&gt;

&lt;p&gt;Same principle applied to email:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;Claude: Email 3 of 18:
From: eric@supplier.com
Subject: Re: MacBook order status
Received: 2 days ago
Preview: &quot;Hey, following up on the laptop order...&quot;

What would you like to do?
○ Tell me more - show full email
○ Archive - I&apos;ve handled this
○ Create todo - need to respond/act
○ Draft reply - help me write back
○ Skip - deal with later
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;If I pick “Tell me more,” Claude fetches the full email, summarizes it, and asks again. If I pick “Create todo,” it asks for a due date and creates a TickTick task like “Reply to Eric about MacBook order” with proper GTD formatting.&lt;/p&gt;

&lt;p&gt;The inbox goes from “264 unread anxiety pile” to “answer 18 questions and you’re done.”&lt;/p&gt;

&lt;h3 id=&quot;the-recursive-loop-everything-flows-back&quot;&gt;The Recursive Loop: Everything Flows Back&lt;/h3&gt;

&lt;p&gt;Here’s where it gets elegant: &lt;strong&gt;the system feeds itself&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;When I process email with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;/inbox&lt;/code&gt; and choose “Create todo,” that task goes into TickTick. Later, when I run &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;/gtd-process&lt;/code&gt;, that same task comes back through Claude for processing. If it’s vague (“deal with MacBook situation”), Claude helps me sharpen it (“Email Eric confirming delivery date”).&lt;/p&gt;

&lt;p&gt;If it needs research, I run &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;/research-task&lt;/code&gt; on it - which produces &lt;em&gt;more&lt;/em&gt; tasks that flow back into TickTick. If those tasks produce emails, those emails get processed in &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;/inbox&lt;/code&gt;.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;Email inbox → /inbox → TickTick task
                            ↓
                      /gtd-process → sharpen task
                            ↓
                      /research-task → research + new tasks
                            ↓
                      TickTick → more tasks for /gtd-process
                            ↓
                      Action → sends email → response arrives
                            ↓
                      Email inbox → /inbox → ...
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Everything flows through Claude. Claude is the processing layer between all my inboxes. The recursive nature means nothing gets lost - it just keeps cycling through until it’s either done or explicitly deferred to Someday/Maybe.&lt;/p&gt;

&lt;hr /&gt;

&lt;h3 id=&quot;research-task---when-you-dont-know-what-to-do&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;/research-task&lt;/code&gt; - When You Don’t Know What To Do&lt;/h3&gt;

&lt;p&gt;Some tasks sit in my Next Actions list because I don’t actually know how to do them. “Save money on Cytopoint” - my dog needs this expensive medication. I know I should research alternatives but… that’s vague and overwhelming.&lt;/p&gt;

&lt;p&gt;So I run &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;/research-task &quot;Save money on Cytopoint&quot;&lt;/code&gt; and Claude launches into research mode:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;Claude: I&apos;ll help you research cheaper Cytopoint options. First, some questions:

What&apos;s your current situation?
○ Getting it from the vet directly
○ Using Chewy/online pharmacy
○ Not sure of current price
○ Other

Me: [clicks &quot;Getting it from vet directly&quot;]

Claude: What&apos;s your dog&apos;s weight/dosage?

Me: 40lb, monthly injection

Claude: Got it. Let me research options...
[web searches, fetches pharmacy pages, reads rebate programs]
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Ten minutes later, Claude produces a memo:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;## Research: Cheaper Cytopoint Options

**Current cost:** ~$80-150/month at vet

**Findings:**

1. **Costco Pharmacy** - $45-65 for same injection
   - Has pet Rx program, no membership needed for pharmacy
   - Delivers to home
   - Need vet to send prescription

2. **Zoetis Rewards** - Manufacturer rebate program
   - $30 back after 2nd injection
   - $50 back after 4th injection
   - Stackable with pharmacy savings

3. **Librela** - Newer alternative medication
   - Similar mechanism, may work for some dogs
   - Slightly different side effect profile
   - Worth discussing with vet

**Recommendation:** Switch to Costco pharmacy + enroll in Zoetis Rewards. Ask vet about Librela as backup option.
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Then Claude asks what to do with this:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;How would you like to proceed?
○ Create next-action tasks (I&apos;ll schedule the calls/emails)
○ Make original task actionable (update with specific next step)
○ Just the memo - I&apos;ll handle manually
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;I pick “Create next-action tasks” and Claude creates:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;“Call Costco pharmacy about pet Rx delivery” - due tomorrow&lt;/li&gt;
  &lt;li&gt;“Email vet requesting Cytopoint prescription to Costco” - due tomorrow&lt;/li&gt;
  &lt;li&gt;“Sign up for Zoetis Rewards rebate program” - due today (quick)&lt;/li&gt;
  &lt;li&gt;“Ask vet about Librela at next appointment” - due next month&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The vague “save money” task becomes four concrete actions with verbs and dates. The research is done. The decisions are made. I just execute.&lt;/p&gt;

&lt;hr /&gt;

&lt;h3 id=&quot;hevy---ai-built-pt-routines&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;/hevy&lt;/code&gt; - AI-Built PT Routines&lt;/h3&gt;

&lt;p&gt;This is where it gets weird and cool.&lt;/p&gt;

&lt;p&gt;I had chronic pain issues. I’d done PT but couldn’t keep track of which exercises to do. So I started having conversations with Claude about my symptoms:&lt;/p&gt;

&lt;p&gt;&lt;em&gt;“I have forward head posture and my right hip is always tight. Also my feet collapse inward when I squat.”&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Claude would research the biomechanics, suggest exercises, and explain the mechanism. Then I’d say “add that to my workout routine” and Claude would call the Hevy API to update my gym app directly.&lt;/p&gt;
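&lt;p&gt;I won’t vouch for Hevy’s exact schema here - the field names below are placeholders, not their documented API - but the shape of the call is just “PUT a routine whose exercises carry notes”:&lt;/p&gt;

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;# Field names are guesses at a routine-update payload, not Hevy&apos;s schema.
def build_routine_update(title, exercises):
    &apos;&apos;&apos;Assemble a routine whose form cues ride along as exercise notes.&apos;&apos;&apos;
    return {
        &apos;routine&apos;: {
            &apos;title&apos;: title,
            &apos;exercises&apos;: [
                {&apos;exercise&apos;: ex[&apos;name&apos;], &apos;notes&apos;: ex.get(&apos;cue&apos;, &apos;&apos;)}
                for ex in exercises
            ],
        }
    }

payload = build_routine_update(&apos;PT + Posture Day&apos;, [
    {&apos;name&apos;: &apos;Face Pull&apos;, &apos;cue&apos;: &apos;Pull to face, elbows HIGH.&apos;},
    {&apos;name&apos;: &apos;Dead Bug&apos;, &apos;cue&apos;: &apos;Exhale fully, ribs drop. Low back FLAT.&apos;},
])
# then, roughly: requests.put(HEVY_URL, json=payload, headers={&apos;api-key&apos;: key})
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;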

&lt;p&gt;My current routines have exercises like:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;strong&gt;Chin tucks&lt;/strong&gt; and &lt;strong&gt;wall slides&lt;/strong&gt; (forward head posture)&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Face pulls&lt;/strong&gt; with notes: “Pull to face, elbows HIGH. Squeeze rear delts.”&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Short foot&lt;/strong&gt; with notes: “Shorten foot WITHOUT curling toes. Pull ball toward heel, create dome. Toes LONG/relaxed.”&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Dead bugs&lt;/strong&gt; with notes: “RIB CONTROL: Exhale fully, ribs drop to floor. Low back FLAT.”&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Tibialis raises&lt;/strong&gt; with notes: “Back to wall, feet 12in out. Lift toes toward shins. Pause 1s at top, 3s down.”&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;The form cues are embedded in my workout app.&lt;/strong&gt; When I’m at the gym, I just open Hevy, and the routine Claude and I debugged together is right there with instructions. No memory required. No “what was that exercise my PT showed me?”&lt;/p&gt;

&lt;p&gt;I ran diagnostic conversations like:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;“My right shoulder clicks when I raise my arm overhead” → added specific rotator cuff work&lt;/li&gt;
  &lt;li&gt;“I can’t get into a deep squat without my heels rising” → added ankle mobility (knee-to-wall) and calf stretches&lt;/li&gt;
  &lt;li&gt;“My lower back hurts after sitting” → added hip flexor stretches, glute activation, core anti-extension work&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Each conversation produced updates to my actual workout routine. The gym became a place to execute, not to think.&lt;/p&gt;

&lt;hr /&gt;

&lt;h3 id=&quot;aws-lambda--ssm-claude-1password&quot;&gt;AWS Lambda + SSM: “Claude 1Password”&lt;/h3&gt;

&lt;p&gt;My YNAB (budgeting app) was always messy. Amazon transactions would say “AMAZON.COM” with no detail about what I bought. I’d have to cross-reference email receipts manually to remember if that $47.23 was cat food or impulse purchases.&lt;/p&gt;

&lt;p&gt;So I built a Lambda that runs every hour:&lt;/p&gt;
&lt;ol&gt;
  &lt;li&gt;Fetches Amazon/Apple receipt emails from Gmail&lt;/li&gt;
  &lt;li&gt;Parses item names and prices from the HTML&lt;/li&gt;
  &lt;li&gt;Matches to YNAB transactions by amount and date (±5 days tolerance)&lt;/li&gt;
  &lt;li&gt;Uses Claude Haiku via AWS Bedrock to shorten long product names (40+ chars → 2-4 words)&lt;/li&gt;
  &lt;li&gt;Updates the YNAB memo field automatically&lt;/li&gt;
&lt;/ol&gt;
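&lt;p&gt;The matching step is the only mildly interesting logic. A sketch of it (function and field names are mine, not YNAB’s API schema):&lt;/p&gt;

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;from datetime import date, timedelta

def match_receipt(total, receipt_date, transactions, tolerance_days=5):
    &apos;&apos;&apos;Match a parsed email receipt to one YNAB transaction by amount + date.&apos;&apos;&apos;
    window = timedelta(days=tolerance_days)
    candidates = [
        t for t in transactions
        if 0.01 &gt; abs(t[&apos;amount&apos;] - total)
        and window &gt;= abs(t[&apos;date&apos;] - receipt_date)
    ]
    # The same amount can repeat; prefer the transaction closest in time.
    return min(candidates, key=lambda t: abs(t[&apos;date&apos;] - receipt_date), default=None)

txns = [
    {&apos;amount&apos;: 47.23, &apos;date&apos;: date(2026, 1, 3)},
    {&apos;amount&apos;: 47.23, &apos;date&apos;: date(2026, 1, 20)},  # same amount, outside window
]
hit = match_receipt(47.23, date(2026, 1, 5), txns)  # matches the Jan 3 transaction
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;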

&lt;p&gt;Now my transactions auto-label themselves:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;Amazon: Cat Food 15lb ($24.99), Protein Bars 12ct ($18.99), +2 more
Apple: Claude Pro Subscription ($20.00)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;All credentials live in AWS SSM Parameter Store. I call it “Claude 1Password” - Claude can fetch any API key it needs with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;aws ssm get-parameter&lt;/code&gt;, and I never have to paste secrets into chat or worry about them leaking.&lt;/p&gt;
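&lt;p&gt;Fetching a secret is one call (the parameter path is whatever naming scheme you pick):&lt;/p&gt;

&lt;div class=&quot;language-bash highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;aws ssm get-parameter \
  --name &quot;/personal/ynab/api-token&quot; \
  --with-decryption \
  --query &quot;Parameter.Value&quot; \
  --output text
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;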

&lt;p&gt;The Terraform for this is like 100 lines. EventBridge triggers hourly, Lambda runs, my budget stays accurate without me thinking about it.&lt;/p&gt;
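&lt;p&gt;The scheduling half of that Terraform is just an EventBridge rule pointed at the Lambda - roughly this (resource names are illustrative, and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;aws_lambda_function.receipt_labeler&lt;/code&gt; is assumed to be defined elsewhere):&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;resource &quot;aws_cloudwatch_event_rule&quot; &quot;hourly&quot; {
  name                = &quot;receipt-labeler-hourly&quot;
  schedule_expression = &quot;rate(1 hour)&quot;
}

resource &quot;aws_cloudwatch_event_target&quot; &quot;run_lambda&quot; {
  rule = aws_cloudwatch_event_rule.hourly.name
  arn  = aws_lambda_function.receipt_labeler.arn
}

resource &quot;aws_lambda_permission&quot; &quot;allow_events&quot; {
  statement_id  = &quot;AllowEventBridgeInvoke&quot;
  action        = &quot;lambda:InvokeFunction&quot;
  function_name = aws_lambda_function.receipt_labeler.function_name
  principal     = &quot;events.amazonaws.com&quot;
  source_arn    = aws_cloudwatch_event_rule.hourly.arn
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;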

&lt;hr /&gt;

&lt;h3 id=&quot;grocery-lists--nutrition-autopilot&quot;&gt;Grocery Lists + Nutrition Autopilot&lt;/h3&gt;

&lt;p&gt;I used to meal plan by staring at an empty document, getting overwhelmed, and ordering DoorDash.&lt;/p&gt;

&lt;p&gt;Now:&lt;/p&gt;
&lt;ol&gt;
  &lt;li&gt;Tell Claude my macro targets and dietary restrictions&lt;/li&gt;
  &lt;li&gt;Claude suggests a week of meals&lt;/li&gt;
  &lt;li&gt;I pick the ones that sound good&lt;/li&gt;
  &lt;li&gt;Claude generates a grocery list organized by store section&lt;/li&gt;
  &lt;li&gt;I buy roughly the same stuff every week with minor variations&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The decision fatigue is gone. I’m not standing in the grocery store wondering what to buy. I have a list. The list produces meals. The meals hit my macros.&lt;/p&gt;

&lt;p&gt;Nutrition is on autopilot. I follow the system.&lt;/p&gt;

&lt;hr /&gt;

&lt;h3 id=&quot;the-expanding-frontier&quot;&gt;The Expanding Frontier&lt;/h3&gt;

&lt;p&gt;The system keeps growing because it’s self-reinforcing:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Learning makeup:&lt;/strong&gt; “What order do I apply these products? What’s wrong with my technique based on this photo?” Claude can look at selfies and give specific feedback.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Fashion:&lt;/strong&gt; “I’m going to [X event], I own [Y items], what works together?” Outfit planning without the paralysis.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Company planning:&lt;/strong&gt; Strategic roadmaps, investor update drafts, hiring rubrics, competitive analysis. Work stuff, but guided instead of blank-page.&lt;/p&gt;

&lt;hr /&gt;

&lt;h2 id=&quot;the-snowball-effect&quot;&gt;The Snowball Effect&lt;/h2&gt;

&lt;p&gt;Here’s what nobody tells you about integrating AI into your life: &lt;strong&gt;it compounds&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;The more context Claude has about:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;My work projects (Linear, GitHub, codebase)&lt;/li&gt;
  &lt;li&gt;My habits (TickTick patterns, what I defer vs complete)&lt;/li&gt;
  &lt;li&gt;My body (PT history, what’s worked)&lt;/li&gt;
  &lt;li&gt;My preferences (past decisions, communication style)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The better its suggestions become. Claude starts to know that I always defer dentist appointments, so it asks “is this actually happening or should we Someday/Maybe it?” Claude knows I respond well to morning workouts, so it doesn’t suggest evening gym sessions.&lt;/p&gt;

&lt;p&gt;It’s like having a chief of staff who’s read every email, attended every meeting, and remembers everything. Each new integration makes the whole system smarter.&lt;/p&gt;

&lt;hr /&gt;

&lt;h2 id=&quot;what-autopilot-actually-means&quot;&gt;What “Autopilot” Actually Means&lt;/h2&gt;

&lt;p&gt;My daily overhead is now:&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;&lt;strong&gt;Morning&lt;/strong&gt;: Run &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;/morning&lt;/code&gt;, answer questions for 5-10 minutes, get a plan&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Gym&lt;/strong&gt;: Open Hevy, follow the routine Claude and I designed. Every exercise has form cues.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Email&lt;/strong&gt;: Run &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;/inbox&lt;/code&gt; when I have energy, archive/todo/skip through them&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Tasks&lt;/strong&gt;: Work from my TickTick “Today” view - everything already has verbs and due dates&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Meals&lt;/strong&gt;: Follow the grocery list, cook the planned meals&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The stuff I used to burn executive function on - &lt;em&gt;figuring out what to do, remembering how to do exercises correctly, deciding what to eat, enriching budget transactions, processing email&lt;/em&gt; - just happens.&lt;/p&gt;

&lt;p&gt;My job is now to:&lt;/p&gt;
&lt;ol&gt;
  &lt;li&gt;Show up at the gym&lt;/li&gt;
  &lt;li&gt;Eat what’s on the list&lt;/li&gt;
  &lt;li&gt;Work through the tasks Claude helped me prioritize&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The meta-work is gone. The actual work remains.&lt;/p&gt;

&lt;hr /&gt;

&lt;h2 id=&quot;the-technical-stack&quot;&gt;The Technical Stack&lt;/h2&gt;

&lt;p&gt;For the curious:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Claude Code Setup:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Custom slash commands defined in &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;.claude/commands/*.md&lt;/code&gt;&lt;/li&gt;
  &lt;li&gt;Each command is a workflow written in markdown that Claude follows&lt;/li&gt;
  &lt;li&gt;MCP (Model Context Protocol) servers for Hevy, Slack, Linear&lt;/li&gt;
  &lt;li&gt;Python scripts for Gmail, TickTick (Claude calls via Bash)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;AWS Infrastructure:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Lambda (container-based) for background jobs&lt;/li&gt;
  &lt;li&gt;ECR for container images&lt;/li&gt;
  &lt;li&gt;SSM Parameter Store for all credentials (“Claude 1Password”)&lt;/li&gt;
  &lt;li&gt;EventBridge for hourly scheduling&lt;/li&gt;
  &lt;li&gt;Bedrock (Claude Haiku) for text processing in Lambda&lt;/li&gt;
  &lt;li&gt;SNS + ntfy.sh for mobile alerts&lt;/li&gt;
  &lt;li&gt;CloudWatch for logs and metrics&lt;/li&gt;
  &lt;li&gt;Terraform for infrastructure-as-code&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Integrations:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Gmail API (personal + work accounts)&lt;/li&gt;
  &lt;li&gt;YNAB API (budgeting)&lt;/li&gt;
  &lt;li&gt;TickTick API (tasks)&lt;/li&gt;
  &lt;li&gt;Linear API (issues)&lt;/li&gt;
  &lt;li&gt;GitHub API (PRs)&lt;/li&gt;
  &lt;li&gt;Hevy API (workouts)&lt;/li&gt;
  &lt;li&gt;Slack API (messages)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The slash commands are just markdown files that define a workflow. Claude reads them and follows the steps. It’s programming by writing English that explains what you want.&lt;/p&gt;
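&lt;p&gt;For a concrete flavor, here is a hypothetical command file in the spirit of &lt;code&gt;.claude/commands/inbox.md&lt;/code&gt; (the steps and wording are invented for illustration):&lt;/p&gt;

```markdown
# /inbox: triage email

1. Run the Gmail script to list unread messages from both accounts.
2. Summarize each message in one line.
3. For each, propose exactly one action: archive, todo, or skip.
4. Wait for my answer before acting; never archive without confirmation.
5. When creating a todo, add it to TickTick with a verb-first title and a due date.
```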

&lt;hr /&gt;

&lt;h2 id=&quot;what-this-means&quot;&gt;What This Means&lt;/h2&gt;

&lt;p&gt;I’m not sure I’m “using AI” anymore in the way most people mean. Claude isn’t a tool I pick up to do a task. It’s the interface through which I interact with my own life systems.&lt;/p&gt;

&lt;p&gt;The distinction matters because it changes the frame from “AI can help with X” to “what would it look like if AI handled all the glue work?”&lt;/p&gt;

&lt;p&gt;For someone with ADHD, the glue work - context switching, remembering, prioritizing, maintaining systems, generating options, sustaining focus - is often harder than the actual work. The actual work is fine. It’s everything around the work that kills me.&lt;/p&gt;

&lt;p&gt;Now I have a system that:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;strong&gt;Aggregates&lt;/strong&gt; instead of scatters&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Enforces&lt;/strong&gt; methodology instead of hoping I remember&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Automates&lt;/strong&gt; the boring parts entirely&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Guides&lt;/strong&gt; me through decisions with proposed options&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Remembers&lt;/strong&gt; everything I forget&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I just have to show up and answer questions.&lt;/p&gt;

&lt;hr /&gt;

&lt;h2 id=&quot;getting-started&quot;&gt;Getting Started&lt;/h2&gt;

&lt;p&gt;If you want to build something like this:&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;&lt;strong&gt;Start with one pain point.&lt;/strong&gt; Mine was the morning chaos. What’s yours?&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Write a Claude command for it.&lt;/strong&gt; Just a markdown file explaining the workflow you wish existed.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Add integrations as needed.&lt;/strong&gt; MCP servers, Python scripts, API calls.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Notice what else is annoying.&lt;/strong&gt; Build a command for that too.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Let it compound.&lt;/strong&gt; Each new piece makes the whole system more useful.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The system grows from there. You’re not building an app. You’re programming a chief of staff in English.&lt;/p&gt;

&lt;hr /&gt;

&lt;p&gt;&lt;em&gt;If this resonates and you want to see the actual code/commands, I might open-source the setup. Let me know.&lt;/em&gt;&lt;/p&gt;
</content>
 </entry>
 
 <entry>
   <title>Token: Considered Harmful</title>
   <link href="https://w.laudiacay.cool/2024/05/23/token-considered-harmful.html"/>
   <updated>2024-05-23T00:00:00+00:00</updated>
   <id>https://w.laudiacay.cool/2024/05/23/token-considered-harmful</id>
   <content type="html">&lt;p&gt;&lt;em&gt;You’ll be pleased to know that I left crypto shortly after writing this essay.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Notes from inside the machine.&lt;/p&gt;

&lt;p&gt;I’ll probably do plenty of ragging on projects I’m familiar with. It doesn’t mean they’re exceptionally bad, or that other projects are exceptionally good, just that I am more familiar with how things went down.&lt;/p&gt;
&lt;h2 id=&quot;hot-take-ponzi-schemes-can-be-beautiful&quot;&gt;Hot take: Ponzi schemes can be beautiful&lt;/h2&gt;
&lt;p&gt;Economic bubbles have historically been engines of real progress. The dot-com bust left us with an abundance of fiber optic cable that powered the next decade of internet growth. The Roaring Twenties built infrastructure that served America for generations. Speculative excess, channeled well, can fund things that rational capital allocation never would.&lt;/p&gt;

&lt;p&gt;You have to give proof-of-work and proof-of-stake L1 chains their due credit: the trick startups do, where they hand out equity to the long-suffering early employees, is brilliantly repurposed to direct the long-term reward towards the exact action needed to create and secure the chain. One of the coolest things about our industry is how aggressively we blur the lines of compensable work, although we’ve since proceeded to beat the concept to death several times over.&lt;/p&gt;

&lt;p&gt;Startups have long paid their early employees in equity or options: in exchange for a high performer’s opportunity cost elsewhere, they get a piece of the future value of the venture, which encourages them to fight for their team to win. Startups have also often subsidized early customers’ costs with investor money, hoping that they will build sufficient network effects before the money runs out and stick the landing into profitability. Businesses frequently craft creative compensation packages for executives and salespeople to induce them to achieve certain goals. Bringing game theory in to accomplish something cool isn’t new. Using it to convince selfish humans to collaborate and scale something beyond their wildest dreams is also not new. What is new is how creative we got with it and how we automated it to remove the bureaucracy.&lt;/p&gt;

&lt;p&gt;Smart contracts have built novel ways to bring people together, making markets, investing in startups, running network nodes, attempting to create digital museums, and more. They’re just helpless dead code until they induce humans to execute and interact with them. What animates them is when programmers weave intricate systems of monetary forces into code and data to induce people to interact with the system and accomplish interesting things cooperatively without any mediating entity. Successful examples of this include millions of ASICs executing SHA256 until they find a small enough output, adding transactions into blocks and executing them, liquidating bad debt, connecting hard drives to a network to ostensibly store files, or building an intricate crypto-backed dollar-pegged stablecoin. It’s insane; it feels like magic that you can literally write some text to summon millions of dollars and thousands of people with your computer.&lt;/p&gt;

&lt;p&gt;Of course, legal codes, contracts, and inspiring speeches can also muster and direct outsize resources like this. But I’ve never seen something so fast and automated. A permissionless piece of code where legal contracts, security issuance, citizenship concerns, KYC, and all the other accumulated bureaucratic cruft are thrown out in exchange for “execute this computation, this other number will increase, and you can trade that number for dollars” is &lt;em&gt;cool&lt;/em&gt;. It is limited in that it is sort of difficult to incentivize things that cannot be directly validated by a Turing machine, but oracle technology is good and getting better.&lt;/p&gt;

&lt;p&gt;It’s awesome. You can summon up massive swarms of organized humans and machines with a text file. You can even get them to do interesting and useful things if you’re careful.&lt;/p&gt;
&lt;h2 id=&quot;misfiring-the-money-cannon&quot;&gt;Misfiring the Money Cannon&lt;/h2&gt;
&lt;p&gt;That “careful” bit is quite load-bearing. The first fall from token grace is when people do the classic government trick and incentivize something very similar to, but not exactly, the goal. Lots of people like to quote Stafford Beer in saying that “the purpose of a system is what it does”, and this gets poetically illustrated when the (public) incentives on the (public) protocols are released into the wild, everyone in the project behaves mostly in good faith, everyone outside the project behaves rationally, and they all proceed to accomplish something hilariously orthogonal to the stated goals of the project.&lt;/p&gt;

&lt;p&gt;My favorite examples of this are BitTensor and Filecoin, with their “innovations” of useful proofs of work (UPoW). BitTensor attempts to link a computational proof of work to useful machine learning inferences, while Filecoin attempts to link useful data storage to hard drive consumption as their proof of work. In practice, both of these are not useful. It is very difficult (one might even say “obviously paradoxical”) to build a resource-intensive &lt;a href=&quot;https://en.wikipedia.org/wiki/Potlatch&quot;&gt;potlatch&lt;/a&gt; that the participants make to secure the chain as a credible and costly signal, which also serves as a competitively-priced computational commodity (without taking the “pump” side of the pump-and-dump as permanent). In the end, you waste the content of the potlatch regardless of whether you choose UPoW or PoW, but you add the waste of years of your team’s time trying to bring your unsellable UPoW “resources” to market. Instead of salvaging some waste, you waste inestimably more.&lt;/p&gt;

&lt;p&gt;(Note that we found a very nice solution to the conundrum: the potlatch can choose to sacrifice financial optionality, from which we get the more resource-efficient (although potentially with cyclical risk issues) proof-of-stake.)&lt;/p&gt;

&lt;p&gt;My favorite ill-designed incentive is not alone; the industry has far more failed evolutions than successful ones. Again and again, we see token inflation going to airdrop farming, fake “users” who are farming for tokens instead of using the product (see: crypto games like Axie Infinity), alt-L1 “usage”, ecosystem protocol deployment of unusable “apps”, and the ridiculous economic situations we invariably discover under the promising metrics when DePINs scale.&lt;/p&gt;

&lt;p&gt;It’s not a fun trap to find yourself in: you cast your spell when you deploy your ecosystem, and the value accordingly floods in. Suddenly, every move has sky-high stakes for your protocol and token price. Changing the topology of the financial reality you have now created is a high-friction process, especially if you (like most of web3) are sticking to democratic governance ideals that prevent you from rapidly iterating on your prototypes like a startup ought to be. Unfortunately, it may be hard to break away from this and iterate freely, as your stakeholders have often sunk significant amounts of capital into their interactions with your incentive scheme. You’ve become beholden to them, at least reputationally, if not legally.&lt;/p&gt;

&lt;p&gt;Another common failure is a mismatch of the incentives’ velocity to the project’s goals. Most projects’ incentives use a bit of ponzinomics: rewarding early contributors with outsize token shares to compensate them for taking a risk. This is to induce the network to grow to a size where you start getting nice valuable network effects. Uber did this for years, by softening both sides of the market with venture capital money.&lt;/p&gt;

&lt;p&gt;The most common failure mode here is shooting off all the bonus incentives for bootstrapping before the network reaches a steady state. I believe this is the way that Bitcoin will eventually die. Ethereum seems to be doing better with getting sufficient inbound gas fees to feed the validators, at least as long as there are people attempting to build apps atop it. (But my previous post discusses why this might not last either.)&lt;/p&gt;

&lt;p&gt;In my opinion, judiciously used ponzinomics are fantastic jet fuel. Properly built, they’re the financial energy that converts ideas to reality. Clumsy, naive, or predatory ponzinomics are the problematic side of tokens. Now, we’ll start to discuss the more predatory ones.&lt;/p&gt;
&lt;h2 id=&quot;vesting-and-trying-to-get-in-early&quot;&gt;Vesting and “trying to get in early”&lt;/h2&gt;

&lt;p&gt;A core tension is between the difficulty of the things we’re trying to accomplish and the perverse timelines we’ve set for ourselves. Most things that are sufficiently cool to excite a community to participate in the incentive scheme, and most things with sufficient market to draw venture capital investment, are extremely ambitious. Ambitious things are hard. Ambitious things take time. Building something massive, as a key person or founder, may take a decade of hard work, which is not even reflected in standard four-year startup vesting. Contributing substantially to a product as an employee takes years, which is probably reflected well in standard startup vesting paired with stock refreshes.&lt;/p&gt;

&lt;p&gt;The standard in web3 is &lt;strong&gt;two years from token creation&lt;/strong&gt;. Oh, word, and we’re planning to rebuild the internet?  Extending this vesting schedule probably won’t happen because a vesting schedule is a Mexican standoff between the investor and the founding team, and they both benefit financially by bringing their exit to USD closer to the present. The eventual “community” has nothing to do with it. For this to end up otherwise, both parties would have to make an altruistic and self-sacrificing agreement for the health of the networks they build.&lt;/p&gt;

&lt;p&gt;The problem is complicated by the fact that when the network is live, the token is usually liquid (if it’s even pretending to have a purpose). Suddenly, an organization that probably has the maturity of a seed-stage startup is beholden to the public market. At this point, the product is usually too immature to be doing anything that anyone wants (other than attempting to hit it like a piñata so it drops some tokens to market-sell). The founding team finds itself answering to angry speculators who want the price to go up, but not to users. Becoming deeply useful is years of hard work away.&lt;/p&gt;

&lt;p&gt;The optimal and obvious strategy becomes yelling about techno-optimism while pretending to ship for two years until they can sell their tokens to the remaining rubes and abscond. It’s extra easy to pull off when you’re marketing to naive people who don’t understand your industry, are easily swayed by fake or misleading statistics, and are intoxicated with collective effervescence in a bull run. The lack of securities enforcement on these tokens means that people get away with saying anything they want to manipulate the token price upwards.&lt;/p&gt;

&lt;p&gt;Even for projects that haven’t rationally decided to rug, if they misalign the ponzi-versus-vesting-versus-building schedules, they can end up in situations where the team is rich and unwilling to work but the product isn’t working. Another fun one is where the system not collapsing depends on maintaining the token price and tokenholder interest, which requires a massive amount of energy from the team, and prevents them from ever focusing on building anything useful, which dooms them to an excruciatingly slow death of perpetual useless PR announcements and no users.&lt;/p&gt;

&lt;p&gt;The right way to shape these incentives is to make it so that it’s approximately the same expected value to participate early (with a small chance of hitting a jackpot on your earned token value and little useful product) as it is to participate late (with a useful product and no rewards), unless the early participant is bringing rare and useful information to the market by participating. This means that the team and investors need to be locked in for the appropriate amount of time to iterate, get the thing built, and grow it to where the incentives can take care of themselves.&lt;/p&gt;

&lt;p&gt;Secondly, users’ needs ought to be mostly figured out before non-user token-holders come into market play. You need that as baseline because that’s where you’ll get your sustained steady-state revenue after the ponzi scaling phase. In standard startup parlance, you need to achieve PMF before you dump resources into scaling. However, in crypto, the stakes are significantly higher because you can only launch the token into orbit once. You have one toss at achieving network effects through a good pump, so don’t jump until you have a plan for how you’ll stick the landing.&lt;/p&gt;
&lt;h2 id=&quot;lack-of-securities-regulation-much-time-wasting-trying-to-define-tokens-as-a-not-a-security&quot;&gt;Lack of securities regulation, and much time wasted trying to define tokens as not-a-security&lt;/h2&gt;
&lt;p&gt;In the last section, I alluded to the lack of securities law allowing teams to exploit information asymmetries freely. I’ll discuss that further here.&lt;/p&gt;

&lt;p&gt;Much of the innovation in incentive schemes that I talked about in the first section is possible because of the lack of regulation. Realistically, the closest cousin to what we’ve built is usually some kind of security, whether it’s as simple as a growth or dividend stock, an option, or some exotic financial derivative we’ve historically banned from traditional markets.&lt;/p&gt;

&lt;p&gt;Sometimes, protocols attempt to force users to pay for services in their token, and they make the legal claim that it isn’t a security but a “utility token” more akin to a coupon or airline point, even though all their users treat it like a growth equity share. Sometimes this works out for them (Ethereum, thanks to immense network effects), sometimes it doesn’t, and they fail to force users to pay for their goods in their token (Sia, most alt-L1s).&lt;/p&gt;

&lt;p&gt;Regardless, nitpicking definitions to figure out whether a token is or is not a security is stupid. Securities regulation is meant to prevent people from scamming and defrauding each other when they trade financial instruments. Rigidly clinging to the Howey test is not useful for determining which financial transactions ought to be regulated for the good of society, where the goal is to protect people from exactly the harms that securities law was built to prevent.&lt;/p&gt;

&lt;p&gt;Right now, the SEC seems to be fighting very slowly through the massive fog of misinformation, and our industry is spending half its time and attention attempting to prevent ourselves from being defined as a security so that we don’t all get hit with charges for issuing and selling unlicensed securities. For me, the right answer to clean the industry up is probably amnesty for that charge in particular, retroactively defining tokens as securities, retroactive enforcement for bad-faith securities violations, and hard work writing a lot of policies to start enforcement going forward.&lt;/p&gt;

&lt;h2 id=&quot;things-are-not-well-here&quot;&gt;Things are not well here&lt;/h2&gt;
&lt;p&gt;Under all these perverse incentives and a total lack of healthy regulation to prevent crime, all sorts of ill effects have taken over the industry. To me, it has felt diseased for years now. It’s hard to take hopeful people seriously.&lt;/p&gt;

&lt;p&gt;Vampire attacks have been in vogue for a while: a company barely innovates in anything useful beyond attracting more liquidity, yet succeeds in stealing that liquidity from another project that might have been more technically innovative. Zero-sum liquidity and order flow battles feel like the main frontier of innovation in the industry, with a small minority of projects trying to solve real problems instead of building a financial ouroboros. Innovation has moved from the technology and game theory to the social and memetic layers, which is not where it ought to be when our tech is still unusable and not solving problems.&lt;/p&gt;

&lt;p&gt;People have moved from launching protocols that use a token, to launching protocols with a useless token hastily glued onto the side, to launching a token with a fake protocol attached to lend the techno-optimistic bullshitting an air of legitimacy, to launching naked memecoins with absolutely nothing but the meme attached. Flagrant bullshitting has become completely acceptable, and the speculation doesn’t even pretend to be based on techno-optimism: nobody in the industry believed these goofy AI narratives, but everyone believed that the cavemen Coinbase drags in will buy.&lt;/p&gt;

&lt;p&gt;The people who are left in the industry are (with a few small pockets of exception) either trapped by contract or income, too clueless to understand what’s happening, aware but completely past caring, or sociopaths actively exploiting anyone they can drag into their net. This is not a recipe for bringing together smart-kind-passionate people who accomplish world-historical things with technology. This is how you attract apathetics, losers, mercenaries, and villains. What you accomplish is crime and failure.&lt;/p&gt;

&lt;p&gt;The people who are allocating capital aren’t entirely to blame (although a lot of them are colluding heavily with the worst offenders): their LPs pressure them to return the fund, and there’s not much honest money left to be made in this industry (or many competent founders left to back). Founders can’t always be blamed either, because they’re making rational decisions in a social environment that finds these sorts of schemes allowable. The pressure for all parties to predatorily extract capital now at the expense of long-term value is coming from every possible direction at this point. The incentives have decayed into a nasty situation for everyone involved.&lt;/p&gt;

&lt;p&gt;As a founder, I’ve felt trapped and sick for a long time. Few in web3 want to pay for useful things because they’re all attempting to juice money &lt;strong&gt;&lt;em&gt;out&lt;/em&gt;&lt;/strong&gt; of the industry as fast as they can. If you’re trying to sell to people who aren’t in the cult, you have to make up new jargon to avoid the distasteful (well-earned) associations. In the past year, talent from within the industry has nearly always been worse than talent sourced from outside, which I’ve verified across many interviews. Teams trying to achieve product-market fit by solving real problems in people’s lives instead of playing various predatory zero-sum games are &lt;em&gt;very difficult to find funding for&lt;/em&gt; (we’re a rare exception), although every VC and ecosystem lead is constantly bemoaning the lack of founders building consumer apps to self-immolate on the pyre of a post-TGE ecosystem’s corpse.&lt;/p&gt;
&lt;h3 id=&quot;can-we-clean-it-up&quot;&gt;Can we clean it up?&lt;/h3&gt;

&lt;p&gt;I mean, maybe. We need some regulation and better norms. As I said earlier, there’s a good reason that we have the SEC, restrictions on how you can buy and sell securities, and restrictions on speech related to securities where you inherently have asymmetric information.&lt;/p&gt;

&lt;p&gt;It’s to prevent what we’ve been watching happen, to prevent all of society from exploding out of control with speculation and then rotting in its wake. Thank god this was contained to a tiny corner of the market. It’s been entertaining (albeit bloody) to watch history repeat itself and justify securities law before my once-libertarian eyes.&lt;/p&gt;

&lt;p&gt;As I said earlier, a sketch of the right approach is probably to start enforcing retroactive securities law on tokens, with an exception for offenses related to selling unregistered securities as long as everything else was done mostly in good faith. Work needs to be prioritized immediately to allow useful game-theoretic innovations without legalizing scams. There’s a fine line between “cool useful protocol” and “derivatives so complex that they’re inherently predatory”, and our regulators need to buckle down and do their homework to figure out the difference. I’m not naive; I know it won’t be easy: bureaucrats will have to analyze incomprehensible ponzi programs with millions in circular transaction volume and names reminiscent of arcade games.&lt;/p&gt;

&lt;p&gt;We already have some groundwork to start from: we know what the common scams look like and what financial crimes like manipulating markets look like. We can green-light situations and provide guidance for places where things are fine and investment is derisked: fair valuations for growth and cash-flowing tokens are something that we’ve known how to calculate for ages. It’s a difficult problem because a lot of regulation is protection from exotic derivatives that only your MIT PhD quant can understand (and not your cousin who loves /r/wallstreetbets), but exotic derivative structures are basically our whole thing. Nevertheless, we must start figuring this out if we hope to rescue anything.&lt;/p&gt;

&lt;p&gt;Secondly, we need longer vesting, long enough to build and stabilize the product. It should attempt to be scheduled to last until the end of the ponzi phase (e.g. get to where system inflows match outflows and inflation is aligned with growth, risk, and work done), with a stabilization lockup overhang to let the steady-state sit. This would be similar to waiting for the startup to have mostly saturated its target markets and be executing a robust financial model before an IPO.&lt;/p&gt;

&lt;p&gt;We also need better standards about when projects should take on the responsibility of answering to a population of tokenholders at TGE, which should be after sufficient attention has been given to serving the needs of the people who will be paying to obtain value from the protocol.&lt;/p&gt;

&lt;p&gt;The other question is, once we’ve cleaned up the piranhas, what remains to be salvaged? Very little technology is ready to solve problems for users. Much of our innovation has been into scams, infrastructure that enables scams, and infrastructure that helps people win zero-sum games. Some areas that I think have immediate hope include:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;CBDCs and distributed enterprise ledgers,&lt;/li&gt;
  &lt;li&gt;stablecoins,&lt;/li&gt;
  &lt;li&gt;socialFi and regulated gambling,&lt;/li&gt;
  &lt;li&gt;privacy tech/fundamental cryptography like ZK,&lt;/li&gt;
  &lt;li&gt;and fundamental p2p/local-first/distributed systems research.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Unfortunately, we’re used to massively inflated valuations, because both founders and VCs are accustomed to exiting their investments in a flaming pump-and-dump. There will be carnage for venture funds if these practices end; they’ll have to significantly mark down almost all their investments. There may not be a lot of liquidity to deploy into whatever’s left, which is a problem, because there’s still a significant amount of work left before any of these products could make money.&lt;/p&gt;
&lt;h2 id=&quot;conclusion&quot;&gt;conclusion&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;Novel incentives in code (as smart contract or distributed system) are extremely cool, and you can do really cool things with them.&lt;/li&gt;
  &lt;li&gt;People aren’t always very skilled at designing these, and when they mess up, it can go catastrophically wrong for all sorts of sad and unavoidable reasons.&lt;/li&gt;
  &lt;li&gt;There are major unsolved incentive alignment problems between the teams building the protocols, the tokenholders, and the health of the useful protocol itself. These are probably surmountable, but there are perverse incentives against fixing them.&lt;/li&gt;
  &lt;li&gt;Lack of securities regulation, and the entire industry fighting getting regulated, has led to this technology developing into a hotbed of crime and scams. We have built financial products that nearly always ought to be regulated as a security for the good of society, but instead of attempting to work with regulators, we’ve mostly acted like children (and are now receiving our just Wells notices).&lt;/li&gt;
  &lt;li&gt;We are seeing increasingly short-sighted, extractive, and wasteful behavior across the industry. Meanwhile, the industry rots from the inside out as everyone except the bottom of the barrel flees. However, individuals aren’t entirely to blame for this sad situation, because perverse and powerful incentives are coming from all sides at this point.&lt;/li&gt;
  &lt;li&gt;I believe a quick and brutal cleanup is both possible and desirable: the SEC should retroactively consider almost all tokens securities by default, grant amnesty to projects that sold unregistered securities in the past and ongoing until better regulation has been developed, and start aggressively prosecuting the most egregious and predatory violators from the past few years (as in, &lt;em&gt;probably not&lt;/em&gt; Uniswap and Coinbase, certainly not to an existential level of judicial attack)&lt;/li&gt;
  &lt;li&gt;Longer-term (but starting immediately), the SEC ought to rapidly develop some guidelines against the worst scams (inspired by the obvious parallels in history and securities law) and start a good-faith collaboration on figuring out how to constructively regulate complex financial incentive schemes over distributed systems that are marketed to non-accredited investors.&lt;/li&gt;
  &lt;li&gt;This will be a bloodbath for crypto VC funds and many founders. What’s left may feel small and sad, and venture interest may dry up for a while. That’s okay because we’ll be left with the core of what’s true and valuable. If the technology is truly useful, we’ll eventually get back on our feet, this time stronger without parasites.&lt;/li&gt;
&lt;/ul&gt;
</content>
 </entry>
 
 <entry>
   <title>Will we ever get PMF as a better money?</title>
   <link href="https://w.laudiacay.cool/2024/05/22/are-we-building-money.html"/>
   <updated>2024-05-22T00:00:00+00:00</updated>
   <id>https://w.laudiacay.cool/2024/05/22/are-we-building-money</id>
   <content type="html">&lt;p&gt;This post will discuss the possibilities in the market for blockchain-based money and financial products outside of speculation. I will argue that the situation isn’t great.&lt;/p&gt;
&lt;h3 id=&quot;the-product-market-fit-of-money-is-the-stability-and-ease-of-transacting&quot;&gt;The product-market-fit of money is the stability and ease of transacting&lt;/h3&gt;

&lt;p&gt;The thing about money itself is that you don’t want it in itself, you want it because you think someone else will accept it in the future, in exchange for something that you do want. Good money is a bubble that doesn’t pop, because the forces buoying its value upwards (e.g., collective dreams like trust in its future value and exchangeability) are stronger than the forces dragging it back to its wretched little reality as a slip of paper, a lump of shiny metal, an entry in a database, or a u256 in the Ethereum state.&lt;/p&gt;

&lt;p&gt;The US dollar is historically a somewhat stable money. The powerful US government backs it, and although their level of fiduciary responsibility to minority dollar holders has been under question lately as they massively dilute the share pool, we have spent the past 10 years feeling largely comfortable that we’ll be able to buy yogurt for a price within an order of magnitude of where it sells today.&lt;/p&gt;
&lt;h3 id=&quot;dissecting-the-fiat-moat&quot;&gt;Dissecting the Fiat Moat&lt;/h3&gt;

&lt;p&gt;A huge number of “micro-forces” are constantly re-centering the dollar’s value. Every time an exchange takes place, both parties nod at the value of the dollar in terms of its purchasing power for concrete goods with use-value. Every second, the dollar is hammered into human minds as having the possibility to rapidly turn into a concrete parcel of yogurt, gas, financial risk, stock, weapons, drugs, animals, land, sex, medicine, or movie tickets.&lt;/p&gt;

&lt;p&gt;Uncertainty about the value of the dollar is (most of the time) squishy and theoretical for the average person, measured in “9% inflation over a year”- a tenth of a dollar over a full year of paychecks. You can make arguments about the dollar (and &lt;a href=&quot;https://twitter.com/mikojava&quot;&gt;Miko&lt;/a&gt; did) having devalued 93% in a century, but the frog seems pretty happy in its rapidly warming pot, for all Coinbase’s beautifully constructed propaganda. Read “When Money Dies” for a beautifully documented historical example of what goes down during hyperinflation in a developed society very similar to our own- the details shed a lot of light on how you can expect humans to behave.&lt;/p&gt;

&lt;p&gt;People express feeling short USD annually in elections, complaining online about the government, or (for hedge funds and doomsday preppers) hedging. Almost nobody shorts USD by churning away from using USD. The threat of catastrophic hyperinflation is a distant black swan that we all mostly ignore, which is necessary for society to continue functioning.&lt;/p&gt;

&lt;p&gt;Certainty about the dollar is thrown in our faces daily, and is measured in being able to obtain Chipotle with (lately a somewhat larger amount of) Apple Pay US Dollars. People express their certainty about USD daily by continuing to participate in the dollar economy instead of defecting to euros, yuan, gold, or canned goods, and they constantly witness others expressing these signs of certainty and stability as they participate in the market.&lt;/p&gt;

&lt;p&gt;This leads into the other side of fiat’s moat: Your bartender and your grocer and your barber and your landlord and your tax-man all take USD. You will carry a method of giving them USD until almost all of them no longer accept USD. To start inconveniencing yourself to carry Money2, you’ll need to expect to be forced to pay in Money2 when you’re doing errands, which statistically means a sizable chunk of places will need to accept exclusively Money2. But &lt;em&gt;they&lt;/em&gt; won’t accept it until &lt;em&gt;everyone else does&lt;/em&gt;, either.&lt;/p&gt;
&lt;h3 id=&quot;money-is-extremely-hard-to-dislodge&quot;&gt;Money is extremely hard to dislodge&lt;/h3&gt;

&lt;p&gt;Between these two factors, we see a very powerful self-reinforcing emergent phenomenon that coexists well with the human tendencies to discount theoretical future risks, prioritize immediate needs, prioritize convenience, and go along with group opinions. Money evolved with our society, and we need it as much as it needs us, but the relationship is mostly very comfortable and stable.&lt;/p&gt;

&lt;p&gt;Some users get an extremely bad UX with the fiat money in itself, like people in countries with hyperinflation or cash shortages, but this is historically quite rare. A serious inflation event that dislodges the day-to-day trust in money, combined with the good UX of stability on an alternative solution, could theoretically pull people away from fiat money. People are not likely to undertake the extremely costly move of switching money systems without a societal catastrophe, and the government is not likely to facilitate it. Additionally, they’re much more likely to move to a different fiat money system that excitedly welcomes them into a usable and well-developed infrastructure.&lt;/p&gt;
&lt;h3 id=&quot;can-crypto-expect-to-dislodge-users&quot;&gt;Can crypto expect to dislodge users?&lt;/h3&gt;
&lt;p&gt;No crypto (except pegged assets) has the requisite stability or widespread usage to tick either of these boxes. The forces weighing on the prices of Bitcoin and Ethereum (the most stably valuable non-pegged assets) are:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;the price of electricity or risk-free rates (for proof of work and proof of stake) (small effect),&lt;/li&gt;
  &lt;li&gt;the day-to-day noise of retail speculation (small effect),&lt;/li&gt;
  &lt;li&gt;the demand for blockspace based on user volume in various apps (none with a durable user base) (small effect),&lt;/li&gt;
  &lt;li&gt;and the macro trends of ETFs and other institutional capital inflows, government regulation, risk-off behavior during low interest rates, which drive large momentum from follow-on retail and institutional speculation. (in combination, an extremely large effect)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This makes Bitcoin and Ethereum unsuitable candidates for being money in themselves. There isn’t any quick go-to-market that I see to get them to that point- even if crypto gets traction, people will defect to using stable pegged assets atop them, because of the volatile forces above. The stability of fiat comes from the network effects, but is also a precursor to achieving product market fit and growth. Extremely problematic.&lt;/p&gt;

&lt;p&gt;An economic catastrophe like dollar hyperinflation is not bullish for crypto, except via the secondary effects of institutional buyers who believe the thesis that fiat collapse means crypto success. Economic collapse is likely to push people to join a currency zone with existing stability, infrastructure, and network effects. As I alluded to earlier, the owner of the incoming currency zone will shovel resources into promoting this shift, because it cements their soft global power over the collapsing economy. We’ve seen this historically over and over again in cases of colonization, predatory “development” investing, hyperinflation, and governmental collapses.&lt;/p&gt;

&lt;p&gt;Unpegged cryptos without nation-state support lack the power, stability, and usable infrastructure to properly take advantage of these situations.&lt;/p&gt;

&lt;p&gt;As I see it, Bitcoin and Ethereum may be investable as short-term macro or regulatory bets, medium-term bets on their ecosystems’ product-market fit, or long-term (in my opinion clumsy) fiat hedges, but they won’t be finding product-market fit as money.&lt;/p&gt;
&lt;h3 id=&quot;are-gas-fees-bullish-theoretically-yes-but-practically&quot;&gt;Are gas fees bullish? Theoretically, yes, but practically…&lt;/h3&gt;
&lt;p&gt;Gas fees could stabilize the price of Ethereum or Bitcoin if demand for network compute became the predominant factor in the price. However, this could only happen by getting stable, scalable product-market fit for apps built atop them. The only app that really &lt;em&gt;can&lt;/em&gt; be built atop Bitcoin is Bitcoin, which creates a bit of a chicken-and-egg situation, because Bitcoin is not good money. The only apps atop public blockchains like Tron and Ethereum that seem to have strong and durable product-market fit are Farcaster (durable crypto social), ponzi-scheme timing games and other popular casinos (Uniswap, DeFi, pump.fun, OpenSea), and stablecoins/RWA (proxies for non-digital value, intended for avoiding transaction UX problems).&lt;/p&gt;

&lt;p&gt;I have more theoretical arguments about why other apps don’t need blockchains, but the fact that these are the only ones that have non-farming stabilized demand after fifteen years of web3 and billions of dollars of capital investment is a sufficient argument, in my opinion.&lt;/p&gt;

&lt;p&gt;The first two are easy to strike out as non-durable/scalable and unlikely to make a public blockchain’s gas token stable enough to be used as “world money”. Farcaster only has product-market fit because people believe in crypto (or, cynically, want to land on the right side of a ponzi)- another chicken-and-egg that can’t necessarily scale to become a massive emergent phenomenon like money. Our gambling is a prime target for regulation, especially because our casinos are more degenerate and predatory than any house in Vegas, and we are virtually a news factory for ugly high-profile fraud cases. Stablecoins are… a bit more promising.&lt;/p&gt;
&lt;h3 id=&quot;stablecoins-and-rwas&quot;&gt;Stablecoins and RWAs&lt;/h3&gt;
&lt;p&gt;Pegged assets’ suitability to be “proxy fiat” obviously depends on the quality of the peg, as we’ve seen throughout crypto.&lt;/p&gt;

&lt;p&gt;The highest-quality fiat pegs hold treasury bonds in a high-quality DAO or regulated and audited entity, and track balances on-chain in an ERC20. This makes them more or less a standard bank, except with their database on a blockchain. This also opens them up to regulation, because the most trustworthy design for an on-chain asset involves a centralized entity that can be laser-targeted by regulators. Obviously, this is a prime target for regulation and regulatory capture. Therefore, I am not convinced that the blue-chip stablecoins’ product-market fit atop existing public blockchains is scalable: even at our current small level of traction, Circle supports sanctions for USDC as a first-class feature. Clearly, the government could twist their arm to do much more.&lt;/p&gt;

&lt;p&gt;The highest-quality RWA pegs are in the same situation. It is very rare to have a purely digital object with real and non-speculative stable value: data brokerage is harebrained for reasons I’ll eventually write about, and decentralized cloud as a digital commodity is a shitshow for reasons I’ll also eventually write about. Most “digital” products are a title of a physical object, which is secretly just a piece of paper that lets you point the government’s monopoly on violence at whoever claims they own your thing. This means the guarantor of the title (DAO, company with a smart contract, whoever) needs to be able to back up their claims to be able to enforce the state of the database in the real world. They either need to be able to (and trusted to) protect the resources themselves, or the government needs to recognize that this company and its smart contract is reflecting real enforceable property rights. These are, once again, an obvious target for regulation and regulatory capture, and the only reason we don’t see pushier enforcement is that none of them have sizable traction.&lt;/p&gt;

&lt;p&gt;“Well, fine, the RWAs and stablecoins can just coexist with regulation and exist happily in crypto, so my Ethereum bag is fine after all!” Actually, no. The regulatory burdens will only increase with time and traction, because we are messing with the financial system, which is a cherished US government muscle for soft global power.&lt;/p&gt;

&lt;p&gt;An entity or consortium that can regulatory-capture stablecoins, RWAs, and CBDCs will have no reason to pay a gas royalty to the Ethereum network or even use a non-proof-of-authority blockchain. This entity or consortium is likely to be someone with finance, government, and NGO connections, who defects from the “web3 community” to build their own thing to the precise specification of the powers that be. In fact, this startup probably already exists, is working their network, and will bring a product to market by 2030. Meanwhile, a project that is successfully dodging regulators and executing on the anarcho-capitalist dream will never be allowed to scale beyond pockets of criminals and the dispossessed.&lt;/p&gt;
&lt;h3 id=&quot;our-win-wont-be-money-it-is-banking-for-niche-markets&quot;&gt;Our win won’t be money. It is banking for niche markets.&lt;/h3&gt;
&lt;p&gt;I won’t comment further on the competitive landscape of CBDCs and regulated RWAs because I’m not a regulatory expert, but I will say a few words about the “anarcho-capitalist dream” market.&lt;/p&gt;

&lt;p&gt;Cash and checks had a shitty UX, so we now use digital US dollars administered by tightly-regulated banks who occasionally gently fuck us with overdraft fees, hazing about attempted withdrawals and wire transfers, sanction enforcement, and account closures. These bad things aren’t bad enough for most users to churn, so we all mostly use Apple Pay and Chase. There are a few exceptions to this rule, and those are what’s capturable.&lt;/p&gt;

&lt;p&gt;Some users have an extremely bad UX with banks and money in ways that haven’t been solved yet. This includes people in industries that the banks feel are too risky to serve (drug dealers, organized crime, sex workers, crypto buyers, marijuana industry, startups, sanction evaders), people who are attempting to wire money overseas between shitty or incompatible banking systems, and people who exist in states with unstable, inflated, cash-insufficient, or otherwise degraded money systems. The overwhelming majority of transaction volume is not here, but we can serve these people.&lt;/p&gt;

&lt;p&gt;In fact, we do, and that’s probably most of the non-speculative non-farming volume across the entire industry. There is certainly more market to capture, and more little niches within these segments to find and serve. Tron and USDT, people who slip through the gaps with USDC, privacy coins and swaps, and Venmo-like transfer apps are all finding strong product-market fit when they successfully go to market with people who are underserved by the banking industry.&lt;/p&gt;
&lt;h3 id=&quot;conclusion&quot;&gt;Conclusion&lt;/h3&gt;

&lt;p&gt;All this is to say that the FDV of non-pegged public-blockchain L1 tokens should not ever be expected to grow to the market size of “money”. If you remove the enormous speculative markups, the correct valuation as a long-term asset is probably the expected distributed computation and consensus overhead costs to process the transaction volume of underserved people who churn from fiat due to UX issues with banks, regulation, or the money itself.&lt;/p&gt;

&lt;p&gt;Arguing for expanded market caps on most of these assets is nonsensical. Maybe the word for it isn’t “fraud”, but it’s in the neighborhood. I don’t think there’s a navigable path to market capture, because the lack of stability and lack of network effects present a chicken-and-egg problem without the power of a nation-state to catalyze and enforce the currency on the market.&lt;/p&gt;

&lt;p&gt;Technological issues are entirely orthogonal to the real problem.&lt;/p&gt;

&lt;p&gt;Stablecoins and RWAs could print money, but most non-speculative value lies off-chain, and only the really good pegs are trustworthy. “Really good” means holding treasury bonds and legacy signifiers of trust and compliance. The ones that can capture sizable chunks of consumer and intra-financial-institution volume will not be built on public blockchains, and will be almost completely regulatorily captured, because the powers that be are both powerful and quite defensive of their monetary power.&lt;/p&gt;

&lt;p&gt;I think we have found one durable PMF as peer-to-peer money for more organized and intelligent criminals. Tron/USDT is on a winning path to being a regulatory-loophole cross-border money for people and nations who are financially dispossessed.&lt;/p&gt;

&lt;p&gt;We also have strong PMF for getting large sums of capital out of restrictive markets (China) in a legitimate-seeming way, although that’s not happening through simple token transfers, it’s happening via venture capital deals, which is a story for another time.&lt;/p&gt;

&lt;p&gt;I find the equity of CBDC and RWA startups with extremely strong (interpersonal and professional) networks in government and finance highly investable.&lt;/p&gt;

&lt;p&gt;I also might make a bet on the tokens of projects with a plausible case for strong consumer go-to-market (do not underestimate how difficult this is…) among a financially dispossessed niche, with a team that’s properly offshored and trained for jiu-jitsu with both nation-states and organized crime.&lt;/p&gt;

&lt;p&gt;If I could bet on the market cap of USDT, I would. I am not comfortable with buying TRON token as a bet on their network size, for obvious Justin Sun reasons.&lt;/p&gt;

&lt;p&gt;I am not bullish about much else around here “becoming money” or “eating all land deeds”.&lt;/p&gt;

&lt;p&gt;Not financial advice. Ever. Why would you take financial advice from me. Please refrain.&lt;/p&gt;
</content>
 </entry>
 
 <entry>
   <title>Tutorial Introduction to MDL</title>
   <link href="https://w.laudiacay.cool/2024/05/11/Tutorial-Introduction-to-MDL.html"/>
   <updated>2024-05-11T00:00:00+00:00</updated>
   <id>https://w.laudiacay.cool/2024/05/11/Tutorial-Introduction-to-MDL</id>
   <content type="html">&lt;h3 id=&quot;source-post-here&quot;&gt;&lt;a href=&quot;https://arxiv.org/pdf/math/0406077&quot;&gt;Source post here.&lt;/a&gt;&lt;/h3&gt;

&lt;h2 id=&quot;under-construction-as-of-may-13-2024-please-excuse-the-awful-formatting-im-working-in-markdown-and-not-compiling-to-html&quot;&gt;UNDER CONSTRUCTION! as of may 13 2024. please excuse the awful formatting, I’m working in markdown and not compiling to HTML&lt;/h2&gt;

&lt;p&gt;What a great start, this is eighty pages long. It’s remarkably friendly- if you’re using my notes to skip the papers, maybe just go read this one and refer back to my notes when you’re having trouble parsing something. (or want some fun Claudia insights about applications to broader ML/my one true love cryptography)&lt;/p&gt;
&lt;h4 id=&quot;background&quot;&gt;Background&lt;/h4&gt;
&lt;p&gt;The start of the paper: the MDL (Minimum Description Length) is a method for inductive inference. What’s inductive inference, you ask? It’s the task of finding a general model from a finite set of training data. You want to fit the observed data well, but also generalize to data from the same distribution that you haven’t seen before.&lt;/p&gt;

&lt;p&gt;If you’ve heard the terms “underfitting” and “overfitting”, those are describing not fitting the observed data well enough (e.g. not learning enough about it to predict future observations well), and fitting the observed data too well (e.g. seeing patterns that aren’t there and messing up accordingly on future observations).&lt;/p&gt;

&lt;p&gt;MDL more or less says that any pattern you find in the data can be used to compress the data. The more you’ve learned about the data, the more you’re able to compress it, and vice versa. You want to minimize both the length of the description of your model, and the length of the observations given the description of the model, and then you’ll be able to predict the future (sort of).&lt;/p&gt;

&lt;p&gt;MDL has some extremely nice properties:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;it looks like Occam’s razor,&lt;/li&gt;
  &lt;li&gt;it naturally protects against overfitting, unlike maximum likelihood estimators,&lt;/li&gt;
  &lt;li&gt;it has a Bayesian-ish vibe, but avoids the weird interpretation issues when we know there’s not REALLY a ground-truth distribution to learn,&lt;/li&gt;
  &lt;li&gt;it doesn’t make any assumptions about whether there is some “underlying truth” (great, nobody who studies language today is a hardcore Platonist),&lt;/li&gt;
  &lt;li&gt;and data compression and prediction are formally equivalent for a certain definition of prediction; MDL leverages this in order to predict.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&quot;kolmogorov-complexity-for-compression&quot;&gt;Kolmogorov Complexity for Compression&lt;/h3&gt;

&lt;p&gt;Let’s start: imagine some sequences of bits, like “101010” (a pattern) and “111011111011110101” (far more ones than zeros but otherwise random) and a completely random string. The ones with regularities you can spot are compressible, with an O-notation bound on how compressible each one is.&lt;/p&gt;

&lt;p&gt;The next step is how we should go about compressing these strings, and we go with Kolmogorov Complexity. If you can write a computer program that is able to write the sequence out, and the computer program is shorter than the length of the sequence, you have successfully compressed it. Kolmogorov Complexity is defined as the length of the shortest computer program that prints out the sequence in question. This sounds really arbitrary and up to choice of programming language, but it turns out &lt;a href=&quot;https://dl.acm.org/doi/pdf/10.1145/321495.321506&quot;&gt;there’s a proof&lt;/a&gt; that asymptotically all programming languages are only a constant factor apart.&lt;/p&gt;
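&lt;p&gt;Since Kolmogorov complexity itself is uncomputable, any real compressor only ever gives you an upper bound on it- but that’s enough to see the idea. Here’s a tiny Python sketch (mine, not the paper’s) using zlib as a stand-in description method for the three example strings above:&lt;/p&gt;

```python
import random
import zlib

def description_length(s: str) -> int:
    # Bytes of a zlib encoding of the string: a computable upper bound
    # standing in for its (uncomputable) Kolmogorov complexity.
    return len(zlib.compress(s.encode(), 9))

random.seed(0)
patterned = "10" * 500                                       # "101010..."
biased = "".join("1" if random.random() < 0.8 else "0"
                 for _ in range(1000))                       # mostly ones, otherwise random
noise = "".join(random.choice("01") for _ in range(1000))    # no visible pattern

# The periodic string collapses to a handful of bytes; the biased string
# compresses less; the fair-coin string compresses least of all.
lengths = {"patterned": description_length(patterned),
           "biased": description_length(biased),
           "noise": description_length(noise)}
```

&lt;p&gt;Regularity found is bits saved: the more structure the compressor can exploit, the shorter the description, exactly the trade MDL is built on.&lt;/p&gt;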

&lt;p&gt;Backing up from this definition of Kolmogorov Complexity, you can get an “idealized MDL” for an “ultimate model of the data”, which is exactly just this shortest computer program… Unfortunately, this is both practically uncomputable (there’s a proof saying as much), and is dependent on the syntax of the programming language you choose. So, in reality, you take “practical” approaches to finding an MDL predictor, by using description methods that know things about the specific problem domain, generally doing the best you can in the situation.&lt;/p&gt;
&lt;h3 id=&quot;practical-mdl-starting-with-crude-mdl&quot;&gt;Practical MDL, starting with “Crude MDL”&lt;/h3&gt;

&lt;p&gt;Grounding: We start with the simple example of picking a good polynomial estimation from a set of points without over or under fitting. Then we define some terms: “hypothesis” as a single probability distribution (in this case, a single polynomial), and “model” is a family of probability distributions with the same form (second-degree polynomials). Note how this maps onto our intuition from machine learning: a model is how you wire up the layers in Tensorflow and set up training, a hypothesis is a frozen-in-time partly-or-completely-trained model that you might test on a holdout set.&lt;/p&gt;

&lt;p&gt;Next- we’ve finally arrived at some math! A crude definition of MDL. We define the “best model” in a simplified setting where we have a list of candidate models ($H_1$, $H_2$, $H_3$) containing hypotheses… imagine each one is “the set of all polynomials of degree N”. And we have a bunch of data we’re trying to explain, $D$.&lt;/p&gt;
&lt;ol&gt;
  &lt;li&gt;$L(H)$ is the length in bits of the description of the hypothesis.&lt;/li&gt;
  &lt;li&gt;$L(D|H)$, the length of the data given $H$, is the length in bits of the description of the data when you use the hypothesis to explain the data. This will be minimized with a well-chosen $H$, so you only have to encode discrepancies between $D$ and $H$ as an error term. More about this in a second.&lt;/li&gt;
  &lt;li&gt;The best model is the one that minimizes $L(H) + L(D|H)$- fitting $D$ well without packing so much information into $H$ that it explodes in size.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Now let’s make things more concrete. We’ll start with $L(D|H)$: Assume $Y=H(X) + Z$, where $Z$ is a noise term. To encode these errors, they’re using something called the Shannon-Fano code, which is not defined concretely but is proven to exist for all data sequences, and has length $L(D|H) = -\log P(D|H)$. This looks sort of like Huffman coding to me, but I think it’s maybe not actually defined over all probability distributions over the rational polynomials. But they claim there’s a proof of existence in section 2.2, so I can’t wait to check it out.&lt;/p&gt;
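&lt;p&gt;Here’s a toy numerical sketch of the two-part recipe on the polynomial example (my own illustration- charging each real parameter $(1/2)\log_2 n$ bits and pricing Gaussian residuals at $(n/2)\log_2(\text{RSS}/n)$ bits are standard asymptotic stand-ins, not the paper’s exact codes):&lt;/p&gt;

```python
import numpy as np

rng = np.random.default_rng(0)
n = 60
x = np.linspace(-1, 1, n)
y = 1.0 - 2.0 * x + 3.0 * x**2 + rng.normal(0.0, 0.3, n)  # true degree is 2

def two_part_codelength(deg: int) -> float:
    # L(H): describe the hypothesis. Asymptotic price of (1/2) log2(n)
    # bits per real parameter (an assumption of this sketch).
    k = deg + 1
    coeffs = np.polyfit(x, y, deg)
    rss = float(np.sum((np.polyval(coeffs, x) - y) ** 2))
    L_H = 0.5 * k * np.log2(n)
    # L(D|H): Shannon-Fano length of the Gaussian residuals, up to an
    # additive constant that is the same for every degree.
    L_D_given_H = 0.5 * n * np.log2(rss / n)
    return L_H + L_D_given_H

best_degree = min(range(1, 10), key=two_part_codelength)
```

&lt;p&gt;Degree 1 underfits (huge error term), degree 9 overfits (the hypothesis term keeps growing while the error term barely shrinks); the minimum of the sum lands near the true degree.&lt;/p&gt;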

&lt;p&gt;Defining $L(H)$ gets a bit harder: How do you code hypotheses? Your choice of code affects the outcome of the procedure, because the same hypothesis can vary wildly in length based on code choice. We need to refine MDL a bit to make this make sense…&lt;/p&gt;

&lt;h3 id=&quot;refined-mdl&quot;&gt;Refined MDL&lt;/h3&gt;
&lt;p&gt;The first thing is to smush the encoding into one part: you encode $D$ with respect to the entire model, instead of given one hypothesis. You design the code so that when there’s a member of the model class that fits the data well ($L(D|H)$ small), you get that $\bar{L}(D|\text{model})$ is also small. This $\bar{L}(D|\text{model})$ is called the “stochastic complexity” of the data, given the model.&lt;/p&gt;

&lt;p&gt;Next, we add something called the “parametric complexity” of the model, denoted $\textbf{COMP}(\text{model})$, which is a measure of how rich the model is, e.g. the “geometrical structure and degrees of freedom”, which also indicates how well it fits random data.&lt;/p&gt;

&lt;p&gt;There’s a relation between parametric complexity and stochastic complexity- let $\hat{H}$ be the distribution in the model that maximizes the probability of $D$, and therefore minimizes the code length of the data.
\(\bar{L}(D|\text{model})= L(D|\hat{H}) + \textbf{COMP}(\text{model})\)
Note that this is the same as one of the early attempts at calculating “crude MDL” mentioned in the paper, where the researchers were choosing hypothesis codings that would minimize this term… Except this one cleverly elides the hypothesis coding problem entirely, so there’s no arbitrariness, and you get something concrete that you can compute.&lt;/p&gt;
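&lt;p&gt;To make $\textbf{COMP}(\text{model})$ concrete, here’s a small computation (my example, not excerpted from the paper): for the Bernoulli model on $n$-bit sequences, the parametric complexity is the log of the summed maximum-likelihood probabilities over every possible data sequence- the normalizer that turns “best-fit probability” into a single valid code for the data:&lt;/p&gt;

```python
from math import comb, log2

def bernoulli_comp(n: int) -> float:
    # COMP(model) for the Bernoulli model on n flips: log2 of the sum,
    # over every possible sequence, of the probability that sequence
    # gets under its own maximum-likelihood parameter. Sequences with
    # the same number of ones k share the ML parameter k/n.
    total = 0.0
    for k in range(n + 1):
        p = k / n
        ml = (p ** k) * ((1 - p) ** (n - k))  # 0**0 == 1 handles the ends
        total += comb(n, k) * ml
    return log2(total)
```

&lt;p&gt;It grows roughly like $(1/2)\log_2 n$- the richer the model (more data sizes it can fit well), the bigger the complexity penalty, which is exactly the “fits random data well” intuition above.&lt;/p&gt;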

&lt;p&gt;The paper goes on to list four interpretations of this definition:&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;Counting states: the parametric complexity of a model is the logarithm of the number of distinguishable hypotheses within it&lt;/li&gt;
  &lt;li&gt;Two part coding: we’re reducing back to the crude MDL definition, e.g., stochastic complexity is a two part code’s length, where you break the model into “maximally distinguishable hypotheses” and then just assign them all the same codelength ($\textbf{COMP}(\text{model})$)&lt;/li&gt;
  &lt;li&gt;Bayesian: a non-informative prior aims to minimize the bias introduced by prior assumptions onto the Bayesian inference procedure as you learn over data. This procedure is more or less doing the same thing- it’s picking the model based on minimizing the code length of a best-fit hypothesis for random data.&lt;/li&gt;
  &lt;li&gt;“Prequential interpretation”: you’re selecting the model with the best performance on unseen test data. This means that you’re encoding everything you know about the problem and nothing else, and leaving the model training to handle your hypothesis test: e.g., optimal breakdown of the problem!&lt;/li&gt;
&lt;/ol&gt;

&lt;h3 id=&quot;rissanens-philosophy-about-mdl&quot;&gt;Rissanen’s philosophy about MDL&lt;/h3&gt;

&lt;p&gt;Rissanen invented MDL.&lt;/p&gt;

&lt;p&gt;His thoughts about it: you don’t want to assume the observed data was generated by a distribution and try to fit. Instead, you want to start with almost nothing assumed about the structure of the data, and you just want to wring as much “regularity” out of the data as possible in order to learn about it (and therefore compress it).&lt;/p&gt;

&lt;p&gt;Second, he interprets models as languages for describing the useful properties of data, and hypotheses are to be interpreted as metrics and statistics about the data seen, that summarize certain regularities. They’re meaningful regardless of whether the hypothesis is the “state of nature”, which honestly… if you look at human language… would be a pretty silly thing to talk about. Additionally, noise is not defined relative to some theoretical probability distribution, but as relative to the model once you’ve found the hypothesis and observed the data- e.g., it’s the remainder after you’ve pulled out as much regularity as you can using the current model! It’s just a measure of how good the model can suit the data in question.&lt;/p&gt;

&lt;p&gt;Third: do not use methods of inductive inference that assume there’s a “true state of nature” you’re attempting to approximate. These methods (Markov models for language generation, anyone?) have incorrect assumptions that will bound their “fit” to reality. They can still be useful, but they’re not learning from the data alone, they’re learning from the data while hobbled by incorrect assumptions.&lt;/p&gt;

&lt;p&gt;Fourth:&lt;/p&gt;
</content>
 </entry>
 
 <entry>
   <title>Understanding LSTM Networks</title>
   <link href="https://w.laudiacay.cool/2024/05/09/Understanding-LSTM-Networks.html"/>
   <updated>2024-05-09T00:00:00+00:00</updated>
   <id>https://w.laudiacay.cool/2024/05/09/Understanding-LSTM-Networks</id>
   <content type="html">&lt;h3 id=&quot;source-post-here&quot;&gt;&lt;a href=&quot;https://colah.github.io/posts/2015-08-Understanding-LSTMs/&quot;&gt;Source post here.&lt;/a&gt;&lt;/h3&gt;

&lt;p&gt;This post is a kind and simple introduction to RNNs (Recurrent Neural Networks) and the magical LSTM (a special kind of RNN, a Long Short-Term Memory network). RNNs give the neural net continuity of thought: they build up a context from intermediate states as they observe them in order, outputting information and editing the internal context based on each state they see, to allow contextual processing that streams forward in time. An RNN works by forwarding context from one iteration to the next as it steps through the inputs, folding each one into the context and emitting an output.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/images/rnn.png&quot; alt=&quot;rnn&quot; /&gt;&lt;/p&gt;

&lt;h3 id=&quot;rnns-have-to-do-a-hard-task&quot;&gt;RNNs have to do a hard task.&lt;/h3&gt;

&lt;p&gt;Hopefully, this allows the net to do things like answer the query: “The dog ate my spaghetti. Who ate my spaghetti?”. By the time the RNN has finished parsing the entire query, it needs to have encoded into its context both that we’re looking for the naughty spaghetti eater, and that that was an (adorable) dog. The dog was seven whole tokens, seven whole iterations of net, before we get to the end. So obviously, this needs to be safely stored somewhere until it’s question-answering time.&lt;/p&gt;

&lt;p&gt;In a traditional RNN, that was usually just one tanh layer. And did that tanh layer ever have a ton of jobs to do! To simplify, let’s assume that it’s just trained on a lot of English prompts where there are one or two simple statements and then a simple question with the answer in the first two statements.&lt;/p&gt;

&lt;p&gt;To perform on this task, it has to encode English grammar and parts of speech, from token embeddings. It has to learn that the answers to questions are likely to be “interesting”, so that it chooses well what to remember, because it is compressing and combining its inputs into its internal state as it runs. It has to learn a simple model of relationships between objects in the world so that it doesn’t think that the spaghetti ate itself! And it has to remember both dog and spaghetti and their relationship until output time, correctly disambiguate between them once it gets to the end of the question, and know when to forget irrelevant information.&lt;/p&gt;

&lt;p&gt;None of these are easy tasks, and it’s probably a lot to train into a single tanh layer: learning all of these relationships, plus choosing where to focus its attention and how to load its small memory choosily, because it can’t go back to re-scan the start of the sentence.
&lt;img src=&quot;/assets/images/rnn-tanh.png&quot; alt=&quot;rnn-tanh&quot; /&gt;&lt;/p&gt;
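&lt;p&gt;That single-tanh update loop is tiny when you write it out. A scalar toy version in plain Python (the weights here are made up for illustration; a real net uses weight matrices over whole vectors):&lt;/p&gt;

```python
import math

def rnn_step(x, h_prev, w_x=0.5, w_h=0.5, b=0.0):
    # One vanilla-RNN iteration: squash the new input and the previous
    # context through a single tanh layer to produce the next context.
    # That one layer has to do ALL the jobs described above.
    return math.tanh(w_x * x + w_h * h_prev + b)

# Thread a context through a sequence of (toy, scalar) inputs:
h = 0.0
for x in [1.0, 0.5, -0.3]:
    h = rnn_step(x, h)
```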

&lt;h3 id=&quot;they-fail-at-this-hard-task-lstms-succeed&quot;&gt;They fail at this hard task. LSTMs succeed.&lt;/h3&gt;

&lt;p&gt;This intuition turns out to be correct. RNNs with this architecture are not good at learning long-term dependencies. They forget things. The solution the LSTM presents is basically giving them a better way to store and organize information as they train. You’re hard-coding that the internal layers of the RNN need to do the following when they see each input token and recall the most recent output token: forget some information from the old state, add some other information to the old state, and output a token.&lt;/p&gt;

&lt;p&gt;The relationships between these are below. The first sigmoid layer picks things to ablate from the memory based on x_t (new input token) and h_{t-1} (last output token). The next sigmoid layer weights the things we’re going to add to the context, creating scaling factors for what we’re adding to the new context, based on the same input. The tanh layer encodes &lt;em&gt;what&lt;/em&gt; we’re adding into the context, again based on the same input. Then we multiply them and add the result to the context. Finally, we tanh the context, take a sigmoid layer on the most recent updates (to decide how they impact the output), multiply them, and out comes our next output token. Then this output token and our context go back into the next iteration of the neural net.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/images/lstm.png&quot; alt=&quot;lstm&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Note that all of these functions are smoothly differentiable, which makes for nice backpropagation during training, and each little net inside the RNN now has a defined question to answer, so the original RNN monolayer doesn’t have to learn how to learn anymore.&lt;/p&gt;
&lt;ol&gt;
  &lt;li&gt;What new information does this new input token, and the presence of the last output token, bring me about the things I can forget about from the context if I want to perform well on my task? (sigmoid 1)&lt;/li&gt;
  &lt;li&gt;What does the last output and the new input tell me about what I need to add to the context to perform well? (tanh)&lt;/li&gt;
  &lt;li&gt;What does the last output and new input tell me about the changes from the old context to the new context? (sigmoid 2)&lt;/li&gt;
  &lt;li&gt;Finally, what do I output, based on the updated context, last output, and new input? (sigmoid 3)&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Now it just has to learn to do these four tasks, which still includes all the English structure, but omits all the learning about learning and how information relates through time and sentence structure. And, unsurprisingly, with this issue fixed, LSTMs learn long-term dependencies much better than the vanilla RNNs we started with.&lt;/p&gt;
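&lt;p&gt;The four gates fit in a few lines. Here’s a scalar toy sketch of one LSTM step in plain Python (the weight names and scalar shapes are illustrative only; real cells use weight matrices over the concatenated [h_{t-1}, x_t] vector):&lt;/p&gt;

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_step(x, h_prev, c_prev, W):
    # W holds one made-up scalar weight per gate for this toy 1-D cell.
    z = h_prev + x                   # stand-in for the concatenated [h_{t-1}, x_t]
    f = sigmoid(W["f"] * z)          # sigmoid 1: what to forget from the context
    i = sigmoid(W["i"] * z)          # sigmoid 2: how much of the candidate to admit
    c_tilde = math.tanh(W["c"] * z)  # tanh: candidate content to add
    c = f * c_prev + i * c_tilde     # new cell state (context)
    o = sigmoid(W["o"] * z)          # sigmoid 3: what to expose as output
    h = o * math.tanh(c)             # new output token / hidden state
    return h, c
```

&lt;p&gt;Note how the cell state c only ever gets multiplied (forgetting) and added to (remembering); that gentle update path is exactly what lets gradients survive across many timesteps.&lt;/p&gt;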

&lt;h3 id=&quot;insights-for-future-developments-spoiler-alert&quot;&gt;Insights for future developments (spoiler alert!)&lt;/h3&gt;

&lt;p&gt;There are a few interesting things I notice about these four questions.&lt;/p&gt;

&lt;p&gt;First: 1, 2, and 3 are not clearly defined as separate. If these three were roles in a startup, I’d expect them to fight a lot about boundaries of responsibility! The &lt;a href=&quot;https://arxiv.org/pdf/1406.1078v3&quot;&gt;GRU&lt;/a&gt; (Gated Recurrent Unit) paper also notices this, and combines these three layers into one “update” layer. This shrinks the number of connections, so GRUs are probably more efficient to train. I remember my professors saying that these had similar performance to LSTMs, and we used them in class for projects.&lt;/p&gt;
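&lt;p&gt;For comparison, a scalar toy sketch of the GRU’s merged machinery (again with made-up weights; real GRUs use matrices): a single “update” gate z interpolates directly between the old context and a candidate, covering the forget/admit/scale jobs at once.&lt;/p&gt;

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def gru_step(x, h_prev, W):
    z_in = h_prev + x                                # stand-in for [h_{t-1}, x_t]
    z = sigmoid(W["z"] * z_in)                       # update gate: merged forget/add role
    r = sigmoid(W["r"] * z_in)                       # reset gate: how much old context
                                                     # feeds the candidate
    h_tilde = math.tanh(W["h"] * (r * h_prev + x))   # candidate context
    return (1.0 - z) * h_prev + z * h_tilde          # interpolate old vs. candidate
```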

&lt;p&gt;Second: this architecture is really making some hardline decisions about chopping up the information that layers have access to! And not everything has access to the cell state, which I think would be important for deciding what to forget. As a (wildly unsuited to this architecture) example, an AP History AI that’s “studying” a textbook would be able to heavily discount any information that it detects as being inside a footnote, because that’s unlikely to be on the test. &lt;a href=&quot;https://ieeexplore.ieee.org/document/861302&quot;&gt;Peephole connections&lt;/a&gt; are the solution to this: they patch the cell state back in as an input to the gates. This greatly increases the number of weights, but the cell becomes able to learn to count (and probably improves on some other tasks, as well).&lt;/p&gt;

&lt;p&gt;Finally: I don’t feel like I read straight through anything. I jump around until I feel that I have context, filling gaps and reinforcing concepts until I walk away with the answer to my question. One solution to this, unmentioned in this post, is a bidirectional LSTM (which I wrote in grad school). It parses backwards as well as forwards, building up state in both directions, and then putting the context from both directions into the final output tokens. From the image below, you can probably work out where the wires end up going. This helps somewhat with broadening context and allowing more flexibility in where the “blinders” are allowed to go when the LSTM is looking at inputs. Even more complex is the multi-layered LSTM, where you have multiple layers that take prior states as inputs, washing back and forth through inputs and outputs. These architectures are sort of ungodly; I had to implement one in grad school as well. They do work though.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/images/bidirectional-lstm.png&quot; alt=&quot;bidirectional lstm&quot; /&gt;&lt;/p&gt;

&lt;p&gt;But- (Billy Mays voice)- there has to be a better way than “you can go forward, or you can go backward, and sometimes you can do both, but jumping around like a human does is out of the question”, right? And it can’t be mapping onto the human analogue of zipping your eyes around a 2D page, or zipping your brain around a knowledge graph of squishy mental associations, because that’s not only a bunch of unnecessary information for most LLM tasks, but also immensely computationally intractable.&lt;/p&gt;

&lt;p&gt;The answer turns out to be something called attention, which lets the LLM take a holistic view of the entire sequence at each output, train on spotting important things from it, and think from there. This is something that we started by layering onto the LSTMs and RNNs, and then in 2017 we let this tech drive on its own with no RNN at all, when the transformer was invented. Transformers are the architecture that brought you ChatGPT and working machine learning. As it turns out, teaching the nets how to learn with LSTMs was pretty good, but letting them learn how to learn with attention layers was what brought them near human-level cognition. Read about that in another post from this series later on :)&lt;/p&gt;
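&lt;p&gt;To make “holistic view of the entire sequence” concrete, here’s a minimal scaled dot-product attention sketch in plain Python (toy vectors, no learned projections): every position gets scored against the query at once, and the output is a softmax-weighted mix of all the values. No step-by-step context threading, no forgetting.&lt;/p&gt;

```python
import math

def attention(query, keys, values):
    # Score every key against the query, scaled by sqrt(dimension),
    # softmax the scores into weights, then return the weighted
    # mix of the value vectors. The whole sequence is visible at once.
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    m = max(scores)                          # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    return [sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))]
```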
</content>
 </entry>
 
 <entry>
   <title>C2PA Bear Thoughts</title>
   <link href="https://w.laudiacay.cool/2024/05/09/C2PA-Bear-Thoughts.html"/>
   <updated>2024-05-09T00:00:00+00:00</updated>
   <id>https://w.laudiacay.cool/2024/05/09/C2PA-Bear-Thoughts</id>
   <content type="html">&lt;p&gt;This is a &lt;a href=&quot;https://twitter.com/_laudiacay/status/1788594843360416242&quot;&gt;Twitter thread&lt;/a&gt; I posted about C2PA, reproduced and edited here for posterity because my Twitter auto-deletes.&lt;/p&gt;

&lt;p&gt;Today, TikTok joined the CAI (Content Authenticity Initiative). What does this mean for blockchain people trying to make money from AI safety/data/provenance? Tl;dr, it’s not a great situation, but we already sort of knew that.&lt;/p&gt;
&lt;h3 id=&quot;who-is-the-cai&quot;&gt;Who is the CAI?&lt;/h3&gt;
&lt;p&gt;For some background: what do you need to know about the CAI? Really, the below screenshot says plenty. It’s Adobe-led (which means most creative production), using specifications whose development was incubated by the Linux Foundation (C2PA).&lt;img src=&quot;/assets/images/C2PA.png&quot; alt=&quot;Image&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Note the market share of social media / content delivery platforms, camera brands (Sony/Canon/Nikon), media and PR institutions (NHK/BBC/Publicis Groupe), large market share of devices and software (Intel/ARM/Google/Microsoft), and more. They’re all in the C2PA. Check the &lt;a href=&quot;https://c2pa.org/membership/&quot;&gt;members list&lt;/a&gt;.&lt;/p&gt;
&lt;h3 id=&quot;what-are-they-selling&quot;&gt;What are they selling?&lt;/h3&gt;
&lt;p&gt;They’re not selling anything; they’re just promoting open-source technology. Now let’s look at the technology involved here: W3C Verifiable Credentials, and chains thereof. C2PA is more or less a format for passing around, creating, and consuming stacks of attestations about content.&lt;/p&gt;

&lt;p&gt;If you want to deep-dive, this document is very helpful because it shows “C2PA in context” instead of making you comb through piles of “the first three bytes MAY BE reserved” RFC bullshit: &lt;a href=&quot;https://t.co/fIam3rlNkw&quot;&gt;https://c2pa.org/specifications/specifications/1.2/guidance/Guidance.html…&lt;/a&gt;&lt;/p&gt;
&lt;h3 id=&quot;worked-c2pa-example&quot;&gt;Worked C2PA Example&lt;/h3&gt;
&lt;p&gt;For your ease of mindless scrolling, I’ve written a worked example through the entire data supply chain, with UX consideration at every step.&lt;/p&gt;

&lt;p&gt;You snap a photo. Your Canon generates a hardware signature using an onboard PUF (physically unclonable function) or an SGX/TEE with a certificate chain back to the manufacturer. This is encoded into a verifiable credential and added to the attestations.&lt;/p&gt;

&lt;p&gt;You transform the photo into a PNG. The app (let’s say it’s ImageMagick?) either signs some attestation that it performed the transformation in SGX, or generates a ZK proof of the transformation (see this blog post by Dan Boneh for more detail &lt;a href=&quot;https://t.co/WoIvUvoKhw&quot;&gt;https://medium.com/@boneh/using-zk-proofs-to-fight-disinformation-17e7d57fe52f…&lt;/a&gt;).&lt;/p&gt;

&lt;p&gt;You use Adobe Photoshop to add your brother to the image. Adobe summarizes the transformations you performed to get from the PNG to the PNG with your brother, and adds them to the C2PA record when you save the image.&lt;/p&gt;

&lt;p&gt;You put the image into a TikTok or share it on Twitter. These interfaces display the C2PA record of your image, probably in a format with highly simplified UX that only reveals the relevant information to the user: this is a real image, with some substantial changes to the content in Photoshop.&lt;/p&gt;

&lt;p&gt;Another creator stitches or quote-tweets your image. C2PA credentials may have additions (with stitches) or may just be displayed as a stack of credentials.&lt;/p&gt;
&lt;h3 id=&quot;moat-noat-for-you&quot;&gt;Moat? Noat for you.&lt;/h3&gt;
&lt;p&gt;Eventually, C2PA could become like HTTPS, where the lock in the browser eventually goes away to be replaced with scary warnings when the proper certifications are not present.&lt;/p&gt;

&lt;p&gt;So- this all seems pretty opt-in, which is great for C2PA taking over the world, and less great for the moat of a business attempting to make money off of being a C2PA protocol.&lt;/p&gt;

&lt;p&gt;It also looks like everyone who matters in terms of relevant market share is in, except maybe Apple who is probably doing some bitchy uncompetitive shenanigans in the background. This is not great for a startup trying to promote a competing standard or introduce a new solution with any sort of moat at all.&lt;/p&gt;
&lt;h3 id=&quot;where-can-a-blockchain-fit-in&quot;&gt;Where can a blockchain fit in?&lt;/h3&gt;
&lt;h4 id=&quot;q1-you-mentioned-zk-earlier-what-about-all-our-zk-tech&quot;&gt;Q1. You mentioned ZK earlier. What about all our ZK tech?&lt;/h4&gt;
&lt;p&gt;A1: Nobody in the real world will care about the security difference between ZK and SGX until it’s incredibly fast. And probably not even then. SGX is practical and safe enough for now.&lt;/p&gt;

&lt;p&gt;Simple transformations like resizing an image (like that Dan Boneh paper above) are one thing, and ZK proving these is still not at an appropriate speed for these applications. genAI image editing workflows are getting popular, and even if you assume the generative parts of the editing flow are just signed with OpenAI’s keys instead of proven, it’s still just laughably complex and slow to prove right now.&lt;/p&gt;

&lt;p&gt;Anyway, a few companies here in blockchain land have the runway to maybe survive until ZK GTM is practical (if they’re wise with their obscenely large VC rounds), but I’m not entirely sold that ZK will &lt;em&gt;ever&lt;/em&gt; be practical for this. There is a lower bound on the computational complexity of ZK proving a computation that is strictly greater than the complexity of the original computation, and image/video editing flows are the meatiest possible compute workloads that ever happen at the edge… so this is one of the last places I’d expect to see ZK market successes.&lt;/p&gt;

&lt;h4 id=&quot;q2-timestamping-a-blockchain-proves-that-something-existed-at-a-point-in-time-you-could-sign-a-hash-of-the-c2pa-and-put-it-on-chain-so-everyone-can-see-everything&quot;&gt;Q2: Timestamping? A blockchain proves that something existed at a point in time. You could sign a hash of the C2PA! And put it on chain so everyone can see everything!&lt;/h4&gt;
&lt;p&gt;A2 (bear/devil’s advocacy): What normal person doesn’t trust an MPC run by Google, Microsoft, a nation-state, and Intel, all signing and broadcasting timestamps?&lt;/p&gt;

&lt;p&gt;A2 (less bear, and what I actually believe): See &lt;a href=&quot;https://t.co/h49XE9eUJu&quot;&gt;https://opentimestamps.org&lt;/a&gt;. This could be sped up relative to Bitcoin: writing a hash to Ethereum costs thousandths of a cent. You’d want to merkleize the hashes to compress them and bring down the cost per hash. Honestly, there might be something here; the amount of content being produced is absolutely staggering. You’d need a shortish block time, but ten seconds would probably be plenty, and you don’t need much capacity if you merkleize. Ethereum could handle this.&lt;/p&gt;
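&lt;p&gt;The merkleization itself is trivial, which is part of why there’s no moat here. A sketch (plain Python, illustrative only) of batching record hashes down to the single root you’d post on-chain:&lt;/p&gt;

```python
import hashlib

def merkle_root(leaf_hashes):
    # Pair up hashes and re-hash until one root remains. Posting only
    # this 32-byte root on-chain commits to every record hash in the
    # batch; a per-record inclusion proof is just the sibling path.
    level = list(leaf_hashes)
    if not level:
        raise ValueError("empty batch")
    while len(level) > 1:
        if len(level) % 2:          # odd count: carry the last hash up
            level.append(level[-1])
        level = [hashlib.sha256(a + b).digest()
                 for a, b in zip(level[0::2], level[1::2])]
    return level[0]
```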

&lt;p&gt;However, note the presence of Polygon on the C2PA website. This is CERTAINLY what they’re doing… I’m not bullish on a startup getting this right and pulling the BD off, because you’d need to embed a blockchain client on every device.&lt;/p&gt;

&lt;p&gt;Unfortunately, I think the only place that value will reliably accrue for this one is gas fees, and maybe a sequencer-esque moat for the merkleization (but there could be multiple competing sequencers posting these timestamps to the same contract. so just kidding, no moat here).&lt;/p&gt;
&lt;h4 id=&quot;q3-decentralized-storage-our-fave&quot;&gt;Q3: Decentralized STORAGE!??!? Our FAVE?!?&lt;/h4&gt;
&lt;p&gt;A3: I am sorry to be the bearer of bad news. Timestamping/publishing anything more than the proof of time-locked existence of the hash of (data + certificate chain) is unnecessary. Keeping the certificate chain publicly available is only useful when the data needs to be publicly available. This is a small percentage of all data, and the vast majority of use cases are happy to serve the data and its C2PA log from S3.&lt;/p&gt;
&lt;h4 id=&quot;q4-decentralized-compute-that-does-verified-transformations-like-rendering-generation-or-transcoding-over-the-data-and-perhaps-even-signs-off-on-it-in-an-mpc-context&quot;&gt;Q4: Decentralized compute that does verified transformations (like rendering, generation, or transcoding) over the data and perhaps even signs off on it in an MPC context?&lt;/h4&gt;
&lt;p&gt;A4: You can spend your one precious life attempting to get PMF with this, but I will not be doing so. Most users’ security models will be quite pleased with the semi-centralized guarantees they get with signatures from OpenAI and AWS. The comparable efficiency of getting tasks accomplished in the workplace will be even more delight-inspiring.&lt;/p&gt;
&lt;h4 id=&quot;q5-decentralized-attestation-networks-like-decentralized-community-notes&quot;&gt;Q5: Decentralized attestation networks? Like decentralized community notes?&lt;/h4&gt;
&lt;p&gt;A5: Again, mostly quite bearish. As you see in the workflow above, most data that needs to be attested is generated as the content is generated or transformed, which means that the user will do it locally. This means there won’t be much of a market for decentralized validation- there’s not much information for parties outside the supply chain to add, so there’s nowhere for a protocol to insert itself. The two slivers of exception might be:&lt;/p&gt;
&lt;ol&gt;
  &lt;li&gt;location: absolutely needs an external validator, easy to forge without the right protocol design, is possible to do securely, would need to rely on in-camera SGX + onboard clocks with monotonicity guarantees that only sign at photograph-time. this is the use case of FOAM network, which one of you degenerate token fiend Value Adders should really fund, damn it.
    &lt;ul&gt;
      &lt;li&gt;rating: could probably make money with really freaking good GTM&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;decentralized community notes across the entire internet for attestations that can only be made by a real human: I think one of the media focused or NGO C2PA members is basically guaranteed to regulatory-capture this one. creds to &lt;a href=&quot;https://twitter.com/mattigags&quot;&gt;@mattigags&lt;/a&gt; though for coming up with it.
    &lt;ul&gt;
      &lt;li&gt;rating: only investable if the team has deep state/NGO/media/C2PA/social media connections, which very few of us web3 clowns possess&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
&lt;/ol&gt;

&lt;h3 id=&quot;conclusion&quot;&gt;Conclusion&lt;/h3&gt;
&lt;p&gt;Everyone here has done a bunch of yelling about Blockchain And AI!!! AI safety is finally our time to shine! Oh my god!!!!!!! We found a real use case!!!!&lt;/p&gt;

&lt;p&gt;To this I say: Hmm… maybe! If we have good GTM and execute. But… calm down, this is probably not a big enough wave to return your underwater fund.&lt;/p&gt;

&lt;p&gt;Unfortunately: unless your name is Jaynti Kanani, I think the probability that you GMI off of this narrative is vanishingly small.&lt;/p&gt;

&lt;p&gt;Enjoy!&lt;/p&gt;
</content>
 </entry>
 
 <entry>
   <title>Coindesk Decentralized Cloud Op-ed</title>
   <link href="https://w.laudiacay.cool/2024/04/16/coindesk-decentralized-cloud-oped.html"/>
   <updated>2024-04-16T00:00:00+00:00</updated>
   <id>https://w.laudiacay.cool/2024/04/16/coindesk-decentralized-cloud-oped</id>
   <content type="html">&lt;p&gt;This is an op-ed about the current problems, and future promise, of the industry my startup is operating in.&lt;/p&gt;

&lt;p&gt;It’s in CoinDesk. Read it &lt;a href=&quot;https://www.coindesk.com/consensus-magazine/2024/04/16/what-the-history-of-linux-says-about-the-long-road-to-decentralized-storage-adoption/&quot;&gt;here&lt;/a&gt;.&lt;/p&gt;
</content>
 </entry>
 
 <entry>
   <title>Enterprise PMF for Filecoin talk</title>
   <link href="https://w.laudiacay.cool/2023/09/04/filecoin-reykjavik-talk.html"/>
   <updated>2023-09-04T00:00:00+00:00</updated>
   <id>https://w.laudiacay.cool/2023/09/04/filecoin-reykjavik-talk</id>
   <content type="html">&lt;p&gt;I gave a talk at Fildev Reykjavik in September 2023 about why Filecoin doesn’t have product-market fit, the steps necessary to get there, and how my company attempts to make the network work. Based on market and product research conducted at Banyan.&lt;/p&gt;

&lt;p&gt;Watch it &lt;a href=&quot;https://youtu.be/uU96bIyruwo&quot;&gt;here&lt;/a&gt;.&lt;/p&gt;
</content>
 </entry>
 
 <entry>
   <title>Side zk work</title>
   <link href="https://w.laudiacay.cool/2023/08/10/barustenberg.html"/>
   <updated>2023-08-10T00:00:00+00:00</updated>
   <id>https://w.laudiacay.cool/2023/08/10/barustenberg</id>
   <content type="html">&lt;p&gt;While I was at Zuzalu, I got pretty into writing Rust code for ZK / specifically folding.&lt;/p&gt;

&lt;p&gt;I worked with Lev Soukhanov on implementing his “folding endgame” &lt;a href=&quot;https://github.com/levs57/Moon-Moon&quot;&gt;here&lt;/a&gt;. This design allows folded ZK circuit steps to pass verified information around without expensive lookup gates, memory accesses, or adding other state to verify to a ZKVM. You “leak” part of the witness and pass it around, then do a consistency check on the reads and writes from the “leaked witness bits”. It’s a modification to the Nova codebase.&lt;/p&gt;

&lt;p&gt;I worked on and community-managed contributions to a Rust rewrite of Aztec’s Barretenberg &lt;a href=&quot;https://github.com/laudiacay/barustenberg&quot;&gt;here&lt;/a&gt; for a while afterwards.&lt;/p&gt;

&lt;p&gt;Also see my post about p1nch.&lt;/p&gt;
</content>
 </entry>
 
 <entry>
   <title>Trustless CDN Design</title>
   <link href="https://w.laudiacay.cool/2023/04/17/trustless-CDN.html"/>
   <updated>2023-04-17T00:00:00+00:00</updated>
   <id>https://w.laudiacay.cool/2023/04/17/trustless-CDN</id>
   <content type="html">&lt;p&gt;I gave a talk about good design and game theory for a trustless CDN at the HTTP Gateways track of IPFS Thing in Brussels, Belgium.&lt;/p&gt;

&lt;p&gt;The PPTX is &lt;a href=&quot;/assets/files/ipfs-cdn-incentives.pptx&quot;&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Check out the speaker notes.&lt;/p&gt;

&lt;p&gt;Matt Stephenson from Pantera formalized some of the peer-to-peer game theory &lt;a href=&quot;https://gist.github.com/laudiacay/3e082a234e3424701b38f10ef174e720&quot;&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;We haven’t built this because making money from it would depend on the demand side. Which currently doesn’t exist. You have to get popular apps on (or built on) it. Good luck: nobody so far has succeeded, and millions of venture capital dollars have gone into this hole.&lt;/p&gt;
</content>
 </entry>
 
 <entry>
   <title>P1nch</title>
   <link href="https://w.laudiacay.cool/2023/03/05/p1nch.html"/>
   <updated>2023-03-05T00:00:00+00:00</updated>
   <id>https://w.laudiacay.cool/2023/03/05/p1nch</id>
   <content type="html">&lt;p&gt;We (Lev Stambler and I) wrote a cool little batched-private-defi-execution app for Ethereum with Lev Stambler for the Zuzalu ZK hackathon.&lt;/p&gt;

&lt;p&gt;The concept is like Tornado Cash but for swapping. It’s built something like Penumbra minus the homomorphic shielding (thanks to Henry de Valence for input on this aspect of the design!), and it’s also something like ZCash but with an added “swap” transaction type.&lt;/p&gt;

&lt;p&gt;It is not done, absolutely unaudited, do not use this. I mostly focused on the sequencer and the solidity, he wrote the Circom, and we co-created the design.&lt;/p&gt;

&lt;p&gt;Check out the GitHub &lt;a href=&quot;https://github.com/laudiacay/p1nch/&quot;&gt;here&lt;/a&gt;&lt;/p&gt;
</content>
 </entry>
 

</feed>