what is a repository?
in ATProto, a repository is a user’s personal data store: a signed collection of records that represents everything that user has created. each account has exactly one repository. think of it as:
- a git repo, but for database records
- a key-value store where keys are paths and values are records
- cryptographically signed (verifiable, tamper-evident)
- portable (can move between PDS instances)
where repositories live
repositories are hosted on personal data servers (PDS):
- the user owns the repository, which is tied to their DID rather than to any one pds
- the user can migrate to a different pds
- a pds failure need not mean data loss (repositories are portable)
repository structure
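repositories organize records using a path-based structure: each record lives at a path made of its collection and its record key (collection/rkey).
collections
collections group related records together:
- identified by NSID (namespaced ID)
- all records of the same type go in the same collection
- examples: posts (app.bsky.feed.post), likes (app.bsky.feed.like), follows (app.bsky.graph.follow), profiles (app.bsky.actor.profile)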
each collection’s records follow a lexicon schema, which defines:
- which fields are required or optional
- what types those fields must be
- validation rules
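to make "required/optional fields, types, validation" concrete, here is a minimal sketch of checking a record against a schema. the schema is a hand-written, simplified stand-in (SIMPLE_POST_SCHEMA and validate are hypothetical names); real lexicons are JSON documents with many more constraints, and the pds performs the actual validation.

```python
from typing import Any

# Hypothetical, heavily simplified stand-in for a lexicon record schema.
# Real lexicons also constrain string lengths, formats, nested objects, etc.
SIMPLE_POST_SCHEMA: dict[str, Any] = {
    "required": ["text", "createdAt"],
    "properties": {"text": str, "createdAt": str, "langs": list},
}


def validate(record: dict[str, Any], schema: dict[str, Any]) -> list[str]:
    """Return a list of problems; an empty list means the record passed."""
    errors: list[str] = []
    # required/optional: every required field must be present
    for field in schema["required"]:
        if field not in record:
            errors.append(f"missing required field: {field}")
    # types: any field that is present must have the declared type
    for field, expected_type in schema["properties"].items():
        if field in record and not isinstance(record[field], expected_type):
            errors.append(
                f"{field} should be {expected_type.__name__}, "
                f"got {type(record[field]).__name__}"
            )
    return errors
```

for example, validate({"text": 42}, SIMPLE_POST_SCHEMA) reports both the missing createdAt and the wrongly typed text.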
record keys (rkeys)
rkeys identify a record within its collection:
- most use the tid (timestamp identifier) format
- some use fixed keys (self for profiles)
- tids are chronologically sortable
- each rkey is unique within its collection
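why tids sort chronologically is easiest to see by building one. this sketch assumes the tid layout described in the atproto spec: a 64-bit integer whose top bit is 0, followed by 53 bits of microseconds since the unix epoch and 10 bits of "clock id", written as 13 characters of base32-sortable text. make_tid is a hypothetical helper, not part of pdsx.

```python
import random
import time
from typing import Optional

# base32-sortable alphabet: characters are in ascending ASCII order,
# so comparing two equal-length TID strings matches comparing the integers.
ALPHABET = "234567abcdefghijklmnopqrstuvwxyz"


def make_tid(micros: Optional[int] = None, clock_id: Optional[int] = None) -> str:
    """Build a 13-character TID from a microsecond timestamp and a 10-bit clock id."""
    if micros is None:
        micros = time.time_ns() // 1000
    if clock_id is None:
        clock_id = random.randrange(1024)  # 10 random bits
    # top bit 0 | 53 bits of microseconds | 10 bits of clock id
    value = ((micros & ((1 << 53) - 1)) << 10) | (clock_id & 0x3FF)
    chars = []
    for _ in range(13):  # 13 * 5 bits covers all 64 bits
        chars.append(ALPHABET[value & 31])
        value >>= 5
    return "".join(reversed(chars))
```

because the timestamp occupies the high bits, a record created later gets a lexically larger rkey, which is what makes listings in time order cheap.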
merkle search tree (MST)
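internally, repositories use a merkle search tree:
- efficient listing: keys are sorted and share their collection prefix, so records in the same collection are adjacent
- verifiable: the tree’s root hash is included in each signed commit, so any change to the contents is detectable
- deterministic: the same records always produce the same tree structure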
when you run pdsx ls app.bsky.feed.post, the pds traverses this tree to enumerate the records in that collection.
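the adjacency and determinism properties can be illustrated without building a full tree. this sketch uses a plain sorted list of collection/rkey keys as a stand-in for the mst (a real pds stores hashed tree nodes, not a list), and assumes the spec's rule for a key's tree layer: the number of leading zero bits of SHA-256(key), divided by two and rounded down. key_depth and list_collection are hypothetical names.

```python
import bisect
import hashlib


def key_depth(key: str) -> int:
    """Tree layer of a key: leading zero bits of SHA-256(key), counted in 2-bit chunks.

    The result depends only on the key, so the same records always give the same shape.
    """
    digest = hashlib.sha256(key.encode()).digest()
    zeros = 0
    for byte in digest:
        if byte == 0:
            zeros += 8
            continue
        zeros += 8 - byte.bit_length()  # leading zeros inside the first nonzero byte
        break
    return zeros // 2


def list_collection(sorted_keys: list[str], collection: str) -> list[str]:
    """Return the keys of one collection; in sorted order they form one contiguous run."""
    prefix = collection + "/"  # trailing slash so "...post" does not match "...postgate"
    start = bisect.bisect_left(sorted_keys, prefix)
    end = start
    while end < len(sorted_keys) and sorted_keys[end].startswith(prefix):
        end += 1
    return sorted_keys[start:end]
```

finding the start of a collection is a binary search rather than a scan of the whole repository; the real tree gets the same benefit from its sorted, hash-linked nodes.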
repository commits
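every change creates a new signed commit:
- each write produces a new commit that points to the updated tree root
- commits carry a revision (rev) that increases with every change; current repositories track the latest state rather than keeping a full commit history
- commits are cryptographically signed (verifiable)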
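a minimal sketch of why a signed commit makes the repository tamper-evident. it swaps in stdlib HMAC and JSON for what atproto actually uses (DAG-CBOR encoding and an asymmetric ECDSA signing key whose public half appears in the DID document); the field names loosely follow the commit object, and DEMO_KEY, sign_commit and verify_commit are hypothetical.

```python
import hashlib
import hmac
import json

# Stand-in secret. Real repositories sign with an asymmetric key, so anyone
# can verify with the public key without being able to sign.
DEMO_KEY = b"hypothetical-signing-key"


def unsigned_bytes(commit: dict) -> bytes:
    # Deterministic encoding of every field except the signature itself.
    fields = {k: v for k, v in commit.items() if k != "sig"}
    return json.dumps(fields, sort_keys=True, separators=(",", ":")).encode()


def sign_commit(did: str, root_hash: str, rev: str) -> dict:
    """Sign a commit that binds the repository's DID to its current tree root."""
    commit = {"did": did, "version": 3, "data": root_hash, "rev": rev}
    commit["sig"] = hmac.new(DEMO_KEY, unsigned_bytes(commit), hashlib.sha256).hexdigest()
    return commit


def verify_commit(commit: dict) -> bool:
    """True only if no signed field has changed since signing."""
    expected = hmac.new(DEMO_KEY, unsigned_bytes(commit), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, commit.get("sig", ""))
```

since the root hash summarizes every record in the tree, altering any record changes the root, and a commit carrying a different root no longer verifies.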
DID as repository identifier
each repository is identified by its owner’s DID (decentralized identifier). the DID:
- is permanent (doesn’t change if the user moves to another PDS)
- controls the repository (its DID document publishes the key used to verify commit signatures)
- resolves to the PDS location (via the DID document)
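resolving a DID to a PDS location is two steps: fetch the DID document, then read the pds entry from its service list. this sketch assumes the two DID methods atproto supports (did:plc documents served by plc.directory, did:web documents at /.well-known/did.json) and a service entry whose id ends in #atproto_pds; the network fetch itself is left out so the parsing can be checked offline. both function names are hypothetical.

```python
def did_document_url(did: str) -> str:
    """Where to fetch the DID document for the DID methods atproto uses."""
    if did.startswith("did:plc:"):
        return f"https://plc.directory/{did}"
    if did.startswith("did:web:"):
        domain = did[len("did:web:"):]
        return f"https://{domain}/.well-known/did.json"
    raise ValueError(f"unsupported DID method: {did}")


def pds_endpoint(did_doc: dict) -> str:
    """Pick the PDS URL out of a DID document's service list."""
    for service in did_doc.get("service", []):
        # ids appear as "#atproto_pds" or "<did>#atproto_pds"
        if service.get("id", "").endswith("#atproto_pds"):
            return service["serviceEndpoint"]
    raise LookupError("DID document lists no atproto PDS")
```

moving to a new pds only changes serviceEndpoint in this document; the DID, and therefore the repository’s identity, stays the same.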
authentication model
repository access has two modes:
unauthenticated reads
anyone can read public records from any repository.
authenticated writes
writing to a repository requires a handle + password. logging in creates a session with the pds, and that session proves you control the did.
this split means:
- public data is public (no gatekeeping); note that repository records themselves are always public
- private data stays private (auth required)
- users control their repositories (not the pds)
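at the HTTP level the two modes look like this. the sketch only builds request objects (nothing is sent), assuming the XRPC convention of /xrpc/<method NSID> paths, the com.atproto.repo.getRecord query for reads, and com.atproto.server.createSession, whose response includes an accessJwt used as a bearer token. pdsx handles this through its client library; the helper names here are hypothetical.

```python
import json
import urllib.parse
import urllib.request


def get_record_request(pds: str, repo: str, collection: str, rkey: str) -> urllib.request.Request:
    """Unauthenticated read: a plain GET with no Authorization header."""
    query = urllib.parse.urlencode({"repo": repo, "collection": collection, "rkey": rkey})
    return urllib.request.Request(f"{pds}/xrpc/com.atproto.repo.getRecord?{query}")


def create_session_request(pds: str, handle: str, password: str) -> urllib.request.Request:
    """Log in: POST the credentials; the JSON response carries accessJwt and did."""
    body = json.dumps({"identifier": handle, "password": password}).encode()
    return urllib.request.Request(
        f"{pds}/xrpc/com.atproto.server.createSession",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def with_session(req: urllib.request.Request, access_jwt: str) -> urllib.request.Request:
    """Authenticated calls (such as writes) carry the session token as a bearer token."""
    req.add_header("Authorization", f"Bearer {access_jwt}")
    return req
```

the read needs only the repository’s DID and the pds URL; everything that changes a repository goes through a request wrapped by with_session.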
cross-repository references
records can reference records in other repositories. a reference holds:
- a URI naming the repository + collection + rkey (at://<did>/<collection>/<rkey>)
- a CID (content hash, for verification)
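a minimal sketch of pulling a record URI apart and pairing it with a CID. it assumes the simple at://<repo>/<collection>/<rkey> form and ignores query strings and fragments; AtUri, parse_at_uri and strong_ref are hypothetical names, and the {"uri", "cid"} shape mirrors atproto’s strong references.

```python
from typing import NamedTuple


class AtUri(NamedTuple):
    repo: str        # DID (or handle) of the repository holding the record
    collection: str  # NSID, e.g. app.bsky.feed.post
    rkey: str        # record key within that collection


def parse_at_uri(uri: str) -> AtUri:
    """Split an at://<repo>/<collection>/<rkey> record URI into its parts."""
    prefix = "at://"
    if not uri.startswith(prefix):
        raise ValueError(f"not an at:// URI: {uri}")
    parts = uri[len(prefix):].split("/")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected at://<repo>/<collection>/<rkey>: {uri}")
    return AtUri(*parts)


def strong_ref(uri: str, cid: str) -> dict:
    """A cross-repository reference: where the record lives plus its content hash."""
    parse_at_uri(uri)  # fail early on a malformed URI
    return {"uri": uri, "cid": cid}
```

the URI says which repository to ask; the CID lets the reader confirm the record it gets back is the exact version that was referenced.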
repository portability
repositories can move between pds instances:
- export the repository (all records + blobs)
- import it into the new pds
- update the did document to point to the new pds
- the old pds becomes irrelevant
how pdsx uses repositories
reading records
- resolve handle → did
- resolve did → pds url
- fetch records from pds
- return to user
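the four reading steps compose into one function. in this sketch the resolvers and the fetch are passed in as plain callables (a hypothetical design chosen so the flow can be tested offline); in practice resolve_handle might call com.atproto.identity.resolveHandle and resolve_pds might fetch the DID document, but read_record is not pdsx’s actual code.

```python
from typing import Callable


def read_record(
    handle: str,
    collection: str,
    rkey: str,
    resolve_handle: Callable[[str], str],
    resolve_pds: Callable[[str], str],
    fetch: Callable[[str, str, str, str], dict],
) -> dict:
    """handle -> did -> pds url -> fetch -> return, as separate injectable steps."""
    did = resolve_handle(handle)              # 1. handle -> did
    pds = resolve_pds(did)                    # 2. did -> pds url
    return fetch(pds, did, collection, rkey)  # 3. fetch from the pds, 4. return it
```

no credentials appear anywhere in this path, which matches reads being unauthenticated.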
writing records
- authenticate with pds
- construct record
- submit to pds
- pds updates repository mst
- pds creates signed commit
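the "construct record" and "submit" steps amount to building a JSON body. this sketch assumes the input shape of com.atproto.repo.createRecord (repo, collection, record, and an optional rkey that the pds fills with a tid when omitted) and a post with text and createdAt fields; the helper names are hypothetical. the client never touches the mst or the signature: those are the pds’s last two steps.

```python
import json
from datetime import datetime, timezone
from typing import Optional


def create_record_body(did: str, collection: str, record: dict, rkey: Optional[str] = None) -> bytes:
    """JSON input for com.atproto.repo.createRecord."""
    body = {
        "repo": did,
        "collection": collection,
        "record": {"$type": collection, **record},  # the record names its own type
    }
    if rkey is not None:
        body["rkey"] = rkey  # e.g. "self" for a profile; otherwise the pds assigns a TID
    return json.dumps(body).encode()


def new_post(text: str) -> dict:
    # createdAt as an ISO 8601 UTC timestamp ending in "Z"
    now = datetime.now(timezone.utc).isoformat().replace("+00:00", "Z")
    return {"text": text, "createdAt": now}
```

the body would be POSTed to /xrpc/com.atproto.repo.createRecord on the user’s pds with the session’s bearer token.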
listing collections
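listing a collection is a paged operation: com.atproto.repo.listRecords returns a batch of records plus a cursor for the next batch, and a missing cursor means the listing is done. this sketch walks those pages; fetch_page is a hypothetical callable standing in for the actual HTTP call, and iter_records is not pdsx’s implementation.

```python
from typing import Callable, Iterator, Optional


def iter_records(
    fetch_page: Callable[[str, str, Optional[str]], dict],
    did: str,
    collection: str,
) -> Iterator[dict]:
    """Yield every record in a collection by following listRecords-style cursors.

    fetch_page(did, collection, cursor) should return {"records": [...], "cursor": ...}.
    """
    cursor: Optional[str] = None
    while True:
        page = fetch_page(did, collection, cursor)
        records = page.get("records", [])
        yield from records
        cursor = page.get("cursor")
        # stop when there is no next page, or a page comes back empty
        if not cursor or not records:
            break
```

being a generator, it fetches the next page only when the caller asks for more records, so a large collection never has to fit in memory at once.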
federation model
atproto is federated:
- many pds instances
- each hosts many repositories
- users choose their pds
- repositories are portable
pds options include:
- bsky.social (bluesky’s pds)
- self-hosted pds
- third-party providers
why understanding repositories matters
knowing the repository model helps you understand:
- why handles vs dids: handles change, dids don’t, so use dids for durability
- why collections: records are grouped by type for efficient access
- why authentication works this way: reading public data doesn’t need auth, writing does
- why records are portable: repositories are user-owned; the pds is just hosting
- why pdsx has the flags it does: -r specifies which repository, --pds specifies which server
from concept to command
the repository model maps directly to pdsx commands. reading from any repository needs no auth, which is why the -r flag exists: it specifies which repository to read from. without it, pdsx uses your authenticated repository (from client.me.did).