Repositories

what is a repository?

in ATProto, a repository is a user’s personal data store - a signed collection of records that represents everything that user has created. each account has exactly one repository. think of it as:

git repo, but for database records
key-value store where keys are paths, values are records
cryptographically signed (verifiable, tamper-evident)
portable (can move between PDS instances)

where repositories live

repositories are hosted on personal data servers (PDS):

user account → repository → hosted on pds

the pds is just storage - it doesn’t control the data:

user owns the repository (has signing keys)
user can migrate to different pds
pds failure doesn’t mean data loss (repositories are portable)

pdsx talks directly to pds instances to read/write records in repositories.

repository structure

repositories organize records using a path-based structure:

<collection>/<rkey>

example paths:

app.bsky.feed.post/3jwdwj2ctlk26
app.bsky.actor.profile/self
app.bsky.graph.follow/3k2y5tqw7ml2r

collections

collections group related records together:

identified by NSID (namespaced identifier)
all records of same type go in same collection
examples: posts, likes, follows, profiles

collections are not arbitrary - they’re defined by lexicon schemas that specify:

what fields are required/optional
what types those fields must be
validation rules
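as a rough illustration of what an NSID looks like, the sketch below checks the overall shape of a collection name. the regex is a simplified approximation written for this page, not the full syntax from the atproto spec, and looks_like_nsid and split_nsid are hypothetical helper names, not part of pdsx.

```python
import re

# simplified approximation of NSID syntax (an assumption, not the full spec):
# two or more domain-style segments, then a final name segment of letters/digits
_NSID_RE = re.compile(
    r"^[a-zA-Z][a-zA-Z0-9-]*(\.[a-zA-Z0-9][a-zA-Z0-9-]*)+\.[a-zA-Z][a-zA-Z0-9]*$"
)


def looks_like_nsid(value: str) -> bool:
    """Return True if value has the general shape of an NSID."""
    return len(value) <= 317 and _NSID_RE.match(value) is not None


def split_nsid(nsid: str) -> tuple:
    """Split an NSID into (authority, name) at the last dot."""
    authority, _, name = nsid.rpartition(".")
    return authority, name
```

for example, split_nsid("app.bsky.feed.post") separates the authority app.bsky.feed from the record type name post.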

record keys (rkeys)

rkeys uniquely identify a record within its collection:

most use tid (timestamp identifier) format
some use fixed keys (self for profiles)
chronologically sortable (tids)
unique within collection
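to see why tids sort chronologically, here is a minimal sketch of the tid layout as described in the atproto record key spec: a 64-bit integer (microsecond timestamp in the upper bits, a 10-bit clock id in the lower bits) written as 13 characters of a sortable base32 alphabet. encode_tid and decode_tid are hypothetical names used only for this illustration.

```python
# base32-sortable alphabet: characters are in ascending ASCII order,
# so comparing encoded strings matches comparing the underlying integers
B32_SORTABLE = "234567abcdefghijklmnopqrstuvwxyz"


def encode_tid(timestamp_us: int, clock_id: int) -> str:
    """Pack a microsecond timestamp and a 10-bit clock id into a 13-char tid."""
    value = (timestamp_us << 10) | (clock_id & 0x3FF)
    chars = []
    for _ in range(13):
        chars.append(B32_SORTABLE[value & 0x1F])  # take the lowest 5 bits
        value >>= 5
    return "".join(reversed(chars))


def decode_tid(tid: str) -> tuple:
    """Return (timestamp_us, clock_id) for a 13-char tid."""
    value = 0
    for ch in tid:
        value = (value << 5) | B32_SORTABLE.index(ch)
    return value >> 10, value & 0x3FF
```

because every tid has the same length and the alphabet is ordered, a plain string sort of rkeys is also a sort by creation time.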

merkle search tree (MST)

internally, repositories use a merkle search tree:

records → sorted by path → stored in MST → referenced by hash (CID)

why this matters for pdsx:

efficient listing: records in same collection are adjacent
verifiable: each commit has a hash, creates audit trail
deterministic: same records = same tree structure

when you use pdsx ls app.bsky.feed.post, the pds enumerates the records under that collection’s prefix - conceptually, a walk over one contiguous range of this tree.
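the "records in same collection are adjacent" property can be sketched with a plain sorted list standing in for the tree. this is not a real MST, only an illustration of why a sorted key space makes collection listing a single range scan. list_collection is a hypothetical helper, and the second post rkey is made up for the example.

```python
from bisect import bisect_left


def list_collection(sorted_paths: list, collection: str) -> list:
    """Return all paths under collection from an already-sorted list of paths."""
    prefix = collection + "/"
    start = bisect_left(sorted_paths, prefix)  # jump to the first candidate
    matches = []
    for path in sorted_paths[start:]:
        if not path.startswith(prefix):
            break  # left the contiguous range for this collection
        matches.append(path)
    return matches


paths = sorted([
    "app.bsky.feed.post/3jwdwj2ctlk26",
    "app.bsky.feed.post/3jwdwj2ctlk27",  # hypothetical second post
    "app.bsky.actor.profile/self",
    "app.bsky.graph.follow/3k2y5tqw7ml2r",
])
posts = list_collection(paths, "app.bsky.feed.post")
```

the scan touches only the matching range plus one extra entry, no matter how many records other collections hold.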

repository commits

every change creates a signed commit:

change records → create new MST → sign commit → new CID

commits form a chain:

commit 1 → commit 2 → commit 3 (current)

implications for pdsx:

writes create new commits
each commit carries a revision, so changes are ordered (how much past history is retained depends on the pds)
commits are cryptographically signed (verifiable)
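a toy model of "change records → new hash": the sketch below hashes a canonical encoding of all records, so the same records always give the same digest and any edit gives a different one. real repositories hash DAG-CBOR encoded MST nodes into CIDs and sign the commit with the account’s key. the sha-256 over JSON used here is a stand-in chosen for this illustration, and root_digest is a hypothetical name.

```python
import hashlib
import json


def root_digest(records: dict) -> str:
    """Hash all records deterministically (stand-in for an MST root CID)."""
    # sort_keys makes the encoding independent of insertion order
    canonical = json.dumps(records, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


before = {
    "app.bsky.actor.profile/self": {"displayName": "example"},
    "app.bsky.feed.post/3jwdwj2ctlk26": {"text": "hello"},
}
# same records, different insertion order
reordered = dict(reversed(list(before.items())))
# one record edited
after = {**before, "app.bsky.feed.post/3jwdwj2ctlk26": {"text": "hello world"}}

same = root_digest(before) == root_digest(reordered)
changed = root_digest(before) != root_digest(after)
```

anyone holding the digest from a signed commit can recompute it from the records and notice tampering, which is the property the commit signature builds on.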

DID as repository identifier

each repository is identified by its owner’s DID (decentralized identifier):

at://did:plc:44ybard66vv44zksje25o7dz/app.bsky.feed.post/3jwdwj2ctlk26
     ^------ repository (DID) -------^

the DID:

is permanent (doesn’t change if user moves PDS)
controls the repository (has signing keys)
resolves to PDS location (via DID document)
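the parts of an at:// uri map directly onto repository, collection, and rkey. a minimal parsing sketch, assuming the simple at://<repo>/<collection>/<rkey> shape shown above (parse_at_uri and AtUri are hypothetical names, and this does not validate the DID, NSID, or rkey syntax):

```python
from typing import NamedTuple, Optional


class AtUri(NamedTuple):
    repo: str                  # DID or handle identifying the repository
    collection: Optional[str]  # NSID of the collection, if present
    rkey: Optional[str]        # record key, if present


def parse_at_uri(uri: str) -> AtUri:
    """Split an at:// uri into repository, collection, and rkey."""
    scheme = "at://"
    if not uri.startswith(scheme):
        raise ValueError(f"not an at:// uri: {uri!r}")
    parts = uri[len(scheme):].split("/")
    if not parts[0] or len(parts) > 3:
        raise ValueError(f"unexpected at:// uri shape: {uri!r}")
    collection = parts[1] if len(parts) > 1 and parts[1] else None
    rkey = parts[2] if len(parts) > 2 and parts[2] else None
    return AtUri(parts[0], collection, rkey)


parsed = parse_at_uri(
    "at://did:plc:44ybard66vv44zksje25o7dz/app.bsky.feed.post/3jwdwj2ctlk26"
)
```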

authentication model

repository access has two modes:

unauthenticated reads

anyone can read public records from any repository:

# no auth needed - reading someone else's public data
pdsx -r zzstoatzz.io ls app.bsky.feed.post

pdsx resolves the handle → DID → PDS location → fetches records
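the final "fetches records" step can be pictured as a plain HTTP GET. the sketch below only builds the url for com.atproto.repo.listRecords, the standard atproto endpoint for listing a collection. whether pdsx issues exactly this request is an assumption, and list_records_url and the pds.example.com host are hypothetical.

```python
from urllib.parse import urlencode


def list_records_url(pds_url: str, repo: str, collection: str, limit: int = 50) -> str:
    """Build an unauthenticated listRecords url for a repository on a PDS."""
    query = urlencode({"repo": repo, "collection": collection, "limit": limit})
    return f"{pds_url.rstrip('/')}/xrpc/com.atproto.repo.listRecords?{query}"


url = list_records_url(
    "https://pds.example.com",  # hypothetical PDS host
    "zzstoatzz.io",
    "app.bsky.feed.post",
    limit=10,
)
```

no Authorization header is involved, which is what "unauthenticated read" means in practice.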

authenticated writes

writing to a repository requires:

handle + password (an app password is the safer choice)
creates session with pds
session proves you control the did

# auth needed - writing to your own repository
export ATPROTO_HANDLE=your.handle
export ATPROTO_PASSWORD=your-app-password
pdsx create app.bsky.feed.post text='hello'
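"creates session with pds" corresponds to the atproto com.atproto.server.createSession call, which takes an identifier and a password. the sketch below builds that request from the two environment variables without sending it. create_session_request and the pds.example.com host are hypothetical, and how pdsx performs this step internally is an assumption.

```python
import json
import os
from urllib.request import Request


def create_session_request(pds_url: str, env=os.environ) -> Request:
    """Build (but do not send) a createSession request from handle + password."""
    body = {
        "identifier": env["ATPROTO_HANDLE"],
        "password": env["ATPROTO_PASSWORD"],
    }
    return Request(
        f"{pds_url.rstrip('/')}/xrpc/com.atproto.server.createSession",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = create_session_request(
    "https://pds.example.com",  # hypothetical PDS host
    {"ATPROTO_HANDLE": "your.handle", "ATPROTO_PASSWORD": "your-app-password"},
)
```

a successful response carries access tokens and the account’s DID, and later write requests present the access token.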

why this design?

public data is public (no gatekeeping)
writes and private account data stay protected (auth required)
users control their repositories (not the pds)

cross-repository references

records can reference records in other repositories:

{
  "$type": "app.bsky.feed.like",
  "subject": {
    "uri": "at://did:plc:other-user/app.bsky.feed.post/abc123",
    "cid": "bafyrei..."
  }
}

the reference includes:

URI (which repository + collection + rkey)
CID (content hash for verification)

this is a strong reference: it includes the content hash, so a reader can detect whether the referenced record has changed since the reference was made.
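the check a strong reference enables is a simple comparison between the CID stored in the reference and the CID the target record has now. a minimal sketch, where subject_changed is a hypothetical helper and the CID strings are placeholders, not real CIDs:

```python
def subject_changed(record: dict, current_cid: str) -> bool:
    """Return True if the referenced record no longer matches the stored CID."""
    return record["subject"]["cid"] != current_cid


like = {
    "$type": "app.bsky.feed.like",
    "subject": {
        "uri": "at://did:plc:other-user/app.bsky.feed.post/abc123",
        "cid": "placeholder-cid-v1",  # placeholder, not a real CID
    },
}

unchanged = subject_changed(like, "placeholder-cid-v1")
edited = subject_changed(like, "placeholder-cid-v2")
```

the uri alone says where the record lives, and the cid pins which version of it was referenced.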

repository portability

repositories can move between pds instances:

export repository (all records + blobs)
import to new pds
update did document (point to new pds)
old pds becomes irrelevant

pdsx doesn’t care where repository is hosted - it follows the did to find the current pds.

how pdsx uses repositories

reading records

pdsx -r user.handle ls collection

resolve handle → did
resolve did → pds url
fetch records from pds
return to user
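the "resolve did → pds url" step comes down to reading the service entry of the DID document. the sketch below assumes the usual atproto conventions (did:plc documents served by plc.directory, a service entry with id ending in #atproto_pds and type AtprotoPersonalDataServer). did_document_url and pds_endpoint are hypothetical names, and the sample document is made up for illustration.

```python
def did_document_url(did: str) -> str:
    """Return the url where a DID document is expected to be published."""
    if did.startswith("did:plc:"):
        return f"https://plc.directory/{did}"
    if did.startswith("did:web:"):
        # simple hostname-only form; ports and paths are not handled here
        return f"https://{did[len('did:web:'):]}/.well-known/did.json"
    raise ValueError(f"unsupported DID method: {did!r}")


def pds_endpoint(did_doc: dict) -> str:
    """Extract the PDS url from a DID document's service list."""
    for service in did_doc.get("service", []):
        if (
            service.get("id", "").endswith("#atproto_pds")
            and service.get("type") == "AtprotoPersonalDataServer"
        ):
            return service["serviceEndpoint"]
    raise LookupError("no atproto PDS service entry in DID document")


sample_doc = {  # made-up document for illustration
    "id": "did:plc:44ybard66vv44zksje25o7dz",
    "service": [
        {
            "id": "#atproto_pds",
            "type": "AtprotoPersonalDataServer",
            "serviceEndpoint": "https://pds.example.com",
        }
    ],
}
endpoint = pds_endpoint(sample_doc)
```

because the endpoint is looked up on every resolution, a migrated repository is found at its new pds without the DID changing.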

writing records

pdsx create collection field=value

authenticate with pds
construct record
submit to pds
pds updates repository mst
pds creates signed commit
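the "construct record" and "submit to pds" steps can be sketched as building the body for com.atproto.repo.createRecord, which takes the repository, the collection, and the record itself. create_record_body is a hypothetical helper, and whether pdsx fills in $type and createdAt this way is an assumption.

```python
from datetime import datetime, timezone
from typing import Optional


def create_record_body(
    repo_did: str,
    collection: str,
    fields: dict,
    now: Optional[datetime] = None,
) -> dict:
    """Build the JSON body for a createRecord call from field=value pairs."""
    now = now or datetime.now(timezone.utc)
    record = {
        "$type": collection,
        # ISO 8601 timestamp in UTC with a trailing Z
        "createdAt": now.isoformat().replace("+00:00", "Z"),
        **fields,  # caller-supplied fields, e.g. text='hello'
    }
    return {"repo": repo_did, "collection": collection, "record": record}


body = create_record_body(
    "did:plc:44ybard66vv44zksje25o7dz",
    "app.bsky.feed.post",
    {"text": "hello"},
    now=datetime(2024, 1, 1, tzinfo=timezone.utc),
)
```

the client never touches the MST or the commit: it sends the record, and the pds performs the tree update and signing.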

listing collections

pdsx ls app.bsky.feed.post

traverses mst to enumerate all records in that collection path

federation model

atproto is federated:

many pds instances
each hosts many repositories
users choose their pds
repositories are portable

pdsx is pds-agnostic - works with any pds implementing the atproto spec:

bsky.social (bluesky’s pds)
self-hosted pds
third-party providers

why understanding repositories matters

knowing the repository model helps you understand:

why handles vs dids: handles change, dids don’t - use dids for durability
why collections: records are grouped by type for efficient access
why authentication works this way: reading public data doesn’t need auth, writing does
why records are portable: repositories are user-owned, pds is just hosting
why pdsx has the flags it does: -r specifies which repository, --pds specifies which server

from concept to command

the repository model maps directly to pdsx commands.

reading from any repository (no auth needed):

# using handle
pdsx -r zzstoatzz.io ls app.bsky.feed.post

# using DID (more durable)
pdsx -r did:plc:44ybard66vv44zksje25o7dz ls app.bsky.feed.post

pdsx resolves to the PDS hosting that repository and fetches public records

writing to your repository (auth required):

export ATPROTO_HANDLE=your.handle
export ATPROTO_PASSWORD=your-app-password
pdsx create app.bsky.feed.post text='hello world'

pdsx authenticates with your PDS, creates a session with your DID, and writes to your repository

why the -r flag exists: it specifies which repository to read from. without it, pdsx uses your authenticated repository (from client.me.did)
