Skip to content
Bain Capital Crypto
  • Team
  • Portfolio
  • Insights
  • Jobs
  • Contact
  • Basics
  • Cryptography
  • DeFi
  • Hiring
  • MEV
  • Press Release
  • Privacy
  • Regulatory
  • Research
  • Akshay Agrawal
  • Alex Evans
  • Andrew Cleverdon
  • Andrija Novakovic
  • Charlotte Kyle
  • Ciamac Moallemi
  • Eugene Chen
  • grjte
  • Guillermo Angeris
  • Haley MacCormac
  • Justin Rattigan
  • Kaley Petulla
  • Kamil Yusubov
  • Kara Reusch
  • Kevin Zhang
  • Khurram Dara
  • Kimleang Svay
  • Kobi Gurkan
  • Koh Wei Jie
  • Kshitij Kulkarni
  • Mallesh Pai
  • Mandy Campbell
  • Max Resnick
  • Michael Chagnon
  • Myles Maloof
  • Nashqueue
  • Natalie Mullins
  • Nathan Sheng
  • Nicolas Mohnblatt
  • Parth Chopra
  • Sanaz Taheri
  • Stefan Cohen
  • Stephen Boyd
  • Tarun Chitra
Ligerito: A Small and Concretely Fast Polynomial Commitment Scheme
In this note we present Ligerito, a small and practically fast polynomial commitment and inner product scheme. For the case…
  • Andrija Novakovic,
  • Guillermo Angeris
  • Privacy,
  • Research
05.01.25
Exploring interoperability & composability for local-first software
Check out the project at https://github.com/grjte/groundmist-syncCross-posted to my atproto PDS and viewable at Groundmist Notebook and WhiteWind. This is the third exploration connecting local-first…
  • grjte
  • Privacy,
  • Research
04.22.25
Exploring the AT Protocol as a legibility layer for local-first software
Check out the project at https://notebook.groundmist.xyzCross-posted to my atproto PDS and viewable at Groundmist Notebook and WhiteWind Some people like vim and some…
  • grjte
  • Privacy,
  • Research
04.22.25
Exploring the AT Protocol as a distribution layer for local-first software
Check out the project at https://library.groundmist.xyzCross-posted to my atproto PDS and viewable at Groundmist Notebook and WhiteWind What if private…
  • grjte
  • Privacy,
  • Research
04.22.25
CryptoUtilities.jl: A Small Julia Library for Succinct Proofs
We’re excited to open-source CryptoUtilities.jl, a collection of Julia packages built to prototype and benchmark succinct proof systems over binary…
  • Andrija Novakovic,
  • Guillermo Angeris
  • Cryptography,
  • Research
04.16.25
Public, Provable Prices
What happens when exchanges operate with a discrete clock? In our last post, we argued that blockchains have a system…
  • Theo Diamandis,
  • Khurram Dara
  • Regulatory,
  • Research
02.26.25
Perpetual Demand Lending Pools
Decentralized perpetuals protocols have collectively reached billions of dollars of daily trading volume, yet are still not serious competitors on…
  • Tarun Chitra,
  • Theo Diamandis,
  • Kamil Yusubov,
  • Nathan Sheng
  • DeFi,
  • Research
02.12.25
The Accidental Computer: Polynomial Commitments from Data Availability
In this paper, we present two simple variations of a data availability scheme that allow it to function as a…
  • Alex Evans,
  • Guillermo Angeris
  • Research
01.31.25
Manipulability Is a Bug, Not a Feature
Like a computer, a blockchain has an internal clock: the blocktime [1]. Users send the blockchain transactions which contain sequences…
  • Theo Diamandis,
  • Khurram Dara
  • Regulatory,
  • Research
01.30.25
Optimizing Montgomery Multiplication in WebAssembly
Operations on elliptic curves over large prime fields can be significantly sped up via optimisations to their underlying field multiplication…
  • Koh Wei Jie
  • Research
12.05.24
Chosen-Instance Attack
How succinct proofs leak information What happens when a succinct proof does not have the zero-knowledge property? There is a…
  • Nicolas Mohnblatt
  • Privacy,
  • Research
12.04.24
ZODA: Zero-Overhead Data Availability
We introduce ZODA, short for ‘zero-overhead data availability,’ which is a protocol for proving that symbols received from an encoding…
  • Alex Evans,
  • Nicolas Mohnblatt,
  • Guillermo Angeris
  • Research
12.03.24
ZODA: An Explainer
Data availability sampling (DAS) is critical to scaling blockchains while maintaining decentralization [ASB18,HASW23]. In our previous post, we informally introduced…
  • Alex Evans,
  • Nicolas Mohnblatt,
  • Guillermo Angeris,
  • Sanaz Taheri,
  • Nashqueue
  • Research
12.03.24
Sampling for Proximity and Availability
In blockchains, nodes can ensure that the chain is valid without trusting anyone, not even the validators or miners. Early…
  • Alex Evans,
  • Nicolas Mohnblatt,
  • Guillermo Angeris
  • Research
11.08.24
Expanding
At Bain Capital Crypto, research and investing are interlocked. Researching foundational problems has led to our most successful investments. Working…
  • Alex Evans
  • Hiring
10.29.24
An Analysis of Intent-Based Markets
Mechanisms for decentralized finance on blockchains suffer from various problems, including suboptimal price execution for users, latency, and a worse…
  • Tarun Chitra,
  • Theo Diamandis,
  • Kshitij Kulkarni,
  • Mallesh Pai
  • DeFi,
  • Research
03.06.24
Multidimensional Blockchain Fees are (Essentially) Optimal
Abstract In this paper we show that, using only mild assumptions, previously proposed multidimensional blockchain fee markets are essentially optimal,…
  • Guillermo Angeris,
  • Theo Diamandis,
  • Ciamac Moallemi
  • Research
02.13.24
Toward Multidimensional Solana Fees
A Solana transaction’s journey from user submission to block inclusion can be arduous. Even once the transaction reaches the current leader, it…
  • Theo Diamandis,
  • Tarun Chitra,
  • Eugene Chen
  • Research
01.31.24
Succinct Proofs and Linear Algebra
Abstract The intuitions behind succinct proof systems are often difficult to separate from some of the deep cryptographic techniques that…
  • Alex Evans,
  • Guillermo Angeris
  • Research
09.21.23
The Specter (and Spectra) of MEV
Abstract Miner extractable value (MEV) refers to any excess value that a transaction validator can realize by manipulating the ordering…
  • Guillermo Angeris,
  • Tarun Chitra,
  • Theo Diamandis,
  • Kshitij Kulkarni
  • Research
08.14.23
The Geometry of Constant Function Market Makers
Abstract Constant function market makers (CFMMs) are the most popular type of decentralized trading venue for cryptocurrency tokens. In this paper,…
  • Guillermo Angeris,
  • Tarun Chitra,
  • Theo Diamandis,
  • Alex Evans,
  • Kshitij Kulkarni
  • Research
07.20.23
Our Comment on The SEC’s Proposed Amendments to Exchange Act Rule 3b-16
This week, we submitted a comment in response to the SEC’s proposed amendments to Exchange Act Rule 3b-16 regarding the…
  • Regulatory
06.15.23
Opinion: A House Bill Would Make It Harder for the SEC to Argue Crypto Tokens Are Securities
The proposed Securities Clarity Act by Representatives Tom Emmer and Darren Soto would significantly reduce uncertainty for both crypto investors…
  • Khurram Dara
  • Regulatory
06.01.23
Opinion: Regulators Should Not ‘Front-Run’ Congress on Stablecoins
Growing consensus on the need for comprehensive legislation on payment stablecoins provides Congress with an opportunity to enact sensible regulation…
  • Khurram Dara
  • Regulatory
05.17.23
Our Comment on The SEC’s Proposed Custody Rule
This week, we submitted a comment in response to the SEC’s proposed custody rule, together with Dragonfly Capital, Electric Capital,…
  • Regulatory
05.09.23
A Note on the Welfare Gap in Fair Ordering
In this short note, we show a gap between the welfare of a traditionally ‘fair’ ordering, namely first-in-first-out (an ideal…
  • Theo Diamandis,
  • Guillermo Angeris
  • Research
03.27.23
An Efficient Algorithm for Optimal Routing Through Constant Function Market Makers
Constant function market makers (CFMMs) such as Uniswap have facilitated trillions of dollars of digital asset trades and have billions…
  • Theo Diamandis,
  • Max Resnick,
  • Tarun Chitra,
  • Guillermo Angeris
  • DeFi,
  • Research
02.17.23
Multi-dimensional On-chain Resource Pricing
Public blockchains allow any user to submit transactions which modify the shared state of the network. These transactions are independently…
  • Theo Diamandis
  • Basics
08.16.22
Dynamic Pricing for Non-fungible Resources
Public blockchains implement a fee mechanism to allocate scarce computational resources across competing transactions. Most existing fee market designs utilize a joint, fungible unit of account (e.g., gas in Ethereum) to price otherwise non-fungible resources such as bandwidth, computation, and storage, by hardcoding their relative prices. Fixing the relative price of each resource in this way inhibits granular price discovery, limiting scalability and opening up the possibility of denial-of-service attacks.
  • Theo Diamandis,
  • Alex Evans,
  • Tarun Chitra,
  • Guillermo Angeris
  • Basics
08.16.22
Introducing CFMMRouter.jl
We created CFMMRouter.jl for convex optimization enthusiasts, twitter anons, and Tarun Chitra to easily find the optimal way to execute…
  • Guillermo Angeris,
  • Theo Diamandis
  • DeFi,
  • MEV
04.05.22
Introducing Bain Capital Crypto
We are excited to announce Bain Capital Crypto (BCC), our first $560mm fund, and the launch of a new platform…
  • Stefan Cohen
  • Press Release
03.08.22
Optimal Routing for Constant Function Market Makers
We consider the problem of optimally executing an order involving multiple cryptoassets, sometimes called tokens, on a network of multiple constant function market makers (CFMMs). When we ignore the fixed cost associated with executing an order on a CFMM, this optimal routing problem can be cast as a convex optimization problem, which is computationally tractable. When we include the fixed costs, the optimal routing problem is a mixed-integer convex problem, which can be solved using (sometimes slow) global optimization methods, or approximately solved using various heuristics based on convex optimization. The optimal routing problem includes as a special case the problem of identifying an arbitrage present in a network of CFMMs, or certifying that none exists.
  • Guillermo Angeris,
  • Tarun Chitra,
  • Alex Evans,
  • Stephen Boyd
  • MEV
12.01.21
Replicating Monotonic Payoffs Without Oracles
In this paper, we show that any monotonic payoff can be replicated using only liquidity provider shares in constant function market makers (CFMMs), without the need for additional collateral or oracles. Such payoffs include cash-or-nothing calls and capped calls, among many others, and we give an explicit method for finding a trading function matching these payoffs. For example, this method provides an easy way to show that the trading function for maintaining a portfolio where 50% of the portfolio is allocated in one asset and 50% in the other is exactly the constant product market maker (e.g., Uniswap) from first principles. We additionally provide a simple formula for the total earnings of an arbitrageur who is arbitraging against these CFMMs.
  • Guillermo Angeris,
  • Alex Evans,
  • Tarun Chitra
  • DeFi
09.01.21
Constant Function Market Makers: Multi-Asset Trades via Convex Optimization
The rise of Ethereum and other blockchains that support smart contracts has led to the creation of decentralized exchanges (DEXs), such as Uniswap, Balancer, Curve, mStable, and SushiSwap, which enable agents to trade cryptocurrencies without trusting a centralized authority. While traditional exchanges use order books to match and execute trades, DEXs are typically organized as constant function market makers (CFMMs). CFMMs accept and reject proposed trades based on the evaluation of a function that depends on the proposed trade and the current reserves of the DEX. For trades that involve only two assets, CFMMs are easy to understand, via two functions that give the quantity of one asset that must be tendered to receive a given quantity of the other, and vice versa. When more than two assets are being exchanged, it is harder to understand the landscape of possible trades. We observe that various problems of choosing a multi-asset trade can be formulated as convex optimization problems, and can therefore be reliably and efficiently solved.
  • Guillermo Angeris,
  • Akshay Agrawal,
  • Alex Evans,
  • Tarun Chitra,
  • Stephen Boyd
  • Basics,
  • DeFi
07.01.21
Replicating Market Makers
We present a method for constructing Constant Function Market Makers (CFMMs) whose portfolio value functions match a desired payoff. More specifically, we show that the space of concave, nonnegative, nondecreasing, 1-homogeneous payoff functions and the space of convex CFMMs are equivalent; in other words, every CFMM has a concave, nonnegative, nondecreasing, 1-homogeneous payoff function, and every payoff function with these properties has a corresponding convex CFMM. We demonstrate a simple method for recovering a CFMM trading function that produces this desired payoff. This method uses only basic tools from convex analysis and is intimately related to Fenchel conjugacy. We demonstrate our result by constructing trading functions corresponding to basic payoffs, as well as standard financial derivatives such as options and swaps.
  • Guillermo Angeris,
  • Alex Evans,
  • Tarun Chitra
  • DeFi
03.01.21
A Note on Privacy in Constant Function Market Makers
Constant function market makers (CFMMs) such as Uniswap, Balancer, Curve, and mStable, among many others, make up some of the largest decentralized exchanges on Ethereum and other blockchains. Because all transactions are public in current implementations, a natural next question is if there exist similar decentralized exchanges which are privacy-preserving; i.e., if a transaction’s quantities are hidden from the public view, then an adversary cannot correctly reconstruct the traded quantities from other public information. In this note, we show that privacy is impossible with the usual implementations of CFMMs under most reasonable models of an adversary and provide some mitigating strategies.
  • Guillermo Angeris,
  • Alex Evans,
  • Tarun Chitra
  • Privacy
02.01.21
Optimal Fees for Geometric Mean Market Makers
Constant Function Market Makers (CFMMs) are a family of automated market makers that enable censorship-resistant decentralized exchange on public blockchains. Arbitrage trades have been shown to align the prices reported by CFMMs with those of external markets. These trades impose costs on Liquidity Providers (LPs) who supply reserves to CFMMs. Trading fees have been proposed as a mechanism for compensating LPs for arbitrage losses. However, large fees reduce the accuracy of the prices reported by CFMMs and can cause reserves to deviate from desirable asset compositions. CFMM designers are therefore faced with the problem of how to optimally select fees to attract liquidity. We develop a framework for determining the value to LPs of supplying liquidity to a CFMM with fees when the underlying process follows a general diffusion. Focusing on a popular class of CFMMs which we call Geometric Mean Market Makers (G3Ms), our approach also allows one to select optimal fees for maximizing LP value. We illustrate our methodology by showing that an LP with mean-variance utility will prefer a G3M over all alternative trading strategies as fees approach zero.
  • Guillermo Angeris,
  • Tarun Chitra,
  • Alex Evans,
  • Stephen Boyd
  • DeFi
01.04.21
Liquidity Provider Returns in Geometric Mean Markets
Geometric mean market makers (G3Ms), such as Uniswap and Balancer, comprise a popular class of automated market makers (AMMs) defined by the following rule: the reserves of the AMM before and after each trade must have the same (weighted) geometric mean. This paper extends several results known for constant-weight G3Ms to the general case of G3Ms with time-varying and potentially stochastic weights. These results include the returns and no-arbitrage prices of liquidity pool (LP) shares that investors receive for supplying liquidity to G3Ms. Using these expressions, we show how to create G3Ms whose LP shares replicate the payoffs of financial derivatives. The resulting hedges are model-independent and exact for derivative contracts whose payoff functions satisfy an elasticity constraint. These strategies allow LP shares to replicate various trading strategies and financial contracts, including standard options. G3Ms are thus shown to be capable of recreating a variety of active trading strategies through passive positions in LP shares.
  • Alex Evans
  • DeFi
06.01.20
Back

Exploring the AT Protocol as a legibility layer for local-first software

  • grjte
  • Privacy,
  • Research
04.22.25
  • Share on Twitter
  • Copy Link

Check out the project at https://notebook.groundmist.xyz
Cross-posted to my atproto PDS and viewable at Groundmist Notebook and WhiteWind

Some people like vim and some people like emacs and some people like something else, and why not have our documents able to be edited in all of those programs?

Seph Gentle on the localfirst.fm podcast

This is the second in a series of explorations around the possibilities we can unlock by connecting the local-first software paradigm to the AT Protocol (atproto). My motivation behind this exploration is captured well by a recent quote from Seph Gentle on the localfirst.fm podcast. As he points out, we’re accustomed to editing code across many editors–vim, emacs, and others–but in other disciplines our documents and data remain tied to single applications. The dream is to expand this editor flexibility to all of our data.

One of the most interesting features of the decentralized AT Protocol (atproto) is the flexibility it enables around how to interact with user data. The protocol has been designed to support and promote interoperation, enabling anyone to build a wide variety of different views over any of the public data repositories hosted in atproto Personal Data Servers (PDSes). These interfaces are known as “AppViews” because they provide just one of many possible views of the canonical data from the repo.

An AppView is an application in the Atmosphere. It’s called an “AppView” because it’s just one view of the network. The canonical data lives in data repos which is hosted by PDSes, and that data can be viewed many different ways.

https://atproto.com/guides/glossary#app-view

For example, different views of content I’ve recently curated and published to my data repo can be seen at my public Groundmist Library page or in the “inputs” window on my personal website. In addition to enabling different ways of interacting with data that originates in a single application, the atproto design enables AppViews to compose data from multiple application sources.

The Groundmist Library AppView is on the left; on the right, my personal website provides a different AppView of the same public data from my PDS.

The AT Protocol design offers a powerful and flexible model for interacting with user-owned data. However, it was built for large-scale social applications, and all of the data on the protocol is public. For interacting with my own personal private data, I’ve been inspired by the local-first software movement, which also prioritises user ownership and control over data. The local-first software ideals include additional benefits like offline access, spinner-free interactions, and seamless collaboration. Unfortunately, the local-first software model doesn’t currently support data reuse with the ease and flexibility of the AT Protocol.

“Local-first software” [is] a set of principles for software that enables both collaboration and ownership for users. Local-first ideals include the ability to work offline and collaborate across multiple devices, while also improving the security, privacy, long-term preservation, and user control of data.

https://inkandswitch.com/local-first

For my next experiment combining local-first software with atproto, I wanted to explore the possibility of enabling local-first AppViews for local-first data, similar to atproto’s AppViews for public data, but without taking the data or associated software out of the local-first context.

To build a local-first AppView, I applied atproto’s Lexicon schema system to local-first software, since Lexicon is one of the main mechanisms for atproto’s interoperability. It provides a global network for agreeing on data semantics and structure that is separate from the data itself, which means it can be used even when the data isn’t available on the AT Protocol.

The project includes a local-first markdown editor that uses a public lexicon to define the document schema. The editor includes the ability to publish documents to your atproto PDS. I accompanied it with a public AppView for reading the published documents. These local-first markdown files could alternatively be edited in any other AppView that uses the same lexicon.1

Lexicons enable interoperability

What is Lexicon?

Lexicon is atproto’s schema system. It specifies a schema definition language alongside usage and publishing guidelines that enable a global schema network.

A global schemas network called Lexicon is used to unify the names and behaviors of the calls across the servers. […] While the Web exchanges documents, the AT Protocol exchanges schematic and semantic information, enabling the software from different organizations to understand each others’ data.

https://atproto.com/guides/overview#interoperation

The schema definition language provides a way to agree on semantics and behaviours and makes it simple for developers to introduce new schemas or safely reuse existing ones. Lexicons are JSON files associated with a single namespace identifier (NSID), such as app.bsky.feed.post. These NSIDs are used for organizing data within the PDS or retrieving data from collections, which are identified by the NSID.

The basic structure and semantics of an NSID are a fully-qualified hostname in Reverse Domain-Name Order, followed by a simple name. The hostname part is the domain authority, and the final segment is the name.

https://atproto.com/specs/nsid

Below is an example of a schema for record lexicon type from the atproto docs. Other types include query, procedure, and subscription.

{
  "lexicon": 1,
  "id": "com.example.follow",
  "defs": {
    "main": {
      "type": "record",
      "description": "A social follow",
      "record": {
        "type": "object",
        "required": ["subject", "createdAt"],
        "properties": {
          "subject": { "type": "string" },
          "createdAt": {"type": "string", "format": "datetime"}
        }
      }
    }
  }
}

The Lexicon system also includes structured guidelines for how Lexicons should be published and resolved. This creates a global network of accessible and discoverable schematic information.

For example, the NSID of the primary lexicon for the Groundmist Library application is xyz.groundmist.library.content. The Lexicon record is published in Groundmist’s PDS at pds.groundmist.xyz and a DNS TXT record for _lexicon.library.groundmist.xyz specifies Groundmist’s DID so that the lexicon can be located.

Three benefits of Lexicon

The Lexicon system powers the interoperability of the AT Protocol. Lexicon creates a shared legibility layer between all servers on the network, which yields 3 major benefits:

  1. 1. Interfaces can be produced independently of servers, enabling enormous flexibility for how we interact with our data.
  2. 2. Rendering code (HTML/JS/CSS) doesn’t need to be exchanged. AppViews only need to know the data and the Lexicon schema definition (for understanding the data & interacting with it correctly).
  3. 3. The application doesn’t need a separate data store for simple AppViews displaying data from individual users. Creating and publishing these interfaces can be extremely fast and lightweight, since often no database or backend is needed.

This design enables high flexibility for interacting with user content. A multitude of different applications can be created for displaying or editing data from a user’s PDS, and the consistency of the underlying data is coordinated by adherence to the public Lexicons.

Does AI need schemas?

Before discussing the application of this model for interoperability to the local-first software context, it’s worth discussing the question of whether structured data still has value in a world of LLMs and sophisticated AI agents. Why not just let the models figure out what the data means and how to interact with it?

AI still needs structured data

Firstly, improved legibility improves the effectiveness of AI agents. Clear and consistent rules for understanding data semantics and structure make it easier for agents to interpret and work with the data in the ways that you expect and want. We get better results with less compute. In fact, Comind has begun exploring some of the benefits of using lexicons with AI.

Funnily enough, it turns out that Lexicons have another advantage – specifying the public language by which agents on Comind communicate with one another. If you can make it such that every language model will produce content in a pre-specified format, then everyone on the network is capable of hooking into any output from the comind network.

https://cameron.pfiffer.org/blog/lexicons-and-ai/

It’s still true that if all you ever want to do with your agents is let them read your data and create something new with it or provide insights, then letting the models “figure it out” is basically good enough. You might get better or faster results by having clearer semantics and structure, but you can still get where you want to go without it.

However, once we start imagining interactions that also modify our underlying data, reusing and changing it in different contexts, it becomes clear that we need some way for our agents and applications to coordinate.

Avoiding data fragmentation

Traditionally, reusing data often required exporting it from an application then importing it to a new application which has some custom importer code that processes the data so the new application can understand it. This results in a separate-but-similar data set for each application that interacts with the base data set. These separate near-duplicates naturally get out of sync and diverge from the base data set over time.

AT Protocol’s design enables data reuse without this sort of duplication, fragmentation, or divergence, making it possible to interact with our data in whatever way we find most effective or interesting in any given moment. This is because the canonical data always lives in a single place (the data repo in the PDS), and there are clear semantic and schematic rules for the names and behaviours of data. Every record has a single source of truth and is accompanied by the rules for how to interact with it, since the NSID of its lexicon is the name of its collection.

If AI agents naively read and modify data sets, writing the results to a new location associated with the current task or application, then we see a resurgence of the same problem that atproto solved. Data is duplicated unnecessarily and gets out of sync (unless some sort of syncing system is introduced, which also requires coordination).

On the other hand, if AI agents read and modify data sets, writing back to a canonical location, then we need guarantees that when the data is modified the result still conforms to our expectations and the data will still be usable within its original context and any contexts that use the same data. A schema system enables us to use validation and have guarantees that when data is modified the result is still valid and legible, regardless of how many agents have interacted with the data or what they use it for.

Humans still need legibility

The final simple reason that using a system for legibility and coordination is still beneficial in an AI-dominated world is that at some point we may as humans want to read some of our data directly ourselves. As Andy Matuschak notes in a recent post about generating malleable software with LLMs:

I think that our understanding bounds the complexity of systems we can create—even if an LLM is doing the programming. If we can’t really understand a system’s behavior, except through trial and error every time a change is made, problems and confusions will pile up. It will become more and more difficult to change the system predictably.

Andy Matuschak

Using a consistent system means that no matter what interactions occur with AI agents, we can still understand, access, and analyse the output.

Building a local-first “AppView”

Although Bluesky is the most well-known social application on the AT Protocol, there is enormous scope for variety, recreating familiar settings or creating entirely new ones, all while leveraging the underlying network connections and the shared interoperability layer.

Introducing WhiteWind

One interesting project that has explored this area is WhiteWind, which is a blogging platform for markdown content that is built on top of the AT Protocol. WhiteWind brings the familiar blogging experience to atproto but additionally leverages the protocol’s interoperability to add an interesting twist – comments on blog posts are also threads in Bluesky.

A blog entry using the com.whtwnd.blog.entry lexicon, as displayed on WhiteWind.
Comments on the WhiteWind blog are also threads on Bluesky.

Privacy challenges

WhiteWind demonstrates many of atproto’s benefits, but it is also a case study in one of the challenges of building on the AT Protocol – all your WhiteWind entries are publicly readable, even if you haven’t published them yet, since all data on atproto is public.

WhiteWind tries to work around this by adding a “visibility” field to the lexicon specifying whether data should be either public, viewable if you know the URL, or only for the logged-in author. The hope is that all AppViews will respect this field and that no person (or AI) will query and look at data where “visibility” isn’t “public”. Here is the WhiteWind blog entry lexicon, showing this method of handling private/public data visibility:

{
    "lexicon": 1,
    "id": "com.whtwnd.blog.entry",
    "defs": {
        "main": {
            "type": "record",
            "description": "A declaration of a post.",
            "key": "tid",
            "record": {
                "type": "object",
                "required": [
                    "content"
                ],
                "properties": {
                    "content": {
                        "type": "string",
                        "maxLength": 100000
                    },
                    "createdAt": {
                        "type": "string",
                        "format": "datetime"
                    },
                    "title": {
                        "type": "string",
                        "maxLength": 1000
                    },
                    [...fields excluded for brevity...]
                    "visibility": {
                        "type": "string",
                        "enum": [
                            "public",
                            "url",
                            "author"
                        ],
                        "default": "public",
                        "description": "Tells the visibility of the article to AppView."
                    }
                }
            }
        }
    }
}

Unfortunately, asking people to follow gentlemanly guidelines and “just not look” is the kind of request that good actors comply with and bad actors and agents ignore. We can’t realistically expect compliance. However, a local-first AppView for editing could offer us a way to avoid the problem altogether!

Introducing Groundmist Editor & Notebook

To explore this idea, I created Groundmist Editor2 a simple local-first writing tool for drafting and collaborating on markdown documents and then publishing them when ready. It uses the WhiteWind blog entry lexicon com.whtwnd.blog.entry, making it simple to create alternate interfaces for interacting with the same local data or to create additional interfaces composing this data with other personal data, such as analysing the number of updates I made in a week against the amount of sleep I got.

Groundmist Editor is a local-first editor for publishing content to atproto using the WhiteWind blog entry lexicon.

To make publishing single documents effortless, I used atproto as a distribution layer by publishing documents to the user’s atproto PDS, as I outlined in my previous post.

Because I used the existing WhiteWind lexicon, documents are published to the com.whtwnd.blog.entry collection on atproto, which means that WhiteWind acts as a public AppView for published entries. This also means that Groundmist Editor functions as a private editor for drafting WhiteWind blog posts, instead of them being publicly retrievable from the user’s PDS.

Because I wanted a simple and minimally-distracting interface for reading, I also created an alternative AppView for viewing published writing called Groundmist Notebook, which provides a basic view of the content without WhiteWind’s interactivity features. All WhiteWind blog entries and all entries published from the Groundmist Editor can be read there in addition to on WhiteWind.

Groundmist Notebook provides a simple alternative reader for content with the WhiteWind lexicon, including entries published from Groundmist Editor.

The interoperable system

The full setup for this exploration consists of one public lexicon (com.whtwnd.blog.entry, created by the WhiteWind team) and three public AppViews:

  1. 1. Groundmist Editor, the local-first AppView for drafting and collaborative editing (forked from the Ink & Switch team’s “tiny essay editor” then minimally modified)
  2. 2. Groundmist Notebook , a public AppView for reading published content on AT Protocol
  3. 3. WhiteWind, the original AppView for reading published content with this lexicon and interacting with it on Bluesky, created by the WhiteWind team

The diagram above shows what the system looks like with all three AppViews and the lexicon. Groundmist Editor uses a sync server (not shown) so that content can be shared between different devices by sharing the document link. Documents can be published to a user’s data repo in their PDS with one click, as authorised by their atproto identity. From the PDS, the public data is displayed in both Groundmist Notebook and WhiteWind. All three of these AppViews share a single lexicon.

Vision: an interoperable local-first ecosystem

An updated system diagram which includes Groundmist Library in addition to Groundmist Editor and imagines an additional local-first AppView associated with each app’s lexicons. This begins to hint at the potential for interoperability and composibility enabled by using the Lexicon system for local-fist software

A first model for using local-first AppViews

The Groundmist Editor/Notebook project demonstrates one model for using local-first AppViews: local-first software applications can provide AppViews of private data and be connected via a lexicon to public AppViews that display data that was published to the AT Protocol.

Local-first software applications can provide AppViews of private data and be connected via a lexicon to public AppViews that display data that was published to the AT Protocol.

This enables a private software layer for the AT Protocol that in the long run will offer better UX than simply adding private data to the PDS, since it offers benefits of local-first software such as low latency and offline capability.

A second model for using local-first AppViews

There’s a second interesting model for using local-first AppViews, which is to have a local-first ecosystem where local-first applications are connected by shared lexicons and data sets but operate independently. In this world, local-first software applications can provide multiple AppViews for private data that never gets published. This enables the same interface flexibility for interacting with local-first data that we experience when interacting with public data on atproto. It offers a way to achieve the dream of collaboration over a data set without being required to use the same application or interaction mode.

Local-first software applications can provide multiple AppViews for private data that never gets published. By using shared lexicons and data sets, independently operating local-first applications can form a local-first ecosystem.

As a motivating example, one thing I particularly want is more flexibility when interacting with all of my linked notes and writing, which are currently spread across Obsidian and Logseq (and more). I like Logseq’s structure (1 bullet point per line) for thinking, tracking my days, and connecting ideas, but I often want to write something longer form, which is better suited to Obsidian or a similar markdown editor. With the ease of creating LLM-generated software, I would ideally like to create custom personal interfaces for interacting with different types of writing and notes, while still keeping it in a unified home and having a single application that knows how to interact with all of it at once.

The Lexicon schema network offers a solution, since it could be used to provide a global layer that shares semantics and behaviour for local-first data and software. This would simplify creating new AppViews and ensure data consistency without requiring the data itself to be public or available. For general use, these interfaces will be local-first applications that can function offline, because Lexicon definitions are mainly needed when new AppViews are being created or at discrete update points.

This provides a public legibility layer over private data and enables the interoperability of the AT Protocol within the local-first software context.

A note on schema lenses

The idea of strictly using Lexicon for defining local-first data will inevitably be too restrictive in some cases. As applications develop and schemas change there are challenges around interacting with older and newer versions of data.

Additionally, a lexicon for public data may not match the one desired for private data. Even in the simple Groundmist Library application from my first Groundmist post these types differed slightly, as I couldn’t post the automerge urls of documents publicly to a user’s PDS without making those documents editable to anyone (since it uses a version of Automerge without an authorization layer).

Ink & Switch has done interesting work in this area with the Cambria project and their research into data lenses. In a case like this slight mismatch between public and private data, an ideal solution might be a publicly defined “data lens” that describes how to consistently transform a public lexicon to a private schema (e.g. add a field for the Automerge url).

The global schema network provided by Lexicon is extremely powerful, and exploring similar ways to manage and share data lenses would be an interesting area of future work. This idea of “Lexicon Lenses” is a project that the community has recently begun to work on.

Conclusion

Groundmist Editor explores the idea of flexible local-first AppViews over canonical data sets and the possibilities for interoperability, composability, and doing more with our data that we can unlock by leveraging atproto’s global Lexicon network for local-first software. It’s an example of how we can improve existing atproto applications and hints at how we could expand the world of local-first software into an interoperable ecosystem.

As a final note, you may have noticed that in order to realise these ideas about local-first AppViews and local-first data composability the legibility that I discussed in this post is required but not sufficient. We also need some way for local-first applications to locate and access these data sets. Groundmist is a series of progressive experiments combining the local-first software paradigm with the AT Protocol, and the next exploration will focus on this challenge.

If you’re interested in these ideas or the possibilities of combining local-first software with the AT Protocol, please reach out to me on Bluesky.

We welcome your feedback: @grjte.sh or grjte@baincapital.com.

Thanks to Goblin Oats and Blaine Cook for helpful feedback on this work.

  1. 1. Note: I used the Automerge CRDT library for local-first data management. In addition to conforming to the lexicon, alternative local-first AppViews would need to be able to interoperate with these data structures, e.g. by also using Automerge. ↩︎
  2. 2. Forked and modified from https://github.com/inkandswitch/tiny-essay-editor, an existing a local-first markdown editor. ↩︎
Share
  • Share on Twitter
  • Copy Link
Contents
  • Lexicons enable interoperability
  • Does AI need schemas?
  • Building a local-first "AppView"
  • Vision: an interoperable local-first ecosystem
  • A note on schema lenses
  • Conclusion
BainCapital
  • Twitter
  • LinkedIn
  • Terms of Use
  • Disclosures