Data DAOs: The Path Towards a User-Owned Internet

A massive economic shift is underway as AI becomes increasingly capable of doing valuable economic work. Large tech companies have already trained AI models on your public work (writing, artwork, photos, and other data) combined with everyone else's, and have started earning billions of dollars a year from them (1). They're now going after your data that isn't available on the public internet, buying up private data from companies like Reddit so they can grow AI revenue to trillions of dollars a year (2, 3).

Shouldn't you own a piece of the AI models that your data helps create?

This is where data DAOs come in. A data DAO is a decentralized entity that allows users to pool and govern their data, rewarding contributors with a dataset-specific token that represents ownership of that dataset. It's a bit like a labor union for data. These datasets can replicate or even surpass those that big tech companies sell for hundreds of millions of dollars (4). The DAO has full control over the dataset and can choose to rent it out or sell anonymized copies. Reddit data, for example, could even be used to seed new, user-owned platforms, with your friends, past posts, and other data ready to go on the new platform.

For those interested in the technical details, a data DAO has two main components: 1) onchain governance, with tokens earned for data contributions, and 2) a secure server, holding a public-private key pair for encryption, where the community-owned dataset resides. To contribute, you first validate your data to prove ownership and estimate its value. Then you encrypt your data in-browser with the server's public key and store the encrypted data in the cloud. The data is decrypted only if the DAO approves a proposal granting access; for example, a proposal could allow an AI company to rent the data to train a model. You can read more about the architecture of the Vana network, which is designed to enable collective ownership of datasets and models, here.
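
To make the governance half of this flow concrete, here is a minimal Python sketch. It is a toy model, not the Vana network's implementation: the class name `DataDAO`, the rule that tokens are credited one-for-one with a contribution's estimated value, and the strict-majority approval threshold are all assumptions made for illustration. Encryption and the blockchain itself are deliberately left out; the sketch only shows how token-weighted approval could gate whether the server is allowed to decrypt.

```python
from dataclasses import dataclass, field


@dataclass
class DataDAO:
    """Toy governance layer for a data DAO (hypothetical; no real chain or crypto)."""

    # Assumption: a proposal passes with a strict majority of all issued tokens.
    approval_threshold: float = 0.5
    balances: dict = field(default_factory=dict)   # contributor -> token balance
    approved: set = field(default_factory=set)     # ids of approved proposals

    def contribute(self, user: str, estimated_value: float) -> None:
        """Credit tokens for a validated contribution (assumed 1 token per unit of value)."""
        if estimated_value <= 0:
            raise ValueError("a contribution must have positive estimated value")
        self.balances[user] = self.balances.get(user, 0.0) + estimated_value

    def vote(self, proposal_id: str, yes_voters: set) -> bool:
        """Tally a token-weighted vote; record the proposal if it passes."""
        total = sum(self.balances.values())
        yes = sum(self.balances.get(user, 0.0) for user in yes_voters)
        passed = total > 0 and (yes / total) > self.approval_threshold
        if passed:
            self.approved.add(proposal_id)
        return passed

    def can_decrypt(self, proposal_id: str) -> bool:
        """The server would only decrypt for proposals the DAO has approved."""
        return proposal_id in self.approved


dao = DataDAO()
dao.contribute("alice", 60)
dao.contribute("bob", 30)
dao.contribute("carol", 10)

# Bob and Carol together hold 40% of the tokens, so this vote fails.
first_vote = dao.vote("rent-to-ai-lab", {"bob", "carol"})
# Alice alone holds 60%, so the same proposal passes when she votes yes.
second_vote = dao.vote("rent-to-ai-lab", {"alice"})
```

In a real deployment the balances and votes would live onchain and the decryption check would run on the secure server; the point here is only that access follows from a vote weighted by data contributions.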

Data DAOs don't just benefit users; they also advance AI progress, making it possible to build AI the way open source software is built, in a way that benefits everyone who contributes. Open source AI is struggling to find a viable business model: GPUs, data, and researchers are expensive, and once a model is trained and released as open source, there is no way to recoup those costs. The technical architecture of data DAOs can be applied to model DAOs, where users and developers contribute data, compute, and research in exchange for ownership of the model.

The default option for society today is to allow big tech to take our data and use it to train AI models that do our jobs. They earn from these AI models as we are replaced by the models trained on our data. It’s a very bad deal for society, and a very good deal for big tech. The only way to prevent this is through collective action. Data is currency, and collective data is power. I encourage you to participate: The world’s first data DAO, focused on Reddit data, went live today on the Vana network. By breaking down data moats controlled by a privileged few, data DAOs offer a path towards a truly user-owned internet.