Decentralized Storage
April 14, 2021
Head in the Clouds
Information storage is a challenge humans have contended with as far back as we can remember (we can’t, of course, remember a time before we had information storage). Our preferred storage medium has evolved over time. From our fallible brains, to clay tablets, to pen and paper, to the modern computer, we’ve adopted technologies that enable us to store more information, more easily.
With the advent of the Internet in the 90s, we’ve quickly moved our information storage needs to the “cloud”: warehouses of computers, maintained by private companies, which we each communicate with via the Internet. Lumping huge amounts of computer storage together creates economies of scale that provide an unprecedented magnitude of information storage. We’ve adopted the cloud because it allows individuals to store an amount of information that would be physically and economically infeasible to maintain on their own personal devices.
Securing the World’s Knowledge
The biggest issue with cloud storage is that the survival of our information is tied to the survival of the private companies maintaining it (though one might argue that the compromises on censorship resistance and privacy are just as alarming). This isn’t, in fact, a new problem, as the fate of the Library of Alexandria and its annals should be quick to remind us.
Entrepreneurs are looking to address this issue by building crypto-economic protocols that incentivize the decentralization of information storage. “Crypto-economic protocol” runs the risk of sounding like jargon or, even worse, marketing material, so to unpack:
A protocol is a piece of code that each node in a computer network must run in order to communicate with the other nodes
The protocol leverages cryptography to enable nodes to quickly verify the work or computation that other nodes have done
The protocol rewards nodes with the network’s currency upon proper execution of the protocol’s code
New crypto-economic protocols are being designed such that information can be stored across a distributed network of storage providers. Rather than relying on the economies of scale of private companies, these protocols aggregate storage capacity across a network of decentralized computer nodes. At the highest level, each node in the network provides cryptographic proofs that it is faithfully storing information and, in return, receives payment in the network’s currency. At the moment there are two models for decentralized storage: pay-as-you-go and permanent storage. Filecoin exemplifies the former and Arweave exemplifies the latter.
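To make the idea of proving storage concrete, here is a minimal sketch of a challenge-response check built on a Merkle tree. It is an illustration only, not the proof system Filecoin or Arweave actually uses; the function names, the tiny chunk size, and the sample data are all assumptions made for this example. The client keeps only the Merkle root; the provider must return a randomly challenged chunk plus the sibling hashes that link it to that root.

```python
import hashlib

CHUNK_SIZE = 4  # bytes per chunk; deliberately tiny so the example is easy to follow


def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def split_chunks(data: bytes, size: int = CHUNK_SIZE) -> list[bytes]:
    return [data[i:i + size] for i in range(0, len(data), size)]


def build_levels(chunks: list[bytes]) -> list[list[bytes]]:
    """Return every level of the Merkle tree, leaves first, root last."""
    level = [sha256(c) for c in chunks]
    levels = [level]
    while len(level) > 1:
        if len(level) % 2 == 1:
            # Odd number of hashes: duplicate the last one so every node has a sibling.
            level = level + [level[-1]]
            levels[-1] = level
        level = [sha256(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels


def prove(chunks: list[bytes], index: int) -> tuple[bytes, list[bytes]]:
    """Provider side: return the challenged chunk and its sibling hashes."""
    path = []
    i = index
    for level in build_levels(chunks)[:-1]:  # every level except the root
        path.append(level[i ^ 1])            # sibling sits next to us in the pair
        i //= 2
    return chunks[index], path


def verify(root: bytes, index: int, chunk: bytes, path: list[bytes]) -> bool:
    """Client side: recompute the root from the chunk and the sibling hashes."""
    h = sha256(chunk)
    i = index
    for sibling in path:
        h = sha256(h + sibling) if i % 2 == 0 else sha256(sibling + h)
        i //= 2
    return h == root


if __name__ == "__main__":
    data = b"the quick brown fox jumps over the lazy dog"
    chunks = split_chunks(data)
    root = build_levels(chunks)[-1][0]        # the only thing the client keeps
    chunk, path = prove(chunks, 5)            # provider answers challenge #5
    print(verify(root, 5, chunk, path))       # honest provider
    print(verify(root, 5, b"xxxx", path))     # provider that lost the chunk
```

A provider that has discarded the data cannot produce a chunk that hashes up to the stored root, so repeated random challenges give the client growing confidence that the file is still intact.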
Decentralized Storage: The Models
The pay-as-you-go model works similarly to existing cloud solutions. Clients pay to 1) rent storage space and 2) access the information being stored. For example, Amazon S3 charges $0.023 per GB per month (for the first 50 TB) as well as up to $0.005 per 1,000 requests to access the underlying information. While the Filecoin network is also pay-as-you-go, it diverges from existing solutions in that the network maintains an order book between its users and the network’s nodes (the order book is recorded on the network’s blockchain). The network’s nodes tender “asks” which specify their price of storage, and the network’s users place “bids” which specify their willingness to pay.
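The ask-and-bid mechanics can be sketched as a toy matching routine. This is an assumption-laden illustration of the order-book idea described above, not Filecoin’s deal-making code: the `Ask` and `Bid` types, the `match` function, the node names, and the prices are all hypothetical, and the sketch leaves out everything else a real network would need.

```python
from dataclasses import dataclass


@dataclass
class Ask:
    provider: str
    price: float       # price per GB per month, in the network's token (hypothetical units)
    capacity_gb: int   # storage the provider is offering


@dataclass
class Bid:
    client: str
    max_price: float   # the most the client is willing to pay per GB per month
    size_gb: int       # storage the client wants


def match(bid: Bid, asks: list[Ask]) -> list[tuple[str, int, float]]:
    """Fill a bid from the cheapest asks priced at or below the bid's limit.

    Returns a list of (provider, gigabytes, price) deals and reduces the
    remaining capacity of each ask that was used.
    """
    deals = []
    remaining = bid.size_gb
    for ask in sorted(asks, key=lambda a: a.price):
        if remaining == 0 or ask.price > bid.max_price:
            break
        take = min(remaining, ask.capacity_gb)
        if take > 0:
            deals.append((ask.provider, take, ask.price))
            ask.capacity_gb -= take
            remaining -= take
    return deals


if __name__ == "__main__":
    asks = [
        Ask("node-a", 0.020, 50),
        Ask("node-b", 0.015, 30),
        Ask("node-c", 0.030, 100),
    ]
    bid = Bid("alice", max_price=0.025, size_gb=60)
    for provider, gb, price in match(bid, asks):
        print(f"{provider}: {gb} GB at {price} per GB-month")
```

In this example the bid is filled by the two cheapest providers, and the third is skipped because its ask exceeds the client’s limit. As more providers post asks, the cheapest available price can only stay the same or fall, which is the dynamic proponents are counting on.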
Whereas Amazon’s price is set by corporate policy, the price of storage on the Filecoin network is set dynamically by supply and demand. Proponents of Filecoin expect the price of storage to drop below Amazon’s once enough node operators join the network. Filecoin went live at the end of 2020, so it remains to be seen whether this happens in practice. New node operators are joining the network and the price of Filecoin is fluctuating, so it’s difficult to discern where the price of storage will settle. Furthermore, Amazon S3 has a more mature developer model supported by widely adopted APIs. The introduction of Filecoin’s order book and its use of IPFS (a protocol which Filecoin depends on) adds complexity and makes Filecoin more difficult to use than the incumbents. If Filecoin is to gain wider adoption, it must provide stable pricing and easy-to-use developer tooling. If there is too much friction in using the decentralized solution, most developers will stick with incumbents.
The permanent storage model, pioneered by Arweave, provides users storage in perpetuity in exchange for a one-time fixed fee. This is possible primarily because 1) the cost of data storage decreases significantly over time (a trend related to Moore’s Law) and 2) the Arweave network uses a clever payment mechanism that allows it to pay its nodes indefinitely. Of every user payment, 83% is sent to an “endowment pool” which is periodically drawn upon by the network to pay its storage providers for their services. The endowment pool’s payment schedule is based on a conservative extrapolation of how the cost of data storage will decrease in the future.
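The arithmetic behind a one-time fee can be sketched as follows. If storing a file costs c today and that cost falls by a fixed fraction d every year, the total cost of storing it forever is the geometric series c + c(1 - d) + c(1 - d)^2 + ..., which converges to c / d. The sketch below is a simplification, not Arweave’s actual pricing formula: the constant yearly decline, the sample numbers, and the function names are assumptions, and only the 83% endowment share comes from the text above.

```python
ENDOWMENT_SHARE = 0.83  # share of each payment sent to the endowment pool, per the text


def perpetual_cost(cost_now: float, annual_decline: float) -> float:
    """Closed-form sum of cost_now * (1 - annual_decline)**t for t = 0, 1, 2, ..."""
    if not 0 < annual_decline <= 1:
        raise ValueError("annual_decline must be in (0, 1]")
    return cost_now / annual_decline


def cost_over_years(cost_now: float, annual_decline: float, years: int) -> float:
    """Cost of the first `years` years, summed term by term."""
    return sum(cost_now * (1 - annual_decline) ** t for t in range(years))


def min_fee(cost_now: float, annual_decline: float) -> float:
    """Smallest one-time fee whose endowment share covers the perpetual cost.

    Assumes the endowment earns nothing and the token price is stable,
    which is a simplification made for this sketch.
    """
    return perpetual_cost(cost_now, annual_decline) / ENDOWMENT_SHARE


if __name__ == "__main__":
    cost_now = 1.0         # hypothetical cost of the first year, in arbitrary units
    annual_decline = 0.10  # hypothetical: storage gets 10% cheaper every year
    print(perpetual_cost(cost_now, annual_decline))
    print(cost_over_years(cost_now, annual_decline, 200))
    print(min_fee(cost_now, annual_decline))
```

The finite sum over 200 years lands just under the closed-form value, which shows why a finite up-front payment can fund storage indefinitely. It also shows why the extrapolation must be conservative: the slower the assumed decline, the larger the fee that has to be collected today.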
Permanent, decentralized storage is a necessary building block for decentralized applications because it obviates the intermediary’s role in information storage. While the pay-as-you-go model also provides decentralized storage, the maintenance of the order book is susceptible to centralization. This is because 1) users won’t necessarily have the time or technical competency to maintain their orders themselves and 2) the ask prices on the order book are liable to change drastically over time, making purely automated maintenance difficult. Take, for example, an NFT for a digital collectible. If the digital collectible is stored on the Filecoin network, the owner of the NFT will need to make ongoing payments for the storage of the underlying asset. NFT owners will naturally outsource this responsibility to other parties, creating a new intermediary in the ownership of digital assets. If, instead, the underlying asset is recorded in permanent storage for a one-time fee, no intermediary is needed to maintain the underlying asset. This is truly disruptive. By eliminating the need for an intermediary, the permanent storage model will unlock a swath of new decentralized applications.
Cachet of Cache
Currently, decentralized file storage systems are being used primarily in two ways:
1) To archive vast troves of blockchain ledger data.
2) To secure high-value digital items.
Solana (Layer 1), Skale (Ethereum application), and Blockswap (Binance Smart Chain application) have unveiled bridges to permanent storage. For high-throughput blockchains like Solana, which produce roughly double the transaction data of Ethereum, Bitcoin, Polkadot, Algorand, and Cosmos combined, it’s sensible to outsource storage to a specialist. If high-performance blockchains are to operate as intended, they’ll need to populate and recall massive sets of transaction data in perpetuity. Solana has settled on Arweave, “ensuring that the ledger has a highly fault-tolerant, decentralized storage solution,” in the words of Anatoly Yakovenko. For specialized chains and applications to secure value, scale, and interoperate in a multi-chain ecosystem, they’ll look towards publicly accessible and permanent storage.
Buyers of high-value digital art, collectibles, and in-game assets have an equally strong rationale for a potent storage solution. Owners of a Monet or Basquiat will pay good money for a climate-controlled and secure vault, and for the peace of mind it brings. NFT marketplaces, on the other hand, will often store their users’ digital files (art, in-game accessories, event tickets, etc.) on a centralized or pseudo-decentralized server despite the underlying token (ERC-721, ERC-1155) being embedded in a blockchain. For lower-value items, the creators and consumers are probably indifferent to this oversight. There is simply not enough value at risk for such a demographic to care. These users are concerned primarily with seamlessness (fiat onramps, no complicated wallet integrations), the UX, and the product range of the marketplace. But owners of scarcer assets can’t afford to skimp on security; they must look beyond the product and distribution layers. They seek robust property rights and, more importantly, permanence.
Permanence comes from a combination of bona fide decentralization and redundancy across time.
Bona fide decentralization means that the file lives on servers that are uncorrelated, or at the very least uncorrelated with the platforms on which the file was minted and sold. (As a counterexample, NiftyGateway hosts files on a Nifty-controlled IPFS server.)
Redundancy means that a single file should be stored on, and retrievable from, multiple sources.
A permanent storage protocol needs to properly incentivize both decentralization and redundancy in perpetuity. Filecoin and its peer set are decentralized, but their pricing models are contract-based (pay-as-you-go), which doesn’t truly incentivize redundancy over time. Arweave users pay a lump sum up front and only a fraction of it goes immediately to storage providers. The remainder of the sum is dripped out over time to those who can prove they are maintaining files. Furthermore, Arweave’s blockweave protocol forces storage providers to recall random past files in order to participate in the next block. The endowment pool and random recall mechanisms more tightly promote permanent storage and thus align the behavior of providers with the desires of users.
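The incentive created by random recall can be illustrated with a toy model. This is not Arweave’s consensus code: the way the recall index is derived, the function names, and the simulated block hashes are all assumptions made for this sketch. The idea is that the previous block’s hash pseudo-randomly selects an old block, and only providers that still hold that block may compete for the next one.

```python
import hashlib


def recall_index(prev_block_hash: bytes, height: int) -> int:
    """Map the previous block's hash to a pseudo-random earlier block index."""
    digest = hashlib.sha256(prev_block_hash).digest()
    return int.from_bytes(digest, "big") % height


def can_mine(stored_blocks: set[int], prev_block_hash: bytes, height: int) -> bool:
    """A provider may compete for the next block only if it holds the recalled block."""
    return recall_index(prev_block_hash, height) in stored_blocks


def eligible_rounds(stored_blocks: set[int], height: int, rounds: int) -> int:
    """Count the rounds in which the provider is eligible, using simulated block hashes."""
    count = 0
    for i in range(rounds):
        fake_prev_hash = hashlib.sha256(str(i).encode()).digest()  # stand-in for a real hash
        if can_mine(stored_blocks, fake_prev_hash, height):
            count += 1
    return count


if __name__ == "__main__":
    height = 1000
    full_node = set(range(height))   # stores the entire history
    lazy_node = set(range(100))      # stores only 10% of the history
    print(eligible_rounds(full_node, height, 500))
    print(eligible_rounds(lazy_node, height, 500))
```

Under this model, a provider’s chance of being eligible for any given block is roughly proportional to the share of history it stores, so discarding old data directly reduces expected rewards.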
To Satisfy will Suffice
To predict how users and developers will decide between the different centralized and decentralized storage solutions, we should consider what each agent is optimizing for. The motivations of those using the purest decentralized solutions will clearly differ from those for whom a cheap, centralized option will suffice. Permanent storage, paid for up front, does not attempt to compete with traditional alternatives on cost, so we can impute an incremental appeal to the user above and beyond affordability, scale, and quick recall. Contract solutions like Filecoin will scale and, as competition among storage providers increases, prices may approach those of incumbent centralized providers. In a dynamic similar to that of centralized storage, the latent dependency risk of relying on an intermediary will only be realized when something goes wrong. A further risk lies in the price volatility of contract storage. For now, most users of contract solutions are sensitive to the risks of centralized storage but are equally if not more sensitive to the premiums demanded by more robust decentralized solutions. Agents are satisficers: they’ll elect for the cheapest option that they feel will enable them to adequately achieve their mission.
Storage Taxonomy: Who’s Going to Use What?
Centralized solutions will continue to command market share over the medium term. For most business applications, managers need scale at an affordable cost. AWS, Google Cloud, Alibaba Cloud, and the like generate the economies of scale to serve lots of data quickly and reliably. The fact that centrally held data can be censored, altered, or lost only becomes a deterrent once these vulnerabilities are exploited. Until then, many business applications will keep on keeping on.
Pay-as-you-go solutions will provide the backbone for much of Web 3.0 archival activity. Web 3.0 is built on distributed fundamentals and philosophy. Applications, especially the younger ones, will seek an affordable solution outside of walled servers. Most of the contemporary storage solutions have taken a contract-based tack, and this approach will thus be most familiar and accessible to developers. The Filecoin, Storj, and Sia protocols will compete for miners and thus on storage costs, driving a pricing race to the bottom and stimulating demand.
We think a few user groups and apps will elect for an Arweave-like (upfront cost) solution imminently. First, blockchains with gargantuan public ledgers. Just like in any industry, blockchains will specialize in their core competencies. As they concentrate, they’ll be rewarded by users drawn to their domain-specific benefits. As this feedback loop plays out and more working capital is freed up, blockchains will increasingly outsource expensive and technically complex archival and recall to storage specialists. Furthermore, for blockchain applications (especially in DeFi) to grow unbridled, they’ll need a rigorous and auditable trail.
Those uniquely vulnerable to censorship and suppression will also opt for a more permanent decentralized solution. Pseudo-independent nation states (Hong Kong, Taiwan, Ukraine), companies operating within authoritarian regimes, dissidents, authors, and journalists will be keen to circumvent centralized platforms, media, and data centers. As Google’s kowtowing in China shows, search engines and other data highways are bottlenecks for the honest promulgation of information. For the heretic, paying the premium to streamline and protect the supply chain from brains to eyeballs is well worth it. Instead of smuggling Doctor Zhivago to Milan, Boris Pasternak might have preferred a more elegant solution: Permastorage.