The Distributed Storage Competition – Part 2

In our previous piece, The Distributed Storage Competition – Part 1, we introduced five distributed storage networks and provided an overview of each one’s size, backing, and approach.

We also introduced ScPrime, the network underlying Xa Net Services. It is not covered in detail here, but Part 1 provides background, and it is referenced in various parts of this article.

A Sybil attack occurs when a single entity creates numerous seemingly independent nodes to overwhelm a network through sheer numbers.

Sia picture

Sia (Sigh-Uh) is one of the earliest projects in the space and implemented the original Renter-Host Protocol. It would not be improper to characterize much of the early philosophy around Sia as “Bitcoin for storage,” with its formulation on a Bitcoin forum providing technical and ideological similarities [1][2][3]. Sia enjoyed an early lead, pioneering the idea to a proof-of-concept stage and becoming a top 20 crypto project by market cap for much of 2016–2017. In the years since, Sia (and its spinoff Skynet) has continued pushing forward on Web3 ideals with mixed success, and its market cap ranking has fallen out of the top 100. The storage network has struggled to gain much traction, with recent network charts portraying stagnation or decline.

For the unfamiliar, Web 3 (typically stylized as Web3) is the anticipated successor to the client-server model made popular in the late 90s/early 00s (Web 2). That older model enabled straightforward, low-barrier involvement in the internet and its services, but it also consolidated power toward a handful of tech giants to which many of the responsibilities around content distribution and access have been outsourced. Still emerging, Web3 democratizes these responsibilities and returns greater control over data and access to the user. Much of the Web3 development to date in the storage space has been in web hosting and content delivery. A focus on content that is difficult to take down and easy to share via links led to Sia/Skynet hosting many types of content unwelcome on the rest of the internet, at one point becoming deplatformed at the DNS level due to the quantity and severity of complaints.

Fundamental to the Renter-Host Protocol are blockchain-level smart contracts that outline terms such as the involved parties and pricing, and that enforce ‘storage proofs’ providing cryptographic evidence that stored data is intact and available. Storage providers must lock ‘collateral’ worth at least the value of the contract, which is returned at successful completion/renewal or lost if a proof fails. This aligns renter and host interests and closes common attack vectors such as Sybil attacks. Sia developed the storage renter and host to a proof-of-concept stage, allowing a proto storage marketplace to form in the 2016–2017 time frame.
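To make the idea of a storage proof concrete, the sketch below shows a simplified Merkle-proof scheme: the renter keeps only a small root hash, and the host can later prove it still holds a randomly challenged segment by revealing that segment plus a logarithmic-size hash path. This is an illustrative toy under assumed parameters (SHA-256, 64-byte segments), not the actual proof format used by Sia or ScPrime.

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_root(segments):
    """Hash each segment, then pair-and-hash upward to a single root."""
    level = [h(s) for s in segments]
    while len(level) > 1:
        if len(level) % 2:                 # duplicate last node on odd levels
            level.append(level[-1])
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

def prove(segments, index):
    """Host's side: collect the sibling hashes on the path to the root."""
    level = [h(s) for s in segments]
    path, i = [], index
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])
        sib = i ^ 1
        path.append((level[sib], sib < i))  # (sibling hash, sibling-is-left?)
        level = [h(level[j] + level[j + 1]) for j in range(0, len(level), 2)]
        i //= 2
    return path

def verify(root, segment, path):
    """Renter's side: rebuild the root from the revealed segment."""
    node = h(segment)
    for sibling, sibling_is_left in path:
        node = h(sibling + node) if sibling_is_left else h(node + sibling)
    return node == root

segments = [bytes([i]) * 64 for i in range(8)]  # 8 toy 64-byte segments
root = merkle_root(segments)                    # renter stores only this
proof = prove(segments, 5)                      # host answers challenge #5
assert verify(root, segments[5], proof)         # intact data passes
assert not verify(root, b"tampered", proof)     # altered data fails
```

Because the challenged segment index is unpredictable, a host that discarded the data cannot reliably pass; tying collateral to the result is what makes failing a proof costly.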

ScPrime learned from Sia’s early storage marketplace and network effects. After it became clear that many providers did not adequately maintain pricing amid coin volatility, we pioneered autopricing and incentives to motivate high-quality configurations. Storage providing was traditionally too technical to reach a wide audience, so we streamlined it with mature hardware and software options. Prioritizing the long-neglected renter functionality was a raison d’être, core to enabling broad and meaningful network use. We are pleased that our implementation of the Renter-Host Protocol achieves everything we set out to do and is positioned to power many of the most common cloud storage uses.

Comparing ScPrime and Sia directly, the most apparent difference is in the renter architecture, which informs how data is stored, accessed, and paid for. On Sia, each individual user is expected to fund and manage their own blockchain-level storage contracts with providers, and compatibility with common storage-related applications is non-existent. ScPrime innovates by enabling Xa Net Services (XNS) customers to store and access data with their favourite apps directly across thousands of independent providers, without handling on-chain negotiations, which are instead performed by XNS contract servers. This allows users to harness all the performance and availability advantages of distribution without taking on the responsibility of blockchain, contract, and coin management. Delivering Web3 functionality with Web 2 ease, this methodology bridges the strengths of both in a manner sometimes called Web2.5. This architecture also allows for a future handoff of contract management should customers desire to take that responsibility in-house. Our slogan, Evolution, then Revolution, is inspired by this approach.

Edge is a paradigm that brings data storage closer to where it is used. This improves response times and can save bandwidth.

Filecoin picture

Filecoin is a monetization/incentive layer for the associated InterPlanetary File System (IPFS) protocol, whose connection earned Filecoin one of the highest-earning public coin offerings of all time at $257M. IPFS is a protocol for accessing public-facing web content in a decentralized manner and has gained some traction within the Web3 space. It’s worth noting that IPFS is often used through dedicated gateways independent of Filecoin. These gateways, however, must handle payments for the services they provide, which is what Filecoin addresses with its native blockchain and FIL token.

Since Filecoin ultimately serves IPFS, it is important to understand what IPFS is suited to and what it is not. IPFS is best suited to static websites, NFTs, or public datasets and information; this data is intended to be shared broadly with no authentication or permissions. It is not well suited to many cloud storage uses such as backups, sensitive documents and media, or workloads where high performance is required.

On tokenomics, Filecoin has a total supply of 2B tokens, 45% of which is managed, owned, or distributed by the founding entities on a primarily 6-year vest. This 900M token supply is broken down into various tranches.

The specifics of token emission on Filecoin are complex, but notable is a mechanic that increases the mining reward as long as the network keeps growing (“Baseline” minting). This has incentivized large capacity figures ahead of demand for the network.

Looking into the architecture and design of Filecoin, the model is complex. Instead of a more traditional approach to blockchain consensus (the process of creating new blocks), Filecoin involves storage providers, requiring them to also bring significant compute power. The outcome is that hardware requirements to become a provider on Filecoin are steep: $7,500–$10,000 per machine for a decent chance at earning anything. These setups draw significant amounts of electricity and are complicated to configure properly. This high bar limits who can participate as a provider on the network, and how close to the edge data can be stored.

As opposed to bringing in revenue for the services the network provides, earnings for providers on Filecoin are almost entirely driven by the block mining reward. Since Filecoin rewards increase with network use, many providers fill the network with arbitrary data to drive earnings higher. This makes measuring real utilization and capacity on the network difficult. A Filecoin analytics browser that filters for only miners listing successful storage deals reports network use and capacity of 24 PB and 872 PB respectively; far below the claimed 214 PB used of 18 EB total (though still the largest decentralized storage network). This choice to incentivize storage providers through the block reward means there is little demand for the coin itself, as renters hardly have to pay. Using the block reward to subsidize use draws the real cost of storage from inflation, and Filecoin is still on the steep part of its emission curve. It’s been able to survive at such scale thus far due to financial backing, but a storage network with no revenue plans would not appear to be sustainable.

Were one to store data on the network, there is no S3 compatibility, as IPFS is not positioning itself for that type of use. As such there is also no native encryption, no file redundancy controls, and data is likely destined for China, where most Filecoin mining happens. Access to stored data is made challenging by a “sealing” process that transforms data in such a way that reproducing the original for on-demand access is costly and time-consuming (frequently hours).

Filecoin’s primary attraction to IPFS users is an incredibly low price, thanks to its embrace of race-to-the-bottom pricing and the block reward subsidy, as well as persistence beyond the point where a typical IPFS file would no longer be ‘pinned.’ Filecoin does involve Renter-Host Protocol-like elements such as collateralization and storage proofs, but the sealing process increases the friction and cost of retrieving all but the most frequently accessed files.

Arweave picture

Arweave too is primarily pursuing Web3, but it is the most architecturally dissimilar of the projects considered. All the networks outlined thus far use blockchain to handle the accounting for a storage marketplace that ultimately places files on off-chain (or chain-adjacent) hard drives. Arweave places the files on the blockchain itself, which immediately challenges the ability to scale, as networks typically require block-producing nodes to maintain a complete copy of the blockchain. While a typical blockchain can be measured in single- or double-digit gigabytes, Arweave’s is over 90 TB and growing. Arweave gets around this with a novel mining protocol that doesn’t require providers to keep a whole copy of the chain; it only requires the block finder to prove they have some of the blockchain, and thus somebody’s file. The hope is that this adequately incentivizes all of the blockchain (and thus data) to be made available across its block miners, but it can’t yet be known whether this novel mechanism scales as intended; there are already comparatively few storage-providing nodes.

The primary appeal of Arweave is that, as with most blockchains, one only has to pay upfront to get a place in the ledger forever, allowing small files to be paid for once and remain accessible long term. Arweave has won over some NFT use, as NFTs are small and expensive and thus well suited to this model.

Storj picture

After raising $30M in a 2017 public token offering, Storj (Storage) has pivoted several times, as indicated by the network’s Version 3 (“V3”) moniker. Storj too uses blockchain for payments to providers, but does not use smart contracts for trustless, transparent accountability and accounting. One of the larger departures Storj takes is that the blockchain it relies on is not its own; unlike conventional storage networks that run their own ‘layer 1’ network, Storj exists as a ‘layer 2’ on top of Ethereum. This has the benefit of outsourcing blockchain development, but at the cost of being exposed to the ups and downs of that other network. Due to frequent large fees and congestion on the Ethereum chain, payments to providers are often impacted, forcing a search for alternatives.

This also introduces concerns regarding Storj’s Byzantine fault tolerance (BFT) – a term describing the resilience of a network to errors and attack. BFT is central to the transparent and trustless nature of decentralized networks – Bitcoin’s Proof of Work blockchain solved this problem and inspired a wave of unique implementations. Without BFT, trust in a system is required, providing little transparency for users to verify fair accounting or retain ownership over owed tokens. BFT is in the interest of both renter and host, ensuring payments and services are timely, correct, and in public, permanent record. Storj has no analogue to the Renter-Host Protocol, and does not involve blockchain and smart contracts for storage proofs or collateral locking. Payments, pricing, and access to the network all run wholly on Storj’s central servers. Storage providers being paid in an ERC-20 token rather than dollars, and data being stored across numerous independent nodes (resembling Web3 concepts), are the main reasons Storj is associated with blockchain/crypto projects – it is not decentralized in many of the most important ways a network could be.

The V3 whitepaper addresses why Storj avoids BFT: “Ultimately, all of the existing solutions [to BFT] fall short of our goal of minimizing coordination… Until it is clear that one has arisen, we are reducing our risk by avoiding the problem entirely.” Storj may intend to adopt BFT mechanisms one day, but there is no straightforward way to bring this about without a significant architectural overhaul. However, as the V3 moniker suggests, overhaul is an option they’ve exercised before.

With Storj having no blockchain, there can be no natural token minting process. All ~425M Storj tokens were created at launch, with 245M (~57.6%) of the supply going to Storj Labs. 72M tokens were sold in the aforementioned ICO for a total of $30M. As of September 2022, Storj retains 193.2M tokens (~45%).

Storj Network Stats, Sept 2022

Storj has a network that you can point S3-capable applications at, use as expected, and be billed for in dollars monthly; it is for this reason Storj is the only network we consider a real competitor to ours. Apart from the lack of meaningful decentralization, there are some differences between Storj’s storage client and Xa Net Services’ storage client (commonly referred to as “Relayer”) that illustrate a gap in usability.

Starting out on Storj, you can opt for either a centrally hosted gateway, which appears to be the preferred and promoted option, or you can set up and host your own. We were much more interested in the self-hosted client, even though it sets a higher bar, as high levels of control and sovereignty maximize the many advantages of this new type of cloud. Opting for the hosted gateway removes the direct-to-provider ability, negating most of the advantages (performance or otherwise) of using distributed cloud. Thus we pursued the self-hosted configuration, but still found few controls over some important aspects of using a network of this nature, such as the erasure code scheme and provider filtering/selection.

Erasure codes (EC) are critical for data storage at any modest scale. The EC process takes in normal data and outputs it alongside redundant (“parity”) shards that allow you to rebuild the original file even with just a fraction of the shards. It is typical to distribute these data and parity shards across multiple storage providers to allow file access even while some providers are offline, but there are performance and availability implications to how this is implemented (read more here).
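As a minimal illustration of the idea, the toy below uses single-parity XOR: any one lost shard can be rebuilt as the XOR of everything that survived. This is the simplest possible case, assumed for clarity; real networks typically use Reed-Solomon codes, which tolerate many simultaneous shard losses.

```python
from functools import reduce

def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings together."""
    return bytes(x ^ y for x, y in zip(a, b))

# k = 4 data shards plus m = 1 parity shard (real schemes use many parity shards)
data_shards = [b"AAAA", b"BBBB", b"CCCC", b"DDDD"]
parity = reduce(xor_bytes, data_shards)

# Simulate losing any single shard -- here shard 2 -- then rebuild it by
# XORing every surviving shard (the three remaining data shards plus parity).
survivors = data_shards[:2] + data_shards[3:] + [parity]
rebuilt = reduce(xor_bytes, survivors)
assert rebuilt == data_shards[2]   # original shard recovered
```

Reed-Solomon generalizes this property: with k data and m parity shards, the file survives the loss of any m of the k + m shards, which is why spreading shards across independent providers keeps data available through individual outages.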

Storj’s EC scheme enforces a 29-data, 51-parity shard configuration (~2.76x), though there is no available explanation for how these figures were reached. Although only 80 shards need to be stored, 110 are actually created. All of them are uploaded to providers on the network, but once the first 80 finish uploading the remaining 30 are abandoned. Storj does this as a primitive provider filter (so one is never stuck waiting on a slow upload), but it creates inefficiency and overhead. The same is done for downloads, where it will attempt to pull 39 shards and stop after it has the required 29.
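The overhead implied by those figures can be worked out directly from the shard counts described above:

```python
k, m = 29, 51                # data shards, parity shards
stored = k + m               # 80 shards kept per file
created = 110                # shards actually generated per upload
pulled = 39                  # shards requested per download

expansion = stored / k                        # storage expansion factor
upload_waste = (created - stored) / created   # fraction of upload discarded
download_extra = (pulled - k) / k             # extra download requests

print(round(expansion, 2))          # 2.76x, matching Storj's stated figure
print(round(upload_waste * 100))    # ~27% of uploaded shards abandoned
print(round(download_extra * 100))  # up to ~34% extra shard requests
```

In other words, beyond the ~2.76x stored on disk, every upload spends bandwidth on 110/29 ≈ 3.8x the raw file size, with roughly a quarter of that discarded outright.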

Erasure codes aside, the difference in settings offered by Storj and by the Relayer apps was pronounced. In the Xa Net ecosystem, parameters controlling EC settings, on-machine cache, metadata backups, and storage provider selection are not only presented but encouraged, to get the most out of a configuration. There’s merit in simplicity, but when performance is left on the table as a result, why limit users?

Xa Net Services is proud to deliver the most compelling blend of distribution, performance, sovereignty, and compatibility in the industry

Image for ScPrime website. ScPrime is the project behind the distributed S3 cloud provider, Xa Net Services.

We’ve now covered the four other networks working on unique flavours of distributed storage. All of them would agree this new type of cloud is a brilliant concept, but they envision different approaches and applications; it is both the differences and the similarities that provide insight to the observer. It is our belief that the ScPrime & Xa Net platform delivers the most of what makes this new type of cloud compelling, appropriately balancing the capabilities of distributed networks with the straightforward plug-and-play of traditional cloud. Read further and see our network in action in our Network & Product Demo.