We want to support the data that is already content addressed, but we can’t get there without lifting the block limit. The real issue we want to solve is https://discuss.ipfs.tech/t/ipld-and-ipfs-a-pitch-for-the-future/15095
What is this?
Popular IPFS implementations like Kubo and Helia support the wide array of data that is already content addressed, enabled by sufficient specs and tooling for incrementally verifiable large blocks. The utility of IPFS extends to new userbases, with content-addressed data from Docker containers to blockchain state trees now viable for distribution over IPFS protocols without requiring any additional or duplicative processing.
Specific examples:
- It is possible for Iroh to have meaningful interop with major existing IPFS tooling (e.g. kubo, helia, boost).
- Even if the integration isn’t completed, reaching the point where the only remaining blockers are implementation-specific would be meaningful progress.
- Large raw blocks can operate as files just like small raw blocks
- It is possible to support IPLD DAGs with large blocks even if those blocks are not raw
- Note: this support is less critical than the others given that once we have raw blocks people can build their own data processing tooling.
Why is this a good idea?
- Supporting large blocks has been one of the main action items from the data-transfer- and data-structure-related tracks over the last two (i.e., all of) IPFS 202#Things.
- Please don’t make us come to the next event without having made any progress here.
- See https://discuss.ipfs.tech/t/supporting-large-ipld-blocks/15093 particularly
- https://discuss.ipfs.tech/t/supporting-large-ipld-blocks/15093#why-its-sad-that-we-have-block-limits-4
- https://discuss.ipfs.tech/t/supporting-large-ipld-blocks/15093#alternatives-just-leave-it-alone-13
- The highlight is that block limits restrict IPLD (and IPFS) to describing only some content-addressable data rather than all of it. That means leaving out the large amounts of existing content-addressable data, limiting IPFS’ reach with newly created content-addressable data, and limiting the creativity and options of people trying to build within the IPFS ecosystem.
- Not doing this makes it incredibly hard to support https://specs.ipfs.tech/architecture/principles/ since people will continue to make comments about iroh ↔ everything-else lack of compatibility. We’ve tried the “conceptual compatibility” approach with Filecoin and lotus, and despite lotus and boost being (and having been) IPFS implementations, selling that to the community has been incredibly hard.
- This is sneaking up on us in other places as well. People are increasingly moving Filecoin data around by CommP CID, which is effectively one large raw block.
- https://github.com/filecoin-project/FIPs/blob/master/FRCs/piece-gateway.md could (with some small tweaks) be fully described by the HTTP Gateway API (i.e. getting the full data as a deserialized response, or getting an incrementally verifiable version as a CAR).
What do we need to do?
- Specification for storing and transporting incrementally verifiable large blocks in CAR files
- Unlocks incrementally sending large raw blocks (i.e. files) with the Trustless HTTP Gateway
- Unlocks indexing (and serving) of trustless CARs by Boost (for blocks smaller than a single piece)
- boxo- and helia-based implementations for fetching data (including large blocks) without risking out-of-memory errors or exposing attack vectors
- Ideally this is supportable within kubo (and bifrost-gateway). While supporting those binaries is not strictly necessary, users who were hoping for iroh to be compatible with kubo or ipfs.io will be unhappy without this.