BTFS Gateway
A BTFS Gateway acts as a bridge between traditional web browsers and BTFS. Through the gateway, users can browse files and websites stored in BTFS as if they were stored in a traditional web server.
This page discusses:
- The BTFS gateway request lifecycle.
- The several types of gateways.
- Best practices for HTTP Gateways.
Gateway request lifecycle
When a client request for a CID reaches a BTFS gateway, the gateway first checks whether the CID is cached locally. At this point, one of the following occurs:
- If the CID is cached locally, the gateway responds with the content referred to by the CID, and the lifecycle is complete.
- If the CID is not in the local cache, the gateway will attempt to retrieve it from the network.
The CID retrieval process is composed of two parts, content discovery/routing and content retrieval:
- In the content discovery/routing step, the gateway determines the provider location, that is, where the data specified by the CID can be found, by:
  - Asking the peers that it is directly connected to whether they have the data specified by the CID.
  - Querying the DHT for the IDs and network addresses of peers that have the data specified by the CID.
- Next, the gateway performs content retrieval, which can be broken into the following steps:
  - The gateway connects to the provider.
  - The gateway fetches the CID's content.
  - The gateway streams the content to the client.
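The lifecycle above can be sketched as a toy, in-memory model. This is only an illustration under stated assumptions: the `Gateway` class, its fields, and the peer and CID names are hypothetical, and storing fetched content in the local cache is an assumption that the text does not state.

```python
from dataclasses import dataclass, field


@dataclass
class Gateway:
    """Toy model of the request lifecycle; every structure here is hypothetical."""
    cache: dict = field(default_factory=dict)     # CID -> content cached locally
    peers: dict = field(default_factory=dict)     # peer ID -> {CID: content}
    connected: set = field(default_factory=set)   # IDs of directly connected peers
    dht: dict = field(default_factory=dict)       # CID -> list of provider peer IDs

    def find_providers(self, cid):
        """Content discovery/routing: ask connected peers first, then the DHT."""
        direct = [p for p in sorted(self.connected) if cid in self.peers.get(p, {})]
        return direct or list(self.dht.get(cid, []))

    def handle_request(self, cid):
        """Return (content, source), where source records how the request was served."""
        if cid in self.cache:
            return self.cache[cid], "cache"
        for peer_id in self.find_providers(cid):
            # Content retrieval: "connect" to the provider and fetch the content.
            content = self.peers.get(peer_id, {}).get(cid)
            if content is not None:
                self.cache[cid] = content  # assumption: fetched content is cached
                return content, "peer:" + peer_id
        raise LookupError("no provider found for " + cid)
```

A first request for a CID is served from a provider, and a repeated request for the same CID is then served from the cache.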
Gateway providers
Regardless of who deploys a gateway and where, any BTFS gateway resolves access to any requested BTFS content identifier. Therefore, for best performance, when you need the service of a gateway, you should use the one closest to you.
Gateway types
There are multiple gateway types, each with specific use-case, security, performance, and functional implications. Gateways can be categorized along the following dimensions:
- Read support
- Authentication support
- Resolution style
- Service
Read-only gateways
Read-only gateways are the simplest kind of gateway. This gateway type provides a way to fetch BTFS content using the HTTP GET method.
Authenticated gateways
If a gateway provider wants to limit access to requests with authentication, they may need to configure a reverse proxy, develop a BTFS plugin, or set a cache layer above BTFS.
Configuring a reverse proxy is the most popular way for providers to handle authentication.
Providers can design their own centralized authentication service like Infura Auth.
Resolution style
Three resolution styles exist:
- Path
- Subdomain
- DNSLink
Path
Path resolution uses the following form of access:
https://{gateway URL}/btfs/{content ID}/{optional path to resource}
Path-resolving gateways, however, violate the same-origin policy that protects one website from improperly accessing the session data of another website.
Subdomain
The subdomain resolution style maintains compliance with the same-origin policy. The canonical form of access, `https://{CID}.btfs.{gatewayURL}/{optional path to resource}`, causes the browser to interpret each returned file as being from a different origin.
DNSLink
Whenever the content of data within BTFS changes, BTFS creates a new CID based on the content of that data. Many applications require access to the latest version of a file or website but will not know the exact CID for that latest version. BTNS allows a version-independent BTNS identifier to resolve into the current version's BTFS CID.
The version-independent BTNS identifier contains a hash. When a gateway processes a request in the form `https://{gatewayURL}/btns/{BTNS identifier}/{optional path}`, the gateway employs BTNS to resolve the BTNS identifier into the current version's CID and then fetches the corresponding content.
But the BTNS identifier may instead refer to a fully-qualified domain name in the usual form of `example.com`.
DNSLink resolution occurs when the gateway recognizes a BTNS identifier containing `example.com`. For example, the URL `https://btfs.io` returns the current version of that website (a site stored in BTFS) as follows:
- The gateway receives a request in the form `https://{gateway URL}/btns/{example.com}/{optional path}`.
- The gateway searches the DNS TXT records of the requested domain `{example.com}` for a string of the form `dnslink=/btfs/{CID}` or `_dnslink=/btfs/{CID}`. If found, the gateway uses the specified CID to serve up `btfs://{CID}/{optional path}`. As with path resolution, this form of DNSLink resolution violates the same-origin policy. The domain operator may ensure same-origin policy compliance, and the delivery of the current version of content, by adding an `Alias` record in the DNS that refers to a suitable BTFS gateway, e.g., `gateway.btfs.io`.
- The `Alias` record redirects any access to `example.com` to the specified gateway. Hence the browser's request to `https://{example.com}/{optional path to resource}` redirects to the gateway specified in the `Alias`.
- The gateway employs DNSLink resolution to return the current content version from BTFS.
- The browser does not perceive the gateway as the origin of the content and therefore enforces the same-origin policy to protect `example.com`.
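The TXT-record search in the steps above can be sketched as follows. This is a minimal illustration, not the gateway's actual implementation: the function names are hypothetical, the TXT records are passed in as plain strings (the DNS query itself is left out), and only the two string forms quoted in the text are recognized.

```python
import re

# Matches the two forms given in the text:
# dnslink=/btfs/{CID} and _dnslink=/btfs/{CID}.
_DNSLINK = re.compile(r"^_?dnslink=/btfs/([^/\s]+)$")


def cid_from_txt_records(txt_records):
    """Return the CID from the first DNSLink-style TXT record, or None."""
    for record in txt_records:
        # Some DNS tools print TXT values wrapped in double quotes.
        match = _DNSLINK.match(record.strip().strip('"'))
        if match:
            return match.group(1)
    return None


def resolve_dnslink(txt_records, optional_path=""):
    """Build the btfs://{CID}/{optional path} address that the gateway would serve."""
    cid = cid_from_txt_records(txt_records)
    if cid is None:
        return None
    suffix = "/" + optional_path.lstrip("/") if optional_path else ""
    return "btfs://" + cid + suffix
```

Records that do not match either form, such as SPF entries, are skipped, and `None` signals that the domain has no DNSLink.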
Gateway services
Currently, HTTP gateways may access both BTFS and BTNS services:
Service | Style | The canonical form of access |
---|---|---|
BTFS | path | https://{gateway URL}/btfs/{CID}/{optional path to resource} |
BTFS | subdomain | https://{CID}.btfs.{gatewayURL}/{optional path to resource} |
BTFS | DNSLink | https://{example.com}/{optional path to resource} preferred, or https://{gateway URL}/btns/{example.com}/{optional path to resource} |
BTNS | path | https://{gateway URL}/btns/{BTNS identifier}/{optional path to resource} |
BTNS | subdomain | https://{BTNS identifier}.btns.{gatewayURL}/{optional path to resource} |
BTNS | DNSLink | Useful when the BTNS identifier is a domain: https://{example.com}/{optional path to resource} preferred, or https://{gateway URL}/btns/{example.com}/{optional path to resource} |
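The path and subdomain forms in the table can be generated with two small helpers. This is a sketch for illustration only: the function names and the `gateway.example` host used in the usage note are hypothetical, and no validation of the CID or BTNS identifier is attempted.

```python
def _check_service(service):
    if service not in ("btfs", "btns"):
        raise ValueError("service must be 'btfs' or 'btns'")


def path_url(gateway_host, service, identifier, resource=""):
    """Path style: https://{gateway}/{service}/{identifier}/{resource}."""
    _check_service(service)
    url = "https://" + gateway_host + "/" + service + "/" + identifier
    return url + "/" + resource.lstrip("/") if resource else url


def subdomain_url(gateway_host, service, identifier, resource=""):
    """Subdomain style: https://{identifier}.{service}.{gateway}/{resource}."""
    _check_service(service)
    url = "https://" + identifier + "." + service + "." + gateway_host + "/"
    return url + resource.lstrip("/")
```

For example, `path_url("gateway.example", "btfs", "CID", "docs/index.html")` yields `https://gateway.example/btfs/CID/docs/index.html`, while `subdomain_url("gateway.example", "btfs", "CID", "docs/index.html")` yields `https://CID.btfs.gateway.example/docs/index.html`.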
Best practices
Various best practices for the use of BTFS gateways are listed below.
Selecting a gateway type to use
The preferred form of gateway access varies depending on the nature of the targeted content. Each gateway type and how it works is described in the Gateway types section above.
Target | Preferred gateway type | Canonical form of access, features, and considerations |
---|---|---|
Current version of potentially mutable root | BTNS subdomain | https://{BTNS identifier}.btns.{gatewayURL}/{optional path to resource} + supports cross-origin security + supports cross-origin resource sharing + suitable for both domain BTNS names ({domain.tld}) and hash BTNS names |
Current version of potentially mutable root | BTFS DNSLink | https://{example.com}/{optional path to resource} + supports cross-origin security + supports cross-origin resource sharing - requires DNS update to propagate the change to root content - DNSLink, not user/app, specifies the gateway to use, opening up potential gateway trust and congestion issues |
Immutable root or content | BTFS subdomain | https://{CID}.btfs.{gatewayURL}/{optional path to resource} + supports cross-origin security + supports cross-origin resource sharing |
Any form of gateway provides a bridge for apps without native support for BTFS. Better performance and security result from a native BTFS implementation within an app.
Self-hosting a gateway
If you are running a BTFS node that is also configured as a BTFS gateway, each of the tips below can help improve the discovery and retrievability of your CIDs.
- Pin your CIDs to multiple BTFS nodes to ensure reliable availability and resilience to failures of nodes and network partitions.
- Use a custom domain that you control as your BTFS gateway for flexibility in implementing performance optimizations. You can do this using one of the following methods:
  - Point a domain you control like `mydomain.btfs.yourdomain.io` to a reverse proxy like nginx, which will proxy requests to a public gateway, allowing you to switch public gateways if there's downtime.
  - Use a service like Cloudflare Workers or Fastly Compute@Edge to implement a lightweight reverse proxy to a gateway.
- Set up peering with the pinning services that pin your CIDs.
- Make sure that your node is publicly reachable.
  - You can check the reachability of your node by running `btfs id` and checking for the `/btfs/kad/1.0.0` value in the list of protocols (or, in one command, by running `btfs id | grep btfs\/kad`).
  - If your node is not reachable because you are behind NAT, see the NAT configuration docs.
- Ensure that you are correctly returning HTTP cache headers to the client if the BTFS gateway node is behind a reverse proxy. Pay extra attention to `Etag`, `Cache-Control`, and `Last-Modified` headers. Consider leveraging the list of CIDs in `X-Ipfs-Roots` for smarter HTTP caching strategies.
- Put a CDN like Cloudflare in front of the BTFS gateway.
- Test and monitor your internet connection speed with a tool like Speedtest CLI.
- Monitor disk I/O with a tool like iotop or iostat, and make sure that no other processes are causing disk I/O bottlenecks.
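The reachability check from the list above can be scripted. The sketch below mirrors the `grep` approach and makes no assumption about the exact output format of `btfs id`; it only looks for the `btfs/kad` substring. The function names are hypothetical, and `node_is_reachable` assumes the `btfs` binary is on the `PATH`.

```python
import subprocess


def advertises_kad(id_output):
    """Return True if any line of the `btfs id` output mentions btfs/kad."""
    return any("btfs/kad" in line for line in id_output.splitlines())


def node_is_reachable(command=("btfs", "id")):
    """Run `btfs id` and apply the same check as piping its output through grep."""
    result = subprocess.run(list(command), capture_output=True, text=True, check=True)
    return advertises_kad(result.stdout)
```

Keeping the text check separate from the subprocess call makes it possible to test the check against saved output without a running node.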
Avoiding centralization
Use of a gateway requires location-based addressing: https://{gatewayURL}/btfs/{CID}/{etc}
All too easily, the gateway URL can become the handle by which users identify the content; i.e., the uniform resource locator (URL) equates (improperly) to the uniform resource identifier (URI). Now imagine that the gateway goes offline or cannot be reached from a different user's location because of firewalls. At that point, content improperly identified by that gateway-based URL also appears unreachable, defeating a key benefit of BTFS: decentralization.
Similarly, the use of DNSLink resolution with `Alias` forces requests through the domain's chosen gateway, as specified in the `dnslink={value}` string within the DNS TXT record. If the specified gateway becomes overloaded, goes offline, or becomes compromised, all traffic to that content becomes delayed, disabled, or suspect.
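One way a client can avoid tying content to a single gateway is to keep the CID as the identifier and treat gateways as interchangeable. The sketch below is a hypothetical illustration: the function names are invented for this example, and `fetch` is any caller-supplied callable that takes a URL, returns bytes, and raises `OSError` on failure.

```python
def candidate_urls(cid, gateways, resource=""):
    """Build one path-style URL per gateway host for the same CID."""
    suffix = "/" + resource.lstrip("/") if resource else ""
    return ["https://" + host + "/btfs/" + cid + suffix for host in gateways]


def fetch_with_fallback(cid, gateways, fetch, resource=""):
    """Try each gateway in order and return (url, content) from the first success."""
    errors = []
    for url in candidate_urls(cid, gateways, resource):
        try:
            return url, fetch(url)
        except OSError as exc:  # urllib.error.URLError is a subclass of OSError
            errors.append(url + ": " + str(exc))
    raise ConnectionError("all gateways failed: " + "; ".join(errors))
```

With the standard library, `fetch` could be `lambda url: urllib.request.urlopen(url, timeout=10).read()`. The list of gateways is the caller's choice.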
Use subdomain gateway resolution for origin isolation
To prevent one website from improperly accessing HTTP session data associated with a different website, the same-origin policy permits script access only to pages that share a common domain name and port.
Consider two CIDs each representing a different website accessed with the path resolution style:
https://btfs.io/{CID A}/{website A}
https://btfs.io/{CID B}/{website B}
Because their origins (hostname and port) are the same, the same-origin policy does not isolate the two websites from each other.
To ensure the security provided by the same-origin policy, use the subdomain gateway:
https://{CID A}.btfs.{gatewayURL}/{website A}
https://{CID B}.btfs.{gatewayURL}/{website B}
With path resolution, a browser employing one gateway to access both sites might not enforce that security policy. From that browser's perspective, both pages share a common origin: the gateway as identified in the URL `https://{gatewayURL}/...`.
The use of subdomain gateways avoids violating the same-origin policy. In this situation, the gateway's reference to the two pages becomes:
https://{CID A}.btfs.{gatewayURL}/{webpage A}
https://{CID B}.btfs.{gatewayURL}/{webpage B}
These pages do not share the same origin. Similarly, the use of a DNSLink gateway avoids violating the same-origin policy.
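The difference between the two styles can be checked by comparing origins directly. The sketch below computes an origin as the (scheme, hostname, port) triple, which is what browsers compare; the helper names, the `gateway.example` host, and the placeholder CIDs in the usage note are hypothetical.

```python
from urllib.parse import urlsplit

_DEFAULT_PORTS = {"http": 80, "https": 443}


def origin(url):
    """Return the (scheme, hostname, port) triple that defines a URL's origin."""
    parts = urlsplit(url)
    port = parts.port or _DEFAULT_PORTS.get(parts.scheme)
    return (parts.scheme, parts.hostname, port)


def same_origin(url_a, url_b):
    """Return True if both URLs share scheme, hostname, and port."""
    return origin(url_a) == origin(url_b)
```

Two path-style URLs such as `https://gateway.example/btfs/cida/` and `https://gateway.example/btfs/cidb/` compare as the same origin, whereas `https://cida.btfs.gateway.example/` and `https://cidb.btfs.gateway.example/` do not.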
Cross-origin resource sharing (CORS)
CORS allows a webpage to permit access to specified data by pages with a different origin.
Gateway man-in-the-middle vulnerability
Employing a public or private HTTP gateway sacrifices end-to-end cryptographic validation of the delivery of the correct content. Consider the case of a browser fetching content with the URL `https://ExampleGateway.com/btfs/{cid}`. A compromised `ExampleGateway.com` provides man-in-the-middle vulnerabilities, including:
- Substituting false content in place of the actual content retrieved via the CID.
- Diverting a copy of the query and response, as well as the IP address of the querying browser, to a third party.
A compromised writable gateway may inject falsified content into the BTFS network, returning a CID that the user believes to refer to the true content. For example:
- Alice posts a balance of 123.54 to a compromised writable gateway.
- The gateway is currently storing a balance of 0.00, so it returns the CID of the falsified content to Alice.
- Alice gives the falsified content CID to Bob.
- Bob fetches the content with this CID and cryptographically validates the balance of 0.00.
To partially address this exposure, you may wish to use the public gateway gateway.btfs.io as an independent, trusted reference with both same-origin policy and CORS support.
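Using a second gateway as an independent reference can be sketched as a simple cross-check. This is a hypothetical illustration, not a substitute for validating content against its CID: the function name is invented, and both fetchers are caller-supplied callables that take a CID and return bytes.

```python
def cross_check(cid, fetch_primary, fetch_reference):
    """Return the content for `cid` only if both gateways deliver identical bytes."""
    primary = fetch_primary(cid)
    reference = fetch_reference(cid)
    if primary != reference:
        raise ValueError("gateways disagree on the content for " + cid)
    return primary
```

A mismatch shows only that at least one of the two gateways returned unexpected content; it does not say which one, and agreement does not prove correctness if both gateways are compromised.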
Assumed filenames when downloading files
When downloading files, browsers will usually guess a file's filename by looking at the last component of the path; e.g., `https://{domainName/tld}/{path}/userManual.pdf` downloads a file stored locally with the name `userManual.pdf`. Unfortunately, when linking directly to a file with no containing directory in BTFS, the CID becomes the final component. Storing the downloaded file with the filename set to the CID fails the human-friendly design test.
To work around this issue, you can add a `?filename={filename.ext}` parameter to your query string to preemptively specify a name for the locally-stored downloaded file:
Style | Query |
---|---|
Path | https://{gatewayURL}/btfs/{CID}/{optional path to resource}?filename={filename.ext} |
Subdomain | https://{CID}.btfs.{gatewayURL}/{optional path to resource}?filename={filename.ext} |
DNSLink | https://{example.com}/{optional path to resource} or https://{gatewayURL}/btns/{example.com}/{optional path to resource}?filename={filename.ext} |
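Appending the parameter by hand is error-prone when the filename contains spaces or the URL already has a query string. The helper below is a small sketch; the function name and the `gateway.example` host in the usage note are hypothetical.

```python
from urllib.parse import quote


def with_filename(url, filename):
    """Append a percent-encoded filename parameter to a gateway URL."""
    separator = "&" if "?" in url else "?"
    return url + separator + "filename=" + quote(filename, safe="")
```

For example, `with_filename("https://gateway.example/btfs/CID", "userManual.pdf")` gives `https://gateway.example/btfs/CID?filename=userManual.pdf`.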
Stale caches
A gateway may cache DNSLinks from DNS TXT records, which default to a one-hour lifetime. After content changes, cached DNSLinks continue to refer to the now-obsolete CID. To limit the delivery of obsolete cached content, the domain operator should set the DNS record's time-to-live (TTL) parameter to one minute (60 seconds).
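The effect of the TTL on staleness can be illustrated with a toy cache. This sketch is hypothetical and does not describe how any particular gateway caches DNSLinks: `DnsLinkCache`, its injected `clock`, and the `lookup` callable are all invented for the example.

```python
class DnsLinkCache:
    """Toy TTL cache: a cached CID is reused until ttl_seconds have elapsed."""

    def __init__(self, ttl_seconds, clock):
        self.ttl = ttl_seconds
        self.clock = clock      # callable returning the current time in seconds
        self.entries = {}       # domain -> (cid, time the entry was cached)

    def resolve(self, domain, lookup):
        """Return the cached CID if it is still fresh; otherwise call lookup(domain)."""
        now = self.clock()
        cached = self.entries.get(domain)
        if cached is not None and now - cached[1] < self.ttl:
            return cached[0]
        cid = lookup(domain)
        self.entries[domain] = (cid, now)
        return cid
```

With a 60-second TTL, a change to the record is picked up at most 60 seconds after the entry was cached; with a one-hour TTL, the obsolete CID can be served for up to an hour.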