Set up a Full Node
If you're building dApps or products on a Substrate-based chain like Polkadot, Kusama, or a custom Substrate implementation, you want the ability to run a node-as-a-back-end. After all, relying on your own infrastructure is always better than relying on a third-party-hosted one in this brave new decentralized world.
This guide will show you how to connect to the Polkadot network, but the same process applies to any other Substrate-based chain. First, let's clarify the term full node.
Types of Nodes
A blockchain's growth comes from a genesis block, extrinsics, and events.
When a validator seals block 1, it takes the blockchain's state at block 0, applies all pending changes on top of it, and emits the events resulting from these changes. Later, the chain's state at block 1 is used the same way to build the chain's state at block 2, and so on. Once more than two-thirds of the validators agree that a specific block is valid, it is finalized.
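The block-building process described above can be sketched as a toy model. This is only an illustration, not how the Substrate runtime is implemented: the `Transfer` extrinsic, the balance-map state, the event tuples, and the function names are all hypothetical. Finality is modeled as strictly more than two-thirds of validator votes, which is the supermajority threshold.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transfer:
    """Hypothetical extrinsic: move `amount` from `sender` to `recipient`."""
    sender: str
    recipient: str
    amount: int


def apply_block(prev_state, extrinsics):
    """Build the next state from the previous one and return (state, events)."""
    state = dict(prev_state)  # never mutate the parent block's state
    events = []
    for ext in extrinsics:
        if state.get(ext.sender, 0) >= ext.amount:
            state[ext.sender] = state.get(ext.sender, 0) - ext.amount
            state[ext.recipient] = state.get(ext.recipient, 0) + ext.amount
            events.append(("Transfer", ext.sender, ext.recipient, ext.amount))
        else:
            # A failed extrinsic changes nothing but still emits an event.
            events.append(("ExtrinsicFailed", ext.sender))
    return state, events


def is_finalized(votes, validators):
    """True when strictly more than two-thirds of validators voted for the block."""
    return 3 * votes > 2 * validators


genesis = {"alice": 100, "bob": 0}                      # state at block 0
state_1, events_1 = apply_block(genesis, [Transfer("alice", "bob", 30)])
state_2, events_2 = apply_block(state_1, [Transfer("bob", "alice", 50)])
```

Each state depends only on its parent state and the extrinsics in the block, which is why a node that has all blocks can, in principle, recompute any historical state by replaying them from genesis.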
An archive node keeps all the past blocks and their states, which makes it convenient to query the past state of the chain at any point in time. Finding out what an account's balance was at a particular block, or which extrinsics resulted in a specific state change, are fast operations when using an archive node. However, an archive node takes up a lot of disk space: around Kusama's 12 millionth block, this was around 660 GB.
On the Paranodes or Stakeworld websites, you can find lists of the database sizes of Polkadot and Kusama nodes.
Archive nodes are used by utilities that need past information - like block explorers, council scanners, discussion platforms like Polkassembly, and others. They need to be able to look at past on-chain data.
A full node prunes historical states: it discards the states of all finalized blocks older than a configurable number of blocks, except the genesis block's state. By default, this is 256 blocks from the last finalized one. A node pruned this way requires much less space than an archive node.
A full node could, in principle, rebuild every block's state without additional information and become an archive node, but this is not yet implemented at the time of writing. If you need to query historical block states older than what your node retains, you must purge your database and resync your node in archive mode. Alternatively, you can restore a backup or snapshot from a trusted source, so that you do not need to sync from genesis with the network and only need the states of blocks past that snapshot.
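As a rough illustration of this pruning rule, the sketch below keeps the genesis state plus the states inside a window counted back from the last finalized block. The function names and the in-memory dictionary are hypothetical, and the exact boundary of the window is an assumption. A real node performs pruning inside its database backend.

```python
def prune_states(states, last_finalized, keep_blocks=256):
    """Return only the block states a pruned node would still hold.

    `states` maps block number -> state. Genesis (block 0) is always kept;
    otherwise only blocks newer than `last_finalized - keep_blocks` survive.
    The exact window boundary is an assumption made for this sketch.
    """
    cutoff = last_finalized - keep_blocks
    return {n: s for n, s in states.items() if n == 0 or n > cutoff}


def can_query_state(block_number, pruned_states):
    """A historical state query succeeds only if that state was retained."""
    return block_number in pruned_states


# Hypothetical chain with blocks 0..1000, all finalized.
all_states = {n: f"state-{n}" for n in range(1001)}
pruned = prune_states(all_states, last_finalized=1000)
```

With these numbers, the pruned node can still answer queries for blocks 745 through 1000 and for genesis, while a query for block 500 would need an archive node.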
Full nodes allow you to read the current state of the chain and to submit and validate extrinsics directly on the network without relying on a centralized infrastructure provider.
Another type of node is a light node. A light node has only the runtime and the current state. It does not store past blocks, so it cannot read historical data without requesting it from a node that has it. Light nodes are useful for resource-restricted devices. An interesting use case for light nodes is a browser extension that is a node in its own right and runs the runtime in WASM format. Another is a full or light node that is completely encapsulated in WASM and can be integrated into web apps: https://github.com/smol-dot/smoldot.
Substrate Connect provides a way to interact with Substrate-based blockchains in the browser without using an RPC server. It is a light node that runs entirely in JavaScript. Substrate Connect uses a smoldot WASM light client to securely connect to the blockchain network without relying on specific third parties. Substrate Connect is available on Chrome and Firefox as a browser extension.