A blockchain is essentially a type of data storage, but one that differs considerably from the ‘traditional’ databases many of us are used to. The characteristics described below set it apart from the well-known relational and NoSQL databases. It is important to understand these differences when considering whether blockchain technology is appropriate for your use case.
A blockchain is designed to store transactions grouped into blocks, which may then be queried about their characteristics. Blockchains can be public or private. In a public blockchain, anyone on the network can read from and append to the database, so access is “uncontrolled”. In a private blockchain, by contrast, read and append capabilities are “controlled”.
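To make the idea of transactions grouped into queryable blocks concrete, here is a minimal sketch. The `Transaction` and `Block` types, their fields, and the `blocks_involving` helper are all hypothetical names chosen for illustration; real blockchain platforms define their own block formats and query interfaces.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Transaction:
    # Hypothetical fields; real platforms carry far more detail.
    sender: str
    recipient: str
    amount: int


@dataclass
class Block:
    height: int  # position of the block in the chain
    transactions: List[Transaction] = field(default_factory=list)


def blocks_involving(chain: List[Block], party: str) -> List[int]:
    """Return the heights of blocks containing a transaction that names `party`."""
    return [
        block.height
        for block in chain
        if any(party in (tx.sender, tx.recipient) for tx in block.transactions)
    ]


chain = [
    Block(0, [Transaction("alice", "bob", 5)]),
    Block(1, [Transaction("bob", "carol", 2), Transaction("dave", "erin", 7)]),
    Block(2, [Transaction("erin", "alice", 1)]),
]
print(blocks_involving(chain, "alice"))  # [0, 2]
```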
A blockchain is also a distributed and decentralized type of data storage, in that it propagates multiple copies of the shared ledger across blockchain nodes. Each node on the blockchain network stores and maintains its own copy of the shared ledger. Through a network consensus protocol specific to the particular blockchain technology, network nodes agree on which transaction blocks are valid and will be appended to the shared ledger, and then synchronize their copies of the shared ledger accordingly. This architecture is what contributes to the resilience of the blockchain. Multiple nodes can fail and, as long as a critical mass of blockchain nodes remains, the data integrity of the shared ledger is not affected.
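The agree-then-synchronize cycle described above can be sketched roughly as follows. This is a deliberately simplified strict-majority vote, not the consensus protocol of any particular blockchain; the function names, node names, and block IDs are assumptions made for the example.

```python
from collections import Counter
from typing import Dict, List, Optional


def agreed_block(votes: Dict[str, str]) -> Optional[str]:
    """Return the block ID backed by a strict majority of nodes, else None."""
    if not votes:
        return None
    block_id, count = Counter(votes.values()).most_common(1)[0]
    return block_id if count > len(votes) / 2 else None


def synchronize(ledgers: Dict[str, List[str]], votes: Dict[str, str]) -> bool:
    """Append the agreed block to every node's copy; do nothing without a majority."""
    winner = agreed_block(votes)
    if winner is None:
        return False
    for ledger in ledgers.values():
        ledger.append(winner)
    return True


# Four hypothetical nodes, each holding its own copy of the ledger.
ledgers = {name: ["genesis"] for name in ("n1", "n2", "n3", "n4")}
votes = {"n1": "block-A", "n2": "block-A", "n3": "block-A", "n4": "block-B"}
synchronize(ledgers, votes)
print(ledgers["n4"])  # ['genesis', 'block-A']
```

Even the dissenting node ends up with the majority's block, which mirrors how each copy of the ledger converges after agreement. Production protocols such as proof of work or Byzantine fault tolerant voting add far stronger guarantees than this sketch.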
The amount and types of data that can be practically stored on the blockchain are determined by the particular blockchain technology. Data storage restrictions on public blockchains typically stem from technological or practical limitations, or from economic or ethical considerations. In private networks, these restrictions do not necessarily exist. For example, it may not be economically feasible to store a large data set on a public blockchain, yet doing so may be feasible or even desirable with one of the private, permissioned technologies. Performance, as well as privacy, security, and compliance, should be taken into consideration when determining what data should be stored on blockchains. The general guidance is to store on the blockchain only the minimum data that is sufficient for the use case.
The data stored on the blockchain shared ledger are effectively immutable: changing any previously recorded transaction would require considerable computational power, control, collusion, and expense, which makes it impractical, especially as blockchain networks grow larger.
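One reason tampering is so costly is that each block typically embeds a hash of the block before it, so altering an old transaction invalidates every later block. The sketch below illustrates only that hash-linking idea under simplifying assumptions; the block layout and function names are hypothetical, and it does not model mining, consensus, or the network-wide collusion an attacker would also need.

```python
import hashlib
import json
from typing import Dict, List

GENESIS_PREV = "0" * 64  # placeholder "previous hash" for the first block


def block_hash(block: Dict) -> str:
    """SHA-256 over the block's contents, including the previous block's hash."""
    payload = json.dumps(
        {"prev_hash": block["prev_hash"], "transactions": block["transactions"]},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def build_chain(batches: List[List[str]]) -> List[Dict]:
    """Link each batch of transactions to the hash of the block before it."""
    chain: List[Dict] = []
    prev = GENESIS_PREV
    for txs in batches:
        block = {"prev_hash": prev, "transactions": list(txs)}
        block["hash"] = block_hash(block)
        chain.append(block)
        prev = block["hash"]
    return chain


def is_valid(chain: List[Dict]) -> bool:
    """Check every stored hash and every link back to the previous block."""
    prev = GENESIS_PREV
    for block in chain:
        if block["prev_hash"] != prev or block["hash"] != block_hash(block):
            return False
        prev = block["hash"]
    return True


ledger = build_chain([["a->b:5"], ["b->c:2"], ["c->a:1"]])
print(is_valid(ledger))  # True
ledger[0]["transactions"][0] = "a->b:500"  # tamper with an old transaction
print(is_valid(ledger))  # False
```

To hide the change, an attacker would have to recompute the hash of the altered block and of every block after it, and then persuade the rest of the network to accept the rewritten chain.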
Off-chain storage refers to storing data off the blockchain, for example in a relational database. The on-chain data can then hold metadata about this off-chain data, together with pointers to where the actual data resides and hashes that may be used to verify the integrity of the off-chain data. Blockchain can also be used for identity and access control, in other words as a mechanism to control access privileges to the data stored off-chain.
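The pointer-plus-hash pattern can be sketched as follows. The `OnChainRecord` type, the `db://contracts/42` location string, and the in-memory dictionary standing in for an off-chain database are all assumptions made for illustration.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class OnChainRecord:
    location: str    # pointer to where the off-chain data resides
    sha256: str      # hash used to verify the off-chain data
    size_bytes: int  # example of additional metadata


# Stand-in for an off-chain store such as a relational database.
off_chain_store: Dict[str, bytes] = {}


def store_off_chain(location: str, data: bytes) -> OnChainRecord:
    """Save the data off-chain and return the small record to put on-chain."""
    off_chain_store[location] = data
    return OnChainRecord(location, hashlib.sha256(data).hexdigest(), len(data))


def verify(record: OnChainRecord) -> bool:
    """Re-hash the off-chain data and compare it with the on-chain hash."""
    data = off_chain_store.get(record.location)
    return data is not None and hashlib.sha256(data).hexdigest() == record.sha256


record = store_off_chain("db://contracts/42", b"contract text v1")
print(verify(record))  # True
```

Only the small record needs to live on-chain. If the off-chain copy is later modified or deleted, `verify` returns `False`, because the stored hash no longer matches.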
Several distributed, decentralized data storage technologies have emerged to complement blockchain, including the InterPlanetary File System (IPFS), Storj, and Filecoin, deriving inspiration from the successful BitTorrent protocol. As with blockchain mining, anyone can participate in these distributed storage ecosystems, in this case by “leasing” out storage space on their computers. These systems combine various technologies such as encryption, file sharding, distributed hash tables (DHT) for finding the encrypted shards, and, in some cases, some form of consensus protocol, such as proof of storage.
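As a rough illustration of sharding combined with hash-based lookup, the sketch below splits a file into fixed-size shards and stores each one under its SHA-256 digest. It is not how IPFS, Storj, or Filecoin are implemented: a single dictionary stands in for a hash table that would really be spread across many peers, encryption is omitted, and the shard size and function names are assumptions.

```python
import hashlib
from typing import Dict, List

SHARD_SIZE = 4  # bytes; unrealistically small, chosen for illustration


def shard(data: bytes, size: int = SHARD_SIZE) -> List[bytes]:
    """Split data into consecutive pieces of at most `size` bytes."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def put(table: Dict[str, bytes], data: bytes) -> List[str]:
    """Store each shard under its SHA-256 digest; return the ordered keys."""
    keys = []
    for piece in shard(data):
        key = hashlib.sha256(piece).hexdigest()
        table[key] = piece
        keys.append(key)
    return keys


def get(table: Dict[str, bytes], keys: List[str]) -> bytes:
    """Look up each shard by key, verify it, and reassemble the original data."""
    pieces = []
    for key in keys:
        piece = table[key]
        if hashlib.sha256(piece).hexdigest() != key:
            raise ValueError(f"shard {key[:8]} failed verification")
        pieces.append(piece)
    return b"".join(pieces)


table: Dict[str, bytes] = {}  # stand-in for a distributed hash table
manifest = put(table, b"hello distributed storage")
print(get(table, manifest) == b"hello distributed storage")  # True
```

Because each key is the hash of its shard, whoever retrieves a shard can check that it has not been altered, regardless of which participant stored it.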