Architecture
Last updated
Streams are the core mechanism for passing, updating, transforming, and merging data on the platform. All streams are append-only and carry data from a single producer to multiple consumers.
There are two types of storage our platform can work with: Kafka and our own storage called StreamDB.
Kafka does a good job of storing streaming data and serving it to distributed applications. However, it doesn't fully fit our needs, for the following reasons.
Its notion of event offsets is implementation-defined, which makes it hard to look up an event by its position in the stream.
Kafka doesn't provide an easy way to read a stream backwards. That makes our design of reorg handling painful to implement, because locating a point in the stream history requires searching by non-increasing properties of events.
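To illustrate the kind of lookup this requires, here is a minimal sketch (our own illustration, not the StreamDB implementation): events in a chain-derived stream carry a block number that never decreases along the stream, so a rollback point can be found with a binary search over that property instead of a linear scan backwards.

```python
# Sketch: locating a reorg point in an append-only event log.
# Each event carries a block number that is non-decreasing along the
# stream, so a binary search finds the last event at or below a target
# height -- the operation that is awkward to express over Kafka offsets.
import bisect

def find_rollback_point(block_numbers, target_height):
    """Index of the last event whose block number is <= target_height,
    or -1 if every event is above the target."""
    # bisect_right gives the first index with a block number > target_height
    idx = bisect.bisect_right(block_numbers, target_height)
    return idx - 1

heights = [100, 100, 101, 103, 103, 104]
assert find_rollback_point(heights, 103) == 4
assert find_rollback_point(heights, 99) == -1
```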
All of these problems can be solved at the application level, but doing so becomes computationally demanding very quickly. We therefore designed our own service to solve them.
StreamDB is a simple centralized stream storage based on flat files and designed specifically for the blockchain use case. It natively supports rollbacks in the form of "undo" events. For that we introduce the notion of state_id: a unique identifier of a point in the event history that is unaffected by any forks happening before that point. For example, the event series "A, B, undo B, C" and "A, C, D, undo D" lead to states with the same state_id. This lets an application save its state and continue processing streams even if the input has changed for some reason.
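The state_id property can be sketched as follows. This is an illustration of the idea, not the StreamDB implementation: we derive an identifier from the *effective* event history, so that histories differing only in applied-and-undone events map to the same state_id.

```python
# Illustrative sketch (not the actual StreamDB implementation): a
# state_id derived from the effective event history. Events undone by
# an "undo" event do not contribute to the identifier.
import hashlib

def state_id(events):
    stack = []
    for ev in events:
        if ev.startswith("undo "):
            undone = ev[len("undo "):]
            # an undo event rolls back the most recent applied event
            assert stack and stack[-1] == undone, "can only undo the latest event"
            stack.pop()
        else:
            stack.append(ev)
    # hash the surviving events to get a fork-insensitive identifier
    return hashlib.sha256("\x00".join(stack).encode()).hexdigest()

# The example from the text: both histories reduce to [A, C].
assert state_id(["A", "B", "undo B", "C"]) == state_id(["A", "C", "D", "undo D"])
```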
Streams are created and consumed by streaming applications, which can:
Fetch external data and produce a stream from it (e.g., discover Ethereum block headers, pricing data, etc);
Transform and filter existing streams (e.g., extract ERC20 transfers from Ethereum blocks, filter out the transactions of a single account, etc);
Aggregate streams into a stream that summarizes the most important information into a smaller series of events (e.g., total exchange volume within a time period);
Merge many streams into a single one (e.g., combine DEX events from multiple chains into one stream).
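A transform-style application of the second kind can be sketched as a generator that consumes one stream and yields another. This is a hypothetical illustration: the block and log field names loosely mirror Ethereum's log structure, and the placeholder topic constant stands in for the real keccak256 hash of the Transfer event signature; none of this is the platform's actual API.

```python
# Hypothetical sketch of a transform-style streaming application:
# consume a stream of blocks, produce a derived stream of ERC20
# transfer events. "transfer" is a placeholder for the real
# keccak256("Transfer(address,address,uint256)") topic hash.
TRANSFER_TOPIC = "transfer"

def erc20_transfers(blocks):
    """Map a block stream to a transfer-event stream."""
    for block in blocks:
        for log in block.get("logs", []):
            if log["topics"][0] == TRANSFER_TOPIC:
                yield {
                    "token": log["address"],
                    "from": log["topics"][1],
                    "to": log["topics"][2],
                    "block": block["number"],
                }

blocks = [{"number": 1, "logs": [
    {"address": "0xToken", "topics": ["transfer", "0xA", "0xB"]},
    {"address": "0xToken", "topics": ["other"]},
]}]
assert list(erc20_transfers(blocks)) == [
    {"token": "0xToken", "from": "0xA", "to": "0xB", "block": 1}]
```

Because the output is itself an append-only stream, the same shape composes with the filtering, aggregation, and merging applications listed above.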
Blockchain nodes are unreliable: they lose data and go offline often. To overcome these issues, Proxima uses blockchain indexers and blockchain-native APIs that transform blocks and transactions into events for Proxima streams. These indexers serve as a caching layer between the blockchains and our services and enable faster processing of blocks in streaming applications. Access to the indexers is currently limited to streaming applications run by Proxima.