Shard

SQLstream s-Server pipeline processing can be scaled up by running multiple instances or "shards" of a pipeline.

Each shard executes an identical SQL pipeline, but each selects a different subset of the source data. Data partitions are distributed across the shards by configuration (details of the how the data is partitioned, and how the shards are configured, varies by plugin).

The shard could be executed on a bare-metal service or could be containerized in a pod. When working on a kubernetes cluster, usually each shard is mapped to a single pod. The pod contains the SQL to be executed, and is configured with the details of that define the shard and which partitions should be processed.

For more information about how SQLstream s-Server assists in sharding SQLstream pipelines see the individual plugins:

If you launch N shards to process your P partitions, the partitions should be distributed as uniformly as possible among the N shards. Each shard operates independently of the other shards - there is no runtime coordination among multiple shards.

Each shard can process multiple partitions, however, a single partition should not be processed by more than one shard. The number of partitions should always be greater than or equal to the number of shards i.e. P >= N. If you launch more shards than there are partitions (N > P), then some shards would remain idle as no partitions would be assigned to them.

Below are some sample scenarios for better understanding of distribution of partitions among shards: (shards and partitions are 0 indexed).

No. of Partitions (P) No. of shards (N) Notes Shard -> {Partition list}
10 1 All partitions assigned to single shard. 0 → {0,1,2,3,4,5,6,7,8,9}
10 2 Even distribution of partitions as P is a multiple of N. 0 → {0,1,2,3,4}
1 → {5,6,7,8,9}
10 3 Uneven distribution of partitions as P is NOT a multiple of N. 0 → {0,1,2,3}
1 → {4,5,6}
2 → {7,8,9}
10 4 Uneven distribution of partitions as P is NOT a multiple of N. 0 → {0,1,2}
1 → {3,4,5}
2 → {6,7}
3 → {8,9}
10 5 Even distribution of partitions as P is a multiple of N. 0 → {0,1}
1 → {2,3}
2 → {4,5}
3 → {6,7}
4 → {8,9}
2 1 All partitions assigned to single shard. 0 → {0,1}
2 2 Even distribution of partitions as P = N. 0 → {0}
1 → {1}
2 3 More shards than partitions: P < N; Shard 2 has no partitions assigned so remains idle. 0 → {0}
1 → {1}
2 → {}