Instead, I'm going to characterize the various "shared-nothing" style systems by their query evaluation architectures. Most systems fall into one of the following buckets:
- Centralized compute
- Limited distributed compute
- Fully distributed compute
Centralized Compute: MySQL Cluster
MySQL Cluster consists of two basic roles used for servicing user queries: a compute role and a storage/data role. The compute node is the front end which takes in the query, plans it, and executes it. The compute node will communicate with the storage nodes remotely to fetch any data relevant to the query.
In the distributed storage model, data is no longer shared between the nodes at the page level. Instead, the storage nodes expose a higher level API which allows the compute node to fetch row ranges based on the available access paths (i.e. indexes).
In such a system, storage level locks associated with the data are now managed exclusively by the storage node itself. A compute node does not cache any data; instead, it always asks the set of storage nodes responsible for the data. The system solved the cache coherence overhead problem.
However, it still suffers from extensive data movement and centralized query evaluation.
Cache coherence overhead- Extensive data movement
- Centralized query evaluation
MySQL Query Evaluation in Action
Consider the following example:
SELECT count(*) FROM mytable WHERE acol = 1 and bcol = 2
Assumptions:
- an index over acol
- 10% of the rows in the table match acol = 1
- 3% of the rows match acol = 1 and bcol = 2
- total table size 1 Billion rows
In the diagram above, the arrows represent the flow of data through the system. As you can see, most of the query evaluation in the example is done by a single compute node. The system generated a data movement of 100 million rows, and only a single node performed additional filtering and aggregate count.
It's an improvement over a shared disk system, but it still has some serious limitations. Such a system could be well suited for simple key access (i.e. query touches a few specific rows), but any more complexity will generally result in poor performance
As with the shared disk system, adding more nodes will not help improve single query execution, and queries which operate over large volumes of data have the potential to saturate the message bus between the nodes.
MySQL Cluster does two types of condition pushdowns.
ReplyDeleteThe first was introduced in MySQL 5.0 and solves the
problem described above. So the actual filtering is
done by the data node, thus in this case 30 million
rows would be transported.
The second type of condition pushdown is introduced in
MySQL Cluster 7.2 where an entire join query can be
pushed to the storage nodes and evaluated.
Miakel, it's interesting to see that MySQL Cluster does some limited predicate pushdowns. I suppose I could change my example to having a non-trivial expression constraint (e.g. a + b < 10), and that wouldn't get pushed down.
ReplyDeleteI'll have to read about the distributed join processing in 7.2. Looks like it's only used for a few select cases...
I'll correct my post and move MySQL cluster to the limited distributed evaluation category.