Friday, November 4, 2011

Distributed Database Architectures: Distributed Storage / Centralized Compute

In my previous post I wrote about shared disk architectures and the problems they introduce. It's common to see comparisons between shared disk and shared nothing architectures, but that distinction is too coarse to capture the differences between various shared nothing approaches.

Instead, I'm going to characterize the various "shared-nothing" style systems by their query evaluation architectures. Most systems fall into one of the following buckets:
  • Centralized compute
  • Limited distributed compute
  • Fully distributed compute

Centralized Compute: MySQL Cluster

MySQL Cluster services user queries with two basic roles: a compute role and a storage/data role. The compute node is the front end: it takes in the query, plans it, and executes it, communicating remotely with the storage nodes to fetch any data relevant to the query.


In the distributed storage model, data is no longer shared between the nodes at the page level. Instead, the storage nodes expose a higher-level API that allows the compute node to fetch row ranges based on the available access paths (i.e. indexes).
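The shape of that row-range API can be sketched at a toy scale. Everything below, including the class and method names, is illustrative and assumed, not MySQL Cluster's actual interface:

```python
# Illustrative sketch: storage nodes hand back matching rows (not raw
# pages) through an index-style access path; the compute node gathers
# rows from every node responsible for the table.

class StorageNode:
    def __init__(self, rows):
        # rows: list of dicts; a real node would hold an on-disk partition
        self.rows = rows

    def scan_index(self, column, value):
        # Return full rows whose indexed column matches `value`.
        # A real system would walk a B-tree; a linear scan keeps this short.
        return [r for r in self.rows if r[column] == value]


class ComputeNode:
    def __init__(self, storage_nodes):
        self.storage_nodes = storage_nodes

    def fetch(self, column, value):
        # Ask each storage node; each returns only its matching rows.
        rows = []
        for node in self.storage_nodes:
            rows.extend(node.scan_index(column, value))
        return rows


nodes = [StorageNode([{"acol": 1, "bcol": 2}, {"acol": 3, "bcol": 4}]),
         StorageNode([{"acol": 1, "bcol": 5}])]
compute = ComputeNode(nodes)
print(len(compute.fetch("acol", 1)))  # 2 rows cross the wire
```

Note that the filtering here happens on the storage side only for the indexed column; any further predicates run on the compute node, which is exactly where the trouble starts.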

In such a system, storage-level locks are managed exclusively by the storage node that owns the data. A compute node does not cache any data; instead, it always asks the set of storage nodes responsible for it. Recall the three problems of the shared disk architecture:

  • Cache coherence overhead
  • Extensive data movement
  • Centralized query evaluation

This design eliminates the cache coherence overhead, but it still suffers from extensive data movement and centralized query evaluation.

MySQL Query Evaluation in Action

Consider the following example:

SELECT count(*) FROM mytable WHERE acol = 1 and bcol = 2

Assumptions:
  • an index over acol
  • 10% of the rows in the table match acol = 1
  • 3% of the rows match acol = 1 and bcol = 2
  • total table size: 1 billion rows

In the diagram above, the arrows represent the flow of data through the system. As you can see, most of the query evaluation in the example is done by a single compute node. The storage nodes use the index on acol and ship all 100 million matching rows (10% of the table) to the compute node, which alone performs the residual filtering on bcol and the aggregate count.
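The flow can be simulated at a scaled-down size; the row counts and layout below are stand-ins for the 1 billion row example, not actual MySQL Cluster behavior:

```python
# Scaled-down simulation of the query above (assumed setup, not real
# cluster code): the storage layer uses the index on acol and ships
# every matching row; the lone compute node applies the bcol filter
# and computes the count.

TABLE_SIZE = 10_000  # stand-in for the 1 billion rows in the example

# Build rows so 10% match acol = 1, and 3% match both predicates.
table = []
for i in range(TABLE_SIZE):
    if i % 100 < 3:          # 3% of all rows: acol = 1 AND bcol = 2
        table.append({"acol": 1, "bcol": 2})
    elif i % 100 < 10:       # next 7%: acol = 1 only
        table.append({"acol": 1, "bcol": 9})
    else:                    # remaining 90%: no index match
        table.append({"acol": 7, "bcol": 2})

# Storage side: the index lookup on acol selects the matching rows,
# all of which cross the wire to the compute node.
shipped = [r for r in table if r["acol"] == 1]

# Compute side: the residual predicate and the aggregate run on one node.
count = sum(1 for r in shipped if r["bcol"] == 2)

print(len(shipped))  # 1000 rows moved: 10% of the table
print(count)         # 300 rows counted: 3% of the table
```

At the full scale of the example, the 10% line is the 100 million rows of data movement, while the final count touches only 3% of the table; the gap between the two is pure wasted transfer.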

It's an improvement over a shared disk system, but it still has some serious limitations. Such a system can be well suited for simple key access (i.e. queries that touch a few specific rows), but anything more complex will generally result in poor performance.

As with the shared disk system, adding more nodes will not help improve single query execution, and queries which operate over large volumes of data have the potential to saturate the message bus between the nodes.

2 comments:

  1. MySQL Cluster does two types of condition pushdowns.
    The first was introduced in MySQL 5.0 and solves the
    problem described above. So the actual filtering is
    done by the data node, thus in this case 30 million
    rows would be transported.

    The second type of condition pushdown is introduced in
    MySQL Cluster 7.2 where an entire join query can be
    pushed to the storage nodes and evaluated.

    ReplyDelete
  2. Mikael, it's interesting to see that MySQL Cluster does some limited predicate pushdowns. I suppose I could change my example to have a non-trivial expression constraint (e.g. a + b < 10), which wouldn't get pushed down.

    I'll have to read about the distributed join processing in 7.2. Looks like it's only used for a few select cases...

    I'll correct my post and move MySQL cluster to the limited distributed evaluation category.

    ReplyDelete