Partitioning in Large Data Systems
Achieving scalability in databases has meant exploiting large
parallel/distributed systems. All top TPC machines have implemented
such strategy for a while now. Data is partitioned across
nodes. Queries run at nodes and results are aggregated. This strategy
scales linearly in several common scenarios.
Often, though, that data partitioning scheme is static. A partition is
always assigned to a same node. The set of nodes that belong to the
system never changes. What if one wanted to improve the overall system
capacity incrementally, by adding nodes to the system as need be? What
if the original partitioning of data wouldn't divide the current load
evenly anymore? One may like to solve these problems without much
intervention or downtime.
Recent systems created and deployed by groups in the industry have
implemented data partitioning schemes that achieved the desired
incremental scalability and the adaptive load balancing. Apparently,
some functionality was sacrificed in the process, though (e.g. little
support for high-level query languages). We want to look into the
architecture of these groundbreaking systems and understand the
algorithms and techniques they used -- and then check if a different
set of functionality trade-offs exists that would allow a richer
support for querying in the mix.
Metis - An Extensible Streaming-Data Querying System
(Project conducted at the IBM T.J. Watson Research Division)
Take me straight to the Metis's talk
A streaming-data querying system is essentially a database system that
allows data to be queried which is not statically stored. Rather than
acting on tuple sets sitting on disks, queries act on tuple sequences
flowing on streams -- on the very moment they are flowing. This near
real-time querying capability makes the need to permanently store data
optional. Queries in this context do not follow the usual
request-response paradigm. They are registered on the system and keep
running in a continuous fashion. In Finance, to give a concrete
example application, streams of stocks' trades and quotes can be
queried so to uncover windows of opportunities as they occur. Other
fields currently benefiting from such systems are network monitoring
and sensor networks, to cite a few.
Metis is a toolkit to implement streaming-data querying engines. There
are two stages in setting up a Metis server. First, a developer
defines which streaming operators (e.g., moving aggregates,
demultiplexors, etc) he or she wants to work with. Metis provides
templates and support classes that make it simple to build these
operators. Alternatively, the developer can pick from an existing
selection of operators. In the second stage, Metis takes a set of
queries defined over the operators and efficiently executes it.
Our goals here are two-fold. We want to develop domain-oriented sets
of operators that would make the query's writer job easier. If the
users quickly grasp how to query streams, then our chance of deploying
Metis in large scale improves. We have started by the financial domain
and have been successfully querying, for instance, the NASDAQ
feeds. The second goal of Metis is to be able to scale up to a very
large number of queries. Keeping up with the speed of streams may not
always be possible, especially when dealing with very spikey
streams. However, we believe that a combination of multi-query
optimization and efficient resource management can greatly raise the
threshold above which Metis would need to resort to load shedding to
keep up with the speed of streams.
Joint work with Dennis Shasha (New York University)
Take me straight to the AQuery's talk
Any query expressible on a set of records can be expressed on an array
of records. The reverse doesn't hold; the notion of order is
missing. Queries like ranks (select the first n), n-tiles (select all
in the top n-tile), deltas (select last - first in a group), or vector
aggregate (moving average) all rely on some ordering on input or on
resulting data. And Finance, Biology, and Linguistics domains, for
instance, have long taken advantage of such a
possibility. Manipulating data as sequences, as opposed to sets, is a
critical new feature if we are to incorporate order into Database
Systems.
Previous attempts were successful in showing that sequences do "fit"
in a relational context. They can be manipulated by extended versions
of relational algebra. They can be queried with extensions of
SQL. Nevertheless, the approaches taken to do that had led to queries
that are somehow difficult to express and thus hard to optimize.
The notion of order is intrinsic to AQuery's data model and query
language. Data is represented by "array tables," called arrables,
which capture order relationship. Every algebra operator has its
impact on order, or absence thereof, clearly defined. The query
language is as faithful as possible to SQL, differing only in its
array-flavor. As a result, AQuery's optimizer often finds liner-time
plans for queries that would otherwise take more complex solutions.