Partitioning in Large Data Systems
Two invited lectures (here and here) given at NYU and some notes.
Achieving scalability in databases has meant exploiting large parallel/distributed systems. All top TPC machines have implemented such strategy for a while now. Data is partitioned across nodes. Queries run at nodes and results are aggregated. This strategy scales linearly in several common scenarios.
Often, though, that data partitioning scheme is static. A partition is always assigned to a same node. The set of nodes that belong to the system never changes. What if one wanted to improve the overall system capacity incrementally, by adding nodes to the system as need be? What if the original partitioning of data wouldn't divide the current load evenly anymore? One may like to solve these problems without much intervention or downtime.
Recent systems created and deployed by groups in the industry have implemented data partitioning schemes that achieved the desired incremental scalability and the adaptive load balancing. Apparently, some functionality was sacrificed in the process, though (e.g. little support for high-level query languages). We want to look into the architecture of these groundbreaking systems and understand the algorithms and techniques they used -- and then check if a different set of functionality trade-offs exists that would allow a richer support for querying in the mix.
Metis - An Extensible Streaming-Data Querying System
(Project conducted at the IBM T.J. Watson Research Division)
Take me straight to the Metis's talk
A streaming-data querying system is essentially a database system that allows data to be queried which is not statically stored. Rather than acting on tuple sets sitting on disks, queries act on tuple sequences flowing on streams -- on the very moment they are flowing. This near real-time querying capability makes the need to permanently store data optional. Queries in this context do not follow the usual request-response paradigm. They are registered on the system and keep running in a continuous fashion. In Finance, to give a concrete example application, streams of stocks' trades and quotes can be queried so to uncover windows of opportunities as they occur. Other fields currently benefiting from such systems are network monitoring and sensor networks, to cite a few.
Metis is a toolkit to implement streaming-data querying engines. There are two stages in setting up a Metis server. First, a developer defines which streaming operators (e.g., moving aggregates, demultiplexors, etc) he or she wants to work with. Metis provides templates and support classes that make it simple to build these operators. Alternatively, the developer can pick from an existing selection of operators. In the second stage, Metis takes a set of queries defined over the operators and efficiently executes it.
Our goals here are two-fold. We want to develop domain-oriented sets of operators that would make the query's writer job easier. If the users quickly grasp how to query streams, then our chance of deploying Metis in large scale improves. We have started by the financial domain and have been successfully querying, for instance, the NASDAQ feeds. The second goal of Metis is to be able to scale up to a very large number of queries. Keeping up with the speed of streams may not always be possible, especially when dealing with very spikey streams. However, we believe that a combination of multi-query optimization and efficient resource management can greatly raise the threshold above which Metis would need to resort to load shedding to keep up with the speed of streams.
Joint work with Dennis Shasha (New York University)
Any query expressible on a set of records can be expressed on an array of records. The reverse doesn't hold; the notion of order is missing. Queries like ranks (select the first n), n-tiles (select all in the top n-tile), deltas (select last - first in a group), or vector aggregate (moving average) all rely on some ordering on input or on resulting data. And Finance, Biology, and Linguistics domains, for instance, have long taken advantage of such a possibility. Manipulating data as sequences, as opposed to sets, is a critical new feature if we are to incorporate order into Database Systems.
Previous attempts were successful in showing that sequences do "fit" in a relational context. They can be manipulated by extended versions of relational algebra. They can be queried with extensions of SQL. Nevertheless, the approaches taken to do that had led to queries that are somehow difficult to express and thus hard to optimize.
The notion of order is intrinsic to AQuery's data model and query language. Data is represented by "array tables," called arrables, which capture order relationship. Every algebra operator has its impact on order, or absence thereof, clearly defined. The query language is as faithful as possible to SQL, differing only in its array-flavor. As a result, AQuery's optimizer often finds liner-time plans for queries that would otherwise take more complex solutions.