Once optimization is complete, the (sub)plan of operators to be executed by Druid is translated into a valid Druid JSON query and passed as a property to the Hive physical Table Scan operator.
The Druid query is executed within the Table Scan operator, which generates records from the Druid query results.
For each query to Druid, we generate a single Hive split containing the corresponding Druid query, which basically reads all the segments from Druid, generates records, and then executes the rest of the Hive operations on those records.
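As an illustration of the kind of Druid JSON query that could be attached as a property of the Table Scan operator, here is a minimal sketch; the data source, column names, and interval are made-up examples, not values from the actual implementation:

```python
import json

# Hypothetical Druid timeseries query of the sort the optimizer could
# attach to the Table Scan operator. All names below are illustrative.
druid_query = {
    "queryType": "timeseries",
    "dataSource": "wikiticker",   # hypothetical Druid data source
    "granularity": "day",
    "aggregations": [
        {"type": "longSum", "name": "total_edits", "fieldName": "count"}
    ],
    "intervals": ["2016-01-01/2016-02-01"],
}

# The serialized JSON string is what would travel with the operator.
serialized = json.dumps(druid_query)
print(serialized)
```

At execution time, the Table Scan operator would submit this JSON to Druid and turn the results into Hive records.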
This is also the approach followed if the cost-based optimizer is disabled.
Completing this work will bring benefits to the Druid and Hive systems alike. The initial implementation, started in HIVE-14217, focused on 1) enabling the discovery of data that is already stored in Druid from Hive, and 2) being able to query that data while making use of Druid's advanced querying capabilities.
For instance, we put special emphasis on pushing as much computation as possible to Druid, and on recognizing the types of queries for which Druid is especially efficient. Future work after this first step is completed is listed in HIVE-14473.
If the first argument is an object, it is used directly as the dimension spec.
Otherwise, depending on the number of arguments, either a default or an extraction dimension spec is created.
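The dispatch just described can be sketched as follows; this is a hypothetical Python rendering of that logic, and the helper name and exact spec fields are assumptions rather than the actual client code:

```python
def dimension_spec(arg, *extra):
    """Sketch of the dimension-spec dispatch described above.

    If the first argument is already an object (here, a dict), use it
    directly as the dimension spec; otherwise build a default or an
    extraction dimension spec depending on the number of arguments.
    Field names are illustrative.
    """
    if isinstance(arg, dict):
        # Already a fully-formed dimension spec: pass it through.
        return arg
    if not extra:
        # Single argument: build a default dimension spec.
        return {"type": "default", "dimension": arg, "outputName": arg}
    # Additional arguments: build an extraction dimension spec.
    return {
        "type": "extraction",
        "dimension": arg,
        "outputName": arg,
        "extractionFn": extra[0],
    }
```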
In these cases, we end up with a simple plan consisting of a Table Scan and a Fetch operator on top.
In Druid, the timestamp column plays a central role.
In fact, for all those queries, Druid allows filtering on the time dimension through a dedicated query property.
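Druid native queries carry their time bounds in an intervals field, so a filter on the timestamp column can be pushed down into the query itself rather than applied row by row in Hive. A minimal sketch of that rewrite, with a hypothetical helper name:

```python
def push_time_filter(query, start, end):
    """Sketch: push a timestamp filter down into the Druid query's
    intervals, so Druid prunes data at the source instead of Hive
    filtering the returned records. Druid expects ISO-8601 intervals
    of the form "start/end"."""
    query = dict(query)  # avoid mutating the caller's query
    query["intervals"] = [f"{start}/{end}"]
    return query

# Usage: restrict a query to a two-week window.
q = {"queryType": "timeseries", "dataSource": "src", "granularity": "all"}
filtered = push_time_filter(q, "2016-01-01", "2016-01-15")
```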
Druid is an open-source analytics data store designed for business intelligence (OLAP) queries on event data.
Druid provides low latency (real-time) data ingestion, flexible data exploration, and fast data aggregation.