Converting a CSV file to parquet and querying it with Apache Drill


Martin's Developer World

The Apache Drill project provides SQL-like access to different kinds of data stores. The supported data stores span relational and NoSQL databases as well as the file system. Hence you can query data from HBase, MongoDB, HDFS and the local file system without having to convert the data first. Even joins between the different formats are possible. Internally, Apache Drill holds the potentially nested data (for example from JSON files) in a columnar representation, as described in Google’s Dremel paper. These columnar data structures allow queries that select only a subset of the available columns to run much faster, because only the selected columns have to be read from the data structure. In contrast to traditional row-oriented relational databases, the whole row does not have to be loaded from disk.
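To make the difference between the two layouts concrete, here is a minimal Python sketch. It is only a toy model: the records, the field names and the `to_columns` helper are made up for illustration, and it does not reflect how Drill or Dremel actually encode data.

```python
# Toy illustration of row-oriented vs. column-oriented layouts.
# The records and field names below are hypothetical.
rows = [
    {"id": 1, "name": "alice", "country": "DE"},
    {"id": 2, "name": "bob", "country": "US"},
    {"id": 3, "name": "carol", "country": "DE"},
]


def to_columns(records):
    """Pivot a list of row dicts into a dict of column lists."""
    columns = {}
    for record in records:
        for key, value in record.items():
            columns.setdefault(key, []).append(value)
    return columns


columns = to_columns(rows)

# Row layout: every complete record is visited to read a single field.
countries_from_rows = [record["country"] for record in rows]

# Column layout: only the one requested column is visited.
countries_from_columns = columns["country"]
```

Both lookups return the same values, but the columnar lookup never touches the `id` or `name` data. On disk, that is the part of the file a columnar reader can skip.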

The data structure described in Google’s Dremel paper is also available as a file format called Parquet and…

