DataRoket’s Hadoop functionality is delivered as two different connectors:
ADS File Stores
The Hadoop connector provides access to distributed Hadoop file systems through HDFS. Both uncompressed and MapReduce files can be accessed, queried, and rendered to XML with real-time LZO compression (via Hadoop’s wrapper around the Oberhumer C libraries).
- Extract Hadoop file system (HDFS) data files in an uncompressed or MapReduce state.
- Associate and query Hadoop data directly through DataRoket’s distributed engines (data byte correlation and in-memory, real-time compression with a cell pointer system).
- Sort, validate, and partition query data back into Hadoop MapReduce or directly into LZO-compressed XML in block partitions (e.g., chunks of 2-25 million records).
- Persisted or direct in-memory resident data sets.
- Ability to correlate/associate Hadoop data with any data source connector DR has available (handled at both string and byte key level).
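The sort, partition, and compress step listed above can be sketched roughly as follows. This is a minimal illustration only, not DataRoket's implementation: the function names are hypothetical, the block size is tiny for readability (the text mentions 2-25 million records per block), and zlib from the Python standard library stands in for LZO, which the standard library does not provide.

```python
import zlib
from itertools import islice


def partition_blocks(records, block_size):
    """Yield successive lists of at most block_size records."""
    it = iter(records)
    while True:
        block = list(islice(it, block_size))
        if not block:
            return
        yield block


def compress_block(block):
    """Compress one block of text records.

    Assumption: zlib is used here as a stand-in for LZO.
    """
    payload = "\n".join(block).encode("utf-8")
    return zlib.compress(payload)


def decompress_block(data):
    """Inverse of compress_block: return the list of records."""
    return zlib.decompress(data).decode("utf-8").split("\n")


# Hypothetical sample data: ten small key/value records, sorted by key.
records = sorted(f"key{i:04d},value{i}" for i in range(10))
blocks = [compress_block(b) for b in partition_blocks(records, 4)]
```

Each compressed block can be stored or shipped independently and decompressed on its own, which is the point of partitioning before compression.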
Custom connectors, such as unstructured data algorithms, have also been developed for past customers using the DataRoket ADS (Analytical Data Store) Hadoop approach.
Screenshot showing DataRoket feeding Oracle data into Hive.
For example, the following are two use cases we have delivered:
Wachovia: mainframe-derived records stored in a flat-file structure. DR developed a connector that looks up composite key mappings from SQL databases, which are then correlated with the Wachovia flat-file records using data algorithms driven by relationship mapping rules.
PeopleSoft: general ledger files with mixed metadata content and variable columns throughout the files. DataRoket delivered the architecture and data algorithms that read 70+ PeopleSoft general ledgers, totaling 4.6 GB and more than 100 million records, and sorted the data across DR distributed engines within 7 minutes.
DataRoket processed the data files in parallel and then, based on metadata lookup, associated/correlated the datasets in columnar fashion, delivering LZO-compressed and ordered datasets through the Hadoop JNI wrapper.
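As a rough illustration of the composite-key lookup described for this use case, the sketch below loads key mappings from a SQL table and correlates them with delimited flat-file records. Everything specific here is an assumption made for the example: the table name `key_map`, the `(branch, account)` composite key, the pipe-delimited record layout, and the use of an in-memory SQLite database in place of the customer's SQL databases.

```python
import sqlite3


def load_key_map(conn):
    """Read composite-key mappings into a dict keyed by (branch, account)."""
    rows = conn.execute("SELECT branch, account, customer_id FROM key_map")
    return {(branch, account): customer_id for branch, account, customer_id in rows}


def correlate(lines, key_map):
    """Attach the mapped customer_id to each flat-file record.

    Records without a mapping get customer_id None rather than being dropped.
    """
    for line in lines:
        branch, account, amount = line.rstrip("\n").split("|")
        yield {
            "branch": branch,
            "account": account,
            "amount": amount,
            "customer_id": key_map.get((branch, account)),
        }


# Hypothetical mapping table and flat-file records.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE key_map (branch TEXT, account TEXT, customer_id TEXT, "
    "PRIMARY KEY (branch, account))"
)
conn.executemany(
    "INSERT INTO key_map VALUES (?, ?, ?)",
    [("NC01", "000123", "C-1"), ("NC02", "000456", "C-2")],
)

flat_file = ["NC01|000123|250.00", "NC02|000456|75.10", "NC03|000999|12.00"]
matched = list(correlate(flat_file, load_key_map(conn)))
```

Loading the mappings once into memory keeps the per-record lookup cheap, at the cost of holding the whole mapping table in RAM.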
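The parallel sort and columnar delivery described above can be sketched in miniature as follows. This is a generic sort-then-merge illustration under stated assumptions, not DataRoket's engine: a thread pool stands in for the distributed engines, rows are plain tuples whose first field is the sort key, and the helper names are hypothetical.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor


def sort_chunk(chunk):
    """Sort one chunk of rows by its first field (the key)."""
    return sorted(chunk, key=lambda row: row[0])


def parallel_sort(chunks, workers=4):
    """Sort each chunk in a worker, then k-way merge the sorted chunks."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        sorted_chunks = list(pool.map(sort_chunk, chunks))
    return list(heapq.merge(*sorted_chunks, key=lambda row: row[0]))


def to_columns(rows, names):
    """Pivot row tuples into a column-oriented dict of lists."""
    return {name: [row[i] for row in rows] for i, name in enumerate(names)}


# Hypothetical input: two "files", each an unsorted list of (key, value) rows.
chunks = [[(3, "c"), (1, "a")], [(2, "b"), (4, "d")]]
ordered = parallel_sort(chunks)
columns = to_columns(ordered, ["key", "value"])
```

Sorting chunks independently and merging afterwards means no worker needs to see the full data set, which is what lets the work spread across engines.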
Screenshot of Hive as a data source, showing a table.
Delivered as a QlikView Data Pipe for seamless access through QlikView.