Monday Technology Series

What is Apache Sqoop?

Apache Sqoop is a tool for transferring bulk data between Apache Hadoop and relational databases. You can use it to import data from a relational database management system (RDBMS) such as MySQL or Oracle into the Hadoop Distributed File System (HDFS), transform the data with Hadoop MapReduce, and then export the results back into an RDBMS. Imports can also populate tables in Hive or HBase. Sqoop supports incremental loads of a single table or a free-form SQL query, as well as saved jobs that can be run repeatedly to import the changes made to a database since the last import. Sqoop uses MapReduce for both imports and exports, which provides parallel operation as well as fault tolerance.
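To make the import and export round trip a little more concrete, here is a minimal sketch in Python that only assembles Sqoop 1 command lines; it does not run them. The flags are the ones I would expect from the Sqoop 1 command-line client, but the host name, database, table names, HDFS paths and the helper function names are all made up for illustration, and password handling is left out on purpose.

```python
import shlex
from typing import List, Optional


def build_import_command(
    jdbc_url: str,
    username: str,
    table: str,
    target_dir: str,
    check_column: Optional[str] = None,
    last_value: str = "0",
    num_mappers: int = 4,
) -> List[str]:
    """Assemble a `sqoop import` command (hypothetical helper, not part of Sqoop)."""
    cmd = [
        "sqoop", "import",
        "--connect", jdbc_url,
        "--username", username,
        "--table", table,
        "--target-dir", target_dir,
        # Number of parallel map tasks used for the transfer.
        "--num-mappers", str(num_mappers),
    ]
    if check_column is not None:
        # Incremental append: only rows whose check column exceeds last_value.
        cmd += [
            "--incremental", "append",
            "--check-column", check_column,
            "--last-value", last_value,
        ]
    return cmd


def build_export_command(
    jdbc_url: str, username: str, table: str, export_dir: str
) -> List[str]:
    """Assemble a `sqoop export` command (hypothetical helper, not part of Sqoop)."""
    return [
        "sqoop", "export",
        "--connect", jdbc_url,
        "--username", username,
        "--table", table,
        "--export-dir", export_dir,
    ]


if __name__ == "__main__":
    # All connection details below are placeholders.
    url = "jdbc:mysql://db.example.com/sales"
    imp = build_import_command(
        url, "etl_user", "orders", "/data/orders",
        check_column="order_id", last_value="1000",
    )
    exp = build_export_command(url, "etl_user", "order_summary", "/data/order_summary")
    print(shlex.join(imp))
    print(shlex.join(exp))
```

In a real setup you would also need to supply credentials, and for repeated incremental imports a saved Sqoop job is usually more convenient than tracking the last value by hand.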

How does Sqoop work?

Sqoop provides a pluggable mechanism for optimal connectivity to external systems. The Sqoop extension API provides a convenient framework for building new connectors which can be dropped into Sqoop installations to provide connectivity to various systems. Sqoop itself comes bundled with various connectors that can be used for popular database and data warehousing systems.
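The idea of a pluggable connector mechanism can be illustrated with a small, purely conceptual Python sketch. This is not Sqoop's extension API (which is a Java API); the registry, the function names and the connector labels are all hypothetical. It only shows the general pattern: connectors register themselves for a kind of connection string, and the tool picks the most specific match at run time.

```python
from typing import Dict

# Hypothetical registry: connection-string prefix -> connector label.
_CONNECTORS: Dict[str, str] = {}


def register_connector(url_prefix: str, name: str) -> None:
    """Register a connector for connection strings starting with url_prefix."""
    _CONNECTORS[url_prefix] = name


def resolve_connector(jdbc_url: str) -> str:
    """Return the connector registered under the longest matching prefix."""
    matches = [p for p in _CONNECTORS if jdbc_url.startswith(p)]
    if not matches:
        raise ValueError(f"no connector registered for {jdbc_url!r}")
    return _CONNECTORS[max(matches, key=len)]


# "Bundled" connectors for popular databases, plus a generic fallback.
register_connector("jdbc:", "generic-jdbc")
register_connector("jdbc:mysql:", "mysql")
register_connector("jdbc:oracle:", "oracle")

# A new connector can be dropped in later without touching the core code.
register_connector("jdbc:postgresql:", "postgresql")
```

With this sketch, a MySQL connection string resolves to the specialised connector, while an unrecognised JDBC URL falls back to the generic one.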


For more information, please refer to the Apache Project Page or the Hortonworks Page.
