
Top Sqoop Interview Questions

Answer: Incremental data load in Sqoop synchronizes the modified or newly added data (often referred to as delta data) from the RDBMS to Hadoop. The delta data is loaded using the incremental option of the Sqoop import command.

Incremental load can be performed using the Sqoop import command, or by loading the data into Hive without overwriting it. The different attributes that need to be specified during an incremental load in Sqoop are listed below, with an example command after the list-

  • Mode (incremental) – defines how Sqoop determines which rows are new. The mode can have the value append or lastmodified.
  • Col (check-column) – specifies the column that should be examined to find the rows to be imported.
  • Value (last-value) – denotes the maximum value of the check column from the previous import operation.
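
For example, an append-mode incremental import might look like the following; the connection string, table, and column names here are illustrative placeholders:

sqoop import --connect jdbc:mysql://db.example.com/sales --table orders \
  --incremental append --check-column order_id --last-value 10000 \
  --target-dir /data/orders

Here only rows whose order_id is greater than 10000 (the last imported value) are brought into HDFS.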

Answer: Include the Sqoop JAR on the classpath of the Java program. After that, the Sqoop.runTool() method must be invoked, and the necessary parameters should be passed to Sqoop programmatically, just as they would be on the command line.

Answer: To get the output files of a Sqoop import in a format other than .gz, such as .bz2, use the --compression-codec parameter (together with --compress) and supply the desired codec class.
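
A sketch of such an import using Hadoop's built-in BZip2 codec; the connection string, table, and target directory are illustrative:

sqoop import --connect jdbc:mysql://db.example.com/sales --table orders \
  --compress --compression-codec org.apache.hadoop.io.compress.BZip2Codec \
  --target-dir /data/orders_bz2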

Answer: Sqoop provides the capability to store large-sized data in a single field, based on the type of data. Sqoop supports the ability to store-

  • CLOBs – Character Large Objects
  • BLOBs – Binary Large Objects

Large objects in Sqoop are handled by importing them into a file referred to as a "LobFile", i.e. a Large Object File. The LobFile has the ability to store records of huge size; thus each record in the LobFile is a large object.

Answer: The native utilities used by databases to support faster (direct-mode) loads do not work with binary data formats such as SequenceFile.
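
As an illustration (connection details are placeholders), a direct-mode import therefore produces plain text output; combining it with a binary format such as SequenceFile is not supported:

sqoop import --connect jdbc:mysql://db.example.com/sales --table orders \
  --direct --target-dir /data/orders_text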

Answer: Yes, Sqoop supports two types of incremental imports-

  1. Append
  2. Last Modified

Append should be used in the import command to insert new rows only, while lastmodified should be used to capture both newly inserted and updated rows.

Answer: The command to check the list of all tables present in a single database using Sqoop is as follows-

sqoop list-tables --connect jdbc:mysql://localhost/user

Answer: The --num-mappers (or -m) parameter controls the number of mappers executed by a Sqoop command. Start with a small number of map tasks and then scale up gradually, because choosing a high number of mappers initially may degrade performance on the database side.
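
For instance, the following import (with an illustrative connection string and table) runs with four parallel map tasks:

sqoop import --connect jdbc:mysql://db.example.com/sales --table orders \
  --num-mappers 4 --target-dir /data/orders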

Answer: We can run a filtering query on the database and save the result to a temporary table in the database.

Then use the sqoop import command on that temporary table, without the --where clause.
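
A minimal sketch of the second step, assuming the filtered rows were saved to a hypothetical staging table named orders_filtered:

sqoop import --connect jdbc:mysql://db.example.com/sales --table orders_filtered \
  --target-dir /data/orders_filtered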

Answer: Sqoop offers two approaches for this.

a − Use the --incremental parameter with the append option, where the value of a check column is examined and rows with new values since the last import are imported as new rows.

b − Use the --incremental parameter with the lastmodified option, where a date column in the source is checked for records that have been updated after the last import.
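
A sketch of approach b, with illustrative table and column names; --merge-key lets Sqoop fold updated rows into the data already present in the target directory:

sqoop import --connect jdbc:mysql://db.example.com/sales --table orders \
  --incremental lastmodified --check-column last_updated --last-value "2024-01-01 00:00:00" \
  --merge-key order_id --target-dir /data/orders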

Answer: The Sqoop metastore is a tool with which Sqoop hosts a shared metadata repository. Multiple users and/or remote users can define and execute saved jobs (created with sqoop job) stored in this metastore.

Clients must be configured to connect to the metastore, either in sqoop-site.xml or with the --meta-connect argument.
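
A sketch of working with a shared metastore; the host, port, job name, and connection string are illustrative:

# run on the metastore host to start the shared repository
sqoop metastore
# saved from a client; the job definition is stored in the shared metastore
sqoop job --meta-connect jdbc:hsqldb:hsql://metastore.example.com:16000/sqoop \
  --create daily_orders_import -- import --connect jdbc:mysql://db.example.com/sales --table orders
# executed later by any client configured to reach the metastore
sqoop job --meta-connect jdbc:hsqldb:hsql://metastore.example.com:16000/sqoop --exec daily_orders_import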

Answer: Sqoop allows us to use free-form SQL queries with the import command. The import command should be used with the -e or --query option to execute free-form SQL queries. When using -e or --query with the import command, the --target-dir value must be specified.
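
For example (the query, split column, and paths are illustrative; the literal token $CONDITIONS is required by Sqoop in free-form queries):

sqoop import --connect jdbc:mysql://db.example.com/sales \
  --query 'SELECT o.*, c.name FROM orders o JOIN customers c ON o.customer_id = c.id WHERE $CONDITIONS' \
  --split-by o.order_id --target-dir /data/orders_enriched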

Answer:

  • --append
  • --columns
  • --where

These options are the most frequently used when importing RDBMS data (see the example command below).
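
A combined example with illustrative table and column names:

sqoop import --connect jdbc:mysql://db.example.com/sales --table orders \
  --columns "order_id,customer_id,amount" --where "amount > 100" \
  --append --target-dir /data/orders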

Answer: MySQL, Oracle, PostgreSQL, IBM, Netezza and Teradata. Each database connects through its JDBC driver.

Answer: The merge tool combines two datasets, where entries in the newer dataset overwrite entries in the older dataset, preserving only the newest version of each record between the two datasets.
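
A sketch of the merge tool; the paths, key column, and the record class/jar (normally produced by sqoop codegen) are illustrative:

sqoop merge --new-data /data/orders_incr --onto /data/orders_base \
  --target-dir /data/orders_merged --merge-key order_id \
  --jar-file orders.jar --class-name orders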

Answer: BLOB and CLOB columns are the common large object types. By default, objects smaller than 16 MB are stored inline with the rest of the data and materialized in memory for processing. Larger objects are stored in files under the _lobs subdirectory of the import target directory and are handled in a streaming fashion. If you set the inline LOB limit to 0, all large objects are placed in external storage.
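
For example, to force all large objects into external storage, the inline limit can be set to 0 (connection details are placeholders):

sqoop import --connect jdbc:mysql://db.example.com/media --table documents \
  --target-dir /data/documents --inline-lob-limit 0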

Answer: The eval tool allows the user to run sample SQL queries against the database and preview the results on the console. It helps determine what data can be imported and whether the imported data will be the desired data.
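
For example, to preview a few rows before running the actual import (the connection string, user, and query are illustrative):

sqoop eval --connect jdbc:mysql://db.example.com/sales --username retail_user -P \
  --query "SELECT * FROM orders LIMIT 10"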