Question 1

Q1 What is Hive?

Accepted Answer

Answer:

Hive is data warehouse software that facilitates querying and managing large data sets residing in distributed storage. Its query language, HiveQL, closely resembles SQL. Hive also lets traditional MapReduce programmers plug in custom mappers and reducers when it is inconvenient or inefficient to express the logic in HiveQL, and it supports User Defined Functions (UDFs).

Question 2

Q2 How does Facebook use Hadoop, Hive and HBase?

Accepted Answer

Answer:

Facebook stores its data on HDFS, and the millions of photos uploaded to Facebook every day are handled with the help of Hadoop. Facebook Messages, Likes and status updates run on top of HBase. Hive is used to generate reports for third-party developers and advertisers who need to track the success of their applications or campaigns.

Question 3

Q3 What is the difference between HBase and Hive?

Accepted Answer

Answer:

Hive and HBase are two different technologies that are both based on Hadoop. Hive is a data warehouse infrastructure that runs on Hadoop, whereas HBase is a NoSQL key-value store that runs on top of Hadoop. Hive lets people who know SQL run MapReduce jobs, while HBase supports four operations: put, get, scan and delete. HBase is good for real-time lookups of individual records, whereas Hive is good for analytical queries over data collected over a period of time.

Question 4

Q4 What is the Hive Metastore?

Accepted Answer

Answer:

The Hive Metastore is a database that stores the metadata of your Hive tables, such as table names, column names, data types, table locations and the number of buckets in each table.

Question 5

Q5 Which Hadoop versions does the new Hive version support?

Accepted Answer

Answer:

This release works with Hadoop 0.20.x, 0.23.x.y, 1.x.y and 2.x.y.

Question 6

Q6 Which companies mostly use Hive?

Accepted Answer

Answer:

Facebook and Netflix.

Question 7

Q7 Whenever I run a Hive query from a different directory, it creates a new metastore_db. Please explain the reason for it.

Accepted Answer

Answer:

When you run Hive in embedded mode, it creates a local metastore, after first checking whether a metastore already exists. The location is defined in the configuration file hive-site.xml by the property "javax.jdo.option.ConnectionURL", whose default value is "jdbc:derby:;databaseName=metastore_db;create=true". Because metastore_db is a relative path, it is resolved against the directory Hive is started from, so each new directory gets its own metastore. To change this behavior, set the location to an absolute path so that the metastore is always used from that one location.
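
The effect of the relative database name can be illustrated with a minimal Python sketch. It only manipulates paths and does not start Hive or Derby; the helper names and the /var/lib/hive directory are hypothetical, chosen for illustration.

```python
import os
import tempfile
from pathlib import Path

# Default value quoted in the answer above.
DEFAULT_URL = "jdbc:derby:;databaseName=metastore_db;create=true"


def database_name(url: str) -> str:
    """Extract the databaseName attribute from a Derby JDBC URL."""
    for part in url.split(";"):
        if part.startswith("databaseName="):
            return part[len("databaseName="):]
    raise ValueError("no databaseName attribute in URL")


def resolved_metastore_path(url: str) -> Path:
    """Where the databaseName ends up when resolved from the current directory."""
    return (Path.cwd() / database_name(url)).resolve()


def absolute_connection_url(metastore_dir: str) -> str:
    """Build a URL that pins the metastore to one fixed, absolute location."""
    return f"jdbc:derby:;databaseName={metastore_dir}/metastore_db;create=true"


if __name__ == "__main__":
    start = Path.cwd()
    with tempfile.TemporaryDirectory() as dir_a, tempfile.TemporaryDirectory() as dir_b:
        os.chdir(dir_a)
        first = resolved_metastore_path(DEFAULT_URL)
        os.chdir(dir_b)
        second = resolved_metastore_path(DEFAULT_URL)
        os.chdir(start)  # leave the temporary directories before they are removed
    # The same relative name points at two different places.
    print(first != second)
    # Hypothetical absolute location, as an assumption for the example.
    print(absolute_connection_url("/var/lib/hive"))
```

The string returned by absolute_connection_url is the kind of value that would go into javax.jdo.option.ConnectionURL in hive-site.xml.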

Question 8

Q8 Is it possible for multiple users to use the same metastore when Hive runs in embedded mode?

Accepted Answer

Answer:

No, an embedded metastore cannot be used in sharing mode. It is recommended to use a standalone "real" database such as MySQL or PostgreSQL.

Question 9

Q9 What is the functionality of the Query Processor in Apache Hive?

Accepted Answer

Answer:

This component implements the processing framework that converts SQL into a graph of map/reduce jobs, and the execution-time framework that runs those jobs in the order of their dependencies.

Question 10

Q10 Are multi-line comments supported in Hive scripts?

Accepted Answer

Answer:

No. Only single-line comments, which start with --, are supported.

Question 11

Q11 What is the functionality of Query Processor in Apache Hive?

Accepted Answer

Answer:

This component implements the processing framework that converts SQL into a graph of map/reduce jobs, and the execution-time framework that runs those jobs in the order of their dependencies.

Question 12

Q12 What is a Hive Metastore?

Accepted Answer

Answer:

The Hive Metastore is a central repository that stores Hive metadata in an external database.

Question 13

Q13 Explain about the SMB Join in Hive.

Accepted Answer

Answer:

In a Sort Merge Bucket (SMB) join, each mapper reads a bucket from the first table and the corresponding bucket from the second table, and then performs a merge-sort join. SMB join is mainly used because it places no limit on file, partition or table size, so it works best when the tables are large. The tables must be bucketed and sorted on the join columns, and all tables must have the same number of buckets.
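
The idea can be sketched in plain Python. This is a simulation under simplifying assumptions (integer join keys, buckets assigned by key modulo the bucket count, everything in memory), not Hive's implementation; all function names are made up for the example.

```python
from typing import List, Tuple

Row = Tuple[int, str]  # (join key, payload)


def bucketize(rows: List[Row], num_buckets: int) -> List[List[Row]]:
    """Assign rows to buckets by key, then sort each bucket on the join key."""
    buckets: List[List[Row]] = [[] for _ in range(num_buckets)]
    for key, value in rows:
        buckets[key % num_buckets].append((key, value))
    return [sorted(bucket, key=lambda row: row[0]) for bucket in buckets]


def merge_join(left: List[Row], right: List[Row]) -> List[Tuple[int, str, str]]:
    """Join two buckets that are already sorted on the key with one forward pass."""
    out = []
    i = j = 0
    while i < len(left) and j < len(right):
        left_key, right_key = left[i][0], right[j][0]
        if left_key < right_key:
            i += 1
        elif left_key > right_key:
            j += 1
        else:
            # Find the run of rows sharing this key on each side.
            i_end = i
            while i_end < len(left) and left[i_end][0] == left_key:
                i_end += 1
            j_end = j
            while j_end < len(right) and right[j_end][0] == left_key:
                j_end += 1
            for _, left_value in left[i:i_end]:
                for _, right_value in right[j:j_end]:
                    out.append((left_key, left_value, right_value))
            i, j = i_end, j_end
    return out


def smb_join(left_rows: List[Row], right_rows: List[Row], num_buckets: int):
    """Join bucket n of one table only with bucket n of the other."""
    left_buckets = bucketize(left_rows, num_buckets)
    right_buckets = bucketize(right_rows, num_buckets)
    result = []
    for left_bucket, right_bucket in zip(left_buckets, right_buckets):
        # Each pair of corresponding buckets plays the role of one mapper.
        result.extend(merge_join(left_bucket, right_bucket))
    return result


if __name__ == "__main__":
    orders = [(3, "o-c"), (1, "o-a"), (2, "o-b"), (1, "o-d")]
    users = [(1, "alice"), (2, "bob"), (4, "dave")]
    for row in smb_join(orders, users, num_buckets=2):
        print(row)
```

Because both tables use the same number of buckets and the same bucketing rule, matching keys always land in buckets with the same index, which is why each pair can be joined independently.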

Question 14

Q14 What is ObjectInspector functionality?

Accepted Answer

Answer:

ObjectInspector is used to analyze the structure of individual columns and the internal structure of the row objects. ObjectInspector in Hive provides access to complex objects which can be stored in multiple formats.

Question 15

Q15 Is it possible for multiple users to use the same metastore when Hive runs in embedded mode?

Accepted Answer

Answer:

No, an embedded metastore cannot be used in sharing mode. It is recommended to use a standalone "real" database such as MySQL or PostgreSQL.

Question 16

Q16 Explain about the different types of join in Hive.

Accepted Answer

Answer:

HiveQL has 4 different types of joins:

JOIN: Similar to an inner join in SQL. Only the rows that fulfil the join condition in both tables are returned.
FULL OUTER JOIN: Combines the records of both the left and the right tables. Rows that fulfil the join condition are matched, and unmatched rows from either side are returned with NULLs for the missing columns.
LEFT OUTER JOIN: All the rows from the left table are returned even if there are no matches in the right table.
RIGHT OUTER JOIN: All the rows from the right table are returned even if there are no matches in the left table.
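
The row-level behavior of the four join types can be sketched in plain Python over two small in-memory tables. This is only an illustration of the semantics (a plain JOIN keeps matching rows only, and None stands in for NULL); the table contents and function names are invented for the example.

```python
from typing import List, Optional, Tuple

Row = Tuple[int, str]  # (join key, value)
Joined = Tuple[int, Optional[str], Optional[str]]


def inner_join(left: List[Row], right: List[Row]) -> List[Joined]:
    """Plain JOIN: keep only the pairs whose keys match."""
    return [(lk, lv, rv) for lk, lv in left for rk, rv in right if lk == rk]


def left_outer_join(left: List[Row], right: List[Row]) -> List[Joined]:
    """Keep every left row; fill the right side with None when nothing matches."""
    out: List[Joined] = []
    for lk, lv in left:
        matches = [rv for rk, rv in right if rk == lk]
        if matches:
            out.extend((lk, lv, rv) for rv in matches)
        else:
            out.append((lk, lv, None))
    return out


def right_outer_join(left: List[Row], right: List[Row]) -> List[Joined]:
    """Mirror image of the left outer join: swap the sides, then swap back."""
    return [(key, lv, rv) for key, rv, lv in left_outer_join(right, left)]


def full_outer_join(left: List[Row], right: List[Row]) -> List[Joined]:
    """Left outer join plus the right rows that never found a partner."""
    out = left_outer_join(left, right)
    left_keys = {lk for lk, _ in left}
    out.extend((rk, None, rv) for rk, rv in right if rk not in left_keys)
    return out


if __name__ == "__main__":
    employees = [(1, "Ann"), (2, "Bob"), (3, "Cy")]
    departments = [(1, "Sales"), (2, "Ops"), (4, "Legal")]
    print(inner_join(employees, departments))
    print(left_outer_join(employees, departments))
    print(right_outer_join(employees, departments))
    print(full_outer_join(employees, departments))
```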

Question 17

Q17 Is it possible to change the default location of Managed Tables in Hive, if so how?

Accepted Answer

Answer:

Yes, we can change the default location of Managed tables using the LOCATION keyword while creating the managed table. The user has to specify the storage path of the managed table as the value to the LOCATION keyword.

Question 18

Q18 How can you connect an application, if you run Hive as a server?

Accepted Answer

Answer:

When running Hive as a server, an application can connect in one of 3 ways:

ODBC Driver: This supports the ODBC protocol.
JDBC Driver: This supports the JDBC protocol.
Thrift Client: This client can be used to make calls to all Hive commands from different programming languages such as PHP, Python, Java, C++ and Ruby.

Question 19

Q19 Which classes are used by Hive to read and write HDFS files?

Accepted Answer

Answer:

The following classes are used by Hive to read and write HDFS files:

TextInputFormat/HiveIgnoreKeyTextOutputFormat: These 2 classes read/write data in plain text file format.
SequenceFileInputFormat/SequenceFileOutputFormat: These 2 classes read/write data in Hadoop SequenceFile format.

Question 20

Q20 What are the types of tables in Hive?

Accepted Answer

Answer:

There are two types of tables.

Managed tables.
External tables.

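The main difference shows up in the DROP TABLE command: dropping a managed table deletes both its metadata and its data, whereas dropping an external table deletes only the metadata and leaves the data files in place. Otherwise, both types of tables are very similar.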
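
How DROP TABLE treats the two table types can be modelled with a toy Python sketch. It is not Hive code: ToyWarehouse, its two dictionaries and the example paths are hypothetical stand-ins for the metastore and the underlying storage.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ToyWarehouse:
    """Toy model: one dict plays the metastore, another plays file storage."""
    metastore: Dict[str, dict] = field(default_factory=dict)     # table -> metadata
    storage: Dict[str, List[str]] = field(default_factory=dict)  # location -> rows

    def create_table(self, name: str, location: str, external: bool = False) -> None:
        """Register the table and make sure its storage location exists."""
        self.metastore[name] = {"location": location, "external": external}
        self.storage.setdefault(location, [])

    def insert(self, name: str, row: str) -> None:
        """Append a row to the table's storage location."""
        self.storage[self.metastore[name]["location"]].append(row)

    def drop_table(self, name: str) -> None:
        """Always remove the metadata; remove the data only for managed tables."""
        meta = self.metastore.pop(name)
        if not meta["external"]:
            self.storage.pop(meta["location"], None)


if __name__ == "__main__":
    warehouse = ToyWarehouse()
    warehouse.create_table("managed_t", "/warehouse/managed_t")
    warehouse.create_table("external_t", "/data/external_t", external=True)
    warehouse.insert("managed_t", "row-1")
    warehouse.insert("external_t", "row-2")

    warehouse.drop_table("managed_t")
    warehouse.drop_table("external_t")

    print(warehouse.metastore)  # both tables are gone from the metadata
    print(warehouse.storage)    # only the external table's data remains
```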
