Q: Q1 How would you explain COGROUP in Pig?

Answer : COGROUP is a Pig operator that works on tuples from several relations. It can be applied to statements that involve more than one relation, up to 127 relations at a time. When you use the operator on two tables, Pig first groups both tables and then joins them on the grouped columns.

Q: Q2 What is Pig?

Answer : Pig is an Apache open source project that runs on Hadoop and provides an engine for executing data flows in parallel on Hadoop. It includes a language called Pig Latin for expressing these data flows. Pig Latin offers operations such as join, sort, and filter, along with the ability to write User Defined Functions (UDFs) for processing, reading, and writing data. Pig uses both HDFS and MapReduce, that is, one for storage and the other for processing.

Q: Q3 What is BloomMapFile used for?

Answer : BloomMapFile is a class that extends MapFile, so its functionality is similar to that of MapFile. BloomMapFile uses dynamic Bloom filters to provide a quick membership test for keys. It is used in the HBase table format.
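The membership test mentioned in the answer can be sketched in plain Python. This is only a minimal illustration of the Bloom filter idea, assuming a fixed-size bit array and salted SHA-256 hashes; it is not Hadoop's dynamic implementation, and the class name `TinyBloomFilter` and the row keys are hypothetical.

```python
import hashlib


class TinyBloomFilter:
    """Illustrative fixed-size Bloom filter; not Hadoop's implementation."""

    def __init__(self, num_bits=256, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = [False] * num_bits

    def _positions(self, key):
        # Derive num_hashes bit positions by salting one hash function.
        for salt in range(self.num_hashes):
            digest = hashlib.sha256(f"{salt}:{key}".encode("utf-8")).hexdigest()
            yield int(digest, 16) % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = True

    def might_contain(self, key):
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[pos] for pos in self._positions(key))


bloom = TinyBloomFilter()
for row_key in ["row-001", "row-002", "row-003"]:  # hypothetical keys
    bloom.add(row_key)
```

A key that was added always tests as possibly present, so a negative result lets a reader skip the more expensive lookup in the underlying file.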

Q: Q4 What is the difference between Pig and SQL?

Answer : Pig Latin is a procedural counterpart to SQL. The two have some similarities but more differences. SQL is a query language: the user asks a question in query form, and the query states what answer is wanted but not how to compute it. If a user wants to perform several operations on tables, they have to write several queries and store intermediate results in temporary tables. SQL does support subqueries as an alternative to temporary tables, but many SQL users find subqueries confusing and difficult to form properly. Subqueries also create an inside-out design in which the first step of the data pipeline is the innermost query. Pig is designed with long series of data operations in mind, so there is no need to write the data pipeline as an inverted set of subqueries or to worry about storing data in temporary tables.
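The contrast between the inside-out and the step-by-step style can be sketched in plain Python. This is an analogy only, not SQL or Pig Latin; the `orders` data and the helper `total_by_customer` are hypothetical.

```python
# Hypothetical rows: (customer, amount)
orders = [("ann", 40), ("bob", 5), ("ann", 70), ("cat", 15), ("bob", 30)]


def total_by_customer(rows):
    # Sum the amounts for each customer.
    totals = {}
    for customer, amount in rows:
        totals[customer] = totals.get(customer, 0) + amount
    return totals


# Inside-out style, like nested subqueries: the first step to run is innermost.
nested = sorted(
    total_by_customer([row for row in orders if row[1] >= 10]).items(),
    key=lambda item: item[1],
    reverse=True,
)

# Data-flow style, like a Pig Latin script: one named step per line, in order.
big_orders = [row for row in orders if row[1] >= 10]  # comparable to FILTER
totals = total_by_customer(big_orders)                # comparable to GROUP + SUM
ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)  # ORDER BY
```

Both forms compute the same result; the second reads in the order the data actually flows, and each intermediate step has a name that can be inspected.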

Q: Q5 What is the difference between logical and physical plans?

Answer : Pig goes through several steps when it converts a Pig Latin script into MapReduce jobs. After basic parsing and semantic checking, it produces a logical plan. The logical plan describes the logical operators that Pig has to execute. After this, Pig produces a physical plan. The physical plan describes the physical operators that are needed to execute the script.

Q: Q6 Does ‘ILLUSTRATE’ run an MR job?

Answer : No, ILLUSTRATE does not launch any MapReduce job; it works on a small sample of the data instead. Nothing is submitted as a job from the console. It shows the output of each stage rather than the final output.

Q: Q7 How does Pig differ from MapReduce?

Answer : In MapReduce, the group-by operation is performed on the reducer side, while filters and projections can be implemented in the map phase. Pig Latin provides the same standard operations, such as ORDER BY, FILTER, and GROUP BY. Because a Pig script can be analyzed, it is easier to follow the data flow and to catch errors early. Pig Latin is also much cheaper to write and maintain than Java code for MapReduce.
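The split between map-side and reduce-side work can be sketched as a plain-Python simulation. It only models the idea; it does not use Hadoop, and the `records` data and function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical input records: (user, url, bytes_sent)
records = [
    ("alice", "/home", 120),
    ("bob", "/home", 0),
    ("alice", "/cart", 300),
    ("bob", "/cart", 80),
]


def map_phase(rows):
    # Filter and projection happen on the map side.
    for user, url, size in rows:
        if size > 0:          # filter out empty responses
            yield user, size  # project down to (key, value)


def shuffle(pairs):
    # The framework groups values by key between map and reduce.
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    # The group-by aggregation happens on the reduce side.
    return {key: sum(values) for key, values in grouped.items()}


totals = reduce_phase(shuffle(map_phase(records)))
```

In Pig Latin the same flow would be a few statements (a filter, a group, and an aggregate), which is where the lower cost of writing and maintaining comes from.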

Q: Q8 Is the keyword ‘DEFINE’ like a function name?

Answer : Yes, the keyword ‘DEFINE’ is like a function name. Once you have registered a jar, you have to define the function. Whatever logic you have written in your Java program is exported as a jar, which you then register. The compiler first checks for the function in the built-in library. When the function is not present there, it looks in your registered jar.

Q: Q9 Is the keyword ‘FUNCTIONAL’ a User Defined Function (UDF)?

Answer : No, the keyword ‘FUNCTIONAL’ is not a User Defined Function (UDF). When writing a UDF, we have to override certain functions and do the work through those functions only. The keyword ‘FUNCTIONAL’, by contrast, is a built-in, that is, pre-defined, function, so it does not work as a UDF.

Q: Q10 What is Pig useful for?

Answer : Pig is useful in three categories: 1) ETL data pipelines, 2) research on raw data, and 3) iterative processing. The most common use case for Pig is the data pipeline. For example, web-based companies collect web logs, and before storing the data in a warehouse they run operations such as cleaning and aggregation on it, that is, transformations on the data.
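The cleaning and aggregation steps of such a pipeline can be sketched in plain Python. The log format (`ip method path status`), the sample lines, and the function names are all assumptions made for illustration.

```python
from collections import Counter

# Hypothetical raw web log lines in the form "ip method path status"
raw_logs = [
    "10.0.0.1 GET /index.html 200",
    "10.0.0.2 GET /missing 404",
    "malformed line",
    "10.0.0.1 GET /about.html 200",
    "",
]


def clean(lines):
    # Cleaning step: drop blank or malformed lines and parse the status code.
    for line in lines:
        parts = line.split()
        if len(parts) == 4 and parts[3].isdigit():
            ip, method, path, status = parts
            yield {"ip": ip, "method": method, "path": path, "status": int(status)}


def aggregate(rows):
    # Aggregation step: count successful requests per client IP.
    return Counter(row["ip"] for row in rows if row["status"] == 200)


hits_per_ip = aggregate(clean(raw_logs))
```

The result of the aggregation step is what would then be loaded into the warehouse.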

Question 11

Q11 Why do we need MapReduce during Pig programming?

Accepted Answer

Answer: Pig is a high-level platform that makes many Hadoop data analysis tasks easier to carry out. The language used on this platform is Pig Latin. A program written in Pig Latin is like a query written in SQL: it needs an execution engine to run it. So, when a program is written in Pig Latin, the Pig compiler converts it into MapReduce jobs. Here, MapReduce acts as the execution engine.

Question 12

Q12 What are the scalar data types in Pig?

Accepted Answer

Answer: The scalar data types are:

int (4 bytes)
float (4 bytes)
double (8 bytes)
long (8 bytes)
chararray
bytearray
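The byte widths listed above match the standard fixed-width encodings, which can be checked with Python's `struct` module. The mapping from Pig type names to `struct` format codes is an assumption made for illustration and is not part of any Pig API.

```python
import struct

# Standard-size struct codes for the fixed-width numeric types.
# The mapping from Pig type names to these codes is illustrative only.
pig_numeric_types = {
    "int": ">i",     # 32-bit signed integer
    "long": ">q",    # 64-bit signed integer
    "float": ">f",   # 32-bit floating point
    "double": ">d",  # 64-bit floating point
}

sizes_in_bytes = {
    name: struct.calcsize(code) for name, code in pig_numeric_types.items()
}
```

chararray and bytearray are left out because they are variable-length.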

Question 13

Q13 What are the different execution modes available in Pig?

Accepted Answer

Answer: There are three modes of execution available in Pig:

Interactive Mode (Also known as Grunt Mode)
Batch Mode
Embedded Mode

Question 14

Q14 Are there any problems that can only be solved by MapReduce and cannot be solved by Pig? In which kinds of scenarios are MR jobs more useful than Pig?

Accepted Answer

Answer: Let us take a scenario where we want to count the population of two cities. I have a data set and a sensor list for different cities, and I want to count the combined population of two cities, say Bangalore and Noida, in a single MapReduce job. To do that, the key for Bangalore has to be treated the same as the key for Noida, so that the population data of both cities reaches one reducer. In other words, I have to instruct the MapReduce program that whenever it finds a city named 'Bangalore' or a city named 'Noida', it should use a common alias for both, so that both cities get a common key and are passed to the same reducer. For this, we have to write a custom partitioner.

In MapReduce, when you create a key for a city, 'city' itself is the key, so whenever the framework comes across a different city it treats it as a different key. Hence we need a customized partitioner. MapReduce lets you write your own partitioner and specify that if city = bangalore or noida, the same hash code is returned. Older versions of Pig give no way to plug in a custom partitioner: Pig is not a framework, so we cannot direct the execution engine to customize the partitioner (newer releases add a PARTITION BY clause on operators such as GROUP). In such scenarios, MapReduce works better than Pig.
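The routing rule described above can be sketched in plain Python. This is only a model of what a custom partitioner decides, not Hadoop's `Partitioner` class; the alias table `CITY_ALIASES`, the function name, and the use of CRC32 as the hash are assumptions made for illustration.

```python
import zlib

# Hypothetical alias table: both cities share one routing key.
CITY_ALIASES = {"bangalore": "bangalore_noida", "noida": "bangalore_noida"}


def custom_partition(city, num_reducers):
    # Map the city to its alias (if any), then hash the alias so that
    # aliased cities always land on the same reducer.
    routing_key = CITY_ALIASES.get(city.lower(), city.lower())
    return zlib.crc32(routing_key.encode("utf-8")) % num_reducers


# Route a few hypothetical (city, population_count) records to reducers.
records = [("Bangalore", 120), ("Noida", 80), ("Delhi", 200)]
by_reducer = {}
for city, count in records:
    by_reducer.setdefault(custom_partition(city, 8), []).append((city, count))
```

Because Bangalore and Noida resolve to the same routing key, their records always end up in the same reducer's list, where they can be summed together.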

Question 15

Q15 Is the Pig Latin language case-sensitive or not?

Accepted Answer

Answer: Pig Latin is only partly case-sensitive. Keywords are not case-sensitive; for example, Load is equivalent to load.

Relation names are case-sensitive: A = load 'b' is not equivalent to a = load 'b'.

UDF names are also case-sensitive: count is not equivalent to COUNT.

Question 16

Q16 What is the purpose of the ‘dump’ keyword in Pig?

Accepted Answer

Answer: dump displays the output of a relation on the screen, for example:

dump processed;

Question 17

Q17 Does Pig give any warning when there is a type mismatch or missing field?

Accepted Answer

Answer: No, Pig will not show any warning if a field is missing or a type does not match. Even if you assume Pig issues such a warning, it is difficult to find in the log file. When a mismatch is found, Pig substitutes a null value.
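The null-on-mismatch behaviour can be modelled in plain Python, with `None` standing in for Pig's null. The rows and the helper `cast_to_int` are hypothetical, and this is a sketch of the behaviour described in the answer rather than of Pig's internals.

```python
def cast_to_int(raw_value):
    # Return None (standing in for Pig's null) instead of raising on bad input.
    try:
        return int(raw_value)
    except (TypeError, ValueError):
        return None


# Hypothetical rows where the second field should be an integer age.
rows = [("ann", "34"), ("bob", "unknown"), ("cat",)]

ages = []
for row in rows:
    raw_age = row[1] if len(row) > 1 else None  # missing field becomes null
    ages.append((row[0], cast_to_int(raw_age)))
```

Since nothing fails loudly, it is worth checking for nulls explicitly after loading data with a declared schema.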

Question 18

Q18 What is grunt shell?

Accepted Answer

Answer: Pig's interactive shell is known as the Grunt shell. It lets users run Pig Latin statements and interact with HDFS.

Question 19

Q19 What does COGROUP do in Pig?

Accepted Answer

Answer: COGROUP groups each data set by a common field and then returns a set of records, one per value of that field, each containing two separate bags. The first bag holds the records of the first data set that have that value, and the second bag holds the records of the second data set that have that value.
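The two-bag result can be sketched in plain Python. This models the semantics only and is not Pig code; the function `cogroup` and the `owners` and `friends` data sets are hypothetical, and the keys are sorted here just to make the result deterministic.

```python
def cogroup(first, second, key_index=0):
    # Collect every key that appears in either data set.
    keys = sorted(
        {row[key_index] for row in first} | {row[key_index] for row in second}
    )
    result = []
    for key in keys:
        # One bag per input data set, holding the rows that carry this key.
        first_bag = [row for row in first if row[key_index] == key]
        second_bag = [row for row in second if row[key_index] == key]
        result.append((key, first_bag, second_bag))
    return result


# Hypothetical data sets that share a name in the first field.
owners = [("alice", "turtle"), ("alice", "cat"), ("bob", "dog")]
friends = [("alice", "carol"), ("dave", "erin")]

grouped = cogroup(owners, friends)
```

A key that is present in only one data set still produces a record; the bag for the other data set is simply empty.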

Question 20

Q20 What are the relational operations in Pig Latin?

Accepted Answer

Answer: The relational operations are:

FOREACH
ORDER BY
FILTER
GROUP
DISTINCT
JOIN
LIMIT
