APACHE PIG
Common operators
Load and store
new_relation = LOAD 'path_to_file_name' USING PigStorage('delimiter') AS (field_name_1:type, field_name_2:type, ...);
STORE relation INTO 'output_path' USING PigStorage('delimiter');
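A rough mental model of what the LOAD line produces, written in plain Python (this is not Pig itself). PigStorage splits each line on the delimiter, and each field is cast to its declared type. A value that cannot be cast becomes null. The file name, schema, and the `load_with_pigstorage` / `cast_field` helpers are hypothetical.

```python
# Plain-Python model of (hypothetical file and schema):
#   people = LOAD 'people.csv' USING PigStorage(',') AS (name:chararray, age:int);
# It only mimics the resulting tuples; it is not how Pig is implemented.

CASTS = {"chararray": str, "int": int, "long": int, "float": float, "double": float}


def cast_field(raw, dtype):
    """Cast one raw text field; a failed cast becomes None (Pig's null)."""
    if raw is None:
        return None
    try:
        return CASTS[dtype](raw)
    except ValueError:
        return None


def load_with_pigstorage(lines, delimiter, schema):
    """Split each line on `delimiter` and cast the fields per `schema`."""
    relation = []
    for line in lines:
        fields = line.rstrip("\n").split(delimiter)
        # This model pads missing trailing fields with None.
        fields += [None] * (len(schema) - len(fields))
        relation.append(tuple(cast_field(raw, dtype)
                              for raw, (_, dtype) in zip(fields, schema)))
    return relation


schema = [("name", "chararray"), ("age", "int")]
people = load_with_pigstorage(["alice,30", "bob,abc", "carol"], ",", schema)
print(people)  # [('alice', 30), ('bob', None), ('carol', None)]
```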
Diagnostic Operators
EXPLAIN relation; displays the logical, physical and MapReduce execution plans used to compute a relation
Grouping
new_relation = GROUP old_relation BY (field_1, field_2, ...);
GROUP produces tuples of the form (group, bag): the first field, named group, holds the grouping key, and the bag holds every original tuple with that key
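A plain-Python sketch of the shape GROUP produces. It is a model, not Pig, and the `group_by` helper and sample data are hypothetical. Each output tuple pairs a key with a bag holding the complete original tuples for that key. Pig does not guarantee the output order.

```python
def group_by(relation, key):
    """Model of: grouped = GROUP relation BY key;

    Returns (group, bag) tuples; each bag is a list of the full
    original tuples that share that key.
    """
    groups = {}
    for t in relation:
        groups.setdefault(key(t), []).append(t)
    return list(groups.items())


visits = [("alice", "home"), ("bob", "cart"), ("alice", "cart")]
grouped = group_by(visits, key=lambda t: t[0])
print(grouped)
# [('alice', [('alice', 'home'), ('alice', 'cart')]), ('bob', [('bob', 'cart')])]
```

To group by several fields, make `key` return a tuple, e.g. `lambda t: (t[0], t[1])`.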
new_relation = COGROUP rel_1 BY (field_1, field_2, ...), rel_2 BY (field_1, field_2, ...);
COGROUP produces tuples of the form (group, bag_from_rel_1, bag_from_rel_2)
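The same idea extended to COGROUP, again as a plain-Python model with hypothetical data. Each key gets one bag per input relation. A key found in only one input gets an empty bag for the other.

```python
def cogroup(rel_1, key_1, rel_2, key_2):
    """Model of: c = COGROUP rel_1 BY key_1, rel_2 BY key_2;

    Each output tuple is (group, bag_from_rel_1, bag_from_rel_2).
    """
    result = {}
    for t in rel_1:
        result.setdefault(key_1(t), ([], []))[0].append(t)
    for t in rel_2:
        result.setdefault(key_2(t), ([], []))[1].append(t)
    return [(k, bag_1, bag_2) for k, (bag_1, bag_2) in result.items()]


owners = [("alice", "cat"), ("bob", "dog"), ("alice", "fish")]
ages = [("alice", 30), ("carol", 25)]
rows = cogroup(owners, lambda t: t[0], ages, lambda t: t[0])
for row in rows:
    print(row)
# ('alice', [('alice', 'cat'), ('alice', 'fish')], [('alice', 30)])
# ('bob', [('bob', 'dog')], [])
# ('carol', [], [('carol', 25)])
```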
JOINING
new_relation = JOIN relation_1 BY (key_1, key_2, ...), relation_2 BY (key_1, key_2, ...);
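A plain-Python sketch of what the inner JOIN outputs. It is a model with hypothetical data, not Pig's execution strategy. Every matching pair of tuples is concatenated into one flat tuple, and tuples without a match are dropped. Null keys never match.

```python
def inner_join(rel_1, key_1, rel_2, key_2):
    """Model of: j = JOIN rel_1 BY key_1, rel_2 BY key_2;"""
    index = {}
    for t in rel_2:
        k = key_2(t)
        if k is not None:  # null keys never match in a Pig join
            index.setdefault(k, []).append(t)
    # Concatenate the fields of each matching pair (a per-key cross product).
    return [t1 + t2 for t1 in rel_1 for t2 in index.get(key_1(t1), [])]


owners = [("alice", "cat"), ("bob", "dog"), ("alice", "fish")]
ages = [("alice", 30), ("carol", 25)]
joined = inner_join(owners, lambda t: t[0], ages, lambda t: t[0])
print(joined)
# [('alice', 'cat', 'alice', 30), ('alice', 'fish', 'alice', 30)]
```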
new_relation = JOIN relation_1 BY key LEFT OUTER, relation_2 BY key;
new_relation = JOIN relation_1 BY key RIGHT OUTER, relation_2 BY key;
new_relation = JOIN relation_1 BY key FULL OUTER, relation_2 BY key;
outer joins work on exactly two relations
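A plain-Python model of the three outer joins, using hypothetical data. Matching pairs are concatenated as in an inner join. An unmatched tuple is kept and padded with nulls for the other relation's fields. That padding is why Pig needs a schema for the side that may be null-filled; here the widths are passed in explicitly.

```python
def first_field(t):
    return t[0]


def outer_join(left, left_key, left_width, right, right_key, right_width, how):
    """Model of JOIN ... LEFT|RIGHT|FULL OUTER; `how` is 'left', 'right' or 'full'."""
    rows = []
    matched_right = set()
    for lt in left:
        k = left_key(lt)
        hits = [j for j, rt in enumerate(right) if k is not None and k == right_key(rt)]
        for j in hits:
            rows.append(lt + right[j])
            matched_right.add(j)
        if not hits and how in ("left", "full"):
            rows.append(lt + (None,) * right_width)  # pad right side with nulls
    if how in ("right", "full"):
        for j, rt in enumerate(right):
            if j not in matched_right:
                rows.append((None,) * left_width + rt)  # pad left side with nulls
    return rows


owners = [("alice", "cat"), ("bob", "dog"), ("alice", "fish")]
ages = [("alice", 30), ("carol", 25)]
full = outer_join(owners, first_field, 2, ages, first_field, 2, "full")
print(full)
# [('alice', 'cat', 'alice', 30), ('bob', 'dog', None, None),
#  ('alice', 'fish', 'alice', 30), (None, None, 'carol', 25)]
```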
UNION and SPLIT
new_relation = UNION relation_1, relation_2, ...;
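A plain-Python model of UNION; the helper and data are hypothetical. The inputs are simply concatenated. Unlike SQL's UNION, duplicates are kept (apply DISTINCT afterwards to drop them), and Pig does not guarantee any output order.

```python
from itertools import chain


def union(*relations):
    """Model of: u = UNION r1, r2, ...;  duplicates are kept."""
    return list(chain.from_iterable(relations))


r1 = [(1, "a"), (2, "b")]
r2 = [(2, "b"), (3, "c")]
merged = union(r1, r2)
print(merged)  # [(1, 'a'), (2, 'b'), (2, 'b'), (3, 'c')]
```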
SPLIT old_relation INTO new_relation_1 IF condition_1, new_relation_2 IF condition_2, ...;
conditions (in SPLIT and FILTER) are combined with AND, OR, NOT, and nulls are tested with IS NULL / IS NOT NULL
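A plain-Python model of SPLIT with hypothetical data and conditions. Every condition is tested against every tuple, so a tuple can land in several outputs or in none. A comparison on a null field is not true in Pig, so the tuple is not selected; the `is not None` guards model that.

```python
def split(relation, conditions):
    """Model of: SPLIT rel INTO out_1 IF cond_1, out_2 IF cond_2, ...;"""
    return {name: [t for t in relation if cond(t)] for name, cond in conditions.items()}


people = [("alice", 30), ("bob", None), ("carol", 17)]
outputs = split(people, {
    # Roughly: adults IF age >= 18 (a null age never satisfies a comparison)
    "adults": lambda t: t[1] is not None and t[1] >= 18,
    "minors": lambda t: t[1] is not None and t[1] < 18,
    # Roughly: thirties IF age >= 30 AND age < 40 (overlaps with adults)
    "thirties": lambda t: t[1] is not None and 30 <= t[1] < 40,
    # Roughly: unknown_age IF age IS NULL
    "unknown_age": lambda t: t[1] is None,
})
print(outputs)
```

Here `('alice', 30)` ends up in both `adults` and `thirties`.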
Filtering
new_relation = FOREACH old_relation GENERATE field_1, field_2, field_3, ... ;
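A plain-Python model of FOREACH ... GENERATE; the helper and data are hypothetical. Without FLATTEN, each input tuple yields exactly one output tuple, built from the listed fields or expressions.

```python
def foreach_generate(relation, *exprs):
    """Model of: new = FOREACH old GENERATE expr_1, expr_2, ...;"""
    return [tuple(e(t) for e in exprs) for t in relation]


people = [("alice", 30, "paris"), ("bob", 25, "rome")]
# Roughly: FOREACH people GENERATE name, age + 1;
projected = foreach_generate(people, lambda t: t[0], lambda t: t[1] + 1)
print(projected)  # [('alice', 31), ('bob', 26)]
```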
new_relation = ORDER old_relation BY field_1 [ASC|DESC], field_2 [ASC|DESC], ...; (ASC is the default)
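A plain-Python model of ORDER BY on several fields with mixed directions; the `order_by` helper and data are hypothetical. It sorts by the last key first and the first key last. Python's sort is stable, so the first key ends up with the highest priority.

```python
def order_by(relation, *keys):
    """Model of: new = ORDER old BY f1 DESC, f2 ASC;  keys are (field_index, 'ASC'|'DESC')."""
    result = list(relation)
    for index, direction in reversed(keys):
        # Stable sort: ties keep the order produced by the previous pass.
        result.sort(key=lambda t: t[index], reverse=(direction == "DESC"))
    return result


scores = [("bob", 80), ("alice", 90), ("carol", 80)]
ranked = order_by(scores, (1, "DESC"), (0, "ASC"))
print(ranked)  # [('alice', 90), ('bob', 80), ('carol', 80)]
```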
Statements
statements operate on relations, the outermost structure of Pig's data model
except for LOAD and STORE, every statement takes relations as input and produces a relation as output
Pig Data Model
atomic values are stored as strings, so a number can be used as either a string or a number
atomic types: int, long, float, double, chararray, bytearray, boolean, datetime, biginteger, bigdecimal
Bag
denoted by {(...), (...), ...}: an unordered collection of tuples
Map
denoted by [key#value, key#value, ...]; keys must be chararray
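To see how these nested notations fit together, here is a hypothetical helper, `to_pig_text`. It renders Python values in Pig's textual notation, treating a Python tuple as a Pig tuple, a list as a bag, and a dict as a map. It is an illustration only, not a Pig API. This sketch prints None (null) as an empty field.

```python
def to_pig_text(value):
    """Render nested Python values in Pig-style notation (illustrative helper)."""
    if value is None:
        return ""  # this sketch prints null as an empty field
    if isinstance(value, tuple):   # tuple -> (a,b)
        return "(" + ",".join(to_pig_text(v) for v in value) + ")"
    if isinstance(value, list):    # bag of tuples -> {(..),(..)}
        return "{" + ",".join(to_pig_text(v) for v in value) + "}"
    if isinstance(value, dict):    # map -> [key#value]
        return "[" + ",".join(f"{k}#{to_pig_text(v)}" for k, v in value.items()) + "]"
    return str(value)


record = ("alice", [("math", 90), ("art", 75)], {"city": "paris"})
print(to_pig_text(record))  # (alice,{(math,90),(art,75)},[city#paris])
```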
Why is Pig needed?
provides nested data types such as tuples and maps that are not available in plain Hadoop MapReduce