APACHE PIG_02
LOAD
function: one of PigStorage (the default), BinStorage, JsonLoader, TextLoader
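PigStorage reads structured text: each line becomes one tuple, and the line is split into fields on a delimiter that defaults to tab. The Python model below illustrates that idea. `pig_storage_load` is a hypothetical helper, not Pig's API. Fields stay strings here, much as Pig leaves them untyped (bytearray) until an AS schema casts them.

```python
# Hypothetical Python model of PigStorage loading: one tuple per line,
# fields split on a delimiter (tab by default, as in PigStorage()).
def pig_storage_load(lines, delimiter="\t"):
    """Return a relation (list of tuples) built from delimited text lines."""
    relation = []
    for line in lines:
        line = line.rstrip("\n")  # drop the record terminator
        relation.append(tuple(line.split(delimiter)))
    return relation


rows = pig_storage_load(["alice\t30\n", "bob\t25\n"])
print(rows)  # [('alice', '30'), ('bob', '25')]

# Comparable in spirit to PigStorage(',') for comma-separated input.
csv_rows = pig_storage_load(["x,1", "y,2"], delimiter=",")
print(csv_rows)  # [('x', '1'), ('y', '2')]
```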
LIMIT, DISTINCT, CROSS, UNION, SPLIT
CROSS alias_1, alias_2, ..., alias_n;
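CROSS pairs every tuple of each input relation with every tuple of the others and concatenates their fields. The output therefore has len(a) × len(b) × ... rows. The Python sketch below models this with `itertools.product`. Relations are assumed to be lists of tuples, and `cross` is a hypothetical name. Pig itself does not guarantee output order.

```python
from itertools import product


# Python model of CROSS: the Cartesian product of the input relations,
# with each combination's fields concatenated into one flat tuple.
def cross(*relations):
    return [
        tuple(field for t in combo for field in t)
        for combo in product(*relations)
    ]


a = [(1, 2), (4, 2)]
b = [(2, 4), (8, 9)]
result = cross(a, b)
print(result)
# [(1, 2, 2, 4), (1, 2, 8, 9), (4, 2, 2, 4), (4, 2, 8, 9)]
```

If any input is empty, the result is empty. This is why CROSS on large relations is expensive and should be used sparingly.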
UNION alias_1, alias_2, ..., alias_n;
UNION ONSCHEMA alias_1, alias_2, ..., alias_n;
ONSCHEMA matches fields by name rather than by position.
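The difference between plain UNION and UNION ONSCHEMA can be modeled in Python. Plain UNION concatenates relations positionally and keeps duplicates. ONSCHEMA builds a merged schema and fills in None (Pig's null) for fields that a relation lacks. In this sketch, each relation is assumed to be a `(field_names, rows)` pair, and the merged schema keeps first-appearance order. The helper names are hypothetical.

```python
# Plain UNION: positional concatenation; duplicates are kept (no DISTINCT).
def union(*relations):
    out = []
    for rel in relations:
        out.extend(rel)
    return out


# UNION ONSCHEMA: each relation is (field_names, rows). Rows are aligned
# by field name; fields missing from a relation become None (null).
def union_onschema(*relations):
    merged = []
    for names, _ in relations:
        for name in names:
            if name not in merged:
                merged.append(name)
    rows = []
    for names, rel_rows in relations:
        for row in rel_rows:
            by_name = dict(zip(names, row))
            rows.append(tuple(by_name.get(name) for name in merged))
    return merged, rows


# Positionally, the 3 (field b of the second relation) lands in column a.
positional = union([(1, 2)], [(3, 4)])
print(positional)  # [(1, 2), (3, 4)]

schema, aligned = union_onschema((["a", "b"], [(1, 2)]), (["b", "c"], [(3, 4)]))
print(schema)   # ['a', 'b', 'c']
print(aligned)  # [(1, 2, None), (None, 3, 4)]
```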
SPLIT
SPLIT relation INTO alias_1 IF expression_1, alias_2 IF expression_2, ...;
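SPLIT tests every tuple against each IF expression independently. A tuple can therefore land in several output aliases, or in none. The Python model below shows this. The conditions mapping and the `split` helper are hypothetical stand-ins for the aliases and expressions in the Pig statement.

```python
# Python model of SPLIT: each row is checked against every condition,
# so it may be copied into several outputs or dropped entirely.
def split(relation, conditions):
    """conditions maps an output alias name to a predicate on a row."""
    outputs = {alias: [] for alias in conditions}
    for row in relation:
        for alias, predicate in conditions.items():
            if predicate(row):
                outputs[alias].append(row)
    return outputs


data = [(1, 2, 3), (4, 5, 6), (7, 8, 9)]
parts = split(data, {
    "x": lambda t: t[0] < 7,             # like: x IF f1 < 7
    "y": lambda t: t[0] in (4, 7),       # like: y IF f1 == 4 OR f1 == 7
})
print(parts["x"])  # [(1, 2, 3), (4, 5, 6)]
print(parts["y"])  # [(4, 5, 6), (7, 8, 9)]  (4, 5, 6) is in both
```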
GROUP/COGROUP
output: one tuple per group, shaped (group, {bag of the input tuples with that key})
the group key can be a single field, a tuple of fields, or ALL (a single group holding every record)
With COGROUP, each output tuple holds one bag per input relation, (group, {bag from relation 1}, {bag from relation 2}, ...); a bag is empty when that relation has no records for the key.
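The grouping shapes described above can be modeled in Python, with bags as lists. `group`, `group_all`, and `cogroup` are hypothetical helpers. Key functions stand in for the BY clause, and a key function returning a tuple models grouping on several fields. The group value for ALL is the string "all", as in Pig. Pig bags are unordered; the lists here are ordered only for readability.

```python
# GROUP relation BY key -> list of (group, bag) tuples.
def group(relation, key):
    groups = {}
    for row in relation:
        groups.setdefault(key(row), []).append(row)
    return list(groups.items())


# GROUP relation ALL -> a single group containing every record.
def group_all(relation):
    return [("all", list(relation))]


# COGROUP a BY ka, b BY kb -> (group, bag_from_a, bag_from_b);
# a bag is empty when that relation has no rows for the key.
def cogroup(rel_a, key_a, rel_b, key_b):
    keys = dict.fromkeys([key_a(r) for r in rel_a] + [key_b(r) for r in rel_b])
    return [
        (k, [r for r in rel_a if key_a(r) == k], [r for r in rel_b if key_b(r) == k])
        for k in keys
    ]


owners = [("alice", "turtle"), ("bob", "dog"), ("alice", "cat")]
visits = [("alice", "mon"), ("carol", "tue")]

by_owner = group(owners, key=lambda t: t[0])
everything = group_all(owners)
co = cogroup(owners, lambda t: t[0], visits, lambda t: t[0])
# co[1] == ("bob", [("bob", "dog")], [])  -> empty bag for visits
```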
JOIN (INNER)
JOIN relation1 BY column1, relation2 BY column2, ...;
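An inner JOIN keeps only key matches and concatenates the fields of every matching pair. Conceptually, this is a COGROUP followed by flattening both bags. Records with a null key never match. The Python sketch below models this with a hash index on the right relation. `inner_join` is a hypothetical name, and relations are assumed to be lists of tuples.

```python
from operator import itemgetter


# Python model of JOIN left BY key, right BY key (inner join).
def inner_join(left, left_key, right, right_key):
    index = {}
    for right_row in right:
        k = right_key(right_row)
        if k is not None:  # null keys never match
            index.setdefault(k, []).append(right_row)
    out = []
    for left_row in left:
        # A None key on the left finds nothing, since None is never indexed.
        for right_row in index.get(left_key(left_row), []):
            out.append(left_row + right_row)
    return out


students = [(1, "ann"), (2, "ben"), (3, "cat")]
grades = [(1, "A"), (1, "B"), (3, "C"), (4, "D")]
joined = inner_join(students, itemgetter(0), grades, itemgetter(0))
print(joined)
# [(1, 'ann', 1, 'A'), (1, 'ann', 1, 'B'), (3, 'cat', 3, 'C')]
```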
FLATTEN
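for a BAG
(value, BAG) produces one output tuple per element of the bag, i.e. a cross product of value with the bag's tuples; an empty bag produces no output for that record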
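FLATTEN on a bag can be modeled in Python, with bags as lists of tuples. Each element of the bag is spliced into a copy of the surrounding tuple, and a record whose bag is empty disappears. `flatten_bag` and its `bag_index` parameter are hypothetical names for this sketch.

```python
# Python model of FLATTEN applied to the bag at position bag_index.
def flatten_bag(relation, bag_index):
    out = []
    for row in relation:
        prefix = row[:bag_index]
        suffix = row[bag_index + 1:]
        # One output row per bag element; an empty bag yields no rows.
        for inner in row[bag_index]:
            out.append(prefix + inner + suffix)
    return out


grouped = [
    ("alice", [("turtle",), ("cat",)]),
    ("bob", [("dog",)]),
    ("dave", []),  # empty bag: this record is dropped
]
flat = flatten_bag(grouped, 1)
print(flat)  # [('alice', 'turtle'), ('alice', 'cat'), ('bob', 'dog')]
```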
STRING FUNCTIONS
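SUBSTRING(string, startIndex, stopIndex): indices are zero-based; stopIndex is exclusive (the position after the last character returned)
REPLACE(string, regExp, replaceChars): replaces every match of the Java regular expression regExp with replaceChars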
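Python equivalents help check the index and regex conventions. Python slicing uses the same zero-based, stop-exclusive convention as SUBSTRING. `re.sub` stands in for Java's regex replacement here. The two agree for simple patterns but differ in details, for example group references in the replacement (`$1` in Java vs `\1` in Python). The helper names are hypothetical, and indices are assumed to be in range.

```python
import re


# Model of SUBSTRING(string, startIndex, stopIndex): zero-based, stop-exclusive.
def pig_substring(s, start_index, stop_index):
    return s[start_index:stop_index]


# Model of REPLACE(string, regExp, replaceChars): every match is replaced.
# Python's re is used in place of Java's regex engine.
def pig_replace(s, regexp, replace_chars):
    return re.sub(regexp, replace_chars, s)


print(pig_substring("ABCDEF", 1, 4))                          # BCD
print(pig_replace("open source software", "software", "wiki"))  # open source wiki
print(pig_replace("a1b22c", r"[0-9]+", "#"))                  # a#b#c
```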
JOIN (OUTER)
JOIN relation1 BY ... [LEFT|RIGHT|FULL] [OUTER], relation2 BY ... ;
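Outer joins keep unmatched rows from the left side, the right side, or both, padding the missing fields with nulls. Pig supports outer joins only between two relations. The Python sketch below models this for exactly two relations. Several details are assumptions of the sketch: `outer_join` is a hypothetical helper, None stands in for null, and the caller passes each side's field count so that padding has the right width.

```python
from operator import itemgetter


# Python model of JOIN left BY k [LEFT|RIGHT|FULL] OUTER, right BY k.
def outer_join(left, left_key, right, right_key, how, left_width, right_width):
    if how not in ("left", "right", "full"):
        raise ValueError("how must be 'left', 'right' or 'full'")
    out = []
    matched_right = set()  # positions of right rows that found a partner
    for left_row in left:
        k = left_key(left_row)
        hits = [] if k is None else [
            (j, r) for j, r in enumerate(right) if right_key(r) == k
        ]
        for j, right_row in hits:
            out.append(left_row + right_row)
            matched_right.add(j)
        if not hits and how in ("left", "full"):
            out.append(left_row + (None,) * right_width)  # pad right side
    if how in ("right", "full"):
        for j, right_row in enumerate(right):
            if j not in matched_right:
                out.append((None,) * left_width + right_row)  # pad left side
    return out


students = [(1, "ann"), (2, "ben"), (3, "cat")]
grades = [(1, "A"), (3, "C"), (4, "D")]
key = itemgetter(0)
full = outer_join(students, key, grades, key, "full", 2, 2)
print(full)
# [(1, 'ann', 1, 'A'), (2, 'ben', None, None), (3, 'cat', 3, 'C'), (None, None, 4, 'D')]
```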