For Apache Parquet files, there are two size settings to consider: the row group size and the file size. The row group size is the size of one group of rows within a Parquet file, and each file can contain multiple row groups. For example, an Iceberg table's default configuration uses a 128 MB row group size and a 512 MB target file size, which works out to four row groups per file.
You'll always want to make sure these two settings are aligned (i.e., that the file size is evenly divisible by the row group size).
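The alignment rule is simple arithmetic: divide the target file size by the row group size and check that nothing is left over. The sketch below is idealized. It assumes sizes land exactly on their targets, while real writers treat both values as targets, so actual files and row groups vary somewhat. The function name `row_groups_per_file` is hypothetical.

```python
MB = 1024 * 1024

def row_groups_per_file(target_file_size_bytes: int, row_group_size_bytes: int) -> int:
    """Return how many full row groups fit in one target-sized file.

    Idealized model: assumes every row group and file hits its target size.
    Raises ValueError if the file size is not an exact multiple of the
    row group size, i.e. if the two settings are misaligned.
    """
    if row_group_size_bytes <= 0:
        raise ValueError("row group size must be positive")
    groups, remainder = divmod(target_file_size_bytes, row_group_size_bytes)
    if remainder != 0:
        raise ValueError(
            f"misaligned: {target_file_size_bytes} bytes is not a multiple of "
            f"{row_group_size_bytes} bytes"
        )
    return groups

# Iceberg defaults: 512 MB target file size with 128 MB row groups
print(row_groups_per_file(512 * MB, 128 * MB))   # 4
# Larger 1 GB files with the same 128 MB row groups
print(row_groups_per_file(1024 * MB, 128 * MB))  # 8
```

A misaligned pair such as 512 MB files with 96 MB row groups raises `ValueError`, because 512 / 96 leaves a partial row group at the end of each file.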
Fewer row groups result in smaller files, because there is less per-group metadata to write. More row groups, on the other hand, improve predicate pushdown: the metadata for each row group covers a narrower range of values, so the query engine can skip more row groups that don't contain data relevant to the current query.

The right balance depends on your workload. For example, you may want to increase the file size to 1 GB per file but keep row groups at 128 MB (eight row groups per file) so that there are fewer files to open and close. Conversely, if the queries you run usually need to read most of the data, you'd prefer fewer row groups, since predicate pushdown won't speed up a query that has to read nearly everything anyway. Row group size and file size can both be set as table properties (write.parquet.row-group-size-bytes and write.target-file-size-bytes, respectively), and the target file size can also be overridden for individual compaction jobs through the job's options.
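To see why finer-grained row groups help selective queries but not full scans, here is a toy model of min/max-based row group skipping. It assumes the file is sorted on the filtered column and split into equal-size row groups, and it ignores other Parquet pruning structures such as page indexes and bloom filters. All names (`RowGroupStats`, `build_row_group_stats`, `row_groups_to_read`) are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass

@dataclass(frozen=True)
class RowGroupStats:
    """Per-row-group metadata: the min and max value of the filtered column."""
    min_value: int
    max_value: int

def build_row_group_stats(num_rows: int, num_row_groups: int) -> list[RowGroupStats]:
    """Split sorted values 0..num_rows-1 into equal row groups and record min/max."""
    if num_rows % num_row_groups != 0:
        raise ValueError("toy model requires equal-size row groups")
    rows_per_group = num_rows // num_row_groups
    stats = []
    for g in range(num_row_groups):
        start = g * rows_per_group
        stats.append(RowGroupStats(start, start + rows_per_group - 1))
    return stats

def row_groups_to_read(stats: list[RowGroupStats], lo: int, hi: int) -> int:
    """Count row groups whose [min, max] range overlaps the predicate lo <= value <= hi."""
    return sum(1 for s in stats if s.max_value >= lo and s.min_value <= hi)

for k in (4, 8):
    stats = build_row_group_stats(1_000_000, k)
    selective = row_groups_to_read(stats, 100_000, 110_000)  # narrow range filter
    full = row_groups_to_read(stats, 0, 999_999)             # touches every value
    print(f"{k} row groups: selective query reads {selective}/{k}, full scan reads {full}/{k}")
```

With four row groups the selective query still reads a quarter of the file; with eight it reads an eighth. The full scan reads every row group either way, which is why extra row groups buy nothing for read-everything workloads.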
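As a sketch of the configuration side, the helpers below build Spark SQL statements. One sets the Iceberg table properties `write.parquet.row-group-size-bytes` and `write.target-file-size-bytes`. The other overrides the target file size for a single compaction through the `rewrite_data_files` procedure's `options` map. The catalog `my_catalog`, the table `db.events`, and the helper names are hypothetical, and the `CALL` syntax assumes Iceberg's Spark SQL extensions are enabled.

```python
MB = 1024 * 1024

def set_size_properties_sql(table: str, row_group_bytes: int, file_bytes: int) -> str:
    """Build an ALTER TABLE statement that sets both size-related table properties."""
    return (
        f"ALTER TABLE {table} SET TBLPROPERTIES ("
        f"'write.parquet.row-group-size-bytes'='{row_group_bytes}', "
        f"'write.target-file-size-bytes'='{file_bytes}')"
    )

def compaction_sql(catalog: str, table: str, file_bytes: int) -> str:
    """Build a rewrite_data_files call that overrides the target file size for this job only."""
    return (
        f"CALL {catalog}.system.rewrite_data_files("
        f"table => '{table}', "
        f"options => map('target-file-size-bytes', '{file_bytes}'))"
    )

# 1 GB files with 128 MB row groups (eight row groups per file)
print(set_size_properties_sql("my_catalog.db.events", 128 * MB, 1024 * MB))
print(compaction_sql("my_catalog", "db.events", 1024 * MB))
```

With a SparkSession configured for an Iceberg catalog, each string could be passed to `spark.sql(...)`. Keeping the per-job override in the compaction call leaves the table's default properties unchanged for regular writes.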