ICFF
ICFFs can be used efficiently in any situation where you need to handle a large lookup dataset in an incremental fashion, rather than loading all of it into memory at one time
Disk requirements - 10 times less
Memory requirements
Operations during processing - ICFFs allow you to append successive generations of new information without any pause in processing
Performance
Volume of data - Feasible to handle hundreds of terabytes
Presorting and compressing
Locating, retrieving, and uncompressing - Your graph must call the operating system to locate the lookup data file on disk, copy the requested block from disk to memory, and then uncompress the retrieved block
Consolidating and reindexing
Index file - Consists of:
- The disk offset needed to access each data block
- The first key value stored in each data block
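The block-and-index layout described above can be sketched in Python. This is only an illustration of the idea, not Ab Initio's actual file format: the function names `build_blocks` and `lookup`, the JSON block encoding, the use of zlib, and the stored block length are all assumptions made for the example. Keys are assumed to be unique.

```python
import bisect
import io
import json
import zlib


def build_blocks(records, block_size=2):
    """Presort (key, value) records, compress them block by block,
    and return (data_bytes, index)."""
    records = sorted(records)
    data = io.BytesIO()
    index = []  # one entry per block: (first_key, disk_offset, length)
    for i in range(0, len(records), block_size):
        block = records[i:i + block_size]
        payload = zlib.compress(json.dumps(block).encode("utf-8"))
        # The length is an extra convenience field for this sketch.
        index.append((block[0][0], data.tell(), len(payload)))
        data.write(payload)
    return data.getvalue(), index


def lookup(data, index, key):
    """Locate the one block that could hold key, uncompress it, scan it."""
    first_keys = [entry[0] for entry in index]
    pos = bisect.bisect_right(first_keys, key) - 1
    if pos < 0:
        return None  # key sorts before the first block
    _, offset, length = index[pos]
    block = json.loads(zlib.decompress(data[offset:offset + length]))
    for k, v in block:
        if k == key:
            return v
    return None


data, index = build_blocks(
    [("cherry", 3), ("apple", 1), ("banana", 2), ("date", 4), ("elder", 5)]
)
```

Only the small index has to stay in memory; each lookup touches a single compressed block, which is the property that lets the data stay on disk.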
Generation - Each chunk of added update data is called a generation. Each new generation consists of blocks, just like the original data, and has its own index, which is simply concatenated to the original index. The concatenated indexes together form, in effect, a single index organized by generation
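A minimal Python sketch of how generations and a concatenated index might fit together. The class `GenerationalStore` and its methods are hypothetical names for this example, blocks are kept uncompressed in memory for brevity, and returning every match across generations is an assumption about lookup behavior, not something the notes specify.

```python
import bisect


class GenerationalStore:
    def __init__(self, block_size=2):
        self.block_size = block_size
        self.blocks = []      # blocks from all generations, in append order
        self.index = []       # concatenated index: (generation, first_key, block_position)
        self.generations = 0

    def append_generation(self, records):
        """Add a new generation; existing blocks and index entries are untouched."""
        records = sorted(records)
        gen = self.generations
        for i in range(0, len(records), self.block_size):
            block = records[i:i + self.block_size]
            self.index.append((gen, block[0][0], len(self.blocks)))
            self.blocks.append(block)
        self.generations += 1
        return gen

    def lookup(self, key):
        """Search each generation's slice of the index; return (generation, value) matches."""
        matches = []
        for gen in range(self.generations):
            entries = [e for e in self.index if e[0] == gen]
            first_keys = [e[1] for e in entries]
            pos = bisect.bisect_right(first_keys, key) - 1
            if pos < 0:
                continue  # key sorts before this generation's first block
            for k, v in self.blocks[entries[pos][2]]:
                if k == key:
                    matches.append((gen, v))
        return matches


store = GenerationalStore()
store.append_generation([("a", 1), ("b", 2), ("c", 3)])
store.append_generation([("b", 20), ("d", 40)])
```

Appending only extends the two lists, which mirrors the point that new generations can be added without pausing processing or rewriting earlier data.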
Use the BLOCK-COMPRESSED LOOKUP TEMPLATE component to read an ICFF in a graph
keep_on_disk is True.
block_compressed is True.
Screening bitmap is an approximate summary of the key values present in each ICFF or each generation
It allows a graph to skip files and generations that do not have the key value, and thus improves the odds of finding the key value in the remaining files and generations
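The screening idea can be illustrated with a simple hashed bitmap in Python. This is a sketch under assumptions: the bitmap size, the single SHA-256-based hash, and the function names are all invented for the example and say nothing about how the real screening bitmap is built. The key property it shares with the description is that it is approximate: a clear bit proves the key is absent, while a set bit only means the key may be present.

```python
import hashlib

BITMAP_BITS = 1024  # hypothetical size chosen for the example


def _bit_for(key):
    """Map a key to one bit position using a stable hash."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % BITMAP_BITS


def build_bitmap(keys):
    """Summarize the keys of one file or generation as an integer bitmap."""
    bitmap = 0
    for key in keys:
        bitmap |= 1 << _bit_for(key)
    return bitmap


def might_contain(bitmap, key):
    """False means definitely absent; True means possibly present."""
    return bool((bitmap >> _bit_for(key)) & 1)


def candidate_generations(bitmaps, key):
    """Return the generations that cannot be skipped for this key."""
    return [g for g, bm in enumerate(bitmaps) if might_contain(bm, key)]


bitmaps = [build_bitmap(["apple", "banana"]), build_bitmap([])]
```

A graph would consult the bitmap before opening a generation, so generations whose bit is clear are never read from disk.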
Range lookup operation
Ex - lookup_range(lookup_id, my_icff, apple, coconut)
This function call accesses the ICFF named my_icff with the lookup identifier given by lookup_id and retrieves from it the first record whose key value falls in the range apple to coconut
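The semantics of a range lookup, returning the first record whose key falls inside the range, can be sketched in Python. `first_in_range` is a hypothetical helper, not the DML `lookup_range` function, and treating both bounds as inclusive is an assumption.

```python
import bisect


def first_in_range(records, low, high):
    """records must be sorted by key; return the first (key, value)
    with low <= key <= high, or None if no key falls in the range."""
    keys = [k for k, _ in records]
    pos = bisect.bisect_left(keys, low)  # first key >= low
    if pos < len(records) and keys[pos] <= high:
        return records[pos]
    return None


records = [("apricot", 1), ("banana", 2), ("cherry", 3), ("date", 4)]
hit = first_in_range(records, "apple", "coconut")
```

Because the data is presorted by key, one binary search finds the start of the range; no scan of the whole dataset is needed.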
Secondary ICFFs and secondary indexes
Allow you to search a dataset by a key different from the key originally used to sort and store the data.
The secondary data file requires only two record fields: the secondary key and the primary key. Storing this information requires much less disk space than would a full copy of the dataset
Requires both secondary indexes and primary index or surrogate keys
Direct-addressed ICFFs - Requires secondary indexes but no primary index or surrogate keys
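The two-field secondary file described above can be sketched in Python as a sorted list of (secondary key, primary key) pairs that is resolved against the primary data. The function names and the sample customer records are invented for illustration, and a plain dictionary stands in for the primary ICFF.

```python
import bisect


def build_secondary(primary_records, secondary_field):
    """Return sorted (secondary_key, primary_key) pairs, the only two
    fields the secondary file needs to store."""
    return sorted((rec[secondary_field], pk) for pk, rec in primary_records.items())


def lookup_by_secondary(primary_records, secondary, value):
    """Find matching primary keys in the secondary file, then fetch
    the full records through the primary key."""
    keys = [s for s, _ in secondary]
    lo = bisect.bisect_left(keys, value)
    hi = bisect.bisect_right(keys, value)
    return [primary_records[pk] for _, pk in secondary[lo:hi]]


# Hypothetical sample data keyed by a primary customer id.
customers = {
    101: {"name": "Ann", "city": "Oslo"},
    102: {"name": "Bob", "city": "Lima"},
    103: {"name": "Cy", "city": "Oslo"},
}
by_city = build_secondary(customers, "city")
```

The secondary structure holds no record payloads, only key pairs, which is why it takes far less space than a second full copy of the dataset sorted by the other key.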