Data Structure Vs File Structure
Similarity
Representation of Data & Operations for accessing data
Difference
Data Structures deal with data in main memory
File Structures deal with data in secondary storage device and Memory for Internal operations
Type of Files
Physical
Exists on secondary storage (backstore)
Identified by the operating system in the filesystem directory
Logical
The operating system link the logical file with physical file.
A logical file refer to a file inside a program.
Has a name such as myfile.txt
Basic Operations on File
Closing a file
file.close();
Better to close the file manually to prevent any data loss in case the program doesn't terminate normally while doing operations on the file.
Reading from file
Reading a single character
file.read(&character, 1);
Reading a single line
getline(file, line);
Includes
Writing on file
Writing a single character
file << character
file.write(&character, 1)
Writing to an array of characters
file.write(array, 10)
Detecting EOF
eof()
fstream
Writing binary object to file
file.write((char*)&object, sizeof(object))
Reading to binary object
file.read((char*)&object, sizeof(object))
Signature
ostream& write (char* s, streamsize n)
Signature
Basic Operations on File 2
Seek Operation
Peek Operation
file.seekg(byte_offset, origin)
file.seekp(byte_offset, origin)
origin
ios::cur
ios::end
ios::beg
byte_offset can be negative
byte 5 is the sixth character of a string
file.tellg() - pointer of reading
file.tellp() - pointer of writing
return The current byte number of the file’s
read/write position
Move the the current pointer to some byte_offset
Convert String to char*
variable_name.c_str()
get(), put(parameter)
The file is treated as a stream of bytes.
Field and Record Organization
Field Organization
Record Organization
Types
Types
Fixed Length
Length Indicator
Delimiters ,
Key-value
Fixed Length
Variable Length
Delimiters
Keys corresponds to fields, or combination of fields, that may be used in a search
Each field in the record has a fixed length
Easy to Read/Store
Waste space with padding
(storage wise)
Each field is indicated with its length
Easy to seek to the end of field
Long fields require more than 1 byte to store length
pipe | or comma ,
May waste less space than with length-based
Have to check every byte of field against the Delimiter (Performance Wise)
Fields are self-describing
Waste space with
keywords
Delimiters are mandatory
Example: A record consists of a 40 byte in size
Saves space when record sizes varies
Using Delimiters To Separate Records
The common delimiter is the end of line because it makes file readable.
Easy to jump to the i-th record
Waste space with padding (Internal Fragmentation)
Offset, or position, of the nth record of a file cannot be calculated.unless through an index file
File Access
Sequential Access
Direct Access
Consumes O(1) for n records
Sequential Search that consumes O(n)
Appropriate for Pattern matching
Possible when RRN is available.
byte_offset = (RRN) x Record Length
RRN = Record Number - 1
Deletion Strategies
Fragmentation
Types
External Fragmentation
Internal Fragmentation
Space is available but small and cannot be reused.
Wasted bytes within a record cannot be re-used.
Minimize fragmentation can be done by
First-Fit Strategy
Best-fit Strategy
Worst-fit Strategy
Record Deletion and Storage Compaction
Deleting Fixed Length Records
Deleting Variable-Length Records
Mark a record as deleted.
Space is not released
Program must check if record is deleted or not.
A program should squeeze/compress the file, also called Storage Compaction.
Use a linked list of available records (AVAIL LIST).
also List Header
Once you deleted a record, store RRN of the record in the AVAIL LIST
Check if the RRN ??????????
Use the space of deleted records.
Use an AVAIL LIST
byte offset is used, no more RRN.
This byte offset is the beginning of the field deleted.
First large enough record to hold the new record
is chosen.
AVAIL LIST is not sorted by size.
Example
AVAIL LIST: size=10,size=50,size=22,size=60
Record to be added is of size=20
Which record from AVAIL LIST is used for the new record?
The available record with size=50 will be selected.
AVAIL LIST is sorted by size (ascending)
Smallest record large enough to hold new record
is chosen.
Example
AVAIL LIST: size=10,size=22,size=50,size=60
Record to be added: size=20
Which record from AVAIL LIST is used for the new record?
The available record with size=22 will be selected
AVAIL LIST is sorted by size(descending)
First largest record in the list is used for holding the new record
Unused space is placed again in AVAIL LIST
Example
AVAIL LIST: size=60,size=50,size=22,size=10
– Record to be added: size=20
– Which record from AVAIL LIST is used for the new record?
- The available record with size=60 will be selected.
- The remaining 40 bytes will be returned back to avail list
cause internal fragmentation if we added a record less than the actual (old) record size.
if the space is very small to store any new record, the space called an external fragmentation
??????????
External Fragmentation can be compacted, which means combine
Worst-fit strategy is better than best-fit strategy in terms of External Fragmentation
click to edit
istream& read (char* s, streamsize n);
file >> character;
Opening a file
Opening An existing file or creating a new file
Once a file is opened, the file pointer (eg. .tellg()) is positioned at the beginning of the file
fstream file(filename, filemodes)
alternative syntax: file.open(filename, filemodes)
File Modes
ios::in
ios::out
ios::app: seek to the end of the file before each write.
ios::binary
ios::trunc
Use pipe " | " to apply a combination of modes on the file
Seeking from the beginning is the same as the end.
Both recognize the space and new line.
Example
Uses Index to keep records addresses.
Internal Organization
- Fixed Length
- Delimited Fields
- Use Field Length
- Length Indicator
Fixed Number of Fields
Delimited Fields and Length Indicator
Begin the record with its length indicator