Data Structure Vs File Structure

Type of Files

Similarity

Representation of Data & Operations for accessing data

Difference

Data Structures deal with data in main memory

File Structures deal with data in secondary storage device and Memory for Internal operations

Type of Files

Physical

Exists on secondary storage (backstore)

Identified by the operating system in the filesystem directory

Logical

The operating system link the logical file with physical file.

A logical file refer to a file inside a program.

Has a name such as myfile.txt

Basic Operations on File

Closing a file

file.close();

Better to close the file manually to prevent any data loss in case the program doesn't terminate normally while doing operations on the file.

Reading from file

Reading a single character

file.read(&character, 1);

Reading a single line

getline(file, line);

Includes

Writing on file

Writing a single character

file << character

file.write(&character, 1)

Writing to an array of characters

file.write(array, 10)

Detecting EOF

eof()

fstream

Writing binary object to file

file.write((char*)&object, sizeof(object))

Reading to binary object

file.read((char*)&object, sizeof(object))

Signature

ostream& write (char* s, streamsize n)

Signature

Basic Operations on File 2

Seek Operation

Peek Operation

file.seekg(byte_offset, origin)

file.seekp(byte_offset, origin)

origin

ios::cur

ios::end

ios::beg

byte_offset can be negative

byte 5 is the sixth character of a string

file.tellg() - pointer of reading

file.tellp() - pointer of writing

return The current byte number of the file’s
read/write position

Move the the current pointer to some byte_offset

Convert String to char*

variable_name.c_str()

get(), put(parameter)

The file is treated as a stream of bytes.

Field and Record Organization

Field Organization

Record Organization

Types

Fixed Length

Length Indicator

Delimiters ,

Key-value

Fixed Length

Variable Length

Delimiters

Keys corresponds to fields, or combination of fields, that may be used in a search

Each field in the record has a fixed length

Easy to Read/Store

Waste space with padding
(storage wise)

Each field is indicated with its length

Easy to seek to the end of field

Long fields require more than 1 byte to store length

pipe | or comma ,

May waste less space than with length-based

Have to check every byte of field against the Delimiter (Performance Wise)

Fields are self-describing

Waste space with
keywords

Delimiters are mandatory

Example: A record consists of a 40 byte in size

Saves space when record sizes varies

Using Delimiters To Separate Records

The common delimiter is the end of line because it makes file readable.

Easy to jump to the i-th record

Waste space with padding (Internal Fragmentation)

Offset, or position, of the nth record of a file cannot be calculated.unless through an index file

File Access

Sequential Access

Direct Access

Consumes O(1) for n records

Sequential Search that consumes O(n)

Appropriate for Pattern matching

Possible when RRN is available.

byte_offset = (RRN) x Record Length

RRN = Record Number - 1

Deletion Strategies

Fragmentation

Types

External Fragmentation

Internal Fragmentation

Space is available but small and cannot be reused.

Wasted bytes within a record cannot be re-used.

Minimize fragmentation can be done by

First-Fit Strategy

Best-fit Strategy

Worst-fit Strategy

Record Deletion and Storage Compaction

Deleting Fixed Length Records

Deleting Variable-Length Records

Mark a record as deleted.

Space is not released

Program must check if record is deleted or not.

A program should squeeze/compress the file, also called Storage Compaction.

Use a linked list of available records (AVAIL LIST).
also List Header

Once you deleted a record, store RRN of the record in the AVAIL LIST

Check if the RRN ??????????

Use the space of deleted records.

Use an AVAIL LIST

byte offset is used, no more RRN.

This byte offset is the beginning of the field deleted.

First large enough record to hold the new record
is chosen.

AVAIL LIST is not sorted by size.

Example

AVAIL LIST: size=10,size=50,size=22,size=60

Record to be added is of size=20

Which record from AVAIL LIST is used for the new record?

The available record with size=50 will be selected.

AVAIL LIST is sorted by size (ascending)

Smallest record large enough to hold new record
is chosen.

Example

AVAIL LIST: size=10,size=22,size=50,size=60

Record to be added: size=20

Which record from AVAIL LIST is used for the new record?

The available record with size=22 will be selected

AVAIL LIST is sorted by size(descending)

First largest record in the list is used for holding the new record

Unused space is placed again in AVAIL LIST

Example

AVAIL LIST: size=60,size=50,size=22,size=10

– Record to be added: size=20

– Which record from AVAIL LIST is used for the new record?

The available record with size=60 will be selected.

The remaining 40 bytes will be returned back to avail list

cause internal fragmentation if we added a record less than the actual (old) record size.

if the space is very small to store any new record, the space called an external fragmentation

??????????

External Fragmentation can be compacted, which means combine

Worst-fit strategy is better than best-fit strategy in terms of External Fragmentation

click to edit

istream& read (char* s, streamsize n);

file >> character;

Opening a file

Opening An existing file or creating a new file

Once a file is opened, the file pointer (eg. .tellg()) is positioned at the beginning of the file

fstream file(filename, filemodes)

alternative syntax: file.open(filename, filemodes)

File Modes

ios::in

ios::out

ios::app: seek to the end of the file before each write.

ios::binary

ios::trunc

Use pipe " | " to apply a combination of modes on the file

Seeking from the beginning is the same as the end.

Both recognize the space and new line.

Example

Uses Index to keep records addresses.

Internal Organization

Fixed Length

Delimited Fields

Use Field Length

Length Indicator

Fixed Number of Fields

Delimited Fields and Length Indicator

Begin the record with its length indicator