Context
- The C language does not build input/output facilities into the language itself. In other words, there is no keyword like read or write. Instead, IO is provided through external library functions (such as printf and scanf in the stdio library). The ANSI C standard formalized these IO functions into the Standard IO package (stdio.h).
- C++ continues this approach and formalizes IO in libraries such as iostream and fstream.
- C/C++ IO is based on streams, which are sequences of bytes flowing in and out of programs.
- In input operations, data bytes flow from an input source (such as keyboard, file, network, or another program) into the program.
- In output operations, data bytes flow from the program to an output sink (such as console, file, network, or another program).
Streams act as an intermediary between programs and the actual IO devices. This frees programmers from handling the devices directly and achieves device-independent IO operations.
C++ provides both formatted and unformatted IO functions. In formatted or high-level IO, bytes are grouped and converted to types such as int, double, string or user-defined types. In unformatted or low-level IO, bytes are treated as raw bytes and are not converted. Formatted IO operations are supported by overloading the stream insertion (<<) and stream extraction (>>) operators, which present a consistent public IO interface.
To perform input and output, a C++ program:
- Constructs a stream object.
- Connects (associates) the stream object to an actual IO device (e.g., keyboard, console, file, network, another program).
- Performs input/output operations on the stream, via the functions defined in the stream’s public interface, in a device-independent manner. Some functions convert the data between the external format and the internal format (formatted IO), while others do not (unformatted or binary IO).
- Disconnects (dissociates) the stream from the actual IO device (e.g., closes the file).
- Frees the stream object.
C++ I/O Headers, Templates, and Classes
C++ IO is provided in the headers <iostream> (which includes <ios>, <istream>, <ostream> and <streambuf>), <fstream> (for file IO), and <sstream> (for string IO). Furthermore, the header <iomanip> provides manipulators such as setw(), setprecision(), setfill() and setbase() for formatting.
Files I/O (STREAMS)
- A stream models a stream of data. In a stream, data flows between objects, and those objects can perform arbitrary processing on the data. When you’re working with streams, the output is data going into the stream and input is data coming out of the stream. These terms reflect the streams as viewed from the user’s perspective.
- In C++, streams are the primary mechanism for performing input and output (I/O). Regardless of the source or destination, you can use streams as the common language to connect inputs to outputs.
- We can convert our objects to streams of bytes. We can also convert streams of bytes back to objects. The I/O stream library provides such functionality.
- Streams can be output streams and input streams.
- There are different kinds of I/O streams, for instance: file streams.
Formatted Operations (text-based streams)
All formatted I/O passes through two functions: the standard stream operators, operator << and operator >>.
Read
We can read from a file, and we can write to a file. The standard library offers this functionality via file streams. The file streams are defined in the <fstream> header, and they are:
- std::ifstream – read from a file
- std::ofstream – write to a file
- std::fstream – read from and write to a file
A std::fstream can both read from and write to a file, so let us use that one. To create a std::fstream object we use:
#include <fstream>

int main()
{
    std::fstream fs{ "myfile.txt" };
}
This example creates a file stream named fs and associates it with a file named myfile.txt on our disk. To read from such a file, line by line, we use:
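#include <iostream>
#include <fstream>
#include <string>

int main()
{
    std::fstream fs{ "myfile.txt" };
    std::string s;
    // Test the result of getline itself; looping on `while (fs)` would
    // process one extra, empty line after the last read fails.
    while (std::getline(fs, s)) // read each line into a string
    {
        std::cout << s << '\n';
    }
}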
To read from a file one character at a time, we can use the file stream’s >> operator. Note that >> skips whitespace by default, so spaces and newlines are not printed:
#include <iostream>
#include <fstream>

int main()
{
    std::fstream fs{ "myfile.txt" };
    char c;
    while (fs >> c)
    {
        std::cout << c;
    }
}
Write
To write to a file, we use the file stream’s << operator:
#include <fstream>

int main()
{
    std::fstream fs{ "myoutputfile.txt", std::ios::out };
    fs << "First line of text." << '\n';
    fs << "Second line of text" << '\n';
    fs << "Third line of text" << '\n';
}
We associate the fs object with an output file name and provide an additional std::ios::out flag, which opens the file for writing and overwrites any existing myoutputfile.txt file. Then we output our text to the file stream using the << operator.
To append text to an existing file, we include the std::ios::app flag inside the file stream constructor:
#include <fstream>

int main()
{
    std::fstream fs{ "myoutputfile.txt", std::ios::app };
    fs << "This is appended text" << '\n';
    fs << "This is also an appended text." << '\n';
}
We can also output strings to our file using the file stream’s << operator:
#include <iostream>
#include <fstream>
#include <string>

int main()
{
    std::fstream fs{ "myoutputfile.txt", std::ios::out };
    std::string s1 = "The first string.\n";
    std::string s2 = "The second string.\n";
    fs << s1 << s2;
}
Text Files
- A text file (flat file) is a computer file that contains only text and has no special formatting such as bold text, italic text, or images. On Microsoft Windows, text files are commonly identified by the .txt file extension.
- Because of their simplicity, text files are commonly used for the storage of information. They avoid some of the problems encountered with other file formats, such as endianness, padding bytes, or differences in the number of bytes in a machine word.
- A simple text file may need no additional metadata (other than knowledge of its character set) to assist the reader in interpretation. A text file may contain no data at all, which is the case of a zero-byte file.
Encoding
- The ASCII character set is the most common compatible subset of character sets for English-language text files, and is generally assumed to be the default file format in many situations. It covers American English, but for the British Pound sign, the Euro sign, or characters used outside English, a richer character set must be used.
- Unicode is an attempt to create a common standard for representing all known languages, and most known character sets are subsets of the very large Unicode character set. Although there are multiple character encodings available for Unicode, the most common is UTF-8, which has the advantage of being backwards-compatible with ASCII; that is, every ASCII text file is also a UTF-8 text file with identical meaning.
UTF-8
- UTF-8 is a variable-width character encoding used for electronic communication. Defined by the Unicode Standard, the name is derived from Unicode (or Universal Coded Character Set) Transformation Format – 8-bit.
Types of Text Files
CSV (Comma-separated values)
- A comma-separated values (CSV) file is a delimited text file that uses a comma to separate values. Each line of the file is a data record. Each record consists of one or more fields, separated by commas. The use of the comma as a field separator is the source of the name for this file format.
- A CSV file typically stores tabular data (numbers and text) in plain text, in which case each line will have the same number of fields.
- The CSV file format is not fully standardized.
id,firstname,lastname,email,email2,profession
0,Jobi,Gilmour,Jobi.Gilmour@yopmail.com,Jobi.Gilmour@gmail.com,doctor
1,Xylina,Killigrew,Xylina.Killigrew@yopmail.com,Xylina.Killigrew@gmail.com,police officer
2,Patricia,Zitvaa,Patricia.Zitvaa@yopmail.com,Patricia.Zitvaa@gmail.com,doctor
3,Gusty,Friede,Gusty.Friede@yopmail.com,Gusty.Friede@gmail.com,developer
4,Bee,Michella,Bee.Michella@yopmail.com,Bee.Michella@gmail.com,police officer
5,Evita,Keily,Evita.Keily@yopmail.com,Evita.Keily@gmail.com,firefighter
6,Deane,Jarib,Deane.Jarib@yopmail.com,Deane.Jarib@gmail.com,firefighter
7,Amii,Nance,Amii.Nance@yopmail.com,Amii.Nance@gmail.com,firefighter
8,Ardeen,Sparhawk,Ardeen.Sparhawk@yopmail.com,Ardeen.Sparhawk@gmail.com,police officer
9,Kenna,Skell,Kenna.Skell@yopmail.com,Kenna.Skell@gmail.com,developer
10,Kirbee,Shirberg,Kirbee.Shirberg@yopmail.com,Kirbee.Shirberg@gmail.com,doctor
Tab Delimited
- A tab-delimited text file is a file containing tabs that separate information with one record per line.
- A tab delimited file is often used to upload data to a system.
id firstname lastname email email2 profession
0 Jobi Gilmour Jobi.Gilmour@yopmail.com Jobi.Gilmour@gmail.com doctor
1 Xylina Killigrew Xylina.Killigrew@yopmail.com Xylina.Killigrew@gmail.com police officer
2 Patricia Zitvaa Patricia.Zitvaa@yopmail.com Patricia.Zitvaa@gmail.com doctor
3 Gusty Friede Gusty.Friede@yopmail.com Gusty.Friede@gmail.com developer
4 Bee Michella Bee.Michella@yopmail.com Bee.Michella@gmail.com police officer
5 Evita Keily Evita.Keily@yopmail.com Evita.Keily@gmail.com firefighter
6 Deane Jarib Deane.Jarib@yopmail.com Deane.Jarib@gmail.com firefighter
7 Amii Nance Amii.Nance@yopmail.com Amii.Nance@gmail.com firefighter
8 Ardeen Sparhawk Ardeen.Sparhawk@yopmail.com Ardeen.Sparhawk@gmail.com police officer
9 Kenna Skell Kenna.Skell@yopmail.com Kenna.Skell@gmail.com developer
10 Kirbee Shirberg Kirbee.Shirberg@yopmail.com Kirbee.Shirberg@gmail.com doctor
Example I/O
Unformatted Operations (binary files)
- When data is stored in a file in the binary format, reading and writing data is faster because no time is lost in converting the data from one format to another format. Such files are called binary files.
- The class ios_base is a multipurpose class that serves as the base class for all I/O stream classes.
Member types and constants, stream open mode type:
Constant | Explanation |
---|---|
app | seek to the end of stream before each write |
binary | open in binary mode |
in | open for reading |
out | open for writing |
trunc | discard the contents of the stream when opening |
ate | seek to the end of stream immediately after open |
File size and indexation
In C++, a file is treated as a stream, or array, of uninterpreted bytes. Each byte can also be considered a char, so the file contents can be viewed as a char array: (char *)myFile.
The “array” of bytes stored in a file is indexed from zero to len-1, where len is the total number of bytes in the entire file.
Opening Files
There are two main ways of opening files in binary mode:
When declaring the object, set a file name and necessary flags in the constructor.
ifstream myReadFile(filename, ios::in | ios::binary);
Declare a stream object and use the open method to set the file name and necessary flags.
ofstream myFile;
myFile.open("data2.dat", ios::out | ios::binary);
There are two main flags that need to be used when manipulating binary files:
- The i/o mode: ios::in or ios::out
- The binary mode: ios::binary
Read
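The read method extracts a given number of bytes from the stream and places them into the memory pointed to by the first parameter.
// Reading raw bytes into an object is only valid when Person is trivially
// copyable (for example, it stores its name in a fixed-size char array).
Person person;
std::string filename = "people.dat";
std::ifstream inFile;
inFile.open(filename, std::ios::in | std::ios::binary);
inFile.read(reinterpret_cast<char*>(&person), sizeof(person));
std::cout << person.toString() << std::endl;
inFile.close();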
Write
The write member function writes a given number of bytes on the given stream, starting at the position of the “put” pointer.
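// Writing the raw bytes of an object assumes Person is trivially copyable;
// a std::string or pointer member would not have its text saved this way.
Person person1 = Person("Julio");
std::string filename = "people.dat";
std::ofstream outFile;
outFile.open(filename, std::ios::out | std::ios::binary);
outFile.write(reinterpret_cast<const char*>(&person1), sizeof(person1));
outFile.close();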
Accessing file positions
Each open file has a “get” pointer and a “put” pointer. They store a position in the file and are part of the stream object.
The GET pointer
It is the current reading position, the index of the next byte that will be read from the file. The get pointer can be repositioned with the istream& seekg(streampos pos) method, or relative to a base position with istream& seekg(streamoff off, ios_base::seekdir dir). To return the index of the get pointer on a given stream use streampos tellg().
Person person;
std::string filename = "people.dat";
std::ifstream inFile;
inFile.open(filename, std::ios::binary);
// Skip the first record so the next read starts at the second person in the file.
inFile.seekg(sizeof(person), std::ios::beg);
inFile.read(reinterpret_cast<char*>(&person), sizeof(person));
std::cout << person.toString() << std::endl;
inFile.close();
The PUT pointer
It is the current writing position, the index at which the next byte will be written. The put pointer can be repositioned with the ostream& seekp(streampos pos) method. To return the index of the put pointer on a given stream use streampos tellp().
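Person person;
std::string filename = "people.dat";
std::ofstream outFile;
outFile.open(filename, std::ios::binary | std::ios::app);
// In append mode every write goes to the end of the file;
// seek there first so tellp() reports that position.
outFile.seekp(0, std::ios::end);
std::cout << "Adding in position: " << outFile.tellp() << " byte." << std::endl;
// Adding to the end of the file.
outFile.write(reinterpret_cast<const char*>(&person), sizeof(person));
outFile.close();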