CHAPTER 11 - Files

One of the most common operations when using a computer is to either read from or write to a file. You are already somewhat experienced in file handling from the last chapter, because in computer terminology, the keyboard, terminal, and printer are all classified as files. A file is any serial input or output device that the computer has access to. Since it is serial, only one piece of information is available to the computer at any instant of time. This is in contrast to an array, for example, in which all elements of the array are stored internally and are all available at any time.

Several years ago computers were all large, cumbersome machines with large peripheral devices such as magnetic tape drives, punch card readers, paper tape readers or punches, etc. It was a simple task to assign the paper tape reader a symbol and use that symbol whenever it was necessary to read a paper tape. There was never more than one file on the paper tape being read, so it was simply read sequentially, and hopefully the data was the desired data. With the advent of floppy disks, and hard disks too, it became practical to put several files of data on one disk, none of which necessarily had anything to do with any of the other files on that disk. This led to the problem of reading the proper file from the disk, not just reading the disk.

Pascal was originally released in 1971, before the introduction of the compact floppy disk. The original release of Pascal had no provision for selecting a certain file from among the many included on the disk. Each compiler writer had to overcome this deficiency, and each did so by defining an extension to the standard Pascal system. Unfortunately, not all of the extensions were the same, and there are now several ways to accomplish this operation. There are primarily two ways, one using the ASSIGN statement, and the other using the OPEN statement. They are similar to each other and they accomplish the same end result.
All of the above was described to let you know that we will have a problem in this chapter, namely, how do we cover all of the possible implementations of Pascal available? The answer is, we can't. Most of what is covered in this chapter will apply to all compilers, and all that is covered will apply to the TURBO Pascal compiler. If your compiler complains about some of the statements, it will be up to you to dig out the details of how to do the intended operations. If there is no way to do any of these operations, you should seriously consider getting another compiler, because all of these operations are needed in a useful Pascal environment.

READING AND DISPLAYING A FILE

Examine the file READFILE for an example of a program that can read a text file from the disk. In fact, it will read itself from the disk and display it on the video monitor. The first statement in the program is the ASSIGN statement. This is TURBO Pascal's way of selecting which file on the disk will be either read from or written to. In this case we will read from the disk. The first argument in the ASSIGN statement is the device specifier, similar to "lst" used in the last chapter for the printer. We have chosen to use "turkey", but could have used any valid identifier. This identifier must be defined in a VAR declaration as a TEXT type variable. The next argument is the filename desired. The filename can be defined as a string constant, as it is here, or as a string variable.

The TEXT type is a predefined type and is used to define a file identifier. It is essentially a "file of CHAR" organized into lines, so it can only be used for a text file. We will see later that there is another type of file, a binary file. Now that we have a file identified, it is necessary to prepare it for reading by executing a RESET statement. The RESET statement positions the read pointer at the beginning of the file, ready to read the first piece of information in the file.
Once we have done that, data is read from the file in the same manner as it was when reading from the keyboard. In this program, the input is controlled by the WHILE loop, which is executed until we exhaust the data in the file.

WHAT ARE THE "EOF" AND "EOLN" FUNCTIONS?

The "eof" function is new and must be defined. When we read data from the file, we move closer and closer to the end, until finally we reach the end and there is no more data to read. This is called "end of file" and is abbreviated "eof". Pascal's "eof" function is FALSE as long as there is still data in the file to be read, and it becomes TRUE when there is no more data. To use the function, we merely give it our file identifier as an argument. It should be clear that we will loop until we read all of the data available in the file.

The "eoln" function is not used in this program but is a very useful function. If the input pointer is anywhere in the text file except at the end of a line, the "eoln" function is FALSE, but at the end of a line, it becomes TRUE. This function can therefore be used to find the end of a line of text for variable length text input. The "eoln" function is not available, and in fact is meaningless, when you are reading a binary file, to be defined later.

To actually read the data, we use the READLN procedure, giving it our identifier "turkey" and the name of the variable we want the data read into. In this case, we read up to 80 characters into the string and, if more are available, ignore them. Remember this from the keyboard input? It is the same here. Since we would like to do something with the data, we simply output the line to the default device, the video monitor. It should be clear to you by now that the program will simply read the entire file and display it on the monitor. Finally, we CLOSE the file "turkey".
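The steps described so far (ASSIGN, RESET, a WHILE loop controlled by "eof", READLN, and CLOSE) fit together in a short program. This is only a sketch in the spirit of READFILE, not the program on your disk; the file name READFILE.PAS and the 80 character string length are assumptions, and the code is written for TURBO Pascal or a compatible compiler.

```pascal
program ReadFileSketch;
{ Sketch in the spirit of READFILE: display a text file line by line.   }
{ The file name READFILE.PAS is an assumption; any text file will do.   }
var
  turkey : text;          { file identifier, declared as TEXT }
  line   : string[80];    { READLN keeps at most 80 characters per line }
begin
  Assign(turkey, 'READFILE.PAS');  { connect the identifier to a disk file }
  Reset(turkey);                   { open for reading, pointer at the start }
  while not Eof(turkey) do         { FALSE until no data is left }
  begin
    ReadLn(turkey, line);          { read one line, ignore anything past 80 }
    WriteLn(line);                 { echo it to the monitor }
  end;
  Close(turkey);
end.
```

Because the file is only read, it is left intact, and a second RESET would let the same program read it again from the beginning.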
It is not really necessary to close the file, because the system will close it for you automatically at program termination, but it is a good habit to get into. It must be carefully pointed out here that you did not do anything to the input file; you only read it and left it intact. You could RESET it and reread it again in this same program. Compile and run this program to see if it does what you expect it to do.

A PROGRAM TO READ ANY FILE

Examine the next program, READDISP, for an improved file reading program. This is very similar, except that it asks you for the name of the file that you desire to display and enters the name into a 12 character string named "name_of_file_to_input". This is then used in the ASSIGN statement to select the file to be read, and the file is reset as before. A header is then displayed, and the rest of the program is identical to the last one with some small additions. In order to demonstrate the use of a function within the WRITELN specification, the program calls for the length of each input string within the WRITELN and displays it before the line. The lines are counted as they are read and displayed, and the line count is then displayed at the end of the listing. You should be able to see clearly how each of these operations is accomplished.

Compile and run this program, entering any filename we have used so far (be sure to include the .PAS). After a successful run, enter a nonexistent filename and see the I/O error.

HOW TO COPY A FILE (SORT OF)

Examine the file READSTOR for an example of both reading from a file and writing to another one. In this program we request an operator input for the filename to read, after which we ASSIGN the name to the file and RESET it. Next, we request a different filename to write to, which is assigned to a different identifier. The next statement is new to us, the REWRITE statement. This name apparently comes from the words REset for WRITEing, because that is exactly what it does.
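The following sketch shows REWRITE at work in a READSTOR-like program that copies a text file and numbers its lines. It is not the disk program itself; the variable names, the 12 character filename strings, and the line-number width are illustrative assumptions. Be careful which output name you type, because REWRITE empties any existing file of that name.

```pascal
program CopyWithNumbers;
{ Sketch in the spirit of READSTOR: copy a text file, numbering the lines. }
{ Variable names and the 4-column number width are illustrative choices.   }
var
  in_file, out_file : text;
  in_name, out_name : string[12];   { room for an 8.3 DOS file name }
  line              : string[80];
  line_count        : integer;
begin
  Write('Enter input file name  : ');
  ReadLn(in_name);
  Assign(in_file, in_name);
  Reset(in_file);                   { open the source for reading }

  Write('Enter output file name : ');
  ReadLn(out_name);
  Assign(out_file, out_name);
  Rewrite(out_file);                { empties the file if it already exists }

  line_count := 0;
  while not Eof(in_file) do
  begin
    ReadLn(in_file, line);
    line_count := line_count + 1;
    WriteLn(out_file, line_count:4, ' ', line);  { file identifier comes first }
  end;

  Close(in_file);
  Close(out_file);                  { writes the last buffer to the disk }
  WriteLn(line_count, ' lines copied.');
end.
```

The only difference between writing to the monitor and writing to the file is the identifier "out_file" placed ahead of the output fields.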
REWRITE clears the entire file of any prior data and prepares to write into the very beginning of the file. Each time you write into it, the file grows by the amount of the data written. Once the identifier has been defined and the REWRITE has been executed, writing to the file is identical to writing to the display, except that the file identifier is given as the first argument, ahead of the output fields. With that in mind, you should have no trouble comprehending the operation of the program. It is similar to the last program, except that it numbers the lines as the file is copied.

After running the program, look on your default disk for the filename you entered when it asked for the output filename. Examine that file to see if it is truly a copy of the input file with line numbers added. One word of caution: if you used an existing filename for the output file, the file was overwritten and the original destroyed. In that case, it was good that you followed instructions at the beginning of this tutorial and made a working copy. You did do that, didn't you?

HOW TO READ INTEGER DATA FROM A FILE

Reading text from a file is useful, but now it is time to read numeric data from a file. First we will read data from a text file, then later from a binary file. Examine the program READINTS for an example of reading data from a text file. A text file is an ASCII file that can be read by a text editor, printed, displayed, or in some cases, compiled and executed. It is simply a file made up of a long string of CHAR type data, and it usually includes linefeeds, carriage returns, and blanks for neat formatting. Nearly every file on the Tutorial disk you received with this package is a text file. The notable exception is the file named LIST.COM, which is an executable program file.

The example program has nothing new; you have seen everything in it before. We have an assignment, followed by a reset of our file, followed by four read and write loops.
Each of the loops has a subtle difference to illustrate the READ and READLN statements. Notice that the same file is read in four times with a RESET prior to each, illustrating the nondestructive read mentioned a few paragraphs ago. The file we will be using is named INTDATA.TXT and is on your disk. You could display it at this time using the program READDISP we covered recently. Notice that it is simply composed of the integer values from 101 to 148 arranged four to a line with a couple of spaces between each for separation and a neat appearance. The important thing to remember is that there are four data points per line.

READ AND READLN ARE SLIGHTLY DIFFERENT

As variables are read in with either procedure, the input file is scanned for the variables using blanks as delimiters. If there are not enough data points on one line to satisfy the arguments in the input list, the next line is searched also, and the next, etc. Finally when all of the arguments in the input list are satisfied, the READ is complete, but the READLN is not. If it is a READ procedure, the input pointer is left at that point in the file, but if it is a READLN procedure, the input pointer is advanced to the beginning of the next line. The next paragraph should clear that up for you.

The input data file INTDATA.TXT has four data points per line but the first loop in the program READINTS.PAS requests only three each time through the loop. The first time through, it reads the values 101, 102, and 103, and displays those values, leaving the input pointer just prior to the 104, because it is a READ procedure. The next time through, it reads the value 104, advances to the next line and reads the values 105, and 106, leaving the pointer just prior to the 107. This continues until the 5 passes through the loop are completed.
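The first two loops of READINTS can be approximated as follows. This sketch assumes INTDATA.TXT is in the current directory and holds the values 101 through 148, four per line, as described above; the variable names and field widths are arbitrary choices.

```pascal
program ReadVsReadln;
{ Sketch of the first two READINTS loops. Assumes INTDATA.TXT holds the }
{ integers 101..148, four per line, as described in the text.           }
var
  data    : text;
  a, b, c : integer;
  pass    : integer;
begin
  Assign(data, 'INTDATA.TXT');

  Reset(data);
  WriteLn('Using READ:');
  for pass := 1 to 5 do
  begin
    Read(data, a, b, c);      { pointer stays just after the third value }
    WriteLn(a:5, b:5, c:5);
  end;

  Reset(data);                { the read was nondestructive, so start over }
  WriteLn('Using READLN:');
  for pass := 1 to 5 do
  begin
    ReadLn(data, a, b, c);    { the rest of the line (the fourth value) is skipped }
    WriteLn(a:5, b:5, c:5);
  end;

  Close(data);
end.
```

Going by the rules above, the READ loop should print the values 101 through 115 in order, three per row, while each pass of the READLN loop starts on a fresh input line, giving rows that begin with 101, 105, 109, 113, and 117.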
The next loop contains a READLN procedure and also reads the values 101, 102, and 103, but when the input parameter list is satisfied, it moves the pointer to the beginning of the next line, leaving it just before the 105. The values are printed out and the next time we come to the READLN, we read the 105, 106, and 107, and the pointer is moved to the beginning of the next line. It would be good to run the program now to see the difference in output data for the two loops.

When you come back to the program again, notice the last two loops, which operate much like the first two except that there are now five requested integer variables, and the input file still only has four per line. This is no problem. Both input procedures will simply read the first four in the first line, advance to the second line for its required fifth input, and each will do its own operation next. The READ procedure will leave the input pointer just before the second data point of the second line, and the READLN will advance the input pointer to the beginning of the third line. Run this program and observe the four output fields to see an illustration of these principles.

NOW TO READ SOME REAL VARIABLES FROM A FILE

By whatever method you desire, take a look at the file named REALDATA.TXT supplied on your Pascal Tutorial disk. You will see 8 lines of what appears to be scrambled data, but it is good data that Pascal can read. Notice especially line 4 which has some data missing, and line 6 which has some extra data.

Examine the program file READDATA which will be used to illustrate the method of reading REAL data. Everything should be familiar to you, since there is nothing new here. Notice the READLN statement. It is requesting one integer variable, and three real variables, which is what most of the input file contained.
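A READLN of one INTEGER and three REALs might look like this in a sketch of READDATA. The loop here runs until "eof" rather than for a fixed number of passes, which is an assumption about how the original is written; the variable names and display formats are also illustrative.

```pascal
program ReadRealsSketch;
{ Sketch in the spirit of READDATA. Assumes REALDATA.TXT is in the current }
{ directory; names and display widths are illustrative choices.            }
var
  data    : text;
  whole   : integer;
  x, y, z : real;
begin
  Assign(data, 'REALDATA.TXT');
  Reset(data);
  while not Eof(data) do
  begin
    { One INTEGER and three REALs per pass. Missing values are taken from }
    { the next line; extra values left on a line are skipped by READLN.   }
    ReadLn(data, whole, x, y, z);
    WriteLn(whole:6, x:12:4, y:12:4, z:12:4);
  end;
  Close(data);
end.
```

Changing the ReadLn to a Read in this sketch should reproduce the run time error discussed for the original program, since the pointer is then left in the middle of a line.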
When we come to the fourth line, there are not enough data points available, so the first two data points of the next line are read to complete the fourth pass. Since the pointer is then advanced to the beginning of the next line, we are automatically synchronized with the data again. When we come to the sixth line, the last two data points are simply ignored. Run the program to see if the results are as you would predict.

If a READ were substituted for the READLN, the pointer would not be advanced to the beginning of line 6 after the fourth pass through the loop. The next attempt to read would result in trying to read the .0006 as an INTEGER, and a run time error would result. Modify the program and see this for yourself.

That is all there is to reading and writing text files. If you learn the necessities, you will not be stumbling around in the area of input/output, which is very intimidating to many people. Remember to ASSIGN, then RESET before reading, REWRITE before writing, and CLOSE before quitting. It is of the utmost importance to close a file you have been writing to before quitting, so that the last few buffers of data are actually written to the file. It is not important to close read files unless you are using a lot of them, as there is an implementation dependent limit on how many files can be open at once. It is possible to read from a file, close it, reopen it, and write in it in one program. You can reuse a file as often as you desire in a program, but you cannot read from and write into a text file at the same time.

NOW FOR BINARY INPUT AND OUTPUT

Examine the file BINOUT for an example of writing data to a file in binary form. First there is a record defined in the type declaration part, composed of three different variable types. In the VAR part, "output_file" is defined as a "FILE of dat_rec", the record defined earlier. The variable "dog_food" is then defined as an array of the record, and a simple variable is defined.
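Here is a sketch along the lines of BINOUT, extended to close the file and read the records back with a second identifier of the same type. The record fields (an INTEGER, a REAL, and a short string), the variable names other than "output_file" and "dog_food", and the data values are all hypothetical; only the file name KIBBLES.BIT and the count of 20 records come from the text. Notice that the file is written with WRITE only, never WRITELN.

```pascal
program BinarySketch;
{ Sketch in the spirit of BINOUT and BININ. The fields of dat_rec and the }
{ data values are hypothetical; KIBBLES.BIT is the name used in the text. }
type
  dat_rec = record
              count : integer;
              size  : real;
              name  : string[6];
            end;
var
  output_file : file of dat_rec;        { typed (binary) file, not TEXT }
  input_file  : file of dat_rec;        { same type, used to read it back }
  dog_food    : array[1..20] of dat_rec;
  one_rec     : dat_rec;
  i           : integer;
begin
  { Fill the 20 records with nonsense data so there is something to write. }
  for i := 1 to 20 do
  begin
    dog_food[i].count := i;
    dog_food[i].size  := i * 1.5;
    dog_food[i].name  := 'Kibble';
  end;

  Assign(output_file, 'KIBBLES.BIT');
  Rewrite(output_file);                 { empty the file, start at the beginning }
  WriteLn('Writing 20 records...');
  for i := 1 to 20 do
    Write(output_file, dog_food[i]);    { WRITE only; WRITELN is illegal here }
  Close(output_file);                   { writes the last buffer to the disk }

  { Read the file back with a variable of exactly the same record type. }
  Assign(input_file, 'KIBBLES.BIT');
  Reset(input_file);
  while not Eof(input_file) do
  begin
    Read(input_file, one_rec);
    WriteLn(one_rec.count:3, one_rec.size:8:2, '  ', one_rec.name);
  end;
  Close(input_file);
end.
```

Reading works only because "one_rec" has exactly the same structure as the records that were written; a different record layout would misinterpret the bytes in the file.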
Any file assigned a type of TEXT, which is a "FILE of CHAR", is a text file. A text file can be read and modified with a text editor, printed out, displayed on the monitor, etc. If a file is defined with any other definition, it will be a binary file and will be in an internal format as defined by the Pascal compiler. Attempting to display such a file will result in very strange looking gibberish on the monitor.

When we get to the program, the output file is assigned a name, and a REWRITE is performed on it to reset the file pointer to the beginning of the file, empty the file, and prepare for writing data into it. The next loop simply assigns nonsense data to all of the variables in the 20 records so we have something to work with. We finally write a message to the display that we are ready to start outputting data, and we output the data one record at a time with the standard WRITE statement.

A few cautions are in order here. A binary file can be declared as a file of any one simple variable type, such as INTEGER, BYTE, or REAL, or as a file of a record, but the types cannot be mixed within one file. The record, however, can be any combination of data, including other records if desired, but any file can only have one type of record written to it. Also, a WRITELN statement is illegal when writing to a binary file, because a binary file is not line oriented. Depending on your compiler, a WRITE statement to a binary file may be limited to one output field per statement; if it is, simply put one WRITE statement in the program for each variable you wish to write out to the file. It is important to CLOSE the file when you are finished writing to it.

WHY USE A BINARY FILE

A binary file written by a Pascal program cannot be read by a word processor, a text editor, or any application program such as a database or spreadsheet, and it may not even be readable by a Pascal program compiled by a different company's compiler, because the data is implementation dependent.
It can't even be read by a Pascal program using the correct compiler unless the data structure is identical to the one used to write the file. With all these rules, it seems like a silly way to output data, but there are advantages to using a binary output. A binary file usually uses less file space than a corresponding text file because the data is stored in a packed mode. Since all significant digits of REAL data are stored, it is more precise than a text file unless you are careful to output all significant digits to the corresponding TEXT file. Finally, since the binary data does not require formatting into ASCII characters, outputting it will usually be considerably faster than outputting it in TEXT format.

When you run the example program, it will create the file KIBBLES.BIT and put 20 records in it. Return to DOS and look for this file to verify its existence. If you try to TYPE it, you will have a real mess, but that might be a good exercise.

READING A BINARY FILE

BININ is another example program that will read in the file we just created. Notice that the variables are named differently, but the types are all identical to those used to write the file. An additional line is found in the program, the IF statement. We must check for the "end of file" marker to stop reading when we find it, or Pascal will list an error and terminate operation. Three pieces of information are written out to verify that we actually did read the data file in.

Once again, a few rules are in order. A READLN is illegal since there are no lines in a binary file, and, depending on your compiler, only one variable or record may be read in with each READ statement.

WHAT ABOUT FILE POINTERS, GET, AND PUT STATEMENTS?

File pointers and the GET and PUT procedures are a part of standard Pascal, but since they are redundant, they are not a part of TURBO Pascal. The standard READ and WRITE procedures are more flexible, more efficient, and easier to use. The use of GET and PUT will not be illustrated or defined here.
If you ever have any need for them, they will be covered in detail in your Pascal reference manual for the particular implementation you are using. Pointers will be covered in detail in the next chapter of this tutorial.

PROGRAMMING EXERCISES

1. Write a program to read in any text file, and display it on the monitor with line numbers and the number of characters in each line. Finally display the number of lines found in the file, and the total number of characters in the entire file. Compare this number with the filesize given by the DOS command DIR.

2. Write a silly program that will read two text files and display them both on the monitor on alternating lines. This is the same as "shuffling" the two files together. Take care to allow them to end at different times, inserting blank lines for the file that terminates earlier.