FAT and the Directory Structure

MS-DOS and Windows 9x share almost the same FAT and directory structure. This article describes both.

MS-DOS stores your files on the disk. How, then, are they arranged there?

Let's first look at how books are stored. People usually store information in books, which can hold both text and pictures. Each book has a name, usually different from the names of other books, so that we can tell one book from another. A book consists of sheets of paper, and the number of sheets can be counted. Every book takes up a certain amount of space on the bookshelf, and books can be moved when they need to be rearranged.

The MS-DOS FAT file system is somewhat similar. It has these features:
 
1. Each file takes up a certain number of clusters (also called allocation units), just as each book consists of a certain number of sheets of paper. Each cluster is unique on the disk and has its own number. Under MS-DOS, a given disk has a fixed cluster size, and the clusters are laid out one after another on the disk.
 
2. A book cannot be split up, but a file can be divided into pieces that are stored in different places on the disk. Why do this? Moving books on a bookshelf is easy, but moving files from one place on a disk to another takes a long time. More importantly, files are modified, so a file may grow or shrink from time to time. If files were always stored contiguously, changing a single file could mean moving every file stored after it. So files are not always stored one after another like books on a shelf.

The FAT (File Allocation Table) makes this divided storage possible by holding the cluster chain of each file. Because every cluster has a number, a file can be located on the disk by following its chain. If you have learned programming, you may know linked lists; the information in the FAT works like one. The pointer to the head node (the first cluster of the file) is stored in a directory, and the following pointers are stored in the FAT. Each FAT entry represents one cluster and holds the number of the next cluster in the chain.

The first two FAT entries (numbers 0 and 1) are reserved, so clusters are numbered from 2. A FAT entry with the value zero represents a free cluster. A chain also needs an end mark: MS-DOS uses 0xFFF, 0xFFFF and 0x0FFFFFFF as the end marks of FAT12, FAT16 and FAT32 respectively (a FAT32 entry uses only its low 28 bits). These marks are much like NULL in programming. MS-DOS also reserves 0xFF7, 0xFFF7 and 0x0FFFFFF7 as "bad cluster" marks for FAT12, FAT16 and FAT32, so that bad areas of the disk are never written to.
 
3. Directories store the names of files. Just as books have names, files have names, and they are listed in the directory. The number of the first cluster of each file is listed in the directory as well. A "directory entry" contains the basic information about one file or subdirectory.

Under MS-DOS, every disk has a root directory. The root directory has a fixed number of entries available for files, and its position on the disk is fixed. (FAT32, introduced with Windows 95 OSR 2, has a resizable root directory, which has its own chain in the FAT just as a subdirectory does.) Since version 2.0, MS-DOS has supported tree-structured directories by means of subdirectories. A subdirectory is stored on the disk like a file, but it contains directory entries similar to those in the root directory. Unlike the root directory, a subdirectory has no fixed limit on its number of entries (apart from disk space), because it can grow or shrink like a file.

To distinguish used entries from free ones, MS-DOS treats an entry as free if its first byte is 0xE5 (a lowercase Greek sigma in the IBM PC character set) or 0x00. An entry starting with 0xE5 may be the entry of a deleted file; an entry starting with 0x00 has never been used.
 
4. The specification of a drive (a logical drive, or volume) is stored in its boot sector: the FAT type (8, 12, 16 or 32 bits; one FAT or two), the compatible MS-DOS version, the disk size, the cluster size, the root directory size, the volume label, the volume serial number and so on. A sector is usually a contiguous 512-byte block on the disk (its actual size depends on the type of disk). The boot sector also contains the boot program for MS-DOS. When FORMAT.COM formats an ordinary logical drive, it creates the boot sector, two FATs and one root directory.
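
The linked-list idea from item 2 can be sketched in a few lines of Python. This is only an illustration: the table `example_fat` and the function name `cluster_chain` are made up for this example, and the FAT is assumed to be already decoded into a list of FAT12 values.

```python
END_MIN = 0xFF8  # FAT12: values 0xFF8..0xFFF all mark the end of a chain

def cluster_chain(fat, first_cluster):
    """Follow a file's chain, starting from the cluster number found in its directory entry."""
    chain = []
    cluster = first_cluster
    while cluster < END_MIN:
        # Entries 0 and 1 are reserved, a zero means "free", and a repeat means a loop.
        if cluster < 2 or cluster >= len(fat) or cluster in chain:
            raise ValueError(f"broken chain at cluster {cluster}")
        chain.append(cluster)
        cluster = fat[cluster]  # each entry holds the number of the next cluster
    return chain

# Hypothetical table: entries 0 and 1 are reserved, clusters 5 and 6 are free,
# and one file occupies clusters 2 -> 3 -> 4 -> 7.
example_fat = [0xFF0, 0xFFF, 3, 4, 7, 0x000, 0x000, 0xFFF]
print(cluster_chain(example_fat, 2))  # [2, 3, 4, 7]
```

Note that the file in the example is not contiguous: its last cluster is 7, which is exactly the kind of divided storage the FAT allows.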
 

That is the story of the FAT and the MS-DOS directory structure. For practical use, the following paragraphs describe how MS-DOS actually lays out a disk.

Floppy disks are not used as often as they once were, but they are still common, so an MS-DOS formatted 3.5 inch high density floppy makes a good example. The disk examined here is a Windows Me boot disk.

1. The very first sector of the disk is the boot sector. This is always true of a floppy disk formatted by MS-DOS (and by most other operating systems); on a hard disk, the boot sector is the first sector of each logical drive. The boot sector contains a program that loads the operating system. That program is itself loaded by the program in the master boot record (MBR) or by the computer's BIOS.

[Figure: contents of the boot sector]


2. Sectors 2 through 10 (counting the boot sector as sector 1) hold the first FAT (FAT 1), and sectors 11 through 19 hold the second FAT (FAT 2).

[Figure: contents of FAT 1 and FAT 2]

The FAT consists of entries. The type of this FAT is FAT12, that is, each entry takes 12 bits. Cluster numbers are stored in little-endian format. That is to say, like an integer in memory on a PC, the number 0x3E4D (for example) is stored as "4D 3E". If you know C programming, you can try this: { int i = 0x3E4D; unsigned char *p = (unsigned char *)&i; printf("%X %X", (unsigned)p[0], (unsigned)p[1]); }. In FAT12, two adjacent entries such as 0xA3D and 0x897 are packed into three bytes, "3D 7A 89", starting at a byte offset that is divisible by 3. In the figure, the entries from the very beginning are "FF0, FFF, 003, 004, 005, 006, 007, 008, 009, 00A, 00B, 00C, 00D, 00E, 00F, 010, 011, 012, 013, . . . ". The first two entries (numbers 0 and 1) are reserved by MS-DOS, so the entries for real clusters start at the third entry. That entry represents the first data cluster, whose cluster number is 2.

As the figure shows, FAT 2 contains the same information as FAT 1. The second copy improves the reliability of disk operations.

3. Sectors 20 through 33 hold the root directory. The root directory of a 3.5 inch high density floppy disk can hold 224 entries. Each entry takes 32 bytes, so the root directory occupies 224 x 32 = 7168 bytes, or 14 sectors.

[Figure: contents of the root directory]
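
The 12-bit packing described in item 2 can be checked with a short Python sketch. The function name `fat12_entry` is made up for this illustration; the byte values come from the examples in the text.

```python
def fat12_entry(fat_bytes, n):
    """Return FAT12 entry number n from the raw bytes of the FAT."""
    offset = n + n // 2  # each pair of entries shares three bytes
    pair = fat_bytes[offset] | (fat_bytes[offset + 1] << 8)  # little-endian 16 bits
    if n % 2:
        return pair >> 4    # odd entry: the upper 12 bits
    return pair & 0xFFF     # even entry: the lower 12 bits

# The two adjacent entries 0xA3D and 0x897 are stored as "3D 7A 89".
packed = bytes([0x3D, 0x7A, 0x89])
print(hex(fat12_entry(packed, 0)), hex(fat12_entry(packed, 1)))  # 0xa3d 0x897

# The start of the floppy's FAT: entries FF0, FFF, 003, 004.
fat_start = bytes([0xF0, 0xFF, 0xFF, 0x03, 0x40, 0x00])
print([hex(fat12_entry(fat_start, n)) for n in range(4)])
```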


The file system layout of 3.5 inch floppy disks has been described above.
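
The sector ranges given above follow from a few numbers. Here is a minimal Python sketch that recomputes them, assuming 512-byte sectors, one boot sector, two FATs of 9 sectors each and 224 root directory entries, as described in the text. The constant names and the function `floppy_layout` are made up for this illustration.

```python
BYTES_PER_SECTOR = 512
BOOT_SECTORS = 1
FAT_COPIES = 2
SECTORS_PER_FAT = 9
ROOT_ENTRIES = 224
ENTRY_SIZE = 32

def floppy_layout():
    """Return {region: (first sector, last sector)}, counting sectors from 1 as the text does."""
    root_sectors = (ROOT_ENTRIES * ENTRY_SIZE + BYTES_PER_SECTOR - 1) // BYTES_PER_SECTOR
    sizes = [("boot sector", BOOT_SECTORS)]
    sizes += [(f"FAT {i + 1}", SECTORS_PER_FAT) for i in range(FAT_COPIES)]
    sizes.append(("root directory", root_sectors))
    layout = {}
    first = 1
    for name, length in sizes:
        layout[name] = (first, first + length - 1)
        first += length
    return layout

for name, (first, last) in floppy_layout().items():
    print(f"{name}: sectors {first} through {last}")
```

With these numbers the root directory ends at sector 33, so the area that holds the clusters of files begins at sector 34.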
 
Important: the layout of the FAT32 structures is quite different from that of FAT12 and FAT16, and I do not yet understand it fully.

Windows 9x and NT support long filenames, which are stored in the directory as extra entries.

[Figure: directory entries of files with long filenames]

For every file with a long filename, the long filename entries are listed before the short filename entry of the file, in reverse order, and the long filename is stored in Unicode. To keep old versions of MS-DOS from showing these entries, their attribute byte has the "L" (label) bit set, together with the System, Hidden and Read-only bits. The attribute byte is the 12th byte of an entry. Its bits are defined as follows (the example value is the attribute byte of any long filename entry):
0 0 0 0 1 1 1 1

N N A D L S H R
N --- Reserved
A --- Archive, meaning the file has been changed. Once the file is changed, this attribute is automatically turned on by MS-DOS.
D --- Directory, meaning the file is a subdirectory.
L --- Label, meaning the filename is a volume label.
S --- System, meaning the file is a system file (it is important to the operating system and does not appear in the default directory list).
H --- Hidden, meaning the file will not be listed by default.
R --- Read-only, meaning the file cannot be modified or deleted.
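
The bit layout above can be decoded with a few lines of Python. This is a sketch only; the names `ATTRIBUTE_BITS`, `attribute_letters` and `is_long_name_entry` are made up for this illustration.

```python
# Bit values follow the "N N A D L S H R" layout, from the highest bit to the lowest.
ATTRIBUTE_BITS = [
    (0x20, "A"),  # Archive
    (0x10, "D"),  # Directory
    (0x08, "L"),  # Label
    (0x04, "S"),  # System
    (0x02, "H"),  # Hidden
    (0x01, "R"),  # Read-only
]

def attribute_letters(attr):
    """Return the letters of the attribute bits that are turned on."""
    return "".join(letter for bit, letter in ATTRIBUTE_BITS if attr & bit)

def is_long_name_entry(attr):
    """A long filename entry always carries the attribute byte 0x0F."""
    return attr == 0x0F

print(attribute_letters(0x0F))  # LSHR
print(attribute_letters(0x20))  # A
```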

Long filename entries store the pieces of the long filename in reverse order. The first entry on the disk starts with a capital letter (A, B, C, ...), and each following long filename entry starts with a sequence number (..., 0x02, 0x01). The first byte of the first entry tells how many entries there are: "A" (0x41) means that it is the only entry, while "C" (0x43) means that there are three entries, so two more entries follow, starting with 0x02 and 0x01.

Not all the bytes of an entry hold the filename, only bytes 2 through 11, 15 through 26 and 29 through 32, which is room for 13 two-byte Unicode characters. The other bytes are left alone because Windows 9x is designed to stay compatible with older versions of MS-DOS. Bytes 27 and 28 hold the starting cluster in a FAT16 entry, and older versions of MS-DOS expect them to be zero when the entry is a label. Bytes 13 and 14 are reserved in older versions of MS-DOS, but Windows gives them a meaning: in a long filename entry, byte 14 holds a checksum of the short filename. The attribute byte (byte 12) of a long filename entry must be 0x0F. For accurate information, see the Windows "Long File Name" Specification.
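
To make the byte positions concrete, here is a Python sketch that rebuilds a long filename from its entries. It assumes the characters are stored as little-endian two-byte Unicode, and that a name shorter than the available room ends with 0x0000 and is padded with 0xFFFF. The function names and the helper `make_entry`, which only builds test data, are made up for this illustration.

```python
# Bytes 2-11, 15-26 and 29-32 of the text, written as 0-based Python slices.
NAME_SLICES = (slice(1, 11), slice(14, 26), slice(28, 32))

def entry_chars(entry):
    """Return the 13 characters held by one 32-byte long filename entry."""
    raw = b"".join(entry[s] for s in NAME_SLICES)
    return raw.decode("utf-16-le")

def long_name(entries):
    """Join the pieces in sequence order (0x01 first) and cut at the terminator."""
    ordered = sorted(entries, key=lambda e: e[0] & 0x3F)  # drops the 0x40 "letter" bit
    text = "".join(entry_chars(e) for e in ordered)
    return text.split("\x00", 1)[0]

def make_entry(first_byte, piece):
    """Hypothetical helper: build a long filename entry for the example below."""
    units = piece.encode("utf-16-le")
    if len(piece) < 13:
        units += b"\x00\x00"          # terminator after a short last piece
    units = units.ljust(26, b"\xff")  # pad the unused room with 0xFFFF
    entry = bytearray(32)
    entry[0] = first_byte
    entry[1:11] = units[0:10]
    entry[11] = 0x0F                  # attribute byte of a long filename entry
    entry[14:26] = units[10:22]
    entry[28:32] = units[22:26]
    return bytes(entry)

# On disk, the entry starting with "B" (0x42) comes first, then the entry 0x01.
on_disk = [make_entry(0x42, "ame.txt"), make_entry(0x01, "A long file n")]
print(long_name(on_disk))  # A long file name.txt
```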

Now let's talk about some common operations of MS-DOS.
 
The DELETE operation. Deleting a file in MS-DOS takes only a short time. Why? Because MS-DOS only marks the directory entry and the related FAT entries as free; the space the file occupied is not wiped. To free a directory entry, MS-DOS sets the first byte of the entry to 0xE5. To free the entries in the FAT, MS-DOS sets them to 0x000, 0x0000 or 0x00000000 (for FAT12, FAT16 and FAT32 respectively). That is why a deleted file can sometimes be recovered. But IMPORTANT: The UNDELETE command in older versions of MS-DOS cannot be used on Windows 9x or on Windows NT, and it also cannot be used on FAT32 drives. To undelete a file, you can use Norton Utilities for Windows 9x/NT --- Norton UnErase, or some other tools for Windows.
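
The two steps of a deletion can be sketched in Python on an in-memory copy of a FAT12 table and one directory entry. This is an illustration under stated assumptions, not the MS-DOS code: the function name `delete_file` and the example data are made up, and the FAT is assumed to be already decoded into a list.

```python
def delete_file(entry, fat):
    """Mark a 32-byte directory entry and its FAT12 chain as free; data clusters are untouched."""
    cluster = entry[26] | (entry[27] << 8)  # bytes 27 and 28: the starting cluster
    entry[0] = 0xE5                         # the directory entry is now free
    while 2 <= cluster < len(fat):          # stops at an end mark or a zero entry
        next_cluster = fat[cluster]
        fat[cluster] = 0x000                # the cluster is now free
        cluster = next_cluster

# Hypothetical file README.TXT occupying clusters 2 -> 3; cluster 4 belongs to another file.
entry = bytearray(32)
entry[0:11] = b"README  TXT"
entry[26] = 2
fat = [0xFF0, 0xFFF, 3, 0xFFF, 0xFFF]

delete_file(entry, fat)
print(entry[0:11], [hex(value) for value in fat])
```

After the call, only the first letter of the name is lost and the starting cluster is still recorded in the entry, which is what gives an undelete tool a chance to recover the file.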

The OVERWRITE operation. Overwriting a file really means overwriting the clusters the file already occupies first; if the original clusters are not enough, MS-DOS allocates more. So if you want to delete a file completely, one way is to overwrite it with another file (which should not be smaller than the original one) first, and then delete it. You can also use a tool such as Norton Utilities --- Wipe Info to do this.

The MOVE operation. Moving a file or a directory within the same disk only means moving the directory entry (or entries, if it has a long filename) from one directory to another. So moving a file is even faster than deleting it. The Recycle Bin in Windows does the following when you delete a file: it first gathers information about the file (including its filename, size, deletion time and original location), then moves the file to the "\RECYCLED" directory on the corresponding disk, renaming it to "DCxxx" ("C" means drive C:, and "xxx" is a number) and storing the information in the file "\RECYCLED\INFO" (Windows 9x/NT 4) or "\RECYCLED\INFO2" (Windows Me/2000). So moving a directory that contains many files to the Recycle Bin is slower than moving it to a normal directory on the same disk.
