Version 0.2.4, last update: 2007-01-01
Copyright © 2004-2006 Salvatore ISAJA
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation; with no Invariant Sections, with no Front-Cover Texts, and with no Back-Cover Texts. A copy of the license is included by reference.
This document describes version 0.2 of LEAN (Lean yet Effective Allocation and Naming), a new free, simple, portable, personal, fully featured file system for embedded tasks. The third number in the version number identifies changes made to this document (e.g. wording) that do not change the file system format.
The LEAN file system is in the alpha (unstable) development stage: this document supersedes any previous version of the LEAN file system specification with no care for backward compatibility.
Since this is a new file system, and some aspects are still to be defined, suggestions or corrections are welcome, for either the file system or this document. Please contact the author sending e-mail at: salvois at users.sourceforge.net. The author wishes to thank who have submitted comments and criticism in order to improve the LEAN file system.
The mission of the LEAN file system is to provide a new free, simple, portable, personal file system alternative to the proprietary (and in part patented) Microsoft FAT file system, primarly intended for media exchange and for use with embedded devices, and to expand the file system functionality of the FreeDOS-32 free modular operating system.
LEAN is a recursive backronym for Lean yet Effective Allocation and Naming, based on the ironical contrary of FAT, as an English word.
The LEAN file system is somewhat conceptually inspired from the Linux Second Extended file system (EXT2), but it is much simpler. Although developed independently, some design choices of the last version of LEAN resemble those of the QNX file system.
LEAN aims to be the simplest file system providing the most common and useful file system features. Any design choices have gone in the direction of the greatest simplicity rather than performance. It is said that programmers like the FAT file system for its simplicity. LEAN aims to be much simpler yet to perform better.
The LEAN file system may be used on small media (as small as a floppy) as well as on bigger media. The basic allocation unit is a sector, that is a block of the hosting block device. Depending on the sector size, a file system volume may be large up to 2 TiB (assuming 512-byte sectors) or up to 16 TiB (assuming 4096-byte sectors). In fact, sectors are addressed using 32-bit numbers. The total number of files, the size of a file and the number of files per directory are only limited by the free volume space.
The volume status is stored in a superblock at the beginning of the hosting block device, as well as in a superblock backup in another location. Free sectors are tracked by a bitmap, that is evenly distributed across the file system.
File metadata are stored in inode structures, located in the first sector of each file, like a header. An inode structure stores the attributes of a file and the information to access any other sector of a file. Each file is uniquely identified by an inode number, that is the address of its first sector.
File allocation is extent-based, that is each file is composed by several chunks, called extents, each made up of one or more of contiguous sectors. Extent addressing is direct for the first extents of a file (in order to maximize performance on small files), storing the extent specirfications in the inode structure itself, whereas it is indirect for subsequent extents, using a doubly-linked list of level-1 indirect sectors.
Directories are files which store an unordered list of directory entries. The LEAN file system provides case sensitive international long file names, encoded in Unicode UTF-8. The length of a file name is limited to 2032 bytes or by the sector size, whichever comes first. Both hard links and symbolic links are available.
The following table summarizes the features of the LEAN file system:
| Sector size | 512, 1024, 2048 or 4096 bytes |
| Max volume size | 232 sectors |
| File size | arbitrary |
| Number of files | arbitrary |
| Files per directory | arbitrary |
| Free space tracking | distributed bitmap |
| Allocation type | mixed linked/indexed, extent-based |
| Directory format | plain unordered list |
| File names | long, case sensitive, in UTF-8 |
| Links | hard and symbolic |
A reference implementation, written in the D programming language, is about to be completed.
Converting LEAN to 64-bit sector addressing is being considered. That would break the 2 TiB limit (with 512-byte sectors) at the price of some extra disk space. If 64-bit numbers are used for both sector numbers and extent sizes, an indirect sector would contain up to 28 extent specifications vs. 60 (with 512-byte sectors), and the inode structure would be 224 bytes (storing 8 extent specifications) vs. 128. If 32-bit numbers are retained for extent sizes, an indirect sector could store up to 38 extent specifications (with 512-byte sectors) and the inode structure would be 192 bytes. Alignment issues should be considered, though (64-bit and 32-bit numbers would be interleaved in an array). Each directory entry would have a header of 12 bytes instead of 8. Feedback and suggestions are welcome.
This section describes the architecture of the LEAN file system as stored in a hosting block device.
The following terms and conventions will be used throughout this specification:
byte, short, int, long mean, respectively, 8-, 16-, 32- and 64-bit signed integers, while ubyte, ushort, uint, ulong are unsigned, char is an unsigned byte of a UTF-8 string, enums are scoped);0x;// The address of a sector. typedef uint Sector; // An extent specification. struct Extent { Sector start; Sector size; } // A 128-bit globally unique identifier (GUID). struct Guid { uint data1; ushort data2; ushort data3; ubyte[8] data4; }
On the i386 platform, each file system is identified by an 8-bit number in an entry of an MBR partition table. The value 0xEA (after LEAN) has been proposed as it seems unassigned.
For GUID partition tables the following GUID is provided to identify a LEAN data partition:
bb5a91b0-977e-11db-b606-0800200c9a66
Several structures of LEAN include fields to make the file system more robust.
Sensitive structures store a checksum field in their first 32 bits. The checksum is computed on any data following the checksum field itself, up to the end of the sensitive structure. The sensitive structure must be cast to an array of 32-bit unsigned integers, then each element of this array must be summed, rotating the sum one bit to right at every iteration. The size of the sensitive structure must be a multiple of 4 bytes. The checksum must be recomputed at least every time a sensitive structure is written back to disk.
The following function shows how to calculate the checksum. The data parameter is a pointer to the sensitive structure, including the checksum field in its first 4 bytes. The size parameter is the size in bytes of the structure pointed by data, including the 4 bytes for the checksum field. It must be a multiple of 4 bytes. The function skips the first 4 bytes when computing the checksum.
uint computeChecksum(void* data, size_t size) { uint res = 0; uint* d = cast(uint*) data; assert((size & (uint.sizeof - 1)) == 0); size /= uint.sizeof; for (uint k = 1; k < size; k++) res = (res << 31) + (res >> 1) + d[k]; return res; }
A sensitive structure also contains a magic field storing a 32-bit constant signature identifying the structure. This can be used as a first test to validate a sensitive structure.
A field specifying the address of the sector storing the structure itself is sometimes provided. This is another robustness measure against repair utilities, in case they are scanning a disk for sensitive data structure, and an image file containing a LEAN file system is stored on that disk.
Figure 1: Layout of a LEAN volume.
To improve locality in space and to help dinamically change the size of a volume, the disk is logically subdivided into bands, made up of several contiguous sectors. Bands are equally sized, perhaps except the last one, and must be made up of a power of two worth of sectors. A volume contains at least one band. Bands are numbered starting from 0, where sector 0 is the first sector of band 0.
Each band contains a portion of the bitmap representing the allocation state of the sectors of that band. That is, each bitmap portion represents the state of closely located sectors. The bitmap portion for each band must start at the first sector of the band. Band 0 is an exception in that it contains sectors reserved for the boot loader and the superblock, thus a specific field is used to tell where the bitmap of band 0 starts.
The superblock is a structure containing fundamental information about a LEAN volume. Its name comes from the Linux EXT2 file system, and it is used to differentiate it from the boot sector, that on i386 is sector 0, containing the code of the boot loader.
A driver must use the superblock to know how to access every other information on the volume. Thus, the superblock must be stored in a well known location. The superblock must be stored in the first 512 bytes (regardless the sector size) of one of any sector in the range 1 to 32, inclusive. This range should allow to scan for a superblock taking advantage of track buffering from the disk driver, while being flexible enough to overcome the limits of a fixed location. This leaves up to 16 KiB for the boot loader if the sector size is 512 bytes. A driver must use the magic and checksum fields of the superblock to identify a valid superblock.
A copy of the superblock must be present in a different location for backup purposes. A driver must ensure that the content of the backup copy is always exactly the same as the superblock. The location of the superblock backup is stored in the superblock to allow a driver update the backup copy. The backup copy of the superblock should be placed in the last sector of the first band, in order to help drivers locate it in the event the primary copy gets lost or corrupted (being band sizes constrained to powers of two). The magic, checksum and backupSuper fields must be used in this case.
Sectors before the superblock are reserved for a boot loader, they must not be used by a LEAN driver, and their content is undefined. The superblock may be shifted forward, up to the limit specified above, if more sectors are needed to store a boot loader. Sector 0 is always reserved.
In the FAT file system the boot sector embeds, besides the boot loader, volume status information. Thus, if a boot sector containing apparently valid FAT information is used on a LEAN volume, FAT drivers may erroneously detect the volume as FAT. Implementors and/or users may remove FAT related information from the boot sector (for example, zeroing every byte but the actual boot code), or try FAT detection at the end of the autodetection process. The LEAN design avoids storage of volume information in the boot sector for several reasons: platform independence, to use a compact, free format, where the fields are properly aligned and ordered, for robustness against a damaged sector 0, and to make room for boot loader code.
The format of the superblock is the following:
uint checksum | The checksum value for the superblock. |
uint magic | This must be equal to 0x4E41454C (the 'LEAN' characters in ASCII), and it must be used to identify a valid LEAN file system. It should be used to find the superblock backup in case the primary copy gets lost or damaged. |
ushort fsVersion | This field identifies the version of the LEAN file system, and it is provided for future development. The high byte identifies the major version number and the low byte the minor version number (for example 0110h would mean version 1.16). At present, in the alpha development stage, it must be set to 0002h (that is version 0.2) and drivers must not try to access an unknown file system version, backward compatibility making no sense. |
ubyte logBytesPerSector | This is the base-2 logarithm of the sector size in bytes. The latter is easily computed as 1 << logBytesPerSector. Its value must be 9, 10, 11 or 12, meaning sector sizes of 512, 1024, 2048 and 4096 bytes respectively. The base-2 logarithm is used instead of the actual size in bytes to allow faster conversion (via shift operator) between bytes and sectors. This is the sector size used as the smallest unit managed by the file system, and should match the sector size of the disk. |
ubyte logSectorsPerBand | The base-2 logarithm of the number of sectors per band. The latter is easily computed as 1 << logSectorsPerBand. It must be at least equal to logBytesPerSector + 3, so that a bitmap portion is always made up of an integral amount of sectors (for example, the smallest band size is 4096 sectors if the sector size is 512 bytes). |
uint state | This field identifies the state of the file system. Bit 0 (the LSB) is the clean unmount bit: it must be set to '0' while the volume is mounted, and it must set '1' while the volume is unmounted. Thus, if driver finds this bit set to '0' when mounting a volume, it means the volume was not unmounted properly. Bit 1 is the error bit: it must be normally '0'. Drivers must set it to '1' in case of major errors, like when file system corruption is detected. File system checkup programs may use this bit to know if some major problem happened with the volume. Bits 2 to 31 are reserved for future use and must be set to '0' and ignored by drivers. |
Guid guid | A 128-bit globally unique identifier (GUID) that should uniquely identify the volume. A driver should use it to check for media change in a removable media drive. A driver may use it for automatic volume identification. It must be calculated using one of the canonical algorithms for GUID generation. |
char[64] volumeLabel | A descriptive string, encoded in Unicode UTF-8, null-terminated, to help the user identify the volume. It may be empty. |
Sector sectorsCount | The total number of sectors that form the volume. |
Sector freeSectorsCount | The number of free (or available, or unallocated) sectors in the volume. A driver must keep it consistent to the rest of the file system to measure the free volume space to store files, directories and book keeping. A value of zero must mean disk full. |
Sector primarySuper | The address of the sector containing the primary copy of the superblock in its first 512 bytes. A driver must use this number to know where to write changes made to the superblock back to disk. This field is provided for robustness, for example against repair utilities operating on a disk containing a LEAN file system image file. |
Sector backupSuper | The address of the sector containing the backup copy of the superblock in its first 512 bytes. A driver must use this number to know where to write the backup copy when changes are made to the superblock. In the event the primary copy of the superblock gets lost or damaged, recovery utilities should search for the backup copy of the superblock looking for valid magic and checksum fields, and may also use the backupSuper field itself for further validation. |
Sector bitmapStart | The address of the sector where the bitmap of the first band (that is, band 0) starts. This is usually the sector right after the superblock. For any other band, the bitmap must start in the first sector of the band. The size of the bitmap in each band is related to the logSectorsPerBand field. |
Sector rootInode | The inode number (that is, the address of the first sector) of the root directory. This is usually the sector right after the bitmap for band 0. |
Sector badInode | The inode number (that is, the address of the first sector) of a pseudo-file to track bad sectors. This pseudo-file should be generated and managed by a file system checkup program, and should be made up of any sectors that the program detected as bad. This ensures that a driver can not allocate them. If the volume has no bad sectors, badInode must be zero. |
ubyte[388] reserved | These bytes are reserved for future use. They must be filled with zeroes to pad the size of the superblock structure to 512 bytes and ignored by drivers. They do count for checksum calculation. |
The bitmap is a non-contiguous data structure, made up of sectors that, as a whole, form an array of bits, each of them indicating the allocated/unallocated state of each sector of a volume. The bitmap is not contiguous because its sectors are subdivided into bands, instead of being stored in a single chunk in a fixed location.
The total number of sectors in a volume is specified by the sectorsCount field of the superblock. Thus, the bitmap as a whole must contain sectorsCount bits, one for each sector. The number of sectors required to store the bitmap (that is, the total bitmap size) is given by ceil(sectorsCount / (bytesPerSector * 8)), where ceil rounds up, and bytesPerSector is the size in bytes of a sector (a quantity that drivers may want to precalculate for several purposes). Any bits after the first sectorsCount are unspecified. Each band must store a portion of the bitmap, according to the band size in sectors, that is the size of each bitmap portion must be sectorsPerBand / (bytesPerSector * 8) sectors, where sectorsPerBand can be easily computed from the superblock. The size of each bitmap portion will result an integral number of sectors due to the constraints specified for logSectorsPerBand.
A bit of the bitmap must be set to '1' if its corresponding sector is used/allocated, whereas it must be set to '0' if its corresponding sector is free/unallocated. Sector 0 is identified by bit 0 (the LSB) of the first byte of the bitmap, sector 1 by bit 1 of the first byte, sector 7 by bit 7 (the MSB) of the first byte, sector 8 by bit 0 of the second byte, and so on.
The sectors occupied by the superblock, the superblock backup, the bitmap itself and any reserved sectors containing boot loader code are not free for data storage, thus must be marked as allocated in the bitmap.
The following code snippet shows how to locate a bit for the sector specified by sector.
/* The following input values and precalcs are assumed: * Sector sector = the sector to locate the bitmap entry for * Superblock sb = a structure containing the superblock * Sector sectorsPerBand = 1 << sb.logSectorsPerBand * uint bytesPerSector = 1 << sb.logBytesPerSector */ Sector band = sector >> sb.logSectorsPerBand; Sector sectorInBand = sector & (sectorsPerBand - cast(Sector) 1); Sector bitmapSector = (sectorInBand >> (3 + sb.logBytesPerSector)) + (band << sb.logSectorsPerBand); if (band == 0) bitmapSector += sb.bitmapStart; uint bitOfSector = sectorInBand & (bytesPerSector * 8 - 1);
Figure 2: File Allocation in LEAN.
The data area is made up of any sector not used by the bitmap, the superblock, the superblock backup and sectors reserved for boot code. It is in general not contiguous due to the band-based layout of the bitmap. It stores actual file system contents: files, directories and their related book keeping (that is the list of their allocated sectors).
Besides being unused, each sector in the data area can be either a data sector or an indirect sector. The former contains actual data, the latter contains book keeping. Data sectors are grouped together to form extents when they are contiguous. An extent is specified by a start and a size in sectors.
Each file is stored in one or more extents, thus a file allocation can be described as an array of extent specifications. The first sector of a file stores the inode structure of the file. Actual file data must begin right after the inode structure. This introduces an offset when converting between file data position and data sectors. The inode structure and indirect sectors are used to know what extents belong to a file. This is done in two ways: direct addressing and indirect addressing.
To speed up file access on small files, a limited number of extent specifications is stored in the inode structure itself. A driver must use this information to directly access any of the first extents of the file in constant time. The constant EXTENTS_PER_INODE, currently defined to 7, specifies how many extent specifications are saved into the inode structure.
If a file needs more than EXTENTS_PER_INODE extents, indirect addressing is used for subsequent extents. An indirect sector is a sector storing an array of extent specifications instead of file data. The number of extent specifications per indirect sectors depends on the sector size (60 extent specifications can be saved in a 512-byte sector). Indirect sectors are linked together to form a doubly-linked list in order to support arbitrary file lengths. The following is the layout of an indirect sector:
uint checksum | The checksum value for the indirect sector. All bytes in the sector do count for checksum calculation. |
uint magic | This must be equal to 0x58444E49 (the 'INDX' characters in ASCII). |
Sector sectorsCount | The total number of sectors addressable using this indirect sector. A driver should use this information to quickly traverse the list of indirect sectors to find the desired data sector. |
Sector inode | The inode number of the file this indirect sector belongs to. This is provided for robustness. |
Sector thisSector | The address of the sector storing this indirect sector. This is provided for robustness. |
Sector prevIndirect | The address of the previous indirect sector. If there is no previous indirect sector, this field must be zero. A driver may use this value to traverse the list of indirect sectors in reverse order. |
Sector nextIndirect | The address of the next indirect sector. If there is no next indirect sector, this field must be zero. This is the primary means for a driver to traverse the list of indirect sectors to seek into a file. |
Sector extentsCount | The number of valid extent specifications stored in the indirect sector. |
Extent[0] extents | The array of extent specifications stored in the indirect sector. It must have as many entries as the sector size allows (for example, 60 extent specifications if the sector size is 512 bytes). All extent specifications in this array must be used, unless the indirect sector is the last one of the file. The extentsCount field specifies how many entries of this array are valid, thus refer to actual file extents. |
The extent-based approach allows efficient and simple access to the file system. It is efficient because, once the driver knows an extent specification, it can access any sector of that extent in constant time. It is also simple because, for each extent, only an offset must be added to its starting sector to find any other sector.
This works best if the file is not badly fragmented, that is unless the file is made up of several extents. Being the first EXTENTS_PER_INODE extent specifications saved in the inode structure, it is expected that most small files would need no indirect sectors, thus any sector can be accessed in constant time with no additional reads. Large files can benefit from this too, if they are mostly contiguous, although this is not as likely.
Indirect sectors store a much greater amount of extent specifications than the inode structure, thus it is expected that with a few indirect sectors a file is completely specified. For example a file made up of 200 fragments would require 4 indirect sectors, assuming a sector size of 512 bytes. They may be buffered for faster operation. The number of indirect sectors, if any are present, is determined only by the count of fragments, not by the file size (at least not directly).
This approach has been chosen in order to advantage random access on small files (e.g. configuration files), but sequential access for bigger files (e.g. multimedia files), as well as keeping the file system implementation as simple as possible.
In order to use the file system as efficiently as possible, drivers may implement allocation strategies when creating or extending files. This specification does not define or mandate such allocation strategies, but a few suggestions are presented.
When extending a file, a driver should try to grow its last extent. The last extent of a file can be easily found using the extentsCount and/or lastIndirect fields of the inode structure. The following algorithm may be used:
if the sector right after the end of the last extent is free
{
grow the last extent increasing its size by 1;
}
else
{
find a free sector as close as possible to the last extent;
if a new indirect sector is needed
{
write the indirect sector in that free sector;
find a free sector as close as possible to the indirect sector just written;
}
create a new extent starting at that free sector and with size 1;
}
To improve locality in space between directories and their files, and to help reduce fragmentation, a driver may use the following algorithm when creating a new file. It assumes that most files will have a single hard link, thus a single parent directory:
locate the band where the parent directory of the new file ends;
if the new file is not a directory
{
find a free sector as close as possible to the end of the parent directory;
}
else
{
generate a band number different from the one of the parent directory;
find a free sector in that band;
}
create the new file in that sector and store there its inode structure;
To help implementing these algorithms, a driver may provide a parameter to the procedure searching for free sectors, specifying a suggested sector where to start to search from.
All metadata of a file are stored in a structure called inode structure. File metadata include file size, attributes and time stamps, as well as the means to access directly- and indirectly-addressable extents of a file.
The inode structure must be stored in the first 128 bytes of the first data sector of a file. The address of that first data sector must be used as inode number, that is a serial number to uniquely identify the file in a volume. It is not supposed to uniquely identify the file on the whole system: the operating system or a driver may use the inode number together with some way to identify the volume (for example, its GUID or an opaque identifier) to uniquely identify the file system-wise.
Putting the inode structure in a file data sector lets avoid a separate inode table, allowing an arbitrary number of files per volume and simplifying inode management, at the price of some disk space (expecially for very small files that would otherwise fit in a single sector or zero-length files).
The format of the inode structure is the following:
| Field | Description |
|---|---|
uint checksum | The checksum value for the inode structure. |
uint magic | This must be equal to 0x45444F4E (the 'NODE' characters in ASCII). |
uint inodeSize | The size in bytes of this inode structure. Drivers must use this value to get the offset to the first byte of the file data. |
uint linksCount | The number of hard links (the count of directory entries) referring to this file. It must be at least 1. |
ulong fileSize | The size in bytes of the file data, thus excluding the size of the inode structure and indirect sectors. |
long accTime | Last access time stamp. This field is computed as the signed number of milliseconds since 1970-01-01 00:00:00.000 UTC. If this value is unknown this field must be set to long.min, that is -(263). Drivers may choose to not update this field if performance is a costraint or the disk has limited write capabilities (e.g. flash memories). A mount-time flag or an open-time flag may be implemented to disable last access time stamps, and the implementation may want this to be the default. |
long creTime | Creation or status modification time stamp. This field is computed as the signed number of milliseconds since 1970-01-01 00:00:00.000 UTC. If this value is unknown this field must be set to long.min, that is -(263). |
long modTime | Last write/modification time stamp. This field is computed as the signed number of milliseconds since 1970-01-01 00:00:00.000 UTC. If this value is unknown this field must be set to long.min, that is -(263). |
Extent[EXTENTS_PER_INODE] extents | This array stores the specifications of the first EXTENTS_PER_INODE extents of the file. The count of valid extent specifications is determined by the extentsCount field. Any unused extent specification is unspecified. EXTENTS_PER_INODE must be equal to 7. |
Sector extentsCount | The count of extents the file is made up of. This amount includes directly addressable extents (see the extents field) and indirectly addressable extents. It must be at least 1 (that is the first extent, containing the inode structure itself). Indirect sectors are not included. |
Sector firstIndirect | The address of the first indirect sector of the file. If extentsCount < EXTENTS_PER_INODE the file has no indirect sectors and this field is unspecified. |
Sector lastIndirect | The address of the last indirect sector of the file. If extentsCount < EXTENTS_PER_INODE the file has no indirect sectors and this field is unspecified. |
uint attributes | The attributes mask for the file, see enum InodeAttributes. It includes POSIX permissions, behavior flags and the file type specification. The current implementation is monouser, thus only the owner permission bits (InodeAttributes.xUSR) are used. The other permission bits (group, others, suid, sgid, sticky) are reserved for possible future multiuser implementations of the file system. The way these attributes are implemented is operating system dependent. Note that the "file type" values are mutually exclusive enumerated values, they are not single bits that can be combined. The InodeAttributes.FMT mask can be used to retrieve the file type. |
uint uid | The numeric identifier of the user that owns the file. This field is reserved for possible future multiuser implementations of the file system, and must be set to 0. |
uint gid | The numeric identifier of the group of the file. This field is reserved for possible future multiuser implementations of the file system, and must be set to 0. |
| Symbolic name | Value | Description |
|---|---|---|
| RUSR | 1 << 8 | Owner permission: read |
| WUSR | 1 << 7 | Owner permission: write |
| XUSR | 1 << 6 | Owner permission: execute |
| HIDDEN | 1 << 12 | Do not show in default directory listings |
| SYSTEM | 1 << 13 | Warn that this is a system file |
| ARCHIVE | 1 << 14 | File changed since last backup (must be set on any write) |
| SYNC | 1 << 15 | Writes must be committed immediatly (besides open- or mount-time flags) |
| NOATIME | 1 << 16 | Do not update last access time (besides open- or mount-time flags) |
| IMMUTABLE | 1 << 17 | Do not move file sectors (e.g. for defraggers) |
| FMT | 7 << 29 | Bit mask to extract the file type |
| FREG | FileType.REG << 29 | File type: regular file (see enum FileType) |
| FDIR | FileType.DIR << 29 | File type: directory (see enum FileType) |
| FLNK | FileType.LNK << 29 | File type: symbolic link (see enum FileType) |
Figure 3: Directory format in LEAN.
A directory is a file containing an unordered list of directory entries. Each directory entry is a structure which binds a file name with an inode structure, thus each directory entry is a hard link to a file. Directory entries do not store file metadata, except what specified below.
A directory entry is a variable length structure, containing a fixed length header (8 bytes long) and a variable length part containing the file name. The size of the whole structure must be a multiple of 8 bytes, that is the size of the fixed length header. This helps to reuse space from deleted directory entries. A directory entry must be completely contained in a single sector, in order to ease memory management in drivers. This limits file name lengths to 504 bytes if the sector size is 512 bytes.
File name storage must be case sensitive in order to simplify name comparison, avoiding case folding to upper or lower case, that is not trivial in Unicode. Drivers may implement case insensitive file name matching in an unspecified way. In environments lacking support for long file names, or requiring short file name aliases, a driver may fabricate short file names on the fly, that is, without having them physically stored in the directory. The algorithm used to fabricate short file names is unspecified, but a proposed procedure is shown below.
This is the format of a directory entry:
| Field | Description |
|---|---|
ubyte type | This is a shortcut to the file type, see enum FileType. A driver must keep it consistent with the file type bits in the attributes of the inode structure. |
ubyte recLen | This is the total size (thus including the fixed length header) of the directory entry, in 8 bytes units. It must be at least 1. This field must be valid even for deleted or empty directory entries, as it must be used as a link to the next directory entry. Being recLen an 8-bit value, the length of a directory entry is further clamped to 2040 bytes. |
ushort nameLen | The length in bytes of the file name stored in the name field. It must be greater than zero. |
Sector inode | The inode number of the file linked by this directory entry, that is the address of the first sector of the file. Drivers must use it to locate the inode structure to do any subsequent processing on the file. |
char[0] name | The file name. This string must not be null terminated and must be encoded in Unicode UTF-8. Its length must be be determined solely by the nameLen field, and padding bytes optionally present after the nameLen bytes are undefined. All Unicode code points are allowed. |
The following table shows the possible values for the type field of a directory entry. A value of zero must indicate that the directory entry is empty (available, or deleted). A file may be undeleted by setting type back to the corresponding value in the inode structure, until its sectors are physically overwritten (the magic and checksum fields should be checked for robustness in this case). Note that these values are mutually exclusive enumerated values, they are not single bits that can be combined.
| Symbolic name | Value | Description |
|---|---|---|
| NONE | 0 | No file type, the directory entry is empty |
| REG | 1 | Regular file |
| DIR | 2 | Directory |
| LNK | 3 | Symbolic link |
The two special entries "." (dot) and ".." (dotdot) must be the first entries of every directory. The former must be a hard link to the directory itself, the latter must a hard link to the parent directory. In the root directory, which has no parent, ".." must be a hard link to itself, too.
A driver may use fictitious "." and ".." entries ignoring the actual entries in the directory, for example to assign ".." to the parent directory of a mount point. If showing "." and ".." to the user is a concern, a driver may choose to make them hidden entries, for example those of the root directory.
This algorithm must be used to search an entry in a directory, using appropriate file name matching criteria:
while not EOF
{
read an 8-byte directory header;
if (DirEntry.type != FileType.NONE)
{
read the file name, that is guaranteed to fit in the same sector;
compare against the required file name;
if the required entry is found return this directory entry;
}
advance using the information provided in recLen;
}
return "file not found";
To delete a hard link to a file, a driver must set the type field of the directory entry to zero (that is FileType.NONE) and decrease the linksCount field of its inode structure. The space occupied by the deleted directory entry (recLen * 8 bytes) is made available for reuse (see the 3rd entry in figure 3). If some degree of security is needed, a driver may zero all bytes of a directory entry, except recLen.
If the linksCount of the inode structure of the file reaches zero, the sectors of the file (both data and indirect) must be deleted. To do this, drivers must set the corresponding bits in the bitmap to '0'. This is efficient and allows to recover data, until physically overwritten. If some degree of security is needed, a driver may also zero or fill with garbage the actual content of the deleted sectors.
This procedure must be used to create a new hard link to a file.
Scan the directory for one or more contiguous available directory entries, enough to store the new directory entry. This is because available entries are not coalesced together. The required size of the directory entry, in 8 bytes units, must be computed as recLen = (8 + nameLen + 7) >> 3. Available directory entries must be identified for having type == FileType.NONE (see above).
If enough space to store the new directory entry is found, the new directory entry should be stored in that place, otherwise the new directory entry must be appended to the directory, thus extending it. The new directory entry must not span across different sectors. If the new directory entry is shorter than the free space found, a driver must fabricate and append a new directory entry header (with type equal to FileType.NONE) to keep the link with the next directory entry consistent (see the 6th entry in figure 3).
In environments lacking support for long file names, or requiring short file name aliases, a driver may fabricate short file names. Fabricated short file names are not supposed to be stored on the file system as new hard links, but rather generated on the fly to the user when they are needed. How to fabricate short file names in such cases is unspecified. The following is an algorithm that drivers may use if appropriate:
A symbolic link (symlink, soft link) is a special file storing the pathname of another file. Its file type must be FileType.LNK.
The string representing the pathname must be encoded in Unicode UTF-8 and must not be null terminated. The length of this string must be equal to the file size of the symbolic link.
The represented pathname may be either absolute or relative. A driver or other appropriate parts of an implementation should provide the means to follow the symbolic link. In a system where pathnames may include a drive or device specification, the pathname stored by the symbolic link may contain a drive or device specification too. The way the implementation behaves in the event the drive or device referred by the symbolic link change (for example a removable or network drive, or when the volume is used in a system not supporting drive or device specifications) is implementation specific.
The meaning of attributes for symbolic links is unspecified. A driver may apply their regular meaning to the symbolic link itself (for example, a hidden symbolic link may be not showed in a directory listing, whether the target is hidden or not, or the permissions of a symbolic link may refer to what can be done with the target when accessed via the symbolic link), or may use any conventions normally used by the system for symbolic links.