LEAN File System specification

This document describes version 0.4 of LEAN (Lean yet Effective Allocation and Naming), a new free, simple, portable, personal, fully featured file system for embedded tasks. The third number in the version number identifies changes made to this document (e.g. wording) that do not change the file system format.

The LEAN file system is in the alpha (unstable) development stage: this document supersedes any previous version of the LEAN file system specification with no care for backward compatibility.

Since this is a new file system, and some aspects are still to be defined, suggestions or corrections are welcome, for either the file system or this document. Please contact the author sending e-mail at: salvois at users.sourceforge.net. The author wishes to thank those who have submitted comments and criticism in order to improve the LEAN file system.

Table of contents

Overview

The mission of the LEAN file system is to provide a new free, simple, portable, personal file system alternative to the proprietary (and in part patented) Microsoft FAT file system, primarly intended for media exchange and for use with embedded devices, and to expand the file system functionality of the FreeDOS-32 free modular operating system.

LEAN is a recursive backronym for Lean yet Effective Allocation and Naming, based on the ironical contrary of FAT, as an English word.

Design choices

The LEAN file system is somewhat conceptually inspired from the Linux Second Extended file system (EXT2), but it is much simpler. Although developed independently, some design choices of the last versions of LEAN resemble those of the QNX file system.

LEAN aims to be the simplest file system providing the most common and useful file system features. Any design choices have gone in the direction of the greatest simplicity rather than performance. Some claim that programmers like the FAT file system for its simplicity. LEAN aims to be much simpler yet to perform better.

LEAN is designed for use even by platforms with limited RAM and CPU power.

Limits

The LEAN file system may be used on small media (as small as a floppy) as well as on bigger media. The basic allocation unit is a conventional sector of 512 bytes. Sectors are addressed using 64-bit numbers, thus file system volumes may be arbitrarily large in practice. The total number of files, the size of a file and the number of files per directory are only limited by the free volume space.

Allocation

The volume status is stored in a superblock at the beginning of the hosting block device, as well as in a superblock backup in another location. Free sectors are tracked by a bitmap, that is evenly distributed across the file system.

File metadata are stored in inode structures, located in the first sector of each file, like a header. An inode structure stores the attributes of a file and the information to access any other sector of a file. Each file is uniquely identified by an inode number, that is the address of its first sector.

File allocation is extent-based, that is each file is composed by several chunks, called extents, each made up of one or more of contiguous sectors. Extent addressing is direct for the first extents of a file (in order to maximize performance on small files), storing the extent specirfications in the inode structure itself, whereas it is indirect for subsequent extents, using a doubly-linked list of level-1 indirect sectors.

File system objects

In order to simplify file system driver implementations, LEAN basically defines only one type of file system object, that is a file. Other kind of file system objects, such as directories, are implemented using the same model, that is the allocation type described in the previous paragraph.

Regular files contains data, and are accessible with one or more names.

Directories are files which store an unordered list of directory entries. The LEAN file system provides case sensitive international long file names, encoded in Unicode UTF-8, up to 500 bytes long. Both hard links and symbolic links are available.

Forks are files which store an unordered list of extended attributes. Extended attributes are optional name-value pairs of metadata associated with a file. Up to one fork can be associated with each non-fork file system object. An arbitrary number of extended attributes can be associated with a file in this way. For performance reasons, a limited amount of extended attributes may also be embedded in the non-fork file itself, right after the inode structure, before the actual file data.

Symbolic links are files which store a single string that is the pathname of the pointed-to object.

Résumé

The following table summarizes the features of the LEAN file system:

Allocation unit512-byte conventional sectors
Max volume size263-1 sectors = trillions of gigabytes
Max file size264-1 bytes = billions of gigabytes
Number of filesarbitrary
Files per directoryarbitrary
Free space trackingdistributed bitmap
Allocation typemixed linked/indexed, extent-based
Directory formatplain unordered list
File namesup to 500 bytes, case sensitive, in UTF-8
Linkshard and symbolic
Extended attributesname-value metadata, embedded in a file or in a separate fork
Forksplain unordered list of extended attributes associated with a file
Access controloptional POSIX permissions
Endiannesslittle endian

Reference implementation

A reference implementation, written in C++, is about to be completed.

Differences from the previous versions and future plans

File system format

This section describes the architecture of the LEAN file system as stored in a hosting block device.

Definitions

The following terms and conventions will be used throughout this specification:

Partition identifier

On the i386 platform, each file system is identified by an 8-bit number in an entry of an MBR partition table. The value 0xEA (after LEAN) has been proposed as it seems unassigned.

For GUID partition tables the following GUID is provided to identify a LEAN data partition:

bb5a91b0-977e-11db-b606-0800200c9a66

Structure identification and checksum

Several structures of LEAN include fields to make the file system more robust.

Sensitive structures store a checksum field in their first 32 bits. The checksum is computed on any data following the checksum field itself, up to the end of the sensitive structure. The sensitive structure must be cast to an array of 32-bit unsigned integers, then each element of this array must be summed, rotating the sum one bit to right at every iteration. The size of the sensitive structure must be a multiple of 4 bytes. The checksum must be recomputed at least every time a sensitive structure is written back to disk.

The following function shows how to calculate the checksum. The data parameter is a pointer to the sensitive structure, including the checksum field in its first 4 bytes. The size parameter is the size in bytes of the structure pointed by data, including the 4 bytes for the checksum field. It must be a multiple of 4 bytes. The function skips the first 4 bytes when computing the checksum.

uint32_t computeChecksum(const void* data, size_t size)
{
	uint32_t res = 0;
	const uint32_t* d = static_cast<const uint32_t*>(data);
	assert((size & (sizeof(uint32_t) - 1)) == 0);
	size /= sizeof(uint32_t);
	for (size_t i = 1; i != size; ++i) res = (res << 31) + (res >> 1) + d[i];
	return res;
}

A sensitive structure also contains a magic field storing a 32-bit constant signature identifying the structure. This can be used as a first test to validate a sensitive structure.

A field specifying the address of the sector storing the structure itself is sometimes provided. This is another robustness measure against repair utilities, in case they are scanning a disk for sensitive data structure, and an image file containing a LEAN file system is stored on that disk.

Layout of a LEAN volume

Figure 1: Layout of a LEAN volume.

Bands

To improve locality in space and to help dinamically change the size of a volume, the disk is logically subdivided into bands, made up of several contiguous sectors. Bands are equally sized, perhaps except the last one, and must be made up of a power of two worth of sectors. A volume contains at least one band. Bands are numbered starting from 0, where sector 0 is the first sector of band 0.

Each band contains a portion of the bitmap representing the allocation state of the sectors of that band. That is, each bitmap portion represents the state of closely located sectors. The bitmap portion for each band must start at the first sector of the band. Band 0 is an exception in that it contains sectors reserved for the boot loader and the superblock, thus a specific field is used to tell where the bitmap of band 0 starts.

The superblock

The superblock is a structure containing fundamental information about a LEAN volume. Its name comes from the Linux EXT2 file system, and it is used to differentiate it from the boot sector, that on i386 is sector 0, containing the code of the boot loader.

A driver must use the superblock to know how to access every other information on the volume. Thus, the superblock must be stored in a well known location. The superblock must be stored in one of any sector in range from 1 to 32, inclusive. This range should allow to scan for a superblock taking advantage of track buffering from the disk driver, while being flexible enough to overcome the limits of a fixed location. This leaves up to 16 KiB for the boot loader. A driver must use the magic and checksum fields of the superblock to identify a valid superblock.

A copy of the superblock must be present in a different location for backup purposes. A driver must ensure that the content of the backup copy is always exactly the same as the superblock. The location of the superblock backup is stored in the superblock to allow a driver update the backup copy. The backup copy of the superblock should be placed in the last sector of the first band, in order to help drivers locate it in the event the primary copy gets lost or corrupted (being band sizes constrained to powers of two). The magic, checksum and backupSuper fields must be used in this case.

Sectors before the superblock are reserved for a boot loader, they must not be used by a LEAN driver, and their content is undefined. The superblock may be shifted forward, up to the limit specified above, if more sectors are needed to store a boot loader. Sector 0 is always reserved.

In the FAT file system the boot sector embeds, besides the boot loader, volume status information. Thus, if a boot sector containing apparently valid FAT information is used on a LEAN volume, FAT drivers may erroneously detect the volume as FAT. Implementors and/or users may remove FAT related information from the boot sector (for example, zeroing every byte but the actual boot code), or try FAT detection at the end of the autodetection process. The LEAN design avoids storage of volume information in the boot sector for several reasons: platform independence, to use a compact, free format, where the fields are properly aligned and ordered, for robustness against a damaged sector 0, and to make room for boot loader code.

The format of the superblock is the following:

struct Superblock
uint32_t checksumThe checksum value for the superblock.
uint32_t magicThis must be equal to 0x4E41454C (the 'LEAN' characters in ASCII), and it must be used to identify a valid LEAN file system. It should be used to find the superblock backup in case the primary copy gets lost or damaged.
uint16_t fsVersionThis field identifies the version of the LEAN file system, and it is provided for future development. The high byte identifies the major version number and the low byte the minor version number (for example 0x0110 would mean version 1.16). At present, in the alpha development stage, it must be set to 0x0004 (that is version 0.4) and drivers must not try to access an unknown file system version, backward compatibility making no sense.
uint8_t defaultInodeExtraThe default extra space to reserve for extended attributes between the inode structure and file data, in 16-byte units (see the inodeExtra field of the inode structure).
uint8_t logSectorsPerBandThe base-2 logarithm of the number of sectors per band. The latter is easily computed as 1 << logSectorsPerBand. It must be at least equal to 12 (4096 bits per band, that is 512 bytes of a sector), so that a bitmap portion is always made up of an integral amount of sectors. This means that the smallest band size is 4096 sectors, that is 2 MiB.
uint32_t stateThis field identifies the state of the file system.
Bit 0 (the LSB) is the clean unmount bit: it must be set to '0' while the volume is mounted, and it must set '1' while the volume is unmounted. Thus, if driver finds this bit set to '0' when mounting a volume, it means the volume was not unmounted properly.
Bit 1 is the error bit: it must be normally '0'. Drivers must set it to '1' in case of major errors, like when file system corruption is detected. File system checkup programs may use this bit to know if some major problem happened with the volume.
Bits 2 to 31 are reserved for future use.
uint8_t guid[16]A 128-bit globally unique identifier (GUID) that should uniquely identify the volume. A driver should use it to check for media change in a removable media drive. A driver may use it for automatic volume identification. Drivers must create a sequence of 16 bytes as unique as possible, for example using a good random number generator or one of the canonical algorithms for GUID generation.
char volumeLabel[64]A descriptive string, encoded in Unicode UTF-8, null-terminated, to help the user identify the volume. It may be empty.
uint64_t sectorsCountThe total number of sectors that form the volume.
uint64_t freeSectorsCountThe number of free (or available, or unallocated) sectors in the volume. A driver must keep it consistent to the rest of the file system to measure the free volume space to store files, directories and book keeping. A value of zero must mean disk full.
uint64_t primarySuperThe address of the sector containing the primary copy of the superblock. A driver must use this number to know where to write changes made to the superblock back to disk. This field is provided for robustness, for example against repair utilities operating on a disk containing a LEAN file system image file.
uint64_t backupSuperThe address of the sector containing the backup copy of the superblock. A driver must use this number to know where to write the backup copy when changes are made to the superblock.
In the event the primary copy of the superblock gets lost or damaged, recovery utilities should search for the backup copy of the superblock looking for valid magic and checksum fields, and may also use the backupSuper field itself for further validation.
uint64_t bitmapStartThe address of the sector where the bitmap of the first band (that is, band 0) starts. This is usually the sector right after the superblock. For any other band, the bitmap must start in the first sector of the band. The size of the bitmap in each band is related to the logSectorsPerBand field.
uint64_t rootInodeThe inode number (that is, the address of the first sector) of the root directory. This is usually the sector right after the bitmap for band 0.
uint64_t badInodeThe inode number (that is, the address of the first sector) of a pseudo-file to track bad sectors. This pseudo-file should be generated and managed by a file system checkup program, and should be made up of any sectors that the program detected as bad. This ensures that a driver can not allocate them. If the volume has no bad sectors, badInode must be zero.
uint8_t reserved2[360]Reserved for future use. These bytes pad the size of the superblock structure to 512 bytes and must be included in checksum calculation.

The bitmap

The bitmap is a non-contiguous data structure, made up of sectors that, as a whole, form an array of bits, each of them indicating the allocated/unallocated state of each sector of a volume. The bitmap is not contiguous because its sectors are subdivided into bands, instead of being stored in a single chunk in a fixed location.

The total number of sectors in a volume is specified by the sectorsCount field of the superblock. Thus, the bitmap as a whole must contain sectorsCount bits, one for each sector. The number of sectors required to store the bitmap (that is, the total bitmap size) is given by ceil(sectorsCount / (512 * 8)), where ceil means round up. Any bits after the first sectorsCount are unspecified. Each band must store a portion of the bitmap, according to the band size in sectors, that is the size of each bitmap portion must be sectorsPerBand / (512 * 8) sectors, where sectorsPerBand can be easily computed from the superblock. The size of each bitmap portion will result an integral number of sectors due to the constrains specified for logSectorsPerBand.

A bit of the bitmap must be set to '1' if its corresponding sector is used/allocated, whereas it must be set to '0' if its corresponding sector is free/unallocated. Sector 0 is identified by bit 0 (the LSB) of the first byte of the bitmap, sector 1 by bit 1 of the first byte, sector 7 by bit 7 (the MSB) of the first byte, sector 8 by bit 0 of the second byte, and so on.

The sectors occupied by the superblock, the superblock backup, the bitmap itself and any reserved sectors containing boot loader code are not free for data storage, thus must be marked as allocated in the bitmap.

The following code snippet shows how to locate a bit for the sector specified by sector.

/*
The following input values and precalculated values are assumed:
  uin64_t sector = the sector to locate the bitmap entry for
  Superblock sb = a structure containing the superblock
  uint64_t sectorsPerBand = 1 << sb.logSectorsPerBand
  const size_t LOG_SECTOR_SIZE = 9;
  const size_t SECTOR_SIZE = 512;
*/
uint64_t band = sector >> sb.logSectorsPerBand;
uint64_t sectorInBand = sector & (sectorsPerBand - 1);
uint64_t bitmapSector = (sectorInBand >> (3 + LOG_SECTOR_SIZE)) + (band << sb.logSectorsPerBand);
if (band == 0) bitmapSector += sb.bitmapStart;
unsigned bitOfSector = sectorInBand & (SECTOR_SIZE * 8 - 1);
File Allocation in LEAN

Figure 2: File Allocation in LEAN.

The data area and file allocation

The data area is made up of any sector not used by the bitmap, the superblock, the superblock backup and sectors reserved for boot code. It is in general not contiguous due to the band-based layout of the bitmap. It stores actual file system contents: files, directories and their related book keeping (that is the list of their allocated sectors).

Besides being unused, each sector in the data area can be either a data sector or an indirect sector. The former contains actual data, the latter contains book keeping. Data sectors are grouped together to form extents when they are contiguous. An extent is specified by a start and a size in sectors.

Each file is stored in one or more extents, thus a file allocation can be described as an array of extent specifications. The first sector of a file stores the inode structure of the file. Actual file data must begin right after the inode structure. This introduces an offset when converting between file data position and data sectors. The inode structure and indirect sectors are used to know what extents belong to a file. This is done in two ways: direct addressing and indirect addressing.

Direct addressing

To speed up file access on small files, a limited number of extent specifications is stored in the inode structure itself. A driver must use this information to directly access any of the first extents of the file in constant time. The constant EXTENTS_PER_INODE, currently defined to 6, specifies how many extent specifications are saved in the inode structure.

Indirect addressing

If a file needs more than EXTENTS_PER_INODE extents, indirect addressing is used for subsequent extents. An indirect sector is a sector storing an array of EXTENTS_PER_INDIRECT = 38 extent specifications instead of file data. Indirect sectors are linked together to form a doubly-linked list in order to support arbitrary file lengths. The following is the layout of an indirect sector:

struct Indirect
uint32_t checksumThe checksum value for the indirect sector. All bytes in the sector do count for checksum calculation.
uint32_t magicThis must be equal to 0x58444E49 (the 'INDX' characters in ASCII).
uint64_t sectorsCountThe total number of sectors addressable using this indirect sector. A driver should use this information to quickly traverse the list of indirect sectors to find the desired data sector.
uint64_t inodeThe inode number of the file this indirect sector belongs to. This is provided for robustness.
uint64_t thisSectorThe address of the sector storing this indirect sector. This is provided for robustness.
uint64_t prevIndirectThe address of the previous indirect sector. If there is no previous indirect sector, this field must be zero. A driver may use this value to traverse the list of indirect sectors in reverse order.
uint64_t nextIndirectThe address of the next indirect sector. If there is no next indirect sector, this field must be zero. This is the primary means for a driver to traverse the list of indirect sectors to seek into a file.
uint32_t extentsCountThe number of valid extent specifications stored in the indirect sector, that is the count of the first elements in the extentStarts and extentSizes arrays that refer to actual file extents. If the indirect sector is not the last indirect sector of a file, all extent specification must be used, that is extentsCount must be equal to EXTENTS_PER_INDIRECT.
uint32_t reservedPadding, reserved for future use.
uint64_t extentStarts[EXTENTS_PER_INDIRECT]The array of starting sectors of the extent specifications. EXTENTS_PER_INDIRECT must be equal to 38.
uint32_t extentSizes[EXTENTS_PER_INDIRECT]The array of extent sizes of the extent specifications. Each element represents the size in sectors of the corresponding extent whose first sector is specified in extentStarts.

Rationale

The extent-based approach allows efficient and simple access to the file system. It is efficient because, once the driver knows an extent specification, it can access any sector of that extent in constant time. It is also simple because, for each extent, only an offset must be added to its starting sector to find any other sector.

This works best if the file is not badly fragmented, that is unless the file is made up of several extents. Being the first EXTENTS_PER_INODE extent specifications saved in the inode structure, it is expected that most small files would need no indirect sectors, thus any sector can be accessed in constant time with no additional reads. Large files can benefit from this too, if they are mostly contiguous, although this is not as likely.

Indirect sectors store a much greater amount of extent specifications than the inode structure, thus it is expected that with a few indirect sectors a file is completely specified. For example using just 5 indirect sectors it is possible to specify a file made up of 196 fragments. They may be buffered for faster operation. The number of indirect sectors, if any are present, is determined only by the count of fragments, not by the file size (at least not directly).

This approach has been chosen in order to advantage random access on small files (e.g. configuration files), but sequential access for bigger files (e.g. multimedia files), as well as keeping the file system implementation as simple as possible.

Allocation strategies

In order to use the file system as efficiently as possible, drivers may implement allocation strategies when creating or extending files. This specification does not define or mandate such allocation strategies, but a few suggestions are presented.

When extending a file, a driver should try to grow its last extent. The last extent of a file can be easily found using the extentsCount and/or lastIndirect fields of the inode structure. The following algorithm may be used:

if the sector right after the end of the last extent is free
{
	grow the last extent increasing its size by 1;
}
else
{
	find a free sector as close as possible to the last extent;
	if a new indirect sector is needed
	{
		write the indirect sector in that free sector;
		find a free sector as close as possible to the indirect sector just written;
	}
	create a new extent starting at that free sector and with size 1;
}

To improve locality in space between directories and their files, and to help reduce fragmentation, a driver may use the following algorithm when creating a new file. It assumes that most files will have a single hard link, thus a single parent directory:

locate the band where the parent directory of the new file ends;
if the new file is not a directory
{
	find a free sector as close as possible to the end of the parent directory;
}
else
{
	generate a band number different from the one of the parent directory;
	find a free sector in that band;
}
create the new file in that sector and store there its inode structure;

To help implementing these algorithms, a driver may provide a parameter to the procedure searching for free sectors, specifying a suggested sector where to start to search from.

The inode structure

All metadata of a file are stored in a structure called inode structure. File metadata include file size, attributes and time stamps, as well as the means to access directly- and indirectly-addressable extents of a file.

The inode structure must be stored at offset zero of the first data sector of a file. The address of that first data sector must be used as inode number, that is a serial number to uniquely identify the file in a volume. It is not supposed to uniquely identify the file on the whole system: the operating system or a driver may use the inode number together with some way to identify the volume (for example, its GUID or an opaque identifier) to uniquely identify the file system-wise.

Putting the inode structure in a file data sector allows to avoid a separate inode table, allowing an arbitrary number of files per volume and simplifying inode management, at the price of some disk space (expecially for very small files that would otherwise fit in a single sector or zero-length files) and possible performance degradation when doing a detailed directory listing including file metadata.

The inode structure is 160 bytes long, and it has the following format:

struct Inode
FieldDescription
uint32_t checksumThe checksum value for the inode structure.
uint32_t magicThis must be equal to 0x45444F4E (the 'NODE' characters in ASCII).
uint8_t inodeExtraAn extra amount of space between the end of the inode structure and the begin of the file data, in 16-byte units. This value must be between 0 and 22, inclusive, that is 0 to 352 bytes. This extra space, if any, may be used to store some extended attributes inline, avoiding additional disk access. The format of extended attributes is explained in the Extended attributes section. Drivers must compute the offset to the first byte of the file data as sizeof(Inode) + inodeExtra * 16.
uint8_t extentsCountThe number of valid extent specifications stored in the inode structure, that is the count of the first elements in the extentStarts and extentSizes arrays that refer to actual file extents, from 1 (the first extent contains the inode structure itself) to EXTENTS_PER_INODE = 6. Directly-addressable extents must be used from the first to the last before using indirect sectors. If extentsCount < EXTENTS_PER_INODE, unused extent specification are undefined.
uint16_t reserved1Padding, reserved for future use.
uint32_t reserved2Padding, reserved for future use.
uint32_t linksCountThe number of hard links (the count of directory entries) referring to this file. It must be at least 1.
uint32_t uidThe numeric identifier of the user that owns the file. Interpretation of this field is operating system dependent. A monouser driver must use zero.
uint32_t gidThe numeric identifier of the group that owns of the file. Interpretation of this field is operating system dependent. A monouser driver must use zero.
uint32_t attributesThe attributes mask for the file, see enum InodeAttributes. It includes POSIX permissions, behavior flags and the file type specification. Interpretation of POSIX permissions is operating system dependent. A monouser driver must use the owner user permission bits only (IA_xUSR). Note that the "file type" values are mutually exclusive enumerated values, they are not single bits that can be combined. The IA_FMT mask can be used to retrieve the file type.
uint64_t fileSizeThe size in bytes of the file data, thus excluding the size of the inode structure, the optional extra space for extended attributes, and indirect sectors.
int64_t accTimeLast access time stamp. This field is computed as the signed number of microseconds since 1970-01-01 00:00:00.000000 UTC, not including leap seconds. If this value is unknown this field must be set to INT64_MIN, that is -(263). Drivers may choose to not update this field if performance is a costraint or the disk has limited write capabilities (e.g. flash memories). A mount-time flag or an open-time flag may be implemented to disable last access time stamps, and the implementation may want this to be the default.
int64_t creTimeCreation or status modification time stamp. This field is computed as the signed number of microseconds since 1970-01-01 00:00:00.000000 UTC, not including leap seconds. If this value is unknown this field must be set to INT64_MIN, that is -(263).
int64_t modTimeLast write/modification time stamp. This field is computed as the signed number of microseconds since 1970-01-01 00:00:00.000000 UTC, not including leap seconds. If this value is unknown this field must be set to INT64_MIN, that is -(263).
uint64_t firstIndirectThe address of the first indirect sector of the file. If the file has no indirect sectors the value of this field must be zero.
uint64_t lastIndirectThe address of the last indirect sector of the file. If the file has no indirect sectors the value of this field must be zero.
uint64_t forkIf this file is not a fork, this field is the address of the first sector of the optional fork associated with this file, or zero if this file has not a fork. If this file is a fork, this field is the address of the first sector of the file this is a fork of.
uint64_t extentStarts[EXTENTS_PER_INODE]The array of starting sectors of the extent specifications. EXTENTS_PER_INODE must be equal to 6. The count of valid extent specifications, starting from index 0, is determined by the extentsCount field. Any unused extent specification is unspecified.
uint32_t extentSizes[EXTENTS_PER_INODE]The array of extent sizes of the extent specifications. Each element represents the size in sectors of the corresponding extent whose first sector is specified in extentStarts.
enum InodeAttributes
Symbolic nameValueDescription
IA_RUSR1 << 8POSIX permissions: owner user can read
IA_WUSR1 << 7POSIX permissions: owner user can write
IA_XUSR1 << 6POSIX permissions: owner user execute/search in directory
IA_RGRP1 << 5POSIX permissions: owner group can read
IA_WGRP1 << 4POSIX permissions: owner group can write
IA_XGRP1 << 3POSIX permissions: owner group execute/search in directory
IA_ROTH1 << 2POSIX permissions: others can read
IA_WOTH1 << 1POSIX permissions: others can write
IA_XOTH1 << 0POSIX permissions: others execute/search in directory
IA_SUID1 << 11POSIX permissions: execute as user ID
IA_SGID1 << 10POSIX permissions: execute as group ID
IA_SVTX1 << 9POSIX permissions: restrict rename/delete to owner
IA_HIDDEN1 << 12Do not show in default directory listings
IA_SYSTEM1 << 13Warn that this is a system file
IA_ARCHIVE1 << 14File changed since last backup (must be set on any write)
IA_SYNC1 << 15Writes must be committed immediatly (besides open- or mount-time flags)
IA_NOATIME1 << 16Do not update last access time (besides open- or mount-time flags)
IA_IMMUTABLE1 << 17Do not move file sectors (e.g. for defraggers)
IA_FMT7 << 29Bit mask to extract the file type
IA_FREG1 << 29File type: regular file (see enum FileType)
IA_FDIR2 << 29File type: directory (see enum FileType)
IA_FLNK3 << 29File type: symbolic link (see enum FileType)
IA_FFORK4 << 29File type: fork (see Extended attributes)
Directory format in LEAN

Figure 3: Directory format in LEAN.

Directories

A directory is a file containing an unordered list of directory entries. Each directory entry is a structure which binds a file name with an inode structure, thus each directory entry is a hard link to a file. Directory entries do not store file metadata, except what specified below.

A directory entry is a variable length structure, containing a fixed length header (16 bytes long, including the first four bytes of the name) and a variable length part containing the file name. The size of the whole structure must be a multiple of 16 bytes, that is the size of the fixed length header. This helps to reuse space from deleted directory entries. Directory entries must be aligned to 16-byte boundaries (this is the primary reason for the constrains on the size of the inode structure and the extra space for inline extended attributes). A directory entry must be completely contained in a single sector, so that drivers may use the sector buffer directly to access the name, instead of using a separate buffer to fetch it. This limits file name lengths to 500 bytes.

File name storage must be case sensitive in order to simplify name comparison, avoiding case folding to upper or lower case, that is not trivial in Unicode. Drivers may implement case insensitive file name matching in an unspecified way. Every file name in a directory must be unique. In environments lacking support for long file names, or requiring short file name aliases, a driver may fabricate short file names on the fly, that is, without having them physically stored in the directory. The algorithm used to fabricate short file names is unspecified, but a proposed procedure is shown below.

This is the format of a directory entry:

struct DirEntry
FieldDescription
uint64_t inodeThe inode number of the file linked by this directory entry, that is the address of the first sector of the file. Drivers must use it to locate the inode structure to do any subsequent processing on the file.
uint8_t typeThis is a shortcut to the file type, see enum FileType. A driver must keep it consistent with the file type bits in the attributes of the inode structure.
uint8_t recLenThis is the total size (thus including the fixed length header) of the directory entry, in 16-byte units. It must be at least 1. This field must be valid even for deleted or empty directory entries, as it must be used as a link to the next directory entry.
uint16_t nameLenThe length in bytes of the file name stored in the name field. It must be greater than zero.
char name[4]The file name. This string must not be null-terminated and must be encoded in Unicode UTF-8. Its length must be be determined solely by the nameLen field, and padding bytes optionally present after the nameLen bytes are undefined. All Unicode code points are allowed.

The following table shows the possible values for the type field of a directory entry. A value of zero must indicate that the directory entry is empty (available, or deleted). A file may be undeleted by setting type back to the corresponding value in the inode structure, until its sectors are physically overwritten (all magic and checksum fields should be checked for robustness in this case). Note that these values are mutually exclusive enumerated values, they are not single bits that can be combined, and that -by definition- there is no type for forks.

enum FileType
Symbolic nameValueDescription
FT_NONE0No file type, the directory entry is empty
FT_REGIA_FREG >> 29 = 1Regular file
FT_DIRIA_FDIR >> 29 = 2Directory
FT_LNKIA_FLNK >> 29 = 3Symbolic link

"." and ".." entries

The two special entries "." (dot) and ".." (dotdot) must be the first entries of every directory. The former must be a hard link to the directory itself, the latter must a hard link to the parent directory. In the root directory, which has no parent, ".." must be a hard link to itself, too.

A driver may use fictitious "." and ".." entries ignoring the actual entries in the directory, for example to assign ".." to the parent directory of a mount point. If showing "." and ".." to the user is a concern, a driver may choose to make them hidden entries, for example those of the root directory.

Looking up a directory

This algorithm must be used to search an entry in a directory, using appropriate file name matching criteria:

while not EOF
{
	read an 16-byte directory header;
	if (DirEntry.type != FT_NONE)
	{
		read the rest of the file name, that is guaranteed to fit in the same sector;
		compare against the required file name;
		if the required entry is found return this directory entry;
	}
	advance using the information provided in recLen;
}
return "file not found";

Deleting an entry

To delete a hard link to a file, a driver must set the type field of the directory entry to zero (that is FT_NONE) and decrease the linksCount field of its inode structure. The space occupied by the deleted directory entry (recLen*16 bytes) is made available for reuse (see the 4th entry in figure 3). If some degree of security is needed, a driver may zero all bytes of a directory entry, except recLen.

If the linksCount of the inode structure of the file reaches zero, the sectors of the file (both data and indirect) must be deleted. To do this, drivers must set the corresponding bits in the bitmap to '0'. This is efficient and allows to recover data, until physically overwritten. If some degree of security is needed, a driver may also zero or fill with garbage the actual content of the deleted sectors.

Creating an entry

This procedure must be used to create a new hard link to a file.

Scan the directory for one or more contiguous available directory entries, enough to store the new directory entry. This is because available entries are not coalesced together. The required size of the directory entry, in 16-byte units, must be computed as recLen = (16 + nameLen + 15) >> 4. Available directory entries must be identified for having type == FT_NONE (see above), and must be found within a single sector.

If enough space to store the new directory entry is found, the new directory entry should be stored in that place, otherwise the new directory entry must be appended to the directory, thus extending it. The new directory entry must not span across different sectors. If the new directory entry is shorter than the free space found, a driver must fabricate and append a new directory entry header (with type equal to FT_NONE) to keep the link with the next directory entry consistent (see the 7th entry in figure 3).

Fabricating a short file name

In environments lacking support for long file names, or requiring short file name aliases, a driver may fabricate short file names. Fabricated short file names are not supposed to be stored on the file system as new hard links, but rather generated on the fly to the user when they are needed. How to fabricate short file names in such cases is unspecified. The following is an algorithm that drivers may use if appropriate:

A symbolic link (symlink, soft link) is a special file storing the pathname of another file. Its file type must be FT_LNK.

The string representing the pathname must be encoded in Unicode UTF-8 and must not be null-terminated. The length of this string must be equal to the file size of the symbolic link.

The represented pathname may be either absolute or relative. A driver or other appropriate parts of an implementation should provide the means to follow the symbolic link. In a system where pathnames may include a drive or device specification, the pathname stored by the symbolic link may contain a drive or device specification too. The way the implementation behaves in the event the drive or device referred by the symbolic link change (for example a removable or network drive, or when the volume is used in a system not supporting drive or device specifications) is implementation specific.

The meaning of attributes for symbolic links is unspecified. A driver may apply their regular meaning to the symbolic link itself (for example, a hidden symbolic link may be not showed in a directory listing, whether the target is hidden or not, or the permissions of a symbolic link may refer to what can be done with the target when accessed via the symbolic link), or may use any conventions normally used by the system for symbolic links.

Extented attributes

Extended attributes are optional metadata, not directly interpreted by the file system, that can be attached to a file system object. They are basically name-value pairs, each of it called an extended attribute record.

The LEAN file system allows to store extended attributes inline, right after the inode structure of each file, before the file data, or in a separate fork of the file. In LEAN a fork is just a file containing extended attribute records, associated with another file system object, and with no directory entries. Inline extended attributes allow to store only a limited amount of metadata, but offer better performance. A fork allows to store much more metadata (theoretically as big as a regular file), but requires additional disk access.

Any kind of name-value pairs can be stored as extended attributes. The freedesktop.org project provides recommendations for a portable set of extended attributes.

Whether an extended attribute record is stored in the extra space after the inode structure or in a fork, it is a variable-length structure with a 32-bit header, with size constrained to multiple of 4 bytes, and the following format:

struct ExtendedAttribute
FieldDescription
uint32_t headerThe fixed header of the record. It must be interpreted as 0xNNVVVVVV, where the highest 8 bits (31..24), NN, represent the lenght in bytes of the attribute name, and the lowest 24 bits (23..0), VVVVVV, represent the size in bytes of the attribute value. If NN is 0, this is an empty record used for padding (perhaps a deleted record), and VVVVVV indicates the size of the whole record, excluding the 32-bit header.
uint8_t name[NN]The attribute name. It must uniquely identify what this attribute means. It is not required to be a human readable text, but if it is, it should be a UTF-8 string formatted as a dot-separated name-spaced name. It may also be some unique binary identifier such as a GUID. If NN is not a multiple of 4, undefined padding bytes must be added after the name to a reach a multiple of 4 bytes.
uint8_t value[VVVVVV]The attribute value. It may have any format that makes sense to the application that manages the attribute. If it is a text, it should be encoded in UTF-8. If VVVVVV is not a multiple of 4, undefined padding bytes must be added after the value to a reach a multiple of 4 bytes.

If the inodeExtra field of the inode structure is greater than zero, the extra space must be completely filled with contiguous extended attribute records, perhaps padding ones. This means that, at the very least, a single record whose header has NN = 0 and VVVVVV = inodeExtra * 16 must be present, meaning that the whole extra space is unused. Drivers should prefer placing in the extra space extended attributes that are prioritary, frequently used and small.

If the fork field of the inode structure is nonzero, that inode has an associated fork whose first sector is specified by fork. That fork is a file system object, thus it must have its own inode structure. That inode structure must have inodeExtra = 0, file type IA_FFORK and the fork field must be set to the inode number of the file system object this fork is associated with, in order to create a backlink. The fileSize field of the inode structure of the fork must indicate how many bytes of extended attribute records are present in the fork. The fork must be made of a contiguous sequence of extended attribute records, perhaps padding ones.

Drivers are not required to understand extended attributes, and must not rely on them being present. Fundamental metadata must be stored in the appropriate fields of the inode structure. A driver not supporting extended attributes must leave them untouched, but must be able to delete them when the inode they are associated with is deleted. This means, at least, marking any fork sectors (data and indirect) as free in the bitmap. Drivers should not make any modification to the content of extended attributes they do not know about.

If the size of all extended attributes is reasonably small, drivers may opt to cache all of them in memory, and write them back to disk as a whole. Drivers may rearrange the sequence of extended attribute records, provided that the constrains above are enforced. If using a fork, drivers may want to store all extended attributes in a single extent whenever possible. Drivers should try to allocate space for the fork as close as possible to the inode structure of the file system object the fork is associated with.

When enumerating extended attributes, drivers should start from those in the extra space after the inode structure, then continue with those in the fork, if any. When saving extended attributes, drivers should provide to applications a way (such as a "priority" flag) to specify whether the attribute should be saved inline, in the extra space after the inode structure, if possible, or in a fork. Drivers may implement policies to restrict editing of certain extended attributes to system services only (for example, attribute names starting with "system."). An application of this could be using one extended attribute record to implement Access Control Lists: in this case, drivers may want to place that record in the extra space after the inode structure and forbid their editing to unprivileged users.