eix - index file format

Authors: Martin Väth <vaeth AT mathematik DOT uni-wuerzburg DOT de> (active)
Emil Beinroth <emilbeinroth AT gmx DOT net> (active)
Wolfgang Frisch <xororand AT users DOT sourceforge DOT net> (inactive)
Copyright: This file is part of the eix project and distributed under the terms of the GNU General Public License v2.

This article describes the format of the eix index file, usually located at /var/cache/eix. The format includes a version field in the header block. The current version is 28 (eix 0.17.0).

Table of Contents:

Overall layout

The file is made up of blocks of data, which may in turn contain other other blocks. [1] The first block is a special header. The remaining blocks are the categories which in turn contain the package blocks which contain the version blocks, ...

1st Category

1st Package in 1st Category

1st Version of this Package
[..]

2nd Package in 1st Category

[..]

2nd Category

[..]

[1]Most blocks here occur as a vector (described below), i.e. the first entry is the number of elements, followed by the individual elements. However, if not stated explicitly that a block is a vector, it is indicated otherwise in the file how many elements it has. For example, the number of category blocks is contained in the header.

Basic Datatypes

This section covers the datatypes number and vector (resp. string) which are used in the index file.

Number

The index file contains non-negative integer values only. The format we use avoids fixed length integers by encoding the number of bytes into the integer itself. It has a bias towards numbers smaller than 0xFF, which are encoded into a single byte.

To determine the number of bytes used, you must first count how often the byte 0xFF occurs at the beginning of the number. Let n be this count (n may be 0). Then, as a rule, there will follow n+1 bytes that contain the actual integer stored in big-endian byte order (highest byte first).

But since it would be impossible to store any number that has a leading 0xFF with this format, a leading 0xFF is stored as 0x00. Meaning, if a 0x00 byte follows the last 0xFF, you must interpret this byte as 0xFF inside the number.

Examples:

Number Bytes stored in the file
0x00 0x00
0xFE 0xFE
0xFF 0xFF 0x00
0x0100 0xFF 0x01 0x00
0x01FF 0xFF 0x01 0xFF
0xFEFF 0xFF 0xFE 0xFF
0xFF00 0xFF 0xFF 0x00 0x00
0xFF01 0xFF 0xFF 0x00 0x01
0x010000 0xFF 0xFF 0x01 0x00 0x00
0xABCDEF 0xFF 0xFF 0xAB 0xCD 0xEF
0xFFABCD 0xFF 0xFF 0xFF 0x00 0xAB 0xCD
0x01ABCDEF 0xFF 0xFF 0xFF 0x01 0xAB 0xCD 0xEF

Vector

Vectors (or lists) are extensively used throughout the index file. They are stored as the number of elements, followed by the elements themselves.

Type Content
Number Number of elements (n)
1st element
...
nth element

String

Strings are stored as a vector of characters. A trailing '\0' is not included.

Hash

A hash is a vector of strings.

HashedString

A number which is considered as an index in the corresponding hash; 0 denotes the first string of the hash, 1 the second, ...

HashedWords

A vector of HashedStrings. The resulting strings are meant to be concatenated, with spaces as separators.

Data blocks

Overlay

Type Content
String overlay path
String label (repository name)

Category

Type Content
String Name of category
Vector Packages in this category

Package

Type Content
Number Offset to the next package in the eix cache file (in bytes; counting starts after the number)
String Package name
String Description
HashedWords Provide
String Homepage
HashedString Licenses, e.g. MPL-1.1 NPL-1.1
HashedWords Useflags (all useflags of all versions of the package are added). This might "falsely" be the empty string if per-version IUSE flags are stored.
Vector Versions

Version

Type Content
char

Mask bitset for the current $ARCH:

0x00:none of the following
0x01:masked by package.mask
0x02:masked by profile
0x04:version is in @system
0x08:version is in @world (if SAVE_WORLD=true)
char
Mask bitset for the PROPERTIES variable:
0x01:PROPERTIES=interactive
0x02:PROPERTIES=live
0x04:PROPERTIES=virtual
Number
Mask bitset for the RESTRICT variable:
0x0001:RESTRICT=binchecks
0x0002:RESTRICT=strip
0x0004:RESTRICT=test
0x0008:RESTRICT=userpriv
0x0010:RESTRICT=installsources
0x0020:RESTRICT=fetch
0x0040:RESTRICT=mirror
0x0080:RESTRICT=primaryuri
0x0100:RESTRICT=bindist
HashedWords Full keywords string of the ebuild.
Vector VersionParts
HashedString Slot name. The slot name "0" is stored as "".
Number Index of the portage overlay (in the overlays block)
HashedWords Useflags of this version. This might "falsely" be empty if only per-package IUSE flags are stored.

VersionPart

A VersionPart consists of two data: a number (referred to as type) and a "string" (referred to as value).

The number is encoded in the lower 5 bits of the length-part of the "string"; of course, the actual length is shifted by the same number of bits.

For the type, these named values are possible:

Value Name Content of value
0 garbage garbage that was found after any valid version string
1 alpha number of "_alpha" (may be empty)
2 beta number of "_beta" (may be empty)
3 pre number of "_pre" (may be empty)
4 rc number of "_rc" (may be empty)
5 revision number of "-r" (may be empty)
6 inter_rev number of inter-revision [2]
7 patch number of "_p" (may be empty)
8 char single character
9 primary numeric version part
10 first first numeric version part

As an example, we split the version string 1.2c_pre12_alpha-r01.01-foo like this:

(first, "1") (primary, "2") (char, "c") (pre, "12") (alpha, "") (revision, "01") (inter_rev, "01") (garbage, "-foo")

To reconstruct a version string, iterate through the vector of parts, and for each part append a specific prefix and the value stored in the string. The prefixes you need should be obvious, but here they are listed anyway:

Prefix Name
"." (dot) primary, inter_rev
"" (empty) first, char, garbage
"_alpha" alpha
"_beta" beta
"_pre" pre
"_rc" rc
"-r" revision
"_p" patch
[2]inter-revision ares used by gentoo-alt to keep their prefixed portage tree in sync with the main tree http://www.gentoo.org/proj/en/gentoo-alt/prefix/techdocs.xml#doc_chap2_sect5

Historical notes