F5 File Format Specification

F5 Performance TableOfContents

0. Introduction and Scope

This document specifies the TableOfContents (ToC) performance extension to the F5 data model. It is independent of and orthogonal to the type extension series (F5_Extension_1_NamedTypes.md, F5_Extension_2_CombMaps.md). It adds no semantic content visible to an end user: a file with a TableOfContents is semantically equivalent to one without. What it provides is O(1) lookup for information that would otherwise require O(timeslices × grids) traversal.

The ToC is semantically optional but practically essential for large files.

The fundamental asymmetry: information in the ToC is trivial to write — the writer already knows what it is writing — but expensive to reconstruct by reading — it requires traversing the entire slice hierarchy.

This document also specifies:

  • Parameter attributes (§2): how parameter values are typed and annotated beyond a plain double, with time as the primary example
  • Units (§2.4): the open problem of unit specification, and the F5 approach
  • Parameter space (§5): generalisation of the time axis to n dimensions

Relationship to the core spec: All rules from F5_Layout.md apply. The ToC is supplementary. Readers MUST NOT require its presence.


1. TableOfContents Group Structure

/TableOfContents/
    TypeInfo               <- default named type for field grouping (§3)
    <AdditionalTypeInfo>   <- optional additional named types (§3.3)
    Grids/
        <GridName>/        <- one subgroup per Grid (§4)
            F5::TimeTable  <- extendable dataset of parameter-value/path pairs
            <SlicePath>    <- soft link per slice
    Fields/
        <FieldName>/       <- one subgroup per Field name (§4.3)
            <GridName>     <- soft link -> /TableOfContents/Grids/<GridName>
    Parameters/
        <ParamName>/       <- one subgroup per parameter dimension (§5)

2. Parameter Attributes: Typed Values Beyond a Plain Double

2.1 The core problem

Every slice group carries one attribute per parameter dimension (§5). The most common case is a single Time attribute. The most naive implementation stores this as a plain double. This is insufficient.

A plain double conflates fundamentally different quantities:

  • Calendar time: a specific moment on the human calendar (satellite observation at 14:32 UTC, August 29, 2005)
  • Simulation coordinate time: a unitless or physics-unit parameter with no calendar meaning (black hole merger at t = 3533 M, where M is the ADM mass)
  • Scaled time: time measured in units that scale with a physical parameter (in numerical relativity, the time unit is often the total mass of the system, making it simultaneously “time” and “mass” in geometrized units)

When multiple datasets from different sources are combined — as in the Hurricane Katrina visualization merging satellite, atmospheric, and storm-surge data — treating all parameter values as plain doubles leads to silent errors. The Mars Climate Orbiter failure (1999), caused by mixing imperial and metric units, illustrates the class of failure that typed parameter values prevent.

2.2 The F5 approach: named HDF5 types for parameter attributes

The F5 solution is to store parameter attributes using a named HDF5 type rather than a plain double. The named type is numerically identical to H5T_NATIVE_DOUBLE but carries semantic information as attributes on the type object.

For the Time parameter:

/TableOfContents/Parameters/Time/
    F5::Time           <- named datatype (= H5T_NATIVE_DOUBLE numerically)
                         Attribute: TimeUnits  <F5_TimeUnits value or string>
                         Attribute: offset     <double; reference epoch if applicable>
                         Attribute: comment    <UTF-8 string; human-readable>

The Time attribute on each slice group uses this named type:

/t=3533.4/
    Attribute: Time    3533.4    (type: F5::Time, TimeUnits=F5_TIME_UNITLESS)

Using the named type has two consequences:

  1. HDF5’s automatic type conversion is suppressed: a reader cannot silently read a calendar-time value as a unitless coordinate without explicitly choosing to do so
  2. All slice groups share one type definition, so global semantic metadata (units, reference epoch) is updated once on the type object and is immediately visible to all slices

For non-time parameter dimensions the same pattern applies: a named type derived from H5T_NATIVE_DOUBLE with appropriate unit information as attributes. See §2.4 on the open problem of units for non-time parameters.

2.3 F5_TimeUnits: a suggestive starting point

The paper (Buleu & Benger, NCUR 2007) introduced an enum for time units:

typedef enum {
    F5_TIME_UNSPECIFIED   = 0,  /* unit unknown                              */
    F5_TIME_UNITLESS      = 1,  /* dimensionless coordinate                  */
    F5_TIME_NANOSECONDS   = 2,
    F5_TIME_MICROSECONDS  = 3,
    F5_TIME_MILLISECONDS  = 4,
    F5_TIME_SECONDS       = 5,
    F5_TIME_MINUTES       = 6,
    F5_TIME_HOURS         = 7,
    F5_TIME_DAYS          = 8,
    F5_TIME_YEARS         = 9,
    F5_TIME_MEGAYEARS     = 10,
    F5_TIME_ELECTRONVOLTS = 11, /* quantum mechanics (hbar/eV)               */
    F5_TIME_METRES        = 12  /* c=1 units (light-travel time)             */
} F5_TimeUnits;

This enum is suggestive, not normative, and explicitly open-ended. It covers the most common cases encountered in the original implementation but cannot cover all possible time units. A closed enumeration for units is impossible in principle:

  • SI prefixes alone yield very many combinations — each base unit multiplied by the full set of SI prefixes — and enumerating them all is impractical, though some approaches do attempt it. A more compact approach stores the prefix as a numeric scale factor, reconstructable to any precision a floating-point number can represent
  • Physical scales create non-standard units: in black hole merger simulations, the natural time unit is the total ADM mass M of the system — time is simultaneously a mass in geometrized units (G=c=1)
  • There is a meaningful distinction between “unitless” (time as a pure coordinate with no physical scale) and “scalable” (time measured in units that are themselves physical quantities varying per simulation)

The enum should be understood as a registry of frequently used values, analogous to EPSG codes for geoscientific coordinate reference systems or HDF5 filter registration numbers for compression algorithms: a curated common subset, extensible by convention, never exhaustive. F5_TIME_UNSPECIFIED is always available as the fallback; any value is more informative than a bare double with no annotation.
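The scale-factor idea mentioned above can be sketched for the enum values that have a fixed SI conversion. This is a minimal illustration, not part of the F5 library: the function name f5_time_units_seconds_scale is hypothetical, and the choice to report no scale for years, megayears, electronvolts, and metres is an assumption made here because those conversions depend on a convention or on physical constants.

```c
/* Enum values copied from the suggestive F5_TimeUnits list in this section. */
typedef enum {
    F5_TIME_UNSPECIFIED   = 0,
    F5_TIME_UNITLESS      = 1,
    F5_TIME_NANOSECONDS   = 2,
    F5_TIME_MICROSECONDS  = 3,
    F5_TIME_MILLISECONDS  = 4,
    F5_TIME_SECONDS       = 5,
    F5_TIME_MINUTES       = 6,
    F5_TIME_HOURS         = 7,
    F5_TIME_DAYS          = 8,
    F5_TIME_YEARS         = 9,
    F5_TIME_MEGAYEARS     = 10,
    F5_TIME_ELECTRONVOLTS = 11,
    F5_TIME_METRES        = 12
} F5_TimeUnits;

/* Hypothetical helper: returns 1 and stores the number of SI seconds per
 * unit in *scale when the unit has a fixed SI scale; returns 0 otherwise
 * and leaves *scale untouched. */
int f5_time_units_seconds_scale(F5_TimeUnits units, double *scale)
{
    switch (units) {
    case F5_TIME_NANOSECONDS:  *scale = 1e-9;    return 1;
    case F5_TIME_MICROSECONDS: *scale = 1e-6;    return 1;
    case F5_TIME_MILLISECONDS: *scale = 1e-3;    return 1;
    case F5_TIME_SECONDS:      *scale = 1.0;     return 1;
    case F5_TIME_MINUTES:      *scale = 60.0;    return 1;
    case F5_TIME_HOURS:        *scale = 3600.0;  return 1;
    case F5_TIME_DAYS:         *scale = 86400.0; return 1; /* ignores leap seconds */
    default:                   return 0; /* unitless, unspecified, or convention-dependent */
    }
}
```

A reader that receives 0 from such a helper would keep the raw value and surface the unit annotation to the user rather than guess a conversion.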

The per-field versioning mechanism (§3) ensures that future changes to the unit representation — including adoption of emerging standards — can be introduced without invalidating existing files.

2.4 Units: an open problem

The general problem of unit specification for scientific data remains unsolved in a way that satisfies all communities. Two reference points:

CGNS units (CFD General Notation System): a comprehensive engineering approach covering mass, length, time, temperature, angle, and their combinations. Well-adopted in aerospace and CFD. The approach is enum-based — a closed list of named units — consistent with engineering practice but contrary to the F5 philosophy of favouring structure over enumeration. CGNS is a useful reference for coverage and naming conventions.

C++ mp-units (P1935, targeting C++29): a proposed standard library for physical quantities and units. Type-safe, dimension-aware, SI-complete. If adopted into the C++ standard, it will represent the most rigorous software-level unit treatment available. F5 SHOULD be designed to be compatible with mp-units encoding when it stabilises, since the per-field versioning mechanism (§3) allows the unit representation to evolve without breaking existing files.

Current F5 approach: The F5_TimeUnits enum and a UTF-8 comment attribute on the named type are the current mechanism. This is explicitly a starting point. The design principle is: anything is better than a bare double. A string annotation that says “solar masses” is more informative than silence, even if it is not machine-enforceable. A named type that carries any unit information is safer than a plain double. Subsequent extensions will tighten this as unit standards mature.

String attributes for units on Parameters/ subgroups are RECOMMENDED as advisory annotations:

/TableOfContents/Parameters/Time/
    Attribute: Units   "M_sun"   (advisory, non-normative)

The F5::Time named type approach (§2.2) and the string attribute approach are complementary, not contradictory: the named type provides the machine-readable mechanism; the string attribute provides the human-readable documentation. Both SHOULD be present when the unit is known.

2.5 Calendar vs. non-calendar time

For calendar time, the offset attribute stores a reference epoch. The absolute time of a slice is offset + TimeValue. For unitless simulation time, the offset attribute is absent or zero and is semantically meaningless.
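The offset rule can be expressed as a small guard that refuses to produce an absolute time for unitless data. This is a sketch under assumptions: the struct F5TimeMeta and the function f5_absolute_time are hypothetical names, and the offset is assumed to be expressed in the same units as the Time values, which this document does not state explicitly.

```c
/* Hypothetical in-memory view of the attributes on the F5::Time type. */
typedef struct {
    int    has_offset; /* nonzero if the offset attribute is present      */
    int    unitless;   /* nonzero if TimeUnits is F5_TIME_UNITLESS        */
    double offset;     /* reference epoch (assumed: same units as Time)   */
} F5TimeMeta;

/* Returns 1 and stores offset + time_value in *absolute for calendar time.
 * Returns 0 when no meaningful epoch exists (offset absent or unitless). */
int f5_absolute_time(const F5TimeMeta *meta, double time_value, double *absolute)
{
    if (!meta->has_offset || meta->unitless)
        return 0;
    *absolute = meta->offset + time_value;
    return 1;
}
```

Returning a status instead of silently adding zero keeps the calendar and non-calendar cases distinguishable at the call site.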

The deeper taxonomy of time — timescales (TAI, UTC, TDT, TCB, UT1), leap seconds, calendars (Gregorian, Julian, etc.), time zones — is deferred to a future time semantics extension. The Buleu & Benger (2007) paper provides a roadmap. The current F5::Time implementation is a necessary first step, not a complete solution.


3. TypeInfo: Field Grouping and Versioning via Named Types

3.1 Purpose

The TypeInfo named type in the TableOfContents is committed once and shared via HDF5 named type links by all Fields in the file that are associated with it. Its values enumerate the field storage types from core spec §8:

typedef enum {
    F5_UNKNOWN_ARRAY_TYPE             = 0,
    F5_CONTIGUOUS                     = 1,
    F5_SEPARATED_COMPOUND             = 2,
    F5_CONSTANT                       = 3,
    F5_FRAGMENTED_CONTIGUOUS          = 4,
    F5_FRAGMENTED_SEPARATED_COMPOUND  = 5,
    F5_DIRECT_PRODUCT                 = 6,
    F5_INDEX_PERMUTATION              = 7,
    F5_UNIFORM_SAMPLING               = 8,
    F5_FRAGMENTED_UNIFORM_SAMPLING    = 9
} F5_TypeInfo;

3.2 Field-level grouping, not file-level

The TypeInfo named type is a field-level grouping mechanism. Fields that reference the same named TypeInfo object belong to the same group — sharing whatever attributes are attached to that type object.

The most important use of grouping is field-level versioning: within one file, different Fields may have been written by different versions of the F5 library. The layout convention for connectivity types may differ between library versions. F5 avoids implicit naming conventions by design, so those are not a versioning concern.

HDF5-level compression filters (deflate, szip, etc.) are transparent to readers — the HDF5 library handles decompression automatically without reader knowledge. These do not affect TypeInfo grouping.

An F5-level precision transformation is distinct and NOT transparent: a field of double values (coordinates in particular) may be stored in single precision after subtracting a numerical offset. This halves the storage size while retaining most of the precision, because the offset removes the large-magnitude part of each value before truncation to float. It also benefits GPU rendering, where double precision is slow and VRAM is precious. However, shader code must apply the offset explicitly; this transformation cannot be hidden from the reader.

The detection mechanism is a normative attribute on the field or fragment:

#define FIBER_FRAGMENT_NUMERICALSHIFT_ATTRIBUTE  "Fiber::NumericalShift"

A reader checks for the presence of Fiber::NumericalShift on each field or fragment and applies the offset before use if found. No TypeInfo version check is needed for the current mechanism.
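The effect of the shift can be illustrated with a pair of encode/decode helpers. This is a minimal sketch, assuming a single scalar shift per field or fragment; the function names are hypothetical and the actual layout of the Fiber::NumericalShift attribute (for example, one shift per vector component) is not specified here.

```c
#include <stddef.h>

/* Writer side: subtract the shift in double precision, then narrow to float. */
void f5_shift_encode(const double *values, float *stored, size_t n, double shift)
{
    for (size_t i = 0; i < n; ++i)
        stored[i] = (float)(values[i] - shift);
}

/* Reader side: widen to double, then add the shift back before use. */
void f5_shift_decode(const float *stored, double *values, size_t n, double shift)
{
    for (size_t i = 0; i < n; ++i)
        values[i] = (double)stored[i] + shift;
}
```

For a coordinate such as 1.0e9 + 0.125 with a shift of 1.0e9, the stored float holds 0.125 exactly and the round trip is lossless, whereas a direct cast of the unshifted value to float loses the fractional part.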

However, if a future version supersedes this mechanism in a way that conflicts with the current one, TypeInfo versioning becomes the discriminator: a field written under specification version 0.1.5 uses Fiber::NumericalShift as defined here, while a field written under a future version may use a different attribute or encoding. The TypeInfo version attribute (§3.4) allows a reader to select the correct interpretation per field rather than per file.

Versioning is not the only reason to group Fields. Multiple TypeInfo named types with the same version but different user-defined attributes allow grouping by any property: provenance (which data source or simulation code produced a Field), quality level (raw vs. post-processed vs. derived), or any application-defined category. The grouping mechanism is general; versioning is one application.

3.3 Multiple TypeInfo types per file

The /TableOfContents/ group MAY contain multiple named types with TypeInfo semantics. The type named TypeInfo is the default — used by all Fields not requiring special grouping. Additional named types may be committed for Fields sharing a distinct set of attributes:

/TableOfContents/
    TypeInfo          <- default; version 2.0.0; standard layout
    TypeInfo_v1       <- legacy layout from older writer version
    TypeInfo_ADCIRC   <- provenance: Fields from the ADCIRC surge model
    TypeInfo_MM5      <- provenance: Fields from the MM5 atmospheric model

All Fields referencing TypeInfo_ADCIRC share whatever attributes are on that type object. A single O(1) write to that type object updates global metadata for all such Fields simultaneously — regardless of how many datasets reference it. This underutilised HDF5 capability is particularly powerful for provenance management in large multi-source files.

Provenance information is especially relevant for file merging and splitting operations (referenced in core spec §4.3). When merging files from different sources, an advanced merge tool SHOULD assign source-specific TypeInfo types to the incoming Fields rather than assigning all Fields the default TypeInfo. This preserves provenance across the merge and allows the merged file to record which Fields originated from which source. The converse applies to file splitting: fields sharing a TypeInfo with specific provenance attributes can be extracted as a coherent group. This is OPTIONAL for simple merge/split tools but RECOMMENDED when provenance matters.

3.4 Versioning attributes

Every TypeInfo named type SHOULD carry:

Attribute: URL      "https://www.fiberbundle.net/F5-0.1.5/"  (ASCII string)
Attribute: version  {0, 1, 5}                                 (int[3])

The URL points to the specification version subfolder at fiberbundle.net. Each released specification version has its own subfolder, so a reader encountering a file can locate the exact specification document that governed its writing. Additional TypeInfo types SHOULD carry the same attributes with version values appropriate to their layout context.
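A reader selecting an interpretation per field needs to order two version triples. The sketch below assumes the three integers are compared lexicographically (major, minor, patch); that ordering and the name f5_version_compare are assumptions for illustration, not taken from the reference implementation.

```c
/* Compares two int[3] version attributes lexicographically.
 * Returns -1 if a < b, 0 if equal, 1 if a > b. */
int f5_version_compare(const int a[3], const int b[3])
{
    for (int i = 0; i < 3; ++i) {
        if (a[i] < b[i]) return -1;
        if (a[i] > b[i]) return 1;
    }
    return 0;
}
```

A reader could, for instance, apply the Fiber::NumericalShift rule of §3.2 only to fields whose TypeInfo version compares at or below the last version known to use it.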


4. Grid and Field Lookup

4.1 TimeTable datasets — naming and structure

Each Grid’s ToC entry contains one or more extendable HDF5 datasets, each storing one record per slice for one parameter space. The name of the dataset encodes the parameter space, placing information into structure rather than into a separate metadata attribute.

TimeTable naming likewise avoids reserved names where possible. Three layouts are available and all are valid: canonical, nested, and legacy.

Canonical (parameter-name = dataset-name): For a single time parameter, the TimeTable is a 1D extensible dataset named after the parameter:

/TableOfContents/Grids/Carpet/Time    Dataset {937/Inf}

Compound type: {double Time, char[56] SliceName}

Two separate datasets do NOT model a 2D parameter space — they model two independent 1D parameter spaces. A reader cannot determine from two separate datasets whether a given slice belongs to a point in Time × MassRatio space or to two independent one-dimensional sequences.

A true n-dimensional parameter space is encoded as a single dataset whose HDF5 rank encodes the parameter space dimension, named by joining the parameter names with & in alphabetical order:

/TableOfContents/Grids/Carpet/Time&MassRatio    Dataset {1 × N/Inf}

Compound type: {double MassRatio, double Time, char[48] SliceName}

F5 extends a TimeTable along exactly one unlimited dimension. (HDF5 itself permits more than one unlimited dimension on a chunked dataset, but appending slices one at a time needs only one.) For an appendable 2D parameter space, the fixed dimension has extent 1 and the unlimited dimension is extended on each append. If the full parameter space is known in advance (post-simulation assembly), a proper 2D dataset {M × N} may be used without the extent-1 constraint, but it cannot be extended by append.

The dataset name Time&MassRatio makes the parameter space composition explicit without requiring additional metadata. A reader determines the parameter space dimension from the dataset rank and identifies the parameters from the compound type member names and/or the &-separated dataset name. The dataset name matches the parameter attribute names on slice groups joined with & (§5.2).
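Recovering the parameter names from an &-separated dataset name is a plain string split. The following is a sketch only: f5_split_param_names and the F5_PARAM_NAME_MAX limit are hypothetical, and the function does not check ordering or compare the result against the dataset rank, which a real reader would also do.

```c
#include <stddef.h>
#include <string.h>

#define F5_PARAM_NAME_MAX 64 /* hypothetical per-name buffer size */

/* Splits "A&B&C" into parameter names. Returns the number of names found,
 * or -1 if a component is empty, too long, or there are more than max_names. */
int f5_split_param_names(const char *dataset_name,
                         char names[][F5_PARAM_NAME_MAX], int max_names)
{
    int count = 0;
    const char *start = dataset_name;
    for (;;) {
        const char *amp = strchr(start, '&');
        size_t len = amp ? (size_t)(amp - start) : strlen(start);
        if (len == 0 || len >= F5_PARAM_NAME_MAX || count >= max_names)
            return -1;
        memcpy(names[count], start, len);
        names[count][len] = '\0';
        ++count;
        if (!amp)
            return count; /* last component consumed */
        start = amp + 1;
    }
}
```

The returned count is the parameter space dimension implied by the name; a mismatch with the compound type members would indicate an inconsistent file.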

Nested (group/dataset with identical names): When scoping soft links per parameter space is desirable for performance, a subgroup is introduced and the dataset inside it reuses the group name:

/TableOfContents/Grids/Carpet/Time/          <- group
    Time    Dataset {937/Inf}                 <- dataset name = group name
    t=000000000...   -> /t=000000000...       <- links scoped to this param space

The redundancy Time/Time (group containing dataset of same name) is intentional — it encodes “this is a parameter-space container” in the structure without reserving a new name. A reader checks whether Grids/<GridName>/<ParamName> is a dataset (flat, canonical) or a group (nested, scoped links).

Legacy (F5::TimeTable): The reference implementation uses the reserved name F5::TimeTable for the time dataset. This name is a useful fallback: it covers variations in time attribute naming (t, time, Time, TIME) without requiring strict matching to a parameter name, and it is recognizable to legacy readers. Readers SHOULD treat a dataset named F5::TimeTable as a TimeTable for the time parameter. New writers targeting the full parameter-space model SHOULD prefer the canonical or nested layout.

The three layouts are progressive and non-contradictory. A flat F5::TimeTable file can be read by all readers; a full nested multi-parameter file requires a reader that implements the parameter space extension.

The TimeTable dataset and soft links are intentionally redundant. They serve distinct purposes:

  • The TimeTable dataset is for high-performance access: a single sequential read yields the complete list of parameter values and slice paths. After an in-memory sort (§4.2), this enables O(log N) binary search without any group traversal.
  • The soft links enable direct human and tool access: h5ls and similar tools can navigate to any slice group without reading the TimeTable dataset. They make the ToC self-documenting.

This redundancy enables consistency checking: a validator can verify that every TimeTable entry has a corresponding soft link and vice versa. A soft link that fails to resolve, and whose TimeTable entry references an external file, signals a potentially missing external file. This is not an F5 error but a file management issue the user can resolve by copying the external file. A reader SHOULD distinguish:

  • “Slice does not exist”: no TimeTable entry and no soft link
  • “Slice exists but file is absent”: TimeTable entry present, soft link present but unresolvable (external file missing)
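The distinction can be sketched as a small classification function. The enum and function names are hypothetical, and the extra F5_SLICE_TOC_INCONSISTENT state (entry without link, or link without entry) is an assumption added here to cover the validator case; it is reported as a warning-level condition, not an error.

```c
typedef enum {
    F5_SLICE_NOT_PRESENT,      /* no TimeTable entry and no soft link         */
    F5_SLICE_AVAILABLE,        /* entry and link present, link resolves       */
    F5_SLICE_FILE_ABSENT,      /* entry and link present, link does not resolve */
    F5_SLICE_TOC_INCONSISTENT  /* entry without link, or link without entry   */
} F5SliceState;

/* Classifies one slice from three observations made by the reader. */
F5SliceState f5_classify_slice(int has_entry, int has_link, int link_resolves)
{
    if (!has_entry && !has_link)
        return F5_SLICE_NOT_PRESENT;
    if (!has_entry != !has_link)
        return F5_SLICE_TOC_INCONSISTENT;
    return link_resolves ? F5_SLICE_AVAILABLE : F5_SLICE_FILE_ABSENT;
}
```

This sketch does not tell a missing external file apart from a dangling link within the same file; a reader would need to inspect the link type for that.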

Both the TimeTable and the soft links record the slice group path as a string. These strings MUST be textually identical to each other and to the actual slice group path. A validator SHOULD check this three-way consistency.

Entry structure:

typedef struct {
    double  ParameterValue;            /* parameter value for this slice       */
    char    SliceName[entry_size - 8]; /* absolute HDF5 path to slice group    */
} F5_TableEntry;

The total entry size SHOULD be a power of two. The reference implementation uses 64 bytes (8 bytes double + 56 bytes path string). For multi-parameter entries where the compound type stores k doubles, the string length is entry_size - 8k:

Parameters   Double bytes   String bytes   Total
1            8              56             64
2            16             48             64
3            24             40             64
4            32             32             64

Implementations MAY choose a larger power-of-two total (128, 256) for longer paths. The chunk size SHOULD be an integer multiple of the filesystem block size: chunk_entries × entry_size = m × filesystem_block for some positive integer m. The reference implementation uses 1024 entries per chunk (64 KB for 64-byte entries).

The power-of-two size is a performance RECOMMENDATION. The HDF5 API exposes the type structure at runtime; readers determine layout from the type definition.
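The size arithmetic above can be checked with a few helpers. This is a sketch with hypothetical function names; it assumes 8-byte doubles, as the table does.

```c
#include <stddef.h>

/* Returns 1 if n is a nonzero power of two, else 0. */
int f5_is_power_of_two(size_t n)
{
    return n != 0 && (n & (n - 1)) == 0;
}

/* Bytes left for SliceName in an entry holding k double parameters.
 * Returns 0 if the doubles alone fill or exceed the entry. */
size_t f5_slice_name_capacity(size_t entry_size, size_t k)
{
    size_t double_bytes = 8 * k; /* assumes sizeof(double) == 8 */
    return entry_size > double_bytes ? entry_size - double_bytes : 0;
}

/* Total bytes in one chunk of chunk_entries entries. */
size_t f5_chunk_bytes(size_t chunk_entries, size_t entry_size)
{
    return chunk_entries * entry_size;
}
```

With the reference values, f5_slice_name_capacity(64, 1) gives 56 and f5_chunk_bytes(1024, 64) gives 65536, matching the table and the 64 KB chunk size.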

4.2 Write protocol and ordering

/TableOfContents/Grids/<GridName>/
    F5::TimeTable           Dataset {N/Inf}
    t=000000000.0000000000  -> /t=000000000.0000000000
    t=000000003.7750000000  -> /t=000000003.7750000000
    ...

When writing a new slice:

  1. Extend the parameter’s TimeTable dataset by one entry; write the parameter value(s) and the slice group path string
  2. Create a soft link whose name is textually identical to the slice path string, pointing to the slice group at that path

The SliceName string in the TimeTable entry, the soft link name, and the actual slice group path MUST all be textually identical and consistent. A validator SHOULD check this three-way consistency.
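Filling one 64-byte entry before appending it might look as follows. This is a sketch under assumptions: the struct name F5_TableEntry64 and the function f5_make_time_entry are hypothetical, and the "/t=%020.10f" format is inferred from the slice names shown in this document's examples rather than taken from a normative rule. The HDF5 calls that extend the dataset and create the link are omitted.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* One-parameter entry with the reference 64-byte size. */
typedef struct {
    double ParameterValue; /* parameter value for this slice   */
    char   SliceName[56];  /* absolute path, NUL-padded        */
} F5_TableEntry64;

/* Formats the slice path for time t and fills *entry.
 * Returns 0 on success, -1 if the path does not fit. */
int f5_make_time_entry(double t, F5_TableEntry64 *entry)
{
    char path[sizeof entry->SliceName];
    int n = snprintf(path, sizeof path, "/t=%020.10f", t); /* assumed format */
    if (n < 0 || (size_t)n >= sizeof path)
        return -1;
    memset(entry->SliceName, 0, sizeof entry->SliceName);
    memcpy(entry->SliceName, path, (size_t)n);
    entry->ParameterValue = t;
    return 0;
}
```

Building the string once and reusing it for the entry, the link, and the group creation is one simple way to keep the three strings identical by construction.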

Soft links MAY point to slice groups in external HDF5 files. In this case the target is physically absent if the external file is not present, but remains semantically valid — its existence is asserted by the ToC entry. A physically absent slice group is not an error; it signals a missing external file, resolvable by file copy. Readers SHOULD distinguish “slice does not exist” (no ToC entry) from “slice exists but file is absent” (ToC entry present, link unresolvable).

Named HDF5 types and external links: Named HDF5 types do not propagate across external file links — an external file cannot share the global /TableOfContents/ named types of the linking file. The rule is:

  • Files with a ToC store all named types globally in /TableOfContents/. This is the preferred layout.
  • Files without a ToC (ToC is optional) store named types locally within their own group hierarchy.
  • When a soft link points to an external file, that external file MUST carry its own copies of all named types used by its Fields. If the external file has its own ToC, its named types are already there. If not, they must be present locally.

A merge tool assembling multiple files into one SHOULD promote all named types into the merged file’s /TableOfContents/. Local copies in the merged file are created only when a genuine structural incompatibility prevents sharing — that is, when two named types share a name but differ in structure. Version differences alone are NOT incompatibilities: a merge tool SHOULD read fields written under an older spec version and write them under the current version, analogous to HDF5 library version bounds (H5Pset_libver_bounds). The per-field TypeInfo versioning (§3.2) enables the merge tool to record which spec version governed each Field’s original layout.

No sorting is required on write. Writers append in the order slices are produced. Readers that need ordered access sort the in-memory copy after loading. Requiring sorted order would force rewriting the entire dataset on each append for non-monotonic sequences — a prohibitive cost for interactive applications and keyframe editors.

For multi-dimensional parameter spaces (§5), the parameter space is encoded as a single dataset of appropriate rank (§4.1). The concept of “sorted” is ill-defined for 2D+ parameter spaces with non-regular sampling. All TimeTable datasets are extended along exactly one axis regardless of the logical parameter space dimension (§4.1), so sorting within the file is neither achievable nor required.

Grid ToC groups MAY carry attributes from the actual Grid group:

Attribute: Refinement  {1, 1, 1}   <- AMR refinement levels (int array)

4.3 Field reverse lookup

/TableOfContents/Fields/<FieldName>/
    <GridName>   Soft Link {/TableOfContents/Grids/<GridName>}

One subgroup per Field name; one soft link per Grid containing that Field. The first time a Field/Grid pair is written, create the link. Subsequent slices of the same pair require no action.


5. Parameter Space

5.1 Time as a special case of parameter space

In current F5 files, each slice group carries a single Time attribute using the F5::Time named type (§2). The Parameters/Time/ ToC subgroup documents the time parameter dimension. This is the base case: a 1-dimensional parameter space.

A time series is a 1D path through an n-dimensional parameter space. Other parameter dimensions might include:

  • Physical parameters: mass ratio, spin, eccentricity (binary merger parameter study)
  • Numerical parameters: resolution level, damping coefficient
  • Ensemble parameters: random seed, perturbation amplitude

5.2 Parameter identification — strict naming

The Parameters/ subgroup defines the parameter dimensions present in a file. The subgroup names are the normative attribute names that writers MUST use on slice groups.

If /TableOfContents/Parameters/Time/ exists, then each slice group MUST carry an attribute named exactly Time (case-sensitive). Alternative names such as time, t, or T are NOT permitted. The parameter name in the ToC and the attribute name on slice groups MUST match exactly.

This strict naming rule enables O(1) attribute lookup by name and prevents ambiguity in files that combine data from multiple sources with different naming conventions.

5.3 Multiple parameter dimensions and grid base spaces

A Grid over a 1-dimensional parameter space (Time only) and a Grid over a 2-dimensional parameter space (Time + MassRatio) are fundamentally different: they are fibers over different base spaces. In fiber bundle terms, the base manifold of the first Grid is 1-dimensional; the base manifold of the second is 2-dimensional. These Grids MUST NOT be merged or interchanged.

Parameter space identity is determined by the slice group attributes, not by the Grid group. A Grid itself does not know which parameter space it lives on — analogously to a fiber in a fiber bundle, which does not know which point of the base space it is attached to. The ToC associates Grids with parameter spaces by organising slice paths into TimeTable datasets of appropriate rank and compound type.

This separation has a practical consequence: merging multiple 1D time-series into a 2D parameter space requires no changes to existing Grid objects. A merge can be a zero-data-copy operation using only external links. The merged file contains a new 2D ToC structure whose TimeTable entries reference slice groups in the original 1D files via external links. The Grid groups themselves are untouched.

Example: merging three binary black hole simulations (each a 1D time series at a different mass ratio q=0.5, q=0.75, q=1.0) into a 2D parameter space:

/TableOfContents/Grids/Carpet/
    Time&MassRatio    Dataset {1 × 3×937/Inf}
        { Time=0.0,    MassRatio=0.5,  SliceName="/t=0.0"  }  -> external file 1
        { Time=0.0,    MassRatio=0.75, SliceName="/t=0.0"  }  -> external file 2
        { Time=0.0,    MassRatio=1.0,  SliceName="/t=0.0"  }  -> external file 3
        ...
    t=0.0  -> /t=0.0   (which itself is an external link to the appropriate file)

The Grid group MAY carry arbitrary attributes (simulation parameters, command line, provenance) but these are not F5-normative and do not participate in parameter space identification.

5.4 Multi-parameter slice structure

A slice group in a 2-parameter space carries one attribute per parameter dimension:

/t=100.0_q=0.5/
    Attribute: Time       100.0   (type: F5::Time, TimeUnits=F5_TIME_UNITLESS)
    Attribute: MassRatio  0.5     (type: F5::MassRatio or plain double)

The attribute type for non-time parameters SHOULD follow the same named-type approach as F5::Time (§2.2) — a double cast into a named type carrying unit and semantic information. In the absence of a defined named type, a plain double is permitted as a fallback, with a string Units attribute on the corresponding Parameters/<Name>/ subgroup as advisory documentation:

/TableOfContents/Parameters/MassRatio/
    Attribute: Units   "dimensionless"   (advisory, non-normative)

The unit specification problem for arbitrary parameter dimensions is as open as for time (§2.4). The current approach — named type where available, string annotation as fallback — is an interim position. Future adoption of a unit standard (such as C++29 mp-units or an F5-specific extension) will be accommodated via the per-field versioning mechanism without breaking existing files.

5.5 The TimeStep attribute

Integer simulation step counters are often the primary identifier in numerical simulations. The floating-point time value is derived from the step counter and the timestep size, and for non-equidistant timestepping, recovering the step counter from the time value is numerically unstable. A normative optional attribute on slice groups preserves the step counter directly:

#define FIBER_HDF5_TIMESTEP_ATTRIB  "TimeStep"

When present, TimeStep is an integer attribute on the slice group recording the simulation step counter. Its presence is OPTIONAL but RECOMMENDED for simulation output. A reader SHOULD use TimeStep for step-counter arithmetic rather than deriving it from the Time double value.

The TimeStep attribute is independent of the Time attribute and of the ToC structure. Both may coexist on the same slice group.
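The fragility of recovering a step counter from a floating-point time can be seen even with a fixed step size. The sketch below is illustrative only: the helper names are hypothetical, and it uses an equidistant step with naive truncation, which is a simpler situation than the non-equidistant case described above.

```c
/* Accumulates `steps` additions of dt, as a simulation loop would. */
double f5_accumulate_time(double dt, long steps)
{
    double t = 0.0;
    for (long i = 0; i < steps; ++i)
        t += dt;
    return t;
}

/* Naive recovery of the step counter by division and truncation. */
long f5_step_from_time_naive(double t, double dt)
{
    return (long)(t / dt);
}
```

With IEEE 754 doubles, f5_step_from_time_naive(f5_accumulate_time(0.1, 10), 0.1) does not return 10, because the accumulated sum falls slightly below 1.0. Rounding instead of truncating hides this particular case but not the general one; reading a stored TimeStep avoids the question entirely.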

5.6 Parameters/ subgroup content

/TableOfContents/Parameters/<ParamName>/
    F5::<ParamName>    <- optional named type for this parameter (RECOMMENDED)
    Attribute: Units   <- optional advisory string (RECOMMENDED when applicable)

All content in Parameters/ subgroups is OPTIONAL. A reader can derive aggregate information (range, distinct values, count) from the TimeTable dataset, so writers are not required to maintain synchronised aggregate attributes such as min/max.

5.7 Multiple parameter spaces and file merging

A single F5 file may contain multiple distinct one-dimensional parameter spaces with different semantics — for example, Parameters/Time_Unitless/ (numerical relativity simulation time) and Parameters/Time_JulianDate/ (observational calendar time). This allows files from different sources to be merged without forcing a common time type.

For file merging when source files have incompatible time semantics, three strategies are available:

  1. Create separate parameter spaces (RECOMMENDED): each source file’s time type becomes a distinct entry in Parameters/. Grids from each source retain their original time semantics. The merged file contains both parameter spaces.
  2. Refuse to merge: safe but inflexible. Appropriate when the merge tool cannot determine the relationship between the time types.
  3. Retype to a common type: lossy; not recommended unless the conversion is exact.

Strategy 1 is the recommended approach because it preserves semantic information, is structurally supported by the current spec, and allows a downstream reader or visualization tool to select the appropriate parameter space for its context.

For named type handling during merges, see §4.2. The key principle: version differences are not incompatibilities; a merge tool promotes all named types into the merged file’s ToC and creates local copies only for genuine structural conflicts.


6. Reader Behavior

6.1 ToC-aware readers

  1. Open /TableOfContents/Parameters/ to determine the parameter space dimensions.
  2. Open /TableOfContents/Grids/ to enumerate the available Grids: O(1) in the number of timeslices.
  3. For each Grid, read its F5::TimeTable dataset: one sequential read.
  4. Sort the entries in memory if needed: O(N log N) once for N entries, then O(log N) per query.
  5. Use SliceName to open the target slice group: O(1).
  6. Open /TableOfContents/Fields/ to enumerate the available Fields: O(1) in the number of timeslices.
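
The sort-once, query-many pattern of steps 3 to 5 can be sketched as follows. This is a minimal sketch under stated assumptions: the table is represented as an in-memory list of (Time, SliceName) tuples rather than read from a file, the helper name slice_at_or_before is hypothetical, and the "latest slice at or before the query time" semantics is one possible query, not something this specification prescribes. The sample values are taken from the structural example in §7.

```python
import bisect

# Entries in the order a writer might have appended them: unsorted.
time_table = [
    (3.775, "/t=000000003.7750000000"),
    (0.0, "/t=000000000.0000000000"),
    (3533.4, "/t=000003533.4000000000"),
]

# Step 4: sort once by Time, O(N log N).
time_table.sort(key=lambda entry: entry[0])
times = [t for t, _ in time_table]


def slice_at_or_before(query):
    """Return the SliceName of the latest slice with Time <= query, or None."""
    i = bisect.bisect_right(times, query)   # O(log N) per query
    return time_table[i - 1][1] if i else None


print(slice_at_or_before(10.0))   # -> /t=000000003.7750000000
```

The returned SliceName is then used directly to open the slice group (step 5), so no traversal of the slice hierarchy is involved.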

6.2 Readers without ToC support

Readers iterate over the children of the root group to find slice groups, and discover Grids and Fields by traversal as described in core spec §13. Readers MUST NOT require the ToC to be present.

6.3 Consistency

The ToC is a derived structure. Inconsistency (missing entries, stale soft links) is a warning, not a fatal error. Readers MAY fall back to direct traversal if a ToC entry fails to resolve. Writers MUST ensure that ToC entries are appended before closing the file.
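
The warn-and-fall-back behavior might look like the sketch below. It assumes an in-memory stand-in for the file: root_groups is a set holding only the slice groups found at the root, toc_entries is the list of (Time, SliceName) pairs of one Grid, and resolve_slices is a hypothetical helper. A real reader would apply the slice-discovery rules of the core spec instead of trusting a prepared set.

```python
import warnings

# Hypothetical stand-in for a file: slice groups present at the root,
# and the ToC entries of one Grid (the second entry is stale).
root_groups = {"/t=000000000.0000000000", "/t=000000003.7750000000"}
toc_entries = [
    (0.0, "/t=000000000.0000000000"),
    (7.55, "/t=000000007.5500000000"),
]


def resolve_slices(toc_entries, root_groups):
    """Return slice names from the ToC, or from traversal if the ToC is stale."""
    resolved = [name for _, name in toc_entries if name in root_groups]
    if len(resolved) == len(toc_entries):
        return resolved
    # Inconsistency is reported as a warning, never raised as an error.
    warnings.warn("ToC entry failed to resolve; falling back to traversal")
    return sorted(root_groups)


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    slices = resolve_slices(toc_entries, root_groups)

print(slices, len(caught))
```

Here the stale entry triggers exactly one warning and the reader still ends up with both slices that actually exist in the file.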


7. Structural Example

From an actual F5 file (binary black hole evolution, 937 timeslices, Carpet AMR):

/TableOfContents/
    TypeInfo               enum { Contiguous=1, ..., FragmentedUniformSampling=9 }
                             Attribute: URL      "https://www.fiberbundle.net/F5-0.1.5/"
                             Attribute: version  {0, 1, 5}

    Grids/
        Carpet/
            Attribute: Refinement  {1, 1, 1}
            F5::TimeTable      Dataset {937/Inf}, 64 bytes/entry, chunk 1024
                                 (legacy name; equivalent to canonical "Time")
                { Time=0.0,     SliceName="/t=000000000.0000000000" }
                { Time=3.775,   SliceName="/t=000000003.7750000000" }
                ...
                { Time=3533.4,  SliceName="/t=000003533.4000000000" }
            t=000000000.0000000000  -> /t=000000000.0000000000
            ...  (soft links: redundant with TimeTable, serve tool/human access)

    Fields/
        Positions/
            Carpet  -> /TableOfContents/Grids/Carpet
        WEYLSCAL4::Psi4R/
            Carpet  -> /TableOfContents/Grids/Carpet
        WEYLSCAL4::Psi4I/
            Carpet  -> /TableOfContents/Grids/Carpet

    Parameters/
        Time/
            F5::Time           (named type: double + TimeUnits=F5_TIME_UNITLESS)
            Attribute: Units   "M"   (ADM mass of the system)


8. Summary of Normative Elements

Element                                                              Status
Named type approach for parameter attributes (§2.2)                  RECOMMENDED
F5_TimeUnits enum: suggestive open-ended registry (§2.3)             Non-normative
Units: string annotation as minimum (§2.4)                           RECOMMENDED
TypeInfo enum values as specified (§3.1)                             Normative
Multiple TypeInfo types for grouping (§3.3)                          OPTIONAL
TypeInfo URL pointing to versioned spec subfolder (§3.4)             RECOMMENDED
TypeInfo version int[3] attribute (§3.4)                             RECOMMENDED
Power-of-two entry size for F5::TimeTable (§4.1)                     RECOMMENDED
Chunk size as power-of-two multiple (§4.1)                           RECOMMENDED
Write protocol: append without sorting (§4.2)                        Normative
Soft link per slice in Grid ToC (§4.2)                               Normative
Field reverse lookup structure (§4.3)                                Normative
Strict parameter name matching: ToC name = attribute name (§5.2)     Normative
Different parameter-space dimensionalities = different Grids (§5.3)  Normative
Parameters/ subgroup per dimension (§5)                              RECOMMENDED
All Parameters/ content optional (§5.5)                              Normative
ToC presence: readers MUST NOT require it (§6.2)                     Normative
ToC consistency (§6.3)                                               Normative

9. Literature and Background

Time semantics in HDF5: Buleu, A.E.; Advisor: Benger, W.: “An Ontological Scheme for Specifying Time in HDF5.” Proceedings of The National Conference on Undergraduate Research (NCUR) 2007, Dominican University of California, San Rafael, California, April 12-14, 2007. Center for Computation and Technology, Louisiana State University.

Units standards for reference: CGNS (CFD General Notation System): https://cgns.github.io (comprehensive unit coverage for aerospace/CFD, enumeration-based approach).

C++ mp-units (P1935R5): https://mpusz.github.io/mp-units/ (proposed C++29 standard for physical quantities and units, type-safe and SI-complete).

Hurricane Katrina visualization (motivating application): Venkataraman, S., Benger, W., Long, A., Jeong, B., Renambot, L.: “Visualizing Hurricane Katrina: large data management, rendering and display challenges.” GRAPHITE 2006, Malaysia, pp. 209-212.

Carpet AMR framework (motivating application for the ToC): Schnetter, E., Hawley, S.H., Hawke, I.: “Evolutions in 3D numerical relativity using fixed mesh refinement.” Classical and Quantum Gravity 21(6), pp. 1465-1488 (2004).

F5 data model: Benger, W. (2005): PhD thesis, FU Berlin / ZIB. https://www.fiberbundle.net/papers/TensorFieldViz.pdf

Benger, W. et al. (2009): GraVisMa 2009, Plzen, pp. 80-89. http://gravisma.zcu.cz/GraVisMa-2009/Papers_2009/!_2009_GraVisMa_proceedings-FINAL.pdf


Usage Restriction

This specification and all documents in the F5 specification series are published for academic and personal use. Use of this specification, or any implementation derived from it, by employees of military or defense-related organizations, or within facilities producing weapons or conducting research on weapon design, or for any other military purpose, is explicitly prohibited and contrary to the intent of the authors.

This restriction follows the spirit and terms of the light++ license under which the original F5 reference implementation was published. The rationale is stated there directly: software is technology, technology conveys power, and the inventor bears responsibility for deciding to whom that power is granted. This specification was developed to advance scientific understanding — not to enable harm.

See: https://www.fiberbundle.net/doc/copyright.html


End of Document