F5 File Format Specification

F5 Design Philosophy

0. Purpose and Scope

This document articulates the foundational design axioms of the F5 data model. It is not a specification — it contains no normative rules. Its purpose is to record the reasoning from which the specifications are derived, so that future extensions, corrections, and implementations can be evaluated against first principles rather than against accumulated convention.

On the relationship between axioms and practice: These axioms are ideals. Practical limitations — technical constraints of existing tools, implementation effort, HDF5 API restrictions, the need for backward compatibility — may require deviations in any given implementation or specification version. This is acceptable. The axioms define the direction of travel: future improvements SHOULD converge towards these axioms, not against them.

If a particular axiom turns out to be unachievable in practice, this does not invalidate the others. Each axiom stands independently (a severability, or “salvatory”, clause). The discovery that one axiom cannot be satisfied is itself a result: it motivates either a refinement of the axiom or the development of a new mechanism.

Axioms vs. theorems: This document distinguishes foundational axioms (statements taken as starting points, not derived from others) from theorems (statements derivable from the axioms). Many properties of the F5 model that appear in the specifications are theorems — they are consequences of the axioms rather than independent design choices. Labelling them correctly clarifies which parts of the model are fundamental and which can be improved without changing the foundations.

The document is ordered deductively: the most fundamental axioms come first, and theorems follow from them.


1. The Fiber Bundle Axiom

Axiom F: Scientific data has the structure of a fiber bundle

The foundational choice of the F5 model is to represent scientific data as sections of fiber bundles, a concept drawn from differential geometry and topology.

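A fiber bundle E → B consists of:
- Total space E: the union of all fibers
- Base space B: the domain where data lives (the mesh, the grid, the parameter space; discrete or continuous)
- Fiber F: the mathematical value space attached at each point of B (scalar, vector, tensor, spinor, multivector, or any other algebraic structure)
- Projection π: E → B, the map sending each point of E to the base point it lies over; the fiber over b ∈ B is π⁻¹(b)

A section s: B → E, with π ∘ s = id_B, picks one value from the fiber over every base point.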

This is not a metaphor. The base space of a simulation mesh is a topological space. The velocity field on that mesh is literally a section of the tangent bundle. The metric tensor in general relativity is literally a section of a symmetric covariant rank-2 tensor bundle. F5 models these structures directly, not as an approximation.
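The separation of base and fiber can be made concrete with a small sketch. It assumes a discrete base space (the vertex indices of a three-vertex mesh) and the fiber ℝ³ for a velocity field; the names `TrivialBundle`, `make_section`, and `is_section` are hypothetical and belong to no F5 API. The code only checks the defining property of a section, π(s(b)) = b for every base point b.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple

Vector = Tuple[float, ...]
Point = Tuple[int, Vector]          # a point of E: (base point, value in the fiber over it)

@dataclass(frozen=True)
class TrivialBundle:
    """E = B x F over a finite base space (hypothetical illustration, not an F5 type)."""
    base: Tuple[int, ...]           # B: vertex indices of a mesh ("where")
    fiber_dim: int                  # F: R^fiber_dim attached to every base point ("what")

    def project(self, point: Point) -> int:
        """pi: E -> B keeps the base point and forgets the fiber value."""
        base_point, _value = point
        return base_point

def make_section(bundle: TrivialBundle, values: Dict[int, Vector]) -> Callable[[int], Point]:
    """Turn per-vertex values into a section s: B -> E."""
    def s(b: int) -> Point:
        value = values[b]
        if len(value) != bundle.fiber_dim:
            raise ValueError(f"value at vertex {b} does not lie in the fiber R^{bundle.fiber_dim}")
        return (b, value)
    return s

def is_section(bundle: TrivialBundle, s: Callable[[int], Point]) -> bool:
    """A section must satisfy pi(s(b)) == b for every base point b."""
    return all(bundle.project(s(b)) == b for b in bundle.base)

mesh = TrivialBundle(base=(0, 1, 2), fiber_dim=3)
velocity = make_section(mesh, {0: (1.0, 0.0, 0.0), 1: (0.0, 1.0, 0.0), 2: (0.0, 0.0, 1.0)})
print(is_section(mesh, velocity))
```

Replacing the mesh (the base) leaves the notion of a velocity vector (the fiber) untouched, and vice versa; that independence is what Axiom F asks the storage layout to preserve.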

The fiber bundle concept is reused at every level of the model:
- A Grid is a fiber over the parameter space (each Grid is the same physical object observed/simulated at different parameter values)
- Each Skeleton defines its own base space, with Fields as fibers over that base
- The TableOfContents is a fiber bundle structure: the ToC dataset is a section of a trivial bundle over the parameter space, providing slice path information at each parameter value

The fiber bundle axiom rules out any data model that conflates base space and fiber — for example, a format that binds a “temperature scalar” to a specific mesh type without separating what the scalar is (fiber) from where it lives (base).

Theorem F.1 (Topology/Geometry separation): Because the base space (topology) and the fiber placement (geometry = chart representation) are independent structures in a fiber bundle, they must be stored independently. This is the origin of the Skeleton/Representation split.

Theorem F.2 (Multiple representations): A single topological structure (Skeleton) can be placed geometrically in multiple coordinate systems simultaneously. This follows from the atlas structure of differential geometry.

Different charts over the same manifold are not necessarily equivalent in practice: some exhibit coordinate singularities (spherical coordinates at the poles, Schwarzschild coordinates at the event horizon), and a single chart may not cover the entire manifold. The atlas of multiple charts is the mathematical solution to this: together, multiple charts cover what no single chart can.

F5 expresses this directly. A Field may have multiple coordinate Representations simultaneously — one chart per patch, one chart per coordinate system the application supports. An implementation can switch between chart Representations as needed for a given point or region, following the atlas concept. At the F5 level a Field can thus be treated as an abstract mathematical object, independent of any particular coordinate realization. The choice of coordinates is a concern of the implementation layer, not of the stored data.
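Per-point chart switching can be sketched as follows, assuming a two-chart atlas for ℝ³ (Cartesian and spherical, with θ measured from the +z axis) and a hypothetical tolerance for deciding when a point is too close to the polar axis. The function names are illustrative only; the stored Field is the same in both cases, and only the chart used to express it changes.

```python
import math
from typing import Tuple

Point = Tuple[float, float, float]

def to_spherical(p: Point) -> Point:
    """Chart map (x, y, z) -> (r, theta, phi); theta is measured from the +z axis."""
    x, y, z = p
    r = math.sqrt(x * x + y * y + z * z)
    theta = math.acos(max(-1.0, min(1.0, z / r)))   # clamp guards against rounding
    phi = math.atan2(y, x)
    return (r, theta, phi)

def spherical_is_regular(p: Point, tolerance: float = 1e-9) -> bool:
    """The spherical chart is singular on the polar axis (x = y = 0), where phi is undefined."""
    x, y, _ = p
    return math.hypot(x, y) > tolerance

def express(p: Point) -> Tuple[str, Point]:
    """Choose a chart from the atlas for this point, preferring spherical where it is regular."""
    if spherical_is_regular(p):
        return ("spherical", to_spherical(p))
    return ("cartesian", p)    # fall back to a chart that covers the poles

for point in [(1.0, 0.0, 0.0), (0.0, 0.0, 2.0)]:
    print(express(point))
```

Neither chart is privileged here: the decision is made per point by the implementation, while the data itself stays chart-independent.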

Theorem F.3 (Transformation rules): The algebraic type of a fiber (scalar, vector, co-vector, tensor) determines how its components transform under chart changes. Storage of algebraic type information (the covariance array, grade, rank) is therefore not optional for data that will be subjected to coordinate transformations.
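Why the algebraic type must be stored can be seen in a two-dimensional sketch, assuming a change from Cartesian (x, y) to polar (r, φ) coordinates. Tangential vector components transform with the Jacobian J = ∂(r, φ)/∂(x, y); co-vector components transform with the inverse Jacobian ∂(x, y)/∂(r, φ), transposed. Only when each object uses its own rule is the contraction w_i v^i the same number in both charts. The function names are illustrative.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]

def jacobian(x: float, y: float) -> Matrix:
    """J[i][j] = d(r, phi)^i / d(x, y)^j for the Cartesian -> polar chart change (r > 0)."""
    r2 = x * x + y * y
    r = math.sqrt(r2)
    return [[x / r, y / r],
            [-y / r2, x / r2]]

def inverse_jacobian(x: float, y: float) -> Matrix:
    """K[j][i] = d(x, y)^j / d(r, phi)^i, i.e. the inverse of J."""
    r = math.hypot(x, y)
    return [[x / r, -y],
            [y / r, x]]

def transform_vector(v: Sequence[float], x: float, y: float) -> List[float]:
    """Tangential (contravariant) components: v'^i = J^i_j v^j."""
    J = jacobian(x, y)
    return [sum(J[i][j] * v[j] for j in range(2)) for i in range(2)]

def transform_covector(w: Sequence[float], x: float, y: float) -> List[float]:
    """Co-vector (covariant) components: w'_i = K^j_i w_j."""
    K = inverse_jacobian(x, y)
    return [sum(K[j][i] * w[j] for j in range(2)) for i in range(2)]

def contract(w: Sequence[float], v: Sequence[float]) -> float:
    """The chart-independent number w_i v^i."""
    return sum(wi * vi for wi, vi in zip(w, v))

x, y = 3.0, 4.0
v, w = [1.0, 2.0], [0.5, -1.0]
print(contract(w, v))                                                    # Cartesian chart
print(contract(transform_covector(w, x, y), transform_vector(v, x, y)))  # polar chart, same value
```

At the point (3, 4) both contractions give -1.5. Transforming w with the vector rule instead gives about -1.116, so a reader that cannot tell a vector from a co-vector produces silently wrong results.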


2. The Five (+Two) Level Hypothesis

Hypothesis L: Five (+Two) levels of hierarchy are necessary and sufficient

The F5 hierarchy — Timeslice, Grid, Skeleton, Representation, Field, and optionally Fragment and SeparatedCompound — is hypothesized to be both necessary and sufficient for representing the full range of scientific visualization data derived from numerical physical simulations.

This is stated as a hypothesis rather than an axiom because it may be falsified by the discovery of a class of data not representable in this hierarchy. To date, no such class has been found. The hypothesis has been confirmed across a wide range of data types: structured and unstructured meshes, AMR hierarchies, tensor fields in general relativity, point clouds, parametric surfaces, time-evolving data, and multi-source observational data.

The five core levels are conceptual. The two optional levels (Fragment, SeparatedCompound) are implementation-internal performance mechanisms: they are semantically invisible to the end user. An implementation may freely adjust, combine, or omit these two levels without changing the semantic content of the file. The five conceptual levels are normative; the two performance levels are advisory.
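A minimal sketch of this invisibility, assuming a nested-dictionary stand-in for the HDF5 group hierarchy and hypothetical group names (`T=0.0`, `Mesh`, `Vertices`, `Cartesian3D`, `Temperature`). A field may be stored contiguously or as fragments with offsets; a reader that addresses data through the five conceptual levels sees identical values either way.

```python
from typing import Dict, List, Tuple, Union

Fragments = Dict[str, Tuple[int, List[float]]]     # fragment name -> (offset, values)
StoredField = Union[List[float], Fragments]

def read_field(stored: StoredField) -> List[float]:
    """Return the user-visible values; fragmentation is resolved here and never exposed."""
    if isinstance(stored, list):
        return list(stored)
    # Assumes the fragments tile the index space without gaps or overlaps.
    size = sum(len(values) for _offset, values in stored.values())
    result = [0.0] * size
    for offset, values in stored.values():
        result[offset:offset + len(values)] = values
    return result

def field_values(f5: dict, timeslice: str, grid: str, skeleton: str,
                 representation: str, field: str) -> List[float]:
    """Address data through the five conceptual levels only."""
    return read_field(f5[timeslice][grid][skeleton][representation][field])

path = ("T=0.0", "Mesh", "Vertices", "Cartesian3D", "Temperature")
contiguous = {"T=0.0": {"Mesh": {"Vertices": {"Cartesian3D": {
    "Temperature": [1.0, 2.0, 3.0, 4.0]}}}}}
fragmented = {"T=0.0": {"Mesh": {"Vertices": {"Cartesian3D": {
    "Temperature": {"block0": (0, [1.0, 2.0]), "block1": (2, [3.0, 4.0])}}}}}}
print(field_values(contiguous, *path) == field_values(fragmented, *path))
```

Because `field_values` never mentions fragments, an implementation could re-fragment the stored data for performance without any caller noticing.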

The scope of the hypothesis is scientific visualization of numerical physical simulations. It does not claim universality for all possible data.


3. Foundational Axioms

Axiom 1: Describe how, not what

F5 does not classify data into predefined categories. It describes the mathematical and structural properties of data — from which any classification follows as a derivable consequence.

A reader that correctly identifies the properties of a dataset can determine what kind of data it is without any predefined vocabulary. This is the foundational distinction between the F5 approach and the engineering approach of enumerated types.

Theorem 1.1: No enumeration of cell types, field types, or data categories is required or defined in the F5 core model.

A question of the form “does this file contain a triangular surface (type 23)?” itself contradicts Axiom 1: it asks what the data is, not how it is structured. Even setting that aside, the answer cannot be exclusive: in F5, a triangular surface is unavoidably also a point cloud. It is not possible to store a triangular surface in F5 that is not simultaneously a point cloud, because the vertex Skeleton is always present. A format that distinguishes these as different types conflicts with the mathematical structure of the model.

An application that insists on asking “what type is this?” must translate F5’s structural properties into its own vocabulary. That translation is implementable, but it is unnecessary for any application that asks the right question: “does this data have the properties I need?”
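The “right question” can be sketched as property queries over a simplified in-memory description of a Grid. The `Skeleton` fields, the `vertices_per_cell` attribute, and the Skeleton names are hypothetical simplifications, not F5 attribute names. The triangular surface answers “yes” to the point-cloud query as well.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class Skeleton:
    index_depth: int                  # 0: vertices; 1: cells composed of vertices
    dimensionality: int               # topological dimension of the entities
    fields: List[str] = field(default_factory=list)
    vertices_per_cell: int = 0        # hypothetical attribute; 0 where it does not apply

Grid = Dict[str, Skeleton]

def has_positioned_vertices(grid: Grid) -> bool:
    """Property query: a vertex index space with a Positions field (a point cloud)."""
    return any(s.index_depth == 0 and "Positions" in s.fields for s in grid.values())

def has_triangles(grid: Grid) -> bool:
    """Property query: two-dimensional cells with three vertices each, over positioned vertices."""
    return has_positioned_vertices(grid) and any(
        s.index_depth == 1 and s.dimensionality == 2 and s.vertices_per_cell == 3
        for s in grid.values())

surface: Grid = {
    "Points": Skeleton(index_depth=0, dimensionality=0, fields=["Positions", "Normals"]),
    "Connectivity": Skeleton(index_depth=1, dimensionality=2, vertices_per_cell=3),
}
cloud: Grid = {"Points": Skeleton(index_depth=0, dimensionality=0, fields=["Positions"])}

print(has_triangles(surface), has_positioned_vertices(surface))  # True True
print(has_triangles(cloud), has_positioned_vertices(cloud))      # False True
```

A point-rendering module would call only `has_positioned_vertices` and therefore accept both grids; no type code is consulted.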

Theorem 1.2: Forward compatibility is a consequence of Axiom 1. A data type that does not exist at spec writing time can be expressed in F5 by describing its properties, without modifying the specification.

Axiom 2: Information belongs in structure, not in names or enumerations

The semantic content of an F5 file is encoded in hierarchical position, type structure, and object identity — not in names, naming conventions, or reserved vocabularies.

In practice, F5 cannot yet fully achieve this axiom. Some normative name conventions remain: the field name Positions is normative; the D/d prefix convention for tangential vectors and co-vectors encodes covariance in names; the ^ separator in exterior product names encodes the product type. These are acknowledged as imperfections relative to the axiom — minimal naming conventions chosen because no structural encoding is yet available.

Theorem 2.1 (Improvement direction): If HDF5 or any future mechanism provides a way to encode the information currently in naming conventions into structure (e.g., HDF5 shared dataspaces would encode Skeleton index spaces structurally), F5 SHOULD adopt it and retire the corresponding naming convention. This is a route to specification version bumps that improve alignment with Axiom 2.

Theorem 2.2: The Positions name is the minimal necessary exception to Axiom 2 — the one name F5 assigns normative structural meaning to, because some starting point for geometric placement must be identifiable.

Axiom 3: Simplicity for simple cases

In the words commonly attributed to Einstein: “Keep things as simple as possible, but not simpler.”

Generality must not impose cost on users who do not need it. Every extension to the F5 model is strictly additive. Each layer can be used independently. A reader implementing only the core spec correctly handles files that use extensions by ignoring what it does not understand.

Theorem 3.1: The two optional levels (Fragment, SeparatedCompound) are invisible to the user model — an implementation adjusts them for performance without user awareness.

Theorem 3.2 (Absence is informative): The covariance attribute is absent from scalar types; the grade attribute is absent when grade equals rank; the Positions field may be absent from a partial Representation. These absences carry semantic meaning — the simplest interpretation applies when the attribute is absent.
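A reader can implement “absence is informative” as defaulting logic. This sketch assumes a plain dictionary of attributes and a covariance array with one entry per tensor index (the integer encoding of the entries is a placeholder); it is not the F5 attribute layout.

```python
from typing import Any, Dict, Sequence

def interpret_type(attrs: Dict[str, Any]) -> Dict[str, Any]:
    """Resolve rank, grade and covariance, applying the simplest interpretation when absent."""
    covariance: Sequence[int] = attrs.get("covariance", ())   # absent: a scalar
    rank = len(covariance)            # assumption: one covariance entry per tensor index
    grade = attrs.get("grade", rank)  # absent grade means grade == rank
    return {"rank": rank, "grade": grade, "covariance": tuple(covariance)}

print(interpret_type({}))                        # a scalar: nothing had to be stored
print(interpret_type({"covariance": [1, 1]}))    # grade defaults to the rank
```

The writer of a scalar field stores nothing extra, which is Axiom 3 applied to metadata.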

Axiom 4: File identity is semantically irrelevant

An F5 dataset has no preferred file boundary. Merging multiple F5 files into one, or splitting one into many, must be possible at every level of the hierarchy, down to the field fragment. A merge or split that changes no data values must be achievable as a zero-data-copy operation. The file name carries no meaning.
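Zero-copy merging is what HDF5 external links provide: the merged file stores only link objects pointing into the original files (h5py, for example, creates them with `h5py.ExternalLink(filename, path)`). The sketch below models that mechanism with plain Python objects so that it runs without HDF5; file names and paths are hypothetical. Resolving a path in the merged file returns the very same object stored in the source file, which is the zero-copy property.

```python
from dataclasses import dataclass
from typing import Dict, List, Union

@dataclass(frozen=True)
class ExternalLink:
    """Stand-in for an HDF5 external link: a file name plus a path inside that file."""
    filename: str
    path: str

Node = Union[List[float], ExternalLink]
Storage = Dict[str, Dict[str, Node]]      # file name -> {object path: dataset or link}

def merge_by_links(storage: Storage, sources: List[str], target: str) -> None:
    """Create `target` holding only links to the datasets of `sources`; no values are copied."""
    merged: Dict[str, Node] = {}
    for name in sources:
        for path in storage[name]:
            if path in merged:
                raise ValueError(f"{path} exists in more than one source file")
            merged[path] = ExternalLink(name, path)
    storage[target] = merged

def resolve(storage: Storage, filename: str, path: str) -> List[float]:
    """Follow links until a dataset is reached (no cycle detection in this sketch)."""
    node = storage[filename][path]
    while isinstance(node, ExternalLink):
        node = storage[node.filename][node.path]
    return node

storage: Storage = {
    "t0.f5": {"/T=0/Mesh/Points/Cartesian3D/Positions": [0.0, 1.0, 2.0]},
    "t1.f5": {"/T=1/Mesh/Points/Cartesian3D/Positions": [0.5, 1.5, 2.5]},
}
merge_by_links(storage, ["t0.f5", "t1.f5"], "all.f5")
```

Splitting works the same way in reverse: a new file receives links (or moved objects) without rewriting any dataset values.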

Theorem 4.1: Fragment-level merging and splitting (the minimum granularity) implies that the “+2” internal levels must be restructurable without semantic consequence.

Theorem 4.2: Named HDF5 types do not propagate across external file links. Any file referenced by an external link must carry its own copies of all named types it uses. A merge tool SHOULD promote named types to the merged file’s global TableOfContents.
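A merge tool’s type promotion step could look like this sketch, which assumes each source file is summarized as a dictionary from named-type name to its compound definition (member names and member types as strings, a hypothetical simplification). Identical definitions collapse to a single global entry; conflicting definitions under the same name are reported rather than silently overwritten. The source files keep their own local copies.

```python
from typing import Dict, Tuple

TypeDef = Tuple[Tuple[str, str], ...]          # ordered (member name, member type) pairs

def promote_named_types(files: Dict[str, Dict[str, TypeDef]]) -> Dict[str, TypeDef]:
    """Collect each file's local named types into one global table, rejecting conflicts."""
    global_types: Dict[str, TypeDef] = {}
    for filename, local_types in files.items():
        for type_name, definition in local_types.items():
            known = global_types.setdefault(type_name, definition)
            if known != definition:
                raise ValueError(f"{type_name} in {filename} conflicts with an earlier definition")
    return global_types

point3 = (("x", "float64"), ("y", "float64"), ("z", "float64"))
files = {"t0.f5": {"Point": point3}, "t1.f5": {"Point": point3}}
print(promote_named_types(files))
```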

Theorem 4.3: Performance enhancements that are orthogonal to the F5 core semantics may be added anywhere — fragments, TableOfContents, external link structures. The criterion for “orthogonal” is: removing the enhancement yields a semantically equivalent file. The TableOfContents satisfies this; so does the Fragment level.
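The orthogonality criterion can be checked mechanically when the enhancement is derivable from the core data. The sketch assumes a dictionary model of a file and a simplified ToC mapping each parameter value to the paths of its Grids; the actual ToC dataset layout is defined by the specifications, not here.

```python
from typing import Dict, List

def build_toc(slices: Dict[str, Dict[str, object]]) -> Dict[str, List[str]]:
    """Derive a parameter value -> grid paths table by walking the hierarchy."""
    return {t: sorted(f"/{t}/{grid}" for grid in grids) for t, grids in slices.items()}

def strip_toc(f5: Dict[str, dict]) -> Dict[str, dict]:
    """Remove the performance enhancement, keeping only the core hierarchy."""
    return {name: group for name, group in f5.items() if name != "TableOfContents"}

core = {"T=0.0": {"Mesh": {}, "Particles": {}}, "T=1.0": {"Mesh": {}}}
f5 = dict(core, TableOfContents=build_toc(core))

# The ToC is orthogonal: dropping it loses nothing, because it can be regenerated.
print(strip_toc(f5) == core, build_toc(strip_toc(f5)) == f5["TableOfContents"])  # True True
```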

Axiom 5: Forward and backward compatibility

The F5 model SHOULD evolve without invalidating existing files (backward compatibility) and existing readers SHOULD handle future files gracefully by ignoring unknown constructs (forward compatibility).
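Forward compatibility on the reader side amounts to ignoring what is not understood instead of failing. A minimal sketch, assuming attributes arrive as a dictionary; the set of known names and the attribute `FutureExtension` are hypothetical. Reporting the ignored names keeps the behavior transparent without turning unknown constructs into errors.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical set of attribute names understood by an older reader.
KNOWN_ATTRIBUTES = frozenset({"covariance", "grade", "TypeInfo"})

def read_attributes(attrs: Dict[str, Any]) -> Tuple[Dict[str, Any], List[str]]:
    """Keep what this reader understands; report, but do not reject, anything newer."""
    understood = {name: value for name, value in attrs.items() if name in KNOWN_ATTRIBUTES}
    ignored = sorted(name for name in attrs if name not in KNOWN_ATTRIBUTES)
    return understood, ignored

understood, ignored = read_attributes({"grade": 2, "FutureExtension": "new-style metadata"})
print(understood, ignored)
```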

A specification advancement that is compatible with an existing version requires no version number change. Only contradictory advancements — where the new interpretation conflicts with the old — require a version bump. Version bumps are therefore expected to be rare, because the model is designed to be extensible without contradiction.

Theorem 5.1 (Per-field versioning): Because different fields in one file may have been written under different specification versions, versioning must be per-field rather than per-file. The TypeInfo named type is the mechanism.

Theorem 5.2: Version differences are not incompatibilities. A reader implementing version V can read fields written under version V-n by applying the interpretation rules for that older version, identifiable from the field’s TypeInfo reference.
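Per-field versioning suggests a reader that dispatches on the version attached to each field rather than on a file header. In this sketch the version is assumed to be available as an integer from the field’s TypeInfo, and the rule change between “version 1” and “version 2” is made up purely to have something to dispatch on.

```python
import math
from typing import Callable, Dict, List

# Hypothetical interpretation rules keyed by the specification version in a field's TypeInfo.
# The degrees-to-radians change is invented for illustration; it is not an F5 version difference.
INTERPRETERS: Dict[int, Callable[[List[float]], List[float]]] = {
    1: lambda raw: [math.radians(v) for v in raw],   # "version 1" stored angles in degrees
    2: lambda raw: list(raw),                        # "version 2" stores radians
}

def read_versioned_field(raw: List[float], typeinfo_version: int) -> List[float]:
    """Interpret one field according to the version recorded for that field alone."""
    try:
        rule = INTERPRETERS[typeinfo_version]
    except KeyError:
        raise ValueError(f"no interpretation rules for version {typeinfo_version}") from None
    return rule(raw)

# Two fields in the same file, written under different versions.
old_angles = read_versioned_field([180.0], typeinfo_version=1)
new_angles = read_versioned_field([math.pi], typeinfo_version=2)
```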


4. Index Space Axiom

Axiom 6: Data is organized by index spaces; index depth encodes their role

Every field in F5 is a function from an index space to a mathematical value space (the fiber). An index space is a discrete set with a defined role in the topological hierarchy. A Skeleton is the F5 representation of an index space.

The IndexDepth of an index space encodes its position in the topological hierarchy relative to vertices:
- Negative IndexDepth: entities from which vertices are derived (generators, coefficients)
- IndexDepth 0: vertices, the atomic base of spatial discretization
- Positive IndexDepth: entities composed from vertices (cells, sets of cells, etc.)

IndexDepth is not limited in magnitude in either direction. The current specification covers the most common cases, but the axiom is general.

Application rule for unknown data types: When encountering a new, previously unseen type of data, the first step is to identify its index spaces: What are the discrete sets over which this data is defined? What is the relationship of each index space to vertices? This analysis determines the IndexDepth, and from there the appropriate Skeleton structure follows.
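The application rule can be sketched as a small derivation: describe each discrete set by how it relates to another one, then count levels away from the vertices. The relation names `composed_of` and `generates` and the index-space names are hypothetical; the point is that IndexDepth follows from the relationships rather than from a lookup table of data types.

```python
from typing import Dict, Optional, Tuple

# relation: None for the vertex index space itself, otherwise (kind, other index space).
Relations = Dict[str, Optional[Tuple[str, str]]]

def index_depth(space: str, relations: Relations) -> int:
    """Derive IndexDepth relative to vertices (assumes the relations contain no cycles)."""
    relation = relations[space]
    if relation is None:
        return 0
    kind, other = relation
    if kind == "composed_of":       # built from entities of `other`: one level above
        return index_depth(other, relations) + 1
    if kind == "generates":         # `other` is derived from this space: one level below
        return index_depth(other, relations) - 1
    raise ValueError(f"unknown relation {kind!r}")

relations: Relations = {
    "Vertices": None,
    "Triangles": ("composed_of", "Vertices"),
    "Regions": ("composed_of", "Triangles"),       # sets of cells
    "Coefficients": ("generates", "Vertices"),     # vertex coordinates are derived from these
}
print({name: index_depth(name, relations) for name in relations})
```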

Theorem 6.1 (Vertices as reference): The special status of vertices (IndexDepth 0) is not axiomatic — it is a consequence of the definition of IndexDepth. Vertices are the index space at depth zero by definition; all other depths are relative to this.

Theorem 6.2 (Negative IndexDepth = procedural coordinates): If vertex coordinates are themselves derivable from data stored at IndexDepth -1 (coefficients such as spherical harmonics), then the resolution of the vertex coordinates is a reader decision rather than a writer decision. The file stores the recipe; the reader computes the result at the resolution required by the application. Negative IndexDepth can be iterated: IndexDepth -2 covers coefficients that are themselves derived from another source.
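To show what a reader-side resolution decision means, the sketch below replaces spherical harmonics by their one-dimensional analogue: Fourier coefficients of the radius r(φ) of a closed planar curve. The coefficient layout is a hypothetical stand-in for data stored at IndexDepth -1; the vertex positions at IndexDepth 0 are computed at whatever resolution the reader asks for.

```python
import math
from typing import List, Sequence, Tuple

def radius(coefficients: Sequence[Tuple[float, float]], phi: float) -> float:
    """r(phi) = a_0 + sum_k (a_k cos(k phi) + b_k sin(k phi)); b_0 is ignored."""
    a0, _b0 = coefficients[0]
    return a0 + sum(a * math.cos(k * phi) + b * math.sin(k * phi)
                    for k, (a, b) in enumerate(coefficients[1:], start=1))

def vertices(coefficients: Sequence[Tuple[float, float]], n: int) -> List[Tuple[float, float]]:
    """Reader-side evaluation: the resolution n is chosen when reading, not when writing."""
    if n < 3:
        raise ValueError("a closed curve needs at least three vertices")
    result = []
    for i in range(n):
        phi = 2.0 * math.pi * i / n
        r = radius(coefficients, phi)
        result.append((r * math.cos(phi), r * math.sin(phi)))
    return result

stored = [(1.0, 0.0), (0.0, 0.0), (0.2, 0.0)]     # the recipe: three coefficient pairs
coarse = vertices(stored, 8)                       # quick preview
fine = vertices(stored, 512)                       # same file, higher resolution
```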

On the Maximal-Depth Principle: The recommendation to assign the maximum IndexDepth a Skeleton may ever need is a practical heuristic, not an axiom. It is likely correct in most cases — adding an index space at a deeper level later without changing the existing structure is easier if the depth was pre-allocated. However, this principle is tentative and may be refined by implementation experience.


5. Chart and Type Axioms

Axiom 7: Chart component names are the axioms of the chart; all other names derive

The member names of the named HDF5 compound type for a chart are the coordinate function names xᵘ : M → ℝ. They are the starting axioms of the chart. All algebraic type names within the chart — tangential vector components, co-vector components, exterior product names, tensor component names — are derived from these by explicit rules.

This axiom is partially compromised in the current implementation (the D/d prefix and ^ separator are naming conventions), but it defines the ideal: if structural encoding of covariance and grade were available, the component names alone would suffice, without any prefixes.
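The derivation rules can be sketched as functions from the chart’s coordinate names to component names. This assumes that the D prefix marks tangential vector components and the d prefix marks co-vector components (reading the D/d convention in the order the text lists them), and that exterior products join co-vector names with ^ in the chart’s coordinate order; the exact F5 conventions are defined in the specifications.

```python
from itertools import combinations
from typing import List, Sequence

def tangential_vector_names(chart: Sequence[str]) -> List[str]:
    """Assumed convention: D + coordinate name."""
    return ["D" + c for c in chart]

def covector_names(chart: Sequence[str]) -> List[str]:
    """Assumed convention: d + coordinate name."""
    return ["d" + c for c in chart]

def exterior_product_names(chart: Sequence[str], grade: int) -> List[str]:
    """One component per increasing grade-tuple of coordinates, joined with ^."""
    return ["^".join("d" + c for c in combo) for combo in combinations(chart, grade)]

cartesian = ["x", "y", "z"]          # the chart's axioms: its coordinate function names
print(tangential_vector_names(cartesian))    # ['Dx', 'Dy', 'Dz']
print(exterior_product_names(cartesian, 2))  # ['dx^dy', 'dx^dz', 'dy^dz']
```

Because every name is computed from the chart’s members, the same functions yield `Dr`, `Dθ`, `Dφ` for a chart {r, θ, φ} without any new vocabulary.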

Theorem 7.1: The separation of chart types (defined by component names) from chart objects (specific instances with transformation rules) follows from Axiom 7 combined with Axiom F (fiber bundles). A chart type is the abstract structure; a chart object is a specific embedding in the file.

On chart types as measurement rules: A coordinate system is not merely a naming convention; it defines a measurement rule. Cartesian coordinates {x, y, z} encode the instruction “measure three orthogonal distances.” Spherical polar coordinates {r, θ, φ} encode “measure one distance and two angles, in a specific order with specific orientations.” These measurement instructions are usually implicit knowledge among practitioners of a field. F5 makes this explicit: a chart type description, storable as an attribute on the ChartDomain group, can record the measurement semantics of each coordinate in human- and machine-readable form.
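A chart type description could be serialized as JSON and stored as a single string attribute (with h5py, through a group’s `attrs` mapping). The keys and wording below are a hypothetical schema, not one defined by F5; the check illustrates how a reader could verify that every coordinate of a chart carries a measurement rule.

```python
import json
from typing import Iterable

# Hypothetical description schema; F5 does not prescribe these keys here.
spherical = {
    "chart": "Spherical",
    "coordinates": [
        {"name": "r", "measures": "distance from the origin"},
        {"name": "θ", "measures": "angle from the +z axis, 0 to pi"},
        {"name": "φ", "measures": "angle in the x-y plane from +x towards +y, -pi to pi"},
    ],
}

# Serialized once, the description fits into one string attribute.
attribute_value = json.dumps(spherical, ensure_ascii=False)

def is_self_descriptive(value: str, component_names: Iterable[str]) -> bool:
    """True if every chart component has a recorded measurement rule."""
    described = {c["name"] for c in json.loads(value)["coordinates"]}
    return set(component_names) <= described

print(is_self_descriptive(attribute_value, ["r", "θ", "φ"]))   # True
```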

A file that carries complete chart type descriptions is axiomatically self-descriptive: a reader with no prior knowledge of the coordinate system can determine how to interpret every stored value from the file itself. Self-descriptiveness is also a design goal of HDF5, which achieves it at the syntactic level (how data is laid out). F5 extends this to the semantic level (what the data means). A fully self-descriptive F5 file requires no external documentation to interpret correctly.


6. Summary: Axioms and Theorems

Axioms (foundational, not derived)

Label  Statement
F      Scientific data has the structure of a fiber bundle
1      Describe how, not what
2      Information belongs in structure, not names or enumerations
3      Keep things as simple as possible, but not simpler
4      File identity is semantically irrelevant
5      Forward and backward compatibility
6      Data is organized by index spaces; IndexDepth encodes their role
7      Chart component names are the axioms of the chart

Hypothesis (confirmed but potentially falsifiable)

Label  Statement
L      Five (+Two) levels of hierarchy are necessary and sufficient for scientific simulation data

Selected Theorems (derived from axioms)

Label  Derived from  Statement
F.1    F             Topology and geometry must be stored independently (Skeleton/Representation split)
F.2    F             A Skeleton can have multiple Representations simultaneously
F.3    F             Algebraic type (covariance, grade) must be stored for transformable data
1.1    1             No enumeration of cell or field types is defined in the core model
1.2    1             Forward compatibility follows from describing properties, not categories
2.1    2             Structural HDF5 features that encode what naming conventions encode SHOULD be adopted
2.2    2             Positions is the minimal necessary exception to the no-reserved-names principle
3.1    3             The +2 optional levels are user-invisible performance mechanisms
3.2    3             Absence of an attribute is informative; the simplest interpretation applies
4.1    4             The +2 levels must be restructurable without semantic consequence
4.2    4             Named types must be locally available in any file that uses them
4.3    4             Performance enhancements orthogonal to core semantics may be added freely
5.1    5             Versioning must be per-field, not per-file
5.2    5             Version differences are not incompatibilities
6.1    6             Vertices are the reference index space (depth zero) by definition
6.2    6             Negative IndexDepth encodes procedural/derived coordinates
7.1    7, F          Chart types vs. chart objects distinction follows from axioms

7. Literature

The fiber bundle foundation:
- Butler, D.M., Pendley, M.H.: “A visualization model based on the mathematics of fiber bundles.” Computers in Physics 3(5), pp. 45–51 (1989).

The F5 data model and the five-level hypothesis:
- Benger, W.: “Visualization of General Relativistic Tensor Fields via a Fiber Bundle Data Model.” PhD thesis, FU Berlin / ZIB, August 2004. https://www.fiberbundle.net/papers/TensorFieldViz.pdf

The algebraic type system:
- Benger, W., et al.: “Using Geometric Algebra for Navigation in Riemannian and Hard Disc Space.” GraVisMa 2009, Plzen, pp. 80–89. http://gravisma.zcu.cz/GraVisMa-2009/Papers_2009/!_2009_GraVisMa_proceedings-FINAL.pdf

The “how not what” philosophy:
- Benger, W.: “On Safari in the File Format Jungle — Why Can’t You Visualize My Data?” IEEE Computing in Science & Engineering, Nov/Dec 2009, pp. 98–103.


Usage Restriction

This specification and all documents in the F5 specification series are published for academic and personal use. Use of this specification, or any implementation derived from it, by employees of military or defense-related organizations, or within facilities producing weapons or conducting research on weapon design, or for any other military purpose, is explicitly prohibited and contrary to the intent of the authors.

This restriction follows the spirit and terms of the light++ license under which the original F5 reference implementation was published. The rationale is stated there directly: software is technology, technology conveys power, and the inventor bears responsibility for deciding to whom that power is granted. This specification was developed to advance scientific understanding — not to enable harm.

See: https://www.fiberbundle.net/doc/copyright.html


End of Document