Research data vary widely between academic disciplines, and so do the available data types and formats. Careful format selection helps keep research data reusable even years later.
When selecting a data format, consider the following criteria:
- The format should be sustainable. The more software supports a format, the more sustainable it is.
- The format description and the software documentation should be openly available, so that new software can be written to read the data even decades later.
- The format should be usable without legal restrictions such as licences or patents.
- The format should be readable without technical restrictions such as encryption or digital rights management (DRM).
- The format should contain only relevant information and avoid unnecessary formatting information, such as font sizes in tables.
- The format should be well-established in your community.
We recommend the following data formats:
| Data type | Recommended formats |
|---|---|
| Text | PDF (ideally PDF/A); unformatted text: TXT; for editability: ODT, RTF, HTML; with formulae: LaTeX (TEX) |
| Tables | CSV / TSV; numerical data: HDF5 |
| Images | raster graphics: PNG, TIFF (baseline); vector graphics: SVG, PDF (ideally PDF/A) |
| Audio and video | container: MKV, WebM, Ogg; video codec: VP8, Theora; audio codec: FLAC, WAV (PCM data), Vorbis, Opus |
| Databases | SQL dump, XML; see also the table formats |
| Structured data | XML, JSON, YAML |
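As a minimal sketch of moving data into two of the recommended open formats, the following Python snippet uses only the standard library (`csv`, `json`) to serialize the same records as CSV, TSV and JSON. The records, field names and helper functions (`to_delimited`, `to_json`) are hypothetical placeholders for illustration. The output contains only the data values, with no layout information such as font sizes.

```python
import csv
import io
import json

# Hypothetical example records; field names and values are placeholders only.
records = [
    {"sample_id": "S1", "temperature_c": 21.5, "ph": 7.1},
    {"sample_id": "S2", "temperature_c": 22.0, "ph": 6.8},
]


def to_delimited(rows, delimiter=","):
    """Serialize a list of dicts as CSV (default) or as TSV when delimiter is a tab."""
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf,
        fieldnames=list(rows[0].keys()),  # header row taken from the first record
        delimiter=delimiter,
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


def to_json(rows):
    """Serialize the same records as JSON, keeping non-ASCII characters as-is."""
    return json.dumps(rows, indent=2, ensure_ascii=False)


if __name__ == "__main__":
    print(to_delimited(records))        # CSV
    print(to_delimited(records, "\t"))  # TSV
    print(to_json(records))             # JSON
```

CSV and TSV store every value as plain text, so units and data types are best documented separately (here, the unit is encoded in the hypothetical column name `temperature_c`). To write such a file to disk with the `csv` module, open it with `newline=""` and an explicit encoding such as `encoding="utf-8"`.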