Adequate formats for preservation
To enable the long-term preservation of content, it is advisable to use formats adequate for preservation.
Format list
For each type of media, formats are classified into three levels of confidence for preservation: High, Medium and Low.
What are adequate formats for preservation?
It is impossible to predict which formats will be used in the future. However, some current formats have characteristics that facilitate their long-term preservation.
Authors should publish information in formats adequate for preservation, or provide alternative versions of the published content in these formats. A format adequate for preservation is:
- Freely available, without legal rights that restrict its use;
- A standard issued by an official organization (e.g. W3C);
- Openly documented through a public and free specification;
- Widely used;
- Read and written by several software platforms, including open-source technologies;
- Not compressed, or compressed without information loss.
A format inadequate for preservation is:
- Proprietary and with a closed specification;
- Narrowly used;
- Read and written exclusively through closed-source proprietary platforms;
- Compressed with information loss;
- Composed of embedded elements, such as macros.
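The two lists above can be read as a checklist. The following Python sketch records the criteria for a single format and returns the characteristics that count against its preservation. The `FormatProfile` class and `preservation_concerns` function are hypothetical names introduced for illustration, and the example profile values are assumptions, not an authoritative assessment of any format.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FormatProfile:
    """Hypothetical record of one format against the preservation criteria."""
    name: str
    free_of_legal_restrictions: bool
    official_standard: bool
    public_specification: bool
    widely_used: bool
    open_source_support: bool
    lossy_compression: bool
    embedded_elements: bool  # e.g. macros


def preservation_concerns(fmt: FormatProfile) -> List[str]:
    """Return the characteristics of `fmt` that count against long-term preservation."""
    checks = [
        (not fmt.free_of_legal_restrictions, "legal rights restrict its use"),
        (not fmt.official_standard, "not a standard issued by an official organization"),
        (not fmt.public_specification, "closed specification"),
        (not fmt.widely_used, "narrowly used"),
        (not fmt.open_source_support, "no open-source readers or writers"),
        (fmt.lossy_compression, "compressed with information loss"),
        (fmt.embedded_elements, "contains embedded elements such as macros"),
    ]
    # Keep only the reasons whose condition is true.
    return [reason for failed, reason in checks if failed]


# Illustrative profiles only; the values are assumptions made for this example.
png = FormatProfile("PNG", True, True, True, True, True, False, False)
macro_doc = FormatProfile("macro-enabled document", True, False, False, True, True, False, True)
print(preservation_concerns(png))  # []
print(preservation_concerns(macro_doc))
```

An empty list only means that none of the listed warning signs apply; the confidence levels in the lists that follow still rest on expert judgment.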
Text
High confidence for preservation
- HTML, XHTML or XML, with included or accessible schema and character encoding explicitly specified (.html, .xhtml, .xml)
- Plain text using charset encoding UTF-8, US-ASCII or UTF-16 with Byte Order Mark (.txt)
- PDF/A-1 according to standard ISO 19005-1 (.pdf)
- Open Document Text (.odt)
Medium confidence for preservation
- HTML, XHTML or XML, without an included or accessible schema or an explicitly specified character encoding (.html, .xhtml, .xml)
- Cascading Style Sheets (.css)
- Plain text using charset encoding ISO-8859-1 (.txt)
- PDF with embedded fonts (.pdf)
- Rich Text Format 1.x (.rtf)
- HTML 4.x including DOCTYPE declaration (.html)
- OpenOffice Text Document (.sxw)
- Office Open XML (.docx)
- DTD (.dtd)
- SGML (.sgml)
Low confidence for preservation
- Microsoft Word (.doc)
- PostScript (.ps)
- Encrypted PDF (.pdf)
- WordPerfect (.wpd)
- DVI (.dvi)
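Several entries in the text lists depend on the character encoding rather than on the file extension. The Python sketch below shows one way to check this: it trusts a Byte Order Mark when present, otherwise tries strict US-ASCII and then UTF-8 decoding, reads the `encoding` attribute of an XML declaration, and re-encodes ISO-8859-1 text as UTF-8. The function names are hypothetical, and the detection is a heuristic: bytes that fail UTF-8 decoding are only possibly ISO-8859-1, since any byte sequence decodes as ISO-8859-1.

```python
import codecs
import re
from typing import Optional

# Optional UTF-8 BOM, then an XML declaration carrying an encoding="..." attribute.
_XML_DECL = re.compile(
    rb'(?:\xef\xbb\xbf)?<\?xml[^>]*?\sencoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)


def classify_plain_text(data: bytes) -> str:
    """Heuristically name the charset of a plain-text file."""
    # Check UTF-32 first: its little-endian BOM begins with the UTF-16 LE BOM.
    if data.startswith((codecs.BOM_UTF32_LE, codecs.BOM_UTF32_BE)):
        return "UTF-32 with BOM"
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "UTF-16 with BOM"
    if data.startswith(codecs.BOM_UTF8):
        return "UTF-8 with BOM"
    # Without a BOM, try the strictest decoder first.
    for name, codec in (("US-ASCII", "ascii"), ("UTF-8", "utf-8")):
        try:
            data.decode(codec)
            return name
        except UnicodeDecodeError:
            pass
    return "unknown (possibly ISO-8859-1)"


def xml_declared_encoding(data: bytes) -> Optional[str]:
    """Return the encoding named in the XML declaration, or None if there is none."""
    match = _XML_DECL.match(data)
    return match.group(1).decode("ascii") if match else None


def latin1_to_utf8(data: bytes) -> bytes:
    """Re-encode ISO-8859-1 text as UTF-8 without a BOM."""
    return data.decode("iso-8859-1").encode("utf-8")
```

For HTML, the equivalent check would look for a `<meta charset>` declaration, which this sketch does not cover.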
Image
High confidence for preservation
- PNG (.png)
- JPEG2000 compressed without information loss (.jp2)
- TIFF without compression (.tiff)
- SVG (.svg)
Medium confidence for preservation
- JPEG2000 compressed with information loss (.jp2)
- GIF (.gif)
- JPEG/JFIF (.jpg)
- TIFF compressed (.tiff)
- BMP (.bmp)
- Digital Negative (.dng)
- Computer Graphics Metafile and WebCGM (.cgm)
Low confidence for preservation
- Macromedia Flash (.swf)
- Photoshop (.psd)
- JPEG 2000 Part 2 (.jpf, .jpx)
- MrSID (.sid)
- TIFF in Planar format (.tiff)
- FlashPix (.fpx)
- RAW
- Encapsulated PostScript (.eps)
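Whether a TIFF file belongs in the High, Medium or Low list depends on tags inside the file, not on its extension. The Python sketch below reads two baseline TIFF tags from the first image file directory (IFD): Compression (tag 259, where 1 means no compression) and PlanarConfiguration (tag 284, where 2 means planar), falling back to the TIFF 6.0 default of 1 when a tag is absent. It also checks the fixed 8-byte PNG signature. The function names and the mapping to confidence levels are assumptions for illustration; BigTIFF is rejected and only the first IFD of a multi-page file is examined.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def is_png(data: bytes) -> bool:
    """PNG files always begin with the same 8-byte signature."""
    return data.startswith(PNG_SIGNATURE)


def tiff_short_tag(data: bytes, tag: int, default: int) -> int:
    """Return a single SHORT-valued tag from the first IFD of a classic TIFF file."""
    if data[:2] == b"II":
        endian = "<"  # little-endian byte order
    elif data[:2] == b"MM":
        endian = ">"  # big-endian byte order
    else:
        raise ValueError("not a TIFF file")
    magic, ifd_offset = struct.unpack(endian + "HI", data[2:8])
    if magic != 42:
        raise ValueError("not a classic TIFF file (BigTIFF is not handled)")
    (entry_count,) = struct.unpack(endian + "H", data[ifd_offset:ifd_offset + 2])
    for i in range(entry_count):
        entry = ifd_offset + 2 + 12 * i  # each IFD entry is 12 bytes long
        entry_tag, field_type, _count = struct.unpack(endian + "HHI", data[entry:entry + 8])
        if entry_tag == tag:
            if field_type != 3:
                raise ValueError("expected a SHORT value")
            # A single SHORT is stored left-justified in the 4-byte value field.
            (value,) = struct.unpack(endian + "H", data[entry + 8:entry + 10])
            return value
    return default


def tiff_confidence(data: bytes) -> str:
    """Map the TIFF entries of the image lists to a confidence level."""
    if tiff_short_tag(data, 284, default=1) == 2:
        return "Low"  # TIFF in planar format
    if tiff_short_tag(data, 259, default=1) == 1:
        return "High"  # TIFF without compression
    return "Medium"  # TIFF compressed
```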
Audio
High confidence for preservation
- AIFF with Pulse-code modulation (.aif, .aiff)
- WAV with Pulse-code modulation (.wav, .bwf)
- Ogg Vorbis (.ogg, .oga)
Medium confidence for preservation
- MP3 (MPEG-1/2, Layer 3) (.mp3)
- Free Lossless Audio Codec (.flac)
- SUN Audio uncompressed (.au)
- Standard MIDI (.mid, .midi)
- Advanced Audio Coding (.mp4, .m4a, .aac)
Low confidence for preservation
- RealNetworks 'Real Audio' (.ra, .rm, .ram)
- Windows Media Audio (.wma)
- WAV compressed (.wav)
- AIFC compressed (.aifc)
- NeXT SND (.snd)
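The audio lists separate WAV with pulse-code modulation (High) from compressed WAV (Low), even though both use the .wav extension. In a RIFF/WAVE file the difference is recorded in the format code of the `fmt ` chunk: 0x0001 is PCM, while codes such as 0x0002 (Microsoft ADPCM) denote compressed data. The Python sketch below reads that code, following the WAVE_FORMAT_EXTENSIBLE (0xFFFE) indirection to the sub-format GUID. The function names are hypothetical, and only the first `fmt ` chunk is considered.

```python
import struct

WAVE_FORMAT_PCM = 0x0001
WAVE_FORMAT_EXTENSIBLE = 0xFFFE


def wav_format_tag(data: bytes) -> int:
    """Return the audio format code stored in the 'fmt ' chunk of a RIFF/WAVE file."""
    if data[:4] != b"RIFF" or data[8:12] != b"WAVE":
        raise ValueError("not a RIFF/WAVE file")
    pos = 12  # first chunk after the RIFF header
    while pos + 8 <= len(data):
        chunk_id = data[pos:pos + 4]
        (size,) = struct.unpack("<I", data[pos + 4:pos + 8])
        body = data[pos + 8:pos + 8 + size]
        if chunk_id == b"fmt ":
            (tag,) = struct.unpack("<H", body[:2])
            if tag == WAVE_FORMAT_EXTENSIBLE and len(body) >= 40:
                # The SubFormat GUID starts at byte 24; its first two bytes hold the real code.
                (tag,) = struct.unpack("<H", body[24:26])
            return tag
        pos += 8 + size + (size % 2)  # chunk bodies are padded to an even length
    raise ValueError("no 'fmt ' chunk found")


def is_pcm_wav(data: bytes) -> bool:
    """True when the WAV file stores uncompressed pulse-code modulation."""
    return wav_format_tag(data) == WAVE_FORMAT_PCM
```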
Video
High confidence for preservation
- QuickTime Movie uncompressed (.mov)
- AVI uncompressed (.avi)
- Motion JPEG 2000 (ISO/IEC 15444-4) (.mj2)
- Motion JPEG (.avi, .mov)
Medium confidence for preservation
- MPEG-1, MPEG-2 (.mpg, .mpeg)
- MPEG-4 (.mp4)
- Ogg Theora (.ogg, .ogm, .ogv)
Low confidence for preservation
- Windows Media Video (.wmv)
- AVI compressed (.avi)
- QuickTime Movie compressed (.mov)
- RealNetworks 'Real Video' (.rv, .rm)
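Several video entries share a container (.avi and .mov appear in both the High and Low lists), so identifying the container is only a first step: whether the content is uncompressed depends on the codec declared inside it. The Python sketch below identifies a few of the listed containers from their leading bytes. The function name is hypothetical, the signature table is deliberately incomplete (for instance, older QuickTime files without an `ftyp` box are reported as unknown), and codecs are not inspected.

```python
# Leading bytes of a few container formats mentioned in the video lists.
ASF_HEADER_GUID = bytes.fromhex("3026b2758e66cf11a6d900aa0062ce6c")  # Windows Media (.wmv)
JP2_SIGNATURE_BOX = bytes.fromhex("0000000c6a5020200d0a870a")  # JPEG 2000 family, incl. .mj2


def sniff_video_container(data: bytes) -> str:
    """Name the container of a video file from its first bytes; codecs are not examined."""
    if data[:4] == b"RIFF" and data[8:12] == b"AVI ":
        return "AVI"
    if data[:4] == b"OggS":
        return "Ogg"
    if data[:16] == ASF_HEADER_GUID:
        return "ASF (Windows Media)"
    if data[:4] == b".RMF":
        return "RealMedia"
    # MPEG program stream pack header, or MPEG video sequence header.
    if data[:4] in (b"\x00\x00\x01\xba", b"\x00\x00\x01\xb3"):
        return "MPEG-1/MPEG-2 stream"
    if data[:12] == JP2_SIGNATURE_BOX:
        return "JPEG 2000 family (e.g. Motion JPEG 2000)"
    if data[4:8] == b"ftyp":
        brand = data[8:12]
        if brand == b"qt  ":
            return "QuickTime"
        return "ISO base media (e.g. MPEG-4), brand " + brand.decode("latin-1").strip()
    return "unknown"
```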
Other formats
High confidence for preservation
- Comma Separated Values (.csv)
- SQL DDL
Medium confidence for preservation
- OpenOffice (.sxc/.ods, .sxi/.odp)
- OOXML according to the ISO/IEC DIS 29500 standard (.xlsx, .pptx)
Low confidence for preservation
- Microsoft Excel (.xls)
- Microsoft PowerPoint (.ppt)
- Microsoft Access (.mdb)
- Microsoft Visio (.vsd)
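As the lists show, a file extension alone does not always determine the confidence level: .pdf, .tiff, .wav, .avi and .mov each appear at more than one level. The Python sketch below indexes a small excerpt of the lists by extension and returns a level only when the extension is unambiguous. The table is a partial, hand-copied excerpt and the function names are hypothetical; a practical tool would combine this lookup with inspection of the file content.

```python
import os
from collections import defaultdict
from typing import Dict, Iterable, Optional, Set, Tuple

# Partial excerpt of the lists: (extension, format, confidence level).
EXCERPT = [
    (".txt", "Plain text, UTF-8 / US-ASCII / UTF-16 with BOM", "High"),
    (".txt", "Plain text, ISO-8859-1", "Medium"),
    (".pdf", "PDF/A-1 (ISO 19005-1)", "High"),
    (".pdf", "PDF with embedded fonts", "Medium"),
    (".pdf", "Encrypted PDF", "Low"),
    (".png", "PNG", "High"),
    (".tiff", "TIFF without compression", "High"),
    (".tiff", "TIFF compressed", "Medium"),
    (".tiff", "TIFF in planar format", "Low"),
    (".wav", "WAV with pulse-code modulation", "High"),
    (".wav", "WAV compressed", "Low"),
    (".csv", "Comma Separated Values", "High"),
    (".doc", "Microsoft Word", "Low"),
]


def index_by_extension(rows: Iterable[Tuple[str, str, str]]) -> Dict[str, Set[str]]:
    """Collect every confidence level listed for each extension."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for ext, _fmt, level in rows:
        index[ext].add(level)
    return index


def confidence_from_extension(filename: str, index: Dict[str, Set[str]]) -> Optional[str]:
    """Return the level if the extension maps to exactly one; otherwise None."""
    ext = os.path.splitext(filename)[1].lower()
    levels = index.get(ext, set())
    return next(iter(levels)) if len(levels) == 1 else None
```

Returning `None` for ambiguous or unknown extensions, rather than guessing the most common level, forces the caller to fall back to inspecting the file itself.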
The classification presented here is not a consensus within the scientific community. For a deeper analysis, we recommend the following bibliography.
Bibliography
- Florida Digital Archive, Recommended Data Formats for Preservation Purposes in the Florida Digital Archive, 2008.
- IDEALS (Illinois Digital Environment for Access to Learning and Scholarship), IDEALS Digital Preservation: Current Status and Future Directions (format matrix), 2006.
- Smithsonian Institution Archives, Recommendations for converting original to preservation formats, 2004.
- Sunita Barve, File Formats in Digital Preservation, 2007.
- The National Archives, Digital Preservation Guidance Note 1: Selecting file formats for long-term preservation, 2008.
- Library of Congress, Introduction to Digital Formats for Library of Congress Collections, 2007.
- Steen S. Christensen, Archival data format requirements, 2004.
- Miguel Ferreira, Introdução à preservação digital, 2006.