
Adequate formats for preservation

To enable the long-term preservation of contents, it is advisable to use adequate formats.

Format list

Formats are classified into three levels of confidence for preservation (High, Medium and Low) for each type of media: text, image, audio, video and other formats.

What are adequate formats for preservation?

It is impossible to guess which formats will be used in the future. However, there are current formats with characteristics that will facilitate their long-term preservation.

Authors should publish information using adequate formats for preservation or present alternative versions of the published contents in these formats. An adequate format for preservation is:

  • Freely available, without legal rights that restrict its use;
  • A standard issued by an official organization (e.g. W3C);
  • Openly documented through a public and free specification;
  • Widely used;
  • Read and written by several software platforms, including open-source technologies;
  • Not compressed or compressed without information loss.
On the other hand, a format with weak characteristics for preservation is:

  • Proprietary and with a closed specification;
  • Narrowly used;
  • Read and written exclusively through closed-source proprietary platforms;
  • Compressed with information loss;
  • Composed of embedded elements, such as macros.
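
The six characteristics of an adequate format can be read as a checklist. The sketch below is only an illustration of that reading: the class `FormatTraits`, its field names and the function `unmet_criteria` are hypothetical names introduced here, and the answers for any real format have to come from the person assessing it.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class FormatTraits:
    """One boolean per characteristic of an adequate format (hypothetical model)."""
    freely_available: bool          # no legal rights restricting use
    official_standard: bool         # issued by an official organization
    openly_documented: bool         # public and free specification
    widely_used: bool
    multi_platform_support: bool    # several platforms, including open source
    lossless_or_uncompressed: bool  # no information lost by compression


def unmet_criteria(traits: FormatTraits) -> list:
    """Return the names of the characteristics the format does not satisfy."""
    return [f.name for f in fields(traits) if not getattr(traits, f.name)]


# Illustrative assessment of an imaginary format that uses lossy compression.
example = FormatTraits(True, True, True, True, True, False)
print(unmet_criteria(example))
```

An empty result means the format meets every characteristic listed above; it does not by itself assign one of the three confidence levels.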

Text

High confidence for preservation

  • HTML, XHTML or XML, with included or accessible schema and character encoding explicitly specified (.html, .xhtml, .xml)
  • Plain text using charset encoding UTF-8, US-ASCII or UTF-16 with Byte Order Mark (.txt)
  • PDF/A-1 according to standard ISO 19005-1 (.pdf)
  • Open Document Text (.odt)
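
For plain text, the encoding requirement can be checked mechanically. The following is a minimal sketch, assuming the file contents are already available as bytes; the function name `plain_text_encoding` and the returned labels are hypothetical. It is a heuristic: bytes that decode cleanly in one of these encodings are not proof that the author intended that encoding.

```python
import codecs
from typing import Optional


def plain_text_encoding(data: bytes) -> Optional[str]:
    """Return which high-confidence encoding `data` matches, or None."""
    # UTF-16 only qualifies when the Byte Order Mark is present.
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        try:
            data.decode("utf-16")  # the codec reads the BOM to pick the byte order
            return "UTF-16 with BOM"
        except UnicodeDecodeError:
            return None
    # US-ASCII is a strict subset of UTF-8, so test it first.
    try:
        data.decode("ascii")
        return "US-ASCII"
    except UnicodeDecodeError:
        pass
    try:
        data.decode("utf-8")
        return "UTF-8"
    except UnicodeDecodeError:
        return None
```

A result of `None` means the bytes match none of the three encodings; such a file may use ISO-8859-1, but this sketch does not try to confirm that.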

Medium confidence for preservation

  • HTML, XHTML or XML, without included or accessible schema and character encoding (.html, .xhtml, .xml)
  • Cascading Style Sheets (.css)
  • Plain text using charset encoding ISO-8859-1 (.txt)
  • PDF with embedded fonts (.pdf)
  • Rich Text Format 1.x (.rtf)
  • HTML 4.x including DOCTYPE declaration (.html)
  • Open Office Text Document (.sxw)
  • Office Open XML (.docx)
  • DTD (.dtd)
  • SGML (.sgml)

Low confidence for preservation

  • Microsoft Word (.doc)
  • Postscript (.ps)  
  • PDF encrypted (.pdf)
  • WordPerfect (.wpd)
  • DVI (.dvi)

Image

High confidence for preservation

  • PNG (.png)
  • JPEG 2000 compressed without information loss (.jp2)
  • TIFF without compression (.tiff)
  • SVG (.svg)

Medium confidence for preservation

  • JPEG 2000 compressed with information loss (.jp2)
  • GIF (.gif)
  • JPEG/JFIF (.jpg)
  • TIFF compressed (.tiff)
  • BMP (.bmp)
  • Digital Negative (.dng)
  • Computer Graphics Metafile and WebCGM (.cgm)

Low confidence for preservation

  • Macromedia Flash (.swf)
  • PhotoShop (.psd)
  • JPEG 2000 Part 2 (.jpf, .jpx)
  • MrSID (.sid)
  • TIFF in Planar format (.tiff)
  • FlashPix (.fpx)
  • RAW
  • Encapsulated Postscript (.eps)

Audio

High confidence for preservation

  • AIFF with Pulse-code modulation (.aif, .aiff)
  • WAV with Pulse-code modulation (.wav, .bwf)
  • Ogg Vorbis (.ogg, .oga)
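
Since the .wav extension covers both PCM and compressed audio, the extension alone does not tell which level a file belongs to. As a rough sketch, the format tag stored in the `fmt ` chunk of the RIFF header can be inspected; a value of 1 denotes PCM. The function names are hypothetical, and files that use the extensible format tag (0xFFFE) are reported as non-PCM here even though they may carry PCM data.

```python
import struct
from typing import Optional

WAVE_FORMAT_PCM = 0x0001  # format tag used for uncompressed PCM audio


def wav_format_tag(data: bytes) -> Optional[int]:
    """Return the format tag of a RIFF/WAVE file, or None if it cannot be found."""
    if len(data) < 12 or data[0:4] != b"RIFF" or data[8:12] != b"WAVE":
        return None
    pos = 12
    # Walk the chunk list: 4-byte id, 4-byte little-endian size, then payload.
    while pos + 8 <= len(data):
        chunk_id, size = struct.unpack_from("<4sI", data, pos)
        if chunk_id == b"fmt ":
            if pos + 10 > len(data):
                return None  # truncated chunk
            return struct.unpack_from("<H", data, pos + 8)[0]
        pos += 8 + size + (size & 1)  # payloads are padded to an even length
    return None


def is_pcm_wav(data: bytes) -> bool:
    """True when the file declares plain PCM audio in its 'fmt ' chunk."""
    return wav_format_tag(data) == WAVE_FORMAT_PCM
```

For example, `is_pcm_wav(open("recording.wav", "rb").read())` would check a file on disk, where `recording.wav` is a placeholder name.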

Medium confidence for preservation

  • MP3 (MPEG-1/2, Layer 3) (.mp3)
  • Free Lossless Audio Codec (.flac)
  • SUN Audio uncompressed (.au)
  • Standard MIDI (.mid, .midi)
  • Advanced Audio Coding (.mp4, .m4a, .aac)

Low confidence for preservation

  • RealNetworks 'Real Audio' (.ra, .rm, .ram)
  • Windows Media Audio (.wma)
  • WAV compressed (.wav)
  • AIFC compressed (.aifc)
  • NeXT SND (.snd)

Video

High confidence for preservation

  • QuickTime Movie uncompressed (.mov)
  • AVI uncompressed (.avi)
  • Motion JPEG 2000 (ISO/IEC 15444-4) (.mj2)
  • Motion JPEG (.avi, .mov)

Medium confidence for preservation

  • MPEG-1, MPEG-2 (.mpg, .mpeg)
  • MPEG-4 (.mp4)
  • Ogg Theora (.ogg, .ogm, .ogv)

Low confidence for preservation

  • Windows Media Video (.wmv)
  • AVI compressed (.avi)
  • QuickTime Movie compressed (.mov)
  • RealNetworks 'Real Video' (.rv, .rm)

Other formats

High confidence for preservation

  • Comma Separated Values (.csv)
  • SQL DDL

Medium confidence for preservation

  • OpenOffice (.sxc/.ods, .sxi/.odp)
  • OOXML according to standard ISO/IEC DIS 29500 (.xlsx, .pptx)

Low confidence for preservation

  • Microsoft Excel (.xls)
  • Microsoft PowerPoint (.ppt)
  • Microsoft Access (.mdb)
  • Microsoft Visio (.vsd)
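
The lists above can be turned into a simple lookup for screening published files. The sketch below transcribes only a handful of entries. The table name `LEVELS_BY_EXTENSION` and the function `possible_levels` are hypothetical. Several extensions map to more than one level, because the level depends on how the file was produced (for instance PDF/A-1 versus encrypted PDF) and not on its name.

```python
from pathlib import PurePath

# Extension -> confidence levels it may fall under, per the lists above.
LEVELS_BY_EXTENSION = {
    ".odt": {"High"},
    ".png": {"High"},
    ".svg": {"High"},
    ".csv": {"High"},
    ".css": {"Medium"},
    ".gif": {"Medium"},
    ".jpg": {"Medium"},
    ".mp3": {"Medium"},
    ".flac": {"Medium"},
    ".docx": {"Medium"},
    ".doc": {"Low"},
    ".swf": {"Low"},
    ".psd": {"Low"},
    ".wma": {"Low"},
    ".xls": {"Low"},
    ".html": {"High", "Medium"},         # with or without schema and encoding
    ".pdf": {"High", "Medium", "Low"},   # PDF/A-1, embedded fonts, encrypted
    ".tiff": {"High", "Medium", "Low"},  # uncompressed, compressed, planar
    ".wav": {"High", "Low"},             # PCM or compressed
    ".avi": {"High", "Low"},             # uncompressed/Motion JPEG or compressed
    ".mov": {"High", "Low"},             # uncompressed/Motion JPEG or compressed
}


def possible_levels(filename: str) -> set:
    """Return the confidence levels a file with this name could have."""
    ext = PurePath(filename).suffix.lower()
    # Copy the set so callers cannot modify the table by accident.
    return set(LEVELS_BY_EXTENSION.get(ext, ()))
```

An empty set means the extension is not in this partial table, not that the format is unsuitable for preservation.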

The classification presented here is not consensual among the scientific community. Therefore, for a deeper analysis, we recommend the following bibliography.

Bibliography
