Supplying historical Portuguese web content

Anyone can contribute to preserving the Portuguese web by supplying historical content.

The Portuguese Web Archive (PWA) has been crawling the Portuguese web periodically since January 2008.

However, content published before that date must be gathered from external sources before it can be archived in our system.

If you have web content of interest to the Portuguese community and want to contribute to its preservation, please contact us.

We consider all content published on sites under the .PT domain to be part of the Portuguese web, and it must be preserved. However, content hosted under other domains that is of interest to the Portuguese community will also be accepted.

Should I supply only old content?

We are interested in receiving content that is no longer available online, regardless of its publication date.

The web is extremely dynamic, and most content has a very short lifetime.

Thus, much content is lost because it becomes unavailable before we can gather it, even though we crawl the Portuguese web periodically.

Backups of Portuguese web sites are a good example of content that may be provided to the PWA.

How can I supply content?

The PWA system stores archived content in the ARC format. Ideally, content should be supplied in this format.

However, most people naturally do not keep their files in this format. Therefore, we accept Portuguese web content in any format.

Later, the Portuguese Web Archive team will convert it to the ARC format so that it can be integrated into our system.
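For contributors curious about what the ARC format stores, the following Python sketch builds a single record. It assumes the ARC version 1 layout, in which each record starts with a header line of five space-separated fields (URL, IP address, archive date as YYYYMMDDhhmmss, media type, content length in bytes) followed by the content itself. The function name arc_url_record and the example values are hypothetical, this is not the conversion tool used by the PWA team, and a complete ARC file also needs a leading version block that is not shown here.

```python
from datetime import datetime


def arc_url_record(url, ip_address, archive_date, content_type, payload):
    """Build one ARC v1 style record: a header line followed by the content bytes.

    Hypothetical helper for illustration only; assumes the five-field
    ARC v1 header layout described in the text above.
    """
    header = " ".join([
        url,
        ip_address,
        archive_date.strftime("%Y%m%d%H%M%S"),  # 14-digit timestamp
        content_type,
        str(len(payload)),                      # content length in bytes
    ])
    # The header line ends with a newline, and a newline separates records.
    return header.encode("utf-8") + b"\n" + payload + b"\n"


if __name__ == "__main__":
    record = arc_url_record(
        "http://www.example.pt/",        # placeholder URL
        "0.0.0.0",                       # placeholder when the IP is unknown
        datetime(2005, 3, 1, 12, 0, 0),  # placeholder date
        "text/html",
        b"<html></html>",
    )
    print(record.decode("utf-8"))
```

In practice the content part usually includes the HTTP response headers as well as the body, which is one reason the metadata requested below is useful.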

To facilitate the conversion, we would appreciate receiving as much metadata as possible along with the content, especially:

  • the web site address(es). If there are several web sites, please place the content belonging to each one in a separate directory;
  • the content addresses (URLs). If you are providing a local copy of a site, please keep the original file names. If you are supplying content that you gathered from the web, please provide the original URLs;
  • the content dates. Supply the date when each item was published or saved. If you do not know the exact dates, please supply approximate ones;
  • the content media type (MIME). Please keep the original file name extensions (e.g. .gif, .html, .jpg). If possible, provide the full HTTP header for each item. Providing the media type is particularly important for dynamically generated content whose names have no file name extension.
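One possible way to collect the metadata listed above is a small manifest file placed next to each site directory. The Python sketch below is only an illustration: the PWA does not prescribe a manifest format, and the function names, the CSV column names, and the example address are assumptions. It derives each URL from the original file name, guesses the media type from the file name extension, and uses the file modification time as an approximate date.

```python
import csv
import mimetypes
from datetime import datetime, timezone
from pathlib import Path
from urllib.parse import quote

FIELDS = ["url", "date", "media_type", "file"]  # hypothetical column names


def build_manifest(site_dir, site_url):
    """Return one metadata row per file found under site_dir."""
    root = Path(site_dir)
    rows = []
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        relative = path.relative_to(root).as_posix()
        # Guess the media type from the original file name extension.
        media_type, _ = mimetypes.guess_type(path.name)
        # The modification time only approximates the publication or save date.
        saved = datetime.fromtimestamp(path.stat().st_mtime, tz=timezone.utc)
        rows.append({
            "url": site_url.rstrip("/") + "/" + quote(relative),
            "date": saved.strftime("%Y-%m-%d"),
            "media_type": media_type or "unknown",
            "file": relative,
        })
    return rows


def write_manifest(rows, out_path):
    """Write the rows to a CSV file with a header line."""
    with open(out_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(rows)


if __name__ == "__main__":
    # Placeholder directory and address; replace with your own.
    write_manifest(build_manifest("backup/www.example.pt", "http://www.example.pt/"),
                   "manifest.csv")
```

Files without an extension are marked "unknown", which makes it easy to spot the dynamically generated items whose media type has to be filled in by hand.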

Do not hesitate to contact us

Supplying and integrating historical web content is a complex task.

Please do not hesitate to contact us if you have any questions.

Contributors list

We express our gratitude to the following entities for supplying content to the Portuguese Web Archive:

  • FCCN - Fundação para a Computação Científica Nacional
  • UMIC - Agência para a Sociedade do Conhecimento
  • POSC - Programa Operacional Sociedade do Conhecimento
  • UE - União Europeia - FEDER - Fundo Europeu de Desenvolvimento Regional