Portuguese Web Archive
The Portuguese Web Archive (PWA) is a National Foundation for Scientific Computing (FCCN) project whose main objective is to preserve the information published on the Portuguese Web.
Only 20% of URLs still reference valid content after one year (Ntoulas, 2004). That is, 8 in 10 of the pages that you add to your browser favorites will be lost within a year.
The Web allows people in general to make information available to everyone without having to resort to publishers and traditional printing channels. Millions of items such as texts, photographs and videos are published on the web every day. The amount of information that is published solely on the web has grown dramatically over the past few years. However, not long after it has been published, a large amount of this information ceases to be available online and is irrevocably lost.
If we wish future generations to have access to this information, it is important to archive and preserve what is published on the web.
The present National Foundation for Scientific Computing (FCCN) project aims to create a system for archiving Portuguese Web contents. This system will periodically collect, store and preserve the information published online.
The first stage of development began in January 2008 and is expected to finish within two years. However, the maintenance of this system and the preservation of the information that has been archived is to carry on beyond that date.
What is the Portuguese Web?
Everything hosted under the .pt domain is considered to be part of the Portuguese Web.
We aim to archive only contents hosted under the national .pt domain. However, at a later stage the project may come to include all pages written in the Portuguese language.
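The scope rule above can be illustrated with a minimal sketch. It assumes that "hosted under the .pt domain" means the host name in the URL ends in .pt; the function name in_portuguese_web and the example URLs are hypothetical and are not part of the archive's actual system.

```python
from urllib.parse import urlsplit


def in_portuguese_web(url: str) -> bool:
    """Return True if the URL's host falls under the .pt top-level domain.

    Hypothetical helper: it only illustrates the scope rule described
    in the text and is not the archive's own selection logic.
    """
    host = urlsplit(url).hostname  # lowercased host, or None if absent
    if host is None:
        return False
    host = host.rstrip(".")  # tolerate a fully qualified trailing dot
    return host == "pt" or host.endswith(".pt")


if __name__ == "__main__":
    for candidate in (
        "http://www.example.pt/noticias",   # under .pt: in scope
        "http://www.example.pt.com/",       # .pt is not the top-level domain
        "http://www.example.com/pt/",       # "pt" appears only in the path
    ):
        print(candidate, in_portuguese_web(candidate))
```

A check like this covers only the current .pt rule. Extending the scope to all pages written in Portuguese would require looking at page content rather than the host name.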
What is the Web Archive for?
The services provided by the Portuguese Web Archive go beyond the merely historical and cultural aspects of preserving digital information. The existence of an archive for the Portuguese Web may:
- Contribute to increasing the use of Portuguese as a language of communication on the Web;
- Provide access to Portuguese contents that are of interest to scientists working in fields such as History, Sociology or Web Mining;
- Help to develop local resources for dealing with information published on the web, reducing Portugal's dependence on foreign services with regard to this issue;
- Supply evidence in court cases that require information published on the web.
The Portuguese Web Archive and other Web archives
The Internet Archive collects and archives web contents published worldwide. However, it is not easy for a single organization to maintain an exhaustive archive of all contents published online because these contents are always changing and many of them will have disappeared before they can be collected and archived.
Historic events of great importance, such as Hurricane Katrina, gave rise to extraordinary efforts by the Internet Archive to ensure that this episode, which marked the history of the United States, would be as thoroughly documented as possible.
However, the preservation of documents pertaining to historic events of national importance to Portugal is not a priority for the Internet Archive.
Communities in several countries have become aware of the urgent need to preserve information of national interest that is published on the web and have launched formal initiatives for preserving and cataloging digital information.
A number of parallel initiatives aimed at preserving knowledge available on the web are currently underway. Despite these efforts, the large size of the web and the relatively short time that most contents remain online make it hard to preserve most of the published information. Archiving the Web is a worldwide endeavor.
We are working to meet this need. We count on your help.