
rARC: ARC replicator

rARC is a pioneering system being developed within the Portuguese Web Archive project. Its main goal is to enable Internet users to contribute storage space on their computers to help preserve web content for the future.

What is rARC?

The Portuguese web is periodically collected and stored in a central repository. This process requires a large amount of disk space to store and replicate each collection of content.

rARC is a distributed system that will enable Internet users to provide storage space on their computers to replicate small parts of the archived data. All that is required is installing a software application.

If the information stored in the central repository is lost, for instance due to a natural disaster, the historical content may be recovered from the replicas stored on Internet users' computers.

Why is it called rARC?

This system was named rARC, which stands for replicator of ARC files.

The Internet Archive began its mission to preserve information published on the web in 1996. The content it collects is grouped in files that follow the ARC format. This format is currently used by several web archiving initiatives, both for historical reasons and because tools to handle these files have been developed over time.

We hope rARC will join this set of tools and become useful to the web archiving international community.

How does it work?

rARC has a client-server architecture. A rARC server is installed at the central repository of the web archive to replicate the stored ARC files. Internet users install client applications on their computers to keep backup copies of the archived content. The client applications communicate with the rARC server to receive authentication credentials and then download ARC files from the server.

Every time the client communicates with the server, it is informed of the process being executed: backup or recovery.

  • If the server is making backup copies, the client downloads the archived information.
  • If the server is recovering from a loss of information, the client application uploads its backup copies to the server.
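The page does not specify the messages exchanged between client and server, so the decision described in the two cases above can only be sketched under assumptions. In this minimal sketch, the names `ServerMode` and `handle_contact`, and the idea that the server sends a set of file names to keep, are hypothetical illustrations rather than rARC's actual protocol.

```python
from enum import Enum


class ServerMode(Enum):
    """Process the server reports on each contact (hypothetical encoding)."""
    BACKUP = "backup"
    RECOVERY = "recovery"


def handle_contact(
    mode: ServerMode, local_files: set[str], offered_files: set[str]
) -> tuple[str, set[str]]:
    """Decide what a client does after contacting the server.

    local_files: names of ARC files already stored on this client.
    offered_files: names of ARC files the server asks this client to keep
    (only used in BACKUP mode; an assumption of this sketch).
    Returns the action to perform and the files it applies to.
    """
    if mode is ServerMode.BACKUP:
        # Backup: download only the files this client does not hold yet.
        return "download", offered_files - local_files
    # Recovery: send every local copy back to the central repository.
    return "upload", set(local_files)
```

For example, a client holding `a.arc` that is offered `a.arc` and `b.arc` during a backup would only download `b.arc`; during a recovery it would upload `a.arc`.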

Periodically, the client contacts the server to report the status of its local backup copies, allowing the server to verify their integrity.
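How rARC checks integrity is not described here. One common technique for verifying that a remote party still holds intact data is a hash-based challenge-response, sketched below under that assumption; all function names are hypothetical. The server precomputes (nonce, digest) pairs while it still has the ARC file, so it can keep checking replicas even after losing its own copy, and each fresh nonce forces the client to hash the actual data instead of replaying an old answer.

```python
import hashlib
import hmac
import os


def make_challenges(arc_bytes: bytes, count: int = 4) -> list[tuple[bytes, str]]:
    """Server side: precompute (nonce, expected digest) pairs for one ARC file
    while the central copy is still available."""
    challenges = []
    for _ in range(count):
        nonce = os.urandom(16)  # random, unpredictable challenge
        digest = hashlib.sha256(nonce + arc_bytes).hexdigest()
        challenges.append((nonce, digest))
    return challenges


def answer_challenge(nonce: bytes, local_copy: bytes) -> str:
    """Client side: hash the nonce together with the stored copy."""
    return hashlib.sha256(nonce + local_copy).hexdigest()


def verify(expected_digest: str, answer: str) -> bool:
    """Server side: the replica is considered intact only if digests match."""
    return hmac.compare_digest(expected_digest, answer)
```

Each precomputed pair should be used only once, so the number of checks per file is bounded by `count` in this sketch; a real deployment would also hash large files in chunks rather than holding them in memory.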

Who will be able to store backup copies?

Any individual or institution can contribute to preserving the Portuguese web. We count on the collaboration of citizens worldwide who are aware of the need for a community effort to preserve the Web, as well as on organizations with concerns and responsibilities for the preservation of Portuguese culture and history.

We hope that rARC will also be used to exchange backup copies between web archives. For instance, the Portuguese Web Archive could provide space to store content from a Brazilian web archive, and vice versa.

In case of disaster, one of the archives could be reconstructed from backup copies stored in remote countries unaffected by the disaster.

Why should I give disk space from my computer?

We hope that citizens aware of the importance of History will join the project.

The only “reward” is contributing to the preservation of humankind's knowledge.

We will publish a list of all rARC contributors on the Portuguese Web Archive site, along with a ranking of the most generous and persistent ones. Every week, a contributor will be featured on the project site.

How much space must I provide?

At least 100 MB, the space required to store one ARC file.

In February 2008, a typical desktop computer had a 320 GB disk. A backup copy of 100 MB represents only 0.03% of this disk space.
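The arithmetic behind these figures can be checked with a short sketch. It assumes decimal units (1 GB = 1000 MB), which gives 0.03125%; with binary units (320 GB = 327,680 MB) the share is about 0.0305%, so both round to 0.03%. The function names are illustrative only.

```python
ARC_FILE_MB = 100  # size of one ARC file, as stated above


def share_of_disk(donated_mb: float, disk_gb: float) -> float:
    """Percentage of a disk occupied by the donated space (1 GB = 1000 MB)."""
    return donated_mb / (disk_gb * 1000) * 100


def arc_files_that_fit(donated_mb: float) -> int:
    """Number of whole ARC files a donation can hold."""
    return int(donated_mb // ARC_FILE_MB)
```

For instance, donating 1 GB (1000 MB) of a 320 GB disk uses about 0.31% of it and holds 10 ARC files.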

The more space you provide, the more likely we will be able to preserve the information published on the web for the future. We count on you.

Will my computer run slower?

The rARC application installed on your computer will have minimal impact on its performance. The application is idle most of the time: after it downloads the backup copies from the central repository, it only performs occasional operations to verify their integrity.

Will I have to keep my backup copies of the archive forever?

No.

It is perfectly natural that after some time people change computers, decide to uninstall software or simply lose interest in supporting the rARC project.

However, while people kept backup copies on their computers, they contributed to preserving History: had a problem occurred with the central repository, these copies would have been crucial to prevent valuable historical information from being lost.

You may free up disk space at any time, but please use the uninstall option so that we can keep an estimate of the number of active copies.

Please do not delete rARC related files directly from your disk.

Will backup copies be made of the whole archive?

It will only be possible to create backup copies of the whole archive if enough space is provided by the Internet users. The amount of replicated data will follow the growth of the rARC community.

Nonetheless, even if it is not possible to replicate the whole archived collection, rARC will make a valuable contribution if it prevents the total loss of historical data in case the central repository is destroyed.

What are the main features of rARC?

The main features of rARC are:

Scalability - In a first stage, rARC must scale to thousands of storage nodes;

Security - The information kept across the Internet storage nodes cannot be accessed by the users, and the system must be robust against malicious users. rARC must guarantee that a minimum number of replicas is created for each ARC file and that they are not corrupted;

Usability - Internet users must be able to join a replication initiative and provide disk space as easily as possible;

Customization - rARC must be customizable to enable its use in independent web archiving initiatives.
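The Security requirement of keeping a minimum number of uncorrupted replicas per ARC file implies some bookkeeping on the server. The sketch below assumes the server receives (client, file, passed check) reports from the latest integrity round; the report format, the function name and the default of 3 replicas are hypothetical choices for illustration, not values from rARC.

```python
from collections import defaultdict


def under_replicated(
    reports: list[tuple[str, str, bool]],
    arc_files: list[str],
    min_replicas: int = 3,
) -> dict[str, int]:
    """Return ARC files with fewer than min_replicas verified copies,
    mapped to the number of verified copies each currently has.

    reports: (client_id, arc_file, passed_integrity_check) tuples.
    """
    holders: dict[str, set[str]] = defaultdict(set)
    for client_id, arc_file, ok in reports:
        if ok:  # corrupted copies do not count as replicas
            holders[arc_file].add(client_id)  # a set ignores duplicate reports
    return {
        name: len(holders[name])
        for name in arc_files
        if len(holders[name]) < min_replicas
    }
```

Files returned by this check would be natural candidates to offer to new clients in the next backup round.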

Can I develop for rARC?

Yes.

rARC is an open source project hosted on SourceForge. Development of new features, bug reports and bug fixes are welcome.

Please visit SourceForge to find out how you can participate and learn technical details about the project.

Is rARC only for the Portuguese Web Archive?

No.

rARC is free, open source software. It may be used by any web archiving initiative that stores the collected content in the ARC format.

Can it be used to replicate other content types?

In theory, yes, but we have not tried it yet.

 

FCCN - Fundação para a Computação Científica Nacional POSC - Programa Operacional Sociedade do Conhecimento UMIC - Agência para a Sociedade do Conhecimento UE - União Europeia - FEDER - Fundo Europeu de Desenvolvimento Regional