Institutional collaboration
Scientific and cultural institutions can contribute to the development of the Portuguese Web Archive.
The Portuguese Web Archive project includes research and development tasks. Given the scope of the project and the challenges that must be overcome, collaboration with institutions outside FCCN may yield results of interest to all parties. The following tasks could be carried out as part of research and development projects:
- Requirements analysis. To create a Web Archive that is useful to the community, it is necessary to understand users’ needs and expectations. The Web Archive may be useful to ordinary citizens or to researchers from several areas, such as historians, linguists and sociologists, who have different requirements and expectations of the system. Studies that identify the different user profiles of a web archive would be a valuable contribution;
- Analysis of the Portuguese web. We will periodically generate characterization reports for the Portuguese web. Further research on the archived data sets is welcome, and we are particularly interested in quality-related studies of the Portuguese web, such as measuring the accessibility of Portuguese web content;
- Testing of developed systems. The developed systems will be thoroughly tested before being released to the public. The participation of critically minded testers is crucial to detect required improvements, for instance in terms of usability and system security;
- Textual information retrieval over historical collections. In addition to archiving the information published on the web, it is crucial to keep it accessible. The algorithms currently used by search engines address only a single web collection and do not consider the existence of historical collections built incrementally over time. Searching for information in historical web archives is a complex problem, and research on it has only just begun;
- Image search. A picture is worth a thousand words, but sometimes a thousand words are not enough to find the image we want. Web search engines look for images based on the text associated with them. However, making this association is not trivial and it often produces erroneous results. The study of efficient mechanisms to improve the extraction of text and its association with images could lead to an additional search service in our project. The Portuguese Web Archive will hold a large number of images, which will enable the development and testing of new image search algorithms using real data;
- Video search. The number of videos available on the Web has increased significantly in recent years. Information formerly published as text, such as user manuals, is now frequently published as video. However, as with image search, current search services only process the text associated with the videos and do not allow searching for information within them. Moreover, the results refer to whole videos, which requires users to watch the entire video even when they are only interested in a few seconds of it. Thus, it is more difficult and time-consuming to identify relevant information within a video than within a text. Research on mechanisms that enable information search within videos is an interesting field;
- Appropriate user interfaces to search archived information. The usability of an information system’s user interface has repeatedly proven to be a key factor in the success of a project. The study of a user interface and middleware to provide access to the archived information is a challenging task, involving extensive research and testing with real users.
If you work in these fields, or in others that you find relevant to the project, and you are interested in collaborating with the Portuguese Web Archive, please contact us. If your organization is concerned with the preservation of Portuguese history and culture, you may join the rARC project by providing space for storing backup copies of the Portuguese Web Archive.