Spreading excellence and disseminating the cutting-edge results of our research and development efforts is crucial to our institute. Check out our educational offers for Bachelor, Master, and PhD studies at the University of Innsbruck!

Post-processing framework for large-scale document collections

Type: Master
Student name: none

The goal of this work is to build a post-processing framework for a large-scale document collection. One part of this framework is an RDF triple store containing large amounts of metadata. Optimizing the collection of this metadata, and access to it in RDF, is part of the aim of the work. The individual tasks are:

  • developing and implementing filters and processors that run (a) within a Web crawler and (b) partly after the crawl, to post-process the crawled data. This includes, e.g., creating additional data, handling dependencies on additional data, and data conversions.
  • tuning the code of the Web crawler, which stores RDF metadata about Web Services
  • making sure that only necessary data is written to RDF
  • tuning a given RDF triple store, Virtuoso, to ingest as many triples as possible; verifying limits on how many triples can be stored and how many queries can be processed; and defining and implementing query quotas (forbidding expensive queries), etc.
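
To make the filter and processor idea more concrete, the following is a minimal sketch in Python of a chain that drops unnecessary triples and applies a simple data conversion before anything is written to RDF. It is an illustration only, not the crawler's actual code: the names `keep_necessary`, `normalize_object`, `run_pipeline`, and the predicate whitelist are all hypothetical, and triples are modeled as plain string tuples rather than objects of a real RDF library.

```python
from typing import Callable, Iterable, Iterator, Optional, Tuple

# A triple is (subject, predicate, object); plain strings in this sketch.
Triple = Tuple[str, str, str]
# A step returns a (possibly rewritten) triple, or None to drop it.
Step = Callable[[Triple], Optional[Triple]]

# Hypothetical whitelist of predicates considered "necessary".
NECESSARY_PREDICATES = {
    "http://purl.org/dc/terms/title",
    "http://purl.org/dc/terms/modified",
}


def keep_necessary(triple: Triple) -> Optional[Triple]:
    """Filter: drop triples whose predicate is not on the whitelist."""
    return triple if triple[1] in NECESSARY_PREDICATES else None


def normalize_object(triple: Triple) -> Optional[Triple]:
    """Processor: a simple data conversion that trims whitespace."""
    subject, predicate, obj = triple
    return (subject, predicate, obj.strip())


def run_pipeline(triples: Iterable[Triple], steps: Iterable[Step]) -> Iterator[Triple]:
    """Apply each step in order; a triple is dropped as soon as a step returns None."""
    step_list = list(steps)
    for triple in triples:
        current: Optional[Triple] = triple
        for step in step_list:
            current = step(current)
            if current is None:
                break
        if current is not None:
            yield current


if __name__ == "__main__":
    crawled = [
        ("http://example.org/service/1", "http://purl.org/dc/terms/title", "  Weather API "),
        ("http://example.org/service/1", "http://example.org/vocab/rawHtml", "<html>...</html>"),
    ]
    for kept in run_pipeline(crawled, [keep_necessary, normalize_object]):
        print(kept)
```

Because each step has the same signature, the same chain could in principle run inside the crawler or afterwards as a post-processing pass over stored data. Only the source of the input triples would change.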
