Spreading excellence and disseminating the cutting-edge results of our research and development efforts is crucial to our institute. Check out our educational offerings for Bachelor, Master and PhD studies at the University of Innsbruck!

Duplicate document detection

Student name: 

The goal of this work is to:

  • Evaluate the use of shingles for detecting duplicate (Web) documents provided by a crawler (paper to read: “A Large-Scale Study of the Evolution of Web Pages”).
  • Provide an implementation of an algorithm that detects duplicate documents (that is, similar documents). The implementation can be based on shingles, but does not need to
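
As a starting point, the shingle idea mentioned above can be sketched as follows. This is only a minimal illustration, not the required solution: it assumes word-level shingles of width w = 4, compares documents by the Jaccard overlap of their shingle sets, and uses an arbitrary threshold of 0.9. The function names (shingles, resemblance, is_near_duplicate) and all parameter values are hypothetical choices made for this example.

```python
import re


def shingles(text, w=4):
    """Return the set of w-word shingles (contiguous word windows) of text."""
    words = re.findall(r"\w+", text.lower())
    if not words:
        return set()
    if len(words) < w:
        # Document shorter than one window: treat it as a single shingle.
        return {tuple(words)}
    return {tuple(words[i:i + w]) for i in range(len(words) - w + 1)}


def resemblance(doc_a, doc_b, w=4):
    """Jaccard overlap of the two shingle sets, a value in [0, 1]."""
    set_a, set_b = shingles(doc_a, w), shingles(doc_b, w)
    if not set_a and not set_b:
        return 1.0  # two empty documents are considered identical
    return len(set_a & set_b) / len(set_a | set_b)


def is_near_duplicate(doc_a, doc_b, w=4, threshold=0.9):
    """Assumed decision rule: near-duplicate if resemblance >= threshold."""
    return resemblance(doc_a, doc_b, w) >= threshold
```

Comparing every pair of crawled documents this way grows quadratically with the number of documents, so a real evaluation on crawler output would likely need some form of sampling or fingerprinting of the shingle sets. That choice is left open here.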