E-Locus - Institutional Repository of the University of Crete

Identifier

000368509

Title

Trawling the Deep Web

Author

Γκέσιου, Ελένη Αθανάσιος

Thesis advisor

Μαρκάτος, Ευάγγελος

Abstract

In a networked world, content on the Web is blossoming, and it is available to anyone with access to a computer and the Internet. The Web is usually accessed through a standard search engine, e.g. Google. But is what we see online all there is, or is something hidden beneath all that information? The World Wide Web content that is not indexed by conventional search engines is referred to as the Deep Web. This master's thesis explores several aspects of the Deep Web concerning Personally Identifiable Information (PII). We conduct two large-scale privacy case studies that expose PII inside the Deep Web. First, we examine database content as Deep Web content. To this end, we highlight the privacy issues that have arisen from the introduction of the Greek Social Security Number (AMKA), in connection with the availability of personally identifiable information on Greek web sites. Second, we conduct a case study that treats documents' metadata as Deep Web content. We analyze the metadata stored in over fifteen million documents (DOC, PDF, XLS and PPT) found online and present the privacy leaks that emerge from the analysis. We also present countermeasures that shield our digital life against the disclosure of sensitive information. Finally, we propose an information-retrieval-based method for information leak detection that improves on cyclical hashing, both accelerating leak detection and increasing the accuracy of the result. Experiments on real-world data demonstrate the efficiency and effectiveness of the proposed solution.
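The abstract names cyclical hashing as the baseline that the proposed leak-detection method improves on. The improved variant is not described in this record, so the following is only a minimal sketch of the classic cyclic-polynomial rolling hash (often called buzhash) applied to a simple leak check. It fingerprints every fixed-size byte window of a sensitive document, then reports what fraction of an outgoing document's windows match. The window size `WINDOW`, the seeded random byte table, and the helper names `window_hashes`, `fingerprint` and `leak_score` are hypothetical choices for illustration, not the thesis's implementation.

```python
import random

# Hypothetical parameters chosen for illustration, not taken from the thesis.
WINDOW = 8                # window length k, in bytes
BITS = 64
MASK = (1 << BITS) - 1

# Fixed-seed table mapping each byte value to a random 64-bit word.
_rng = random.Random(2011)
TABLE = [_rng.getrandbits(BITS) for _ in range(256)]


def rotl(x: int, r: int) -> int:
    """Rotate a 64-bit word left by r bits."""
    r %= BITS
    return ((x << r) | (x >> (BITS - r))) & MASK


def window_hashes(data: bytes, k: int = WINDOW):
    """Yield the cyclic-polynomial (buzhash) hash of every k-byte window."""
    if len(data) < k:
        return
    h = 0
    for b in data[:k]:                 # first window computed directly
        h = rotl(h, 1) ^ TABLE[b]
    yield h
    for i in range(k, len(data)):      # each later window updated in O(1):
        # rotate, cancel the byte leaving the window, add the byte entering it
        h = rotl(h, 1) ^ rotl(TABLE[data[i - k]], k) ^ TABLE[data[i]]
        yield h


def fingerprint(sensitive_doc: bytes) -> set[int]:
    """Hypothetical helper: the set of all window hashes of a sensitive document."""
    return set(window_hashes(sensitive_doc))


def leak_score(sensitive: set[int], outgoing: bytes) -> float:
    """Hypothetical helper: fraction of the outgoing document's windows found in the fingerprint."""
    hashes = list(window_hashes(outgoing))
    if not hashes:
        return 0.0
    return sum(h in sensitive for h in hashes) / len(hashes)
```

Because each window hash is derived from the previous one in constant time, scanning an n-byte document costs O(n) hash updates rather than O(n·k). A score near 1.0 suggests that the outgoing text largely reproduces the sensitive document, while a partial score indicates that only some passages were copied.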

Language

English

Subject

AMKA

Cyclical Hashing

Deep web

Metadata

Privacy

Issue date

2011-07-15

Collection

School/Department--School of Sciences and Engineering--Department of Computer Science--Post-graduate theses