
Identifier 000368509
Title Trawling the Deep Web
Author Γκέσιου, Ελένη Αθανάσιος
Thesis advisor Μαρκάτος, Ευάγγελος
Abstract In a networked world, content on the Web is flourishing and is available to anyone who has access to a computer system and the Internet. The Web is usually accessed through a search on a standard search engine, e.g. Google. But is what we see online all there is, or is something hidden underneath all that information? The World Wide Web content that is not indexed by conventional search engines is referred to as the Deep Web. This master's thesis explores several aspects of the Deep Web concerning Personally Identifiable Information (PII). We conduct two large-scale privacy case studies that expose Personally Identifiable Information inside the Deep Web. First, we examine database content as part of the Deep Web. To this end, we highlight the privacy issues that have arisen from the introduction of the Greek Social Security Number (AMKA), in connection with the availability of personally identifiable information on Greek web sites. Second, we conduct a case study that treats document metadata as Deep Web content. We analyze the metadata stored in over fifteen million documents (DOC, PDF, XLS and PPT) found online and present the privacy leaks that emerge from the analysis. We also present countermeasures that shield our digital life against the disclosure of sensitive information. We propose an information-retrieval-based method for information leak detection, which improves on cyclical hashing so as to both accelerate leak detection and increase the accuracy of the result. Experiments were conducted on real-world data to demonstrate the efficiency and effectiveness of the proposed solution.
Language English
Subject AMKA
Cyclical Hashing
Deep web
Metadata
Privacy
Issue date 2011-07-15
Collection   School/Department--School of Sciences and Engineering--Department of Computer Science--Post-graduate theses
  Type of Work--Post-graduate theses
