Research Thesis

Current Projects

On the Effectiveness of Private Information Retrieval Protocols

Rafi Ullah Khan (PhD Student)

Due to the exponential growth of information on the World Wide Web, Web Search Engines (WSEs) have become indispensable for the effective retrieval of information. To provide the results most relevant to the user, WSEs store a user profile that may contain sensitive information, including the user’s age, gender, health condition, personal interests, and religious or political affiliation. This raises serious privacy concerns, since the identity of a user may be exposed and misused by third parties. One well-known solution for preserving privacy when using WSEs is a private information retrieval (PIR) protocol called Useless User Profile (UUP), which issues queries via a Peer-to-Peer (P2P) network, thereby hiding the user’s identity from the WSE. This thesis investigates the protection offered by UUP. To this end, we propose the QuPiD (Query Profile Distance) Attack, a machine-learning-based attack that evaluates the effectiveness of UUP in protecting privacy. The QuPiD Attack determines the distance between the user’s profile (web search history) and an upcoming query using a novel feature vector. The proposed feature vector is composed of numeric scores for 10 major topics acquired from the uClassify service. The results show that the QuPiD Attack associates more than 40% of queries with the correct user at a precision of over 70%, which indicates that UUP does not provide satisfactory protection to users. Moreover, during the investigation, the QuPiD Attack behaved unexpectedly in some cases, which affected its precision and recall. A more detailed investigation identified three reasons for this behavior: (i) a variable similarity score between the incoming query and the user profile, (ii) a lack of traces of the incoming query in the user profile, and (iii) the presence of more than one nearly identical user profile. We call this behavior the ProQSim (Profile to Query Similarity) Effect.
Therefore, we developed PEM (Privacy Exposure Measure), a technique that minimizes the privacy exposure of a user of PIR protocols. PEM assesses the similarity between the user’s profile and a query before the query is posted to the WSE, and helps the user avoid further privacy exposure.
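
The idea of comparing an incoming query with stored profiles in a topic-score space can be illustrated with a minimal sketch. It is not the QuPiD classifier or the PEM technique themselves: cosine similarity stands in for the machine learning model, and the topic names, the averaging of past queries into a profile, and the 0.8 warning threshold are all assumptions made for illustration.

```python
import math

# Hypothetical topic labels; the abstract only states "10 major topics".
TOPICS = ["arts", "business", "computers", "games", "health",
          "home", "recreation", "science", "society", "sports"]


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an all-zero vector carries no topical signal
    return dot / (norm_a * norm_b)


def profile_vector(query_vectors):
    """Average the topic vectors of past queries into one profile vector."""
    n = len(query_vectors)
    return [sum(v[i] for v in query_vectors) / n for i in range(len(TOPICS))]


def most_similar_user(query_vec, profiles):
    """Attack side: return (user, score) for the profile closest to the query."""
    scores = {user: cosine_similarity(query_vec, vec)
              for user, vec in profiles.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]


def exposure_warning(query_vec, own_profile, threshold=0.8):
    """Defence side: flag a query that closely matches the user's own profile.

    The threshold is an assumed value, not one taken from the thesis.
    """
    return cosine_similarity(query_vec, own_profile) >= threshold
```

In this toy setting, a query whose topic vector points mostly at "health" is attributed to whichever stored profile has the largest health component, and the same similarity score could be shown to the user as a warning before the query is sent.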

ObSecure Logging: A Framework to Protect and Evaluate the Web Search Privacy

Mohib Ullah Khan (PhD Student)

The Web Search Engine (WSE) is an essential software system used by people around the world to retrieve data from the web. To retrieve data successfully, it uses the user’s search queries to build a user profile and provide personalized results. Users’ search queries hold identifiable information that could compromise the privacy of the respective user. For a variety of reasons, preserving privacy in web search is therefore a major concern for users. This thesis investigates distributed privacy-preserving protocols and proposes a single-group ObSecure Logging protocol (OSLo), a multi-group distributed protocol (MG-OSLo), and a profile-aware distributed protocol (PaOSLo) to preserve the web search privacy of a user. MG-OSLo measures the impact of multiple groups on the user’s privacy. The primary objective of this thesis is to assess the local privacy and profile privacy of a user through unlinkability and indistinguishability. The secondary objective is to evaluate the impact of group size, group count, and profile-aware grouping on the local privacy and profile privacy of a user. Local privacy is evaluated using the probabilistic advantage a curious entity has in linking a query to the user. In MG-OSLo, users are grouped using a non-overlapping group design and an overlapping group design to measure the impact of group count and group size on the privacy of a user. The investigation reveals that the probability of linking a query to the user depends on the group size and group count: the larger the group size or group count, the lower the probability of linking the query to the user. Profile privacy is measured as the level of profile obfuscation using a privacy metric, Profile Exposure Level (PEL). To estimate profile privacy, experiments are performed over a subset of the AOL query log in two settings: (i) self-query submission is allowed and (ii) self-query submission is not allowed.
The privacy achieved by the proposed protocols is compared with that of the state-of-the-art privacy-preserving protocol UUP. The results show that OSLo outperforms UUP. Similarly, the multi-group design has a positive impact on the local privacy and the profile privacy of a user. Profile-aware grouping (PaOSLo) further improves profile privacy compared to UUP and OSLo when simulated with the same dataset.
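
The reported trend, that a larger group size or group count lowers the probability of linking a query to its user, can be seen in a deliberately simplified model. The sketch below assumes that every member of every group is an equally likely sender and that the curious entity guesses uniformly at random. This uniform model and the function names are assumptions for illustration; they are not the probabilistic analysis used in the thesis.

```python
import random


def linking_probability(group_size, group_count=1):
    """Chance of a correct uniform guess over all candidate senders.

    Assumes an anonymity set of group_size * group_count equally likely users.
    """
    if group_size < 1 or group_count < 1:
        raise ValueError("group_size and group_count must be at least 1")
    return 1.0 / (group_size * group_count)


def simulate_linking(group_size, group_count=1, trials=100_000, seed=0):
    """Monte Carlo estimate of the same probability under the same assumptions."""
    rng = random.Random(seed)
    candidates = group_size * group_count
    hits = 0
    for _ in range(trials):
        true_sender = rng.randrange(candidates)  # who really issued the query
        guess = rng.randrange(candidates)        # uniform guess by the adversary
        hits += guess == true_sender
    return hits / trials
```

Under these assumptions, a single group of 5 users gives a linking probability of 0.2, and spreading the same query over 4 such groups reduces it to 0.05.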

Past Projects

Exploiting Temporal Features of News Documents in Time-aware Information Retrieval

Shafiq Ur Rehman (PhD Student)

This thesis exploits the temporal features of news documents to improve the retrieval effectiveness of IR systems. To the best of my knowledge, this is the first study to focus on the problem of temporal specificity in news documents. I propose and evaluate novel approaches to determine the temporal specificity of news documents. These approaches are then used to classify news documents into three novel temporal classes: (a) High Temporal Specificity (HTS), (b) Medium Temporal Specificity (MTS), and (c) Low Temporal Specificity (LTS). The classification draws on 24 implicit temporal features of news documents. Two classification approaches are proposed: rule-based and Temporal Specificity Score (TSS) based. In the former, news documents are classified using a proposed set of rules based on the temporal features. The latter classifies news documents based on a TSS computed from the temporal features. The results of the proposed approaches are compared with four machine learning classification algorithms: Bayes Net, Support Vector Machine (SVM), Random Forest, and Decision Tree. The outcomes of the study indicate that the proposed rule-based classifier outperforms the four algorithms, achieving 82% accuracy, whereas TSS-based classification achieves 77% accuracy.
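
A score-based three-way classification of this kind can be sketched as follows. The abstract does not say how the TSS is computed, so the weighted average, the feature names, and the two cut-off values below are purely hypothetical placeholders rather than the thesis's definitions.

```python
def temporal_specificity_score(features, weights):
    """Weighted average of feature values, each assumed to lie in [0, 1].

    Both the feature names and the weights are hypothetical.
    """
    total_weight = sum(weights.values())
    if total_weight == 0:
        raise ValueError("at least one weight must be non-zero")
    weighted = sum(w * features.get(name, 0.0) for name, w in weights.items())
    return weighted / total_weight


def classify_by_score(score, high_cutoff=0.66, low_cutoff=0.33):
    """Map a score to HTS, MTS, or LTS using assumed cut-offs."""
    if score >= high_cutoff:
        return "HTS"
    if score >= low_cutoff:
        return "MTS"
    return "LTS"
```

For instance, with hypothetical weights of 2.0 for `explicit_dates` and 1.0 for `temporal_expressions`, a document scoring 1.0 and 0.5 on those features would fall into the HTS class.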

Opinion Based Entity Ranking Using Data Mining Techniques

Tayyaba Sehar (MS Student)

The era of social computing has kindled massive growth of opinions and reviews on the web, including reviews of businesses and products and opinions about people. The vast number of opinions expressed by experts and ordinary users can help people make all kinds of decisions, such as what to buy. For example, shoppers at Amazon typically read the reviews of a product before buying it, and travelers may rely on opinions about hotels on TripAdvisor. Opinion-based entity ranking is an information retrieval task that automatically ranks entities on the basis of opinions: entities are ranked directly by how well the opinions about them match the user’s own preferences. The focus of this research is to represent each entity by the text of all the user opinions about it. Given a user’s search query, the keywords of the query represent the aspects of the entities that the user is interested in. By performing sentiment analysis, or by calculating the sentiment polarity of the opinions, the relevant entities can then be ranked by how well their opinions match the user’s search preferences. With this automatic ranking system, the user can focus on a much smaller set of top-ranked relevant entities that match his or her preferences, as judged by sentiment polarity.
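
The ranking idea can be sketched with a toy example: each entity is represented by its opinion texts, each query keyword is treated as an aspect, and entities are ordered by the summed polarity of the sentences that mention those aspects. The tiny word lists and the sentence-level counting are assumptions for illustration, not the sentiment analysis method used in this project.

```python
import re

# Hypothetical mini-lexicons; a real system would use a proper sentiment model.
POSITIVE = {"clean", "great", "friendly", "comfortable", "quiet"}
NEGATIVE = {"dirty", "rude", "noisy", "poor", "broken"}


def tokenize(text):
    """Lower-case the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())


def sentence_polarity(sentence):
    """Positive word count minus negative word count."""
    words = tokenize(sentence)
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)


def aspect_score(opinions, aspect):
    """Sum the polarity of every sentence that mentions the aspect keyword."""
    total = 0
    for opinion in opinions:
        for sentence in re.split(r"[.!?]", opinion):
            if aspect in tokenize(sentence):
                total += sentence_polarity(sentence)
    return total


def rank_entities(entity_opinions, query):
    """Order entities from best to worst match for the aspects in the query."""
    aspects = tokenize(query)
    scores = {entity: sum(aspect_score(opinions, a) for a in aspects)
              for entity, opinions in entity_opinions.items()}
    return sorted(scores, key=scores.get, reverse=True)
```

With two hotels whose reviews praise the room of one and the staff of the other, the query "room" and the query "staff" produce opposite orderings, which is the behavior the task description calls for.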

Adaptive Routing Protocol for Underwater Wireless Sensor Network

Nakhshab Hussain (MS Student)

Spatially dispersed sensors that monitor environmental and physical conditions such as air pollution, natural disasters, and water quality form a wireless sensor network (WSN), also known as a wireless sensor and actor network (WSAN). Research on the wide range of applications of WSNs is now prominent. In a WSN, each sensor node is connected to one or more sensors and is capable of sensing, processing, and transmitting information over a suitable communication link. More than 70% of the Earth’s surface is covered with water, and developing new technology requires exploring this vast unexplored area. A combination of anchored nodes on the sea bed and floating sensor nodes, connected to other gateways through acoustic links, forms an underwater wireless sensor network. Underwater wireless sensor networks are proposed to enable applications such as disaster prevention, pollution monitoring, assisted navigation, offshore exploration, and tsunami warning. Sensors are also attached to autonomous underwater vehicles (AUVs) to explore natural underwater resources. Design issues such as deployment differences, cost differences, power requirements, and mobility affect routing in underwater wireless sensor networks. In this study, we aim to adaptively select a path that addresses the mobility of nodes in an underwater wireless sensor network. The proposed protocol will take advantage of both Vector-Based Forwarding (VBF) and Depth-Based Routing (DBR). We will evaluate and argue that combining these protocols increases the delivery ratio of packets to the sinks on the water surface and decreases energy consumption while achieving reliable routing paths.
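
For readers unfamiliar with the two base protocols, the sketch below shows the basic forwarding tests usually associated with them: in DBR a node forwards a packet only if it is sufficiently shallower than the previous hop, and in VBF a node forwards only if it lies within a "routing pipe" of a given radius around the source-to-sink vector. Positions are assumed to be (x, y, depth) tuples, and the parameter names and defaults are illustrative. How the proposed protocol combines the two tests is not specified in the abstract and is not modeled here.

```python
import math


def dbr_eligible(node_depth, sender_depth, depth_threshold=0.0):
    """DBR-style test: the node must be shallower than the sender by more than the threshold."""
    return (sender_depth - node_depth) > depth_threshold


def distance_to_vector(node, source, sink):
    """Perpendicular distance from a node to the line through source and sink."""
    sx, sy, sz = (sink[i] - source[i] for i in range(3))
    nx, ny, nz = (node[i] - source[i] for i in range(3))
    # Cross product of (node - source) and (sink - source).
    cx = ny * sz - nz * sy
    cy = nz * sx - nx * sz
    cz = nx * sy - ny * sx
    length = math.sqrt(sx * sx + sy * sy + sz * sz)
    if length == 0:
        raise ValueError("source and sink must be different points")
    return math.sqrt(cx * cx + cy * cy + cz * cz) / length


def vbf_eligible(node, source, sink, pipe_radius):
    """VBF-style test: the node must lie inside the routing pipe."""
    return distance_to_vector(node, source, sink) <= pipe_radius
```

As a usage example, with a source at depth 100 directly below a surface sink, a node at (3, 4, 50) is 5 units from the routing vector, so it passes the VBF test for a pipe radius of 10 and also passes the DBR test because it is 50 units shallower than the source.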

Leaf-Based Cotton Crop Disease Detection

Syed Ali Sajjad Rizvi

Zeeshan Ahmed

This project will be useful to workers at pesticide companies and to others whose jobs involve detecting disease in cotton crops. The software will help them detect disease in a cotton crop quickly by identifying the visual effects of disease in images.

JavaSymphony: A new Java-based programming model for shared, distributed, and hybrid memory multi-/many-core parallel computers and co-processor accelerators, developed as an extension to the existing Java distributed programming environment. Using JavaSymphony, a parallel Java application can be uniformly programmed and executed on a variety of multi-/many-core architectures. Heterogeneous conventional and data-parallel multi-core devices can be programmed using a single high-level Java programming abstraction that shields the user from low-level architectural details such as method invocations, thread management, synchronisation, memory allocation, and data transfers. JavaSymphony’s design is based on the concept of dynamic virtual architectures, which allow programmers to define a hierarchical structure of the underlying computing resources (e.g., accelerators, cores, processors, machines, and clusters) and to control load balancing and locality. JavaSymphony provides high-level programming constructs that abstract low-level details and simplify the tasks of controlling parallelism, locality, and load balancing. Moreover, JavaSymphony provides a multi-core-aware scheduling mechanism capable of mapping parallel Java applications onto large multi-core machines and heterogeneous clusters with improved performance. The JavaSymphony scheduler considers several multi-core-specific performance parameters and application types, and uses them to optimise the mappings of applications, objects, and tasks.

Thesis Template