In: Publication20 Dec 2011
Even long journeys eventually come to an end. After 4 years of intensive research I successfully defended my doctoral dissertation at the University of Maryland. The thesis ties together my work on indexing and querying huge social networks and machine learning on multi-relational data under the common theme of social network data management. A lot of this work is introduced in individual blog posts on this site.
Title: Social Network Data Management.
With the increasing usage of online social networks and the semantic web’s graph structured RDF framework, and the rising adoption of networks in various fields from biology to social science, there is a rapidly growing need for indexing, querying, and analyzing massive graph structured data. Facebook has amassed over 500 million users creating huge volumes of highly connected data. Governments have made RDF datasets containing billions of triples available to the public. In the life sciences, researches have started to connect disparate data sets of research results into one giant network of valuable information. Clearly, networks are becoming increasingly popular and growing rapidly in size, requiring scalable solutions for network data management.
This thesis focuses on the following aspects of network data management. We present a hierarchical index structure for external memory storage of network data that aims to maximize data locality. We propose efficient algorithms to answer subgraph matching queries against network databases and discuss effective pruning strategies to improve performance. We show how adaptive cost models can speed up subgraph matching query answering by assigning budgets to index retrieval operations and adjusting the query plan while executing.
We develop a cloud oriented social network database, COSI, which handles massive network datasets too large for a single computer by partitioning the data across multiple machines and achieving high performance query answering through asynchronous parallelization and cluster-aware heuristics.
Tracking multiple standing queries against a social network database is much faster with our novel multi-view maintenance algorithm, which exploits common substructures between queries.
To capture uncertainty inherent in social network querying, we define probabilistic subgraph matching queries over deterministic graph data and propose algorithms to answer them efficiently.
Finally, we introduce a general relational machine learning framework and rule-based language, Probabilistic Soft Logic, to learn from and probabilistically reason about social network data and describe applications to information integration and information fusion.
Where is the knowledge we have lost in information?
- T.S. Elliot, The Rock
We are drowning in data - exabytes of it. My research explores technologies that can help us organize, structure, and efficiently search huge amounts of information as well as automatically deduce actionable pieces of knowledge from it. Learn more
I was a PhD student at the University of Maryland with research interests in databases, artificial intelligence, and machine learning. Learn more