My research centers on the question of how to elicit knowledge from large amounts of information. I focus on richly structured information repositories, that is, sets of entities such as people, companies, events, documents, and tags connected by various types of relationships such as “works for”, “knows”, “participated in”, “CEO of”, etc. Such socio-information networks generalize social networks: they contain multiple types of entities (not just people), relationships, and attributes, as shown in the small example network.
Socio-information networks are a concise and rich representation framework for heterogeneous or quickly evolving information repositories. They are therefore a suitable format for expressing the semi-structured data increasingly amassed by organizations, companies, and individuals.
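To make the representation concrete, here is a minimal sketch of how a socio-information network with typed entities, typed relationships, and attributes might be held in memory. The `Network` class, its method names, and the sample entities are all hypothetical and chosen for illustration only; they do not describe the storage system developed in this research.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Network:
    """Toy socio-information network (illustrative names, not a real API)."""

    # entity id -> (entity type, attribute dict)
    entities: dict = field(default_factory=dict)
    # (source id, relationship type) -> list of target ids
    edges: dict = field(default_factory=lambda: defaultdict(list))

    def add_entity(self, eid, etype, **attrs):
        self.entities[eid] = (etype, attrs)

    def relate(self, src, rel, dst):
        # Relationships are directed and typed, e.g. "CEO of".
        self.edges[(src, rel)].append(dst)

    def neighbors(self, src, rel):
        # Returns a copy; unknown (src, rel) pairs yield an empty list.
        return list(self.edges.get((src, rel), []))


# Made-up sample data: three entity types, two relationship types.
net = Network()
net.add_entity("alice", "person", age=41)
net.add_entity("acme", "company", sector="retail")
net.add_entity("expo", "event", year=2010)
net.relate("alice", "CEO of", "acme")
net.relate("alice", "participated in", "expo")
```

Keying edges by relationship type keeps lookups such as `net.neighbors("alice", "CEO of")` cheap, which hints at why heterogeneous networks call for type-aware index structures.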
In my research, I develop the tools and techniques to help analysts and regular users extract actionable intelligence from socio-information networks. Specifically, past and current work addresses the following aspects of socio-information network analysis:
- Storage: Building a scalable database to store socio-information networks is challenging because of their heterogeneous nature, but it is crucial for executing analyses efficiently on very large networks. We are developing a cloud-oriented graph database designed for high scalability and are devising novel index structures for efficient querying.
- Search & Retrieval: We are developing efficient algorithms and index structures that answer complex queries over socio-information networks and scale to very large data sets. In addition, we are investigating how to exploit the rich structure of the networks to answer advanced queries that include structural aspects of the network.
- Data Integration and Alignment: Socio-information networks facilitate the integration of multiple information sources. However, the full benefit of data integration can only be achieved when identical entities or relationships contained in multiple information sources are correctly aligned. For large information repositories, manual alignment is prohibitively expensive. We are developing a framework for the automatic alignment of data and metadata (such as ontologies), which can learn optimal alignment strategies from humans as well as from data, even when faced with noisy, incomplete, or uncertain data.
- Diffusion: Many phenomena, such as political opinion and product adoption, diffuse through social networks by way of social influence or recommendation. Given information about a social network of interest, we are devising a general framework to model a broad class of competitive diffusion processes, predict how they will unfold, and compute the optimal allocation of resources in the network to facilitate diffusion.
- Data Mining: We are developing techniques that exploit the conditional dependencies in socio-information networks by analyzing relationships to achieve better performance on standard data mining tasks such as classification.
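
As a rough illustration of how relationships can inform classification, the sketch below predicts an entity's label by majority vote over the known labels of its neighbors. This is a deliberately simple stand-in, not the technique developed in this research; the function `classify_by_neighbors` and the sample data are hypothetical.

```python
from collections import Counter


def classify_by_neighbors(edges, labels, node):
    """Predict a label for `node` from the known labels of its neighbors.

    edges:  dict mapping a node to a list of neighboring nodes
    labels: dict mapping already-labeled nodes to their labels
    Returns None when no neighbor has a known label.
    """
    votes = Counter(
        labels[nbr] for nbr in edges.get(node, []) if nbr in labels
    )
    if not votes:
        return None
    return votes.most_common(1)[0][0]


# Made-up example: "dana" is unlabeled and knows three labeled people.
edges = {
    "dana": ["alice", "bob", "carol"],
    "erin": [],
}
labels = {"alice": "engineer", "bob": "engineer", "carol": "lawyer"}
```

Here `classify_by_neighbors(edges, labels, "dana")` returns `"engineer"`, since two of the three labeled neighbors carry that label. A classifier that looks only at an entity's own attributes would ignore this relational signal.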