Conference Poster, International World Wide Web Conference, WWW 2012, Lyon, France
Authors: Matthias Broecheler, Andrea Pugliese, VS Subrahmanian
Direct link to paper

Abstract
The Social Semantic Web (SSW) refers to the mix of RDF data in web content, and social network data associated with those who posted that content. Applications to monitor the SSW are becoming increasingly popular. For instance, marketers want to look for semantic patterns relating to the content of tweets and Facebook posts relating to their products. Such applications allow multiple users to specify patterns of interest, and monitor them in real-time as new data gets added to the web or to a social network. In this paper, we develop the concept of SSW view servers in which all of these types of applications can be simultaneously monitored from such servers. The patterns of interest are views. We show that a given set of views can be compiled in multiple possible ways to take advantage of common substructures, and de fine the concept of an optimal merge. We develop a very fast MultiView algorithm that scalably and efficiently maintains multiple subgraph views. We show that our algorithm is correct, study its complexity, and experimentally demonstrate that our algorithm can scalably handle updates to hundreds of views on real-world SSW databases with up to 540M edges.

Poster

Even long journeys eventually come to an end. After 4 years of intensive research I successfully defended my doctoral dissertation at the University of Maryland. The thesis ties together my work on indexing and querying huge social networks and machine learning on multi-relational data under the common theme of social network data management. A lot of this work is introduced in individual blog posts on this site.

Title: Social Network Data Management.

Abstract
With the increasing usage of online social networks and the semantic web’s graph structured RDF framework, and the rising adoption of networks in various fields from biology to social science, there is a rapidly growing need for indexing, querying, and analyzing massive graph structured data. Facebook has amassed over 500 million users creating huge volumes of highly connected data. Governments have made RDF datasets containing billions of triples available to the public. In the life sciences, researches have started to connect disparate data sets of research results into one giant network of valuable information. Clearly, networks are becoming increasingly popular and growing rapidly in size, requiring scalable solutions for network data management.

This thesis focuses on the following aspects of network data management. We present a hierarchical index structure for external memory storage of network data that aims to maximize data locality. We propose efficient algorithms to answer subgraph matching queries against network databases and discuss effective pruning strategies to improve performance. We show how adaptive cost models can speed up subgraph matching query answering by assigning budgets to index retrieval operations and adjusting the query plan while executing.

We develop a cloud oriented social network database, COSI, which handles massive network datasets too large for a single computer by partitioning the data across multiple machines and achieving high performance query answering through asynchronous parallelization and cluster-aware heuristics.

Tracking multiple standing queries against a social network database is much faster with our novel multi-view maintenance algorithm, which exploits common substructures between queries.

To capture uncertainty inherent in social network querying, we define probabilistic subgraph matching queries over deterministic graph data and propose algorithms to answer them efficiently.

Finally, we introduce a general relational machine learning framework and rule-based language, Probabilistic Soft Logic, to learn from and probabilistically reason about social network data and describe applications to information integration and information fusion.

Conference Paper, International Conference on Advances in Social Networks Analysis and Mining, ASONAM 2011, Kaohsiung, Taiwan
Authors: Matthias Broecheler, Andrea Pugliese, VS Subrahmanian
Direct link to paper

Abstract
Users querying massive social networks or RDF databases are often not 100% certain about what they are looking for due to the complexity of the query or heterogeneity of the data. In this paper, we propose “probabilistic subgraph” (PS) queries over a graph/network database, which afford users great flexibility in specifying “approximately” what they are looking for. We formally define the probability that a substitution satisfies a PS-query with respect to a graph database. We then present the PMATCH algorithm to answer such queries and prove its correctness. Our experimental evaluation demonstrates that PMATCH is efficient and scales to massive social networks with over a billion edges.

Presentation

Workshop Paper, ICDE Workshop on Graph Data Management, 2011, Hannover, Germany
Authors: Matthias Broecheler, Andrea Pugliese, VS Subrahmanian
Direct link to paper

Abstract
As social network and RDF data grow dramatically in size to billions of edges, the ability to scalably answer queries posed over graph datasets becomes increasingly important.
In this paper, we consider subgraph matching queries which are often posed to social networks and RDF databases - for such queries, we want to find all matching instances in a graph database. Past work on subgraph matching queries uses static cost models which can be very inaccurate due to long-tailed degree distributions commonly found in real world networks. We propose the BudgetMatch query answering algorithm. BudgetMatch costs and recosts query parts adaptively as it executes and learns more about the search space. We show that using this strategy, BudgetMatch can quickly answer complex subgraph queries on very large graph data. Specifically, on a real world social media data set consisting of 1.12 billion edges, we can answer complex subgraph queries in under one second and significantly outperform existing subgraph matching algorithms.

Presentation

Abstract
There is a growing interest in methods for exploiting causal or correlational dependencies in structured domains. Exploiting such dependencies often results in improved predictive performance on complex inference tasks in diverse domains such as information integration, natural language processing, and computer vision. In this presentation, we introduce probabilistic soft logic (PSL), a general-purpose framework for expressing, reasoning about and learning structural dependencies.
PSL provides a declarative language tailored to relational domains that require reasoning about similarity and/or probability. Some of the novel aspects of PSL include a representation based on continuous valued random variables, efficient polynomial-time inference algorithms, native support for reasoning about sets, and the ability to estimate confidences values for predictions. This presentation provides a detailed account of PSL covering its mathematical foundation, logic programming semantics, inference and learning algorithms, scalability through parallelization, and different applications. We close with a demonstration of the PSL system implementation.

Presentation

Workshop Paper, NIPS Workshop on Predictive Models in Personalized Medicine, 2010, Whistler, Canada
Authors: Stephen H. Bach, Matthias Broecheler, Stanley Kok, and Lise Getoor
Direct link to paper

Abstract
We introduce the concept of a decision-driven model, a probabilistic model that reasons directly over the uncertain information of interest to a decision maker. We motivate the use of these models from the perspective of personalized medicine. Decision-driven models have a number of benefits that are of particular value in this domain, such as being easily interpretable and naturally quantifying confidences in both evidence and predictions. We show how decision-driven models can easily be constructed using probabilistic soft logic, a recently introduced framework for statistical relational learning and inference which allows the specification of medical domain knowledge in concise first-order-logic rules with assigned confidence values.

The paper is presented as a poster. Follow the link for more information on decision-driven modeling in probabilistic soft logic.

Conference Paper, Advances in Neural Information Processing Systems (NIPS), 2010, Vancouver, Canada
Authors: Matthias Broecheler and Lise Getoor
Direct link to paper
Data set used in the experiments

Abstract
Continuous Markov random fields are a general formalism to model joint probability distributions over events with continuous outcomes. We prove that marginal computation for constrained continuous MRFs is #P-hard in general and present a polynomial-time approximation scheme under mild assumptions on the structure of the random field. Moreover, we introduce a sampling algorithm to compute marginal distributions and develop novel techniques to increase its efficency. Continuous MRFs are a general purpose probabilistic modeling tool and we demonstrate how they can be applied to statistical relational learning. On the problem of collective classification, we evaluate our algorithm and show that the standard deviation of marginals serves as a useful measure of confidence.

Poster Presentation

Conference Paper, Proceedings of the 2010 IEEE International Conference on Social Computing, Symposium Section
Authors: Matthias Broecheler, Paulo Shakarian, and V.S. Subrahmanian
Direct link to paper

Abstract
Multiple phenomena often diffuse through a social network, sometimes in competition with one another. Product adoption and political elections are two examples where network diffusion is inherently competitive in nature. For example, individuals may choose to only select one product from a set of competing products (i.e. most people will need only one cell-phone provider) or can only vote for one person in a slate of political candidate (in most electoral systems). We introduce the weighted generalized annotated program (wGAP) framework for expressing competitive diffusion models. Applications are interested in the eventual results from multiple competing diffusion models (e.g. what is the likely number of sales of a given product, or how many people will support a particular candidate). We define the “most probable interpretation” (MPI) problem which technically formalizes this need. We develop algorithms to efficiently solve MPI and show experimentally that our algorithms work on graphs with millions of vertices.

Presentation

View more presentations from Matthias Broecheler.

Conference Paper, Proceedings of the 2010 International Conference on Advances in Social Networks Analysis and Mining, Odense, Denmark
Authors: Matthias Bröcheler, Andrea Pugliese and V.S. Subrahmanian
Direct link to paper

3rd Best Paper Award

Abstract
Subgraph matching is a key operation on graph data. Social network (SN) providers may want to find all subgraphs within their social network that “match” certain query graph patterns. Unfortunately, subgraph matching is NP-complete, making its application to massive SNs a major challenge. Past work has shown how to implement subgraph matching on a single processor when the graph has 10-25M edges. In this paper, we show how to use cloud computing in conjunction with such existing single processor methods to efficiently match complex subgraphs on graphs as large as 778M edges. A cloud consists of one “master” compute node and k “slave” compute nodes. We first develop a probabilistic method to estimate probabilities that a vertex will be retrieved by a random query and that a pair of vertices will be successively retrieved by a random query. We use these probability estimates to define edge weights in an SN and to compute minimal edge cuts to partition the graph amongst k slave nodes. We develop algorithms for both master and slave nodes that try to minimize communication overhead. The resulting COSI system can answer complex queries over real-world SN data containing over 778M edges very efficiently.

Presentation at ASONAM 2010

View more presentations from Matthias Broecheler.

Conference Paper, Proceedings of the 2010 conference on Uncertainty in Artificial Intelligence
Presented at: Conference on Uncertainty in Artificial Intelligence, held on Santa Catalina Island, CA, USA from July 8th - July 11th, 2010
Authors: Matthias Broecheler, Lilyana Mihalkova, Lise Getoor
Direct link to paper
Data set used in the experiments

Abstract
Many machine learning applications require the ability to learn from and reason about noisy multi-relational data. To address this, several effective representations have been developed that provide both a language for expressing the structural regularities of a domain, and principled sup- port for probabilistic inference. In addition to these two aspects, however, many applications also involve a third aspect–the need to reason about similarities–which has not been directly supported in existing frameworks. This paper introduces probabilistic similarity logic (PSL), a general-purpose framework for joint reason- ing about similarity in relational domains that incorporates probabilistic reasoning about similarities and relational structure in a principled way. PSL can integrate any existing domain- specific similarity measures and also supports reasoning about similarities between sets of entities. We provide efficient inference and learn- ing techniques for PSL and demonstrate its effectiveness both in common relational tasks and in settings that require reasoning about similarity.

About this blog

Where is the knowledge we have lost in information?
- T.S. Elliot, The Rock

We are drowning in data - exabytes of it. My research explores technologies that can help us organize, structure, and efficiently search huge amounts of information as well as automatically deduce actionable pieces of knowledge from it. Learn more

I was a PhD student at the University of Maryland with research interests in databases, artificial intelligence, and machine learning. Learn more
 

My Homepage: http://www.matthiasb.com
Also on Twitter: @MBroecheler

  • Foorceazorrop: for sale is simply a sound Apple offers Ping, a music online community. This app enables you to [...]
  • Edidagmabiatt: First and foremost you should get your credit report and do a credit check on yourself. In most case [...]
  • http://smmkk.net: My brother recommended I may like this website. He used to be entirely right. This submit truly ma [...]
  • www.youtube.com: Hello There. I discovered your blog the usage of msn. This is a very smartly written article. I wil [...]
  • Iqxqcdin64: XRztslqkcx2014 Kristen has taken on the role of a soldier who is assigned to Guantanamo Bay in the f [...]