Conference Paper, Proceedings of the 2010 IEEE International Conference on Social Computing, Symposium Section
Authors: Matthias Broecheler, Paulo Shakarian, and V.S. Subrahmanian
Direct link to paper

Abstract
Multiple phenomena often diffuse through a social network, sometimes in competition with one another. Product adoption and political elections are two examples where network diffusion is inherently competitive in nature. For example, individuals may choose to only select one product from a set of competing products (i.e. most people will need only one cell-phone provider) or can only vote for one person in a slate of political candidate (in most electoral systems). We introduce the weighted generalized annotated program (wGAP) framework for expressing competitive diffusion models. Applications are interested in the eventual results from multiple competing diffusion models (e.g. what is the likely number of sales of a given product, or how many people will support a particular candidate). We define the “most probable interpretation” (MPI) problem which technically formalizes this need. We develop algorithms to efficiently solve MPI and show experimentally that our algorithms work on graphs with millions of vertices.

Presentation

View more presentations from Matthias Broecheler.

Conference Paper, Proceedings of the 2010 International Conference on Advances in Social Networks Analysis and Mining, Odense, Denmark
Authors: Matthias Bröcheler, Andrea Pugliese and V.S. Subrahmanian
Direct link to paper

3rd Best Paper Award

Abstract
Subgraph matching is a key operation on graph data. Social network (SN) providers may want to find all subgraphs within their social network that “match” certain query graph patterns. Unfortunately, subgraph matching is NP-complete, making its application to massive SNs a major challenge. Past work has shown how to implement subgraph matching on a single processor when the graph has 10-25M edges. In this paper, we show how to use cloud computing in conjunction with such existing single processor methods to efficiently match complex subgraphs on graphs as large as 778M edges. A cloud consists of one “master” compute node and k “slave” compute nodes. We first develop a probabilistic method to estimate probabilities that a vertex will be retrieved by a random query and that a pair of vertices will be successively retrieved by a random query. We use these probability estimates to define edge weights in an SN and to compute minimal edge cuts to partition the graph amongst k slave nodes. We develop algorithms for both master and slave nodes that try to minimize communication overhead. The resulting COSI system can answer complex queries over real-world SN data containing over 778M edges very efficiently.

Presentation at ASONAM 2010

View more presentations from Matthias Broecheler.

Conference Paper, Proceedings of the 2010 conference on Uncertainty in Artificial Intelligence
Presented at: Conference on Uncertainty in Artificial Intelligence, held on Santa Catalina Island, CA, USA from July 8th - July 11th, 2010
Authors: Matthias Broecheler, Lilyana Mihalkova, Lise Getoor
Direct link to paper

Abstract
Many machine learning applications require the ability to learn from and reason about noisy multi-relational data. To address this, several effective representations have been developed that provide both a language for expressing the structural regularities of a domain, and principled sup- port for probabilistic inference. In addition to these two aspects, however, many applications also involve a third aspect–the need to reason about similarities–which has not been directly supported in existing frameworks. This paper introduces probabilistic similarity logic (PSL), a general-purpose framework for joint reason- ing about similarity in relational domains that incorporates probabilistic reasoning about similarities and relational structure in a principled way. PSL can integrate any existing domain- specific similarity measures and also supports reasoning about similarities between sets of entities. We provide efficient inference and learn- ing techniques for PSL and demonstrate its effectiveness both in common relational tasks and in settings that require reasoning about similarity.

COSI press coverage

In: News

2 Jul 2010

COSI, our cloud-oriented graph database framework for fast subgraph identification, has received some attention in the media.

So far, I have found two articles on the internet (1, 2) which are based on the original press release and additional interviews.

Conference Paper, Proceedings of the 8th International Semantic Web Conference, 2009, Chantilly, VA, USA
Authors: Matthias Broecheler, Andrea Pugliese, V.S. Subrahmanian
Direct link to paper

Abstract
RDF is an increasingly important paradigm for the representation of information on the Web. As RDF databases increase in size to approach tens of millions of triples, and as sophisticated graph matching queries expressible in languages like SPARQL become increasingly important, scalability becomes an issue. To date, there is no graph-based indexing method for RDF data where the index was designed in a way that makes it disk-resident. There is therefore a growing need for indexes that can operate efficiently when the index itself re- sides on disk. In this paper, we first propose the DOGMA index for fast subgraph matching on disk and then develop a basic algorithm to answer queries over this index. This algorithm is then significantly sped up via an optimized algorithm that uses efficient (but correct) pruning strategies when combined with two different extensions of the index. We have implemented a preliminary system and tested it against four existing RDF database systems developed by others. Our experiments show that our algorithm performs very well compared to these systems, with orders of magnitude improvements for complex graph queries.

Conference Paper, Proceedings of the 25th International Conference on Logic Programming, 2009, Pasadena, CA
Authors: Matthias Broecheler, Gerardo I. Simari, and V.S. Subrahmanian
Direct link to paper

Best Student paper award

Abstract
Probabilistic logic programs (PLPs) define a set of probability distribution functions (PDFs) over the set of all Herbrand interpretations of the underlying logical language. When answering a query Q, a lower and upper bound on Q is obtained by optimizing (min and max) an objective function subject to a set of linear constraints whose solutions are the PDFs mentioned above. A common critique not only of PLPs but many probabilistic logics is that the difference between the upper bound and lower bound is large, thus often providing very little useful information in the query answer. In this paper, we provide a new method to answer probabilistic queries that tries to come up with a histogram that “maps” the probability that the objective function will have a value in a given interval, subject to the above linear constraints. This allows the system to return to the user a histogram where he can directly “see” what the most likely probability range for his query will be. We prove that computing these histograms is #P-hard, and show that computing these histograms is closely related to polyhedral volume computation. We show how existing randomized algorithms for volume computation can be adapted to the computation of such histograms. A prototype experimental implementation is discussed.

Conference Publication, Principles of Knowledge Representation and Reasoning: Proceedings of the Eleventh International Conference, KR 2008, Sydney, Australia
Authors: Gerardo I. Simari, Matthias Broecheler, V.S. Subrahmanian, Sarit Kraus
Direct link to full paper.

Abstract
In this paper, we propose a theoretical framework within which to evaluate the reliability of promises that an agent makes, based on past performance of the agent. Our frame- work does not just propose one such measure, but defines axioms that govern the choice of measure. The framework is able to account for partial fulfillment of promises, late fulfillment of promises, fulfillment of variants of promises, and the like. Within this framework, we propose some specific measures to evaluate promises made by agents and develop algorithms to compute these efficiently. We tested our methods on a real world data set of airline flight information and show that our methods are both accurate and quickly computable, even on large data sets.

Technical Report, 2008, Jacobs University, School of Engineering and Science, Bremen, Germany
Authors: Matthias Bröcheler
Advisor: Professor Dr. Michael Kohlhase
Special Thanks to: Christoph Lange
Direct link to full paper.

Abstract

The body of mathematical knowledge is rapidly increasing and constantly changing. Zentralblatt MATH, an abstracting and reviewing service in the field of mathematics, maintains a database of more than 1.6 million mathematical documents and reports an annual growth by 80,000 articles. Similarly, the open internet archive for for electronic preprints of scientific papers, arXiv.org, contains close to half a million documents.
These figures suggest that neither a mathematician’s memory nor the time she devotes to studying new publications can possibly suffice to cover a significant fraction of the accumulated wealth of mathematical knowledge. In addition to the exponential increase in the amount of information, one also observes an increase in the complexity of mathematical content with more interdependencies between different areas within and beyond mathematics.
We propose a Mathematical Semantic Web to support mathematicians in efficiently managing and retrieving mathematical knowledge using computer systems and the internet. As with the World WideWeb, the Mathematical Semantic Web will allow authors of mathematics to publish their documents online which accumulate to a gigantic, decentralized and dynamic mathematical knowledge base. Authors semantically annotate their work in a special logical formalism, namely Description Logics, to allow computers to
understand the actual knowledge contained therein. Based on these annotations, computer agents reason about the mathematical knowledge and provide novel services on the Mathematical Semantic Web, giving mathematicians efficient access to vast repositories of mathematics.

This thesis first analyzes the utility of Description Logics for formalizing mathematical knowledge. This analysis concludes with the proposal of a Mathematical Semantic Web which is introduced in great detail subsequently. We elaborate on the architecture and individual building blocks before describing the authoring process for the Mathematical Semantic Web. To motivate the value of our proposal, some services operating on the Mathematical Semantic Web are specified. In an effort to improve knowledge retrieval even further, we introduce the combination of domain and structural semantics for reasoning
processes, thereby effectively leveraging additional knowledge.
Finally, we provide a list of Best Practices which aim at simplifying the knowledge modeling process and improving the quality of the resulting knowledge base. The Best Practices have been distilled from the experience gained during our case studies.

About this blog

Where is the knowledge we have lost in information?
- T.S. Elliot, The Rock

This blog revolves around the general question of how to gain knowledge from the massive amounts of information available today as well as its social and business implications and my research in the area in particular.

I am a PhD student at the University of Maryland with research interests in databases, artificial intelligence, and machine learning. Learn more