Área Restrita  |

Workshop Brazil – Cornell

Location:
CS Department, Cornell University
May 7-8, 2012

Organizing Committee:
John Hopcroft (Cornell, USA)
Robert Kleinberg (Cornell, USA)
Nivio Ziviani (UFMG, Brazil)

Goal:
The Brazil-Cornell Workshop will be held at the CS Department from Cornell University on May 7-8, 2012. The goal is to build stronger ties between computer scientists at Cornell and those in Brazil. From the Brazilian side eleven professors from seven universities (UFAM, UFMG, UFRGS, UFRJ, UNICAMP, USP/São Carlos, USP/São Paulo) will participate. Brief research presentations by faculty and students will be interspersed with opportunities for informal interactions, meals, social activities, and enjoying the beautiful scenery of Cornell and Ithaca.

Participants:

  • Guido Araujo (UNICAMP, Brazil)
  • Claire Cardie (Cornell, USA)
  • Johannes Gehrke (Cornell, USA)
  • Robert Kleinberg (Cornell, USA)
  • Yoshiharu Kohayakawa (USP/SP, Brazil)
  • Thorsten Joachims (Cornell, USA)
  • Alberto Laender (UFMG, Brazil)
  • Luis da Cunha Lamb (UFRGS, Brazil)
  • Wagner Meira Jr (UFMG, Brazil)
  • Edleno Silva de Moura (UFAM, Brazil)
  • Noah Snavely (Cornell, USA)
  • Altigran Soares da Silva (UFAM, Brazil)
  • Jayme Szwarcfiter (UFRJ, Brazil)
  • Caetano Traina Jr (USP/SC, Brazil)
  • Adriano Veloso (UFMG, Brazil)
  • Nivio Ziviani (UFMG, Brazil)

Schedule:

Talks:

  1. Title: Computer Systems Research at the University of Campinas and Science without Borders (SwB) Funding Opportunities. Guido Araujo, UNICAMP, Brazil.
    Abstract: This talk is divided in two parts. In the first part we will introduce the Computer Systems Laboratory (LSC) at the University of Campinas (Unicamp). LSC main research areas include: digital systems design and simulation, compiler optimizations, code generation and code compression algorithms, dynamic compilation and architecture evaluation. Samples of typical research projects developed at LSC will be described. In the second part of the talk, we will introduce Science without Borders (SwB), a scholarship program funded by Brazilian R&D agencies, which could be used to support a potential partnership between Cornell University and Brazilian institutions.
    Bio: Guido Araujo is a full professor of Computer Architecture and the head of the Computer Systems Laboratory (LSC) at the University of Campinas (Unicamp) . His main research interests are in computer design and optimization and code optimization. His research projects have involved companies from embedded (Conexant) computer (Intel) and consumer electronic (Samsung) areas.
  2. Title: Declarative Data-Driven Coordination. Johannes Gehrke, CS Department, Cornell.
    Abstract: There are many web applications that require users to coordinate and communicate. Friends want to coordinate travel plans, students want to jointly enroll in the same set of courses, and busy professionals want to coordinate their schedules. These tasks are difficult to program using existing abstractions provided by database systems since they all require some type of coordination between users. However, this type of information flow is fundamentally incompatible with classical isolation in database transactions. In this talk, I will argue that it is time to look beyond isolation towards principled and elegant abstractions that allow for communication and coordination between some notion of (suitably generalized) transactions. This new area of declarative data-driven coordination is motivated by many novel applications and is full of challenging research problems. This talk describes joint work with Gabriel Bender, Nitin Gupta, Christoph Koch, Lucja Kot, Milos Nikolic, and Sudip Roy.
  3. Title: Random contagion and segregation processes in graphs. Bobby Kleinberg, CS Department, Cornell.
    Abstract: We will survey some recent results concerning random processes on graphs, inspired by questions arising in the social sciences. In a threshold contagion process, an epidemic spreads through the nodes of a graph; each node’s probability of becoming infected depends on the number of infected neighbors. Guided by network design and game theory, our perspective centers on questions such as: for a given number of edges, what is the network structure that minimizes the spread of contagion? To what extent do the networks resulting from selfish behavior of individual nodes differ from this optimum? The second part of the talk concerns Schelling’s segregation model, a classic model from economics in which segregation on a macroscopic scale arises from the random decisions made by individuals each of whom only cares about the residents of a small neighborhood in the vicinity of their home. Our analysis provides a mathematically rigorous quantification of the extent of segregation in a one-dimensional version of the process.
  4. Title: Testing permutation properties through subpermutations. Yoshiharu Kohayakawa, University of São Paulo
    Abstract: The theory of graph limits has shown to have interesting consequences in the area of graph property testing. We shall discuss a theory of permutation limits, with consequences to property testing of permutations.
    Bio: Yoshiharu Kohayakawa is a Professor at the Institute of Mathematics and Statistics of the University of São Paulo. He mainly works in extremal and probabilistic combinatorics. He is currently on the editorial boards of Combinatorics, Probability and Computing and Random Structures and Algorithms, and is one of the managing editors of the Journal of Combinatorial Theory, Series B.
  5. Title: Co-Active Learning for Optimizing Retrieval Systems. Thorsten Joachims, CS Department, Cornell.
    Abstract:Many systems, ranging from search engines to smart homes, aim to continually improve the utility they are providing to their users. While clearly a machine learning problem, it is less clear what the interface between user and learning algorithm should look like. Focusing on learning problems that arise in recommendation and search, this talk explores how the interactions between the user and the system can be modeled as an online learning process. In particular, the talk proposes a new online learning model – called co-active learning – for learning from implicit feedback. Co-active learning interprets implicit user feedback as preference statements, and we provide learning algorithms with bounded regret for (approximately) rational users.
  6. Title: The UFMG Database Research Lab. Alberto H. F. Laender, CS Department, UFMG, Brazil.
    Abstract: The UFMG Database Research Lab was established in 1984 and has become a reference in Latin America. Its research lines range from more traditional database topics, such as conceptual modeling, query interfaces and geographic databases, to Web-related topics, such as digital libraries, Web data management and Web information retrieval. In this talk, we will present an overview of the lab’s activities and describe some results of recent work.
    Bio: Alberto H. F. Laender is a Professor of Computer Science at the Universidade Federal de Minas Gerais (UFMG), Brazil. He holds a BS degree in Electrical Engineering and an MSc in Computer Science, both from UFMG, and a PhD degree in Computing from the University of East Anglia, UK. He served as member of the ACM SIGMOD’s Advisory Board and also of the SIGMOD’s Jim Gray PhD Dissertation Award Committee. He is a founder-member of the Brazilian Computer Society and one of the co-founders of Akwan Information Technologies, a Brazilian search technology company that was acquired by Google Inc. in 2005 to become its Research and Development Center for Latin America. Prof. Laender is a member of the Brazilian Academy of Science and in 2010 was awarded the National Order of the Scientific Merit. He is the author of more than 150 refereed journal and conference papers and his research interests include conceptual modeling and database design, web data management, web information systems and digital libraries.
  7. Title: Research Opportunities at the Institute of Informatics, UFRGS, Brazil. Luis da Cunha Lamb, UFRGS, Brazil.
    Abstract: Research at the Institute of Informatics, UFRGS is organized into 5 core areas: Artificial Intelligence, Computer Engineering, Computing Theory, Computer Systems and Information Systems. I will highlight recent research results and point out several opportunities for research cooperation, faculty and student interchange.
    Bio: Luis Lamb is Professor and Dean of the Institute of Informatics at the Federal University of Rio Grande do Sul (UFRGS), Porto Alegre, Brazil. He holds a PhD from Imperial College London (2000). His research interests include Logic in Computer Science, Artificial Intelligence and Social Computing.
  8. Title: Mining Social Network Graphs. Wagner Meira Jr., Universidade Federal de Minas Gerais
    Abstract: The growth of the Internet in terms of both the number of users and diversity of their activities require new models, algorithms and techniques to understand their behavior and to enable novel and/or relevant applications. In the context of social networks, graphs are being employed to represent all kinds of information and mining those graphs is still challenging, despite being a popular strategy. In this talk we discuss graph mining models and techniques for determining structural correlations and performing non-textual sentiment analysis, as well as the results of their application to real data.
    Bio: Wagner Meira Jr. is currently Associate Professor at the CS Dept at Universidade Federal de Minas Gerais, Brazil. He also leads the Knowledge Discovery group at InWeb – the Brazilian National Institute of Science and Technology for the Web. His research focuses on scalability and efficiency of large scale parallel and distributed systems and on data mining models and algorithms, their parallelization, and application to areas such as social networks, information retrieval, bioinformatics, and e-governance.
  9. Title: Learn to Pre-compute Evidence Fusion for Efficient Query Evaluation. Edleno Silva de Moura, UFAM, Brazil.
    Abstract: State-of-the-art search engine ranking methods combine several distinct sources of relevance evidence to produce a high quality ranking of results for each query. The fusion of information is currently done at query processing time, which has direct impact on response time of search systems. Previous research also shows that an alternative to improve search efficiency in textual databases is to pre-compute term impacts at indexing time. In this talk, I present an alternative to pre-compute term impacts recently proposed by our research group. This new alternative provides a generic framework for combining any distinct set of sources of evidence by using a machine learning technique and retains the advantages of producing high quality results, but avoiding the costs of combining evidence at query processing time. In our proposal, the pre-computed impact values are indexed and used later for computing document ranking at query processing time. By doing so, our method effectively reduces the query processing to simple additions of such impacts. We show that this approach, while leading to results comparable to state-of-the-art ranking methods, also can lead to a significant decrease in computational costs during query processing.
    Bio: Edleno Silva de Moura is an Associate Professor of Computer Science at the Federal University of Amazonas (UFAM) in Brazil, where he heads the Database and Information Retrieval Group (BDRI). He received a PhD in Computer Science from the Federal University of Minas Gerais (UFMG), Brazil, in 1999, where his research activities were focused on applications of data compression for information retrieval systems. After finishing his PhD, he worked as the Chief Technology Officer for Akwan Information Technologies, a company specialized in developing information retrieval systems for the Web. He is member of Brazilian Academy of Sciences, and member of the program committee of important conferences on the information retrieval area, such as ACM SIGIR, ACM CIKM, ACM WSDM and WWW Conference. He is also the author of several papers in journals and conference proceedings covering topics related to information retrieval area, such as efficiency issues in information retrieval, text indexing, ranking algorithms, text classification, text compression, and related areas.
  10. Title: Methods and Techniques for Information Extraction for Text Segmentation (IETS). Altigran Soares da Silva, Institute of Computing, UFAM, Brazil
    Abstract: The growing use of text files for information exchange, such as HTML pages, XML documents, e-mail, blogs posts, tweets, RSS and SMS messages, has brought many problems related to how properly exploit the information implicitly contained therein. In particular, problems related to Information Extraction from such sources has motivated many studies in various scientific communities in areas such as Databases, Data Mining, Information Retrieval and Artificial Intelligence. We overview recent methods and techniques in the literature to address the problem of Information Extraction by Text Segmentation (IETS), which consists in extracting values of interest organized into semi-structured records (e.g., postal addresses, bibliographic citations, classified ads, etc.), implicitly present in textual sources.
    Bio: Altigran Soares da Silva received his PhD degree in computer science from the Universidade Federal de Minas Gerais, Brazil. He is an associate professor of Computer Science at the Universdade Federal do Amazonas, Brazil. His research interests include web data management, web data extraction, web information retrieval, and digital libraries.
  11. Title: The Distributed Camera: Modeling the world from online photos. Noah Snavely, CS Department, Cornell
    Abstract: We live in a world of ubiquitous imagery, in which the number of images at our fingertips is growing at a seemingly exponential rate.  These images come from a wide variety of sources, including Google Maps and related sites, webcams, and millions of photographers around the world uploading billions and billions of images to photo-sharing websites.  Taken together, these sources of imagery can be thought of as constituting a distributed camera capturing the entire world at unprecedented scale, and continually documenting its cities, mountains, buildings, people, and events.  In this talk I will talk about how we might use this distributed camera as a fundamental new tool for science, engineering, and environmental monitoring, and how a key problem is *calibration* — determining the exact location in the world, orientation, and time of each photograph, and relating it to all other photos, in an efficient, automatic way.  I will describe our ongoing work on new computer vision algorithms for solving this calibration problem.
  12. Title: Arboricity, h-Index and Graph Algorithms. Jayme L Szwarcfiter, Federal University of Rio de Janeiro, Brazil.
    Abstract: We propose a new data structure for manipulating graphs, which is particularly suited for designing dynamic algorithms. We describe the basic operations related to dynamic applications, as insertions and deletions of vertices or edges, and adjacency queries. Using the proposed tool, we develop several dynamic algorithms for solving problems as listing the cliques of a given size, recognizing diamond- free graphs, and finding simple, simplicial and dominated vertices. On the other hand, using the data struicture we also describe new static algorithms for the problems of counting subgraphs of size 4, recognizing cop-win graphs and recognizing strongly chordal graphs. Most of the above dynamic algorithms are the first of their kind in the literature, while the static algorithms improve in some aspect over existing methods.
    Bio: Jayme Szwarcfiter is an Emeritus Professor at the Federal University of Rio de Janeiro, Brazil. He is a member of the Brazilian Academy of Sciences and has received different awards, as the Prize of Scientific Merit of the Brazilian Computer Science Society. His interests include algorithms and graph theory.
  13. Title: Similarity search and mining in large databases of scientific data. Caetano Traina Jr. University of São Paulo, SP, Brazil
    Abstract: Storing, retrieving and analyzing scientific data is a hard endeavor because: the amount (cardinality) of data can be huge; the dimensionality of the elements tends to be large; and the correlations among the data elements cover all the distribution models available. We are tackling the problem from the similarity search approach. In a large scale, similarity-based algorithms can benefit from any correlations existing in the data, enable to focus only in the part of the data space where answers probably are, and gives to the user tools to locate the most promising dimensions. Thus it may be employed to help analyzing the data regardless of its distribution and sizes, and it also helps reducing the number of elements that need to be handled to answer a query. In this talk we present the main research lines being developed at GBdI on that subject.
    Bio: Caetano Traina Jr. is a Professor in the Computer Science Department at the University of São Paulo in São Carlos, Brazil. He received the BSC degree in electrical engineering and MSc and PhD degrees in Computer Science from the University of Sao Paulo, Brazil, in 1978, 1982 and 1986, respectively. His research interests include database design, indexing methods, query optimization and data mining.
  14. Title: Real-Time Associative Classification Algorithms for High-Dimensional Streaming Data. Adriano Alonso Veloso, CS Department, UFMG, Brazil.
    Abstract: Classification is a well-known Machine Learning task. Modern applications, including Sentiment Analysis and Entity Disambiguation, often involve high-dimensional streaming textual data. These applications pose serious challenges to current classification algorithms, either because the classifier must operate with limited resources, including labeled data and response time, or because data distribution changes over time. We discuss self-training, adaptive, highly incremental classification algorithms, with linear complexity with regard to the dimensionality of the data. These algorithms follow different learning settings, including supervised, semi-supervised and active learning.
    Bio: Adriano Veloso is an Associate Professor in the Computer Science Department at the Federal University of Minas Gerais. He received the BsC, MsC and PhD degrees in Computer Science from the Federal University of Minas Gerais, Brazil, in 2002, 2004 and 2009, respectively. His research interests include Data Mining, Machine Learning and Information Retrieval.
  15. Title: The Brazilian National Institute of Science and Technology for the Web: Bridging the Gap Between the Web and Society, Nivio Ziviani, CS Department, UFMG, Brazil
    Abstract: The National Institute of Science and Technology for the Web (InWeb) aims at developing models, algorithms and novel technologies that make the information distribution and services through the Web more effective and safe. In this talk we present the main research lines of InWeb with emphasis in the information retrieval line, mainly research and development of techniques to efficiently retrieve relevant information to users.
    Bio: Nivio Ziviani is a Professor Emeritus of Computer Science at the Universidade Federal de Minas Gerais, Brazil. He is a member of the Brazilian Academy of Sciences and of the National Order of the Scientific Merit in the class Comendador. His research interests includes algorithms and data structures, information retrieval and Web information systems.


Work in Progress

  1. Title: From Power to Status in Large Scale Online Exchanges. Bruno Abrahao, CS Department, Cornell.
    Abstract: Online social networks where the main purpose of interaction is the acquisition of specific resources of interest represent a promising venue for the study of social exchange. Sociological theories dating back to the 1960’s predict that situations where the possession of resources is not uniform lead to power imbalances. Actors lacking a certain desired resource find themselves in a position of dependence on resource owners. According to Power-Dependence Theory, this power-unequal situation will tend toward balance, an outcome which could be reached in several ways. Among power-balancing mechanisms, status giving figures as a way through which a low-power actor may lessen their dependence on a more powerful partner. This mechanism has not received a test in a large, real-world context, however. We analyze data from CouchSurfing.com, an international online hospitality network, to confirm the predictions regarding status giving at a massive scale never before addressed by previous work. We consider hosting as a social exchange process: the host offers a resource (i.e., hospitality) to other members (“surfers”), an exchange which create a degree of dependency. In turn, we make use of mutualuser-reported ratings to quantify status. Our investigation demonstrates a statistically-higher tendency of CouchSurfers to give status to hosts, especially when the effect of resource scarcity comes into play. Joint work with Bogdan State (Stanford) and Karen Cook (Stanford)
  2. Title: Signaling Schemes for Revenue Maximization. Renato Paes Leme, CS Department, Cornell.
    Abstract: Signaling is an important topic in the study of asymmetric information in economic settings. In particular, the transparency of information available to a seller in an auction setting is a question of major interest. We introduce the study of signaling when conducting a second price auction of a probabilistic good whose actual instantiation is known to the auctioneer but not to the bidders. This framework can be used to model impressions selling in display advertising. We study the problem of computing a signaling scheme that maximizes the auctioneer’s revenue in a Bayesian setting. While the general case is proved to be computationally hard, several cases of interest are shown to be polynomially solvable. In addition, we establish a tight bound on the minimum number of signals required to implement an optimal signaling scheme and show that at least half of the maximum social welfare can be preserved within such a scheme. ( joint work with Yuval Emek, Michal Feldman, Iftah Gamzu, and Moshe Tennenholtz. http://arxiv.org/abs/1202.1590 )
  3. Title: Intelligible Models for Classification and Regression. Yin Lou, CS Department, Cornell.
    Abstract: Complex models for regression and classification have high accuracy, but are unfortunately no longer interpretable by users. We study the performance of generalized additive models (GAMs), which combine single-feature models called \emph{shape functions} through a linear function. Since the shape functions can be arbitrarily complex, GAMs are more accurate than simple linear models. But since they do not contain any interactions between features, they can be easily interpreted by users. We present the first large-scale empirical comparison of existing methods for learning GAMs. Our study includes existing spline and tree-based methods for shape functions and penalized least squares, gradient boosting, and backfitting for learning GAMs. We also present a new method based on tree ensembles with an adaptive number of leaves that consistently outperforms previous work.  We complement our experimental results with a bias-variance analysis that explains how different shape models influence the additive model. Our experiments show that shallow bagged trees with gradient boosting distinguish itself as the best method on low- to medium-dimensional datasets.
  4. Title: Estimating the Prevalence of Deception in Online Review Communities. Myle Ott, CS Department, Cornell.
    Abstract: Consumers’ purchase decisions are increasingly influenced by user-generated online reviews. Accordingly, there has been growing concern about the potential for posting deceptive opinion spam—fictitious reviews that have been deliberately written to sound authentic, to deceive the reader. But while this practice has received considerable public attention and concern, relatively little is known about the actual prevalence, or rate, of deception in online review communities, and less still about the factors that influence it. We propose a generative model of deception which, in conjunction with a deception classifier, we use to explore the prevalence of deception in six popular online review communities: Expedia, Hotels.com, Orbitz, Priceline, TripAdvisor, and Yelp. We additionally propose a theoretical model of online reviews based on economic signaling theory, in which consumer reviews diminish the inherent information asymmetry between consumers and producers, by acting as a signal to a product’s true, unknown quality. We find that deceptive opinion spam is a growing problem overall, but with different growth rates across communities. These rates, we argue, are driven by the different signaling costs associated with deception for each review community, e.g., posting requirements. When measures are taken to increase signaling cost, e.g., filtering reviews written by first-time reviewers, deception prevalence is effectively reduced.
  5. Title: Asynchronous Large-Scale Graph Processing Made Easy. Guozhang Wang, CS Department, Cornell.
    Abstract: Scaling large-scale iterative graph processing applications through parallel computing has become a very important problem. Several graph processing frameworks have been proposed that insulate developers from low-level details of parallel programming. Most of these frameworks are based on the bulk synchronous (BSP) model in order to simplify application development while achieving good scalability. However, in the BSP model, vertices are updated in fixed rounds, which often leads to slow convergence. In contrast, asynchronous executions can significantly accelerate convergence. Unfortunately, asynchronous models do not provide the programming simplicity and scalability advantages of bulk synchronous models. In this talk, I will present GRACE, our new graph programming platform, that separates application logic from execution policies. GRACE provides a synchronous iterative graph programming model for users to easily implement, test, and debug their applications. GRACE also contains a carefully designed and implemented parallel execution engine for both synchronous and user-defined built-in asynchronous execution policies.
Copyright © 2012 InWeb - Instituto Nacional de Ciência e Tecnologia para a Web - Todos os direitos reservados.
XHTML 1.1 OKXHTML 1.1 CSS 2.1 OKCSS 2.1 razz