
Information Retrieval

Information retrieval (IR) is the activity of recovering information objects that are stored in a computer-accessible medium. An information object, such as a document, Web page, or book, usually contains text, but it may also contain other types of content, such as images, sounds, and graphics. Representing and organizing such objects allows people to access relevant information in response to an information need, expressed, for example, in the form of a query.
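To make the idea of representing and organizing objects for query access concrete, here is a minimal sketch in Python of one classic approach: an inverted index with a simple TF-IDF ranking. It is an illustration only, not a description of the methods used in this research line. The function names (`tokenize`, `build_index`, `search`), the sample documents, and the smoothed weighting `log(1 + N/df)` are all assumptions chosen for brevity.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def build_index(docs):
    """Build an inverted index: term -> {doc_id: term frequency}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term, tf in Counter(tokenize(text)).items():
            index[term][doc_id] = tf
    return index


def search(index, n_docs, query):
    """Rank documents by the sum of tf * idf over the query terms."""
    scores = defaultdict(float)
    for term in tokenize(query):
        postings = index.get(term, {})
        if not postings:
            continue  # term does not occur in any document
        # Smoothed inverse document frequency (an assumed variant).
        idf = math.log(1 + n_docs / len(postings))
        for doc_id, tf in postings.items():
            scores[doc_id] += tf * idf
    # Highest-scoring documents first.
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical three-document collection.
docs = {
    "d1": "web search engines crawl web pages",
    "d2": "image retrieval from photo collections",
    "d3": "query processing for web search",
}
index = build_index(docs)
print(search(index, len(docs), "web search"))
print(search(index, len(docs), "image retrieval"))
```

In this sketch the index is the "representation and organization" of the collection, and the query is matched against it rather than against the raw documents. Real Web-scale systems add much more, such as link analysis, compression, and distributed query processing.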

The study of new techniques for information retrieval has gained importance with the rapid expansion of the Web, which posed numerous challenges to the IR techniques available until then. In fact, finding useful information on the Web is still a tedious and difficult task. There is a virtually unlimited amount of information, expressed in various ways and of varying quality. The number of people interested in accessing such information is enormous and still growing. This combination of factors requires new methods and technologies for information management and retrieval. In this context, IR techniques and solutions are fundamental to addressing the problems included in Challenge 2.

Several results in this research line will build on results from Challenge 1, which will provide a better understanding of the interests and activities of people on the Web. Incorporating such knowledge allows the development of better IR techniques for modeling Web information. For instance, important breakthroughs are expected in the development of theme-specific crawlers and query processors that are able to understand the (social) interests of users [45, 46, 113]. The same is anticipated for the creation of new algorithms and data structures that are capable of treating the various Web dimensions more adequately [22-24] (goals 2.1 and 2.2). These and other results will be merged into a library of software components for treating Web information. This library will be used to create a national portal for science and technology and to develop new applications. Possible applications include presenting advertisements based on page content and user interests [76], selective information dissemination [90], and recommendation [82] (goal 2.3). Note that many IR techniques are also important for addressing some aspects of Challenge 1, since much of the evidence generated by people on social networking and relationship sites takes the form of text (for instance, tags, descriptions, and comments).

Recognizing geographic context from Web pages [21, 49] and from keywords used in search queries [103] can also provide evidence to improve the results of queries in which the user seeks local content or services [69, 84] (goals 2.1, 2.3 and 2.8). Finally, we plan to investigate multilingual information retrieval methods, in which information can be found in a language that is different from the one used in the query [95] (goal 2.9).

Note that additional research results from Challenges 1 and 3 can have a direct impact on this line of research. For instance, discovering malicious or opportunistic behavior patterns can help detect low-quality content (goal 2.4). Improvements in the networking infrastructure and in presentation tools will make access easier and will allow information to be used more effectively.

The Information Retrieval (IR) research line will be carried out primarily by the researchers Nivio Ziviani (UFMG), Marcos Gonçalves (UFMG), Clodoveu Davis (UFMG), Edleno Moura (UFAM), Arnaldo Araújo (UFMG), Fabiano Botelho (CEFET-MG), Viviane Orengo (UFRGS), and Leandro Wives (UFRGS).

Copyright © 2010 InWeb - Instituto Nacional de Ciência e Tecnologia para a Web - All rights reserved.