VIBE: A VISUALIZATION SYSTEM FOR EXPLORING LARGE SCIENTIFIC DATABASES
Kenneth M. Sochats and James G. Williams
University of Pittsburgh
Abstract
The idea of utilizing visualization for information retrieval is presented through a new paradigm for query response handling. The technique has been generalized to permit retrieval on bibliographic, numeric and other types of data sources. The paradigm is based on parallel queries specified by what we have termed points of interest. Each point of interest is defined by a number of key terms, mathematical relationships, properties or other characteristics, and by a display position. Documents, records or other data objects, represented by icons, are positioned in this display according to a user-specified scoring mechanism (e.g., frequency count or magnitude) applied to the occurrence of the terms specified for these points of interest. The resulting interactive display is navigable and modifiable by the user and lets the user visualize the relationships between entities in very large retrieved sets.
This visualization method has been implemented through a visualization system, called VIBE. The results of the application of VIBE to several OSTI data sets (Energy Science and Technology Database and Tiger Team data) as well as other test databases will be presented.
Introduction
The technologies for the creation, collection, storage and processing of information have seen dramatic improvements in capabilities. Databases with terabytes of information are becoming increasingly common, and stores of information that are orders of magnitude larger are envisioned for the future. However, techniques for effectively and efficiently retrieving information from these vast data stores have seen only incremental gains in performance.
Visualization techniques have proved valuable for applications in scientific research, engineering, medicine, meteorology and other disciplines. Information retrieval shares an exploratory nature with many of the areas where visualization has been successfully applied. Thus, information retrieval appears to be an area where visualization might be a valuable tool.
This paper explores the use of visualization techniques to enhance the information retrieval process. The techniques developed are applicable to traditional bibliographic databases as well as numeric and other types of databases. More important, the techniques perform well on databases of mixed types. The following section presents a quick review of the basic principles behind computerized information retrieval systems and reviews some of the problems associated with these systems. This is followed by a review of the recent work in the area of visualization. Next, we introduce the VIBE system (Visualization By Example), a prototype system for visualizing a collection of data objects. Finally, we present some examples of retrieval using Department of Energy and other data that we used to explore and enhance the capabilities of VIBE.
Information Retrieval
Information retrieval systems traditionally come in two forms. The most common is the retrospective search system, which searches a full database on the basis of a query submitted by a user. Less common, but still important, is the current awareness or selective dissemination of information system. This system searches a smaller database of current data, recently added to the database, on the basis of an interest profile submitted by a user. The profile, in effect, is a static and semi-permanent query. In both situations, it is assumed that the query (or profile) is well formed and can be matched against the objects in the database.
Over the years a large number of retrieval systems have been proposed, based on various query (or profile) formulations (vector, Boolean, extended Boolean, fuzzy) and various algorithms to match database objects to the query. Four principles lie behind these systems:
(1) The goal of the retrieval is to identify a set of objects or object references for further study.
(2) A query defines a subset of the database, namely the set of "relevant" objects.
(3) The results of the query are presented sequentially.
(4) The retrieved objects can be ordered (at least in many models) by their degree of correspondence to the query.
By deciding which objects to present and which to exclude, these systems limit the user's view of the database. In effect, the user is being asked to accept the system's judgment as to which objects are relevant. These characteristics result in a number of difficulties common to information retrieval systems as listed below (see, for example, Borgman, 19)(6):
· The user may not have a good idea of what to search for.
· The user probably does not know how the query relates to the database, and hence does not know whether the query has an appropriate scope.
· The user may not be aware of the structure of the database.
· The user may not be skilled in formulating or reformulating complex queries.
· The user probably does not know the specific database object evaluation method used, and hence does not know why a specific object is included in or excluded from the retrieved set.
· Ordering the output in a sequential list obscures the many dimensions and characteristics by which objects may be related to the query and to each other.
One might conclude that many of these problems exist because the retrieval system is not "transparent" to the user. The user has no way of "seeing" the structures or processes that he or she is dealing with.
In practice, traditional information retrieval systems work reasonably well when the user needs a small set of objects and can formulate a well-defined query for the selection of this set. However, there are retrieval situations where the user's problem may not be solved by a small set of retrieved objects. In other cases it may be difficult to formulate a precise query. Suppose that we want to get an overview of a collection of documents in a bibliographic database relative to our specific interest. Such an overview cannot easily be obtained from the lists of documents returned by traditional retrieval systems. As another example, suppose that we have a database of the properties of metals and we have a number of required properties for a particular application. The ordered linear output list produced from a traditional retrieval system will not be very helpful in selecting a metal with a desired cluster of properties. Traditional retrieval systems are not very helpful in situations such as these.
Visualization
Human visualization is the process of forming a mental image of a domain space. It is a cognitive process performed by humans in an attempt to form a mental image of the nature of functions, objects and processes. The entity being visualized may be concrete, such as an organ of the human body, or abstract, such as a multidimensional space or lines of magnetic force.
In attempting to describe anything, whether it be an object or an event, we have two basic choices for representation: linguistic or graphic. Language as a means of description is very powerful and can describe a wide range of events or objects, but it also has limitations, such as speed of processing and memory requirements. Graphical descriptions can show spatial relationships among a large number of objects much more quickly and with lower memory requirements than natural language, but can be limiting in terms of the scope of objects and events that can be described in an understandable manner.
Many of us use the human mind's ability to organize and locate things spatially. Other spatial abilities that humans have are those of judging relative size and distance. Additionally, the human brain can easily remember the position of objects and patterns of objects. The human visual system is also very good at distinguishing among a large range of colors or hues. It is these human capabilities that we utilize when we present data graphically, as curves, bar charts, scatter diagrams, etc. The advent of powerful computers with raster graphics displays has given us new possibilities for communicating visually.
The term visualization and the field of scientific visualization (McCormick et al., 1987; DeFanti et al., 1989; Warner, 1990) are based upon these visual capabilities of humans, on computational science, and on the advance of computer technology. Given hundreds or thousands of data points on several variables, it is literally impossible for a human to look at them in a tabular display or listing and derive any relationships among the data points. But a graphical presentation of these same points can be very quickly interpreted by a scientist or engineer. As the dimensionality of the space (number of variables) grows, special visualization techniques are required to aid the human in interpreting the data points. It is important to understand that visualization is not the process being automated. An important idea behind visualization is to represent large and/or abstract datasets so that the structure and function of possible systems or processes can be understood by humans. The datasets themselves exhibit a high degree of entropy; visualization is utilized to reduce the effect of this entropy and provide information contained in the data.
Scientific visualization has become a powerful tool for many disciplines. The growth of the field is based on the dramatic improvements in computer graphics hardware and software over the last decade. Simulation models on supercomputers, high-volume data sources (such as satellites and medical imaging systems) and the analysis of complex algorithms produce huge amounts of data that are impossible for humans to examine directly. As an alternative to data reduction methods (such as statistics), which result in a loss of detail, visualization may be used. Traditional applications of scientific visualization are molecular modeling, medical imaging, environmental control, meteorology, gas and fluid dynamics, astrophysics, etc., often applications where the data have an inherent position. Other attributes of a visualization may be more abstract, as when visualizing a force, stress or temperature.
Scientific visualization is, however, also applied to problems where the graphics determine a completely abstract picture, i.e., where no natural mapping between problem and graphical attributes exists (e.g., visualization of fractal algorithms). Thus the idea of utilizing visualization for abstract problems has already been introduced. However, in articles presenting the promises of this new field and in most applications we find few examples of these "abstract visualizations."
The VIBE Approach
The traditional information retrieval system asks that the user formulate a query in the absence of full knowledge of the database and returns what the system deems to be the proper set of objects (documents, records, etc.) in response to the query. To obtain further information, the user formulates a new query, with only the additional information on the database that he or she can infer from the response to the previous query. In VIBE, we propose a new paradigm for query response handling. We ask the user to identify one or more points of interest (POIs), which may be queries, general categories of interest, or specific reference points. In response, the system provides a view of all or a substantial portion of the database entities organized as the system sees the relationships among these objects and the points of interest. Furthermore, the view presented is two-dimensional, with the points of interest located as the user deems best to display the relevant relationships.
Each POI, visually represented by a unique icon, consists of a set of keywords, values or relationships describing a subject of interest to the user, and a position. The database objects are compared to the POI descriptions using a user-defined "scoring" algorithm. This yields a vector of scores for each object, with one score per POI. This vector of POI scores is used to position an icon representing the object in the display space on the screen. Thus, the position of an object shows how the object relates to the POIs and how the objects relate to each other. As the system is used, the user will come to associate relative position with the contents or attributes of objects.
VIBE displays are distinguished from all other statistical displays by the fact that they have user-implied "conceptual" scales rather than Cartesian axes based upon primary data values. The idea behind VIBE's positioning mechanism is that an object that matches a POI should be placed in the same position as the POI. A POI may thus be seen as a (simplified) example or prototype object that is given an example or prototype position.
The data objects in the database are evaluated (scored) with respect to each of the POIs. The scoring mechanism is specifiable by the user and may be as simple as a frequency count of POI terms. All objects that meet minimum criteria (that get a score within a user-defined range) are positioned by the system on the display.
With only one POI, all objects that are considered important will be positioned on top of this POI on the display, and VIBE will perform like any nonvisual object retrieval system (the collection of icons on top of the POI may be presented sequentially, or as a list ordered by POI score). However, with more than one POI, an object will be placed between the POIs that it scores on, depending on the relative POI scores: if the object gets a score on one POI only, it is placed on top of this POI; if it gets a score on two POIs, between these POIs; and so on. This paradigm is the underlying philosophy of VIBE: the position of an object icon should give an indication of the content of the related object.
A simple example of a VIBE display for VIBE operating on a bibliographic database is given below in Figure 1. Here, we have three POIs, their icons presented as circles. In our example, the POIs are named document retrieval, scientific visualization and virtual reality. A typical keyword based specification of these POIs may be as follows:
DOCUMENT RETRIEVAL
document retrieval
retrieval of document*
SCIENTIFIC VISUALIZATION
visualization
VIRTUAL REALITY
virtual realit*
artificial realit*
(The asterisk implies a wild card matching mechanism)
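The frequency-count scoring of such keyword specifications can be sketched as follows. This is a minimal illustration, not VIBE's code: it assumes that an object is plain text, that a POI term is a phrase of one or more words, that `*` matches any word ending, and that an object's score on a POI is the sum of the frequency counts of the POI's terms. The function names and the sample abstract are hypothetical.

```python
import re
from fnmatch import fnmatchcase
from typing import Dict, List


def tokenize(text: str) -> List[str]:
    """Lower-case the text and split it into alphabetic words (hypothetical tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())


def term_count(words: List[str], term: str) -> int:
    """Count occurrences of a (possibly multi-word) term; '*' is a wildcard within a word."""
    pattern = term.lower().split()
    n = len(pattern)
    return sum(
        all(fnmatchcase(w, p) for w, p in zip(words[i:i + n], pattern))
        for i in range(len(words) - n + 1)
    )


def poi_score(text: str, terms: List[str]) -> int:
    """Score of one object on one POI: sum of the frequency counts of the POI's terms."""
    words = tokenize(text)
    return sum(term_count(words, t) for t in terms)


POIS: Dict[str, List[str]] = {
    "DOCUMENT RETRIEVAL": ["document retrieval", "retrieval of document*"],
    "SCIENTIFIC VISUALIZATION": ["visualization"],
    "VIRTUAL REALITY": ["virtual realit*", "artificial realit*"],
}

abstract = ("Visualization supports document retrieval; retrieval of documents "
            "in virtual reality and virtual realities is discussed.")
scores = {name: poi_score(abstract, terms) for name, terms in POIS.items()}
print(scores)
# {'DOCUMENT RETRIEVAL': 2, 'SCIENTIFIC VISUALIZATION': 1, 'VIRTUAL REALITY': 2}
```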
Figure 1 A VIBE display.
The positioning of four objects relative to these POIs is shown in Figure 1. Each object is represented by a rectangular icon. We find two object icons between the scientific visualization and virtual reality POIs. Their positions tell us that the related objects are influenced only by these two POIs, as they fall on the line between the two. One object seems to be more closely related to the first POI, the other to the second. The icon positioned on top of the scientific visualization POI is clearly influenced only by this POI. Finally, we find an icon in the middle of the triangle defined by the three POIs. Its related object seems to be influenced by all of the POIs, perhaps with equal strength.
VIBE uses the keywords given for each POI to determine a score for an object on a POI. This score determines the influence of each POI on an object and is the sum of the frequency counts of the POI terms. The actual positioning of an icon for an object D is performed by a positioning function.
Input to this function is as follows:
· the object score vector $D = [d_1, d_2, \ldots, d_n]$, where $n$ is the number of POIs and $d_i$ is the sum of the frequencies of all keywords of POI$_i$ in object D, normalized by the average score on this POI over all objects;
· the POI position vector $P = [p_1, p_2, \ldots, p_n]$, where $p_i$ is the display position $(x, y)$ of POI$_i$.
These two vectors are combined into the set $S = \{(d_1, p_1), (d_2, p_2), \ldots, (d_n, p_n)\}$.
1. If an object is influenced by (gets a score on) only one POI $k$ ($d_j = 0$ for all $j \neq k$), its icon is positioned on top of this POI, at position $p_k$.
2. If an object scores on two or more POIs, two elements, which we call $s_a = (d_a, p_a)$ and $s_b = (d_b, p_b)$, are removed from the set $S$. An intermediate position $p_{ab}$ for the object is then computed on the line between $p_a$ and $p_b$, closer to the position with the higher score. The distance from $p_a$ to this intermediate position is
$$\lVert p_{ab} - p_a \rVert = \frac{d_b \, L}{d_a + d_b},$$
where $L$ is the distance from $p_a$ to $p_b$. The element $(d_{ab}, p_{ab})$, with $d_{ab} = d_a + d_b$, is then added to $S$.
3. If only one element is left in $S$, apply rule 1; otherwise apply rule 2.
It is easily shown by mathematical induction that the final object position is independent of the order in which the elements from the set S are selected.
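The normalization of raw scores and the positioning rules above can be sketched in Python as below. This is an illustrative reading of the rules, not the VIBE prototype's C code; the function names, the (score, position) tuple layout and the choice to drop zero-score POIs before merging (merging with a zero score would not move the object anyway) are our assumptions.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Element = Tuple[float, Point]  # (score d, display position p)


def normalize(raw: Sequence[Sequence[float]]) -> List[List[float]]:
    """Divide each raw score by that POI's average score over all objects.
    raw[k][i] is the raw score of object k on POI i."""
    n_obj = len(raw)
    avg = [sum(row[i] for row in raw) / n_obj for i in range(len(raw[0]))]
    return [[d / a if a > 0 else 0.0 for d, a in zip(row, avg)] for row in raw]


def combine(a: Element, b: Element) -> Element:
    """Rule 2: replace (d_a, p_a) and (d_b, p_b) by one intermediate element.
    The new position lies on the segment p_a -> p_b at distance d_b * L / (d_a + d_b)
    from p_a, where L = |p_b - p_a|; the new score is d_a + d_b."""
    (da, (xa, ya)), (db, (xb, yb)) = a, b
    t = db / (da + db)  # fraction of the way from p_a to p_b
    return da + db, (xa + t * (xb - xa), ya + t * (yb - ya))


def position(scores: Sequence[float], pois: Sequence[Point]) -> Point:
    """Place one object from its (normalized) POI scores and the POI positions."""
    s: List[Element] = [(d, p) for d, p in zip(scores, pois) if d > 0]
    if not s:
        raise ValueError("object scores on no POI, so it is not displayed")
    while len(s) > 1:  # Rule 3: keep applying rule 2 until one element remains
        s.append(combine(s.pop(), s.pop()))
    return s[0][1]  # Rule 1: the icon goes at the remaining position


A, B, C = (0.0, 0.0), (8.0, 0.0), (4.0, 8.0)  # hypothetical POI positions
print(normalize([[4, 0], [2, 2], [0, 4]]))     # [[2.0, 0.0], [1.0, 1.0], [0.0, 2.0]]
print(tuple(round(v, 3) for v in position([1, 0.5], [A, B])))            # (2.667, 0.0)
print(tuple(round(v, 3) for v in position([0.3, 0.1, 0.6], [A, B, C])))  # (3.2, 4.8)
print(tuple(round(v, 3) for v in position([0.6, 0.3, 0.1], [C, A, B])))  # (3.2, 4.8)
```

The last two calls feed the same object to the rules in a different order and land on the same spot, as the induction argument predicts.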
Figure 2 Example positioning of objects relative to two POIs.
An example of how this algorithm works is given in Figure 2. Here we have two POIs, A and B. The four objects displayed have scores of (1, 0), (1, 1), (0, 1) and (1, 0.5) on A and B, respectively. As seen, one object is placed on top of A (1, 0), one on top of B (0, 1), and two between A and B: one in the middle (1, 1) and one closer to A (1, 0.5), one third of the way from A to B.
Figure 3 shows another example, where an object is positioned relative to three POIs. Three POIs are positioned on the display (marked with circles). An object with scores 0.3, 0.1 and 0.6 on POIs A, B and C, respectively, is to be positioned. We may start the positioning process by using the object's scores on POIs A and B, and the positions of these POIs. This gives an intermediate position between A and B, one quarter of the way from A to B (0.1/(0.3 + 0.1) = 1/4), as shown in Figure 3. The score connected to this intermediate position will be 0.4 (the sum of the scores for A and B). The final object position is then found six tenths of the way from this point toward C (0.6/(0.4 + 0.6) = 6/10). Since the algorithm is independent of the order in which elements from the set S are chosen, evaluating an object with respect to A and B then C, or A and C then B, or B and C then A will all produce the same location.
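Writing the three-POI example out with the distance rule makes the arithmetic explicit. Here $p_A$, $p_B$, $p_C$ are the POI positions, and $p_{AB}$, $d_{AB}$ are our own labels for the intermediate position and score:

$$p_{AB} = p_A + \frac{d_B}{d_A + d_B}\,(p_B - p_A) = p_A + \frac{0.1}{0.4}\,(p_B - p_A) = 0.75\,p_A + 0.25\,p_B, \qquad d_{AB} = d_A + d_B = 0.4$$

$$p = p_{AB} + \frac{d_C}{d_{AB} + d_C}\,(p_C - p_{AB}) = 0.4\,p_{AB} + 0.6\,p_C = 0.3\,p_A + 0.1\,p_B + 0.6\,p_C$$

In general each merge replaces two elements by their score-weighted average, so the final position works out to $p = \sum_i d_i p_i / \sum_i d_i$ (here $\sum_i d_i = 1$). This closed form is our restatement rather than a formula given above, but because it is symmetric in the POIs it shows directly why the order of selection from S does not matter.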
Figure 3 Example - positioning of objects relative to three POIs.
Logically, every keyword described in a POI defines an axis. Thus, VIBE projects a virtual multidimensional coordinate system onto a two-dimensional display. In practice, this implies that objects may be placed in the same position for several reasons. The coincidence of two objects may be real (they are identical with regard to the POIs) or false (the result of the projected superposition of distinct locations). However, this problem can be controlled by carefully positioning and moving POIs.
By clicking on an object, the user can have VIBE present all the information on the object that is available in the database. If more than one icon is positioned in the same location, VIBE presents the topmost object and moves through the object stack with each button click. Such an overlay of icons is shown by lines under the icon, each line representing an object.
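How coincident icons might be stacked and cycled through by clicking is sketched below. The sketch is hypothetical: it assumes that icons share a stack when their positions round to the same whole pixel and that the first object placed is the topmost; the line count follows the later description of Figure 9, where each line under the first icon represents another object.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Tuple

Pixel = Tuple[int, int]


class IconStacks:
    """Hypothetical sketch: objects whose icons land on the same pixel share a stack,
    and repeated clicks on that pixel step through the stack."""

    def __init__(self, placements: Iterable[Tuple[str, Tuple[float, float]]]):
        self.stacks: Dict[Pixel, List[str]] = defaultdict(list)
        for obj, (x, y) in placements:
            self.stacks[(round(x), round(y))].append(obj)  # first placed = topmost
        self.clicks: Dict[Pixel, int] = defaultdict(int)

    def underlines(self, pixel: Pixel) -> int:
        """Lines drawn under the top icon: one per additional object in the stack."""
        return max(0, len(self.stacks.get(pixel, [])) - 1)

    def click(self, pixel: Pixel) -> Optional[str]:
        """Object to show for this click; successive clicks cycle through the stack."""
        stack = self.stacks.get(pixel)
        if not stack:
            return None
        obj = stack[self.clicks[pixel] % len(stack)]
        self.clicks[pixel] += 1
        return obj


icons = IconStacks([("doc1", (5.0, 5.0)), ("doc2", (5.2, 4.9)), ("doc3", (9.0, 9.0))])
print(icons.underlines((5, 5)))                 # 1
print([icons.click((5, 5)) for _ in range(3)])  # ['doc1', 'doc2', 'doc1']
```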
An example of a more complex VIBE display is given in Figure 4. Here we have five POIs (A...E) and several object icons. The annotations provide an idea as to how a VIBE display can be interpreted.
Icons can characterize or describe the objects they represent in many different ways. The attributes of an icon include its size, its color and its shape. Of these attributes, only size is automatically applied in all cases. The size of an icon is an indication of the importance of the object it represents and is derived from the scores that the object gets in relation to the specified POIs. For example, an object that gets a low score on all POIs will be displayed as a small icon. Likewise, an object that gets a high score on one or more POIs will be displayed as a larger icon. In practice, the objects shown as larger icons should be more closely related to one or more POIs than those shown as smaller icons.
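The exact size function is not given here. One plausible mapping, offered purely as an assumption, is linear in the object's total POI score and scaled so that the highest-scoring object in the display gets the largest icon; the pixel limits `min_px` and `max_px` are invented for illustration.

```python
from typing import Sequence


def icon_size(scores: Sequence[float], max_total: float,
              min_px: int = 4, max_px: int = 16) -> int:
    """Hypothetical icon size: grows linearly with the object's total POI score.
    max_total is the largest total score of any displayed object."""
    total = sum(scores)
    frac = total / max_total if max_total > 0 else 0.0
    return round(min_px + frac * (max_px - min_px))


objects = {"weak": [0.2, 0.1], "strong": [3.0, 1.0]}
max_total = max(sum(s) for s in objects.values())
print({name: icon_size(s, max_total) for name, s in objects.items()})
# {'weak': 5, 'strong': 16}
```

A linear scale keeps size differences proportional to score differences; a logarithmic scale would be an equally reasonable choice if a few objects score far above the rest.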
As can be seen from the previous discussion, interpreting information from a VIBE display may not always be easy. It will depend on the POI definitions, the positioning of the POIs, and the quality of the area of inquiry; in other words, it depends on the user's knowledge of the data. VIBE is not a tool for automatic reasoning; it is a dumb servant for a smart user.
The current version of VIBE is a prototype. It is implemented in C, runs under UNIX and MS-DOS, and is based on the X Window System and Microsoft Windows. The prototype works on a collection of objects represented as flat files. Such object collections can, for example, be retrieved from current bibliographic database systems by giving a filter query. VIBE computes scores for each object on each POI, as explained above. POI terms may be weighted and restricted to selected parts of an object (the search will then be performed only on these parts). A POI may also be given a weight to strengthen or reduce its influence.
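Term weights, field restrictions and POI weights could enter the score as in the sketch below. The data layout (an object as a dictionary of named fields), the use of single-word terms, and the rule that a term's frequency count is multiplied by its weight are assumptions made for illustration, not details taken from the VIBE prototype.

```python
import re
from typing import Dict, List, Optional, Tuple

Term = Tuple[str, float, Optional[List[str]]]  # (word, term weight, fields or None for all)


def weighted_poi_score(obj: Dict[str, str], terms: List[Term],
                       poi_weight: float = 1.0) -> float:
    """Hypothetical weighted score of one object on one POI:
    poi_weight * sum over terms of term_weight * (frequency of the word in the chosen fields)."""
    total = 0.0
    for word, weight, fields in terms:
        for field in (obj.keys() if fields is None else fields):
            words = re.findall(r"[a-z]+", obj.get(field, "").lower())
            total += weight * words.count(word.lower())
    return poi_weight * total


record = {"title": "Laser fusion targets",
          "abstract": "Laser driven fusion: the laser beam heats the target."}
laser_fusion_poi = [("laser", 2.0, ["title"]),   # 'laser' counts only in the title
                    ("fusion", 1.0, None)]       # 'fusion' counts in every field
print(weighted_poi_score(record, laser_fusion_poi, poi_weight=0.5))  # 2.0
```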
Figure 4 Retrieving information from VIBE displays.
A dynamic, window-based user interface is used, giving the users the possibility of repositioning POIs, changing POI definitions by changing weights on POIs or keywords, or by asking the system to ignore the influence from certain POIs. Redisplays after such changes are performed immediately, based on a table of POI component data kept in memory. Displays can be saved, retrieved individually or overlaid with other displays for comparison, or copied to a laser printer.
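Because redisplay works from a table of POI component data kept in memory, a change of weights need not rescan the objects. A minimal sketch of that idea follows; it assumes, hypothetically, that the table stores frequency counts per object, per POI and per term, and that asking VIBE to ignore a POI is equivalent to giving it weight 0.

```python
from typing import Dict

Counts = Dict[str, Dict[str, Dict[str, int]]]  # object -> POI -> term -> frequency


def rescore(table: Counts, term_w: Dict[str, Dict[str, float]],
            poi_w: Dict[str, float]) -> Dict[str, Dict[str, float]]:
    """Recompute every object's POI scores from cached counts; no text is rescanned.
    A POI weight of 0 removes that POI's influence, as when the user asks VIBE to ignore it."""
    return {
        obj: {poi: poi_w[poi] * sum(term_w[poi][t] * c for t, c in terms.items())
              for poi, terms in per_poi.items()}
        for obj, per_poi in table.items()
    }


table = {"doc1": {"A": {"laser": 3, "beam": 1}, "B": {"fusion": 2}},
         "doc2": {"A": {"laser": 0, "beam": 0}, "B": {"fusion": 5}}}
term_w = {"A": {"laser": 1.0, "beam": 2.0}, "B": {"fusion": 1.0}}
print(rescore(table, term_w, {"A": 1.0, "B": 1.0}))
# {'doc1': {'A': 5.0, 'B': 2.0}, 'doc2': {'A': 0.0, 'B': 5.0}}
print(rescore(table, term_w, {"A": 1.0, "B": 0.0}))  # POI B ignored
```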
Several additional features are planned. Colors may be used to show the influence of POIs, with a saturation dependent on the degree of influence. In addition, color can be combined with selected object attributes (such as object source, author, institution, time of publication, country of origin, etc.) in order to separate different types of objects. It also seems reasonable to give users the opportunity of operating with several layers of displays. This would make it possible to analyze clusters of objects by repositioning them in a subdisplay with different POIs.
VIBE Versus Retrieval Systems
The differences between the features of VIBE and other retrieval systems can best be explained through an example. VIBE has been used for a research project funded by the Office of Scientific and Technical Information (OSTI), Department of Energy (DOE). This research project investigated methods of extracting meta information from large scientific bibliographic databases that goes beyond the information that can be extracted using the Boolean search mechanisms of traditional systems. In this project, VIBE was used on a sample collection of objects from the DOE/OSTI Energy Science and Technology Database, on a very specialized and constrained subject area (inertial confinement). Suppose we need information on the relation between three topics: lasers, plasma and fusion. The data collection may contain objects on:
1. lasers, plasma and fusion
2a. lasers and plasma
2b. lasers and fusion
2c. plasma and fusion
3a. lasers
3b. plasma
3c. fusion
Objects in category 1 may give us information about all three topics. Category 2 objects contain information about two of the topics, and objects in category 3 relate to only a single topic. It is difficult for the retrieval system user to predict a priori the results of a query for any of these categories. Some of these queries may return no object references, while others may return more objects than the user can deal with. In a traditional retrieval system, the user must therefore develop a strategy for querying the database. The user may start with a type 1 query, which would naturally retrieve the fewest objects, and proceed toward type 3 queries. Alternatively, the user might pursue the opposite strategy, starting with type 3 queries and moving toward a type 1 query until a workable set is found. Regardless of the approach, as many as seven queries may have to be developed in order to cover the types of objects described above.
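The seven queries are the non-empty combinations of the three topics, and an object's category is fixed by how many of the topics it mentions. The short sketch below enumerates them; treating "mentions a topic" as a positive score on that topic is our simplifying assumption, and the names are hypothetical.

```python
from itertools import combinations
from typing import List, Sequence, Tuple

TOPICS = ["lasers", "plasma", "fusion"]

# Every non-empty conjunction of topics: the queries a traditional system may need,
# from the narrowest (type 1, all three topics) to the broadest (type 3, one topic).
QUERIES: List[Tuple[str, ...]] = [c for k in (3, 2, 1) for c in combinations(TOPICS, k)]


def category(scores: Sequence[float]) -> int:
    """Category as numbered in the text: 1 = all three topics, 2 = two topics, 3 = one topic."""
    hits = sum(1 for s in scores if s > 0)
    if hits == 0:
        raise ValueError("object mentions none of the topics")
    return 4 - hits


print(len(QUERIES))             # 7
print(QUERIES[0], QUERIES[-1])  # ('lasers', 'plasma', 'fusion') ('fusion',)
print(category([2, 0, 1]))      # 2
```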
We see that queries that express our need for information will be dependent on the content of the database. With traditional systems, this implies that we are forced into an iterative query process, perhaps providing queries that return too few or too many objects.
With VIBE, each of the topics lasers, plasma and fusion could be described as a POI. VIBE will effectively treat these POIs as multiple parallel queries and present the results in a display as shown schematically in Figure 5. In this diagram, objects of each type are positioned in different locations, as indicated by the diagram annotations. Type 3 objects, those involving one topic only, will have their icons positioned on top of the corresponding topic POI. Type 2 objects will be positioned along the line connecting the POIs of the two topics they contain. Their exact positions will depend on the relative influence of each POI. Type 1 objects, referencing all three topics, will be positioned inside the triangle defined by the POI vertices. In fact, VIBE answers all seven of the above queries in this one display. In addition, a VIBE display shows the relative influence of each POI, made possible by the frequency count of each search term. Since the initial retrieval output is visual, the user can easily interpret a very large retrieved set, perhaps thousands of documents. The graphical interface allows the user to further navigate and explore the retrieved set.
The actual VIBE display, as used on the DOE/OSTI data, is presented in Figure 6. Note that overlaying icons (same position and size) are visualized by a line under the icon of the first document. Thus, we have stacks of icons on top of the three POIs representing documents that got a score on only one POI. More information on documents, such as the title, abstract, keywords and authors, can be obtained by pointing at the appropriate icon and clicking. One should note that using VIBE is a dynamic process and that a hard copy of a VIBE display, such as the one in Figure 6, is only a static snapshot of the process. Some of these same features could be added to a traditional system by implementing groups of terms (like POIs), frequency counts, etc. However, the output from such a system could result in unwieldy lists of documents. With visualization, it becomes possible to give a holistic presentation that provides a quick overview of even larger document collections, since relative scores can be interpreted much faster from a display than from the score values themselves.
Figure 5 Schematic VIBE display with three POIs.
Figure 6 An actual VIBE display with three POIs.
Example VIBE Displays
As an example of how visualization may be used in the information retrieval process, we shall apply VIBE to a world database of 102 countries. However, one should note that a static presentation such as this one cannot do full justice to a dynamic system; it could perhaps be compared to presenting a movie as a couple of slides. In addition, VIBE's capabilities as an interface will not be seen through this presentation.
The world database consists of 102 countries, with data on infant mortality, life expectancy, literacy, and GNP per capita (source: The World Almanac, 1991). Figure 7 presents a VIBE diagram of these data, standardized by subtracting average values and dividing by standard deviations. Negative (lower than average) values are represented as influence toward a low POI, positive values as influence toward a high POI. We shall classify low infant mortality, high life expectancy and high GNP/capita as "good" POIs, the others as "bad" POIs.
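The standardization and the split into low and high POIs can be sketched as below. Using the population standard deviation, and scoring a below-average value by its absolute z-score on the low POI, are our assumptions; the text says only that values were standardized and that negative values influence a low POI and positive values a high POI. The three GNP figures are invented.

```python
from statistics import mean, pstdev
from typing import List, Sequence, Tuple


def standardize(values: Sequence[float]) -> List[float]:
    """z-scores: subtract the average value and divide by the standard deviation."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]


def low_high(z: Sequence[float]) -> List[Tuple[float, float]]:
    """(score on the low POI, score on the high POI) for each z-score.
    A below-average value influences only the low POI, an above-average one only the high POI."""
    return [(-v, 0.0) if v < 0 else (0.0, v) for v in z]


gnp_per_capita = [500.0, 2000.0, 3500.0]  # invented values for three countries
print(low_high(standardize(gnp_per_capita)))
```

Under this split, a country with above-average GNP/capita scores zero on the low GNP/capita POI, which matches the later observation that introducing that POI leaves such countries where they were.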
Figure 7 World data.
As seen from Figure 7, most of the countries fall into two categories:
1. A "first world" influenced by the "good" POIs.
2. A "third world" influenced by the "bad" POIs.
By giving a color to the high GNP/capita POI, all objects that are influenced by this POI will get the same color. With such a display one would see that all countries with higher than average GNP/capita are in the "first world."
We may use color to show the influence of the low GNP/capita POI as well, but in Figure 8 an alternative display is presented. Here we have asked VIBE to visualize the displacement of icons, by drawing lines from the former position (no icon) to the new position (icon), when the low GNP/capita POI is introduced. As seen, nearly all the "third world" countries are influenced by this POI. In addition, about half of the "first world" countries have moved toward this POI, some significantly. We may retrieve the names of these countries (e.g., China, Albania, Colombia, Jordan), in addition to other information, by clicking on the icons. The remaining "first world" countries are unaffected by introducing the low GNP/capita POI, since their score on this POI is zero (i.e., their GNP/capita is above the mean value).
From these few examples we may generate many questions about the world database, e.g., will the two distinct groups show up for other POIs (health care, education, transportation, defense)?
Figure 8 World data (low GNP and displacement).
As a second example, we shall use data from the Energy Science and Technology Database of the Office of Scientific and Technical Information (OSTI), Department of Energy (DOE). VIBE has been used for a research project funded by DOE/OSTI. This research project investigated methods of extracting meta information from large scientific bibliographic databases that goes beyond the information that can be extracted using the traditional Boolean search mechanisms of such systems. In this project, VIBE was used on a sample collection of objects from this database on a very specialized and constrained subject area: inertial confinement. Each object reference in this database consists of a full abstract and several indexing terms organized in a hierarchy, together with other information on objects (title, author names, country of origin, etc.). The indexing has been performed manually, by professional indexers.
We will present two simple examples of hypothesis testing on these object representations. In particular, we were interested in examining the following questions:
i. Do indexers assign a few major index terms to each object, or are the object representations overindexed by the introduction of several major index terms?
ii. Where terms are part of a hierarchical structure, do indexers use only the most detailed term and the term above this, or are all other terms in the structure included as well?
A VIBE diagram for question i, with POIs defined by major index terms, is presented in Figure 9. Note that overlaying icons are visualized by a line under the first icon, each line representing another object. As can be seen, most of the documents fall on top of one of the POIs, each of which describes one major term (indicated next to the POI icon in the figure). At least with these terms, very few objects fall between POIs. Thus, it seems that indexers manage to limit the number of major terms assigned to each document.
Figure 10 presents a VIBE diagram for question ii. As can be seen, all but three objects either fall on top of a POI or are on the line between two POIs in the hierarchy laser, laser target and ion beam laser target. The three objects that scored on all three POIs are general articles on the field. This diagram seems to confirm that indexers know the structure of these research areas and that they are careful when assigning terms to objects.
Figure 9 A VIBE diagram of OSTI/DOE major descriptor POIs.
Figure 10 A VIBE diagram of DOE/OSTI term hierarchy POIs.
Conclusions
All information retrieval systems, including VIBE, perform a mapping from a semantic level to a lexical level, where both objects and queries are represented by keywords. Describing concepts by lists of words does not always work. The problems of synonymy (different words, same meaning) and polysemy (same word, different meanings) in natural language make it difficult to give a formalized description of a concept on a lexical level. Our choice of words when defining queries and points of interest, and when we write or index objects, will influence the result of the retrieval process. Indexing also implies a classification, and indexing terms will often describe the major topics only. An object may therefore mention subjects that are not included in the title, the abstract or the indexing terms. When retrieved objects are presented by traditional methods, such as sequential lists, this may be a desired effect, as the task is often to limit the number of selected objects. However, a strict classification will also restrict our view of the object collection, and objects of interest may be excluded from our queries.
However, these fundamental limitations of retrieval methods may be somewhat easier to overcome with a tool such as VIBE. The visualization techniques presented here make it possible for the user to cope with a large number of objects, thus making it less important to use restrictive queries. By changing keywords, or keyword weights, the user may also get a visual impression of the influence of each term on the collection of objects. Further, the VIBE system can utilize any information on an object (by moving it closer to the POI of influence).
VIBE will perform best when it has all relevant information on an object. This article, as an example, could be categorized by the index terms object retrieval and visualization, but the subject virtual reality is also mentioned in the text. If VIBE had a comprehensive index list, or the full object text, the influence of a virtual reality POI could be taken into account, but perhaps with a lower weight than the more important terms.
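The effect of a POI weight, such as a lower weight for a virtual reality POI, can be sketched by scaling each object's score for a POI by a user-set weight before computing the score-weighted centroid. Both the centroid rule and the multiplicative weighting are assumptions made for this illustration, and the coordinates are hypothetical.

```python
from typing import Dict, Optional, Tuple

Point = Tuple[float, float]


def place_weighted(scores: Dict[str, float], poi_xy: Dict[str, Point],
                   weights: Dict[str, float]) -> Optional[Point]:
    """Score-weighted centroid with per-POI user weights (assumed rule).

    Each object's score for a POI is multiplied by that POI's weight
    (default 1.0) before the POI positions are averaged.
    """
    w = {p: scores.get(p, 0.0) * weights.get(p, 1.0) for p in poi_xy}
    total = sum(w.values())
    if total == 0:
        return None
    x = sum(w[p] * poi_xy[p][0] for p in poi_xy) / total
    y = sum(w[p] * poi_xy[p][1] for p in poi_xy) / total
    return (x, y)


# Hypothetical POI layout using the index terms mentioned above.
poi_xy = {
    "object retrieval": (0.0, 0.0),
    "visualization": (6.0, 0.0),
    "virtual reality": (3.0, 6.0),
}
article = {"object retrieval": 1.0, "visualization": 1.0, "virtual reality": 1.0}

print(place_weighted(article, poi_xy, {}))                        # (3.0, 2.0)
print(place_weighted(article, poi_xy, {"virtual reality": 0.5}))  # (3.0, 1.2)
```

Halving the weight pulls the article away from the virtual reality POI toward the other two, while still letting that term contribute to its position.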
Another important aspect of visual output from a retrieval system is that even when a large collection of objects is displayed, every single object remains retrievable. Thus, the data reduction methods used here do not eliminate objects, and this is important. Because every object is displayed, additional graphical attributes such as color may be used to present further dimensions, so that outstanding objects may easily be identified.
The overall picture can be used to decide which objects to retrieve: objects that are placed on top of a POI, objects that fall between two or more POIs, objects in clusters, or the odd objects that are placed in an isolated position on the screen. It may also be used to get an idea of the collection of objects as a whole, for example in order to compare different object collections.
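Spotting isolated objects is done by eye in VIBE, but the notion can be made concrete with a nearest-neighbour test on display coordinates. The sketch below is a hypothetical heuristic, not a VIBE feature; the radius threshold and the positions are invented.

```python
import math
from typing import Dict, List, Tuple


def isolated_objects(positions: Dict[str, Tuple[float, float]], radius: float) -> List[str]:
    """Return ids of objects with no other object within `radius` display units.

    Brute-force O(n^2) check; `radius` is a hypothetical user-chosen threshold.
    """
    result = []
    for oid, (x, y) in positions.items():
        if all(math.hypot(x - x2, y - y2) > radius
               for other, (x2, y2) in positions.items() if other != oid):
            result.append(oid)
    return result


# Hypothetical display positions: a, b, c form a cluster; d stands alone.
positions = {"a": (0.0, 0.0), "b": (0.5, 0.0), "c": (0.2, 0.4), "d": (5.0, 5.0)}
print(isolated_objects(positions, radius=1.0))  # ['d']
```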
Visualization offers a way of sending data from a computer system to a user with a bandwidth many times higher than that of textual and numerical presentations. Through such a form of communication, the power of the computer can be used to manipulate large amounts of data. It is important to understand that visualization is not the process being automated; only humans can visualize. But a visualization tool may help the user get an overview and an understanding of large data sets.
Visualization has an important impact on the "management" of the retrieval process. In traditional retrieval systems, the number of relevant object references retrieved (as measured by recall) is purposely kept to a manageable level. This, naturally, can have an effect on the precision of the retrieval process. In VIBE, much larger retrieved sets can be presented. In fact, it is advantageous to visualize all of the candidate objects. Combined with the idea of parallel queries, i.e., positioning objects with regard to several POIs, this allows the user to define a retrieval set a posteriori. This gives the user direct control over retrieval and the ability to assess the effect of that choice on the precision of the retrieval.
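Defining a retrieval set a posteriori amounts to selecting objects by where they landed in the display rather than by reformulating the query. A minimal sketch, assuming display coordinates are already computed and the user marks a rectangular region; the helper `select_region` is hypothetical and not a documented VIBE function.

```python
from typing import Dict, List, Tuple


def select_region(positions: Dict[str, Tuple[float, float]],
                  x_min: float, y_min: float, x_max: float, y_max: float) -> List[str]:
    """Return the ids of objects whose display position lies inside the rectangle.

    Hypothetical helper: the rectangle stands in for a region the user marks
    on the display after looking at the full set of candidate objects.
    """
    return sorted(oid for oid, (x, y) in positions.items()
                  if x_min <= x <= x_max and y_min <= y <= y_max)


# Hypothetical positions: "b" and "c" lie near the midpoint between two POIs.
positions = {"a": (0.0, 0.0), "b": (2.0, 0.1), "c": (2.2, -0.1), "d": (4.0, 0.0)}
print(select_region(positions, 1.5, -0.5, 2.5, 0.5))  # ['b', 'c']
```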
The extensive use of word processing today ensures that most objects will be available in electronic form. Modern storage techniques, high-capacity networks and standards for representing and communicating objects will greatly enhance the opportunity of retrieving the full text of objects, or at least comprehensive abstracts, from bibliographic databases. However, while objects become more readily available, we will still use the same old and slow method of extracting information from them, namely reading. Thus, it will become more and more crucial to find methods that help us select a set of objects for further study. We believe that visualization will be an important basis for such methods.
The VIBE system has been used on a wide range of data sets, from object representations to quantitative data. These experimental results have been promising. Our further efforts in this direction will be directed toward investigating the requirements for a visualization environment that will provide scientists, engineers and others with the capability to formulate problems as graphical depictions and animated sequences. In particular, we are interested in identifying potential applications for visualization. This work will be based on the VIBE display methodology.
Our future research in this area will be directed toward the development of alternative visualization strategies, with the aim of giving the user the opportunity to choose the strategy that suits him or her best. We are also experimenting with using the VIBE display principle on quantitative data, and we have promising results in this direction.
References
Borgman, C. L. (1986). Why Are Online Catalogs Hard To Use? Lessons Learned from Information-Retrieval Studies. Journal of the American Society for Information Science, 37(6): 387-400.
DeFanti, T. A., M. D. Brown, and B. H. McCormick (1989). Visualization: Expanding Scientific and Engineering Research Opportunities. Computer, August 1989, 12-25.
McCormick, B. H., T. A. DeFanti, and M. D. Brown (1987). Visualization in Scientific Computing. Computer Graphics, 21(6).
Warner, J. (1990). Visual Data Analysis into the '90s. Pixel, 1(1): 40-44.
Bibliography
Clarkson, M. A. (1991). An Easier Interface. BYTE, 16(2): 277-282.
Donoho, A. W., D. L. Donoho, and M. Gasko (1988). MacSpin: Dynamic Graphics on a Desktop Computer. In Dynamic Graphics for Statistics, W. S. Cleveland and M. E. McGill (Eds.), pp. 331-352. Wadsworth & Brooks/Cole, Belmont, California.
Fisher, H. T. (1982). Mapping Information. Cambridge, Mass.: Abt Books.
Gärling, T. (1989). The Role of Cognitive Maps in Spatial Decisions. Journal of Environmental Psychology, 9: 269-278.
Hasher, L. and R. T. Zacks (1979). Automatic and Effortful Processes in Memory. Journal of Experimental Psychology: General, 108: 356-388.
Jones, W. P. and G. W. Furnas (1987). Pictures of Relevance: A Geometric Analysis of Similarity Measures. Journal of the American Society for Information Science, 38(6): 420-442.
Korfhage, R. R. (1986). A Concept for Visual Navigation of a Database. In Proceedings of the IEEE Workshop on Visual Languages, Dallas, Texas, pp.143-148.
Myaeng, S. H. and R. R. Korfhage (1990). Integration of User Profiles: Models and Experiments in Information Retrieval. Information Processing and Management, 26(6): 719-738.
Naveh-Benjamin, M. (1987). Coding of Spatial Location Information: An Automatic Process? Journal of Experimental Psychology: Learning, Memory & Cognition, 13: 595-605.
Olsen, K. A., R. R. Korfhage, K. M. Sochats, M. B. Spring, and J. G. Williams (1991). Visualization of a Document Collection: The VIBE System. Research Report LIS033/IS91001, School of Library and Information Science, University of Pittsburgh.
Raghavan, V. V. and S. K. M. Wong (1986). A Critical Analysis of Vector Space Model for Information Retrieval. Journal of the American Society for Information Science, 37(5): 279-287.
Spring, M. (1990). Informing with Virtual Reality. Multimedia Review, 1(2): 5-13.