of 2-variate displays. Most of them were developed in the statistics world. Both visual perception and statistical fitting of the data are of major concern. The data size is relatively small, usually on the order of hundreds of items. The graphics are mostly variations on two-dimensional point and line plots.

Multivariate displays are the basis for many recently developed mdmv techniques, most of which use colorful graphics created by high-speed graphics computation. The data is usually larger and more complicated. A majority of the techniques were developed between 1987 and 1991.

Animation is a powerful tool for visualizing mdmv scientific data. Various movie animation techniques on mdmv data, and a scalar visualization animation model, are presented. In principle, any single-frame visualization technique can be extended to animation if the data can be represented as a time series showing two-way correlations.

5.1 Techniques Based on 2-variate Displays

This section highlights some of the tools and summarizes the general approach developed based on 2-variate displays. The discussion is based upon the book by Cleveland [Cle93], which has a good collection of elegant visualization techniques developed by Cleveland, Tukey, and others throughout the 1980s. Tukey’s exploratory data analysis [Tuk77] is an important milestone in data visualization; most of its techniques were developed with pencil and paper during the early 1970s. Cleveland’s work emphasizes the structure of data and the validity of statistical models fitted to data. A majority of the visualization techniques are two-dimensional, with the exception of isosurface plotting. Color is rarely used. Most of the tools show correlations between two variates. Our discussion skips the formulas, algorithms, and theories; only the concepts and techniques are presented.
5.1.1 Data Types

The basic data types for statistical data analysis are univariate, bivariate, trivariate, and hypervariate, which represent data with one dimension and one, two, three, and four or more variates, respectively. Cleveland also describes the multiway data type for data with higher dimensionality.

5.1.2 Reference Grids

The most common display unit in statistics visualization is a two-dimensional scatterplot, as depicted in the left panel of Figure 1. In the middle panel, simple grid lines are drawn to enhance pattern perception, not to improve plotting accuracy. Grids are drawn at equal intervals instead of at numerical values. These reference lines are particularly powerful when we need to scan and match across a matrix of scatterplots.

5.1.3 Fitted Curve

In statistics, fitting means finding a description of a data set. For example, if a data set fits a normal distribution, the whole data set can then be described by two numbers: its mean and standard deviation. In
statistics visualization, fitting means finding a smooth curve that describes the underlying pattern. In the right panel of Figure 1, a curve fit to the data is plotted; a pattern that was not apparent from the scatterplot alone may suddenly emerge. Fitting formulas are not discussed in this paper; [Tay90, Cle93] are good references on this matter.

Figure 1: Left: A simple 2D scatterplot. Middle: A scatterplot with visual reference grids. Right: A fitted curve is included in the plot.

5.1.4 Banking

The perception of the orientations of line segments can be enhanced by adjusting the aspect ratio of the graph. The aspect ratio of a graph is defined as the height of the data rectangle divided by its width. A line segment with an orientation of 45° or −45° best conveys the linear properties of the curve. This technique is known as the banking to 45° principle [CMM93]. In Figure 2, the same curve is plotted in three different aspect ratios. Only the upper left panel shows both a curve on the left and a straight line on the right. The banking method is covered in [Cle93].

Figure 2: The same curve is plotted in three different aspect ratios. The upper left one conveys more information than the other two.

5.1.5 Scatterplot Matrix

One of the more popular statistics mdmv visualization techniques is the scatterplot matrix, which presents multiple adjacent scatterplots. Each display panel in a scatterplot matrix is identified by its row and column numbers in the matrix. For example, the identity of the upper left panel of the matrix in Figure 3 is (1,3), and that of the lower right panel is (3,1). The empty diagonal panels denote the variable names. Panel (2,1) is a scatterplot of parameter X against Y while panel (1,2) is the reverse, i.e., Y versus X. In a scatterplot matrix, every variate is treated identically. The
basic idea is to visually link features in one panel with features in others. The redundancy is designed to improve the effect of visual linking. The technique is further enhanced with the help of reference grids. The pattern can be detected in both horizontal and vertical directions. The concept of linking is also discussed in [BMMS91].

Figure 3: A scatterplot matrix display of data with three variates X, Y, and Z.

The idea of pairwise adjacencies of variables is also a basis for the hyperbox [AC91], hierarchical axis [MGTS90, MTS91a, MTS91b], and HyperSlice [vWvL93]. Despite the popularity of the scatterplot matrix in mdmv visualization applications, nobody knows the identity of its original inventor [Cle93]. The technique was first presented in [CCKT83]. A variety of powerful tools using this kind of multi-panel display are presented in [Cle93]. The scatterplot matrix is also implemented in XmdvTool [War94].

5.1.6 Other Two Dimensional Analytical Techniques

Cleveland’s book also includes other powerful graphical techniques such as the mean-difference plot, quantile-quantile plot, spread-location plot, given plot, and conditional plot; fitting tools such as loess and bisquare; and visual perception techniques such as jittering and outlier deletion.

5.2 Multivariate Visualization Techniques

The scatterplot matrix uses multiple 2-way displays in an effort to provide correlation information among many variates simultaneously. The techniques described in this section are, however, aimed at extending the possibilities of multivariate correlation. All the techniques, with the exception of brushing and parallel coordinates, were developed after the 1987 NSF workshop. All of them claim positive results with real-life mdmv scientific data.

These techniques are also aimed at presenting much larger data sets than those appropriate for the statistical visualization techniques. Today’s scientific data is huge; terabyte-sized data will soon be common.
A static scatterplot is simply not big enough to display more than a few hundred data items. These techniques are broadly categorized into five sub-groups:

Brushing allows direct manipulation of an mdmv visualization display. Only brushing a scatterplot matrix is described.
Panel matrix involves pairwise two-dimensional plots of adjacent variates. Techniques included are HyperSlice and hyperbox. Both of them are elaborations of the scatterplot matrix.

Iconography uses variates to determine values of parameters of small graphical objects, called icons or glyphs. Thousands of data points are represented by thousands of these icons, which create a visual display characterized by varying texture patterns determined by the data. The mappings of data values to graphical parameters are usually chosen to generate texture patterns that may bring insight into the data. Three iconographic techniques are described: stick figure icon, autoglyph, and color icons.

Hierarchical displays map a subset of variates into different hierarchical levels of the display. Hierarchical axis, dimension stacking, and worlds within worlds belong to this group. These techniques support, or at least enable, dynamic interactive analysis.

Non-Cartesian displays map data into non-Cartesian axes. They include parallel coordinates and VisDB. Parallel coordinates is the only technique that is capable of studying both multidimensional objects and multidimensional data.

5.2.1 Brushing

Brushing was first presented in [BC87]. It is included as one of the many direct manipulation techniques in [Cle93]. There are two kinds of brushing for a scatterplot matrix: labeling and enhanced linking. Labeling involves an interactive brush (e.g., a mouse pointer) that causes information label(s) to pop up for particular display item(s).

Figure 4: Enhanced brushing with the square brush located on panel (3,2).

In enhanced linking, the brush is an adjustable rectangle. It is used to cover a set of points in one of the panels. Figure 4 shows a rectangle brush in panel (3,2). Data inside the rectangle is displayed with a “+” instead of an “o”. The same changes are applied to the corresponding data points in the other panels.
By looking at different panels and comparing the vertical and horizontal extent of the brush, the user can apply this enhanced linking technique as a powerful direct manipulation tool for visual conditioning analysis. It is shown that the effect of brushing is more intense in a dynamic interactive display. In general, brushing can be added to many other mdmv visualization techniques [War94]. More applications can be found in [Cle93].
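The selection step behind enhanced linking can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the implementation from [BC87] or [Cle93]: the function names `brush_mask` and `linked_markers`, the sample data, and the convention that a panel (column, row) puts the column variate on the horizontal axis and the row variate on the vertical axis are all hypothetical choices made for the example.

```python
import numpy as np

def brush_mask(data, panel, x_range, y_range):
    """Return a boolean mask of the data points covered by the brush.

    data    : (n_points, n_variates) array, one row per data point
    panel   : (column, row) of the brushed panel, 1-based (assumed convention:
              column -> horizontal variate, row -> vertical variate)
    x_range : (low, high) horizontal extent of the brush rectangle
    y_range : (low, high) vertical extent of the brush rectangle
    """
    col, row = panel
    x = data[:, col - 1]
    y = data[:, row - 1]
    inside_x = (x >= x_range[0]) & (x <= x_range[1])
    inside_y = (y >= y_range[0]) & (y <= y_range[1])
    return inside_x & inside_y

def linked_markers(mask):
    """Marker per data point: "+" if brushed, "o" otherwise.

    The same marker array is reused when drawing every panel, which is
    what links the brushed points across the whole matrix.
    """
    return np.where(mask, "+", "o")

# Hypothetical data with three variates X, Y, Z (columns 1, 2, 3).
data = np.array([
    [0.1, 0.9, 0.2],
    [0.4, 0.5, 0.5],
    [0.8, 0.4, 0.6],
    [0.6, 0.1, 0.9],
])

# Brush in panel (3,2): Z on the horizontal axis, Y on the vertical axis.
mask = brush_mask(data, panel=(3, 2), x_range=(0.4, 0.7), y_range=(0.3, 0.6))
markers = linked_markers(mask)
```

Because the mask is indexed by data point rather than by panel, a drawing routine can apply `markers` unchanged to every panel of the matrix; moving or resizing the brush only requires recomputing the mask.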
5.2.2 HyperSlice

HyperSlice [vWvL93] is one of the techniques invented during the elaboration and assessment stage. Like the scatterplot matrix, it has a matrix of panels, although each individual scatterplot is replaced with color or grey shaded graphics representing a scalar function of the variates. Furthermore, panels along the diagonal show the scalar function in terms of a single variate.

HyperSlice defines a focal point of interest $c = (c_1, c_2, \ldots, c_n)$ and a set of scalar widths $w_i$, where $i = 1, \ldots, n$. Only data within the range $R = [c_i - w_i/2,\; c_i + w_i/2]$ are displayed in the panel matrix. The rest of the data only appears if the user steers the focal point near it. Color Plate 1 shows the display of a HyperSlice of four variates. Like the coordinate system used in the scatterplot matrix, a HyperSlice panel is identified by a horizontal and a vertical coordinate. For an off-diagonal panel $(i, j)$ such that $i \neq j$, the color shows the value of the scalar function that results from fixing the values of all variates except $i$ and $j$ to the values of the focal point, while varying $i$ and $j$ over their ranges in $R$. The diagonal panels show a graph of the scalar function versus one variate, which changes over its range in $R$.

Figure 5: Navigate a five variate HyperSlice by dragging panel (4,2).

The most important improvement of HyperSlice over the traditional scatterplot matrix is the idea of interactively navigating in the data around a user-defined focal point. The user changes the focal point by interacting with any of the panels, as shown in Figure 5. The user moves the mouse into any panel and defines a direction by button down, move, and up. For example, the boldface arrow in panel (4,2) represents such an interaction. The direction of each arrow shows the motion of the focal point when the focal point is dragged in panel (4,2).
Notice that the length (magnitude) of the vertical arrows across the X2 row is the same as the vertical component of the arrow in panel (4,2). Similarly, each horizontal arrow in column X4 has the same length as the horizontal component of the arrow in panel (4,2). Panels solely related to X1, X3, and X5 move perpendicular to the image plane. Since the matrix is somewhat similar to an orthogonal matrix (along the grey diagonal panel), the motion on the upper left half is the mirror projection of the lower right.

Interactive data navigation is a welcome addition to direct manipulation graphics. The use of the width scalar supports the notion of multiresolution analysis, and begins to address more than two-way correlations. Changing the focal point in one panel affects two variates, which in turn results in simultaneous visual changes in displays of