SPEC CPU2000 Benchmark Description File Benchmark Name: 187.facerec Benchmark Author: Jan C. Vorbrüggen (jvorbrueggen@mediasec.de) Benchmark Program General Category: Image Processing Benchmark Description: This is an implementation of the face recognition system described in M. Lades et al. (1993), IEEE Trans. Comp. 42(3):300-311. In this application, an object - here, faces photographed frontally - are represented as labeled graphs. In the simplest case, used here, the graph is a regular grid. To each vertex of the grid graph a set of features are attached; they are computed from the Gabor wavelet transform of the image and represent it in the surroundings of a vertex. An edge of the graph is labeled with the vector connecting its two vertices and represents the topographical relationship of those vertices. An object represented in this way can now be compared to a new image in a process called elastic graph matching. This is done by first determining the Gabor wavelet transform for the new image. Then, for a given correspondance between the graph's vertices and a set of image points, a function taking into account both the similarity of the feature vectors at every vertex and its corresponding image point, and the distortion of the graph generated by the set of image points, measured as the change in the edge labels, can be computed. This graph similarity function is then the objective function of an optimization process that varies the set of corresponding points in the image. This optimization process is implemented in two steps: The global move step keeps the graph rigid and moves it systematically over all of the image, resulting in a placement that has the highest similarity to the graph. This step can be considered as finding the object (face) in the image. The local move step then takes this placement as the starting position, and visits every vertex in random order. At each vertex, the similarity function is evaluated on a small subgrid surrounding the current position. (This is a small change from the algorithm as originally published, where the trial moves at each node were random as well.) If the similarity function's value is improved at one of those positions, the change is made permanent; such a move is called a hop. One round visiting each vertex position is called a sweep. The local move step terminates when a sweep is completed without a hop having been performed. The benchmark consists of the following main phases: o Face Learning: The system has no prior knowledge of the class of object it is supposed to recognize. It "learns" this by extracting a canonic graph from one so-called canonic image; that image and the position at which the graph is to be extracted are specified by the user. o Graph Generation: For each of the images in the album gallery (see Input Description, below), the Gabor wavelet transform is computed, and the global move step is performed using the canonic graph. The resultant graph is extracted from the transform and stored. o Recognition: For each of the images in the probe gallery (see Input Description, below), the Gabor wavelet transform is computed, and the global move step is performed using the canonic graph. Then, a local move step is performed using each of the stored graphs. The resultant vector of similarity values is searched for the maximal value; the associated graph (and image) indicate the person recognized. The parts that take the most computational time have the following characteristics: o Gabor Wavelet Transform: The transform is performed by computing the forward fast Fourier transform (FFT) of the image, multiplying it with a number (here, 40) of kernels, computing the backward FFT for each of the results, and inserting the absolute value for each pixel into a two-dimensional array of feature vectors. This last step is similar to performing a transpose of a large matrix, and stresses a processor's memory subsystem. Finally, each feature vector is normalized. Run time is proportional to the sum of the number of entries in the album and probe galleries. o Global Move: This takes only a smallish part of the total run time. It is dominated by the computation of the feature similarity function, which basically is the scalar product of the two feature vectors (one from the graph, one from the image transform). Again, run time is proportional to the sum of the number of entries in the album and probe galleries. o Local Move: The local move step is dominated by the computation of the similarity function. In addition to the feature similarity function described above, now also the distortion of the grid introduced by the hops has to be taken into account. This is done incrementally, i.e., the contribution of the vertex currently being considered to the distortion is computed for both its old and its new position, which entails handling the nine different positions a vertex can be in the graph. The run time of this phase is proportional to the product of the number of entries in the album and probe galleries. The program allocates its memory on reading the run parameters (see below), and makes use of it while generating the graphs. During each recognition step, practically all of the code is exercised. Input Description: The program first reads a set of text files that contain the parameter settings governing the current run. It then reads a number of grey-level images (256 by 256 pixels in size) of as many different persons' faces, which constitute the album gallery to perform comparisons on, and computes the labeled graphs representing these faces. It then reads another set of images, the probe gallery, of the same persons as are represented in the album gallery, but which differ in head pose, hair style, addition or removal of glasses, and so on. For each of these, it computes the Gabor wavelet transform, localizes the face using the global move step, and performs the local move step for each graph from the album gallery. This sequence of events is similar to what would be required when a set of persons, of which a reference image each is given, are searched for in a larger database, e.g., one extracted from a set of TV clips. This form of an image database search is one typical use of the original application. For the test data set, the album and probe gallery each contain two images. One of these images is the same as the canonic image mentioned above; the result of this comparison is known a priori and is used as a consistency check for the implementation. For the training data set, the album gallery contains 13 images and the probe gallery contains 26 images (two per person in the album gallery). For the reference data set, the album gallery contains 42 images and the probe gallery contains 84 images (one to three per person in the album gallery); these are all distinct from the images in the training data set. Output Description: On startup, the program first prints all parameters governing the run to standard output. It then prints a line for each entry in the album gallery with the name of the image file, the position of the labeled graph as determined by the global move step, and its similarity with the canonic graph. For each of the entries in the probe gallery, the same information is output; this is followed by a line giving the name of the best match from the album gallery and the similarity computed with the corresponding graph. Finally, after all comparisons have been performed, a summary is printed reporting the total number of comparisons, the number of correct and incorrect comparisons, and the total number of hops and of sweeps performed. During the comparison process, for each entry in the probe gallery the number of hops and of sweeps, seperately for the best matching entry and for all entries in the album gallery accumulated, is reported to a secondary output file named hops.dat. This file must satisfy somewhat less stringent requirements for comparison to the reference output than the primary report to standard output described above. Of the 84 entries in the probe gallery of the reference data set, 80 entries (95%) result in the correct entry from the album gallery being the best match. Programming Language: FORTRAN 90 Known portability issues: None References: M. Lades et al. (1993), Distortion Invariant Object Recognition in the Dynamic Link Architecture, IEEE Trans. Comp. 42(3):300-311. http://www.neuroinformatik.ruhr-uni-bochum.de/ini/VDM/research/computerVision/contents.html