GRAPP 2007
Fitting 3D morphable models using implicit representations
First presented at the Second International Conference
on Computer Graphics Theory and Applications 2007,
extended and revised for JVRB
urn:nbn:de:0009612799
Abstract
We consider the problem of approximating the 3D scan of a real object through an affine combination of examples. Common approaches depend either on the explicit estimation of pointtopoint correspondences or on 2dimensional projections of the target mesh; both present drawbacks. We follow an approach similar to [ IF03 ] by representing the target via an implicit function, whose values at the vertices of the approximation are used to define a robust cost function. The problem is approached in two steps, by approximating first a coarse implicit representation of the whole target, and then finer, local ones; the local approximations are then merged together with a Poissonbased method. We report the results of applying our method on a subset of 3D scans from the Face Recognition Grand Challenge v.1.0.
Keywords: 3D Morphable Models, nonrigid registration, implicit surface representations
Subjects: Computer Graphics
We consider the problem of approximating a target 3D surface with an affine combination of (registered) examples, under the assumption that both the examples and the target belong to the same object class. When the assumption holds, the affine space generated by the examples represents a model of the object class, usually known as 3D Morphable Model in case of faces [ BV99 ]. The model is parameterized by the coefficients of the affine combination. We consider the class of human faces; our goal is to find the best possible combination of the available examples that approximates a given target. This is essentially a problem of nonrigid registration, where the available deformations are constrained by the model; it can arise not only when building a 3D Morphable Model [ ACP03, BV06 ], but also for performing 3D shape analysis as done for images by [ BV03 ].
In general, we might approach the problem in different ways, depending on the representation of the target surface:

the target is explicitly represented as a triangular mesh, and the problem is solved in ³;

the target is projected to ², as depth map or cylindrical projection;

the target is represented implicitly in ³.
The first strategy is essentially based on the Iterated ClosestPoint (ICP) registration algorithm [ BM92, TL94 ]. Although originally used for rigid registration, it can be extended to nonrigid registration and to the case at hand. This was demonstrated in [ ACP03 ], where the possibility of using examplesbased models was also mentioned. The drawback of this type of approaches is the need of an explicit estimation of the pointtopoint correspondence between the model and the target. Letting aside the problems due to holes in the target, the search for corresponding points in ³ might be computationally inefficient.
The second approach, adopted by [ BV99 ] and [ BV06 ], avoids the explicit estimation of pointtopoint correspondences, by minimizing the difference between the cylindrical projections to ² of the target and the model. As well as avoiding the direct search for corresponding points, the projection to ² has the advantage of reducing the problem dimensionality. However, this comes at the cost of loosing some information about the 3D surfaces, since the projection is not a parameterization of the original surface. Moreover, the projection is usually nonlinear and occlusions frequently occur.
In this paper we investigate the possibility of solving the problem representing the target implicitly in ³. In doing so, we avoid the explicit correspondence estimation, while still working in the original domain of the data. Although it has never been applied to our specific problem, [ IF03 ] have used an implicit representation to reconstruct surfaces from uncalibrated video sequences. In their work, however, it is not the target data that are represented implicitly, but rather the model, a deformable template that has to be matched to the data. Also related to ours is the work of [ SSB05 ], where the problem of registering the target with a fixed template (not an examplebased model) was solved representing both surfaces implicitly. The implicit representations where found using Support Vector Machines (SVM).
Our work is based on a simpler implicit representation of the target mesh, based on Radial Basis Functions (RBF), while the model is still represented explicitly. The implicit representation defines a sort of potential field in the space surrounding the target surface, and can be used to define a robust cost function. In order to reduce the complexity of computing the RBF for large meshes, we first minimize the cost function over a low resolution implicit representation, and afterward we minimize it over higher resolution patches of the target. Finally, the different patches are blended together with a Poissonediting approach [ YZX04 ]. In our work we assume that the target is already coarsely aligned with the model.
In the following two subsection we briefly expose the basic notions our method builds upon: threedimensional Morphable Models, and implicit representations of surfaces.
A triangular mesh is defined by a graph M = (V,E); V is a set of n vertices in ³ and E is a set of edges connecting the vertices. For convenience we write V as a matrix
A set of m registered meshes, denoted by M_{i} , will share the same connectivity E but have different vertices positions V_{i} . We can naturally define the subspace of their affine combinations, with coefficients a = (a_{1},...,a_{m}) ∈ ^{m} , as
with
The affine constraint is needed in order to avoid scaling effects.
Note that rewriting eq. ( 3 ) in terms of the barycenter
we can eliminate the constraint on the sum of the coefficients:
Threedimensional Morphable Models (3DMM) use a representation for the affine combination based on the Principal Component Analysis (PCA) of the subspace spanned by the V_{i}  . Without going into the details of the PCA [ HTF01, pp. 6263], we only report that for 3DMM eq. ( 5 ) is written as
where the U_{i} are the principal components and the coefficient vector a is replaced by α ∈ ^{m1} . See figure 1 for a simple twodimensional example.
Assuming the examples are correctly registered, any rigid transformation has been factored out from the model. It can be explicitly included by defining a rotation matrix R and a translation vector t which are applied to V(α):
where the last term is simply an ntimes repetition of the column vector t. R and t are parameterized by a vector ρ ∈ ^{6} , holding the coefficients of the transformation (three for the rotation and three for the translation). Taking into account the rigid transformation, the model is defined by M(α,ρ) = (V(α,ρ),E).
An implicit representation of a given surface T ⊂ ³ is a function F_{T} : ³ → such that the surface is one of its level sets. That is, F_{T}(v) = h for each v ∈ T, and F_{T}(v) ≠ h otherwise, where h is a constant, typically zero. Clearly, an example of implicit representation is the Euclidean distance
In order to define a cost function based on such an implicit representation, we are interested in an analytic form for the implicit function F_{T} ; we build it following the lines of [ TO99, CBC01 ]. Given a certain 3D mesh, they look for a function F(v) which is zero on the vertices and different from zero on a set of offsurface points. The offsurface points are required to avoid the trivial solution of a function identically zero over the whole space. They are chosen to lie on the normal to the surface at the mesh vertices. Let us denote by w_{j} the vertices and the offsurface points. Then, given a radial basis function φ(x), there exist a choice of scalar weights d_{j} and of a degree one polynomial P(v) such that the function
satisfies the constraints F(w_{j}) = h_{j} and is also smooth, in the sense that minimizes the energy
a generalization to ³ of the thinplate energy. In figure 1 we show such an example of F_{T}(v), where T is a polyline in 2D.
In practice, the unknown vectors d and p (coefficients of the polynomial) are the solutions of a linear system. Given the matrices
Note that when the input mesh is large, the above linear system, if dense, might become intractable. A possible solution is to induce sparsity by choosing a basis with compact support, but this on turn creates extrapolation problems. In order to maintain sparsity and achieve good extrapolation behavior an option is to define multiple levels of resolution, as done by [ OBS03 ]. Without resorting to a compact support basis, one can use Fast Multipole Methods to reduce both the storage and the computational cost, as done in [ CBC01 ].
Given a 3DMM M(α,ρ) as defined in section 2.1, and a target mesh T, we formulate the approximation problem as the optimization
where D is a suitable function measuring the approximation cost.
Figure 1. On the left, an example of a 2D morphable model M(α): two polylines, in red and blue, representing the characters 'S' and '3'. The green curves are linear combinations of the two examples, and can be written as the average shape plus a deformation along the only principal direction. The amount of the deformation is given by the coefficient α, whose values for the three green curves are respectively α = {0.5,0,0.5}. If we consider the average shape (α = 0), it generates the implicit representation on the right, where the graylevels correspond to the values of F_{T}(x), as defined by eq. ( 9 ) with φ(x) = x. The figure in the middle shows the corresponding cost function D(M(α),T) as defined by eq. ( 15 ) with l(x) = x².
Assuming we have an implicit representation F_{T} for the target mesh, we can define the cost of approximating T by M(α,ρ) as
where the v_{i} are the vertices of M(α,ρ), and l might be a quadratic loss or an Mestimator. The choice of an Mestimator might be necessary when parts of the model M are not present in the target T and influence the correct approximation of the rest. Observe that in the above definition we are essentially treating the values F_{T}(v_{i}) as residuals of the approximation. In fact, if F_{T} was the Euclidean distance function and l(x) = x², then D would correspond to an l_{2} norm.
In order to find the minimum of the cost function ( 15 ), we use a modified Newton's method. For the sake of clarity, in the following discussion we denote by θ the vector of all model coefficients, without distinctions between α and ρ. Recall that the exact Newton's method consists in iteratively updating θ by adding the solution p of the linear system
where ∇²D(θ) denotes the Hessian matrix of D at θ. The exact method converges to a minimum of D when sufficiently close to it, but it does not in general; nevertheless, there are a number of standard modifications that make it more efficient and robust. We employ a simple scheme, which keeps the Hessian matrix sufficiently positive definite by adding a multiple of the identity when required [ NW99, Ch.6.3]. Another modification to the exact method is that the update length is not unitary, but it is determined by a backtracking procedure which reduces the length if the update does not reduce D [ NW99, Ch.3.1].
Having the exact form of the implicit representation allows us to compute analytically the gradient and the Hessian matrices of the cost function:
and
In the above equations we used l'_{i} and l''_{i} to denote the first and second derivatives of the loss function computed at F_{T}(v_{i}); ∇F_{i} and ∇²F_{i} denotes the gradient vector and the Hessian matrix of F_{T} with respect to the spatial coordinates, computed at v_{i}.
In practice we choose l to be the Tukey estimator, that is
with derivatives
Note that the effectiveness of the Tukey estimator in reducing the influence of model vertices not present in the target depends on the scale of the constant c: the smaller the constant, the smaller the influence of missing vertices. On the other hand, small c's require a good prealignment of the model with the target. In all our experiments we used c = 5.0.
The gradient and Hessian of F_{T} depends on the choice of the basis φ. Following [ CBC01 ], we use the biharmonic basis function φ(x) = x, which results in the smoothest solution to the interpolation problem among all radial basis functions. Such a function with noncompact support is also more suitable to inter and extrapolation. The derivatives are easily computed
Accordingly, we have
In many cases, in particular to avoid overfitting, it might be convenient to add to the cost function ( 15 ) a regularization term which penalizes excessively large model coefficients:
where the parameters η_{α} and η_{ρ} weight the effect of the regularization on the shape and the rigid parameters, respectively.
A standard way to choose the regularization terms consists in deriving them from prior probabilities. In the case of the shape coefficients, the PCA model assumes for α a normal distribution with unit variance; for the rigid coefficients we also assume a Gaussian distribution with zero mean, but with empirically chosen variances. In both cases, we can define a regularization term proportional to the inverse loglikelihood. For a generic coefficient θ with variance σ² we have
so that we set (Σ_{ρ} is the diagonal covariance matrix for the rigid coefficients)
and
Figure 2. Examples of input data and the results of each approximation step. On the left column are shown the target (top image) and the implicit representation of its subsampling (bottom). Fitting the model to the latter results in the shape in the middle column, top image. The next step consist in separately fitting the segments of the model, shown in the bottomcenter image, to corresponding parts of the target. The result is shown in the right column: on the top are the different approximations stitched together, on the bottom the result of blending them.
As noted in section 2.2, a direct solution of the interpolation problem by solving the linear system is computationally and storage intensive. In fact, for high resolution target meshes, the system is too large to be allocated in memory. Rather than following one of the methods mentioned in section 2.2, we decided to overcome the problem by adopting a simpler multiresolution approach (refer to figure 2 for examples of the intermediate steps outputs):

Coarse Approximation. We first compute a coarse approximation of the target. We select a subset of the target mesh vertices, sampled with probabilities proportional to the areas of the adjoining triangles, and use them as constraints of the interpolation problem. The resulting implicit surface is fit with the procedure described in the previous section.

Partitioning. Once we have obtained the coarse approximation of the target, we can achieve a higher resolution by repeating the process on smaller patches of the target and finally merging the local approximations. To this aim, we manually defined four regions on the model topology; the regions are visible in figure 2, middle image of the bottom row. Given the coarse approximation, we compute the bounding box of each region, we expand it in all directions by a fixed length (2 cm in our experiments), and select all the target′s vertices falling inside the box. This results in four overlapping subsets of the target vertices, each one associated with a different segment of the model.

Finer Approximations. For each subset of the target vertices we build an (approximate) implicit representation by sampling its vertices as done in step 1. Although the subsets are again subsampled, the ratio between the number of sampled vertices and the total number of vertices clearly increases, resulting in a more precise representation. The implicit surfaces are fit again with the usual procedure.
It is clear that the approximation method explained in the previous section provides local results, which do not match precisely at their boundaries. In order to merge them smoothly, we use a variational approach akin to the Poissonbased mesh editing of [ YZX04 ], to which we refer for more details. The main idea is to keep fixed the positions of the vertices in the interiors of the segments and let the other vertices relax to the positions which minimize an elastic energy. The procedure is as follows:

for each vertex, compute its minimum distance from the boundaries, by fast marching [ KS98 ];

define the interior as the set of vertices with distance greater than a certain threshold (we used 0.5 cm);
Steps 1 to 3 define the mesh interior Ω and its complement, the boundary region : an area with given width that surrounds the segments boundaries. We denote by ∂Ω the set of vertices in the interior Ω that are connected to any vertex in the boundary region . While the vertices in Ω are fixed, the positions of the vertices in are obtained by solving the Poisson equation:
In practice, one solves the Poisson equation for the field of displacements from the coarse approximation to the fine one. This formulation yields a sparse linear system of equations of the type Δd_{i} = 0, where Δ is the discrete Laplacian operator over the mesh [ Tau95 ], and d_{i} is the displacement of the ith vertex in . The system is solved under the boundary constraints at ∂Ω, where the displacements are known.
The only parameter that has to be set for the blending process is the width of the boundary region . The smaller the width, the smaller the blending effect, and more details will be retained from the local approximations of the segments. On the contrary, by increasing the width one extends the effect of blending. In order to obtain visually pleasant results, the user will have to set for a tradeoff between blending and details. However, the choice of the width is highly taskdependant; for instance, if the main use of the approximations is to perform face recognition, the blending might be skipped altogether.
The above method has been tested using as model a subsampled version of the mixed expressionidentity 3D Morphable Model built with the algorithm described in [ BV06 ]. The subsampling is simple, since the reference template was built by subdivision of a lowresolution one. Out of around 40k vertices at full resolution, we retained approximately 2.5k. As well as reducing the number of vertices, we also discarded the expression shape components and all the texture components. The targets were a set of 165 range scans, randomly sampled from the range data distributed for the Face Recognition Grand Challenge v.1.0 [ PFS05 ]. The scans were distributed with a list of landmark positions, which we used to prealign the faces to the model. The approximations have been performed using 99 model components and 500 sampled vertices for building the implicit representations. We should remark that these are preliminary experiments, and a more indepth study will follow.
We evaluated the goodness of the fit by cylindrically projecting the target and the approximation, and computing the average absolute error (a radial distance) on the four segments. The rest of the head model was not taken into account because of the hair and clothes often present in the scans. Of the 165 results, we rejected three of them in which the error was more than three standard deviations larger than the average.
In the remaining 163 results, the average radial error was 1.09 mm on the blended highresolution result, with an average improvement with respect to the low resolution approximation of 0.39 mm. Some of the results are shown in figure 4.
Indeed, a comparison with our previous method, [ BV06 ], confirms the good performance of the new method based on the implicit representations, both in terms of stability and of approximation accuracy. Of the 165 examples used, the algorithm of [ BV06 ] failed in 21 cases, ten times more than the new one, and in the remaining cases, the average error was of 2.11 mm. This last result in particular is remarkable, since the algorithm used for comparison is directly minimizing the radial error, while the new algorithm minimizes it only indirectly.
We also ran a small experiment to assess the dependency of the approximation on the number of principal components used and the number of vertices sampled from the target surface. We repeatedly computed the low resolution approximations for a small subset of test data (only four examples), varying the two aforementioned parameters. The results are in figure 3. The dependency on the number of principal components used in the model behaves as expected: up to a point, the more components are used the better the approximation performance. This is not true anymore when input is too noisy, as it probably occurs when the number of sampled vertices is only 100. More interestingly, the dependency on the number of sampled vertices shows that an increase on the sample rate pays off only up to a point, while the improvement from 500 to 1000 is minimal.
Figure 3. Dependency of the approximation on the number of principal components used by the model and the number of vertices sampled from the target surface.
Figure 4. Some examples of approximations (second row, the original are on the top row). The average pervertex radial errors for the three examples are, respectively:0.80, 0.86 and 0.80 mm. The bottom two rows show error maps for the segmented approximations (third row) and the low resolution approximation (last row). Black is zero error, white is 3mm or more. The improvements in the segmented approximations are especially evident in the nose region.
We have presented a method which makes simple use of the implicit representation of a surface to find its optimal approximation in terms of an affine combination of examples. Implicit representations are appealing in general, because they are topologyfree and typically quite robust to holes in the data. In our setting, they also offer the advantage of completely avoiding the problem of estimating correspondences.
As we saw, however, the use of implicit representations poses serious computational problems when dealing with highresolution meshes. Therefore, we have proposed to tackle the problem in a multiresolution fashion, and we have shown how this approach can provide good results without computing fullresolution representations. We should remark that, in assessing the method, we considered its performance only in absolute terms, without comparing it with respect to other, already published algorithms. This is certainly a deficiency of our work, and we will have to correct it in the future. Nevertheless, we believe that the absolute performance achieved by our method on the Face Recognition Grand Challenge range data is a strong indicator of its applicability to a realworld scenario.
The subsampling of the original surface is further motivated by the consideration that the model has also a fixed level of resolution. The resolution of the model does not only dependent on the number of vertices of its mesh, but especially on the number of training examples and of components used. It seems therefore reasonable to tweak the resolution of the target so that it matches the one of the model. On the other hand, subsampling poses problems. This is particularly clear when considering the approximation results obtained on targets containing clothes or hairs. In the best case these data are irrelevant, in the worst they are harmful, since they will cause distortions in the lowresolution implicit representation, which might be enhanced by an unlucky subsampling. Preprocessing of the targets which removes these data would certainly improve the method's performance and stability.
The future development will focus on two problems: first, the choice of the optimal segments in which the model is partitioned, and second, the integration of a texture model in the approximation scheme.
[ACP03] The space of human body shapes: reconstruction and parameterization from range scans, ACM Transactions on Graphics (Proceedings of ACM SIGGRAPH 2003), (2003), no. 3, 587—594, issn 07300301.
[BM92] A Method for Registration of 3D Shapes, IEEE Transactions on Pattern Analysis and Machine Intelligence, (1992), no. 2, 239—256 issn 01628828.
[BV99] A Morphable Model for the Synthesis of 3D Faces, Proceedings of the 26st International Conference on Computer Graphics and Interactive Techniques (SIGGRAPH'99), 1999, pp. 187—194, isbn 0201485605.
[BV03] Face Recognition Based on Fitting a 3D Morphable Model, IEEE Trans. Pattern Anal. Mach. Intell., (2003), no. 9, 1063—1074, issn 01628828.
[BV06] Registration of Expressions Data using a 3D Morphable Model, Journal of Multimedia, (2006), no. 4, 37—45, issn 17962048.
[CBC01] Reconstruction and Representation of 3D Objects with Radial Basis Functions, Proc. of ACM SIGGRAPH 2001, 2001, pp. 67—76, isbn 158113374X.
[HTF01] The Elements of Statistical Learning, Springer, 2001, Springer Series in Statistic, New York, USA, isbn 0387952845.
[IF03] Implicit Meshes for Modeling and Reconstruction, Proc. Conf. Computer Vision and Pattern Recognition 2003, Volume II, 2003, pp. 483—492, isbn 0769519008.
[KS98] Computing geodesic paths on manifolds, Proc. Natl. Acad. Sci. USA, (1998), no. 15, 8431—8435, issn 00278424.
[NW99] Numerical Optimization, Springer, 1999, Springer Series in Operations Research, New York, NY, USA, isbn 0387987932.
[OBS03] A Multiscale Approach to 3D Scattered Data Interpolation with Compactly Supported Basis Functions, SMI '03: Proceedings of the Shape Modeling International 2003, Washington, DC, USA, IEEE Computer Society, 2003, pp. 153—164, isbn 0769519091.
[PFS05] Overview of the Face Recognition Grand Challenge, Computer Vision and Pattern Recognition (CVPR 2005), Los Alamitos, CA, USA, June 2005, , pp. 947—954, IEEE Computer Society, issn 10636919.
[SSB05] Support Vector Machines for 3D Shape Processing, Computer Graphics Forum, (2005), no. 3, 285—294, issn 01677055.
[Tau95] A signal processing approach to fair surface design, SIGGRAPH '95: Proceedings of the 22nd annual conference on Computer graphics and interactive techniques, 1995, pp. 351—358, New York, NY, USA, ACM Press, isbn 0807917014.
[TL94] Zippered polygon meshes from range images, SIGGRAPH '94: Proceedings of the 21st annual conference on Computer graphics and interactive techniques, 1994, pp. 311—318, New York, NY, USA, ACM Press, isbn 0897916670.
[TO99] Shape Transformation Using Variational Implicit Functions, Proc. of ACM SIGGRAPH 99, 1999, pp. 335—342, isbn 0201485605.
[YZX04] Mesh Editing with PoissonBased Gradient Field Manipulation, ACM Transactions on Graphics, (2004), no. 3, 641—648, issn 07300301.
Fulltext ¶
 Volltext als PDF ( Size 1.7 MB )
License ¶
Any party may pass on this Work by electronic means and make it available for download under the terms and conditions of the Digital Peer Publishing License. The text of the license may be accessed and retrieved at http://www.dipp.nrw.de/lizenzen/dppl/dppl/DPPL_v2_en_062004.html.
Recommended citation ¶
Curzio Basso, and Alessandro Verri, Fitting 3D morphable models using implicit representations. JVRB  Journal of Virtual Reality and Broadcasting, 4(2007), no. 18. (urn:nbn:de:0009612799)
Please provide the exact URL and date of your last visit when citing this article.