GRAPP 2007
Fitting 3D morphable models using implicit representations
First presented at the Second International Conference
on Computer Graphics Theory and Applications 2007,
extended and revised for JVRB
urn:nbn:de:0009-6-12799
Abstract
We consider the problem of approximating the 3D scan of a real object through an affine combination of examples. Common approaches depend either on the explicit estimation of point-to-point correspondences or on 2-dimensional projections of the target mesh; both present drawbacks. We follow an approach similar to [ IF03 ] by representing the target via an implicit function, whose values at the vertices of the approximation are used to define a robust cost function. The problem is approached in two steps, by approximating first a coarse implicit representation of the whole target, and then finer, local ones; the local approximations are then merged together with a Poisson-based method. We report the results of applying our method on a subset of 3D scans from the Face Recognition Grand Challenge v.1.0.
Keywords: 3D Morphable Models, non-rigid registration, implicit surface representations
Subjects: Computer Graphics
We consider the problem of approximating a target 3D surface with an affine combination of (registered) examples, under the assumption that both the examples and the target belong to the same object class. When the assumption holds, the affine space generated by the examples represents a model of the object class, usually known as 3D Morphable Model in case of faces [ BV99 ]. The model is parameterized by the coefficients of the affine combination. We consider the class of human faces; our goal is to find the best possible combination of the available examples that approximates a given target. This is essentially a problem of non-rigid registration, where the available deformations are constrained by the model; it can arise not only when building a 3D Morphable Model [ ACP03, BV06 ], but also for performing 3D shape analysis as done for images by [ BV03 ].
In general, we might approach the problem in different ways, depending on the representation of the target surface:
-
the target is explicitly represented as a triangular mesh, and the problem is solved in ³;
-
the target is projected to ², as depth map or cylindrical projection;
-
the target is represented implicitly in ³.
The first strategy is essentially based on the Iterated Closest-Point (ICP) registration algorithm [ BM92, TL94 ]. Although originally used for rigid registration, it can be extended to non-rigid registration and to the case at hand. This was demonstrated in [ ACP03 ], where the possibility of using examples-based models was also mentioned. The drawback of this type of approaches is the need of an explicit estimation of the point-to-point correspondence between the model and the target. Letting aside the problems due to holes in the target, the search for corresponding points in ³ might be computationally inefficient.
The second approach, adopted by [ BV99 ] and [ BV06 ], avoids the explicit estimation of point-to-point correspondences, by minimizing the difference between the cylindrical projections to ² of the target and the model. As well as avoiding the direct search for corresponding points, the projection to ² has the advantage of reducing the problem dimensionality. However, this comes at the cost of loosing some information about the 3D surfaces, since the projection is not a parameterization of the original surface. Moreover, the projection is usually non-linear and occlusions frequently occur.
In this paper we investigate the possibility of solving the problem representing the target implicitly in ³. In doing so, we avoid the explicit correspondence estimation, while still working in the original domain of the data. Although it has never been applied to our specific problem, [ IF03 ] have used an implicit representation to reconstruct surfaces from uncalibrated video sequences. In their work, however, it is not the target data that are represented implicitly, but rather the model, a deformable template that has to be matched to the data. Also related to ours is the work of [ SSB05 ], where the problem of registering the target with a fixed template (not an example-based model) was solved representing both surfaces implicitly. The implicit representations where found using Support Vector Machines (SVM).
Our work is based on a simpler implicit representation of the target mesh, based on Radial Basis Functions (RBF), while the model is still represented explicitly. The implicit representation defines a sort of potential field in the space surrounding the target surface, and can be used to define a robust cost function. In order to reduce the complexity of computing the RBF for large meshes, we first minimize the cost function over a low resolution implicit representation, and afterward we minimize it over higher resolution patches of the target. Finally, the different patches are blended together with a Poisson-editing approach [ YZX04 ]. In our work we assume that the target is already coarsely aligned with the model.
In the following two subsection we briefly expose the basic notions our method builds upon: three-dimensional Morphable Models, and implicit representations of surfaces.
A triangular mesh is defined by a graph M = (V,E); V is a set of n vertices in ³ and E is a set of edges connecting the vertices. For convenience we write V as a matrix
A set of m registered meshes, denoted by Mi , will share the same connectivity E but have different vertices positions Vi . We can naturally define the subspace of their affine combinations, with coefficients a = (a1,...,am) ∈ m , as
with
The affine constraint is needed in order to avoid scaling effects.
Note that rewriting eq. ( 3 ) in terms of the barycenter
we can eliminate the constraint on the sum of the coefficients:
Three-dimensional Morphable Models (3DMM) use a representation for the affine combination based on the Principal Component Analysis (PCA) of the subspace spanned by the Vi - . Without going into the details of the PCA [ HTF01, pp. 62-63], we only report that for 3DMM eq. ( 5 ) is written as
where the Ui are the principal components and the coefficient vector a is replaced by α ∈ m-1 . See figure 1 for a simple two-dimensional example.
Assuming the examples are correctly registered, any rigid transformation has been factored out from the model. It can be explicitly included by defining a rotation matrix R and a translation vector t which are applied to V(α):
where the last term is simply an n-times repetition of the column vector t. R and t are parameterized by a vector ρ ∈ 6 , holding the coefficients of the transformation (three for the rotation and three for the translation). Taking into account the rigid transformation, the model is defined by M(α,ρ) = (V(α,ρ),E).
An implicit representation of a given surface T ⊂ ³ is a function FT : ³ → such that the surface is one of its level sets. That is, FT(v) = h for each v ∈ T, and FT(v) ≠ h otherwise, where h is a constant, typically zero. Clearly, an example of implicit representation is the Euclidean distance
In order to define a cost function based on such an implicit representation, we are interested in an analytic form for the implicit function FT ; we build it following the lines of [ TO99, CBC01 ]. Given a certain 3D mesh, they look for a function F(v) which is zero on the vertices and different from zero on a set of off-surface points. The off-surface points are required to avoid the trivial solution of a function identically zero over the whole space. They are chosen to lie on the normal to the surface at the mesh vertices. Let us denote by wj the vertices and the off-surface points. Then, given a radial basis function φ(x), there exist a choice of scalar weights dj and of a degree one polynomial P(v) such that the function
satisfies the constraints F(wj) = hj and is also smooth, in the sense that minimizes the energy
a generalization to ³ of the thin-plate energy. In figure 1 we show such an example of FT(v), where T is a poly-line in 2D.
In practice, the unknown vectors d and p (coefficients of the polynomial) are the solutions of a linear system. Given the matrices
Note that when the input mesh is large, the above linear system, if dense, might become intractable. A possible solution is to induce sparsity by choosing a basis with compact support, but this on turn creates extrapolation problems. In order to maintain sparsity and achieve good extrapolation behavior an option is to define multiple levels of resolution, as done by [ OBS03 ]. Without resorting to a compact support basis, one can use Fast Multipole Methods to reduce both the storage and the computational cost, as done in [ CBC01 ].
Given a 3DMM M(α,ρ) as defined in section 2.1, and a target mesh T, we formulate the approximation problem as the optimization
where D is a suitable function measuring the approximation cost.
Figure 1. On the left, an example of a 2D morphable model M(α): two poly-lines, in red and blue, representing the characters 'S' and '3'. The green curves are linear combinations of the two examples, and can be written as the average shape plus a deformation along the only principal direction. The amount of the deformation is given by the coefficient α, whose values for the three green curves are respectively α = {-0.5,0,0.5}. If we consider the average shape (α = 0), it generates the implicit representation on the right, where the gray-levels correspond to the values of FT(x), as defined by eq. ( 9 ) with φ(x) = ||x||. The figure in the middle shows the corresponding cost function D(M(α),T) as defined by eq. ( 15 ) with l(x) = x².
Assuming we have an implicit representation FT for the target mesh, we can define the cost of approximating T by M(α,ρ) as
where the vi are the vertices of M(α,ρ), and l might be a quadratic loss or an M-estimator. The choice of an M-estimator might be necessary when parts of the model M are not present in the target T and influence the correct approximation of the rest. Observe that in the above definition we are essentially treating the values FT(vi) as residuals of the approximation. In fact, if FT was the Euclidean distance function and l(x) = x², then D would correspond to an l2 norm.
In order to find the minimum of the cost function ( 15 ), we use a modified Newton's method. For the sake of clarity, in the following discussion we denote by θ the vector of all model coefficients, without distinctions between α and ρ. Recall that the exact Newton's method consists in iteratively updating θ by adding the solution p of the linear system
where ∇²D(θ) denotes the Hessian matrix of D at θ. The exact method converges to a minimum of D when sufficiently close to it, but it does not in general; nevertheless, there are a number of standard modifications that make it more efficient and robust. We employ a simple scheme, which keeps the Hessian matrix sufficiently positive definite by adding a multiple of the identity when required [ NW99, Ch.6.3]. Another modification to the exact method is that the update length is not unitary, but it is determined by a backtracking procedure which reduces the length if the update does not reduce D [ NW99, Ch.3.1].
Having the exact form of the implicit representation allows us to compute analytically the gradient and the Hessian matrices of the cost function:
and
In the above equations we used l'i and l''i to denote the first and second derivatives of the loss function computed at FT(vi); ∇Fi and ∇²Fi denotes the gradient vector and the Hessian matrix of FT with respect to the spatial coordinates, computed at vi.
In practice we choose l to be the Tukey estimator, that is
with derivatives
Note that the effectiveness of the Tukey estimator in reducing the influence of model vertices not present in the target depends on the scale of the constant c: the smaller the constant, the smaller the influence of missing vertices. On the other hand, small c's require a good pre-alignment of the model with the target. In all our experiments we used c = 5.0.
The gradient and Hessian of FT depends on the choice of the basis φ. Following [ CBC01 ], we use the biharmonic basis function φ(x) = ||x||, which results in the smoothest solution to the interpolation problem among all radial basis functions. Such a function with non-compact support is also more suitable to inter- and extrapolation. The derivatives are easily computed
Accordingly, we have
In many cases, in particular to avoid overfitting, it might be convenient to add to the cost function ( 15 ) a regularization term which penalizes excessively large model coefficients:
where the parameters ηα and ηρ weight the effect of the regularization on the shape and the rigid parameters, respectively.
A standard way to choose the regularization terms consists in deriving them from prior probabilities. In the case of the shape coefficients, the PCA model assumes for α a normal distribution with unit variance; for the rigid coefficients we also assume a Gaussian distribution with zero mean, but with empirically chosen variances. In both cases, we can define a regularization term proportional to the inverse log-likelihood. For a generic coefficient θ with variance σ² we have
so that we set (Σρ is the diagonal covariance matrix for the rigid coefficients)
and
Figure 2. Examples of input data and the results of each approximation step. On the left column are shown the target (top image) and the implicit representation of its subsampling (bottom). Fitting the model to the latter results in the shape in the middle column, top image. The next step consist in separately fitting the segments of the model, shown in the bottom-center image, to corresponding parts of the target. The result is shown in the right column: on the top are the different approximations stitched together, on the bottom the result of blending them.
As noted in section 2.2, a direct solution of the interpolation problem by solving the linear system is computationally and storage intensive. In fact, for high resolution target meshes, the system is too large to be allocated in memory. Rather than following one of the methods mentioned in section 2.2, we decided to overcome the problem by adopting a simpler multi-resolution approach (refer to figure 2 for examples of the intermediate steps outputs):
-
Coarse Approximation. We first compute a coarse approximation of the target. We select a subset of the target mesh vertices, sampled with probabilities proportional to the areas of the adjoining triangles, and use them as constraints of the interpolation problem. The resulting implicit surface is fit with the procedure described in the previous section.
-
Partitioning. Once we have obtained the coarse approximation of the target, we can achieve a higher resolution by repeating the process on smaller patches of the target and finally merging the local approximations. To this aim, we manually defined four regions on the model topology; the regions are visible in figure 2, middle image of the bottom row. Given the coarse approximation, we compute the bounding box of each region, we expand it in all directions by a fixed length (2 cm in our experiments), and select all the target′s vertices falling inside the box. This results in four overlapping subsets of the target vertices, each one associated with a different segment of the model.
-
Finer Approximations. For each subset of the target vertices we build an (approximate) implicit representation by sampling its vertices as done in step 1. Although the subsets are again sub-sampled, the ratio between the number of sampled vertices and the total number of vertices clearly increases, resulting in a more precise representation. The implicit surfaces are fit again with the usual procedure.
It is clear that the approximation method explained in the previous section provides local results, which do not match precisely at their boundaries. In order to merge them smoothly, we use a variational approach akin to the Poisson-based mesh editing of [ YZX04 ], to which we refer for more details. The main idea is to keep fixed the positions of the vertices in the interiors of the segments and let the other vertices relax to the positions which minimize an elastic energy. The procedure is as follows:
-
for each vertex, compute its minimum distance from the boundaries, by fast marching [ KS98 ];
-
define the interior as the set of vertices with distance greater than a certain threshold (we used 0.5 cm);
Steps 1 to 3 define the mesh interior Ω and its complement, the boundary region : an area with given width that surrounds the segments boundaries. We denote by ∂Ω the set of vertices in the interior Ω that are connected to any vertex in the boundary region . While the vertices in Ω are fixed, the positions of the vertices in are obtained by solving the Poisson equation:
In practice, one solves the Poisson equation for the field of displacements from the coarse approximation to the fine one. This formulation yields a sparse linear system of equations of the type Δdi = 0, where Δ is the discrete Laplacian operator over the mesh [ Tau95 ], and di is the displacement of the i-th vertex in . The system is solved under the boundary constraints at ∂Ω, where the displacements are known.
The only parameter that has to be set for the blending process is the width of the boundary region . The smaller the width, the smaller the blending effect, and more details will be retained from the local approximations of the segments. On the contrary, by increasing the width one extends the effect of blending. In order to obtain visually pleasant results, the user will have to set for a trade-off between blending and details. However, the choice of the width is highly task-dependant; for instance, if the main use of the approximations is to perform face recognition, the blending might be skipped altogether.
The above method has been tested using as model a sub-sampled version of the mixed expression-identity 3D Morphable Model built with the algorithm described in [ BV06 ]. The sub-sampling is simple, since the reference template was built by subdivision of a low-resolution one. Out of around 40k vertices at full resolution, we retained approximately 2.5k. As well as reducing the number of vertices, we also discarded the expression shape components and all the texture components. The targets were a set of 165 range scans, randomly sampled from the range data distributed for the Face Recognition Grand Challenge v.1.0 [ PFS05 ]. The scans were distributed with a list of landmark positions, which we used to pre-align the faces to the model. The approximations have been performed using 99 model components and 500 sampled vertices for building the implicit representations. We should remark that these are preliminary experiments, and a more in-depth study will follow.
We evaluated the goodness of the fit by cylindrically projecting the target and the approximation, and computing the average absolute error (a radial distance) on the four segments. The rest of the head model was not taken into account because of the hair and clothes often present in the scans. Of the 165 results, we rejected three of them in which the error was more than three standard deviations larger than the average.
In the remaining 163 results, the average radial error was 1.09 mm on the blended high-resolution result, with an average improvement with respect to the low resolution approximation of 0.39 mm. Some of the results are shown in figure 4.
Indeed, a comparison with our previous method, [ BV06 ], confirms the good performance of the new method based on the implicit representations, both in terms of stability and of approximation accuracy. Of the 165 examples used, the algorithm of [ BV06 ] failed in 21 cases, ten times more than the new one, and in the remaining cases, the average error was of 2.11 mm. This last result in particular is remarkable, since the algorithm used for comparison is directly minimizing the radial error, while the new algorithm minimizes it only indirectly.
We also ran a small experiment to assess the dependency of the approximation on the number of principal components used and the number of vertices sampled from the target surface. We repeatedly computed the low resolution approximations for a small subset of test data (only four examples), varying the two aforementioned parameters. The results are in figure 3. The dependency on the number of principal components used in the model behaves as expected: up to a point, the more components are used the better the approximation performance. This is not true anymore when input is too noisy, as it probably occurs when the number of sampled vertices is only 100. More interestingly, the dependency on the number of sampled vertices shows that an increase on the sample rate pays off only up to a point, while the improvement from 500 to 1000 is minimal.
Figure 3. Dependency of the approximation on the number of principal components used by the model and the number of vertices sampled from the target surface.
Figure 4. Some examples of approximations (second row, the original are on the top row). The average per-vertex radial errors for the three examples are, respectively:0.80, 0.86 and 0.80 mm. The bottom two rows show error maps for the segmented approximations (third row) and the low resolution approximation (last row). Black is zero error, white is 3mm or more. The improvements in the segmented approximations are especially evident in the nose region.
We have presented a method which makes simple use of the implicit representation of a surface to find its optimal approximation in terms of an affine combination of examples. Implicit representations are appealing in general, because they are topology-free and typically quite robust to holes in the data. In our setting, they also offer the advantage of completely avoiding the problem of estimating correspondences.
As we saw, however, the use of implicit representations poses serious computational problems when dealing with high-resolution meshes. Therefore, we have proposed to tackle the problem in a multi-resolution fashion, and we have shown how this approach can provide good results without computing full-resolution representations. We should remark that, in assessing the method, we considered its performance only in absolute terms, without comparing it with respect to other, already published algorithms. This is certainly a deficiency of our work, and we will have to correct it in the future. Nevertheless, we believe that the absolute performance achieved by our method on the Face Recognition Grand Challenge range data is a strong indicator of its applicability to a real-world scenario.
The sub-sampling of the original surface is further motivated by the consideration that the model has also a fixed level of resolution. The resolution of the model does not only dependent on the number of vertices of its mesh, but especially on the number of training examples and of components used. It seems therefore reasonable to tweak the resolution of the target so that it matches the one of the model. On the other hand, sub-sampling poses problems. This is particularly clear when considering the approximation results obtained on targets containing clothes or hairs. In the best case these data are irrelevant, in the worst they are harmful, since they will cause distortions in the low-resolution implicit representation, which might be enhanced by an unlucky sub-sampling. Preprocessing of the targets which removes these data would certainly improve the method's performance and stability.
The future development will focus on two problems: first, the choice of the optimal segments in which the model is partitioned, and second, the integration of a texture model in the approximation scheme.
[ACP03] The space of human body shapes: reconstruction and parameterization from range scans, ACM Transactions on Graphics (Proceedings of ACM SIGGRAPH 2003), (2003), no. 3, 587—594, issn 0730-0301.
[BM92] A Method for Registration of 3-D Shapes, IEEE Transactions on Pattern Analysis and Machine Intelligence, (1992), no. 2, 239—256 issn 0162-8828.
[BV99] A Morphable Model for the Synthesis of 3D Faces, Proceedings of the 26st International Conference on Computer Graphics and Interactive Techniques (SIGGRAPH'99), 1999, pp. 187—194, isbn 0-201-48560-5.
[BV03] Face Recognition Based on Fitting a 3D Morphable Model, IEEE Trans. Pattern Anal. Mach. Intell., (2003), no. 9, 1063—1074, issn 0162-8828.
[BV06] Registration of Expressions Data using a 3D Morphable Model, Journal of Multimedia, (2006), no. 4, 37—45, issn 1796-2048.
[CBC01] Reconstruction and Representation of 3D Objects with Radial Basis Functions, Proc. of ACM SIGGRAPH 2001, 2001, pp. 67—76, isbn 1-58113-374-X.
[HTF01] The Elements of Statistical Learning, Springer, 2001, Springer Series in Statistic, New York, USA, isbn 0-387-95284-5.
[IF03] Implicit Meshes for Modeling and Reconstruction, Proc. Conf. Computer Vision and Pattern Recognition 2003, Volume II, 2003, pp. 483—492, isbn 0-7695-1900-8.
[KS98] Computing geodesic paths on manifolds, Proc. Natl. Acad. Sci. USA, (1998), no. 15, 8431—8435, issn 0027-8424.
[NW99] Numerical Optimization, Springer, 1999, Springer Series in Operations Research, New York, NY, USA, isbn 0-387-98793-2.
[OBS03] A Multi-scale Approach to 3D Scattered Data Interpolation with Compactly Supported Basis Functions, SMI '03: Proceedings of the Shape Modeling International 2003, Washington, DC, USA, IEEE Computer Society, 2003, pp. 153—164, isbn 0-7695-1909-1.
[PFS05] Overview of the Face Recognition Grand Challenge, Computer Vision and Pattern Recognition (CVPR 2005), Los Alamitos, CA, USA, June 2005, , pp. 947—954, IEEE Computer Society, issn 1063-6919.
[SSB05] Support Vector Machines for 3D Shape Processing, Computer Graphics Forum, (2005), no. 3, 285—294, issn 0167-7055.
[Tau95] A signal processing approach to fair surface design, SIGGRAPH '95: Proceedings of the 22nd annual conference on Computer graphics and interactive techniques, 1995, pp. 351—358, New York, NY, USA, ACM Press, isbn 0-80791-701-4.
[TL94] Zippered polygon meshes from range images, SIGGRAPH '94: Proceedings of the 21st annual conference on Computer graphics and interactive techniques, 1994, pp. 311—318, New York, NY, USA, ACM Press, isbn 0-89791-667-0.
[TO99] Shape Transformation Using Variational Implicit Functions, Proc. of ACM SIGGRAPH 99, 1999, pp. 335—342, isbn 0-201-48560-5.
[YZX04] Mesh Editing with Poisson-Based Gradient Field Manipulation, ACM Transactions on Graphics, (2004), no. 3, 641—648, issn 0730-0301.
Fulltext ¶
- Volltext als PDF ( Size 1.7 MB )
License ¶
Any party may pass on this Work by electronic means and make it available for download under the terms and conditions of the Digital Peer Publishing License. The text of the license may be accessed and retrieved at http://www.dipp.nrw.de/lizenzen/dppl/dppl/DPPL_v2_en_06-2004.html.
Recommended citation ¶
Curzio Basso, and Alessandro Verri, Fitting 3D morphable models using implicit representations. JVRB - Journal of Virtual Reality and Broadcasting, 4(2007), no. 18. (urn:nbn:de:0009-6-12799)
Please provide the exact URL and date of your last visit when citing this article.