A Statistical Model of Human Pose and Body Shape
←
→
Page content transcription
If your browser does not render page correctly, please read the page content below
EUROGRAPHICS 2009 / P. Dutré and M. Stamminger Volume 28 (2009), Number 2 (Guest Editors) A Statistical Model of Human Pose and Body Shape N. Hasler1 , C. Stoll1 , M. Sunkel1 , B. Rosenhahn2 , and H.-P. Seidel1 1 MPI Informatik, Saarbrücken 2 University of Hannover Abstract Generation and animation of realistic humans is an essential part of many projects in today’s media industry. Especially, the games and special effects industry heavily depend on realistic human animation. In this work a unified model that describes both, human pose and body shape is introduced which allows us to accurately model muscle deformations not only as a function of pose but also dependent on the physique of the subject. Coupled with the model’s ability to generate arbitrary human body shapes, it severely simplifies the generation of highly realistic character animations. A learning based approach is trained on approximately 550 full body 3D laser scans taken of 114 subjects. Scan registration is performed using a non-rigid deformation technique. Then, a rotation invariant encoding of the acquired exemplars permits the computation of a statistical model that simultaneously encodes pose and body shape. Finally, morphing or generating meshes according to several constraints simultaneously can be achieved by training semantically meaningful regressors. Categories and Subject Descriptors (according to ACM CCS): Computer Graphics [I.3.7]: Three-Dimensional Graphics and Realism—Computer Graphics [I.3.5]: Computational Geometry and Object Modeling— 1. Introduction The automatic generation and animation of realistic hu- mans is an increasingly important discipline in the computer graphics community. The applications of a system that si- multaneously models pose and body shape include crowd generation for movie or game projects, creation of custom avatars for online shopping or games, or usability testing of virtual prototypes. But also other problems such as hu- Figure 1: Muscle bulging is not only a function of the un- man tracking or even biometric applications can benefit from derlying skeleton but is instead closely correlated with the such a model. physique of the subject. Realistic results for human animation can be obtained by simulating the tissue deformation on top of modeled skele- tal bones [SPCM97, DCKY02]. This approach has been re- jor difficulties with most previously suggested methods like searched extensively but involves a lot of manual modeling SCAPE [ASK∗ 05] or [ACP03, SMT04, WPP07, WSLG07] since not only the surface but also the bones, muscles, and is that they rely on different means for encoding shape and other tissues have to be designed. Additionally, these meth- pose. Pose (and the effects thereof, e.g., muscle bulging) are ods tend to be computationally expensive since they involve stored with the help of an underlying skeleton while body physically based tissue simulation. shape is encoded using variational methods or envelope skin- ning. During animation, the outcomes of the two methods In order to reduce the required amount of manual mod- have to be combined in an additional step. eling, several systems have been proposed that attempt to create general human models from 3D scans. One of the ma- Allen et al., on the other hand, presented a method that submitted to EUROGRAPHICS 2009.
2 N. Hasler, C. Stoll, M. Sunkel, B. Rosenhahn, and H.-P. Seidel / A Model of Pose and Body Shape Figure 2: A few examples of scans included in our database. learns skinning weights for corrective enveloping from 3D- the body. For example, the encoding of a hand remains un- scan data [ACPH06]. They propose to use a maximum a changed even if the arm is raised and rotated between two posteriori estimation for solving a highly nonlinear func- scans. A similar encoding has recently been proposed by tion which simultaneously describes pose, skinning weights, Kircher and Garland [KG08] in the context of mesh editing. bone parameters, and vertex positions. The authors are able The resulting shape coefficients and their variances are to change weight or height of a character during animation analyzed using a statistical method. Similar to Blanz et and the muscle deformation looks significantly more real- al. [BV99], regression functions are trained to correlate them istic than linear blend skinning. However, since this func- to semantically significant values like weight, body fat con- tion has a high number of degrees of freedom, the optimiza- tent, or pose. Allen et al. [ACP03] uses a simple linear tion procedure is very expensive. Additionally, the number method to generate models conforming to a set of seman- of support poses that can be computed per subject is limited tic constraints. Allen et al. do not show quantitative analy- by the complexity of the approach, which in turn bounds the sis of the accuracy of their morphing functions. In contrast achievable realism. Seo and Magnenat-Thalmann describe a method for gener- A major benefit of the shared encoding is that it is eas- ating human bodies given a number of high level semantic ily possible to encode correlations between pose and body constraints [SMT04] and evaluate the accuracy of linear re- shape, e.g., the body surface deformation generated by the gression based morphing functions. In Section 5 we present motion performed by an athletic person exhibits different a quantitative analysis comparing linear and non-linear re- properties than the same motion carried out by a person with gression functions for various body measures. Scherbaum less pronounced skeletal muscles (cf. Fig. 1). et al. [SSSB07] have concluded in the context of face mor- phing that although non-linear regression functions are nu- The model we propose is based on statistical analysis of merically more accurate, the visual difference to the linear over 550 full body 3D scans taken of 114 subjects. Addition- counterpart is minimal. ally, all subjects are measured with a commercially available impedance spectroscopy body fat scale and a medical grade Thus, morphing a scan to conform to a given set of con- pulse oximeter. straints only involves solving a single linear equation system in the minimum norm sense. We present several applications A sophisticated semi-automatic non-rigid model registra- of our model, including animation of skeleton data produced tion technique is used to bring the scans into semantic cor- by a motion capture system (Sec. 5.4), fitting of 3D scan respondence, i.e. to fit a template model to every scan. This data (Sec. 2), and generation of realistic avatars given only a approach is similar to [ARV07]. The registration ultimately sparse number of semantic constraints (Sec. 5.2). leads to a semantically unified description of all scans. I.e. the same vertices denote the same semantic position on the The primary contributions can be summarized as follows: surface of a scan. • A large statistical model of human bodies that jointly en- Models based on statistical analysis of human body shape codes pose and body shape is introduced. The primary ad- that operate directly on vertex positions have the distinct vantage of this approach is that correlations between body disadvantage that body parts that are rotated have a com- shape and pose are encoded in addition to the pure infor- pletely differently representation. For this reason most pre- mation on pose and shape. vious techniques [ASK∗ 05, SMT04, WPP07, WSLG07] en- • A locally translation and rotation invariant encoding for code the surface relative to an embedded skeleton. Instead, the meshes is introduced. This fundamentally non-linear we opt to use an encoding that is locally translation and ro- transformation subsequently allows us to compute a linear tation invariant in the sense that a body part retains its exact statistical model on the data. encoding if it is translated or rotated relative to the rest of • A quantitative evaluation of linear and non-linear regres- submitted to EUROGRAPHICS 2009.
N. Hasler, C. Stoll, M. Sunkel, B. Rosenhahn, and H.-P. Seidel / A Model of Pose and Body Shape 3 sors for morphing according to semantically meaningful constraints. Additionally, an extension to a semi-automatic non-rigid registration scheme is proposed that establishes semantic correspondence between the scans, a handle based paradigm for modeling human body shapes is presented, and our database of approx. 550 full body 3D scans is made avail- able for scientific purposes [Has08]. The rest of the paper is structured as follows. In the fol- lowing Section the scan database and the registration scheme that brings scans into semantic correspondence are intro- duced. The rotation invariant model representation is de- scribed in Section 3. The regression framework is presented in Section 4, applications and experimental evaluation re- Figure 3: Every dot marks a scan taken from a subject in sults are shown in Section 5, and a brief summary is given in a given pose. Top: SCAPE-like approach - only one subject Section 6. is scanned in different poses Bottom: Our model covers the space of body shapes more densely. 2. Scan Database & Registration The procedure to register all scans from the database has to be robust and almost fully automatic since the massive num- easily. Furthermore, sex, age, and self-declared fitness level ber of scans does not allow tedious manual intervention to are noted. Likewise, a number of measures are captured with be performed on every scan. We have consequently opted a commercially available impedance spectroscopy body fat for a two stage process. First, skeleton-based deformation of scale, namely weight, body fat percentage, percentage of the template is used to estimate pose and rough size of the muscle tissue, water content, and bone weight. We also use scanned subject. Then a non-rigid deformation technique is a medical grade pulse oximeter to capture the oxygenation employed to fit the template to the scanned point cloud. This of the subjects’ hemoglobin and their pulse. The scans are is necessary and desirable since the stability of non-rigid reg- arranged in a database Ss,p , where s is the subject identifier istration increases significantly when the initial mesh config- and p the pose, containing the points and their respective uration is close to the point cloud that is to be matched. For normals as generated by the scanner. This database is avail- complex poses (cf. Fig. 2) direct non-rigid deformation con- able to the scientific public [Has08]. verges very slowly. In contrast, due to the limited number In order to create a unified model of all the captured data, of degrees of freedom, a skeleton based method converges the scans have to be brought into semantic correspondence. quickly even in extreme cases. Using skeleton fitting for a The simplest and most common way to achieve this is to fit first estimate also allows us to use the resulting data for pose a single template model to every scan (cf. [BV99, ACP03, regression. SMT04, ASK∗ 05]). The process depends only on a few manually placed cor- We create a symmetric template by triangulating a 3D respondences for each scan as the scans do not feature any scan, enforcing symmetry by hand and manually fitting a prescribed landmarks as present for example in the CAE- skeleton into the model. Skinning is performed using the SAR database [RDP99]. The template, an example of a la- technique by Baran and Popović [BP07]. The template t can beled scan, the skeleton based fitting, and the final registra- be parameterized by the parameter vector Y consisting of n tion result are shown in Figure 4. points Pt = pt1 , · · · , ptn and l triangles Tt = tt1 , · · · , ttl . 2.1. Scan Database 2.2. Skeleton Based Pose Estimation The project is based on a database of dense full body 3D The skeleton based fitting employs an approach commonly scans, captured using a Vitronic laser scanner. Of the 114 used in marker-less motion capture systems [BM98]. Any subjects aged 17 to 61, 59 are male and 55 female. In ad- rigid body motion can be modeled as a single rotation around dition to one standard pose that allows the creation of a a chosen axis followed by a suitable translation. Together, shape-only model, all subjects are scanned in at least 9 the transformation can be stored as a twist ξ with 6 degrees poses selected randomly from a set of 34 poses, see Fig. 3 of freedom. See Murray et al. [MSZ94] for mathematical for the scan distribution. Unlike SCAPE or more recently properties and a more in depth description. The deformation [ACPH06] we sample the space more densely. This allows of the template model is additionally governed by a kine- our model to capture pose-body-shape correlations more matic chain with k joints which arises from the embedded submitted to EUROGRAPHICS 2009.
4 N. Hasler, C. Stoll, M. Sunkel, B. Rosenhahn, and H.-P. Seidel / A Model of Pose and Body Shape plate vertices, but as a projection onto a plane fitted to a lo- cal patch of the target. Amberg et al. are forced to handle vertices bordering on holes in the target surface specially to avoid artefacts. This is not necessary if the inverse procedure is used and allows us to skip the border labeling step. To find this projection, we first find the closest mesh ver- tex pti for each target surface point psj . We discard matches where the angle between the respective normals are above a threshold εn = 30 ◦ or the distance between the points is big- ger than εd = 50 mm. We now go through all mesh vertices Figure 4: The registration pipeline. Left to right: The tem- pti and gather the set Pi of all points that were matched to plate model, the result after pose fitting, the result after non- it. We then use a least-squares procedure to fit a local plane rigid registration, transfer of captured surface details, and to them and project the point pti onto that plane, resulting in the original scan annotated with manually selected land- a target point p̃ti unless Pi contains fewer than 4 points. In marks are shown. the latter case the set is discarded and we assign the closest point of the surface psj as the target position p̃ti unless this point also fails the requirements given above. Our distance energy term can now simply be expressed as skeleton. Only simple revolute joints are considered, which can be parameterized by a single angle γi . By parameterizing 2 Ed (X) = ∑ kp̃ti − Ti pti k (3) the complete pose of a person as a vector Ξ = [ξ, γ1 . . . γk ] we can easily generate a linear system of constraint equations to The regularization term Es (X) in Amberg’s original paper optimize the pose. An ICP style optimisation scheme similar is expressed as the Frobenius-Norm of the transformation to Bregler et al. [BMP04] is used that generates up to three matrices of neighboring vertices in the mesh. This original constraint equations per point of the template surface. Ad- term does not take into account irregular sampling of the ditionally, the manually selected landmark coordinates are mesh, and thus may exhibit some artifacts. Additionally, in used to ensure global stability of the fitting process. The re- our case missing data needs to be extrapolated only in lo- sults from this step are used on the one hand as training data calized regions where we have holes in the scan. In our ex- to learn regression functions that aim to modify the pose of periments we therefore confine the second order differences a subject. On the other hand, the extracted pose can be used of the transformation matrices by applying a Laplacian con- to initialize the non-rigid registration technique described in straint with cotangent edge weights. This regularization term the following. tries to make changes in the transformation matrices over the mesh as smooth as possible, and not as similar as possible as originally proposed. The regularization term can be written 2.3. Non-Rigid Registration as The posed template from Section 2.2 is used as initialization Es (X) = ∑ k ∑ wi j (Ti − T j )k F (4) for a more detailed non-rigid registration step, that captures j the remaining details of the current scan. The procedure fol- lows the ideas presented in the work by Allen et al. [ACP03] where wi j are the cotangent Laplacian weights based on the and Amberg et al. [ARV07]. The registration is expressed as original template mesh configuration and kkF denotes the a set of 3 × 4 affine transformation matrices Ti associated Frobenius Norm. with each vertex pti of the posed template, which are orga- The final term El (X) is a simple landmark distance term: nized in a single 4n × 3 matrix 2 El (X) = ∑ kl̃ti − Ti pti k (5) X = [T . . . Tn ]> (1) We define the cost function E(X) of our deformation as a The general registration procedure follows the work of combination of three energy terms: Ed (X), which penalizes Amberg and coworkers. The energy term (2) can be written distance between template and target surface, Es (X), which as a linear system and solved in the least-squares sense for acts as a regularization term to generate a smooth deforma- a given configuration. This process is iterated tmax = 1500 tion, and finally El (X), which is a simple landmark distance times, during which we change the energy function weight- term, ing terms in the following way: β = k0 · eλt (6) E(X) = αEd (X) + βEs (X) + γEl (X). (2) where t is the number of the iteration, k0 = 3000, and Unlike Amberg et al., we do not express the distance term k0 using the closest point of the target surface from the tem- λ = ln /tmax (7) k∞ submitted to EUROGRAPHICS 2009.
N. Hasler, C. Stoll, M. Sunkel, B. Rosenhahn, and H.-P. Seidel / A Model of Pose and Body Shape 5 with k∞ = 0.01. Additionally, α can be kept constant at 1 and γ = kγ · β with kγ = 2. 3. Model Representation It is useful to encode the registered models in such a way that the relevant differences between a pair of scans can easily be extracted. For example, if the subject raises an arm between two considered scans, we want the representation of the hand of the person to be identical, although both position and ro- Figure 5: Due to the relative rotation encoding (RRE) di- tation of the hand relative to the main body have changed. rect linear interpolation of two scans (left and right) results A common method to achieve this is to embed a skeleton in realistic intermediate poses (middle right) whereas linear into the model and encode the surface relative to the skele- interpolation of vertex positions fails as can be seen, e.g., in ton [ASK∗ 05]. This method, however, is forced to resort to the subject’s degenerated right arm (middle left). interpolation when a given surface point cannot be assigned uniquely to a single bone of the skeleton. Additionally, the correlation between pose and body shape becomes harder to Kircher and Garland [KG08] have introduced a similar rep- capture. So in the following we describe an encoding that resentation for editing mesh animations. allows us to describe both pose and body shape in a unified way. As we show in Section 5.1, this non-linear transforma- Reconstructing a mesh from this encoding involves solv- tion allows us to work with linear functions for modifying ing two sparse linear systems. First we need to reconstruct pose or body shape without significant loss of accuracy. Ui = Ri Si . Then a Poisson reconstruction yields the com- plete mesh. Creating the per-triangle rotations Ri from the Translational invariance can easily be achieved by using relative rotations Ri, j requires solving a sparse linear sys- variational approaches, as for example introduced by Yu et tem, which can be created by reordering Equation (8). For al. [YZX∗ 04] or Sorkine et al. [SCOL∗ 04]. The mesh can every set of neighbouring triangles equations of the form be reconstructed given only the original connectivity and the triangle gradients by solving a sparse linear Poisson system. Ri, j · R j − Ri = 0 (9) We can also edit and modify the shape by ‘exploding’ it, are added to a sparse linear equation system. As long as the applying an arbitrary transformation to each triangle sepa- model is encoded and decoded without modification, the ro- rately. If we then solve the so modified linear system, we tations Ri we receive from this system are identical to the effectively stitch the triangles back together in such a way input up to floating point accuracy and a global rotation that the prescribed triangles transformations are maintained Rg . However, if any modification is applied to the encoded as good as possible. This fundamental idea has been used for model, the resulting matrices Ri are not necessarily pure ro- shape editing (as for example in [ZRKS05]), but also forms tation matrices but may contain scale or shear components. the basis of SCAPE [ASK∗ 05]. To improve the stability of the reconstruction we perform Unfortunately, this variational representation is not matrix ortho-normalization of the resulting Ri using singu- rotational-invariant, meaning that the same shape will be en- lar value decomposition. coded differently depending on its orientation. To remedy Now that a reversible procedure for encoding a model in this, we encode each triangle as a transformation Ui relative a locally rotation invariant way has been described, we can to a rest-pose triangle ti . This transformation can be split up think about how exactly the different components of the de- into a rotation Ri and a remaining stretching deformation scription are represented. The main requirement of the en- Si using polar-decomposition. The stretching deformation is coding is that linear interpolation leads to intact represen- by construction rotation-invariant, which means that we only tations. Shear matrices are already in a suitable format if need to construct a relative encoding for the rotation matri- only one half of the symmetric matrices are stored. Rotation ces Ri . This can be achieved by storing relative rotations be- matrices, however, are badly suited for direct interpolation. tween pairs of neighboring triangles, i.e. Evaluation of a number of different encodings leads directly Ri, j = Ri · R−1 (8) to rotation vectors because this representation allows easy j interpolation and unlike quaternions all possible combina- where i and j are neighboring triangles. So, for every triangle tions of values are valid and, in contrast to Euler Angles, the three relative rotations connecting it with its neighbors can encoding does not suffer from gimbal lock [Ple89]. Addi- be generated. This encoding may seem wasteful as three ro- tionally, in order to further linearise the encoding space, all tations are stored instead of just one but this redundancy sig- parameters are stored relative to the corresponding triangle nificantly improves the stability of the reconstruction when of the mean model, which is constructed by averaging all a deformation is applied to the encoded model. Recently, components of all models in the relative encoding. The re- submitted to EUROGRAPHICS 2009.
6 N. Hasler, C. Stoll, M. Sunkel, B. Rosenhahn, and H.-P. Seidel / A Model of Pose and Body Shape sulting final encoding has 15 degrees of freedom per triangle However, these components correspond to the eigenvec- (nine for rotation and six for in-plane deformation). tors with the smallest variation in the space of body shapes and poses and not the variation of l(). Consider, for ex- The advantage of this complex representation is that it is ample, a function that subtly changes the shape of a per- hard to generate inconsistent meshes and that many com- son’s kneecaps. This function primarily depends on certain mon deformations, namely scaling during shape morphing components that probably correspond to low eigenvalues be- and rotation during pose modification, are linear operations. cause they do not dramatically impact the whole body. So, This improves the quality of trained regression functions sig- when solving Equation (10) the coefficients of s for these nificantly and allows us to use linear regressors without vis- components are comparably high, whereas the main compo- ible loss of quality. As shown in Figure 5 it is even possible nents primarily concerned with changing height or weight to linearly interpolate between two poses of a subject and are low. obtain realistic results. Consequently, we propose to perform cross validation Unfortunately, high frequency information, as present for slightly differently. The order in which components are dis- example in the wrinkles of the pants the subjects are wear- carded is not determined by their variance in the space ing, is very hard to represent with our model. Subjects some- spanned by A but by their correlation with l() which is in- times adjust the fit of the pants between scans and the in- dicated by the magnitude of the initial components of s. A tra subject variance of wrinkles is even higher. So we opted comparison of cross validation accuracy for both methods to use a simple detail transfer procedure to re-add high fre- plus a non-linear Support Vector Regression based technique quency information after morphing, similar to displacement is shown in Table 1. The accuracy of our filtered approach is subdivision surfaces by Lee et al. [LMH00]. This step im- very close to the precision of the non-linear approach. proves firstly the accuracy of the estimated regression func- tions as noise is removed, secondly the efficiency since Obviously, morphing a subject d to conform to a semantic the computationally intensive steps operate on lower qual- constraint k just involves walking along the function’s gra- ity meshes, and thirdly the visual quality as high frequency dient d0 = d + v, where v = λs for a suitable λ. For several information is retained during morphing instead of getting semantic constraints, the solution is slightly more involved smoothed out. It works as follows: After fitting, the base since the gradients are not necessarily orthogonal. Basically mesh is subdivided using the simple mid-edge scheme and we want to find a new gradient z, such that projected onto the scan. The offsets of the subdivision ver- hz, vi i tices in normal direction of the base mesh are stored with = kvi k. (11) kvi k each triangle. During recall the mesh is subdivided again and Equation (11) is used to generate one row for every con- the stored offsets are added. straint vi . Solving the frequently underdeterminded system for z in the minimum norm sense yields a solution that 4. Regression changes the subject as little as possible. In Section 3 a model jointly encoding human pose and shape 4.1. Semantic Model Basis has been presented. In this section regression is introduced as a powerful tool to incorporate further information such It is interesting to rotate the original PCA basis such that as gender, height, or joint angle into the model to obtain a the first vectors correspond to semantically meaningful di- fully statistical and morphable model. Starting from a set of rections since this allows morphing a subject while keep- encoded scans A = [a1 . . . ai ] and a function l() discretised ing some constraints constant. For example, increasing body at the positions of the scans, attaching semantic values l = height normally results in additional weight. However, by [l1 . . . li ]> to every scan, we compute a gradient direction s keeping weight constant while increasing height results in and a corresponding offset o such that slimmer subjects. The Gram-Schmidt algorithm [GVL96] is used to span o arg min [1|A] −l (10) the subspace of the original PCA space such that it is or- [os] s thogonal to all given semantic morphing vectors. Then PCA This function can simply be computed in a least-squares is used to generate a basis for the remaining subspace. The sense. However, the achievable generalisation is limited, new basis and all morphing vectors now span the original since the function is severely overtrained. Much better gen- PCA space. Thereafter, all scans are transformed to the new eralization performance can be achieved if the number of base. The reconstruction of these models using only the sub- components is reduced. Cross validation provides a well es- space not spanned by semantic variables leads to a represen- tablished means to select the optimal number of components. tation in which all models are invariant to the semantic con- The classical approach discards components in the order the straint variables. The PCA of these reconstructions in com- PCA suggests, starting with the elements corresponding to bination with the semantic constraint vectors represents the the smallest eigenvalues. final basis. The desirable properties of this basis are that the submitted to EUROGRAPHICS 2009.
N. Hasler, C. Stoll, M. Sunkel, B. Rosenhahn, and H.-P. Seidel / A Model of Pose and Body Shape 7 Regressor Height Weight Body Fat Muscles Waist simple linear 3.68 3.03 3.38 5.63 2.78 filtered linear 1.43 1.33 2.02 2.41 0.945 non-linear 1.15 1.17 1.66 2.38 0.858 Table 1: Root mean squared errors estimated by 10-fold cross validation for different semantic variables and regres- sors on a model that contains every subject exactly once. Weight is measured in kg, Body Fat, and Muscles in %, and Height and Waist Girth in cm. Figure 6: The effects of applying the muscledness function. first vectors directly represent semantically meaningful gra- Left to right: Original, a selective mask was applied to in- dient directions and the remaining human body shape space crease only upper body muscles, full body muscle augmen- is spanned by a PCA basis. By applying Gram-Schmidt on tation, and an extremely muscular caricature. the semantic vectors as well and normalizing their lengths, we can enforce orthonormality of the transformation. A mix- ing matrix and scale factors can be used to directly specify Height Weight Body Fat Waist Girth semantically meaningful constraints. However, in most cases µ 2.04 1.47 2.78 2.00 this step is unnecessary because the semantic constraints are σ 0.688 0.520 0.853 0.715 held constant at specific values and solving of linear systems Table 2: Mean and standard deviation of angles (in degrees) is only performed on the remaining subspace. So minimum between morphing directions computed by the non-linear norm solutions behave as expected. model for all subjects. 5. Applications and Results In this section applications of our model are presented. First cannot easily be evaluated quantitatively. For example, in the foundation of our model is validated by quantitatively order to define a muscledness function the subjects in the evaluating the accuracy of different semantics based mor- database have to be labelled somehow. Yet, it is hard for a phing strategies. Then character modeling schemes are de- human judge to assign a number to the muscledness of a per- scribed and last our approach for animating characters based son. It is much easier to compare two given scans and decide on skeletal angles is introduced. on the more muscled subject. Each random pairing of scans defines a gradient direction. The judge only chooses the sign of the gradient towards greater muscularity. By first normal- 5.1. Semantic Morphing izing and then averaging the gradients, a general muscularity In line with research conducted by Allen et al. [ACP03] function can be generated. Results of applying the function and Seo and Magnenat-Thalmann [SMT04] we present body are shown in Figure 6. shape morphing driven by high level semantic variables. We Since morphing functions can operate directly on the rel- also conduct quantitative analysis of the accuracy of the ative encoding, it is trivial to constrain morphing to selected trained functions. In Table 1 mean squared errors generated body parts. A multiplicative mask allows deformation to oc- by 10-fold cross validation of different semantic functions cur in selected areas. The reconstruction process spreads out are presented. It is doubtful that changes within the range the error arising at the edge of the selected region evenly, of these error bounds would be perceptible. The data also preventing the development of steps in the surface. Figure 6 shows that, as a result of the non-linear relative rotation en- shows selective morphing of the upper body using the mus- coding, the achievable accuracy is only slightly better when cledness function. using non-linear rather than linear regression. This assump- tion is confirmed by the observations summarised in Table 2. Mean and standard deviation of the angles between morph- 5.2. Character Generation ing directions for selected functions computed for all sub- As shown recently, it is essential for the perception of the jects in the shape only model are shown. All of these val- diversity of a crowd that the body shapes of the characters ues are small indicating that morphing directions are highly differ significantly [MLD∗ 08]. It is consequently important collinear independent of the position in shape space. Appar- to have a simple method that allows the generation of di- ently, the relative rotation encoding linearises the space suf- verse body shapes. Yet, it may also be important to be able to ficiently so that it is now admissible to use a linear function tightly control a generated character’s body shape. Employ- to represent the changes. ing our model both objectives can be achieved by combining Unfortunately, many semantically meaningful functions two different approaches. submitted to EUROGRAPHICS 2009.
8 N. Hasler, C. Stoll, M. Sunkel, B. Rosenhahn, and H.-P. Seidel / A Model of Pose and Body Shape Figure 7: Several women were randomly generated using the semantic basis. We applied the constraints sex = female and weight = 65 kg. As expected the taller the woman the slimmer she is. Figure 8: An example of a randomly generated set of char- The PCA projects the largest variances of a dataset in acters. the first components while noise like features are displaced to the last components. This is a most welcome feature for many applications. For the purpose of generating a random jectories is a simple option. In this section a different ap- character that exhibits a unique physique while creating a proach is introduced. By adding moveable handles to the natural, human look, a PCA based shape-only model is the body model a very intuitive way of changing body shape tool of choice. We use a model consisting only of scans of is established. This avenue is opened by our use of Pois- subjects in the resting pose. The model that contains all scans son reconstruction in the last step which allows the inclusion cannot be used for this purpose since it contains pose depen- of additional positional constraints. The deformation that is dent components in the first PCA vectors. So, a randomly required to conform to the constraints is distributed evenly. generated character with the limited model would exhibit In fact, mesh editing has been performed using this tech- not just changes in shape but also in pose. Yet, for animation nique [YZX∗ 04]. However, we want the constraints to influ- the generated characters can be plugged directly into the full ence the body shape realistically instead of just deforming model as described in Section 5.4. the initial shape in the least squares sense because this in- The advantage of the PCA based technique is that the evitably leads to unrealistic distortions. This, however, can diversity of the generated characters is very high. On the easily be mended by projecting the candidate body model downside, control over the type of generated character is into the PCA space spanned by the body shape database fairly low. Neither gender nor body weight or height can since primarily valid body models can be represented in this easily be controlled as the PCA vectors do not, unlike often space. It may thus not be possible to represent the proposed assumed, directly pertain to a single semantically relevant model in shape-space and constraints are not met exactly. measure. We can, however, take a given starting point, gen- So, we iterate between deforming the model and project- erated e.g. with the above technique and apply the morphing ing it back into the space of body shapes. After about 10 described in Section 4 to enforce a given set of constraints. iterations the system converges. Still, the method acts as a Unfortunately, this approach, although workable, may pro- very strong regularizer, so that constraints are frequently not duce suboptimal results since the morphed distance may be met exactly and it may be necessary to exaggerate them to large, introducing artifacts on the way. achieve a desired effect. A simple editing session is shown in Figure 9. It is much more efficient to use the semantic basis intro- duced in Section 4.1. This technique allows us to specify se- mantic constraints such as height between 1.70 m and 1.90 m 5.4. Animation and sex between 0.9 and 1.1 male and allow the system to fill in the details. Ultimately, this approach is used to generate Every scan’s pose is estimated during registration. Thus we the models displayed in Figure 7 whereas Figure 8 shows a can easily train functions that each change a specific degree crowd that was generated without constraints. of freedom of the pose. That way, we can morph any scan into a specified pose. Since animations are frequently pa- rameterised by a set of joint angles for every frame, we can 5.3. Handle Based Body Shape Modeling simply morph a given model to conform to these constraints to animate it. This works well for most joints and poses. Additional adjustments may be deemed necessary by the responsible artist. Obviously morphing along semantic tra- However, improvements are possible if two minor issues submitted to EUROGRAPHICS 2009.
N. Hasler, C. Stoll, M. Sunkel, B. Rosenhahn, and H.-P. Seidel / A Model of Pose and Body Shape 9 Figure 9: A handle based interface for editing body shapes. The red markers are held in place while the yellow arrows Figure 10: By adding positional constraints during recon- show where attached markers are moved. First the height is struction of deformed models it is possible to ameliorate ac- increased, then hip width is decreased, and last the crotch is cumulated pose errors. Here the effect of using positional raised. constraints is demonstrated. Left: No constraints, Middle: Constraints, Right: Reference Pose. are addressed. Some functions’ areas of influence are not lo- calised to the expected area but also include areas on the mirrored side of the body. This is a result of the choice of scanned poses. In most poses arm movements are symmet- ric. Fortunately, the effect can be compensated for by com- puting some functions, namely all functions concerning arm movement, only on one side of the body. Similarly, it proved beneficial to split PCA vectors into left and right halfs dur- ing reprojection of candidate models to assist independent motion of left and right arms. Furthermore, absolute positioning accuracy of end effec- tors can be improved by following Wang et al. [WPP07] who propose to add positional constraints to the tips of limbs dur- ing the poisson reconstruction step. This step also serves to Figure 11: Animation result. The subject on the right is correct correlations that were wrongly learned. For exam- crossing his legs. This is significant as no subject in the ple, in one of the poses the subjects stand on one leg (cf. database has been scanned in a similar pose. Fig. 2). In order to keep balance and not to move for the 10 s it takes to perform the scan a very unrelaxed upper body posture was commonly adopted by the subjects. This has led to undesired correlations. Also note that these artifacts can- not be prevented unless fast 3D-scanning of moving subjects is performed, which is not available today at a comparable accuracy. A side-by-side comparison of using end effector constraints vs. not using them is shown in Figure 10. Fig- ure 11 shows several frames from an animation. In one of the frames the subject crosses his legs. This is significant as no pose of a scan in the database is even close to the dis- played pose. A subject is morphed into a pose that is not in the database and compared to a scan in a similar pose in Figure 12: left: Morphing a scan into a pose that is not Fig. 12. Additionally, a person who is not in the database is in the database in comparison with the original scan. The represented by our model. hands are not turned into the same direction which results in the main difference between the two. Middle: The scan of a person who is not in the database is projected into the space 6. Conclusions and Future Work of body shapes. Right: The subject on the left is morphed We describe a model of human pose and body shape. A ro- into the spear-thrower’s pose. tation invariant encoding of the scan database allows us to submitted to EUROGRAPHICS 2009.
10 N. Hasler, C. Stoll, M. Sunkel, B. Rosenhahn, and H.-P. Seidel / A Model of Pose and Body Shape train semantic regression functions. These in turn can be [BV99] B LANZ V., V ETTER T.: A morphable model for the syn- used to generate and animate human avatars. Since pose and thesis of 3d faces. In ACM SIGGRAPH Papers (New York, NY, USA, 1999), ACM Press, pp. 187–194. body shape are stored in a single model, correlations be- tween them are automatically exploited to induce realistic [DCKY02] D ONG F., C LAPWORTHY G. J., K ROKOS M. A., YAO J.: An anatomy-based approach to human muscle modeling muscle bulging and fat deformation during animation. and deformation. IEEE Trans. on Vis. and Comp. Graphics 8, 2 Comparison of linear and non-linear regression shows that (2002), 154–170. the numerical cross-validation errors are slightly lower for [GVL96] G OLUB G. H., VAN L OAN C. F.: Matrix Computations non-linear regression but the visual quality is equivalent. So (Johns Hopkins Studies in Mathematical Sciences). The Johns Hopkins University Press, Oct. 1996. for display oriented applications linear regression is suffi- cient. [Has08] H ASLER N.: http://www.mpi-inf.mpg.de/ resources/scandb/, Dec. 2008. A designer can perform semantic morphing either on the [KG08] K IRCHER S., G ARLAND M.: Free-form motion process- whole body or by applying a simple mask to selected body ing. ACM Trans. on Graphics 27, 2 (2008), 1–13. parts. This allows the fine grained generation of realistic hu- [LMH00] L EE A., M ORETON H., H OPPE H.: Displaced subdivi- man models. Additionally, a handle based body shape edit- sion surfaces. In ACM SIGGRAPH Papers (New York, NY, USA, ing paradigm provides the artist with an intuitive tool that 2000), ACM Press, pp. 85–94. seamlessly integrates with semantic constraint based editing. [MLD∗ 08] M C D ONNELL R., L ARKIN M., D OBBYN S., C OLLINS S., O’S ULLIVAN C.: Clone attack! perception of As most of our system is implemented in Matlab the per- crowd variety. ACM Trans. on Graphics 27, 3 (2008), 1–8. formance is not optimal. Reconstructing one model takes ap- [MSZ94] M URRAY R. M., S ASTRY S. S., Z EXIANG L.: A Math- prox. 20 s on a recent machine. However, a more efficient ematical Introduction to Robotic Manipulation. CRC Press, Inc., implementation and possibly a hierarchical multi-resolution Boca Raton, FL, USA, 1994. approach could improve the performance significantly. [Ple89] P LETINCKX D.: Quaternion calculus as a basic tool in computer graphics. The Visual Computer 5, 1 (Jan. 1989), 2–13. It may also be interesting to estimate body shapes from images as this would open a number of applications for ex- [RDP99] ROBINETTE K., DAANEN H., PAQUET E.: The caesar project: a 3-d surface anthropometry survey. In Proc. 3-D Digital ample in the human motion capture field or in biometric Imaging and Modeling (1999), pp. 380–386. identification. [SCOL∗ 04] S ORKINE O., C OHEN -O R D., L IPMAN Y., A LEXA M., R ÖSSL C., S EIDEL H.-P.: Laplacian surface editing. In Acknowledgements Proc. Symposium on Geometry Processing (New York, NY, USA, 2004), ACM Press, pp. 175–184. We would like to thank Michael Wand and the anonymous [SMT04] S EO H., M AGNENAT-T HALMANN N.: An example- reviewers for valuable feedback and all subjects who volun- based approach to human body manipulation. Graphical Models teered to get scanned. 66, 1 (January 2004), 1–23. [SPCM97] S CHEEPERS F., PARENT R. E., C ARLSON W. E., M AY S. F.: Anatomy-based modeling of the human muscula- References ture. In ACM SIGGRAPH Papers (New York, NY, USA, 1997), [ACP03] A LLEN B., C URLESS B., P OPOVI Ć Z.: The space of ACM Press, pp. 163–172. human body shapes: reconstruction and parameterization from [SSSB07] S CHERBAUM K., S UNKEL M., S EIDEL H.-P., B LANZ range scans. ACM Trans. on Graphics 22, 3 (2003), 587–594. V.: Prediction of individual non-linear aging trajectories of faces. [ACPH06] A LLEN B., C URLESS B., P OPOVI Ć Z., H ERTZMANN In Proc. Eurographics (Prague, Czech Republic, 2007), vol. 26 of A.: Learning a correlated model of identity and pose-dependent Computer Graphics Forum, Blackwell, pp. 285–294. body shape variation for real-time synthesis. In Proc. SCA [WPP07] WANG R. Y., P ULLI K., P OPOVI Ć J.: Real-time en- (2006), pp. 147–156. veloping with rotational regression. In ACM SIGGRAPH Papers [ARV07] A MBERG B., ROMDHANI S., V ETTER T.: Optimal (New York, NY, USA, 2007), ACM Press, p. 73. step nonrigid icp algorithms for surface registration. Proc. CVPR [WSLG07] W EBER O., S ORKINE O., L IPMAN Y., G OTSMAN (June 2007), 1–8. C.: Context-aware skeletal shape deformation. Computer Graph- [ASK∗ 05] A NGUELOV D., S RINIVASAN P., KOLLER D., ics Forum 26, 3 (Sept. 2007), 265–274. T HRUN S., RODGERS J., DAVIS J.: Scape: shape completion [YZX∗ 04] Y U Y., Z HOU K., X U D., S HI X., BAO H., G UO B., and animation of people. ACM Trans. on Graphics 24, 3 (2005), S HUM H.-Y.: Mesh editing with poisson-based gradient field 408–416. manipulation. In ACM SIGGRAPH Papers (New York, NY, USA, [BM98] B REGLER C., M ALIK J.: Tracking people with twists 2004), ACM Press, pp. 644–651. and exponential maps. In Proc. CVPR (Washington, DC, USA, [ZRKS05] Z AYER R., R ÖSSL C., K ARNI Z., S EIDEL H.-P.: 1998), IEEE Computer Society, p. 8. Harmonic guidance for surface deformation. In Proc. Eurograph- [BMP04] B REGLER C., M ALIK J., P ULLEN K.: Twist based ac- ics (Dublin, Ireland, 2005), vol. 24 of Computer Graphics Forum, quisition and tracking of animal and human kinematics. Interna- Eurographics, Blackwell, pp. 601–609. tional Journal of Computer Vision 56, 3 (2004), 179–194. [BP07] BARAN I., P OPOVI Ć J.: Automatic rigging and animation of 3d characters. ACM Trans. on Graphics 26, 3 (2007), 72. submitted to EUROGRAPHICS 2009.
You can also read