//
//  IMPORTANT: READ BEFORE DOWNLOADING, COPYING, INSTALLING OR USING.
//
//  By downloading, copying, installing or using the software you agree to this license.
//  If you do not agree to this license, do not download, install,
//  copy or use the software.
//
//
//                          License Agreement
//                For Open Source Computer Vision Library
//
// Copyright (C) 2014, OpenCV Foundation, all rights reserved.
// Third party copyrights are property of their respective owners.
//
// Redistribution and use in source and binary forms, with or without modification,
// are permitted provided that the following conditions are met:
//
//   * Redistribution's of source code must retain the above copyright notice,
//     this list of conditions and the following disclaimer.
//
//   * Redistribution's in binary form must reproduce the above copyright notice,
//     this list of conditions
//     and the following disclaimer in the documentation
//     and/or other materials provided with the distribution.
//
//   * The name of the copyright holders may not be used to endorse or promote products
//     derived from this software without specific prior written permission.
//
// This software is provided by the copyright holders and contributors "as is" and
// any express or implied warranties, including, but not limited to, the implied
// warranties of merchantability and fitness for a particular purpose are disclaimed.
// In no event shall the Intel Corporation or contributors be liable for any direct,
// indirect, incidental, special, exemplary, or consequential damages
// (including, but not limited to, procurement of substitute goods or services;
// loss of use, data, or profits; or business interruption) however caused
// and on any theory of liability, whether in contract, strict liability,
// or tort (including negligence or otherwise) arising in any way out of
// the use of this software, even if advised of the possibility of such damage.

#ifndef __OPENCV_SURFACE_MATCHING_HPP__
#define __OPENCV_SURFACE_MATCHING_HPP__

#include "surface_matching/ppf_match_3d.hpp"
#include "surface_matching/icp.hpp"

/** @defgroup surface_matching Surface Matching

Note about the License and Patents
-----------------------------------

The following patents have been issued for methods embodied in this
software: "Recognition and pose determination of 3D objects in 3D scenes
using geometric point pair descriptors and the generalized Hough
Transform", Bertram Heinrich Drost, Markus Ulrich, EP Patent 2385483
(Nov. 21, 2012), assignee: MVTec Software GmbH, 81675 Muenchen
(Germany); "Recognition and pose determination of 3D objects in 3D
scenes", Bertram Heinrich Drost, Markus Ulrich, US Patent 8830229 (Sept.
9, 2014), assignee: MVTec Software GmbH, 81675 Muenchen (Germany).
Further patents are pending.
For further details, contact MVTec Software
GmbH (info@mvtec.com).

Note that restrictions imposed by these patents (and possibly others)
exist independently of and may be in conflict with the freedoms granted
in this license, which refers to copyright of the program, not patents
for any methods that it implements.  Both copyright and patent law must
be obeyed to legally use and redistribute this program and it is not the
purpose of this license to induce you to infringe any patents or other
property right claims or to contest validity of any such claims.  If you
redistribute or use the program, then this license merely protects you
from committing copyright infringement.  It does not protect you from
committing patent infringement.  So, before you do anything with this
program, make sure that you have permission to do so not merely in terms
of copyright, but also in terms of patent law.

Please note that this license is not to be understood as a guarantee
either.  If you use the program according to this license, but in
conflict with patent law, it does not mean that the licensor will refund
you for any losses that you incur if you are sued for your patent
infringement.

Introduction to Surface Matching
--------------------------------

Cameras and similar devices capable of sensing 3D structure are becoming more common. Thus, using
depth and intensity information for matching 3D objects (or parts) is of crucial importance for
computer vision. Applications range from industrial control to guiding everyday actions for visually
impaired people. The task of recognition and pose estimation in range images is to identify and
localize a queried 3D free-form object by matching it to the acquired database.

From an industrial perspective, enabling robots to automatically locate and pick up randomly placed
and oriented objects from a bin is an important challenge in factory automation, replacing tedious
and heavy manual labor.
A system should be able to recognize and locate objects with a predefined shape and estimate their
position with the precision necessary for a gripping robot to pick them up. This is where vision
guided robotics takes the stage. Similar tools are also capable of guiding robots (and even people)
through unstructured environments, leading to automated navigation. These properties make 3D
matching from point clouds a ubiquitous necessity. Within this context, I will now describe the
OpenCV implementation of a 3D object recognition and pose estimation algorithm using 3D features.

Surface Matching Algorithm Through 3D Features
----------------------------------------------

The algorithm used for 3D matching is heavily based on @cite drost2010, which is one of the first
and main practical methods presented in this area. The approach consists of extracting 3D feature
points randomly from depth images or generic point clouds, indexing them, and later querying them
efficiently at runtime. Only the 3D structure is considered, and a simple hash table is used for
feature queries.

While I am fully aware that the structure of a CAD model could be exploited for smarter point
sampling, I leave that aside for now in order to keep the method general (typically, such
algorithms do not need to be trained on a CAD model; a point cloud is sufficient). Below is the
outline of the entire algorithm:

As explained, the algorithm relies on the extraction and indexing of point pair features, which are
defined as follows:

\f[\bf{{F}}(\bf{{m1}}, \bf{{m2}}) = (||\bf{{d}}||_2, \angle(\bf{{n1}},\bf{{d}}), \angle(\bf{{n2}},\bf{{d}}), \angle(\bf{{n1}},\bf{{n2}}))\f]

where \f$\bf{{m1}}\f$ and \f$\bf{{m2}}\f$ are two selected feature points on the model (or scene),
\f$\bf{{d}} = \bf{{m2}} - \bf{{m1}}\f$ is the difference vector, and \f$\bf{{n1}}\f$ and
\f$\bf{{n2}}\f$ are the normals at \f$\bf{{m1}}\f$ and \f$\bf{m2}\f$. During the training stage,
this vector is quantized and indexed.
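
To make the definition concrete, here is a minimal, self-contained sketch of the feature
computation for one pair of oriented points. It is plain C++ and not the module's internal code:
the names `Vec3`, `dot3`, `cross3`, `norm3`, `angleBetween` and `computePPF` are hypothetical
helpers introduced only for illustration. Angles are evaluated with the two-argument inverse
tangent of the cross-product norm and the dot product, which returns values in \f$[0, \pi]\f$.

```cpp
#include <array>
#include <cmath>

using Vec3 = std::array<double, 3>;

// Dot product of two 3D vectors.
static double dot3(const Vec3& a, const Vec3& b)
{
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
}

// Cross product of two 3D vectors.
static Vec3 cross3(const Vec3& a, const Vec3& b)
{
    return Vec3{a[1] * b[2] - a[2] * b[1],
                a[2] * b[0] - a[0] * b[2],
                a[0] * b[1] - a[1] * b[0]};
}

// Euclidean norm of a 3D vector.
static double norm3(const Vec3& a)
{
    return std::sqrt(dot3(a, a));
}

// Angle between two vectors in [0, pi], using atan2 instead of acos.
static double angleBetween(const Vec3& a, const Vec3& b)
{
    return std::atan2(norm3(cross3(a, b)), dot3(a, b));
}

// F(m1, m2) = (||d||, angle(n1, d), angle(n2, d), angle(n1, n2)) with d = m2 - m1.
static std::array<double, 4> computePPF(const Vec3& m1, const Vec3& n1,
                                        const Vec3& m2, const Vec3& n2)
{
    const Vec3 d = Vec3{m2[0] - m1[0], m2[1] - m1[1], m2[2] - m1[2]};
    return std::array<double, 4>{norm3(d),
                                 angleBetween(n1, d),
                                 angleBetween(n2, d),
                                 angleBetween(n1, n2)};
}
```

For two points one unit apart along the x-axis, both with normals along z, this yields a distance
of 1, two right angles and a zero angle between the normals.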
In the test stage, the same features are extracted from the scene and compared to the database.
With a few tricks, such as separating the rotational components, the pose estimation part can also
be made efficient (check the reference for more details). A Hough-like voting and clustering scheme
is employed to estimate the object pose. To cluster the poses, the raw pose hypotheses are sorted
in decreasing order of the number of votes. Starting from the hypothesis with the most votes, a new
cluster is created. If the next pose hypothesis is close to one of the existing clusters, the
hypothesis is added to that cluster and the cluster center is updated as the average of the pose
hypotheses within the cluster. If the next hypothesis is not close to any of the clusters, it
creates a new cluster. The proximity test uses fixed thresholds in translation and rotation.
Distance computation and averaging for translation are performed in 3D Euclidean space, while those
for rotation are performed using the quaternion representation. After clustering, the clusters are
sorted in decreasing order of their total number of votes, which determines the confidence of the
estimated poses.

This pose is further refined using \f$ICP\f$ in order to obtain the final pose.

The PPF presented above depends largely on robust computation of angles between 3D vectors.
Although this is not reported in the paper, the naive way of doing this
(\f$\theta = \cos^{-1}({\bf{a}}\cdot{\bf{b}})\f$) is numerically unstable.
A better way is to use the two-argument inverse tangent (atan2):

\f[\angle(\bf{n1},\bf{n2})=\tan^{-1}(||{\bf{n1}  \wedge \bf{n2}}||_2, \bf{n1} \cdot \bf{n2})\f]

Rough Computation of Object Pose Given PPF
------------------------------------------

Let me summarize the notation used below:

-   \f$p^i_m\f$: \f$i^{th}\f$ point of the model (\f$p^j_m\f$ accordingly)
-   \f$n^i_m\f$: Normal of the \f$i^{th}\f$ point of the model (\f$n^j_m\f$ accordingly)
-   \f$p^i_s\f$: \f$i^{th}\f$ point of the scene (\f$p^j_s\f$ accordingly)
-   \f$n^i_s\f$: Normal of the \f$i^{th}\f$ point of the scene (\f$n^j_s\f$ accordingly)
-   \f$T_{m\rightarrow g}\f$: The transformation required to translate \f$p^i_m\f$ to the origin and rotate
    its normal \f$n^i_m\f$ onto the \f$x\f$-axis.
-   \f$R_{m\rightarrow g}\f$: Rotational component of \f$T_{m\rightarrow g}\f$.
-   \f$t_{m\rightarrow g}\f$: Translational component of \f$T_{m\rightarrow g}\f$.
-   \f$(p^i_m)^{'}\f$: \f$i^{th}\f$ point of the model transformed by \f$T_{m\rightarrow g}\f$. (\f$(p^j_m)^{'}\f$
    accordingly).
-   \f${\bf{R_{m\rightarrow g}}}\f$: Axis angle representation of rotation \f$R_{m\rightarrow g}\f$.
-   \f$\theta_{m\rightarrow g}\f$: The angular component of the axis angle representation
    \f${\bf{R_{m\rightarrow g}}}\f$.

The transformation in a point pair feature is computed by first finding the transformation
\f$T_{m\rightarrow g}\f$ from the first point, and applying the same transformation to the second one.
Transforming each point, together with its normal, to the ground plane leaves us with a single angle
to determine when comparing with a new point pair.

We can now simply write

\f[(p^i_m)^{'} = T_{m\rightarrow g} p^i_m\f]

where

\f[T_{m\rightarrow g} = -t_{m\rightarrow g}R_{m\rightarrow g}\f]

Note that this is nothing but a stacked transformation.
The translational component \f$t_{m\rightarrow g}\f$ reads

\f[t_{m\rightarrow g} = -R_{m\rightarrow g}p^i_m\f]

and the rotational component is given by

\f[\theta_{m\rightarrow g} = \cos^{-1}(n^i_m \cdot {\bf{x}})\\
 {\bf{R_{m\rightarrow g}}} = n^i_m \wedge {\bf{x}}\f]

in axis-angle format. Note that bold refers to the vector form. After this transformation, the
feature vectors of the model are registered onto the ground plane X and the angle with respect to
\f$x=0\f$ is called \f$\alpha_m\f$. Similarly, for the scene, it is called \f$\alpha_s\f$.

### Hough-like Voting Scheme

As shown in the outline, PPFs (point pair features) are extracted from the model, quantized, stored
in the hash table and indexed during the training stage. At runtime, a similar operation is
performed on the input scene, except that this time a similarity lookup over the hash table is
performed instead of an insertion. This lookup also allows us to compute a transformation to the
ground plane for the scene pairs. After this point, computing the rotational component of the pose
reduces to computing the difference \f$\alpha=\alpha_m-\alpha_s\f$. This component carries the cue
about the object pose. A Hough-like voting scheme is performed over the local model coordinate
vector and \f$\alpha\f$. For every scene point, the pose that collects the most votes lets us
recover the object pose.

### Source Code for PPF Matching

~~~{cpp}
// Assumes #include <opencv2/surface_matching.hpp> and
// using namespace std, cv and cv::ppf_match_3d.
// pc is the loaded point cloud of the model (Nx6) and
// pcTest is a loaded point cloud of the scene (Mx6)

// Arguments: relative sampling step, relative distance step
ppf_match_3d::PPF3DDetector detector(0.03, 0.05);
detector.trainModel(pc);

vector<Pose3DPtr> results;
// Arguments: scene, output poses, relative scene sample step, relative scene distance
detector.match(pcTest, results, 1.0/10.0, 0.05);

cout << "Poses: " << endl;
// print the poses
for (size_t i=0; i<results.size(); i++)
{
    Pose3DPtr pose = results[i];
    cout << "Pose Result " << i << endl;
    pose->printPose();
}
~~~

Pose Registration via ICP
-------------------------

The matching process terminates with the attainment of the pose.
However, due to multiple matching points, erroneous hypotheses, pose averaging, etc., such a pose
is very sensitive to noise and is often far from perfect. Although the visual results obtained in
that stage are pleasing, the quantitative evaluation shows \f$\sim 10\f$ degrees of variation
(error), which is an acceptable level of matching. Often, however, the requirement is set well
beyond this margin and the computed pose needs to be refined.

Furthermore, in typical RGBD scenes and point clouds, the 3D structure captures less than half of
the model due to the limited visibility in the scene. Therefore, a robust pose refinement algorithm
that can register occluded and partially visible shapes quickly and correctly is not an unrealistic
wish.

At this point, a trivial option would be to use the well known iterative closest point algorithm.
However, the basic ICP suffers from slow convergence, bad registration, outlier sensitivity and
failure to register partial shapes. Thus, it is definitely not suited to the problem. For this
reason, many variants have been proposed. Different variants contribute to different stages of the
pose estimation process.

ICP is composed of \f$6\f$ stages, and the improvements I propose for each stage are summarized
below.

### Sampling

To improve convergence speed and computation time, it is common to use fewer points than the model
actually has. However, sampling the correct points to register is an issue in itself. The naive way
would be to sample uniformly and hope to get a reasonable subset. Smarter ways try to identify the
critical points, which are found to contribute highly to the registration process. Gelfand et al.
exploit the covariance matrix in order to constrain the eigenspace, so that a set of points which
affect both translation and rotation are used.
This is a clever way of subsampling, which I will optionally be using in the implementation.

### Correspondence Search

As the name implies, this step assigns the points in the data to the points of the model in a
closest point fashion. Correct assignments lead to a correct pose, whereas wrong assignments
strongly degrade the result. In general, KD-trees are used to speed up the nearest neighbor search.
However, this does not guarantee optimality and often causes wrong points to be matched. Luckily,
the assignments are corrected over the iterations.

To overcome some of the limitations, Picky ICP @cite pickyicp and BC-ICP (ICP using bi-unique
correspondences) are two well-known methods. Picky ICP first finds the correspondences in the
old-fashioned way and then, among the resulting corresponding pairs, if more than one scene point
\f$p_i\f$ is assigned to the same model point \f$m_j\f$, it selects the \f$p_i\f$ that corresponds
to the minimum distance. BC-ICP, on the other hand, allows multiple correspondences first and then
resolves the assignments by establishing bi-unique correspondences. It also defines a novel
no-correspondence outlier, which intrinsically eases the process of identifying outliers.

For reference, both methods are used. Because P-ICP is a bit faster, with a not-so-significant
performance drawback, it will be the method of choice in the refinement of correspondences.

### Weighting of Pairs

In my implementation, I currently do not use a weighting scheme. The common approaches involve
*normal compatibility* (\f$w_i=n^1_i\cdot n^2_j\f$) or assigning lower weights to point pairs with
greater distances (\f$w=1-\frac{||dist(m_i,s_i)||_2}{dist_{max}}\f$).

### Rejection of Pairs

The rejections are done using dynamic thresholding based on a robust estimate of the standard
deviation. In other words, in each iteration, I compute the MAD (median absolute deviation) estimate
of the standard deviation, denoted \f$mad_i\f$, and reject the pairs with distances
\f$d_i>\tau mad_i\f$.
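
The rejection rule can be sketched in a few lines of standalone C++. This is only an illustration
under stated assumptions, not the module's code: `medianOf` and `rejectByMAD` are hypothetical
names, and the factor 1.4826 (which turns the MAD into a standard deviation estimate for Gaussian
data) is an assumption about how the estimate is scaled. The rule itself is applied exactly as
written above, keeping a pair when \f$d_i \le \tau mad_i\f$.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Median of the values; works on a copy so the input stays untouched.
static double medianOf(std::vector<double> v)
{
    const std::size_t mid = v.size() / 2;
    std::nth_element(v.begin(), v.begin() + mid, v.end());
    double m = v[mid];
    if (v.size() % 2 == 0)
    {
        // Even count: average the two middle elements.
        const double lower = *std::max_element(v.begin(), v.begin() + mid);
        m = 0.5 * (m + lower);
    }
    return m;
}

// Returns a mask that is true for the pairs to keep, i.e. dist[i] <= tau * sigma,
// where sigma is the MAD-based robust estimate of the standard deviation.
static std::vector<bool> rejectByMAD(const std::vector<double>& dist, double tau)
{
    std::vector<bool> keep(dist.size(), false);
    if (dist.empty())
        return keep;

    const double med = medianOf(dist);
    std::vector<double> deviation(dist.size());
    for (std::size_t i = 0; i < dist.size(); ++i)
        deviation[i] = std::fabs(dist[i] - med);

    // Assumed scaling: 1.4826 * MAD approximates the standard deviation.
    const double sigma = 1.4826 * medianOf(deviation);

    for (std::size_t i = 0; i < dist.size(); ++i)
        keep[i] = dist[i] <= tau * sigma;
    return keep;
}
```

Because the threshold is recomputed from the current residuals, it tightens automatically as the
registration converges.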
Here \f$\tau\f$ is the rejection threshold, set to \f$3\f$ by default. The weighting is applied
prior to the Picky refinement explained in the previous stage.

### Error Metric

As in @cite koklimlow, a linearization of the point-to-plane error metric is used. This both speeds
up the registration process and improves convergence.

### Minimization

Even though many non-linear optimizers (such as Levenberg-Marquardt) have been proposed, due to the
linearization in the previous step, pose estimation reduces to solving a linear system of equations.
This is exactly what I do, using cv::solve with the DECOMP_SVD option.

### ICP Algorithm

Having described the steps above, here I summarize the layout of the ICP algorithm.

#### Efficient ICP Through Point Cloud Pyramids

While the variants proposed so far deal well with some outliers and bad initializations, they
require a significant number of iterations. A multi-resolution scheme can help reduce the number of
iterations by allowing the registration to start from a coarse level and propagate to the lower and
finer levels. Such an approach improves both the registration performance and the runtime.

The search is done through multiple levels, in a hierarchical fashion. The registration starts with
a very coarse set of samples of the model. Iteratively, the points are densified and sought.
After each iteration, the previously estimated pose is used as an initial pose and refined with the
ICP.

#### Visual Results

##### Results on Synthetic Data

In all of the results, the pose is initiated by PPF and the rest is left as:
\f$[\theta_x, \theta_y, \theta_z, t_x, t_y, t_z]=[0]\f$

### Source Code for Pose Refinement Using ICP

~~~{cpp}
// Arguments: iterations, tolerance, rejection scale, number of pyramid levels
ICP icp(200, 0.001f, 2.5f, 8);
// Using the previously declared pc and pcTest
// This will perform registration for every pose
// contained in results
icp.registerModelToScene(pc, pcTest, results);
// results now contain the refined poses
~~~

Results
-------

This section is dedicated to the results of surface matching (point-pair-feature matching followed
by ICP refinement):

Matches of different models for the Mian dataset are presented below:

You might check out the video on [youTube here](http://www.youtube.com/watch?v=uFnqLFznuZU).

A Complete Sample
-----------------

### Parameter Tuning

The surface matching module treats its parameters relative to the model diameter (diameter of the
axis parallel bounding box) whenever it can. This makes the parameters independent of the model
size. This is why both the model and the scene cloud are subsampled such that all points have a
minimum distance of \f$RelativeSamplingStep*DimensionRange\f$, where \f$DimensionRange\f$ is the
distance along a given dimension. All three dimensions are sampled in a similar manner. For example,
if \f$RelativeSamplingStep\f$ is set to 0.05 and the diameter of the model is 1m (1000mm), the
points sampled from the object's surface will be approximately 50 mm apart. From another point of
view, if \f$RelativeSamplingStep\f$ is set to 0.05, at most \f$20x20x20 = 8000\f$ model points are
generated (depending on how the model fills in the volume). Consequently, this results in at most
8000x8000 pairs. In practice, because the models are not uniformly distributed over a rectangular
prism, far fewer points are to be expected.
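
The arithmetic above can be reproduced with a short standalone sketch. The function names are
hypothetical and the bounds are the worst-case figures from the text (a model that fills its whole
bounding box), not the counts the module actually produces.

```cpp
#include <cmath>
#include <cstdint>

// Upper bound on the number of samples along one dimension: 1 / step, rounded.
static std::uint64_t maxSamplesPerDimension(double relativeSamplingStep)
{
    return static_cast<std::uint64_t>(std::llround(1.0 / relativeSamplingStep));
}

// Upper bound on the number of model points: one sample per cell of the 3D grid.
static std::uint64_t maxModelPoints(double relativeSamplingStep)
{
    const std::uint64_t n = maxSamplesPerDimension(relativeSamplingStep);
    return n * n * n;
}

// Upper bound on the number of point pairs, which grows quadratically with the points.
static std::uint64_t maxPointPairs(double relativeSamplingStep)
{
    const std::uint64_t points = maxModelPoints(relativeSamplingStep);
    return points * points;
}

// Approximate spacing between samples along a dimension of the given extent.
static double sampleSpacing(double relativeSamplingStep, double dimensionRange)
{
    return relativeSamplingStep * dimensionRange;
}
```

With a step of 0.05 these give 20 samples per dimension, 8000 points and 64 million pairs, and a
spacing of about 50 mm for a 1000 mm extent.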
Decreasing this value results in more model points and thus a more accurate representation. However,
note that the number of point pair features to be computed then increases quadratically, as the
complexity is \f$O(N^2)\f$. This is especially a concern for 32 bit systems, where large models can
easily exceed the available memory. Typically, values in the range of 0.025 - 0.05 seem adequate
for most applications, and the default value is 0.03. (Note that this parameter differs from the
one presented in @cite drost2010. In @cite drost2010 a uniform cuboid is used for quantization and
the model diameter is used as the reference for sampling. In my implementation, the cuboid is a
rectangular prism, and each dimension is quantized independently. I do not take the reference from
the diameter but along the individual dimensions.)

It would be very wise to remove the outliers from the model and prepare an ideal model initially.
This is because the outliers directly affect the relative computations and degrade the matching
accuracy.

During the runtime stage, the scene is again sampled by \f$RelativeSamplingStep\f$, as described
above. However, this time only a portion of the scene points are used as reference. This portion is
controlled by the parameter \f$RelativeSceneSampleStep\f$, where
\f$SceneSampleStep = (int)(1.0/RelativeSceneSampleStep)\f$. In other words, if
\f$RelativeSceneSampleStep = 1.0/5.0\f$, the subsampled scene will once again be uniformly sampled
to 1/5 of the number of points. The maximum value of this parameter is 1; increasing it increases
the stability but decreases the speed. Again, because of the initial scene-independent relative
sampling, fine tuning this parameter is not a big concern. This would only be an issue when the
model shape occupies a volume uniformly, or when the model shape is condensed in a tiny place
within the quantization volume (e.g.
the octree representation would have too many empty cells).

\f$RelativeDistanceStep\f$ acts as a step of discretization over the hash table. The point pair
features are quantized to be mapped to the buckets of the hash table. This discretization involves
a multiplication and a cast to integer. Adjusting \f$RelativeDistanceStep\f$ in theory controls the
collision rate. Note that more collisions in the hash table result in less accurate estimations.
Reducing this parameter increases the effect of quantization but starts to assign non-similar point
pairs to the same bins. Increasing it, however, weakens the ability to group similar pairs.
Generally, because the training model points are selected uniformly during the sampling stage, with
a distance controlled by \f$RelativeSamplingStep\f$, \f$RelativeDistanceStep\f$ is expected to be
equal to this value. Yet again, values in the range of 0.025-0.05 are sensible. This time, however,
when the model is dense, it is not advised to decrease this value. For noisy scenes, the value can
be increased to improve the robustness of the matching against noisy points.
*/

#endif