Principal curves and surfaces are nonlinear generalizations of principal components and subspaces, respectively. They can provide insightful summary of high-dimensional data not typically attainable by classical linear methods. They were first defined by Trevor Hastie and Werner Stuetzle as "self-consistent" smooth curves which pass through the "middle" of a d-dimensional probability distribution or data cloud. Good biobliography on the subject is available at http://www.iro.umontreal.ca/~kegl/research/pcurves/ .
We present a novel algorithm to construct principal surfaces using methaphor of elasticity. The picture on the right symbolizes the idea behind the algorithm. The small points are datapoints, the big ones are a grid approximation of a principal curve. We define an elastic energy of such system and propose an effective algorithm to minimize it. The methodology of principal curves and surfaces construction that we propose has several advantages: 1) it is "natural"; 2) it is fast; 3) it is flexible and allows many variations and adaptations; 4) it allows easily constructing surfaces with any dimension and topology. |
PCA View View on the non-linear manifold
1) as a stand-alone full-featured tool VidaExpert fot multidimensional data visualization.
2) as a C++ package for fast construction of fine grid approximations to principal surfaces.
Binaries:
The sources are available upon request.