From: W. Trevor King
Date: Sat, 16 Mar 2013 17:02:53 +0000 (-0400)
Subject: posts:factor_analysis: Add a post on Factor Analysis
X-Git-Url: http://git.tremily.us/?a=commitdiff_plain;h=0994361dee518fecf917fa9dcdde5bce520aa459;p=blog.git

posts:factor_analysis: Add a post on Factor Analysis

This is the statistical approach used to analyze Software Carpentry
surveys, so I've been figuring out how it works.
---

diff --git a/posts/Factor_analysis.mdwn_itex b/posts/Factor_analysis.mdwn_itex
new file mode 100644
index 0000000..f48295e
--- /dev/null
+++ b/posts/Factor_analysis.mdwn_itex
@@ -0,0 +1,279 @@
+I've been trying to wrap my head around [factor analysis][FA] as a
+theory for designing and understanding test and survey results.  This
+has turned out to be [[another|Gumbel-Fisher-Tippett_distributions]]
+one of those fields where the going has been a bit rough.  I think the
+key factors in making these older topics difficult are:
+
+* “Everybody knows this, so we don't need to write up the details.”
+* “Hey, I can do better than Bob if I just tweak this knob…”
+
+The resulting discussion ends up being overly complicated, and it's
+hard for newcomers to decide if people using similar terminology are
+in fact talking about the same thing.
+
+Some of the better open sources for background have been [Tucker and
+MacCallum's “Exploratory Factor Analysis” manuscript][TM] and [Max
+Welling's notes][MW].  I'll use Welling's terminology for this
+discussion.
+
+The basic idea of factor analysis is to model $d$ measurable attributes
+as generated by $k < d$ common factors and $d$ unique factors.
+With $d = 4$ and $k = 2$, you get something like:
+
+[[!img factors.png
+  alt="Relationships between factors and measured attributes"
+  caption="Relationships between factors and measured attributes
+  (adapted from Tucker and MacCallum's Figure 1.2)"
+  ]]
+
+This corresponds to the equation ([Welling's eq. 1][MW-FA]):
+
+\[
+  \mathbf{x} = \mathbf{A}\mathbf{y} + \mathbf{\mu} + \mathbf{\nu}
+\]
+
+The independent random variables $\mathbf{y}$ are distributed
+according to a Gaussian with zero mean and unit
+variance $\mathcal{G}_\mathbf{y}[0,\mathbf{I}]$ (zero mean because
+constant offsets are handled by $\mathbf{\mu}$; unit variance because
+scaling is handled by $\mathbf{A}$).  The independent random
+variables $\mathbf{\nu}$ are distributed according
+to $\mathcal{G}_\mathbf{\nu}[0,\mathbf{\Sigma}]$, with (Welling's
+eq. 2):
+
+\[
+  \mathbf{\Sigma} \equiv \text{diag}[\sigma_1^2, \ldots, \sigma_d^2]
+\]
+
+Because the only source of constant offset is $\mathbf{\mu}$, we can
+calculate it by averaging out the random noise (Welling's eq. 6):
+
+\[
+  \mathbf{\mu} = \frac{1}{N} \sum_{n=1}^N \mathbf{x}_n
+\]
+
+where $N$ is the number of measurements (survey responders)
+and $\mathbf{x}_n$ is the response vector for the $n^\text{th}$
+responder.
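+
+To make the model concrete before worrying about how to fit it, here
+is a quick sketch of the generative side (the dimensions and parameter
+values below are toy numbers I made up, not anything from Welling):
+draw $N$ samples of $\mathbf{y}$ and $\mathbf{\nu}$, build
+$\mathbf{x} = \mathbf{A}\mathbf{y} + \mathbf{\mu} + \mathbf{\nu}$, and
+check that the sample covariance approaches the model covariance
+$\mathbf{A}\mathbf{A}^T + \mathbf{\Sigma}$:
+
+    import numpy
+
+    numpy.random.seed(0)
+    d, k, N = 4, 2, 10000
+    A = numpy.random.normal(size=(d, k))           # factor weights
+    mu = numpy.array([1., 2., 3., 4.])             # constant offsets
+    sigma2 = numpy.array([0.5, 0.3, 0.2, 0.4])     # unique variances sigma_i^2
+    y = numpy.random.normal(size=(N, k))           # common factors, G[0, I]
+    nu = numpy.random.normal(size=(N, d)) * numpy.sqrt(sigma2)  # unique noise
+    x = y.dot(A.T) + mu + nu                       # measured attributes
+    print(A.dot(A.T) + numpy.diag(sigma2))         # model covariance
+    print(numpy.cov(x, rowvar=False))              # sample covariance (should be close)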
+
+How do we find $\mathbf{A}$ and $\mathbf{\Sigma}$?  This is the tricky
+bit, and there are a number of possible approaches.  Welling suggests
+using expectation maximization (EM), and there's an excellent example
+of the procedure with a colorblind experimenter drawing colored balls
+in his [EM notes][EM] (to test my understanding, I wrote
+[[color-ball.py]]).
+
+To simplify calculations, Welling defines (before eq. 15):
+
+\[
+\begin{aligned}
+  \mathbf{A}' &\equiv [\mathbf{A}, \mathbf{\mu}] \\
+  \mathbf{y}' &\equiv [\mathbf{y}^T, 1]^T
+\end{aligned}
+\]
+
+which reduce the model to
+
+\[
+  \mathbf{x} = \mathbf{A}'\mathbf{y}' + \mathbf{\nu}
+\]
+
+After some manipulation, Welling works out the maximizing updates
+(eq'ns 16 and 17):
+
+\[
+\begin{aligned}
+  \mathbf{A}'^\text{new}
+    &= \left( \sum_{n=1}^N \mathbf{x}_n
+         \mathbf{E}[\mathbf{y}'|\mathbf{x}_n]^T \right)
+       \left( \sum_{n=1}^N
+         \mathbf{E}[\mathbf{y}'\mathbf{y}'^T|\mathbf{x}_n]
+       \right)^{-1} \\
+  \mathbf{\Sigma}^\text{new}
+    &= \frac{1}{N}\sum_{n=1}^N
+       \text{diag}[\mathbf{x}_n\mathbf{x}_n^T -
+         \mathbf{A}'^\text{new}
+         \mathbf{E}[\mathbf{y}'|\mathbf{x}_n]\mathbf{x}_n^T]
+\end{aligned}
+\]
+
+The expectation values used in these updates are given by (Welling's
+eq'ns 12 and 13):
+
+\[
+\begin{aligned}
+  \mathbf{E}[\mathbf{y}|\mathbf{x}_n]
+    &= \mathbf{A}^T (\mathbf{A}\mathbf{A}^T + \mathbf{\Sigma})^{-1}
+       (\mathbf{x}_n - \mathbf{\mu}) \\
+  \mathbf{E}[\mathbf{y}\mathbf{y}^T|\mathbf{x}_n]
+    &= \mathbf{I} -
+       \mathbf{A}^T (\mathbf{A}\mathbf{A}^T + \mathbf{\Sigma})^{-1} \mathbf{A}
+       + \mathbf{E}[\mathbf{y}|\mathbf{x}_n] \mathbf{E}[\mathbf{y}|\mathbf{x}_n]^T
+\end{aligned}
+\]
+
+Survey analysis
+===============
+
+Enough abstraction!  Let's look at an example: [survey
+results][survey]:
+
+    >>> import numpy
+    >>> scores = numpy.genfromtxt('Factor_analysis/survey.data', delimiter='\t')
+    >>> scores
+    array([[ 1.,  3.,  4.,  6.,  7.,  2.,  4.,  5.],
+           [ 2.,  3.,  4.,  3.,  4.,  6.,  7.,  6.],
+           [ 4.,  5.,  6.,  7.,  7.,  2.,  3.,  4.],
+           [ 3.,  4.,  5.,  6.,  7.,  3.,  5.,  4.],
+           [ 2.,  5.,  5.,  5.,  6.,  2.,  4.,  5.],
+           [ 3.,  4.,  6.,  7.,  7.,  4.,  3.,  5.],
+           [ 2.,  3.,  6.,  4.,  5.,  4.,  4.,  4.],
+           [ 1.,  3.,  4.,  5.,  6.,  3.,  3.,  4.],
+           [ 3.,  3.,  5.,  6.,  6.,  4.,  4.,  3.],
+           [ 4.,  4.,  5.,  6.,  7.,  4.,  3.,  4.],
+           [ 2.,  3.,  6.,  7.,  5.,  4.,  4.,  4.],
+           [ 2.,  3.,  5.,  7.,  6.,  3.,  3.,  3.]])
+
+`scores[i,j]` is the answer the `i`th respondent gave for the `j`th
+question.  We're looking for underlying factors that can explain
+covariance between the different questions.  Do the question answers
+($\mathbf{x}$) represent some underlying factors ($\mathbf{y}$)?
+Let's start off by calculating $\mathbf{\mu}$:
+
+    >>> def print_row(row):
+    ...     print(' '.join('{: 0.2f}'.format(x) for x in row))
+    >>> mu = scores.mean(axis=0)
+    >>> print_row(mu)
+     2.42  3.58  5.08  5.75  6.08  3.42  3.92  4.25
+
+Next we need priors for $\mathbf{A}$ and $\mathbf{\Sigma}$.  [[MDP]]
+has an implementation for [[Python]], and their [FANode][] uses a
+Gaussian random matrix for $\mathbf{A}$ and the diagonal of the score
+covariance for $\mathbf{\Sigma}$.  They also use the score covariance
+to avoid repeated summations over $n$.
+
+    >>> import mdp
+    >>> def print_matrix(matrix):
+    ...     for row in matrix:
+    ...         print_row(row)
+    >>> fa = mdp.nodes.FANode(output_dim=3)
+    >>> numpy.random.seed(1) # for consistent doctest results
+    >>> responder_scores = fa(scores) # hidden factors for each responder
+    >>> print_matrix(responder_scores)
+    -1.92 -0.45  0.00
+     0.67  1.97  1.96
+     0.70  0.03 -2.00
+     0.29  0.03 -0.60
+    -1.02  1.79 -1.43
+     0.82  0.27 -0.23
+    -0.07 -0.08  0.82
+    -1.38 -0.27  0.48
+     0.79 -1.17  0.50
+     1.59 -0.30 -0.41
+     0.01 -0.48  0.73
+    -0.46 -1.34  0.18
+    >>> print_row(fa.mu.flat)
+     2.42  3.58  5.08  5.75  6.08  3.42  3.92  4.25
+    >>> fa.mu.flat == mu # MDP agrees with our earlier calculation
+    array([ True,  True,  True,  True,  True,  True,  True,  True], dtype=bool)
+    >>> print_matrix(fa.A) # factor weights for each question
+     0.80 -0.06 -0.45
+     0.17  0.30 -0.65
+     0.34 -0.13 -0.25
+     0.13 -0.73 -0.64
+     0.02 -0.32 -0.70
+     0.61  0.23  0.86
+     0.08  0.63  0.59
+    -0.09  0.67  0.13
+    >>> print_row(fa.sigma) # unique noise for each question
+     0.04  0.02  0.38  0.55  0.30  0.05  0.48  0.21
+
+Because the covariance is unaffected by the rotation
+$\mathbf{A}\rightarrow\mathbf{A}\mathbf{R}$ (for any orthogonal
+$\mathbf{R}$), the estimated weights $\mathbf{A}$ and responder scores
+$\mathbf{y}$ can be quite sensitive to the seed priors.  The width
+$\mathbf{\Sigma}$ of the unique noise $\mathbf{\nu}$ is more robust,
+because $\mathbf{\Sigma}$ is unaffected by rotations on $\mathbf{A}$.
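+
+To check my reading of Welling's updates against MDP's results, here
+is a rough sketch of the same EM iteration written directly from
+eq'ns 12, 13, 16, and 17, continuing the session above (this is my own
+loop, not MDP's algorithm — `FANode` works from the score covariance
+instead of summing over responders, and the names `em_step`, `A1`,
+etc. are mine).  Starting from the same kind of priors, the fitted
+$\mathbf{\Sigma}$ should come out close to `fa.sigma`, while
+$\mathbf{A}$ may differ by a rotation:
+
+    def em_step(x, A, mu, sigma):
+        """One EM update of (A, mu, sigma) for x = A y + mu + nu.
+
+        `x` is an (N, d) array of responses, `A` is the current (d, k)
+        weight matrix, `mu` is the (d,) offset, and `sigma` is the (d,)
+        vector of unique variances.  Returns updated (A, mu, sigma).
+        """
+        N, d = x.shape
+        k = A.shape[1]
+        inv = numpy.linalg.inv(A.dot(A.T) + numpy.diag(sigma))
+        beta = A.T.dot(inv)                      # k x d
+        Ey = beta.dot((x - mu).T)                # E[y|x_n], one column per responder
+        sum_Eyy = N * (numpy.eye(k) - beta.dot(A)) + Ey.dot(Ey.T)
+        # augment y with a constant 1, so the loading matrix is A' = [A, mu]
+        Ey1 = numpy.vstack([Ey, numpy.ones(N)])
+        sum_Eyy1 = numpy.empty((k + 1, k + 1))
+        sum_Eyy1[:k, :k] = sum_Eyy
+        sum_Eyy1[:k, k] = sum_Eyy1[k, :k] = Ey.sum(axis=1)
+        sum_Eyy1[k, k] = N
+        # maximization step (Welling's eq'ns 16 and 17)
+        A1 = x.T.dot(Ey1.T).dot(numpy.linalg.inv(sum_Eyy1))   # new [A, mu]
+        sigma = numpy.mean(x*x - x*Ey1.T.dot(A1.T), axis=0)
+        return A1[:, :k], A1[:, k], sigma
+
+    numpy.random.seed(1)
+    A = numpy.random.normal(size=(scores.shape[1], 3))  # random prior, like FANode
+    mu = scores.mean(axis=0)
+    sigma = scores.var(axis=0)                           # score-variance prior
+    for i in range(500):
+        A, mu, sigma = em_step(scores, A, mu, sigma)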
+
+Nomenclature
+============
+
+<dl>
+  <dt>$\mathbf{A}_{ij}$</dt>
+  <dd>The element from the $i^\text{th}$ row and $j^\text{th}$
+  column of a matrix $\mathbf{A}$.  For example, here is a 2-by-3
+  matrix in terms of its components:
+
+\[
+  \mathbf{A} = \begin{pmatrix}
+    \mathbf{A}_{11} & \mathbf{A}_{12} & \mathbf{A}_{13} \\
+    \mathbf{A}_{21} & \mathbf{A}_{22} & \mathbf{A}_{23}
+  \end{pmatrix}
+\]
+
+  </dd>
+  <dt>$\mathbf{A}^T$</dt>
+  <dd>The transpose of a matrix (or vector) $\mathbf{A}$:
+  $\mathbf{A}^T_{ij}=\mathbf{A}_{ji}$.</dd>
+  <dt>$\mathbf{A}^{-1}$</dt>
+  <dd>The inverse of a matrix $\mathbf{A}$:
+  $\mathbf{A}^{-1}\mathbf{A}=\mathbf{I}$.</dd>
+  <dt>$\text{diag}[\mathbf{A}]$</dt>
+  <dd>A matrix containing only the diagonal elements of
+  $\mathbf{A}$, with the off-diagonal values set to zero.</dd>
+  <dt>$\mathbf{E}[f(\mathbf{x})]$</dt>
+  <dd>Expectation value for a function $f$ of a random variable
+  $\mathbf{x}$.  If the probability density of $\mathbf{x}$ is
+  $p(\mathbf{x})$, then $\mathbf{E}[f(\mathbf{x})]=\int d\mathbf{x}
+  p(\mathbf{x}) f(\mathbf{x})$.  For example,
+  $\mathbf{E}[1]=\int d\mathbf{x} p(\mathbf{x})=1$ (normalization).</dd>
+  <dt>$\mathbf{\mu}$</dt>
+  <dd>The mean of a random variable $\mathbf{x}$ is given by
+  $\mathbf{\mu}=\mathbf{E}[\mathbf{x}]$.</dd>
+  <dt>$\mathbf{\Sigma}$</dt>
+  <dd>The covariance of a random variable $\mathbf{x}$ is given by
+  $\mathbf{\Sigma}=\mathbf{E}[(\mathbf{x}-\mathbf{\mu})
+  (\mathbf{x}-\mathbf{\mu})^T]$.  In the factor analysis
+  model discussed above, $\mathbf{\Sigma}$ is restricted to a
+  diagonal matrix.</dd>
+  <dt>$\mathcal{G}_\mathbf{x}[\mathbf{\mu},\mathbf{\Sigma}]$</dt>
+  <dd>A Gaussian probability density for the random variables
+  $\mathbf{x}$ with a mean $\mathbf{\mu}$ and a covariance
+  $\mathbf{\Sigma}$:
+
+\[
+  \mathcal{G}_\mathbf{x}[\mathbf{\mu},\mathbf{\Sigma}]
+    = \frac{1}{(2\pi)^{\frac{D}{2}}\sqrt{\det[\mathbf{\Sigma}]}}
+      e^{-\frac{1}{2}(\mathbf{x}-\mathbf{\mu})^T
+        \mathbf{\Sigma}^{-1}
+        (\mathbf{x}-\mathbf{\mu})}
+\]
+
+  </dd>
+  <dt>$p(\mathbf{y}|\mathbf{x})$</dt>
+  <dd>Probability of $\mathbf{y}$ occurring given that $\mathbf{x}$
+  occurred.  This conditional probability is commonly used in
+  Bayesian statistics.</dd>
+  <dt>$p(\mathbf{x}, \mathbf{y})$</dt>
+  <dd>Probability of $\mathbf{y}$ and $\mathbf{x}$ occurring
+  simultaneously (the joint density):
+  $p(\mathbf{x},\mathbf{y})=p(\mathbf{x}|\mathbf{y})p(\mathbf{y})$.</dd>
+</dl>
+
+Note: if you have trouble viewing some of the more obscure [Unicode][]
+used in this post, you might want to install the [STIX fonts][STIX].
+
+[FA]: http://en.wikipedia.org/wiki/Factor_analysis
+[TM]: http://www.unc.edu/~rcm/book/factornew.htm
+[MW]: http://www.ics.uci.edu/~welling/classnotes/classnotes.html
+[MW-FA]: http://www.ics.uci.edu/~welling/classnotes/papers_class/LinMod.ps.gz
+[survey]: http://web.archive.org/web/20051125011642/http://www.ncl.ac.uk/iss/statistics/docs/factoranalysis.html
+[EM]: http://www.ics.uci.edu/~welling/classnotes/papers_class/EM.ps.gz
+[FANode]: https://github.com/mdp-toolkit/mdp-toolkit/blob/master/mdp/nodes/em_nodes.py
+[Unicode]: http://en.wikipedia.org/wiki/Unicode
+[STIX]: http://www.stixfonts.org/
+
+[[!tag tags/teaching]]
+[[!tag tags/theory]]
+[[!tag tags/tools]]
diff --git a/posts/Factor_analysis/color-ball.py b/posts/Factor_analysis/color-ball.py
new file mode 100755
index 0000000..98fd555
--- /dev/null
+++ b/posts/Factor_analysis/color-ball.py
@@ -0,0 +1,239 @@
+#!/usr/bin/env python
+# -*- coding: utf-8 -*-
+#
+# Copyright (C) 2013 W. Trevor King
+#
+# This program is free software: you can redistribute it and/or modify
+# it under the terms of the GNU Lesser General Public License as
+# published by the Free Software Foundation, either version 3 of the
+# License, or (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful, but
+# WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+# Lesser General Public License for more details.
+#
+# You should have received a copy of the GNU Lesser General Public
+# License along with this program.  If not, see
+# <http://www.gnu.org/licenses/>.
+
+"""Understanding expectation maximization
+
+You have a bag of red, green, and blue balls, from which you draw N
+times with replacement and get n1 red, n2 green, and n3 blue balls.
+The probability of any one combination of n1, n2, and n3 is given by
+the multinomial distribution:
+
+  p(n1, n2, n3) = N! / (n1! n2! n3!) p1^n1 p2^n2 p3^n3
+
+From some outside information, we can parameterize this model in terms
+of a single hidden variable p:
+
+  p1 = 1/4
+  p2 = 1/4 + p/4
+  p3 = 1/2 - p/4
+
+If we are red/green colorblind, we only measure
+
+  m1 = n1 + n2
+  m2 = n3
+
+What is p (the hidden variable)?  What were n1 and n2?
+"""
+
+import numpy as _numpy
+
+
+class BallBag (object):
+    """Color-blind ball drawings
+    """
+    def __init__(self, p=1):
+        self._p = p
+        self.pvals = [0.25, 0.25 + p/4., 0.5 - p/4.]
+
+    def draw(self, n=10):
+        """Draw `n` balls from the bag with replacement
+
+        Return (m1, m2), where m1 is the number of red or green balls
+        and m2 is the number of blue balls.
+        """
+        nvals = _numpy.random.multinomial(n=n, pvals=self.pvals)
+        m1 = sum(nvals[:2])  # red and green
+        m2 = nvals[2]
+        return (m1, m2)
+
+
+class Analyzer (object):
+    def __init__(self, m1, m2):
+        self.m1 = m1
+        self.m2 = m2
+        self.p = self.E_n1 = self.E_n2 = None
+
+    def __call__(self):
+        pass
+
+    def print_results(self):
+        print('Results for {}:'.format(type(self).__name__))
+        for name,attr in [
+                ('p', 'p'),
+                ('E[n1|m1,m2]', 'E_n1'),
+                ('E[n2|m1,m2]', 'E_n2'),
+                ]:
+            print(' {}: {}'.format(name, getattr(self, attr)))
+
+
+class Naive (Analyzer):
+    """Simple analysis
+
+    With a large enough sample, the measured m1 and m2 give good
+    estimates for (p1 + p2) and p3.  You can use either of these to
+    solve for the unknown p, and then solve for E[n1] and E[n2].
+
+    While this is an easy approach for the colored ball example, it
+    doesn't generalize well to more complicated models ;).
+    """
+    def __call__(self):
+        N = self.m1 + self.m2
+        p1 = 0.25
+        p3 = self.m2 / float(N)
+        p2 = 1 - p1 - p3
+        self.p = 4*p2 - 1
+        self.E_n1 = p1 * N
+        self.E_n2 = p2 * N
+
+
+class MaximumLikelihood (Analyzer):
+    """Analytical ML estimation
+
+    The log-likelihood of a general model θ looks like:
+
+      L(x^N,θ) = sum_{n=1}^N log[p(x_n|θ)] + log[p(θ)]
+
+    dropping the p(θ) term and applying to this situation, we have:
+
+      L([m1,m2], p) = N! / (m1! m2!) (p1 + p2)^m1 p3^m2
+                    = N! / (m1! m2!) (1/2 + p/4)^m1 (1/2 - p/4)^m2
+
+    which comes from recognizing that to a color-blind experimenter the
+    three ball colors are effectively two ball colors, so they'll have a
+    binomial distribution.  Maximizing the log-likelihood:
+
+      log[L(m1,m2)] = log[N!…] + m1 log[1/2 + p/4] + m2 log[1/2 - p/4]
+      d/dp log[L] = m1/(1/2 + p/4)/4 + m2/(1/2 - p/4)/(-4) = 0
+      m1 (2 - p) = m2 (2 + p)
+      2 m1 - m1 p = 2 m2 + m2 p
+      (m1 + m2) p = 2 (m1 - m2)
+      p = 2 (m1 - m2) / (m1 + m2)
+
+    Given this value of p, the expected values of n1 and n2 are:
+
+      E[n1|m1,m2] = p1 / (p1 + p2) * m1  # from the relative probabilities
+                  = (1/4) / (1/2 + p/4) * m1
+                  = m1 / (2 + p)
+                  = m1 / [2 + 2 (m1 - m2) / (m1 + m2)]
+                  = m1/2 * (m1 + m2) / (m1 + m2 + m1 - m2)
+                  = m1 * (m1 + m2) / (4 m1)
+                  = (m1 + m2) / 4
+      E[n2|m1,m2] = p2 / (p1 + p2) * m1
+                  = (1/4 + p/4) / (1/2 + p/4) * m1
+                  = m1 (1 + p) / (2 + p)
+                  = m1 [1 + 2 (m1 - m2) / (m1 + m2)] /
+                    [2 + 2 (m1 - m2) / (m1 + m2)]
+                  = m1/2 * (m1 + m2 + 2m1 - 2m2) / (m1 + m2 + m1 - m2)
+                  = m1 * (3m1 - m2) / (4 m1)
+                  = (3m1 - m2) / 4
+
+    So with a draw of m1 = 61 and m2 = 39, the ML estimates are:
+
+      p = 0.44
+      E[n1|m1,m2] = 25.0
+      E[n2|m1,m2] = 36.0
+    """
+    def __call__(self):
+        N = self.m1 + self.m2
+        self.p = 2 * (self.m1 - self.m2) / float(N)
+        self.E_n1 = N / 4.
+        self.E_n2 = (3*self.m1 - self.m2) / 4.
+
+
+class ExpectationMaximizer (Analyzer):
+    """Expectation maximization
+
+    Sometimes analytical ML is hard, so instead we iteratively
+    optimize:
+
+      Q(θ_t|θ_{t-1}) = E[log[p(x,y,θ_t)]|x,θ_{t-1}]
+
+    Applying to this situation, we have:
+
+      Q(p_t|p_{t-1}) = E[log[p([m1,m2],[n1,n2,n3],p_t)]|[m1,m2],p_{t-1}]
+
+    where:
+
+      p(m1,m2,n1,n2,n3,p) = δ(m1-n1-n2)δ(m2-n3)p(n1,n2,n3)
+
+    Plugging in and expanding the log:
+
+      Q(p_t|p_{t-1})
+        = E[log[δ(m1…)] + log[δ(m2…)] + log[p(n1,n2,n3,p_t)]
+            |m1,m2,p_{t-1}]
+        ≈ E[log[p(n1,n2,n3,p_t)]|m1,m2,p_{t-1}]  # drop boring δ terms
+        ≈ E[log[N!…] + n1 log[1/4] + n2 log[1/4 + p_t/4]
+            + n3 log[1/2 - p_t/4]|m1,m2,p_{t-1}]
+        ≈ E[n2 log[1/4 + p_t/4] + n3 log[1/2 - p_t/4]
+            |m1,m2,p_{t-1}]  # drop non-p_t terms
+        ≈ E[n2|m1,m2,p_{t-1}] log[1/4 + p_t/4] + m2 log[1/2 - p_t/4]
+
+    Maximizing (the M step):
+
+      d/dp_t Q(p_t|p_{t-1})
+        ≈ E[n2|m1,m2,p_{t-1}] / (1/4 + p_t/4)/4 + m2 / (1/2 - p_t/4)/(-4)
+        = 0
+      E[n2|m1,m2,p_{t-1}] / (1 + p_t) = m2 / (2 - p_t)
+      E[n2|m1,m2,p_{t-1}] (2 - p_t) = m2 (1 + p_t)
+      p_t (E[n2|m1,m2,p_{t-1}] + m2) = 2 E[n2|m1,m2,p_{t-1}] - m2
+      p_t = (2 E[n2|m1,m2,p_{t-1}] - m2) / (E[n2|m1,m2,p_{t-1}] + m2)
+
+    To get a value for p_t, we need to evaluate those expectations
+    (the E step).  Using a subset of the ML analysis:
+
+      E[n2|m1,m2,p_{t-1}] = m1 (1 + p_{t-1}) / (2 + p_{t-1})
+    """
+    def __init__(self, p=0, **kwargs):
+        super(ExpectationMaximizer, self).__init__(**kwargs)
+        self.p = p  # prior belief
+
+    def _E_step(self):
+        """Calculate E[ni|m1,m2,p_{t-1}] given the prior parameter p_{t-1}
+        """
+        return {
+            'E_n1': self.m1 / (2. + self.p),
+            'E_n2': self.m1 * (1. + self.p) / (2. + self.p),
+            'E_n3': self.m2,
+            }
+
+    def _M_step(self, E_n1, E_n2, E_n3):
+        "Maximize Q(p_t|p_{t-1}) over p_t"
+        self.p = (2.*E_n2 - self.m2) / (E_n2 + self.m2)
+
+    def __call__(self, n=10):
+        for i in range(n):
+            print(' estimated p{}: {}'.format(i, self.p))
+            Es = self._E_step()
+            self._M_step(**Es)
+        for key,value in self._E_step().items():
+            setattr(self, key, value)
+
+
+if __name__ == '__main__':
+    p = 0.6
+    bag = BallBag(p=p)
+    m1,m2 = bag.draw(n=100)
+    print('estimate p using m1 = {} and m2 = {}'.format(m1, m2))
+    for analyzer in [
+            ExpectationMaximizer(m1=m1, m2=m2),
+            MaximumLikelihood(m1=m1, m2=m2),
+            Naive(m1=m1, m2=m2),
+            ]:
+        analyzer()
+        analyzer.print_results()
diff --git a/posts/Factor_analysis/factors.dia b/posts/Factor_analysis/factors.dia
new file mode 100644
index 0000000..78973e9
--- /dev/null
+++ b/posts/Factor_analysis/factors.dia
@@ -0,0 +1,1444 @@
+[Dia XML source for the factors.png diagram; only its node labels are
+recoverable here: "Common factors (y)", "Specific factors (μ)",
+"Measurement error factors (ν)", "Unique factors (μ + ν)",
+"Surface attributes (x)", "Factor weights (A)", and "Addition (+)".]
diff --git a/posts/Factor_analysis/factors.png b/posts/Factor_analysis/factors.png
new file mode 100644
index 0000000..fb7ab11
Binary files /dev/null and b/posts/Factor_analysis/factors.png differ
diff --git a/posts/Factor_analysis/survey.data b/posts/Factor_analysis/survey.data
new file mode 100644
index 0000000..51fbb7a
--- /dev/null
+++ b/posts/Factor_analysis/survey.data
@@ -0,0 +1,15 @@
+#cost quality availability quantity respectability prestige experience popularity
+# 1 = not important, 7 = very important
+# http://web.archive.org/web/20051125011642/http://www.ncl.ac.uk/iss/statistics/docs/factoranalysis.html
+1 3 4 6 7 2 4 5
+2 3 4 3 4 6 7 6
+4 5 6 7 7 2 3 4
+3 4 5 6 7 3 5 4
+2 5 5 5 6 2 4 5
+3 4 6 7 7 4 3 5
+2 3 6 4 5 4 4 4
+1 3 4 5 6 3 3 4
+3 3 5 6 6 4 4 3
+4 4 5 6 7 4 3 4
+2 3 6 7 5 4 4 4
+2 3 5 7 6 3 3 3