Imagine the search for the optimal unmixing matrix W as taking place in an abstract landscape of potential matrices W. This theoretical landscape has troughs and (local and global) peaks, corresponding to matrices W that yield low and high temporal independence of the components, respectively. The ICA algorithm searches for the highest summit (the “global maximum”) in this landscape of potential matrices W. While the initial W will not be optimal, with each step ICA ascends and identifies a better unmixing matrix W. This stepwise optimization is generally referred to as stochastic gradient ascent. Optimization is limited to a certain number of steps (after which the search is terminated without success), but stops as soon as the summit of the landscape has been reached. In that case, the optimal unmixing matrix W, which maximizes the temporal independence between components, has been identified.
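To make this concrete, here is a minimal sketch in Python/NumPy of gradient ascent on such a landscape, assuming a toy two-channel whitened mixture and using the kurtosis of the extracted component as the independence criterion. For simplicity only a single row w of the unmixing matrix is estimated; the step size, iteration count, and synthetic data are illustrative choices, not those of any particular ICA implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy whitened two-channel mixture of two super-Gaussian (Laplacian) sources.
s = rng.laplace(size=(2, 10_000))
a = np.array([[0.8, 0.6],
              [-0.6, 0.8]])                 # orthogonal mixing keeps x white
x = a @ s

def kurtosis(y):
    """Excess kurtosis after normalizing to zero mean and unit variance."""
    y = (y - y.mean()) / y.std()
    return np.mean(y**4) - 3.0

w = rng.normal(size=2)
w /= np.linalg.norm(w)                      # random start on the unit circle

for step in range(200):
    y = w @ x
    grad = 4 * (x * y**3).mean(axis=1)      # gradient of E[y^4] w.r.t. w
    w += 0.1 * np.sign(kurtosis(y)) * grad  # one "ascent" step in the landscape
    w /= np.linalg.norm(w)                  # project back onto the unit circle

print("kurtosis of recovered component:", kurtosis(w @ x))
```

Renormalizing w after every step keeps the component's variance fixed, so that successive kurtosis values remain comparable, i.e., the search stays on the surface of the landscape rather than simply inflating the component's scale.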
Most importantly, ICA completely neglects the time course of the variables; the samples are treated as draws from random variables. Statistical independence then means that the distributions, or probability density functions (pdfs), of two random variables y1 and y2 are completely unrelated to each other. Knowing the values of variable y1 tells us nothing about the values of variable y2, and vice versa:
p(y1,y2) = p1(y1) p2(y2)
i.e., “joint pdf of y1 and y2” = “product of the marginal pdfs of y1 and y2”
While the joint pdf represents the combined probability density of both variables, the marginal pdfs represent the probability density of each variable alone. Two variables y1 and y2 are statistically independent if and only if the joint pdf can be generated by multiplying the two marginal pdfs.
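This factorization can be checked numerically. The following sketch draws two independent synthetic variables, estimates the joint pdf and both marginal pdfs with histograms, and compares them; the bin count and sample size are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000

# Two independent random variables, drawn from different distributions.
y1 = rng.normal(size=n)
y2 = rng.uniform(-1, 1, size=n)

# Binned estimates of the joint pdf and of the two marginal pdfs.
joint, edges1, edges2 = np.histogram2d(y1, y2, bins=20, density=True)
p1, _ = np.histogram(y1, bins=edges1, density=True)
p2, _ = np.histogram(y2, bins=edges2, density=True)

# For independent variables the joint pdf factorizes into the marginals:
deviation = np.abs(joint - np.outer(p1, p2)).max()
print(f"max |p(y1, y2) - p1(y1) p2(y2)| = {deviation:.4f}")  # close to zero
```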
This definition extends to any number N of random variables; the joint pdf must then simply be a product of N terms. A mathematical proof can be found in Hyvärinen and Oja (2000). From this definition follows the core property of independent random variables: given any two functions h1 and h2, the expectation E satisfies:
E[h1(y1) h2(y2)] = E[h1(y1)] E[h2(y2)]
This formula is crucial: it states that for independent variables the factorization holds not only for the variables themselves, but for any functions applied to them, and therefore for all moments of their distributions.
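This identity is easy to verify by simulation. In the sketch below, y1 and y2 are independent by construction, and h1 and h2 are two arbitrarily chosen nonlinear functions (any other choices would work equally well):

```python
import numpy as np

rng = np.random.default_rng(2)
y1 = rng.laplace(size=1_000_000)          # independent by construction
y2 = rng.normal(size=1_000_000)

h1 = lambda y: np.cos(y)                  # two arbitrary nonlinear functions
h2 = lambda y: y**2 + 1

lhs = np.mean(h1(y1) * h2(y2))            # E[h1(y1) h2(y2)]
rhs = np.mean(h1(y1)) * np.mean(h2(y2))   # E[h1(y1)] E[h2(y2)]
print(f"E[h1*h2] = {lhs:.5f}   E[h1]*E[h2] = {rhs:.5f}")  # agree up to noise
```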
This property of independent variables, that independence holds irrespective of the applied function, is the central distinction from uncorrelatedness, which is a weaker form of independence. Two random variables y1 and y2 are said to be uncorrelated if their covariance (i.e., the mixed second-order moment of the joint pdf) is zero:
E[y1 y2] – E[y1] E[y2] = 0
If two variables are independent, they are also uncorrelated. However, if two variables are uncorrelated, they are not necessarily independent. While correlation only refers to the second moment (covariance) of the joint pdf, independence involves all moments. For this reason, independence places stronger constraints on the data than uncorrelatedness (Hyvärinen and Oja 2000). While computing all moments of a distribution is of course not possible in practice, the kurtosis, which is based on the fourth moment, has proven sufficient:
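A classic counterexample makes the difference tangible: with y1 standard normal and y2 = y1², the two variables are uncorrelated, yet y2 is completely determined by y1. The sketch below uses the Gaussian moments E[y²] = 1, E[y⁴] = 3, and E[y⁶] = 15 as reference values:

```python
import numpy as np

rng = np.random.default_rng(3)
y1 = rng.normal(size=1_000_000)
y2 = y1**2                               # completely determined by y1

# The covariance vanishes, so y1 and y2 are uncorrelated ...
cov = np.mean(y1 * y2) - np.mean(y1) * np.mean(y2)
print(f"covariance: {cov:+.4f}")         # ~ 0, since E[y1^3] = 0 by symmetry

# ... but the independence test fails for h1(y) = h2(y) = y^2:
lhs = np.mean(y1**2 * y2**2)             # E[y1^6], about 15 for a Gaussian
rhs = np.mean(y1**2) * np.mean(y2**2)    # E[y1^2] E[y1^4], about 1 * 3 = 3
print(f"E[h1 h2] = {lhs:.2f}   E[h1] E[h2] = {rhs:.2f}")  # clearly unequal
```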
kurtosis(y) = E[y⁴] – 3 (for y normalized to zero mean and unit variance)
Kurtosis is a classical measure of the non-Gaussianity of a distribution: the less Gaussian the distribution, the higher the absolute value of the kurtosis. While a Gaussian has a kurtosis of zero, platykurtic (sub-Gaussian) distributions with broad, flat peaks have negative kurtosis, and leptokurtic (super-Gaussian) distributions with sharp, narrow peaks have positive kurtosis.
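These three cases are easy to reproduce with synthetic data; in the sketch below, the uniform and Laplace distributions merely stand in for typical sub- and super-Gaussian signals:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 1_000_000

def kurtosis(y):
    """Excess kurtosis after normalizing to zero mean and unit variance."""
    y = (y - y.mean()) / y.std()
    return np.mean(y**4) - 3.0

print(f"Gaussian:               {kurtosis(rng.normal(size=n)):+.3f}")   # ~  0
print(f"uniform (sub-Gaussian): {kurtosis(rng.uniform(size=n)):+.3f}")  # ~ -1.2
print(f"Laplace (super-Gauss.): {kurtosis(rng.laplace(size=n)):+.3f}")  # ~ +3
```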
Since signal mixtures usually do not look like any of the pure signals but instead have a more Gaussian distribution (a consequence of the Central Limit Theorem), ICA examines signals with respect to their kurtosis, or non-Gaussianity, where maximal positive or negative kurtosis reflects maximal independence. This implies that ICA can only successfully unmix non-Gaussian signals (see the requirements for ICA). Fortunately, pure signals such as music, speech, or EEG voltage amplitudes are indeed highly super-Gaussian. By contrast, highly sub-Gaussian distributions generally reflect AC (alternating current) or DC (direct current) noise, e.g., induced by screen currents, electrical machinery, lighting fixtures, or loose electrode contacts (Delorme et al. 2007).
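The effect of the Central Limit Theorem is visible even for a mixture of just two sources: the mixture's kurtosis lies well below that of the super-Gaussian sources, i.e., the mixture is more Gaussian. The equal mixing weights below are an arbitrary choice:

```python
import numpy as np

rng = np.random.default_rng(5)
n = 1_000_000

def kurtosis(y):
    y = (y - y.mean()) / y.std()         # normalize to zero mean, unit variance
    return np.mean(y**4) - 3.0

s1 = rng.laplace(size=n)                 # super-Gaussian "pure" signal
s2 = rng.laplace(size=n)                 # a second, independent one
mixture = 0.5 * s1 + 0.5 * s2

print(f"source kurtoses:  {kurtosis(s1):+.3f}, {kurtosis(s2):+.3f}")  # ~ +3
print(f"mixture kurtosis: {kurtosis(mixture):+.3f}")                  # ~ +1.5
```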
Returning to our landscape of potential unmixing matrices W: the criterion that is actually maximized is the kurtosis, or an associated measure of non-Gaussianity, of all components. Exactly which criterion is evaluated varies across ICA algorithms (the ICA algorithms implemented in BrainVision Analyzer are characterized in more detail below).
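For a two-channel whitened mixture this landscape can even be scanned exhaustively, because every candidate unmixing direction corresponds to a single rotation angle. In the sketch below, one super-Gaussian and one sub-Gaussian synthetic source are mixed by a known rotation, so the kurtosis has a unique peak at the angle that undoes the mixing; in real applications W has far more degrees of freedom, which is why iterative ascent replaces the brute-force scan:

```python
import numpy as np

rng = np.random.default_rng(6)
n = 100_000

# One super-Gaussian and one sub-Gaussian source, mixed by a known rotation.
s = np.vstack([rng.laplace(size=n),          # super-Gaussian, kurtosis ~ +3
               rng.uniform(-1, 1, size=n)])  # sub-Gaussian, kurtosis ~ -1.2
theta_mix = 0.7
a = np.array([[np.cos(theta_mix), -np.sin(theta_mix)],
              [np.sin(theta_mix),  np.cos(theta_mix)]])
x = a @ s

# Scan the landscape: each candidate unmixing direction is one rotation angle.
angles = np.linspace(0.0, np.pi, 361)
kurt = []
for t in angles:
    y = np.array([np.cos(t), np.sin(t)]) @ x
    y = (y - y.mean()) / y.std()
    kurt.append(np.mean(y**4) - 3.0)

peak = angles[int(np.argmax(kurt))]
print(f"mixing angle: {theta_mix:.3f}   kurtosis peak at: {peak:.3f}")
```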