Amplitude-Biased And Amplitude-Unbiased Cross-Correlations:

An Illustration With Three Examples



Martin Schimmel



General Motivation:

An important problem in seismology is the unambiguous detection of seismic arrivals to constrain the corresponding Earth structure. Difficulties often arise because of the variability and abundance of signals in the seismograms. The large amplitude signals are mostly caused by the gross structure of the Earth, and although these signals vary in shape, they can be detected due to their outstanding amplitudes. Weak signals, however, are mostly concealed in other signals with similar amplitudes. As a consequence, the weak signals can only be detected through their coherent appearance on different seismograms or their resemblance to a given reference or pilot wavelet, such as the direct P arrival or the imposed ground signal of a vibrating source. The detection, arrival time picking and/or extraction of weak signals therefore require the use of coherence measures. Since the weak signals are more sensitive to waveform perturbations than the large amplitude signals, the problem consists in detecting closely similar waveforms rather than completely coherent waveforms. To cope with this problem, a variety of coherence measures have been devised to evaluate quantitatively the goodness of fit obtained. These measures are mostly based on cross-correlation and stacking techniques, which are widely used at various stages of data processing. To enable the detection, by waveform semblance, of weak signals concealed in other larger amplitude signals, the coherence measures should be amplitude unbiased.
 

Cross-Correlations:

The cross-correlation (CC) between two data sets measures their similarity as a function of time shift or time lag. This measure involves progressively sliding one waveform past the other and summing the cross-multiplication products over the common time interval of the waveforms. The cross-correlation function peaks at the time lags for which (closely) similar waveforms are best aligned. Dissimilar waveforms cause the cross-correlation function to have small amplitudes because positive and negative cross-multiplication products cancel in the summation.
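The sliding and summing described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name cc_sliding is hypothetical, lags are counted in samples from the start of the trace, and only positions where the pilot fits completely inside the trace are evaluated. With normalize=True the sum is divided by the geometric mean of the energies of pilot and trace window, which is the usual meaning of a geometrically normalized cross-correlation.

```python
import numpy as np

def cc_sliding(trace, pilot, normalize=False):
    """Slide `pilot` along `trace` and return one correlation value per lag.

    Lag k (in samples) means the pilot is compared with trace[k:k+len(pilot)].
    With normalize=True each value is divided by the geometric mean of the
    energies of the pilot and of the trace window (values then lie in [-1, 1]).
    """
    trace = np.asarray(trace, dtype=float)
    pilot = np.asarray(pilot, dtype=float)
    n = len(pilot)
    nlags = len(trace) - n + 1
    out = np.empty(nlags)
    pilot_energy = np.sum(pilot ** 2)
    for lag in range(nlags):
        window = trace[lag:lag + n]
        value = np.sum(window * pilot)          # sum of cross-multiplication products
        if normalize:
            denom = np.sqrt(np.sum(window ** 2) * pilot_energy)
            value = value / denom if denom > 0 else 0.0
        out[lag] = value
    return out

# Usage: a scaled copy of the pilot is hidden at sample 4 of the trace.
pilot = np.array([1.0, -2.0, 1.0])
trace = np.zeros(10)
trace[4:7] = 5.0 * pilot
cc = cc_sliding(trace, pilot)
ccgn = cc_sliding(trace, pilot, normalize=True)
best_lag = int(np.argmax(ccgn))                 # sample index of the best alignment
```

In this toy case the normalized value at the best lag does not depend on the factor 5, which is the sense in which the normalized measure is insensitive to amplitude changes between the data sets.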

In our paper we extend the concept of the phase stack (Schimmel and Paulssen, 1997) and present a new cross-correlation function which we call phase cross-correlation (PCC). The proposed function can be used as a coherence functional, for signal recognition, and for arrival time picking. We show synthetic applications which are discussed in comparison with the conventional cross-correlation (CC) and the geometrically normalized cross-correlation (CCGN). It is shown that the CCGN and the PCC of closely similar waveforms can lead to different results due to their different design philosophies. The CCGN is insensitive to amplitude changes between data sets but is biased by the large amplitude portions within the considered correlation windows. Conversely, PCC weights every sample in the correlation equally and is consequently insensitive to the amplitudes within the correlation windows.
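A minimal sketch of a phase-based correlation of this kind is given below. It assumes that the PCC has the form c_pcc(t) = 1/(2N) * sum over tau of { |exp(i Phi(t+tau)) + exp(i Psi(tau))|^nu - |exp(i Phi(t+tau)) - exp(i Psi(tau))|^nu }, where Phi and Psi are the instantaneous phases of trace and pilot and N is the number of pilot samples; the paper should be consulted for the exact definition. The function names, the FFT-based analytic signal, the default nu = 1 and the choice to compute the pilot phases from the pilot alone are assumptions made for this illustration.

```python
import numpy as np

def analytic_signal(x):
    """Analytic signal of a real trace, obtained by zeroing negative frequencies."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    weights = np.zeros(n)
    weights[0] = 1.0
    if n % 2 == 0:
        weights[n // 2] = 1.0
        weights[1:n // 2] = 2.0
    else:
        weights[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(np.fft.fft(x) * weights)

def phasors(x, eps=1e-12):
    """Unit-amplitude complex samples exp(i*phase); zero where the envelope vanishes."""
    a = analytic_signal(x)
    amp = np.abs(a)
    return np.where(amp > eps, a / np.maximum(amp, eps), 0.0)

def pcc_sliding(trace, pilot, nu=1.0):
    """Phase cross-correlation of `pilot` against `trace`, one value per lag in samples."""
    u = phasors(trace)
    v = phasors(pilot)
    n = len(v)
    nlags = len(u) - n + 1
    out = np.empty(nlags)
    for lag in range(nlags):
        w = u[lag:lag + n]
        # every sample contributes a term in [-2**nu, 2**nu], regardless of its amplitude
        out[lag] = np.sum(np.abs(w + v) ** nu - np.abs(w - v) ** nu) / (2.0 * n)
    return out

# Usage: scaling one of the inputs does not change the zero-lag value,
# which stays close to 1 for identical waveforms.
rng = np.random.default_rng(0)
x = rng.standard_normal(64)
same = pcc_sliding(x, x)[0]
scaled = pcc_sliding(5.0 * x, x)[0]
flipped = pcc_sliding(-x, x)[0]
```

Because only the unit phasors enter the sum, large and small amplitude samples carry the same weight, which is the design difference to the CCGN described above.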

In the following we show some examples to illustrate the effect of amplitude biased and unbiased correlations. For more details and figures see the paper by Schimmel (1999). Schimmel et al. (2011) present further examples and comparisons; the goal of that paper is to show how PCC can improve the extraction of seismic ambient noise signals (empirical Green's functions). PCC has also been implemented in the multi-channel cross-correlation approach of VanDecar and Crosson (1990), which computes the cross-correlations between all possible pairs of traces to estimate relative phase arrival times in a least-squares (LSQ) sense for travel time tomography (Schimmel et al., 2003).
 

Example 1:

a) Waveforms used for cross-correlations. Both traces consist of the same two wave trains (T1-T2 and T2-T3). The last wave train has been shifted by -0.2 s on the bottom trace. The top trace is used to extract two different pilots (T2 to T3 and T1 to T3) which are cross-correlated with the bottom trace. b) Cross-correlograms (top: CCGN, bottom: PCC) for the pilots T2-T3 (solid line) and T1-T3 (dashed line).
 

The different determination of waveform similarity and the importance of the choice of the correlation window are combined in the first example. Figure a shows two traces, each consisting of two different wave trains. The only difference between the traces is that the last wave train (T2-T3) of the second trace is shifted by -0.2 s. Two pilots have been extracted from the top trace using the windows T2 to T3 and T1 to T3 for correlation with the bottom trace. The resulting CCGNs and PCCs are shown in Figure b. Solid and dashed lines distinguish the pilots T2-T3 and T1-T3, respectively. As can be seen from Figure b, CCGN is hardly affected by the choice of the pilot. The best correlations, marked with a black dot, are obtained for a lag time of -0.2 s, i.e. by shifting the pilot until its second wave train matches the second wave train of the second trace. Conversely, with the PCC measure we obtain maximum correlation at 0 s for the pilot from T1 to T3 and at -0.2 s for the pilot from T2 to T3. This is because PCC is amplitude insensitive and consequently determines its best coherence by the maximum number of coherent samples. In other words, the zero lag alignment (dashed line) is favored since it aligns the longest coherent wave train (T1-T2) within the pilot trace. The corresponding CCGN measure does not favor a zero lag alignment since the large amplitude wave train (T2-T3) dominates the correlation. The small amplitude parts of the correlation window lead, even in the case of coherence, to small amplitude contributions which do not much affect the total correlation. This explains the large resemblance of both CCGN functions. The smaller absolute maxima of the dashed lines indicate the decreased similarity of the cross-correlated waveforms.
 

Example 2:
 

a) Top: Pilot wavelet (solid line) in its full length and trace with distorted waveform (dashed line). The distorted waveform has zero amplitudes outside the time window shown. Bottom: PCC (solid line) and CCGN (dashed line) between the pilot and the distorted signal. Zero lag is marked by the vertical dashed line and corresponds to the relative position of the waveforms at the top. b) - d) show the same as a) for different waveforms. In d) the pilot is shortened to the time window marked by the vertical lines.

The different sensitivities of the coherence measures to waveform similarity cause differences between the PCC and the CCGN for waveforms which are not perfectly coherent. These differences are shown in the Figure. The traces at the top of each figure show the pilot wavelet (solid line) and the corresponding trace with the closely similar waveform (dashed line). The pilot wavelet is shown in its full length while the other trace is continued with zero amplitudes outside the time window shown. The cross-correlations are plotted at the bottom of each figure. Zero lag is marked by the vertical dashed line. The lag time corresponds to a shift of the pilot wavelet relative to the other waveform.

Figures a and b demonstrate that large amplitudes within the correlation window strongly influence the CCGN measure. The CCGN from Figure a does not allow one to decide whether the waveforms are best aligned at zero time lag or at a positive time lag which aligns the waveforms by their absolute maxima or by their absolute maximum and minimum. In other words, following this measure the pilot and the trace can be aligned in three different ways. The time lag for the alignment of the absolute maxima is marked with a grey dot. Conversely, from the PCC it is obvious that the best coherence is obtained at zero time lag only. The differences between the PCC and the CCGN measure are caused by their different concepts. The CCGN measure is based on the sum of cross-multiplication products, which is most strongly influenced by the largest amplitudes in the wavelets. To show this we further increased the absolute maximum of the dashed waveform from Figure a; the waveforms and corresponding correlograms are shown in Figure b.

As can be seen from this figure, the CCGN now favors the alignment of the waveforms by their absolute maxima. Conversely, PCC still advocates the alignment at zero lag. The PCC value at zero lag, however, decreases, which means that the modification makes the waveforms less coherent.
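The dominance of the largest amplitudes in a sum of cross-multiplication products can be checked with a toy calculation. The numbers below are invented for illustration and are not taken from the figures; the function name product_shares is hypothetical.

```python
import numpy as np

def product_shares(a, b):
    """Fraction of the zero-lag cross-multiplication sum contributed by each sample."""
    prod = np.asarray(a, dtype=float) * np.asarray(b, dtype=float)
    total = prod.sum()
    if total == 0.0:
        raise ValueError("cross-multiplication sum is zero; shares are undefined")
    return prod / total

# Three small samples and one sample that is ten times larger.
wavelet = np.array([1.0, -1.0, 1.0, 10.0])
shares = product_shares(wavelet, wavelet)
largest_share = shares[-1]      # 100 / 103 of the total sum
```

Here one sample out of four supplies about 97 % of the sum, so aligning that sample matters far more than aligning the other three. In a measure that uses only unit-amplitude phase information, each of the four samples would instead carry the same weight.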

The strong sensitivity of the CCGN to the large amplitudes in the waveforms also has an advantage. For instance, low amplitude noise within the correlation window has its strongest impact when PCC is used. We show an example in Figures c and d. The PCC measure is equally sensitive to all perturbations in the wavelet. As a consequence, the coherence value at zero lag is smaller than for the CCGN (Figure c). In such a situation the chosen length of the time window of the pilot wavelet becomes important. In Figure d we demonstrate the results for a shortened time window. The begin and end times of the pilot wavelet are marked by the vertical bars. The PCC measure improves while the CCGN is little affected by this modification. Further, the Figure shows that the PCC maxima are more peaked than the maxima of the CCGN measure. This and the importance of the choice of the correlation window indicate that PCC is more sensitive to waveform coherence. In other words, PCC makes it possible to discriminate between closely similar waveforms. This is an advantage for travel time picking or the computation of objective functions.
 

Example 3:

a) The sketch outlines the generation of closely similar waveform pairs by the addition of two randomly generated signals. The pilot wavelet is generated by shifting the low frequency signal by 0.2 s prior to addition. b) 10 examples of randomly generated waveform pairs. The vertical bars mark the 1.8 s window of the pilot (solid line). The numbers at the upper left and right are the lag times of the best CCGN and PCC waveform fits. c) and d) The histograms show the lag time distributions of the best PCC and CCGN waveform fits of 5000 randomly generated waveform pairs. The dark grey area shows the distributions for the 2500 best correlations.

An example of a deterministic waveform perturbation is shown in the Figure. Figure a pictures the procedure used to generate the closely similar waveform pairs. Two pulse functions are generated, each with two random amplitude spikes at random times within a 1 s window. The first and second pulse functions have been band-passed at 1 - 3 Hz and 0.5 - 1.5 Hz, respectively. The high frequency band-pass produces signals which are about twice as large as those of the low frequency band-pass. Therefore the second trace was multiplied by 1.5 to counteract large amplitude differences between the waveforms. Finally, the resulting traces were added twice: once after shifting the low frequency trace by 0.2 s, to obtain the pilot, and once without any shift, to obtain its closely similar waveform. This procedure mimics the generation of composite signals due to multipathing. In other words, the obtained signals consist of two or more distinct signals which arrive with different slowness and amplitude values at about the same time at slightly different station locations. The length of the pilot (solid line) is chosen to be 1.8 s and is marked by the vertical bars in Figures a and b. Figure b shows 10 examples of waveform pairs generated by the described procedure. The numbers at the upper left and right mark the lag times which correspond to the best waveform fit using CCGN and PCC, respectively. 5000 waveform pairs have been generated and correlated. The lag time distributions of the best waveform fits are presented in Figures c (PCC) and d (CCGN). The histograms in the dark grey tones show the distribution of the 2500 best correlations.
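The generation procedure can be sketched as follows. Several details are assumptions made for this illustration because the text does not specify them: the sampling interval, the trace length, the position of the 1 s spike window, the range of the random amplitudes, the sign convention of the 0.2 s shift (here a delay), and the filter itself, for which a simple frequency-domain brick-wall band-pass is used as a stand-in. All function names are hypothetical.

```python
import numpy as np

def fft_bandpass(x, dt, fmin, fmax):
    """Zero-phase brick-wall band-pass in the frequency domain (stand-in filter)."""
    spec = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=dt)
    spec[(freqs < fmin) | (freqs > fmax)] = 0.0
    return np.fft.irfft(spec, n=len(x))

def spike_trace(rng, npts, dt, t0, window=1.0, nspikes=2):
    """Zero trace with `nspikes` random-amplitude spikes inside [t0, t0 + window)."""
    x = np.zeros(npts)
    i0 = int(round(t0 / dt))
    nwin = int(round(window / dt))
    idx = rng.integers(i0, i0 + nwin, size=nspikes)
    x[idx] = rng.uniform(-1.0, 1.0, size=nspikes)
    return x

def make_pair(rng, npts=1000, dt=0.01, t0=4.0, shift=0.2):
    """Return (pilot, partner, a, b) for one closely similar waveform pair."""
    a = fft_bandpass(spike_trace(rng, npts, dt, t0), dt, 1.0, 3.0)        # high frequency signal
    b = 1.5 * fft_bandpass(spike_trace(rng, npts, dt, t0), dt, 0.5, 1.5)  # low frequency signal, scaled
    k = int(round(shift / dt))
    # delay the low frequency signal by k samples, padding with zeros at the start
    b_shifted = np.concatenate([np.zeros(k), b[:-k]]) if k > 0 else b.copy()
    partner = a + b            # sum without any shift
    pilot = a + b_shifted      # sum after shifting the low frequency signal
    return pilot, partner, a, b

# Usage: one pair with a fixed seed; a 1.8 s pilot window would then be cut from `pilot`.
rng = np.random.default_rng(1)
pilot, partner, a, b = make_pair(rng)
```

Repeating make_pair many times and recording the lag of the best fit for each coherence measure would give lag time histograms of the kind described in the text.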

It can be seen that the distributions are significantly different. The PCC lag time distribution shows a balanced waveform alignment between both signals, i.e. at -0.2 s and 0 s lag time. Conversely, the CCGN lag time distribution contains a clear maximum at 0 s. This means that the waveform alignment is governed by the high frequency signal (a(t) in Figure a), which has not been shifted in time prior to summation. On average this signal is larger than the low frequency signal and is consequently favored by the CCGN measure. A considerable number of waveform pairs have been aligned with positive lag times (maximum at about 0.25 s). In these cases the coherence measures advocate a negative waveform correlation. If PCC is used, these cycle skips decrease rapidly with increasing pilot length. For instance, increasing the length by 0.5 s decreases the number of cycle skips to about one fifth for PCC, while the number of cycle skips of the CCGN measure remains almost unchanged. The CCGN lag time distribution shows only minor changes while the PCC lag time distribution contains an increased number of correctly aligned waveforms. The example from the Figure shows the ability of PCC to detect coherent weak amplitude features which are concealed by larger amplitude signals.
 

The following figure only serves to show the different sensitivity of both measures:

a) Seismic test trace and three pilot wavelets used to calculate the PCC's and the CCGN's which are shown in figures b-d. The vertical lines mark the begin and end times of the employed pilots. We use the labels b,c, and d to refer to these wavelets. b) PCC (solid line) and CCGN (dotted line) between wavelet b and test trace. c) Same as b) but wavelet c is used. d) Same as b) but wavelet d is used.

Figure a shows a random time-series and three arbitrarily selected pilots. We label these wavelets b, c, and d. Their begin and end times are marked by the vertical lines. Wavelets b and d look complicated and are distinct from the rest of the trace. Conversely, wavelet c is not recognizably different from other portions of the time-series. Figures b, c, and d show the PCCs (solid lines) and CCGNs (dashed lines) between the wavelets b, c, d and the trace from Figure a. The black dots mark the absolute maxima of the cross-correlograms, which are located at the beginning of the pilot wavelets. They mark what we call signal in this example. As can be seen in the Figure, the signal-to-noise ratio (S/N) of the different cross-correlograms depends on the waveform complexity of the pilot waveform. Note that the S/N ratio is large whenever a distinct waveform (Figures b, d) is used for the correlation. Conversely, the S/N ratio is small for the cross-correlation with wavelet c (Figure c). This is expected and is caused by the large resemblance of wavelet c to the waveforms in the rest of the trace. Note also that the qualitative comparison of the cross-correlograms (PCC and CCGN) shows the ability of PCC to further increase the S/N ratio. The ratios of the root-mean-square (rms) amplitudes CCGN-to-PCC are 1.28, 1.25, and 1.58 for the traces from Figures b, c, and d. This can be interpreted as a stronger sensitivity to waveform similarity in the PCC measure.
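The rms amplitude ratio quoted above can be computed as in the sketch below. It is not stated in the text whether the rms is taken over the full correlogram or over the part outside the main peak, so the sketch simply uses whatever samples are passed in; the function names and the sample values are hypothetical.

```python
import numpy as np

def rms(x):
    """Root-mean-square amplitude of a correlogram (or any real series)."""
    x = np.asarray(x, dtype=float)
    return float(np.sqrt(np.mean(x ** 2)))

def rms_ratio(ccgn, pcc):
    """CCGN-to-PCC ratio of rms amplitudes; values above 1 mean the PCC is quieter."""
    return rms(ccgn) / rms(pcc)

# Usage with made-up correlogram samples (not data from the figures).
ccgn_samples = np.array([0.2, -0.2, 0.2, -0.2])
pcc_samples = np.array([0.1, -0.1, 0.1, -0.1])
ratio = rms_ratio(ccgn_samples, pcc_samples)   # the CCGN samples are twice as large
```

Since both correlograms peak at 1 for a perfect match, a ratio above 1 means that the PCC has the lower background level relative to its peak.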

