Brainprints: identifying individuals from magnetoencephalograms

In the Results section, we first confirm the existence of the fingerprint using data from a single session, multiple sessions, multiple tasks, and even multiple recording modalities. We use machine learning tools as well as interpretable features to show that identification is easy when the MEG sessions were collected on a single visit. We then show that the proposed features also achieve high accuracy on datasets of multiple visits to the scanner, and that some features are even consistent between different tasks and between different recording modalities. We finally show which components of each feature are important for individual identification, and that sample size and level of preprocessing also affect identification accuracy.

Within-session identification is surprisingly easy

To measure identifiability, we consider the test accuracy of a classifier trained to identify participants from their MEG recording. We first focus on within-session identifiability. In this context, we assume that each participant undergoes one session. A classifier is trained on a subset of the session, in which each trial is labeled with the identity of the participant it corresponds to. In our framework, we refer to the training set as the source set. Then, on held-out test data, the classifier predicts which participant is associated with each test trial. We refer to the test set as the target set. As an example, we investigated individual identifiability on an MEG dataset of eight participants during a reading task. Participants were asked to read a chapter of Harry Potter23 while each word was presented for 0.5 s on a screen. There were 5176 trials (words) for each individual. The data were recorded using the Elekta Neuromag system (see Supplementary Fig. 1 for the sensor layout). The Harry Potter (HP) data is a single-session dataset: the data for each individual were collected on a single visit to the MEG scanner. Hence the source and target sets are non-overlapping subsets of that single session. We preprocessed and downsampled the data from 1000 Hz to 200 Hz so that there are 100 time points for each word. We trained a random forest classifier24 using the MEG recording of all channels at a randomly selected time point, a flattened vector representing the snapshot of the topographic map (topomap) of the brain activity (see Methods and Supplementary Fig. 2). Random forest is a powerful classifier that uses a majority vote of a number of decision trees to predict the label associated with a given feature. Under this setting, we are asking whether there is any individual-specific information contained in the topomap, the basic element of an MEG recording.
We split the dataset into 10 non-overlapping folds and used one as the target (testing) set and the other nine as the source (training) set. This 10-fold cross-validation scheme yielded a high identification accuracy (0.94), whereas the chance accuracy is only 0.125. We also repeated the analysis by sampling only one topomap from each trial to reduce possible statistical dependence and still obtained an accuracy of 0.923. This surprisingly high accuracy on merely 0.05 s of MEG data suggests the existence of strong patterns detected by the random forest classifier. These patterns may be contained in the transient spatial distribution of an individual’s MEG activity and are strongly distinctive of that individual. This high accuracy with the limited amount of information used suggests that within-session identification is a strikingly easy task.

Interpretable MEG features yield high identification accuracy

The random forest classifier may not provide enough information to explain the high identifiability of the HP data because of the black-box nature of the algorithm. The topomap mainly contains spatial information: how heterogeneous the amplitude of the signal is across channels at a certain time point. High identifiability may also be attained using temporal and frequency information. We proposed three interpretable features for individual identification to further dissect the individual-specific information. These features are interpretable because they bear biological meanings and hence can be used to interpret the high identification accuracy. The three features were based on n randomly selected trials (words), which have the shape [102 channels, 100 time points, n trials] (Fig. 3a). sp (Fig. 3b, Supplementary Fig. 3) is the spatial correlation between different sensors, which may be related to individual-specific correlated activities between areas of the brain or to the anatomy of the individual (e.g., brain size)8,25. tp (Fig. 3c, Supplementary Fig. 4) is the temporal correlation between the time points within a trial. A high value in the tp matrix indicates highly synchronous brain signals between two time points, which might be related to participant-specific stimulus processing latencies. A relevant study shows that the temporal changes of brain activity in auditory steady-state responses differ between individuals26. fq (Fig. 3d, Supplementary Fig. 5) represents the distribution of signal power over frequency. Individual differences might also manifest as differences in the power distribution across frequency bands22,27.

Fig. 3: High within-session identification accuracy on HP data with three interpretable features.

a Shape of the HP data before featurization. The HP data consists of participants reading a book chapter one word at a time for 0.5 s each. The data are resampled to have the dimension [102 channels, 100 time points, n trials] where each trial corresponds to one word and n to the number of words. b The spatial correlation feature sp is a 102 × 102 Pearson’s correlation coefficient matrix computed across the time points and trials. c The temporal correlation feature tp is a 100 × 100 Pearson’s correlation matrix computed across the channels and trials. d The frequency feature fq is a vector in \(\mathbb{R}^{51}\) where 51 is the number of frequency bands. The power at each band was averaged across channels and trials. e Identification accuracy with the three features. The accuracy was averaged across 100 identification runs of 8 individuals. The red dashed line represents the chance level (=0.125). The error bars are the standard errors across individuals and identification runs and are invisible since they are all zeros.
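The three features described in panels b to d can be sketched in a few lines of NumPy. This is a minimal illustration and not the authors' code: the function name `compute_features` is hypothetical, and the power spectrum is assumed to be a plain FFT periodogram (the exact spectral estimator is described in the Methods, not here). A real FFT over 100 time points does yield 51 frequency bins, which matches the stated dimension of fq.

```python
import numpy as np

def compute_features(x):
    """Sketch of the sp, tp and fq features.

    x: array of shape (channels, time_points, trials).
    Returns (sp, tp, fq). Hypothetical helper, for illustration only.
    """
    n_ch, n_t, n_tr = x.shape

    # sp: Pearson correlation between channels; each channel's samples
    # are pooled across time points and trials.
    sp = np.corrcoef(x.reshape(n_ch, n_t * n_tr))

    # tp: Pearson correlation between time points; each time point's
    # samples are pooled across channels and trials.
    tp = np.corrcoef(x.transpose(1, 0, 2).reshape(n_t, n_ch * n_tr))

    # fq: power per frequency bin (assumed plain periodogram via a real
    # FFT along time), averaged across channels and trials.
    power = np.abs(np.fft.rfft(x, axis=1)) ** 2
    fq = power.mean(axis=(0, 2))
    return sp, tp, fq
```

For an input of shape (102, 100, n), the outputs have shapes (102, 102), (100, 100) and (51,).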

We used the 1-Nearest Neighbor (1NN) identification procedure, similar to Finn et al.15, to test whether the three features are brainprints for the within-session identification task. For a given feature such as sp, the feature was computed on the source set using n randomly sampled trials (n = 300 for the HP data). Target set features were computed in the same way (but unlabeled) with the same number of trials. The 1NN classifier simply assigned each target feature to the participant with the closest source feature (we used correlation as the similarity measure). This matching process was repeated for 100 runs to account for the variance of the feature over the sampled trials, and the accuracy was averaged across these 100 runs. The simplicity of the 1NN classifier keeps the result easy to interpret.
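A single run of this matching step might look like the sketch below. It assumes each participant's feature has already been flattened into one row, and that row k of the source and target arrays belongs to the same participant; the name `identify_1nn` is hypothetical. The paper repeats this over 100 runs with freshly sampled trials, which is omitted here.

```python
import numpy as np

def identify_1nn(source_feats, target_feats):
    """One 1NN identification run with correlation as similarity.

    source_feats, target_feats: arrays of shape (K, d); row k is the
    flattened feature of participant k (ordering is an assumption).
    Returns (predicted participant index per target row, accuracy).
    """
    K = source_feats.shape[0]
    # np.corrcoef stacks the rows of both inputs; the upper-right block
    # holds corr(target_i, source_j).
    corr = np.corrcoef(target_feats, source_feats)[:K, K:]
    pred = corr.argmax(axis=1)  # most similar source feature
    acc = float(np.mean(pred == np.arange(K)))
    return pred, acc
```

Because the classifier has no trainable parameters, any above-chance accuracy can be attributed directly to the feature being matched.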

With n = 300 trials, all three features achieve perfect identification accuracy (Fig. 3e, the accuracy for sp, tp, and fq is 1 ± 0, mean ± SE, p < 0.0002, see Supplementary Information B for how we computed the p-values). In fact, high identifiability can be attained with as few as n = 100 trials (Supplementary Fig. 6a shows the improvement in accuracy for different numbers of trials, and Supplementary Fig. 6b shows that the accuracy is high for all subjects and features). The high identifiability with sp, tp, and fq suggests they are brainprints, at least for identifying individuals within a session. Therefore, multiple features capturing different aspects of the MEG activity can be used for identifying individuals.

Cross-session identification confirms the existence of brainprints

The high within-session identification accuracy suggests sp, tp, and fq are individual-specific within a session. Artifacts such as environmental noise and equipment configurations, however, might be the main contributing factor to within-session identification accuracy. Hence, we examined the consistency of the three features when the same type of task data was collected from each individual over multiple sessions. This setting tests whether the features are preserved over time, i.e., whether they are indeed brainprints and not mere artifacts. If the identifiability is significantly lower on multi-session datasets, the high identifiability on the HP data may be a mere result of session-specific artifacts, since different individuals were recorded on different days. If high cross-session identifiability is observed, sp, tp, and fq can be considered genuine brainprints because they are unique to each individual and invariant between sessions. This would also suggest low cross-session and high within-session variability (Fig. 1).

We tested the three features on two multi-session datasets: FST28,29, a four-session dataset where four individuals were shown pictures of familiar and unfamiliar faces with 1464 trials, and SEN, a three-session dataset where four individuals were shown sentences with 3575 trials. Both datasets were recorded with the Elekta Neuromag system and were preprocessed and downsampled from 1000 Hz to 200 Hz so that there were 100 time points in one picture/sentence, which we considered as one trial (see “Methods”). Since each individual has recordings conducted on different days, we set the target and source data to be from different sessions (Fig. 4a), to test the role of environmental artifacts and further confirm the existence of the brainprints. In addition to identification accuracy, which is binary for one matching procedure, we used a continuous version, the rank accuracy, which captures more information in a failure case where an individual is misidentified. Rank accuracy represents the rank of the correct assignment out of all possible assignments among K individuals: it is 1 if the target feature of an individual has the largest similarity to the source feature of that individual, and is \(\frac{1}{K}\) if the similarity is the smallest. The chance rank accuracy is \(\frac{K+1}{2K}\). In addition to the identification and rank accuracy, we also used a third metric, differential identifiability30, which measures the similarity between the features of the same individual as compared to that of other individuals (see “Methods” and Supplementary Fig. 7).
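The rank accuracy defined above can be computed from a K × K similarity matrix as in the following sketch. The function name `rank_accuracy` and the convention that row i and column i refer to the same individual are assumptions made for illustration; ties are resolved in favor of the correct match, which is also an assumption.

```python
import numpy as np

def rank_accuracy(similarity):
    """Per-individual rank accuracy from a (K, K) similarity matrix.

    similarity[i, j]: similarity between the target feature of
    individual i and the source feature of individual j.
    Returns values in [1/K, 1]: 1 when the correct source is the most
    similar, 1/K when it is the least similar.
    """
    K = similarity.shape[0]
    correct = np.diag(similarity)[:, None]
    # Count sources that are strictly more similar than the correct one.
    n_better = (similarity > correct).sum(axis=1)
    return (K - n_better) / K
```

Under random matching the correct source is equally likely to land at any of the K ranks, so the expected value is the mean of 1/K, 2/K, ..., 1, which equals (K+1)/(2K); for K = 4 this gives 0.625.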

Fig. 4: Cross-session identification on FST and SEN data confirms existence of brainprints.

a Schema of the cross-session identification task. For one identification run, the features of each individual are computed using randomly sampled trials (N = 300) from both the source and target session. Target session features are then classified by selecting the individual with the largest similarity score in the source session. b Heat maps of the cross-session identification accuracy using the three features on FST data. Each grid represents the average accuracy across 4 individuals and 100 identification runs. The within-session accuracies (diagonal entries) are computed using the same source-target splitting procedure as on the Harry Potter data to avoid data leakage. c Average cross-session identification accuracy and rank accuracy for each feature on FST data. Within-session accuracies (diagonal entries in b) were excluded from the computation. Error bars are the SEs across cross-sessions (N = 12), individuals (N = 4), and identification runs (N = 100) and are invisible due to small values. Red dashed lines are the chance level for the identification accuracy (=0.25) and rank accuracy (=0.625). d Identification and rank accuracy on FST data by individual. Within-session accuracies were excluded from the computation. Error bars are the SEs across cross-sessions (N = 12) and identification runs (N = 100) and are invisible due to small values. The red dashed lines are the same as in (c). e–g Same as (b–d) but on SEN data with the same number of individuals and identification runs (N = 4 and N = 100) but a different number of cross-sessions (N = 6). The high identification accuracy with the three features on multi-session datasets confirms these features can be brainprints for individual identification.

Both tp and fq achieve almost perfect average identification and rank accuracy on both FST and SEN data, whereas sp achieves lower but still well above-chance accuracy (Fig. 4c, f). The high cross-session identification accuracy of sp, tp, and fq confirms that it is reasonable to call them brainprints for individual identification in MEG. The lower identification accuracy for sp is due to low accuracy on two of the individuals (Fig. 4d, g) in both datasets. This is also confirmed by the confusion matrices (Supplementary Fig. 8). However, the identification accuracy of these individuals is not consistently low across all session pairs (Fig. 4b, e), indicating that sp only performs worse for these subjects between certain sessions.

For SEN data, the MEG recordings of two subjects were taken on the same day for sessions 1 and 2. Since the identification accuracy of sp for these two session pairs (1 vs 2 and 2 vs 1) is not higher than the average (the mean identification accuracy between these two session pairs is 0.655, lower than 0.72, the mean across all cross-session pairs), the accuracy for sp is not inflated by the same-day recordings. In line with the results on the HP dataset, sp, tp, and fq are brainprints that are consistent even between recording sessions, with tp and fq leading to higher identifiability.

Spatial brainprints are consistent across resting-state and tasks

The high performance and interpretability of the brainprints make it enticing to study the factors and the underlying mechanism for identification. We looked at the performance of these features between two sessions of different types collected on the same day to test their consistency between different brain states. We compared the features using the Human Connectome Project (HCP) MEG data5 between a resting-state session (422 trials on average), in which individuals (N = 77) rest and do not perform a task, and a task-MEG session (372 trials on average), where the same individuals view images and perform a working-memory (WM) task. The dataset was recorded using the MAGNES 3600 system. We preprocessed and downsampled the data from 1024 Hz to 200 Hz, and there were 500 time points in one trial in the WM data, which correspond to 2.5 s after the onset of the stimulus (see Methods). For the resting dataset, we simply reshaped the recording into consecutive blocks similar to the WM dataset and performed the same analysis.
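Reshaping a continuous resting-state recording into trial-like blocks could be done as in this sketch. It assumes non-overlapping blocks of 500 samples (2.5 s at 200 Hz, matching the WM trials) and that any incomplete trailing block is discarded; neither detail is stated above, and `epoch_continuous` is a hypothetical name.

```python
import numpy as np

def epoch_continuous(recording, n_times=500):
    """Cut a continuous recording into consecutive blocks.

    recording: array of shape (channels, samples).
    Returns an array of shape (channels, n_times, n_blocks), the same
    layout as the task data. Trailing samples that do not fill a whole
    block are dropped (assumption).
    """
    n_ch, n_samples = recording.shape
    n_blocks = n_samples // n_times
    trimmed = recording[:, : n_blocks * n_times]
    # Split the sample axis into (block, time) and move time forward.
    return trimmed.reshape(n_ch, n_blocks, n_times).transpose(0, 2, 1)
```

The resulting blocks can then be passed through the same feature computation as the task trials.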

Consistent with the cross-session results in Fig. 4, sp yields a high identification accuracy (Fig. 5a, b, 0.77 ± 0.0034, mean ± SE, p < 0.0002), well above the 0.013 random baseline. This suggests that the spatial fingerprint is consistent between different brain states, which confirms a similar finding in fMRI15. The by-individual identification accuracy (Fig. 5c, Supplementary Fig. 9) shows that there is a small subset of individuals whose accuracy is below random, which may be due to the lack of head position correction in the HCP collection protocol. tp and fq do not perform as well as sp, suggesting that the temporal rhythm and frequency involved might be different between resting state and task31,32.

Fig. 5: Consistent sp for cross-task identification on Human Connectome Project data.

a Heat maps of the cross-task identification accuracy using the three features on HCP data. Both resting and working-memory (WM) data were recorded on the same day. For one identification run, the features of each individual were computed using randomly sampled trials (N = 200) from both the source and target session. Each grid represents the average accuracy across 77 individuals and 100 identification runs. The within-task accuracies (diagonal entries) were computed using the same source-target splitting procedure as on the Harry Potter data to avoid data leakage. b Average cross-task identification accuracy and rank accuracy for each feature on HCP data. Within-task accuracies (resting vs. resting, WM vs. WM) were excluded from the computation. Error bars are the SEs across cross-task sessions (N = 2), individuals (N = 77), and identification runs (N = 100) and are invisible due to small values. The red dashed lines are the chance level for the identification accuracy (\(=\frac{1}{77}\)) and rank accuracy (\(=\frac{39}{77}\)). c Identification (upper three rows) and rank (lower three rows) accuracy on HCP data by individual. Within-task accuracies were excluded from the computation. Error bars are the SEs across cross-task sessions (N = 2) and identification runs (N = 100) and are invisible due to small values. The red dashed lines are the same as in (b). These results indicate that sp is consistent even when performing different tasks (resting vs WM) in the source and target session.

The rank accuracies of tp and fq (Fig. 5b, 0.82 ± 0.0017 and 0.85 ± 0.0016, mean ± SE, p < 0.0002 for all) are much higher than the baseline (=0.506). The majority of the individuals also have higher rank accuracy than baseline for tp and fq (Fig. 5c). The higher rank accuracy suggests that tp and fq may still contain individual-specific information that is not strong enough to achieve a high identification accuracy. Since the individuals perform different tasks in the source and target sessions, the rank accuracy points to a potentially consistent brainprint that generalizes beyond the task. It is worth noting that for the HCP dataset, the recording sessions of each individual took place on the same day. Hence one should exercise caution when extending these conclusions to cross-session datasets.

Temporal and frequency brainprints are consistent across modalities

So far, we have verified that the brainprints are consistent across visits, and even between resting and tasks. It would be a stronger piece of evidence if we show that brainprints can identify individuals during two visits to different centers with different recording modalities. We looked at MEG and EEG (electroencephalography) data of 15 participants viewing scene images (362 trials for each individual, one trial lasted 1 s)33,34,35. Both MEG and EEG were recorded for the exact same stimuli, but on different days for each participant, making it an ideal testbed to verify the consistency of brainprints across different imaging modalities. We downsampled the MEG and EEG data from 1000 Hz and 512 Hz to 110 Hz so that there were 110 time points per trial. Since the spatial arrangements of MEG and EEG are different, we only tested the accuracy using tp and fq.

Both features yield well above-chance identification accuracy and rank accuracy (Fig. 6, identification accuracy for tp and fq is 0.43 and 0.43 whereas the chance accuracy is \(\frac{1}{15}=0.067\); the same conclusion holds for the rank accuracy). This constitutes strong evidence that the frequency and temporal information of an individual’s response to stimuli are preserved even when different imaging modalities are used. The consistency also indicates that, at least for the temporal and frequency features, the high accuracy is due to individual-specific responses despite the possibility of different artifacts induced by MEG and EEG machines. We also checked that the identification accuracy does not depend on the number of days between the two visits of an individual (Supplementary Fig. 10).

Fig. 6: Consistent tp and fq for cross-modality identification on MEG-EEG data.

a Heat maps of the cross-modality identification accuracy using the two features on MEG-EEG data. MEG and EEG data for the same individual were recorded on different days. For one identification run, the features of each individual were computed using randomly sampled trials (N = 200) from both the source and target session. Each grid represents the average accuracy across 15 individuals and 100 identification runs. The within-modality accuracy (diagonal entries) was computed using the same source-target splitting procedure as on the Harry Potter data to avoid data leakage. b Average cross-modality identification accuracy and rank accuracy for each feature. Within-modality accuracies (MEG vs. MEG, EEG vs. EEG) were excluded from the computation. Error bars are the SEs across cross-modality sessions (N = 2), individuals (N = 15), and identification runs (N = 100) and are invisible due to small values. The red dashed lines are the chance level for the identification accuracy (\(=\frac{1}{15}\)) and rank accuracy (\(=\frac{16}{30}\)). c Identification (upper two rows) and rank (lower two rows) accuracy on MEG-EEG data by individual. Within-modality accuracies were excluded from the computation. Error bars are the SEs across cross-modality sessions (N = 2) and identification runs (N = 100) and are invisible due to small values. The red dashed lines are the same as in (b). These results indicate that tp and fq are consistent even when different neuroimaging modalities were used in the source and target session.

Not every part of a brainprint is equally important

What contributes to the high identifiability of the three brainprints? Understanding the relative contribution of the components of brainprints could help us understand individual identifiability and variability. We divided the three brainprints into sub-features and looked at their identification accuracy to see which components contain the most individual-specific information. sp was divided into correlations between groups of sensors: Left Occipital (LO), Right Occipital (RO), Left Parietal (LP), Right Parietal (RP), Left Temporal (LT), Right Temporal (RT), Left Frontal (LF), Right Frontal (RF). tp was divided into correlations between time intervals. fq was divided into frequencies within a sliding window. We use the SEN and FST datasets to focus on cross-session patterns.
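Extracting such sub-features amounts to slicing the full feature arrays, as in the sketch below. All names are hypothetical, the sensor-group index lists would have to come from the actual sensor layout, and the window widths (10 samples, i.e. 0.05 s at 200 Hz, for tp; 3 bins for fq) are illustrative assumptions rather than the values used in the study.

```python
import numpy as np

def sp_subfeature(sp, group_a, group_b):
    """Entries of sp that correlate sensors in group_a with group_b.

    group_a, group_b: lists of sensor indices (hypothetical groupings).
    """
    return sp[np.ix_(group_a, group_b)].ravel()

def tp_subfeature(tp, i, j, width=10):
    """Block of tp correlating the i-th and j-th time intervals,
    each `width` samples long (assumed interval length)."""
    rows = slice(i * width, (i + 1) * width)
    cols = slice(j * width, (j + 1) * width)
    return tp[rows, cols].ravel()

def fq_windows(fq, width=3):
    """All contiguous windows of `width` frequency bins, one per row
    (window width is an assumption)."""
    n = len(fq) - width + 1
    return np.stack([fq[k:k + width] for k in range(n)])
```

Each sub-feature can then be fed to the same 1NN matching procedure as the full feature to obtain its own identification accuracy.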

For both SEN and FST, the correlations between sensors within Left Occipital (LO) and between LO and Right Parietal (RP) yield high accuracy (Fig. 7a, inset, and Supplementary Fig. 11a). LO is involved in visual processing36 and RP is involved in sensory integration37, both of which are functions recruited by the experimental task. Due to the nature of the sampled signal and the physical properties of the skull, each MEG sensor samples coarsely from the brain, making it hard to say whether MEG spatial correlation effectively corresponds to functional connectivity, especially for nearby sensors8. However, the fact that correlations between faraway groups of sensors, for example, LT and RT, still have good accuracy suggests that it may be due to actual functional correlation between these areas. It could nevertheless still be the case that the difference in skull shapes contributes to the high sp accuracy.

Fig. 7: Identification accuracy of components of the features.

See Supplementary Fig. 11 for (a, b) on FST data. a Identification accuracy of the sub-features of sp on SEN data. Each grid represents the identification accuracy using the corresponding entries of sp averaged across cross-sessions (N = 6), individuals (N = 4), and identification runs (N = 100). Inset is the plot of the sensor group layout, and edges correspond to the sensor group pairs with over 0.7 accuracy for both FST and SEN. The topomap was plotted using the Python MNE package49. b Identification accuracy of the sub-features of tp on SEN data. Each grid represents the identification accuracy using the corresponding entries of tp averaged across the same dimensions as in (a). Inset is an example MEG signal of one individual averaged across channels (N = 102) and trials (N = 1000). Arrows correspond to the entries of the heatmap with over 0.9 accuracy for both FST and SEN. c Identification accuracy of the sub-features of fq on SEN (upper plot) and FST (lower plot) data. Each dot represents the identification accuracy using the corresponding entries of fq averaged across cross-sessions (N = 6 for SEN and 12 for FST), individuals (N = 4), and identification runs (N = 100). Accuracy values for f larger than 60 Hz were truncated since the curve became flat. Error bars are SEs across cross-sessions, individuals, and identification runs and are invisible due to small values. The curve peaks at f = 6 Hz for SEN and f = 8 Hz for FST. The accuracy of some components of a feature is consistently higher than the rest on both datasets, indicating that some parts of a certain feature may be more important in identifying individuals.

For both SEN and FST, the super-diagonal of the heatmap for temporal sub-features (Fig. 7b and Supplementary Fig. 11b) has high accuracy. The super-diagonal entries correspond to the cross-correlation of the MEG signal between two consecutive segments of 0.05 s. Hence the rhythm of the signal within a short segment of time contributes to identifiability, which can also be seen from the banded structure of tp (Fig. 3c). Moreover, the correlations between the fourth and fifth 0.05 s segments yield considerably high accuracy on both datasets (Fig. 7b, inset). These time periods overlap with the time at which we expect the brain to be processing word and picture stimuli38.

The power of frequencies between 4 and 13 Hz yields the highest accuracy on both SEN and FST data (Fig. 7c); the peak is at 6 Hz for SEN and 8 Hz for FST. These peaks roughly correspond to the Theta and Alpha frequency bands, which are related to the resting state, memory, and mental coordination39. The accuracy is also moderately high on part of the Beta band (14–31 Hz), where attention and concentration are recruited39. We also grouped the frequencies into canonical frequency bands and discovered a similar pattern (Supplementary Fig. 12).

Identifiability changes with data size and preprocessing

The last dimensions that we investigate are the dependence of individual identification on the amount of available data and on the level of data preprocessing.

We looked at the identification accuracy using the three brainprints while increasing the sample size n. The identification accuracy increases with the amount of data used for computing sp, tp, and fq (Fig. 8a) as the sampling variance becomes smaller. In general, with 50 s of data, the brainprints perform well on cross-session identification of the same task. sp becomes reasonably accurate on the HCP dataset with 100 trials, corresponding to 250 s of recording, possibly because more trials are required to accurately compute features that are distinguishable within a larger pool of individuals. For FST and SEN, the identification accuracy of sp saturates at a smaller number of trials than that of tp and fq. It is possible that sp requires fewer trials to be estimated robustly.

Fig. 8: Factors affecting identification accuracy.

a Identification accuracy with respect to the number of trials (sample size) used for the featurization of FST, SEN, and HCP data. Each dot represents the identification accuracy averaged across individuals, identification runs, and cross-sessions (or cross-task sessions), excluding the within-session or within-task results. Error bars are the SEs across the corresponding cross-sessions (or cross-task sessions), individuals, and identification runs of each dataset and are invisible due to small values. b, c Identification (b) and rank (c) accuracy of the three features computed on raw and fully preprocessed FST and SEN data. The same color represents the same feature as in (a). For (b), the identification accuracies across sessions (N = 12 for FST, N = 6 for SEN) and individuals (N = 4) were averaged over identification runs (N = 100) and were put into one vector (of N = 48 entries for FST and 24 entries for SEN) for each feature and level of preprocessing. The heights of the bar plots are the mean of the corresponding vector. A two-sided unpaired t-test was performed on the vectors of the same feature and dataset between the raw and preprocessed data. The p-values for all pairs are less than 0.05, except for the sp feature for SEN. For (c), the rank accuracies were put into one vector in the same way as in (b). The heights of the bar plots are the mean of the corresponding vector and the error bars are its SE. A two-sided unpaired t-test was performed on the vectors of the same feature and dataset between the raw and preprocessed data. The p-values for all pairs are less than 0.05, except for the sp feature for SEN.

Preprocessing may also affect identification accuracy. We compared the identification and rank accuracy between the raw and preprocessed data (Fig. 8b, c). For all three features, the changes in accuracy after preprocessing the raw data were statistically significant (Fig. 8b, c), except for the sp feature on SEN. For both FST and SEN, preprocessing yields better accuracy for tp and fq. However, for sp, the results point in opposite directions: preprocessing increases identifiability for FST and decreases it for SEN (though the latter change is not statistically significant). There is one difference in the preprocessing pipeline between the two datasets: FST preprocessing does not include head position correction due to a lack of head position recordings, whereas SEN preprocessing does. Head position correction might be changing the signal in inhomogeneous ways, thereby undermining the identifiability with sp. We also found that head size and average movement have a weak correlation with identification accuracy in the HCP data (not statistically significant after multiple comparison correction), shown in Supplementary Fig. 13.
