Studies have shown that the formation of emotion, as a process of self-awareness and cognitive appraisal, is complicated and can lead to idiosyncratic differences. A subject's self-evaluation of emotion can be biased by environmental factors, personal experience, and individual cognitive ability, and the true affective state may go unnoticed due to unconscious mental processes. In this work, we present a comprehensive study investigating the emotion recognition accuracy obtained from physiology with respect to three different annotation schemes, i.e., intended, self-reported, and observed emotion labels. We find that when performing recognition across these three labeling schemes using the same physiological parameters, the accuracy on self-reported emotion labels drops by about 10.3% and 3.1% compared to the other two annotation schemes. This indicates that self-assessed emotion labels may be noisier and induce a larger mismatch with respect to the affect-stimulated physiological responses. Further analysis shows that the electrodermal activity (EDA) signal yields the highest recognition rate with respect to the intended emotion of the stimuli. Finally, our error analysis reveals a possible bias in the self-annotated labels that is conditioned on the valence polarity of the intended stimuli.


In this research, we conduct our experiments on the AMIGOS dataset. A total of 16 short emotional videos (duration < 250 s), carefully chosen from previous research, were used to elicit physiological responses. 40 participants aged between 21 and 40 (mean age 28.3) were recruited in a laboratory environment. Each emotional label is binarized using one of two criteria: -SB and -PB refer to binarization against the subject's mean or the dataset's mean, respectively. Three types of emotional annotation are defined:

  • Intended (-Int): The video stimulus's intended emotional stimulation level, binarized as High/Low on Arousal and Valence.
  • Self-reported (-Self): The participant's self-disclosed emotional feelings after watching each video stimulus.
  • Observed (-Obs): The participant's facial expression while watching each stimulus, recorded and then annotated by independent third-party annotators.
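The two binarization criteria above can be sketched as follows. This is a minimal illustration on hypothetical ratings; the array values and variable names are ours, not taken from the dataset:

```python
import numpy as np

# Hypothetical 1-9 affect ratings for two subjects, four trials each
ratings = np.array([3.0, 6.5, 5.0, 8.0, 2.0, 4.0, 4.5, 3.0])
subjects = np.array([0, 0, 0, 0, 1, 1, 1, 1])

# -PB: binarize every trial against the dataset-wide mean
pb = (ratings > ratings.mean()).astype(int)

# -SB: binarize each trial against its own subject's mean rating
sb = np.zeros_like(pb)
for s in np.unique(subjects):
    mask = subjects == s
    sb[mask] = (ratings[mask] > ratings[mask].mean()).astype(int)
```

Note that a subject who rates everything low overall (subject 1 here) gets some High labels under -SB that -PB would assign Low, which is exactly the individual-baseline effect discussed in the results.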


We perform standard binary classification using an SVM. The results are below:
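As a rough sketch of this setup, the pipeline below trains an RBF-kernel SVM with cross-validated accuracy on synthetic stand-ins for the physiological feature vectors and one binarized label scheme; the feature extraction and hyperparameters of the actual experiments are not specified here and are our assumptions:

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(80, 10))  # synthetic "physiological" feature vectors
# One hypothetical binarized label scheme, driven mostly by feature 0
y = (X[:, 0] + 0.3 * rng.normal(size=80) > 0).astype(int)

# Standardize features, then classify with an RBF-kernel SVM
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
acc = cross_val_score(clf, X, y, cv=5, scoring="accuracy").mean()
```

The same pipeline would simply be refit once per label scheme (-Int, -Self, -Obs) and per binarization (-SB, -PB) to produce the comparison discussed next.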


First, we find that the subject-dependent label binarization technique (-SB) generally results in better recognition accuracy. This suggests that global normalization may neglect individual baselines, creating unwanted bias when learning to recognize these emotion labels from physiological signals.

Second, the self-reported labels (-Self) are clearly the hardest to predict from physiology. In contrast, the same physiological data are more discriminative when learning to recognize the external observers' ratings (-Obs).

Feature Importance Analysis

To assess the importance of the different modalities toward each emotional label, we perform a SHAP analysis on our trained models:

  • ECG: Hardly any features are consistently important across the different labeling schemes. However, many heart rate variability (HRV) features are jointly selected for the intended labels of the original video stimuli (-Int) and the observers' judgments (-Obs).

  • EDA: For both the arousal and valence attributes, the “Peaks Amp” feature computed from the skin conductance response (SCR) is consistently selected as a key factor correlated with the intended (-Int) labels. This measure has a distinct phenotype from other autonomic nervous system signs such as heart rate, since SCR is under the strict control of the sympathetic branch of the nervous system. This discrepancy in the production mechanism may help explain why EDA-related features achieve the best predictive power on the -Int labels in contrast to the other physiological modalities.

  • EEG: EEG achieves high predictability on -Obs, but drops by almost 10% and 5% on the -Self labels of arousal and valence, respectively. We notice that many “Hjorth”-related features, which are commonly calculated in EEG-based emotion recognition tasks, are selected exclusively for the -Obs and -Int labels (not for -Self).
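The EDA “Peaks Amp” feature discussed above can be illustrated on a synthetic trace: SCR peak amplitudes are approximated here by peak prominence over the slow tonic baseline using `scipy.signal.find_peaks`. The signal, sampling rate, and prominence threshold are our illustrative assumptions, not the dataset's exact feature definition:

```python
import numpy as np
from scipy.signal import find_peaks

# Synthetic EDA trace: slow tonic drift plus two phasic SCR bursts
fs = 32                             # Hz, an assumed EDA sampling rate
t = np.arange(0, 30, 1 / fs)
tonic = 2.0 + 0.01 * t              # slowly drifting skin conductance level
scr = 0.6 * np.exp(-((t - 8) ** 2) / 0.5) + 0.9 * np.exp(-((t - 20) ** 2) / 0.5)
eda = tonic + scr

# "Peaks Amp": amplitudes of SCR peaks, approximated by prominence above the
# tonic baseline; a minimum prominence filters out small fluctuations
peaks, props = find_peaks(eda, prominence=0.1)
peaks_amp = props["prominences"]    # one amplitude per detected SCR
```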


In this work, we comprehensively inspect the predictability of emotion annotations from three different perspectives: self-assessment (-Self), external observation (-Obs), and intended stimuli (-Int).

Our experiments reveal several interesting patterns in the recognition results. ECG and EEG data consistently obtain better discriminative power for the external observations, while the EDA signal tends to work better on the original stimuli's intended labels.

To our knowledge, this is one of the first works to provide comprehensive recognition and analyses across multiple perspectives of emotion annotation using physiology. Notably, even “what we think we feel” (-Self) is not consistent with “what our body (brain) really feels” (-Obs). Further research will be conducted to understand the formation of affective feelings from a neurophysiological point of view.