(Warning: This is an extremely technical paper; if you are not interested in the psychology of music, I suggest you move on to another one of my posts. If you are not familiar with this topic but would like to expand your knowledge, I suggest you start with my post on "Math and Music".)
Abstract:
The current experiment is designed to test for a possible universal basis for human interpretations of musical signs. As far as I know, no experiment has been conducted to test for universality in the decoding of a variety of musical meanings using the N400 response. By recreating Koelsch et al.'s 2004 experiment in a culture that has never been exposed to Western music, it should be possible to determine to what extent Western musical signs are cultural constructs and to what extent they share universal information-carrying characteristics with all human music.
Background Investigation:
In 2004, Koelsch et al. performed a novel experiment in which a semantic priming paradigm showed that pairing target words with semantically related and unrelated musical excerpts modulates the elicited N400 response. Since then, various studies have expanded upon this paradigm in a number of ways, creating a small but rich body of work, which I will review in the following paragraphs.
In the majority of these studies, EEG is measured and averaged to assess the strength, time course, and, to a lesser extent, the neural generators of the N400. This electrophysiological response has been shown in previous studies to be inversely related to the semantic fit between a word and its preceding context[1]. Concurrently, behavioral data reveal a processing advantage for words preceded by semantically related context, be it visual, musical, or lexical.
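As a concrete illustration of that behavioral advantage, here is a minimal sketch of how the reaction-time priming effect could be quantified; the CSV file and column names are assumptions of mine, not part of any of the studies reviewed here.

```python
# Minimal sketch (assumed data layout): quantify the reaction-time advantage
# for targets preceded by semantically related primes. The CSV and its columns
# (subject, relatedness, rt_ms) are hypothetical placeholders.
import pandas as pd
from scipy import stats

trials = pd.read_csv("behavioral_trials.csv")  # hypothetical file

# Mean reaction time per subject and condition
rt = trials.groupby(["subject", "relatedness"])["rt_ms"].mean().unstack()

# Priming advantage: unrelated minus related (positive = related is faster)
priming_ms = rt["unrelated"] - rt["related"]

# Paired t-test across subjects
t, p = stats.ttest_rel(rt["unrelated"], rt["related"])
print(f"Mean priming advantage: {priming_ms.mean():.1f} ms (t = {t:.2f}, p = {p:.3f})")
```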
Koelsch found that musical excerpts could prime both abstract and concrete words in a variety of ways, including (i) common patterns or forms (pitch, dynamics, tempo, timbre) that resemble the features of an object, (ii) mood suggestion, (iii) extramusical association, and (iv) intramusical meaning (tension and resolution)[2]. The N400 effect did not depend on whether the subjects' task involved rating the semantic fit of what they were experiencing. From this study, Koelsch et al. conclude that music can prime representations of abstract meaningful content both independent of and in relation to emotions.
In a follow-up study, Steinbeis and Koelsch (2011) had both musicians and non-musicians evaluate musical samples and emotionally congruous and incongruous words. As in other studies, they found that incongruous words were categorized more slowly and accompanied by a larger N400, but they also showed, interestingly, that three aspects of music could each independently be responsible for this effect: timbre, consonance versus dissonance, and major versus minor mode.
They go on to speculate that we would do well to investigate whether this can also be extended to modulating melody and rhythm. Finally, they suggest overlapping networks for decoding verbal and nonverbal semantic information, which include early and late processing stages: the former is related to the N400 and involves meaningful semantic but not lexical representations, while the latter involves explicit interpretation and is indexed by the N600. Semantic representations are often conceptualized as the meaning behind a word, while lexical representations are of the word form itself.
In a pair of studies in 2009, Daltrozzo and Schon showed that the 10-second musical excerpts used by Koelsch can be reduced in length to one second and still carry relevant semantic information. This temporal reduction also allowed them to reverse the prime-target ordering, so that music could serve as target as well as prime; this change revealed an N400 of equal size and latency in both variations. By changing whether the subjects were asked to judge the relatedness of the prime and target (semantic decision) or whether the targets were actually words (lexical decision), they were able to diminish the N400, showing that endogenous or conscious attention and top-down processing can affect these subconscious responses.
In an interesting variation of this paradigm, Gordon et al. (2010) investigated the way in which concepts from melodies and words can intertwine when the stimuli are tri-syllabic words sung on three-tone melodies. They used four categories for the prime and target: same word-same melody, same word-different melody, same melody-different word, and different word-different melody. Response time and N400 were assessed, and, interestingly, the same word sung on a different melody produced a robust N400 response and slower reaction times, suggesting that the musical features are in some way affecting the processing of the word's meaning. As expected, the same melody with a different word also resulted in slower responses and a greater N400, and it was the only other category to do so.
Like the previous studies, they also addressed whether attention mediated the process in some way by running two versions of the task, in which participants were asked to evaluate the similarity of either the melodies or the words. The observed effects were only slightly offset by attending to the melody rather than the words and vice versa, suggesting mainly automatic processing of semantic information. The authors corroborated these findings with an fMRI study which, they claim, suggests shared resources through interactive phonological/semantic and melodic/harmonic processing.
They go on to make the evolutionary claim that infant preference for sung words cannot be attributed only to the addition of a musical dimension; it must be related to the combination of the two elements. They suggest that this preference reflects a proclivity for singing-based parent-infant interactions for a number of important biological reasons. In addition to fostering bonding between a parent and their newborn, singing may also aid in language learning through its use of exaggerated prosody, which can both aid in segmentation and add an emotional, and therefore memorable, element to the interaction. Finally, they suggest that music can be thought of as a mnemonic that aids memory for language.
Goerlich et al. (2011) conducted an experiment with alexithymia patients in order to assess their electrophysiological responses and behavioral response times when judging semantically related and unrelated happy/sad words, music, and prosody. Alexithymia is a condition involving difficulty identifying and verbalizing feelings and the cognitive processes related to emotion. The condition exists on a spectrum from low to high, and the experimenters hoped to compare behavioral and physiological measures to characterize whether the impairment was mainly conscious or unconscious.
At a behavioral level, they found reduced affective priming with increased alexithymia mainly when music and prosody were to be categorized first, i.e., when they were the prime rather than the target. However, they admit that the observed unaffected emotional word processing is at odds with other studies. They also found a decreased N400 with increasing levels of alexithymia, leading them to conclude that music and speech use similar acoustic features to demarcate various emotions.
Where does this meaning come from?
In a recent review paper, Koelsch (2011) tries to elucidate the ways in which semantic meaning can emerge from musical stimuli. First he identifies several sources from which musical meaning can emerge, including musical aspects of speech, such as pitch, contour, range, variation, timbre, tempo, and pause, and what he calls "nonlinguistic" aspects of speech, including gender, size, distance, location, etc. He does not elaborate much on how these different aspects of speech give rise to meaning, but it seems self-evident in many cases (pleasant versus unpleasant timbre, greater change in pitch or increased tempo for greater emotional arousal, etc.).
I don't see either of these categories as more linguistic or more musical than the other. It seems to me that we generally think of the first set as relating to music even though it could just as easily be ascribed to language, while the second set is generally not thought of as relating to music but can be used by a skilled composer to manipulate our semantic expectations. Both are important ways that meaning can emerge from music and speech alike to contribute to semantic knowledge, and they can be seen as overlapping areas on Koelsch's Music-Language Continuum.
From there he defines three categories: extramusical sign qualities, intramusical structural relationships, and the musicogenic effect. In this experiment we are concerned with musical semantics and the N400, and therefore we will discuss the extramusical sign qualities in detail; here Koelsch attempts to create an auditory analogue of Peirce's original depiction of the signifier-signified relationship.
The first category is iconic musical meaning, which in my opinion is a bit messy, so we'll save it for last. Indexical meaning arises when music hints at an emotional state by using patterns similar to those used in speech prosody to index certain emotions. Finally, symbolic musical meaning comes from learned cultural associations.
Koelsch gives numerous examples to describe iconic relationships rather than giving a strong definition. I believe this is because the category needs to be further subdivided into at least three distinct subcategories. The first of these covers the various qualia that can be evoked by different sound stimuli, including warm, round, sharp, etc. These are examples of ideasthesia, a relative of synesthesia in which different modalities are crossed; i.e., something sounds warm when warmth is normally perceived through touch. Rather than one sense being crossed with another, an idea is crossed with a sense.
Aristotle used the term "common sense" to refer to qualia that can be perceived through more than one sense organ. For example, we generally think smells can only be perceived through the nose, while shape can be perceived by either the eyes or the hands. There are separate cells to detect heat and sound vibrations, and yet here we are perceiving warmth through auditory stimuli. This may be due to a complex cross-modal mapping in the brain, which Ramachandran claims may be the basis of more abstract metaphors.
Many studies also provide evidence for sensory integration across modalities by means of cross-modal mapping in the brain. It has been observed that "The integration of information from different sensory modalities has many advantages for human observers including increase of salience, resolution of perceptual ambiguities, and unified perception of objects and surroundings" (Lalanne and Lorenceau 2004). This may lead to the existence of robust cross-modal correspondences between many different stimuli, including, for example, "food and beverage items and shapes varying on the angular-round continuum", as observed by Ngo et al. (2013). Perhaps, as Ramachandran suggests, this is the basis for more abstract metaphors like "Juliet is the sun".
I think these semantic priming effects provide an excellent means to study different cultures and age groups to determine whether the effects are due to learned associations or innate predispositions. If they turned out to be primarily the former, with little agreement across cultures, then perhaps this type of icon would need to be re-categorized as a symbolic representation. However, given examples of these ideasthetic representations that have been tested across cultures in the literature (e.g., the Bouba-Kiki effect), I think they are likely more than just cultural associations (Ramachandran 2011).
Another example Koelsch gives of iconic meaning derives from an even more abstract type of cross-modal mapping. This category of stimuli generally refers to a psychoacoustic effect and its corresponding theoretical concept; rather than converting from an idea to a modality, we are now moving from musical ideas to another semantic representation that is not sense-based. In a concrete case this could be ascending musical notes priming the word "staircase"; a more abstract case could be chords with notes spaced far apart priming "wideness".
The final example of iconic meaning, according to Koelsch, is when music directly mimics something we hear in the real world, such as the sound of a bird call or a thunderstorm. But what about a saxophone that sounds like someone laughing (an example used in Koelsch's original experiment)? This clearly combines the indexing of an emotional sound with the mimicking of that same sound. Furthermore, by using patterns in music similar to those found in speech (as Koelsch claims we do) in order to index emotions, are we not mimicking what we hear in the real world?
In order to remedy this discrepancy, I propose that examples of mimicking what we hear in the world always be thought of as producing indexical rather than iconic musical meaning. Both hint at something in the real world, but one is concrete/physical (bird song) and one is abstract/nonphysical (an emotional state). Assuming the prosody used to express different emotions is universal, it could be hypothesized that this type of musical meaning would emerge even when individuals from different cultures rate the emotional dimensions of musical pieces they have no experience with.
To reiterate: I agree that the first and second examples are iconic representations because they both bear a "likeness, a community in some quality" (such as a map to the actual world), which I see as integral to Peirce's original definition. Meanwhile, indices (both concrete and abstract) bear a resemblance to the real world that is, to quote Peirce himself, "a correspondence in fact"[3] and therefore does not require a cross-modal mapping; it sounds like a bird sounds, rather than it sounds warm. Like the other two categories, symbols also come in concrete and abstract forms: the national anthem could prime the idea of the landmass we call "America" (more concrete) or the word patriotism (more abstract).
Methods:
In hopes of reducing all other extraneous variables, I would like to recreate Koelsch et al.'s 2004 experiment as faithfully as possible, with the exception that I would be conducting it on the Mafa people of northern Cameroon rather than on German college students. The only difference between the two experiments should be the subjects' cultural learning in relation to music, so that we might attempt to determine the extent of the cultural basis of the encoding and decoding of different types of musical signs and referents.
I propose to use 50 subjects between the ages of 20 and 30. My stimuli will include the 176 significant prime-target items selected for use in the original 2004 study during the behavioral pretest. As in the original study, there will be four prime stimuli for each target word: a related sentence, an unrelated sentence, a related musical excerpt, and an unrelated musical excerpt. Each prime will be used twice in relation to targets, once with a target to which it is related and once with a target to which it is not. The primes will be "pseudo-randomly intermixed and counterbalanced", as in the original experiment.
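To make the trial structure concrete, here is a minimal sketch of how such a pseudo-randomized trial list could be assembled. It simplifies the full counterbalancing scheme (each prime here appears only with its own item's target), and the item format and field names are assumptions of mine rather than part of the original design.

```python
# Minimal sketch of assembling a pseudo-randomized trial list for the design
# described above. Item structure and field names are hypothetical; the full
# counterbalancing scheme (each prime also reused with an unrelated target,
# rotated across subjects) would extend this.
import random

PRIME_TYPES = [
    ("sentence", True), ("sentence", False),
    ("music", True), ("music", False),
]

def build_trials(items, seed=0):
    """items: list of dicts with keys 'target', 'sentence_related',
    'sentence_unrelated', 'music_related', and 'music_unrelated'."""
    rng = random.Random(seed)
    trials = []
    for item in items:
        for modality, related in PRIME_TYPES:
            prime_key = f"{modality}_{'related' if related else 'unrelated'}"
            trials.append({
                "target": item["target"],
                "prime": item[prime_key],
                "modality": modality,
                "related": related,
            })
    rng.shuffle(trials)  # pseudo-random intermixing of conditions
    return trials
```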
Prime stimuli (the sentences used as controls and the musical excerpts) will be presented via computer speakers at a volume of 60 decibels. This will be followed by the presentation of a target word on the computer screen for 2000 milliseconds. After the word disappears from the screen, subjects will indicate their subjective relatedness judgment of the prime-target pair with a button press. A second experiment will replicate the first but without the relatedness judgment task.
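One way the trial procedure could be implemented is sketched below using PsychoPy; apart from the 2000 ms target duration described above, the file names, response keys, and other timing details are assumptions rather than part of the original design.

```python
# Minimal PsychoPy sketch of a single trial: auditory prime, 2000 ms visual
# target, then a relatedness judgment by button press. Only the 2000 ms target
# duration comes from the design above; file names, keys, and other timing
# details are assumptions.
from psychopy import visual, core, event, sound

win = visual.Window(fullscr=True, color="black")
clock = core.Clock()

def run_trial(prime_wav, target_word):
    prime = sound.Sound(prime_wav)   # sentence or musical excerpt (hypothetical file)
    prime.play()
    core.wait(prime.getDuration())   # wait until the prime has finished

    target = visual.TextStim(win, text=target_word, color="white")
    target.draw()
    win.flip()                       # target onset (an EEG trigger would be sent here)
    core.wait(2.0)                   # 2000 ms presentation
    win.flip()                       # clear the screen

    clock.reset()
    keys = event.waitKeys(keyList=["f", "j"], timeStamped=clock)
    key, rt = keys[0]
    return {"judged_related": key == "f", "rt_s": rt}
```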
EEG signals will be recorded from 32 surface electrodes placed across the scalp. The time window for statistical analysis of the EEG data will be 300-500 milliseconds after target onset. It may also be useful to attempt to localize the neural sources of the N400 as Koelsch did, but describing how we would do that is unfortunately beyond my capabilities.
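For the ERP analysis itself, a minimal sketch using MNE-Python might look like the following; only the 300-500 ms window comes from the design above, while the file name, trigger codes, and centro-parietal channel selection are assumptions.

```python
# Minimal MNE-Python sketch of the N400 analysis: epoch around target onset,
# average per condition, and compare mean amplitude in the 300-500 ms window.
# File name, trigger codes, and channel selection are assumptions; only the
# 300-500 ms window comes from the design described above.
import mne

raw = mne.io.read_raw_fif("subject01_raw.fif", preload=True)  # hypothetical file
events = mne.find_events(raw)
event_id = {"related": 1, "unrelated": 2}                     # assumed trigger codes

epochs = mne.Epochs(raw, events, event_id, tmin=-0.2, tmax=0.8,
                    baseline=(None, 0), preload=True)

picks = ["Cz", "CPz", "Pz"]  # typical centro-parietal N400 sites (assumed montage)

def mean_amplitude(evoked):
    """Mean amplitude (microvolts) over 300-500 ms at the selected channels."""
    idx = mne.pick_channels(evoked.ch_names, include=picks)
    data = evoked.copy().crop(0.3, 0.5).data[idx]
    return data.mean() * 1e6

amp_related = mean_amplitude(epochs["related"].average())
amp_unrelated = mean_amplitude(epochs["unrelated"].average())
print(f"N400 effect (unrelated - related): {amp_unrelated - amp_related:.2f} uV")
```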
Hypothesis:
Evidence shows that emotional inferences made from prosodic cues remain relatively accurate across languages and cultures (Scherer, Banse, and Wallbott 2001). Fritz et al. (2009) likewise claim to have experimentally verified the universal recognition of at least three basic emotions in music. In a meta-analysis, Juslin and Laukka (2003) conclude that emotions can be accurately communicated to listeners through music because of emotion-specific patterns of acoustic cues that resemble those used in speech prosody to communicate emotions. Importantly, as language similarity decreases, so does the percentage of correct decodings by subjects, though it remains above chance levels.
From this we hypothesize at least some level of accurate decoding of emotions, as well as an observable semantic priming effect, in individuals who have had no exposure to Western music; this effect is likely to correlate with their ability to decode prosodic cues in Western languages. In addition, we can test the universality of auditorily induced cross-modal representations using iconic musical stimuli. It seems likely that these two types of musical signs will prove relatively robust in cross-cultural studies of semantic priming with regard to both behavioral and electrophysiological measures, though possibly only in a subset of variants. I would be rather surprised if any symbolic representations held up cross-culturally.
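The hypothesized link between prosodic decoding ability and the musical priming effect could be tested with a simple per-subject correlation, as in the sketch below; the arrays are purely illustrative placeholders that would be replaced by each subject's actual scores.

```python
# Minimal sketch of the hypothesized correlation between prosody-decoding
# accuracy and the musical semantic priming effect (per-subject N400 effect,
# unrelated minus related). Values are illustrative placeholders only.
import numpy as np
from scipy import stats

prosody_accuracy = np.array([0.62, 0.71, 0.55, 0.80, 0.67])  # placeholder scores
n400_effect_uv = np.array([1.8, 2.4, 0.9, 3.1, 2.0])         # placeholder amplitudes

r, p = stats.pearsonr(prosody_accuracy, n400_effect_uv)
print(f"r = {r:.2f}, p = {p:.3f}")
```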
If this experiment were successful, it would provide strong evidence that music is not only expressive but also communicative, supporting Koelsch's argument that there is a continuum between the realms of music and language, just as there is a continuum between music and language processing. It would also provide evidence of a universal prosody used to communicate basic emotions, one that can be accurately encoded by skilled composers and decoded by listeners with no cultural knowledge that a certain musical style indexes a certain emotion. In addition, it could provide evidence that qualia as well as abstract concepts can be universally primed through cross-modal mapping regardless of language or culture.
Koelsch also argues that the fact that music is more expressive than communicative, when compared to language, is exactly what gives music its power of ambiguity, flexibility, and fluidity, allowing it to mean different things at different times to different people. I agree with this assessment, but if this experiment were successful I would add that music can also be effective as a universal medium for communicating emotions and cross-modal representations.
Bibliography:
Atkin, A. (2010). Peirce's theory of signs. Stanford Encyclopedia of Philosophy. Retrieved from http://plato.stanford.edu/entries/peirce-semiotics/

Balkwill, L., & Thompson, W. (2006). Decoding speech prosody in five languages. Semiotica, 2006(158), 407-424. Retrieved from http://www.degruyter.com/view/j/semi.2006.2006.issue-158/sem.2006.017/sem.2006.017.xml

Balkwill, L., Thompson, W., & Matsunaga, R. (2004). Recognition of emotion in Japanese, Western, and Hindustani music by Japanese listeners. Japanese Psychological Research, 46(4), 337-349. Retrieved from http://onlinelibrary.wiley.com/doi/10.1111/j.1468-5584.2004.00265.x/abstract

Daltrozzo, J., & Schon, D. (2009). Conceptual processing in music as revealed by N400 effects on words and musical targets. Journal of Cognitive Neuroscience, 21(10), 1882-1892. Retrieved from http://www.ncbi.nlm.nih.gov/pubmed/18823240

Daltrozzo, J., & Schon, D. (2009). Is conceptual processing in music automatic? An electrophysiological approach. Brain Research, 1270, 88-94. Retrieved from http://www.sciencedirect.com/science/article/pii/S0006899309005186

Fritz, T., Jentschke, S., Gosselin, N., Sammler, D., Peretz, I., Turner, R., Friederici, A., & Koelsch, S. (2009). Universal recognition of three basic emotions in music. Current Biology, 19(7), 573-576. Retrieved from http://www.ncbi.nlm.nih.gov/pubmed/19303300

Goerlich, K., Witteman, J., Aleman, A., & Martens, S. (2011). Hearing feelings: Affective categorization of music and speech in alexithymia, an ERP study. PLoS ONE, 6(5), e19501. Retrieved from http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3090419/

Gordon, R., Schon, D., Magne, C., Astésano, C., & Besson, M. (2010). Words and melody are intertwined in perception of sung words: EEG and behavioral evidence. PLoS ONE, 5(3), 1-12. Retrieved from http://web.ebscohost.com/ehost/detail?sid=d360fce1-dc27-4676-9cf3-716e60de446a@sessionmgr4003&vid=1&hid=4214&bdata=JnNpdGU9ZWhvc3QtbGl2ZSZzY29wZT1zaXRl

Juslin, P., & Laukka, P. (2003). Communication of emotions in vocal expression and music performance: Different channels, same code? Psychological Bulletin, 129(5), 770-814. Retrieved from http://www.ncbi.nlm.nih.gov/pubmed/12956543

Koelsch, S. (2011). Towards a neural basis of processing musical semantics. Physics of Life Reviews, 8(2), 89-105. Retrieved from http://www.ncbi.nlm.nih.gov/pubmed/21601541

Koelsch, S., Kasper, E., Sammler, D., Schulze, K., Gunter, T., & Friederici, A. (2004). Music, language and meaning: Brain signatures of semantic processing. Nature Neuroscience, 7, 302-307. Retrieved from http://www.nature.com/neuro/journal/v7/n3/full/nn1197.html

Lalanne, C., & Lorenceau, J. (2004). Crossmodal integration for perception and action. Journal of Physiology, Paris, 98(1-3), 265-279. Retrieved from http://www.ncbi.nlm.nih.gov/pubmed/15477038

Ngo, M., Spence, C., Percival, B., & Smith, B. (2013). Crossmodal correspondences: Assessing shape symbolism for cheese. Food Quality and Preference, 28, 206-212. Retrieved from http://www.psy.ox.ac.uk/publications/395592

Ramachandran, V. S. (2011). The tell-tale brain: A neuroscientist's quest for what makes us human. New York: W.W. Norton.

Scherer, K. R., Banse, R., & Wallbott, H. G. (2001). Emotion inferences from vocal expression correlate across languages and cultures. Journal of Cross-Cultural Psychology, 32(1), 76-92. Retrieved from http://emotion-research.net/biblio/SchererBanseWallbott2001

Steinbeis, N., & Koelsch, S. (2011). Affective priming effects of musical sounds on the processing of word meaning. Journal of Cognitive Neuroscience, 23(3), 604-621. Retrieved from http://www.mitpressjournals.org/doi/abs/10.1162/jocn.2009.21383?url_ver=Z39.88-2003&rfr_id=ori:rid:crossref.org&rfr_dat=cr_pub=pubmed&
[1] For example, the sentence "The man spread the butter on his bread with his sock" would elicit a far greater N400 than if the last word in the sentence were replaced with "knife".
[2] They later claim that intramusical meaning is indexed by the N500 rather than the N400, but we will go into the distinction between these categories in more detail in the discussion section.
[3] These quotes from Peirce can be found in Atkin (2010).