Monday, March 23, 2015

Hearing Shapes and Feelings: The Universal Communication of Iconic and Indexical Representations in Music

(Warning: This is an extremely technical paper; if you are not interested in the psychology of music, I suggest you move on to another of my posts. If you are not familiar with this topic but would like to expand your knowledge, I suggest you start with my post on "Math and Music".)
Abstract:
The current experiment is designed to test for a possible universal basis for human interpretations of musical signs. As far as I know, no experiment has been conducted to test for universality in the decoding of a variety of musical meanings using the N400 response. By recreating Koelsch's 2004 experiment in a culture that has never been exposed to western music, it should be possible to determine to what extent western musical signs are cultural constructs and to what extent they share universal information-carrying characteristics with all human music.
Background Investigation:
In 2004, Koelsch et al. performed a novel experiment in which a semantic priming paradigm showed that pairing target words with semantically related and unrelated musical excerpts modulates the elicited N400 response. Since then, various studies have expanded upon this paradigm in a number of ways, creating a small but rich body of work, which I will review in the following paragraphs.
In the majority of these studies, EEG is recorded and averaged to assess the strength, time course, and, to a lesser extent, neural generators of the N400. This electrophysiological response has been shown in previous studies to be inversely related to the semantic fit between a word and its preceding context[1]. Concurrently, behavioral data reveal a processing advantage for words preceded by a semantically related context, be it visual, musical, or lexical.
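For readers unfamiliar with ERP methodology, the following is a minimal sketch of what "recorded and averaged" means in practice: single-trial EEG epochs time-locked to the target word are baseline-corrected and averaged within each priming condition, and the N400 effect appears as the difference wave between conditions. The sampling rate, epoch layout, and variable names here are assumptions for illustration, not details of Koelsch et al.'s pipeline.

```python
import numpy as np

# Minimal ERP-averaging sketch (illustrative; not the original pipeline).
# Assume the EEG has already been segmented into epochs time-locked to
# target onset: arrays of shape (n_trials, n_samples) for one electrode
# (say Cz), sampled at 250 Hz and spanning -100 ms to 800 ms.
fs = 250
times = np.arange(-0.1, 0.8, 1 / fs)

def erp(epochs, times, baseline=(-0.1, 0.0)):
    """Baseline-correct each trial, then average trials into an ERP."""
    b = (times >= baseline[0]) & (times < baseline[1])
    corrected = epochs - epochs[:, b].mean(axis=1, keepdims=True)
    return corrected.mean(axis=0)

# Hypothetical epoch arrays for the two priming conditions.
epochs_related = np.random.randn(80, times.size)    # placeholder data
epochs_unrelated = np.random.randn(80, times.size)  # placeholder data

# The N400 effect is the difference wave: voltage is more negative for
# targets preceded by an unrelated prime, peaking around 400 ms.
n400_effect = erp(epochs_unrelated, times) - erp(epochs_related, times)
```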
Koelsch found that musical excerpts could prime both abstract and concrete words in a variety of ways, including (i) common patterns or forms (pitch, dynamics, tempo, timbre) that resemble the features of an object, (ii) mood suggestion, (iii) extramusical association, and (iv) intramusical meaning (tension and resolution)[2]. The N400 effect appeared regardless of whether the subjects' task involved rating the semantic fit of what they were experiencing. From this study, Koelsch et al. conclude that music can prime representations of abstract meaningful content both independently of and in relation to emotions.
In a follow-up study, Steinbeis and Koelsch (2011) had both musicians and non-musicians evaluate musical samples and emotionally congruous and incongruous words. As in other studies, they found that incongruous words were categorized more slowly and accompanied by a larger N400, but interestingly they showed that three separate aspects of music could each independently produce this effect: timbre, consonance versus dissonance, and major versus minor mode.
They go on to speculate that we would do well to investigate whether this effect also extends to manipulations of melody and rhythm. Finally, they suggest overlapping networks for decoding verbal and nonverbal semantic information, involving both early and late processing: the former is related to the N400 and involves meaningful semantic, but not lexical, representation, while the latter involves explicit interpretation and is indexed by the N600. Semantic representations are often conceptualized as the meaning behind the words, while lexical representations are of the word form itself.
In a pair of studies in 2009, Daltrozzo and Schön showed that the ten-second musical excerpts used by Koelsch can be reduced in length to one second and still carry relevant semantic information. This temporal reduction also allowed them to reverse the prime-target ordering, so that music could serve as target as well as prime; this change revealed an N400 of equal size and latency in both variations. By changing whether the subjects were asked to judge the relatedness of the prime and target (semantic decision) or whether the targets were actually words (lexical decision), they were able to diminish the N400, showing that endogenous or conscious attention and top-down processing can affect these subconscious responses.
In an interesting variation of this paradigm, Gordon et al. (2010) investigated the way in which concepts from melodies and words can intertwine when the stimuli are tri-syllabic words sung on three-tone melodies. They used four categories of prime-target pairs: same word-same melody, same word-different melody, same melody-different word, and different word-different melody. Response time and the N400 were assessed, and interestingly the same word sung on a different melody produced a robust N400 response and slower reaction times, suggesting that the musical features are in some way affecting the processing of the word's meaning. As expected, same melody-different word also resulted in slower responses and a greater N400, but it was the only other category to do so.
Like the previous studies, they also addressed whether attention mediated the process by running two versions of the task, in which participants were asked to evaluate the similarity of either the melodies or the words. The observed effects were only slightly offset by attending to the melody rather than the words and vice versa, suggesting mainly automatic processing of semantic information. The authors corroborated these findings with an fMRI study which they claim suggests shared resources through interactive phonological/semantic and melodic/harmonic processing.
They go on to make the evolutionary claim that infant preference for sung words cannot be attributed only to the addition of a musical dimension; it must be related to the combination of the two elements. They further suggest that this preference reflects a proclivity for singing-based parent-infant interactions, which matter for a number of important biological reasons. In addition to fostering bonding between a parent and their newborn, such interactions may also aid language learning through their use of exaggerated prosody, which both aids segmentation and adds an emotional, and therefore memorable, element to the interaction. Finally, they suggest that music can be thought of as a mnemonic that aids memory for language.
Goerlich et al. (2011) conducted an experiment with alexithymic participants in order to assess their electrophysiological responses and behavioral response times when judging semantically related and unrelated happy/sad words, music, and prosody. Alexithymia is a condition involving difficulty identifying and verbalizing feelings and the cognitive processes related to emotion. The condition exists on a spectrum from low to high, and the experimenters hoped to compare behavioral and physiological measures to characterize whether the impairment was mainly conscious or unconscious.
At a behavioral level, they found reduced affective priming with increased alexithymia, mainly when music and prosody were to be categorized first, i.e., when they served as the prime rather than the target. However, they admit that the observed intact emotional word processing is at odds with other studies. They also found a decreased N400 with increasing levels of alexithymia, leading them to conclude that music and speech use similar acoustic features to demarcate various emotions.
Where does this meaning come from?
In a recent review paper, Koelsch (2011) tries to elucidate the ways in which semantic meaning can emerge from musical stimuli. First, he identifies several sources from which musical meaning can emerge, including musical aspects of speech, such as pitch, contour, range, variation, timbre, tempo, and pause, and what he calls "nonlinguistic" aspects of speech, such as gender, size, distance, and location. He doesn't elaborate much on how these different aspects of speech give rise to meaning, but it seems self-evident in many cases (pleasant versus unpleasant timbre, greater change in pitch or increased tempo for greater emotional arousal, etc.).
I don't see either of these categories as being more linguistic or more musical than the other. It seems to me that the first set is generally thought of as relating to music even though it could just as easily be ascribed to language, while the second set is generally not thought of as relating to music but can be used by a skilled composer to manipulate our semantic expectations. Both are important ways that meaning can emerge from music and speech alike to contribute to semantic knowledge, and both can be seen as overlapping areas on Koelsch's Music-Language Continuum.
From there he defines three categories: extramusical sign qualities, intramusical structural relationships, and the musicogenic effect. Since this experiment is concerned with musical semantics and the N400, we will discuss in detail the extramusical sign qualities, in which Koelsch attempts to create an auditory analogue to Peirce's original depiction of the signifier-signified relationship.
The first category is iconic musical meaning, which in my opinion is a bit messy, so we'll save it for last. Indexical meaning arises when music hints at an emotional state by using patterns similar to those used in speech prosody to index certain emotions. Finally, symbolic musical meaning comes from learned cultural associations.
Koelsch gives numerous examples to describe iconic relationships rather than giving a strong definition. I believe this is because the category needs to be further subdivided, into at least three distinct subcategories. The first of these covers the various qualia that can be evoked by different sound stimuli, such as warm, round, or sharp. These are examples of ideasthesia, which is related to synesthesia: where synesthesia crosses one sensory modality with another, here an idea is crossed with a sense, i.e., something sounds warm even though warmth is normally perceived through touch.
Aristotle used the term "common sense" to refer to qualia that can be perceived through more than one sense organ. For example, we generally think smells can only be perceived through the nose, while shape can be perceived by either the eyes or the hands. There are separate cells for detecting heat and sound vibrations, and yet here we are perceiving warmth through auditory stimuli. This may be due to a complex cross-modal mapping in the brain, which Ramachandran claims may be the basis of more abstract metaphors.
Many studies also provide evidence for sensory integration across modalities by means of cross-modal mapping in the brain. It has been observed that "The integration of information from different sensory modalities has many advantages for human observers including increase of salience, resolution of perceptual ambiguities, and unified perception of objects and surroundings" (Lalanne and Lorenceau 2004). This may underlie the robust cross-modal correspondences observed between many different stimuli, including, for example, "food and beverage items and shapes varying on the angular-round continuum" (Ngo et al. 2013). Perhaps, as Ramachandran suggests, this is the basis for more abstract metaphors like "Juliet is the sun".
I think these semantic priming effects provide an excellent means of studying different cultures and age groups to determine whether the effects are due to learned associations or innate predispositions. If they turned out to be primarily the former, with little agreement across cultures, then perhaps this type of icon would need to be re-categorized as a symbolic representation. However, given that examples of these ideasthetic representations have been tested across cultures in the literature (e.g., the bouba/kiki effect), I think they likely reflect more than just cultural association (Ramachandran 2011).
Another example Koelsch gives of iconic meaning derives from an even more abstract type of cross-modal mapping. This category generally pairs a psychoacoustic effect with a corresponding theoretical concept: rather than mapping from an idea to a sensory modality, we are now moving from musical ideas to another semantic representation that is not tied to any sense. In a concrete case this could be ascending musical notes priming the word "staircase"; a more abstract case would be chords with widely spaced notes priming "wideness".
The final example of iconic meaning, according to Koelsch, is when music directly mimics something we hear in the real world, such as a bird call or a thunderstorm. But what about a saxophone that sounds like someone laughing (an example used in Koelsch's original experiment)? This clearly combines the indexing of an emotional sound with the mimicking of that same sound. Furthermore, by using patterns in music similar to those found in speech (as Koelsch claims we do) in order to index emotions, are we not mimicking what we hear in the real world?
In order to remedy this discrepancy, I propose that examples of mimicking what we hear in the world always be thought of as producing indexical rather than iconic musical meaning. Both hint at something in the real world, but one is concrete/physical (bird song) and one is abstract/nonphysical (an emotional state). Assuming the prosody used to express different emotions is universal, it could be hypothesized that this type of musical meaning would emerge even when individuals from different cultures rate the emotional dimensions of musical pieces they have no experience with.
To reiterate: I agree that the first and second examples are iconic representations because they both bear a "likeness, a community in some quality" (as a map does to the actual world), which I see as integral to Peirce's original definition. Meanwhile, indices (both concrete and abstract) bear a resemblance to the real world that is, to quote Peirce himself, "a correspondence in fact"[3] and therefore does not require a cross-modal mapping; the music sounds like a bird sounds, rather than sounding "warm". Like the other two categories, symbols also come in concrete and abstract forms; the national anthem could prime the idea of the landmass we call "America" (more concrete) or the word patriotism (more abstract).
Methods:
In hopes of reducing all other extraneous variables, I would like to recreate Koelsch et al.'s 2004 experiment as faithfully as possible, with the exception that I would be conducting it with the Mafa people of northern Cameroon rather than with German college students. The only difference between the two experiments should be the subjects' cultural learning in relation to music, so that we might attempt to determine the extent of the cultural basis of the encoding and decoding of different types of musical signs and referents.
I propose to use 50 subjects between the ages of 20 and 30. My stimuli will include the 176 significant prime-target items selected for use in the original 2004 study during the behavioral pretest. As in the original study, there will be four prime stimuli for each target word: a related sentence, an unrelated sentence, a related musical excerpt, and an unrelated musical excerpt. Each prime will be used twice: once with a target to which it is related and once with a target to which it is not. The primes will be "pseudo-randomly intermixed and counterbalanced", as in the original experiment.
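To make the design concrete, here is a minimal sketch of how such a counterbalanced trial list could be assembled. The target count and data structures are placeholders; the actual partitioning would follow the 176 prime-target items taken from the original study.

```python
import random

# Illustrative sketch of the trial-list construction described above; the
# exact partitioning of the 176 prime-target items follows the original
# study, so the counts below are placeholders rather than the real design.
n_targets = 44                                        # hypothetical count
targets = [f"target_{i:03d}" for i in range(n_targets)]

trials = []
for t in targets:
    for prime_type in ("sentence", "music"):          # two prime modalities
        for related in (True, False):                 # related vs. unrelated
            trials.append({"target": t,
                           "prime_type": prime_type,
                           "related": related})

# Stand-in for "pseudo-randomly intermixed and counterbalanced": a full
# implementation would also balance condition order across subjects and
# limit how many trials of one condition appear in a row.
random.shuffle(trials)
```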
Prime stimuli (musical excerpts, plus sentences used as controls) will be presented via computer speakers at a volume of 60 decibels. This will be followed by the presentation of a target word on the computer screen for 2000 milliseconds. After the word disappears from the screen, subjects will indicate their subjective relatedness judgment of the prime-target pair with a button press. A second experiment will replicate the first, but without the relatedness judgment task.
EEG signals will be recorded from 32 surface electrodes distributed across the scalp. The time window for statistical analysis of the EEG data will be 300-500 milliseconds after target onset. It may also be useful to attempt to localize the neural sources of the N400 as Koelsch did, but describing how we would do that is unfortunately beyond my capabilities.
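The following is a minimal sketch of the planned comparison in that window, assuming one per-subject ERP per condition computed as in the averaging sketch above. The array shapes are hypothetical, and a simple paired t-test across subjects stands in for a fuller statistical model over electrode regions.

```python
import numpy as np
from scipy import stats

# Sketch of the planned N400 comparison: mean amplitude 300-500 ms after
# target onset, contrasted between related and unrelated conditions across
# subjects. All data here are placeholders.
fs = 250
times = np.arange(-0.1, 0.8, 1 / fs)
window = (times >= 0.3) & (times <= 0.5)            # N400 analysis window

# Hypothetical per-subject ERPs at one electrode: (n_subjects, n_samples).
n_subjects = 50
erps_related = np.random.randn(n_subjects, times.size)    # placeholder
erps_unrelated = np.random.randn(n_subjects, times.size)  # placeholder

amp_related = erps_related[:, window].mean(axis=1)
amp_unrelated = erps_unrelated[:, window].mean(axis=1)

# An N400 effect would appear as more negative mean amplitude for unrelated
# targets; a paired t-test across subjects tests that difference.
t, p = stats.ttest_rel(amp_unrelated, amp_related)
print(f"t({n_subjects - 1}) = {t:.2f}, p = {p:.3f}")
```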
Hypothesis:
Evidence shows that emotional inferences made from prosodic cues remain relatively accurate across languages and cultures (Scherer, Banse, and Wallbott 2001). Meanwhile, Fritz et al. (2009) claim to have experimentally verified the universal recognition of at least three basic emotions in music. In a meta-analysis, Juslin and Laukka (2003) conclude that emotions can be accurately communicated to listeners through music thanks to emotion-specific patterns of acoustic cues that resemble those used in speech prosody to communicate emotions. Importantly, as language similarity decreases, so does the percentage of correct decodings by subjects, though they still remain above chance levels.
From this we hypothesize at least some level of accurate decoding of emotions, as well as an observable semantic priming effect, in individuals who have had no exposure to western music, and we expect this to correlate with their ability to decode prosodic cues in western languages. In addition, we can test the universality of auditorily induced cross-modal representations using iconic musical stimuli. It seems likely that these two types of musical signs will prove relatively robust in cross-cultural studies of semantic priming on both behavioral and electrophysiological measures, though possibly only in a subset of variants. I would be rather surprised if any symbolic representations held up cross-culturally.
If this experiment were successful, it would provide strong evidence that music is not only expressive but also communicative, supporting Koelsch's argument that there exists a continuum between the musical and linguistic realms, just as there exists a continuum between music and language processing. It would also provide evidence of a universal prosody used to communicate basic emotions, one that skilled composers can accurately encode and listeners can accurately decode even without the cultural knowledge that a certain musical style indexes a certain emotion. In addition, it could provide evidence that qualia as well as abstract concepts can be universally primed through cross-modal mapping, regardless of language or culture.
Koelsch also argues that the fact that music is more expressive than communicative, compared to language, is exactly what gives music its power of ambiguity, flexibility, and fluidity, allowing it to mean different things at different times to different people. I agree with this assessment, but if this experiment were successful I would add that music can also be effective as a universal medium for communicating emotions and cross-modal representations.

Bibliography:

Atkin, A. (2010). Peirce's theory of signs. In Stanford Encyclopedia of Philosophy. Retrieved from http://plato.stanford.edu/entries/peirce-semiotics/

Balkwill, L., & Thompson, W. (2006). Decoding speech prosody in five languages. Semiotica, 2006(158), 407-424. Retrieved from http://www.degruyter.com/view/j/semi.2006.2006.issue-158/sem.2006.017/sem.2006.017.xml

Balkwill, L., Thompson, W., & Matsunaga, R. (2004). Recognition of emotion in Japanese, Western, and Hindustani music by Japanese listeners. Japanese Psychological Research, 46(4), 337-349. Retrieved from http://onlinelibrary.wiley.com/doi/10.1111/j.1468-5584.2004.00265.x/abstract

Daltrozzo, J., & Schön, D. (2009). Conceptual processing in music as revealed by N400 effects on words and musical targets. Journal of Cognitive Neuroscience, 21(10), 1882-92. Retrieved from http://www.ncbi.nlm.nih.gov/pubmed/18823240

Daltrozzo, J., & Schön, D. (2009). Is conceptual processing in music automatic? An electrophysiological approach. Brain Research, 1270, 88-94. Retrieved from http://www.sciencedirect.com/science/article/pii/S0006899309005186

Fritz, T., Jentschke, S., Gosselin, N., Sammler, D., Peretz, I., Turner, R., Friederici, A., & Koelsch, S. (2009). Universal recognition of three basic emotions in music. Current Biology, 19(7), 573-576. Retrieved from http://www.ncbi.nlm.nih.gov/pubmed/19303300

Goerlich, K., Witteman, J., Aleman, A., & Martens, S. (2011). Hearing feelings: Affective categorization of music and speech in alexithymia, an ERP study. PLoS ONE, 6(5), e19501. Retrieved from http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3090419/

Gordon, R., Schön, D., Magne, C., Astésano, C., & Besson, M. (2010). Words and melody are intertwined in perception of sung words: EEG and behavioral evidence. PLoS ONE, 5(3), 1-12. Retrieved from http://web.ebscohost.com/ehost/detail?sid=d360fce1-dc27-4676-9cf3-716e60de446a@sessionmgr4003&vid=1&hid=4214&bdata=JnNpdGU9ZWhvc3QtbGl2ZSZzY29wZT1zaXRl

Juslin, P., & Laukka, P. (2003). Communication of emotions in vocal expression and music performance: Different channels, same code? Psychological Bulletin, 129(5), 770-814. Retrieved from http://www.ncbi.nlm.nih.gov/pubmed/12956543

Koelsch, S. (2011). Towards a neural basis of processing musical semantics. Physics of Life Reviews, 8(2), 89-105. Retrieved from http://www.ncbi.nlm.nih.gov/pubmed/21601541

Koelsch, S., Kasper, E., Sammler, D., Schulze, K., Gunter, T., & Friederici, A. (2004). Music, language and meaning: Brain signatures of semantic processing. Nature Neuroscience, 7, 302-307. Retrieved from http://www.nature.com/neuro/journal/v7/n3/full/nn1197.html

Lalanne, C., & Lorenceau, J. (2004). Crossmodal integration for perception and action. Journal of Physiology, Paris, 98(1-3), 265-279. Retrieved from http://www.ncbi.nlm.nih.gov/pubmed/15477038

Ngo, M., Spence, C., Percival, B., & Smith, B. (2013). Crossmodal correspondences: Assessing shape symbolism for cheese. Food Quality and Preference, 28, 206-212. Retrieved from http://www.psy.ox.ac.uk/publications/395592

Ramachandran, V. S. (2011). The tell-tale brain: A neuroscientist's quest for what makes us human. New York: W.W. Norton.

Scherer, K. R., Banse, R., & Wallbott, H. G. (2001). Emotion inferences from vocal expression correlate across languages and cultures. Journal of Cross-Cultural Psychology, 32(1), 76-92. Retrieved from http://emotion-research.net/biblio/SchererBanseWallbott2001

Steinbeis, N., & Koelsch, S. (2011). Affective priming effects of musical sounds on the processing of word meaning. Journal of Cognitive Neuroscience, 23(3), 604-621. Retrieved from http://www.mitpressjournals.org/doi/abs/10.1162/jocn.2009.21383?url_ver=Z39.88-2003&rfr_id=ori:rid:crossref.org&rfr_dat=cr_pub=pubmed&




[1] For example, the sentence "The man spread the butter on his bread with his sock" would elicit a far greater N400 than if the last word in the sentence were replaced with "knife".
[2] They later claim that intramusical meaning is indexed by the N500 rather than the N400, but we will discuss the distinction between these categories further in the discussion section.
[3] These quotes from Peirce can be found in Atkin (2010).
