Sound design can be said to consist of two aspects: the making of sounds and making decisions with those sounds. Making sounds involves a gamut of techniques that includes the processes of recording (finding and capturing a sound), signal processing (altering sounds) and synthesis (inventing sounds from the abstract realms of mathematics and electronics).
The second aspect of sound design is the creation of sonic relationships to motion-visual and photo-realistic imagery. With developments in technology, the means of achieving the first aspect have multiplied into a tremendous array of techniques, as the choices of hardware and software available seem to increase exponentially every few years, with the latest explosion of tools having to do with the — often gimmicky, to be sure — integration of AI and machine learning algorithms.
The second aspect of sound design — the concern with what to do with or how to use these sounds — is neither identical nor parallel to these processes of technological evolution. Patterns of choices made in the act of combining sounds with moving image can be understood as occurring within identifiable hermeneutic (i.e. interpretive) horizons, each of which alters the character of what is here called the audio affect image (moving image as it undergoes interpretive flux in the predisposition towards its reception).
This essay charts specific modalities that define this hermeneutic dimension to sound design, in particular integrating that aspect of hermeneutic method that concerns itself with the kinds of presuppositions and predeterminations that condition a larger range of interpretative, meaning-making actions. The five modalities discussed below can be thought of as hermeneutic predispositions that are made in advance – perhaps often unconsciously, or in relatively strict accordance with established trends or traditions, e.g. determining a ‘typical’ or ‘emotional’ filmic moment calling for needle drop music cues and so forth.
Understanding this hermeneutic flux with some conceptual specificity can contribute to a more ‘knowing’ (less naïve, less based on habitual or genre typical choice patterns) practice of sound design. For example, frequently in discussions of sound and image it is said that sound ‘enhances’ the image, and this single word ‘enhance’ is often ill- or under-defined, leaving us to ask: enhanced how, in what way, to what effect and for what reasons? A hermeneutically informed sensibility delves into more granularity of affect, as shown in Figure 1 below.
If we pay attention to the act of creating a specific relation between a sound or group of sounds and accompanying images (for ‘accompaniment’ itself is usually what brings such relationships about), we can discern that there are certain modalities of decision making that sound design seems to gravitate towards. These modes have to do with the general bearing of sound towards the image at any given instant. The subtle, or sometimes not-so-subtle, interplay and contamination of meaning that occurs between the soundtrack and the image strip results in a constant hermeneutic modulation of audiovisual percepts.
These five hermeneutic modes we might initially identify as the Real, the Ideal, the Relentlessly Real, the Musaic and the Abstract. Taken together, these headings constitute five strata of cinesonic relations that tend to govern the decision making processes of sound design. Time-based visuals with soundtracks can be said to instantiate a play between these five dispositions, the interaction of which constantly modulates audiovisual meaning.
By the Real in the sound/image relation, we mean essentially a non-problematic relationship between what we see and hear together, the ‘normal functioning’ of the world. The reality effect described here is not restricted to high fidelity in the acoustic or audiophile sense. The ‘realistic’ sound effects of films made four or five decades ago are often unnuanced, nondynamic, lacking in detail, sparsely layered and one-dimensional by today’s technical-aesthetic standards. Yet we understand these as limitations of the technology of their time, and we do not fail to grasp their role in signifying the world at large depicted in the film, the objects or creatures in that world, or the actions of the characters, in the same way as realistic sound effects do in today’s films.
When we see a door slam, we hear a door slam. Footfalls require synchronized footsteps, a cocktail glass placed on a table requires a bump on wood or a clink of ice cubes (or perhaps both sounds together). If the scene depicted is outdoors, we expect to hear birds or traffic hum. If someone jumps into water, we hear the appropriate plunks and splashes. That’s the Real modality of sound and image, where the two more or less mirror our expectations based on our experiences of the world.
To be sure, the discourse that surrounds the continuous development of technologies deemed (at any particular historical moment) to be ‘hi-fi’ grounds itself in a privileged connection to ‘the Real’ or to reality, and this is not unrelated to the notion of the Real being introduced here, as sound designers aiming for this desired effect may (and most likely would) call upon the best rendered recordings available.
But as a hermeneutic determination (predisposition), the Real in sound design occurs in the intuitive bumping together of recollection and anticipation – we expect realistic sound effects because that is how we remember the world. Without the anticipated relationship of a sound to an image, we would feel that something is wrong — in fact, filmmakers will often use ‘strange’ sounds to indicate that something other than normal is happening.
The Real in this sense is thus always about the familiar appearance of the world, the nonproblematized in sensory relationships. The Real hermeneutic mode in sound design invokes the wholeness and completeness of the world, and generates the heft and depth of bodies conveyed in images of flattened light.
From the Real we can make two turns, towards the Ideal and towards the Relentlessly Real. Before we make these turns, let us note the rationale for not making these two new terms subcategories of the Real but instead giving them an equal rank. We have defined the Real as being related to the normal functioning of the world or its ‘nonproblematic’ representation. The Ideal and the Relentlessly Real would thus articulate the world functioning in some manner away from the representational forces of immediacy, transparency, fidelity, accuracy and the host of related bearings towards the photo-realistic motion-visual image, as well as that employment of sound–image relationships based on facilitating the connections between memory and anticipation.
The Ideal and the Relentlessly Real each introduce a specific alteration, which we will here characterize as an aesthetic skewing either towards pleasure (towards the spectacular, surfeit, the otoerotic, immateriality) or towards anxiety (towards extreme banality, hideousness, the gross or abject). These two modulations of the Real constitute interpretive forces that can run just as contrapuntally to any other modality under discussion in this article, which is why we allow them to stand apart, at least for purposes of clear conceptualization. In the context of sonic-moving-image art and design, each is an equal aesthetic choice and force in its own right.
The Ideal can be defined as the Real as a sensuous imaginary. It is the Real as it verges on the fantastic and illusory, but also the Real with a heightened concern with aural pleasure. The Ideal reveals the essential unity of fantasy and pleasure because the Real is precisely that which gives resistance to our fantasies, thus yielding us less pleasure.
Sound design often brings representation into the realm of the Ideal, giving us the fantasy of the thing rather than trying to just give us the thing, as when the sounds of a lion in heat and a baby’s cries might be blended into the squealing of car brakes during a chase sequence, or recordings of cat screeches used as the basis of the laser Frisbees in Tron (Steven Lisberger, 1982).
Thus, when talking about the Ideal in sound design, we can include the sounds of the dinosaurs in Jurassic Park (Steven Spielberg, 1993), the sounds of the lightsabers of Star Wars (George Lucas, 1977), or the punching orchestrations in Raging Bull (Martin Scorsese, 1980): the Ideal gives us either the reality of the fantasy (‘realistic’ sounds of fantastic things like monsters and spaceships) or a fantastic representation of real things (i.e. the above-mentioned animal-like squealing tires). The Ideal happens when the imaginary side of the Real is fleshed out, when we begin to leave behind the prosaic or everyday attitude towards reality and open meaning up to its mythological or psychological dimensions.
However, we need not focus too much on obviously fantastic content. Most fight scenes in films, for example, give us a kind of Ideal Fighter – fighters whose punches make spectacularly loud impact sounds, indicating an amplified and impervious body capable of delivering and receiving those sound-accentuated body blows. The sound effect in the case of standard fight sequences produces an ideal body, a body stripped of its Real limitations and transposed into a dematerialized vessel of will – actors in film slugging it out are all creatures of light and acoustic impact. This notion of the Ideal tries to capture this spectrum of audio as a fantasy complement to the moving image.
The Relentlessly Real names that effect produced when the Real is drained of its pleasurable aspects and offered to us in a mode that might produce the effects of anxiety, displeasure, nausea or the abject — this form of the Real is existence sans essence, the starkness of being with attenuated significance.
The Relentlessly Real turns towards the direction of anti-art, a withdrawal of the aesthetic away from the beautiful and towards the grain of brute actuality. This ‘brutality’ is why it is called here ‘relentless’, as it represents an assault on our senses. At the same time, it can be understood as an increase in the resistance of the Real, a move in the opposite direction of Ideality and will amplification, pleasure and sensorial access. Examples from four films will suffice to illustrate the idea of the Relentlessly Real — Jan Svankmajer’s Alice (1988), Stanley Kubrick’s 2001: A Space Odyssey (1968), Akira Kurosawa’s Red Beard (1965) and Krzysztof Kieslowski’s Decalogue 2 (1989).
The sound effects of a Svankmajer animation usually exhibit a kind of grotesque realism, i.e. the sound effects will be realistic in that they may be tied directly to the action, but will be manipulated in terms of their percussiveness (the sounds are often sudden and/or brief), dynamics (the sound effects are often of exaggerated loudness) and frequencies (the sounds tend to be rich in high frequency content, as might be found in the process of sharpening knives).
This makes them less about rounding out the acoustic dimension of the scene and more about the desire to disturb the listener, to produce high amounts of anxiety, and unsettle the relation of sound and image rather than aim at a familiar or pleasurable seamlessness that simply represents a scene’s action. The Real takes on a new quality of relentlessness, as though we have to force ourselves to listen to someone scrape their nails across a chalkboard for the duration of the film.
In 2001, Red Beard and Decalogue 2 we find instances of the Relentlessly Real that are tied to our bodies’ organic processes and to the thematics of mortality inherent in an increased awareness of the carnal. In 2001, a long sequence of an astronaut in his space suit gives us two very distinct and sparse sounds, the hiss of the air supply hose and the labored breathing of the astronaut. This is essentially what one might call a ‘pulse and drone’ structure, with the hiss a constant and unaltering drone, and the breath providing a pulse to mark time, a breath that also introduces minute variations in tempo, pitch and duration.
What is interesting is the length of the sequence — this minimal sound composition lasts for six minutes of screen time, an austere sound build that cuts across several scenes completely uninterrupted by other audio elements, including a shot of Bowman in the cockpit, out of his space suit and sitting comfortably at the controls, and a long shot of the spaceship in which a meteor hurtles silently towards us in the foreground. The breath and hiss tonalities create a fixation on the real interior of the space helmet, creating a level of tension and anxiety that paradoxically produces an extreme distance between sound and image. These very literal sounds are pushed to the point of almost becoming an abstraction of the scene.
Red Beard features an up-close and amplified death rattle sequence. An old man lies on the floor, in a room almost without ambience. He is dying before us for several minutes; the coughs, gurgles and clicks in his throat are all there is for us to listen to through amplified, microphonic intimacy. What makes this an instance of the Relentlessly Real is an unusual attention to detail bent on evoking a visceral response in the listener.
Finally, Decalogue 2 gives us the patient Andrzej, who lies in bed uncertain of his recovery in a hospital ward. In two scenes we are brought close to the nausea of his existence in the hospital, particularly through acoustic closeups on the water dripping from the pipes overhead, the plinks and plops of which perform their water torture on us.
The second sound we are forced to concentrate upon is his labored breathing. The camera wanders across his face, body, blankets, bed rails, the walls and pipes of the room, and the bucket in which the water accumulates. The constantly dripping water undergoes perplexing undulations of timbre, and its juxtaposition against the breath of a sick man accomplishes a draining of meaning — there is a torpor to the intimation of Andrzej’s death, a sickly unease. These sparse elements of the soundtrack bring the background into the foreground, like death itself.
For further examples of the Relentlessly Real, we can choose those vérité documentaries that bombard us with ‘bad sound,’ i.e. sound that is more painful than pleasurable to listen to — muddy, noisy, echoey, distorted or indistinct sound that we listen to because of the vérité ideology that equates the Real with the technologically raw, and the unreal with that which has been refined or extensively processed and thus become more ‘artificial’ or ‘unnatural.’ The Relentlessly Real is a relatively uncommon choice of sonic decision making in film because usually we want audiences to be pleased, but it remains an option at those moments when one seeks the psychological effect of deep discomfort in the audience, or when one seeks neither the normality nor the ideality of the world, but rather its viscosity or resistance.
The Musaic modality of sound design names the relation of music to moving image. While the tradition of using music in the movies also owes a debt to Tin Pan Alley, vaudeville, opera, the circus and other forms of musical event entertainment, the term ‘Musaic’ is used here because it seems to be the case that almost any kind of film sequence is capable of ‘absorbing’ any kind of music. You can learn this easily by simply trying out different kinds of music tracks on the same video clip, and seeing just how many different ones can work quite well.
Films are ‘porous’ to music, which tends most often towards certain easily definable roles. Music may try to depict the emotion in the characters, or it may try to create emotion in the viewers — in both of these instances, it is accomplishing the ‘representation of the will’ that Schopenhauer and Wagner held as the highest goal of music.
Film music might also flesh out the mise-en-scène, giving us cues of time and place, either diegetically (e.g. music from the jukebox or car radio in a scene) or non-diegetically, suggesting locale through ethnic or ethnographic exoticism. Music might also be used to delineate the moral plane of the film, employing major modes when ‘the good guys’ are happy, and minor modes when they are sad or distressed, or calling upon dissonance, noise and tritones to depict evildoers, aliens and monsters, as in the opening sequence in Star Trek: The Motion Picture (Robert Wise, 1979) wherein the ‘good guy Klingons’ face off against the mysterious and deadly nebulae (the Klingons here getting a very Wagnerian major mode horn treatment relative to the menacing cloud, scored with electronically synthesized tritones).
There is also use of music that turns the film into a sort of music video, as in Run Lola Run (Tom Tykwer, 1998), in which the images become a kind of light show to the score’s beats and grooves. A less predictable use of music would be its unempathetic use — music that resists being pinned down emotionally. We might consider the use of the piano in Eyes Wide Shut (Stanley Kubrick, 1999) a use of unempathetic music, because the piano is mic’d so closely and mixed so loudly that it is impossible to watch the scene without thinking about the mechanics of the piano, and the pianist’s hands striking the keys with extreme force.
Or we can consider the score of a Brothers Quay animation, where the composer is not shown the film but merely instructed to compose music that has the same length as the film. Thus the music’s effects will be fortuitous – caused by the happening-at-once of sound and image, and the chance relationships that come about due to that simultaneity – and will not owe its effects to the composer’s or filmmaker’s intention to supply emotional meaning to specific scenes.
The term ‘Musaic’ tries to capture this complete permeability of film to music, the manner in which one can think of a film as a ‘visual score’ at any moment capable of undergoing the affective transformations of a music cue or needle drop.
An exception to this might be a use of diegetic music that is backgrounded and merged into the ambience such that it does not take over the soundtrack and ‘comment’ on the scene in such a way as to dominate interpretation of feeling or pacing — faint muzak or radio music in the distance would be examples of music essentially functioning as a sound effect. This use of music as an ambient effect is primarily a matter of dynamics, of the relative volume of the other audio elements in the mix — the louder the diegetic music, the more it has the potential to emerge from the ambient tapestry and become non-diegetic musaic ‘score’ to the scene.
Finally, the Abstract hermeneutic mode applies to the play of formal elements that do not serve purposes related directly to the mimetic demands of photorealistic motion-visual imagery (and it should be clear that in the elaboration of these five hermeneutic modalities, we are primarily concerned with moving images of a photorealistic character, not, for example, with film animation of abstract shapes as might be found in modernist experiments).
Abstract sound design techniques foreground the act of perception, the reflexive dwelling on the fact that, while seeing and hearing something, we are seeing and hearing an artefact, and perhaps having aspects or thresholds of our senses made into a theme. The Abstract reinforces the relative autonomy of the audio and visual domains, and tends to adhere to the goals of ‘absoluteness’, in that a sound might not have a referential or identifiable meaning, or its meaning might be that it lacks a meaning.
Recalling the old debate between programmatic and absolute music, we remember that music is usually used in films to fulfill certain programmatic intentions, even if originally scored in an ‘absolute’ mode — for instance, using a Bach fugue to instill nostalgia in a viewer; here absolute music is being used programmatically.
The Abstract is instantiated in the gaps and chasms of correspondence between sound and image, engages with the resistance between the conceived and the perceived, and plays with the disengagement from language (linguistic meanings) and reality (as given in the imagery). The Abstract verges on that part of perception that does not yield all to semantic understanding but which is perfectly within the bounds of our senses.
The Abstract in sound design means either the severing of the connections between sound and image, or the nonsensical (i.e. uninterpretable) co-joining of the same. For examples of abstract sound design we can consider the use of Ligeti during the geometrical atmospherics at the end of 2001.
Sound effects editing for purely formal editing reasons, like pace and transition, as in the occasional montage editing to jet-like sounds in Shaun of the Dead (Edgar Wright, 2004), can also be understood as sound design in the Abstract mode.
A classic instance of abstract sounds in film would be the electronic sounds heard during the opening of The Conversation (Francis Ford Coppola, 1974), in which a garbled, vaguely vocalic sonic texture announces itself prior to serving a diegetic role in the film (out of phase tape recordings) – this initial ‘sound without a source’ calls attention to disparities between what we see and hear in a film, creating an ‘in-between’ space that fissures the sense-worlds of the eye and ear, which usually enjoy a high degree of synaesthetic fusion, both in film and in life.
Abstract sound design is rarely employed in mainstream film, though its use is fairly common in experimental practices (for instance, in Michael Snow’s Wavelength (1967), in which static electronic tones accompany a 45-minute slow zoom).
There is a ‘flavor’ of abstract use of sound with moving image that could perhaps be termed something like ‘the reduced syncretic’. This would occur when the abstract sound element retains some causal or thematic linkage to the film at large. Recalling Wavelength, the use of a sine wave tone over a very slow zoom in an apartment can be contrasted to the sine wave that is the only soundtrack element during a scene in the film Cop Land (James Mangold, 1997), in which Sylvester Stallone’s character is temporarily deafened by an explosion. He is deaf in one ear, as the result of a childhood car accident, so deafness and tinnitus become occasional elements of the sound design.
The audience is given a unique moment of POV sound, the high-pitched sine wave of the character’s damaged eardrums. Here the sound design is highly similar to that in Snow’s Wavelength, an electronically produced sine wave against moving photorealistic imagery. However, the Cop Land scene situates the sine wave soundtrack relative to a causal narrative event.
These five hermeneutic modes of audiovisual meaning — the Real, Ideal, Relentlessly Real, Musaic and Abstract — can of course coexist simultaneously at any moment in a film. Imagine a character walking down the street. Sad string music accompanies the sound of traffic and street vendors — here the Musaic and the Real are co-joined in a seamless moment of musical-emotional commentary on the character’s (subjective) inner life in the midst of the city (objective world).
If, however, we remove all the sounds of the world, and raise the music, then the scene becomes purely Musaic. Then let’s have the character slip on a banana peel — if the squish of the peel on pavement is grotesquely audible and his skull splatters with gruesome detail, we have moved towards the Relentlessly Real. Bring in the traffic noise, drop the music, show the crowd gathering around and murmuring, and we are back in the frame of the Real.
But if a military helicopter flies overhead, and surround sound is used to articulate the aircraft’s motion across the scene, and all sorts of electronic sounds and filtering are used to alter the sound so that our ribcages are plucked like strings by the loud subharmonic frequencies of the rotor blades, then we are now blending in aspects of the Ideal. Remove the helicopter sound and all city sound effects and then leave on the soundtrack just some abstracted remnant of the helicopter in the form of a loud drone that lasts for fifteen minutes, totally severing the relation between what we are seeing and hearing, and then we have moved into the territory of the Abstract.
If we then add to the mix the character’s cellphone ringing in his pocket as he lies on the pavement, then we are bringing back into the mix aspects of the Real. What this thought experiment aims to show is that the soundtrack is often in a state of hermeneutic flux with regard to these five determinations, and might be constructed of one or several of these modes at any given time, or constantly shifting between them.
None of the five modes can prevent another mode from co-forming the overall audiovisual topology of any given sequence, and any can predominate in a specific scene. While sounds themselves can be infinite in their particularities, these five modes tend to constrain choice on a meta level, shaping cinesonic meaning in the creation of sound–image relationships.
We can also take note in our analysis of certain ‘border cases’ identifiable in common techniques, such as the Foley practice called ‘gun life’. Gun life is the practice of inscribing a slight sound of rattling metal whenever a gun is pulled out and wielded but not fired. Guns in ‘real life’ are of course completely silent when handled, but in films a small sound of clinking metal, which Foley artists have dubbed ‘gun life’, is usually placed in the soundtrack, giving a slight acoustic presence to a gun as it is handled by the character. This is employed even when the gun in question is of the laser or phaser variety in sci-fi. In the context of sound design practice, the convention seems to have emerged because ‘it just sorta seems right’ in the context of audiovisual experience. In terms of our outline, it can perhaps be understood as a small modulation towards Ideality – a gun, after all, is a primary carrier of a character’s desire in cinematic imagery.
While this essay has been primarily concerned with the process of defining relationships between sounds and images specific to time-based audiovisual media such as film and video, there is no reason why what is argued here could not apply equally to other audiovisual forms such as installations and the vast interdisciplinary realm of ‘new media’, many of which are not so new anymore and show marked continuities with cinematic precedents in sound–image confluences.
An exploration of these modes can help us to understand better the interpretive flux of sound–image relationships — the way meanings and resonances shift over time between the different symbolic registers of eye and ear.
Originally published in The Soundtrack, based on my sound design lectures — in other words, originally published as PowerPoint slides!