Audiovisual Aesthetics | A View from 10,000 Feet (or so) Part 1

Jonas Mekas’ typology of experimental film is still enormously useful as a high level conceptual scheme for defining a vast territory of aesthetic practices in moving image media:

– Documentary
– Cinepoem
– Cineplastics
– Narrative

To this list of categories we might add Performance and Animation, except that both of these types of audiovisual production easily cut across these four Mekas categories. A performance film or video, for example, can have a very documentary-like character to it, make use of lyrical abstraction, or tell a story etc. There can be moments of some works, however, where the moving image seems to take on something truly ‘performative’ that is not quite encapsulated in these four ‘ur-categories,’ so it’s worth keeping an open mind as to whether to formally expand Mekas’ categories to include these other two possibilities.



Mekas’ four concepts can be further analyzed as a matrix composed of two spectrums of possibility: one between Documentary and Cineplastics, and one between Narrative and Cinepoetry.

The logic here is straightforward. Classically, documentary provides a ‘window onto the world’: the focus is on what’s in front of the lens, not on the lens, camera or editing apparatus. That apparatus is the medium of film or video itself, which is the interest of Cineplastics. So the spectrum between Documentary and Cineplastics is the aesthetic range between the ‘real world’ as represented in documentary style conventions and a strong interest in the moving image medium itself at the Cineplastics pole. Let’s contrast these.


A good ‘first stop’ on the Cineplastics tour is Stan Brakhage’s hand-painted films. In this version, a Vimeo user has added a musical soundtrack, so turn off the sound since the film is supposed to be silent.


Cineplastics has two related but oscillating poles of attraction: the materiality of the medium, and the conditions of human perception, since Cineplastic works are ultimately addressed to our eyes and (usually) ears. Cineplastic approaches, where there is no ‘window onto the world’ but instead a media ‘surface’ to experience, are common in end credits and title sequences (amongst other places in popular media). Here is an homage to Stan Brakhage in the (bootlegged!) end credits to the film The Jacket:

So that’s the quick tour between the world in front of the medium — or Documentary — and the medium itself or perception as the subject, or Cineplastics.

The spectrum between Narrative and Cinepoetry is based on a distinction in the logic of construction, which can be defined as the difference between Causality and Association. Narrative is all about cause and effect: A produces B because the laws of time and space apply in such a way that an outcome is produced by something that logically precedes it. In poetry, this logic of causation is often absent and instead there is another form of construction based on associations. Here’s a snippet of narrative, for instance:

I begin tucking him into bed and he tells me,

“Daddy check for monsters under my bed.” I

look underneath for his amusement and see him,

another him, under the bed, staring back at me

quivering and whispering, “Daddy there’s

somebody on my bed.” (source)

Contrast the coherent flow of events above with the haiku-like poem below:

The apparition of these faces in the crowd;

Petals on a wet, black bough. (Ezra Pound)

As with everything else in this essay, whole books have been written on the difference between narrative and poetry, so if you are a high-powered critical scholar and this feels like too rushed a distinction, you’re free to argue back in your own blog posts! As stated previously, I’m painting in broad strokes here. Anyway, out of millions of examples, here are two to contrast the concepts.



Cinepoetic visual compositions are actually quite common in mainstream media. Many if not most title sequences, dreams, flashbacks, alien trauma, war trauma, demon possession, drug, montage, and monster attack scenes (I might be forgetting a few categories!) involve cinepoetic construction. The ‘death video’ in The Ring, for example, is basically an experimental art school film that will kill you if you watch it.

Persona’s opening scene is a famous example of cinepoetics at the start of a film. In today’s Trigger Warning Era, I should point out that the footage is not for those who dislike seeing violent corporeal unpleasantries, but since this is a film classic, I don’t really want to be an extension of the Censor’s Office, either (click on image to view video).


Category Deep Dives

For each of these four high-level concepts of audiovisual practice (Documentary, Cinepoem, Narrative and Cineplastics), one can always do a deep dive. For instance, Bill Nichols has a well-known typology of documentary modes:

– Poetic mode
– Expository mode
– Participatory mode
– Observational mode
– Reflexive mode
– Performative mode

Hybridization of Categories

Just as you can take any of these high-level categorical mappings of audiovisual practice and deep dive for the delight of granular conceptual refinement, you can also hybridize across the categories. The two-axis matrix above actually offers this as a major possibility, since most of the space in its implied grid involves a mixture of the categorical end poles. Let’s look at a few ‘hybrid’ (according to this analytical scheme) examples.

Narrative and Animation

Documentary and Narrative

The possibilities are endless, so these examples are really just meant to give a taste of the variety that’s possible by using the 4-pole matrix as a general aesthetic possibility space.

AUDIO-visual Aesthetics

Now we have to bring audio specifically into the discussion. The first stops, from a historical perspective, are Sound vs. Silence and Asynchronicity vs. Synchronicity. Silent films were actually full of conceptual sounds — when you watch them, they are full of sound-making objects and events, which were supposed to inspire you to hear sounds in your head, whether or not there were sound artists and musicians producing sound live and theatrically in tandem with the film projection.

Sound effects performers behind the projected image. Source.

Synchronicity and Asynchronicity

In this section, I will be a tad ‘lazy’ and just cite some of my earlier writing on this topic:

The first significant technological process of matching objects to events in the history of sound-film was synchronization, or the matching in time of sounds with their images; the rich spectral tonality, dynamic range and spatial articulation of sounds would wait several decades for their refinement…. Asynchronicity, however, as a principle of film editing, did not of course imply the playing of sounds out of sync with each other, which would just read as a technical error after sync’s technical accomplishment had come to pass. Rather asynchronicity emerged as a contrarian approach specifically opposed to what was deemed a “slavish” relationship between sound and image created by sync technology. This contrarian ethos was expressed very shortly after the dawn of the talkies. In contrast to what was perceived as an excessive American naturalism in the sound-image relationship, certain European (especially Russian and French) artists and theorists advocated for a contrasting approach in which sound and image would work against each other akin to counterpoint. In 1928, Eisenstein, Pudovkin and Alexandrov issued a short manifesto, “A Statement,” against naturalism in sound-image representation:

Sound recording is a two-edged invention, and it is most probable that its use will proceed along the line of least resistance, i.e., along the line of satisfying simple curiosity.

In the first place there will be commercial exploitation of the most salable merchandise, TALKING FILMS. Those in which sound recording will proceed on a naturalistic level, exactly corresponding with the movement on the screen, and providing a certain “illusion” of talking people, of audible objects, etc.

To use sound in this way will destroy the culture of montage, for every ADHESION of sound to a visual montage piece increases its inertia as a montage piece, and increases the independence of its meaning-and this will undoubtedly be to the detriment of montage, operating in the first place not on the montage pieces but on their JUXTAPOSITION.

ONLY A CONTRAPUNTAL USE of sound in relation to the visual montage piece will afford a new potentiality of montage development and perfection.

THE FIRST EXPERIMENTAL WORK WITH SOUND MUST BE DIRECTED ALONG THE LINE OF ITS DISTINCT NONSYNCHRONIZATION WITH THE VISUAL IMAGES. And only such an attack will give the necessary palpability which will later lead to the creation of an ORCHESTRAL COUNTERPOINT of visual and aural images. (Eisenstein et al.)

The reference to music is especially striking because musical practices, specifically around the placement of performers across a stage and its rendering as spatial audio, would later contribute significantly to many a multichannel naturalist approach!

A distinction was drawn between the capability of the medium and its employment for artistic ends. In 1929 Pudovkin expressed the asynchronist aesthetic in “Asynchronism as a Principle of Sound Film.”

But there is a great difference between the technical development of sound and its development as a means of expression.

What new content can be brought into the cinema by the use of sound? It would be entirely false to consider sound merely as a mechanical device enabling us to enhance the naturalness of the image.

The role which sound is to play in film is much more significant than a slavish imitation of naturalism on these lines; the first function of sound is to augment the potential expressiveness of the film’s content.

It is clear that this deeper insight into the content of the film cannot be given to the spectator simply by adding an accompaniment of naturalistic sound; we must do something more. This something more is the development of the image and the sound strip each along a separate rhythmic course. (Pudovkin)

If the principle of montage had been built on the clash of opposites, dialectically opposing contrasting elements against each other with a film cut, the addition of sound literally doubled this doubleness since each cut could contrast two image and two sound tracks: “wherever in silent film we had a conflict of but two opposing elements, now we can have four.” Note that what is advocated is not a severing of time as a principle of diegesis (e.g. the real time of the unfolding narrative); rather, what is proposed is a skeptical attitude toward all forms of naturalistic multimodal representation whereby visually depicted events are presented with their sonic aspect intact. The specific form of contrast and counterpoint advocated by Pudovkin and others was to use sound and image to render separately external-objective and internal-subjective sequences.

Always there exist two rhythms, the rhythmic course of the objective world and the tempo and rhythm with which man observes this world. The world is a whole rhythm, while man receives only partial impressions of this world through his eyes and ears and to a lesser extent through his very skin. The tempo of his impressions varies with the rousing and calming of his emotions, while the rhythm of the objective world he perceives continues in unchanged tempo.

The course of man’s perceptions is like editing, the arrangement of which can make corresponding variations in speed, with sound just as with image. It is possible therefore for sound film to be made correspondent to the objective world and man’s perception of it together. The image may retain the tempo of the world, while the sound strip follows the changing rhythm of the course of man’s perceptions, or vice versa. This is a simple and obvious form for counterpoint of sound and image. (Pudovkin)

Also in 1929, René Clair proclaimed that “the world of noises seems far more limited than we had thought.” Within just two years of The Jazz Singer, auditory clichés were already noticeable to film writers.

Although the talkies are still in their first, experimental stage, they have already, surprisingly enough, produced stereotyped patterns. We have barely “heard” about two dozen of these films, and yet we already feel that the sound effects are hackneyed and that it is high time to find new ones. Jazz, stirring songs, the ticking of a clock, a cuckoo singing the hours, dance-hall applause, a motorcar engine, or breaking crockery-all these are no doubt very nice, but become somewhat tiresome after we have heard them a dozen times in a dozen different films. (Clair, 1929)

Clair was in agreement with the Soviet montagists that “It is the alternate, not the simultaneous, use of the visual subject and of the sound produced by it that creates the best effects.” Writers in this period attuned to the ‘non-slavish’ use of sound (the idea of sound as a slave to image has been a constant thread of audiovisual discourses from its origins to the present day) often give similar examples of ‘ideal’ uses of sound and image for counterpoint. One motif that recurs is the recommendation to use sound depicting goings-on in the world and use image to focus on an emotional reaction or response shot. Echoing the objective-subjective polarity in the use of sound-image contrasts espoused by the Russian montage school, Clair writes:

For instance, we hear the noise of a door being slammed and a car driving off while we are shown Bessie Love’s anguished face watching from a window the departure which we do not see. This short scene in which the whole effect is concentrated on the actress’s face, and which the silent cinema would have had to break up in several visual fragments, owes its excellence to the “unity of place” achieved through sound. In another scene, we see Bessie Love lying thoughtful and sad; we feel that she is on the verge of tears; but her face disappears in the shadow of a fade-out, and from the screen, now black, emerges a single sob.

In these two instances the sound, at an opportune moment, has replaced the shot. It is by this economy of means that the sound film will most probably secure original effects.

Bresson’s famous “Notes on Sound” (1975) likewise express the asynchronist aesthetic with great “economy of means,” to use Clair’s phrase:

● What is for the eye must not duplicate what is for the ear.

● If the eye is entirely won, give nothing or almost nothing to the ear. One can not be at the same time all eye and all ear.

● When a sound can replace an image, cut the image or neutralize it. The ear goes more toward the within, the eye toward the outer.

● A sound must never come to the help of an image, nor an image to the help of a sound.

● If a sound is the obligatory complement of an image, give preponderance either to the sound or to the image. If equal, they damage or kill each other, as we say of colors.

● Image and sound must not support each other, but must work each in turn through a sort of relay.

● The eye solicited alone makes the ear impatient, the ear solicited alone makes the eye impatient. Use these impatiences. Power of the cinematographer who appeals to the senses in a governable way. Against the tactics of speed, of noise, set tactics of slowness, of silence.

Writing historically mid-way between the Russian montage theorists and Bresson, Béla Balázs’s theoretical work is more meditative and literary in style, neither manifesto nor the short working notes of a master of film craft. Balázs (1952) is particularly interesting for this discussion for his poetic description of the effects and limitations of mono. Writing in the middle of what Kerins (2011, p.329) has called ‘The Mono Era’ (1927-late 1970s), Balázs pays considerable attention to mono’s lack of spatial detail.

Sounds Throw No Shadow

Auditive culture can be increased like any other and the sound film is very suitable to educate our ear. There are however definite limits to the possibilities of finding our way about the world purely by sound, without any visual impressions. The reason for this is that sounds throw no shadows. In other words that sounds cannot produce shapes in space. Things which we see we must see side by side; if we do not, one of them covers up the other so that it cannot be seen. Visual impressions do not blend with each other. Sounds are different; if several of them are present at the same time, they merge into one common composite sound. We can see the dimension of space and see a direction in it. But we cannot hear either dimension or direction. A quite unusual, rare sensitivity of ear, the so-called absolute, is required to distinguish the several sounds which make up a composite noise. But their place in space, the direction of their source cannot be discerned even by a perfect ear, if no visual impression is present to help.

It is one of the basic form-problems of the radio play that sound alone cannot represent space and hence cannot alone represent a stage.

Sounds Have No Sides

It is difficult to localize sound and a film director must take this fact into account. If three people are talking together in a film and they are placed so that we cannot see the movements of their mouths and if they do not accompany their words by gestures, it is almost impossible to know which of them is talking, unless the voices are very different. For sounds cannot be beamed as precisely as light can be directed by a reflector. There are no such straight and concentrated sound beams as there are rays of light.

The shapes of visible things have several sides, right side and left side, front and back. Sound has no such aspects, a sound strip will not tell us from which side the shot was made. [italics added]

These passages almost read like a film critic’s prayer for someone to invent Dolby Stereo, a film critic who perhaps never experienced Fantasound on one of its two American systems, which one would expect of a Hungarian Jewish writer of German descent. Mono is indeed “one common composite sound” and lacks the spatial form that visual images possess by default. Balázs does note one spatial capacity of mono, which is the same capacity of any amplified technology, namely that sound amplification allows for the audition of subtle and quiet sounds or what he refers to as “the intimacy of sound” and the “acoustic close-ups.” Read more closely, Balázs even seems to be anticipating audio beamforming technology in his comment that “sounds cannot be beamed as precisely as light.” Beamforming techniques applied to audio speakers address if not solve this problem today.

At Philips Research, we have developed technology that enables two people in the same room to hear the same audio output at different volumes. Based on audio beamforming, this technology is a new application of an old idea, made possible by the falling cost of computing power. We were granted a patent for our audio beamforming technology in 2012.

Imagine two people in a room watching television — say, the Netherlands winning in the World Cup…. One is slightly deaf and needs the volume high, but the other isn’t interested in soccer and wants the volume low. This effect could be achieved by mounting several speakers throughout the room, but this solution would require unsightly trailing wires and time-consuming installation. Getting the same result from a single array of speakers mounted on the TV is an attractive alternative, but is technically challenging. We addressed that challenge by creating a detailed MATLAB® simulation that provided us with a means of calculating the loudspeaker parameters we needed for beamforming. (de Brujin, 2013)
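For the curious, the core idea behind audio beamforming can be sketched in a few lines. What follows is the generic textbook 'delay-and-sum' approach, not the Philips method quoted above, and every name and number in it (the function names, eight speakers, 5 cm spacing, a 2 kHz test tone) is my own assumption for illustration. Each speaker in a row plays the same signal slightly later than its neighbour, so the wavefronts line up in one chosen direction and partly cancel in others.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # metres per second in air at roughly 20 °C


def steering_delays(n_speakers, spacing_m, steer_deg):
    """Per-speaker delays (seconds) that aim a straight row of speakers.

    0 degrees is straight ahead; positive angles lean toward the
    higher-numbered speakers. Hypothetical helper for illustration.
    """
    step = spacing_m * math.sin(math.radians(steer_deg)) / SPEED_OF_SOUND
    raw = [i * step for i in range(n_speakers)]
    earliest = min(raw)
    return [d - earliest for d in raw]  # shift so that no delay is negative


def relative_level(delays, spacing_m, freq_hz, listen_deg):
    """Level (0 to 1) heard far away at listen_deg for a pure tone.

    1.0 means every speaker's wave arrives perfectly in step.
    """
    total = 0j
    for i, delay in enumerate(delays):
        # A speaker further along the row is closer to a listener on that
        # side, so its sound gets there this much sooner.
        head_start = i * spacing_m * math.sin(math.radians(listen_deg)) / SPEED_OF_SOUND
        total += cmath.exp(-2j * math.pi * freq_hz * (delay - head_start))
    return abs(total) / len(delays)


delays = steering_delays(n_speakers=8, spacing_m=0.05, steer_deg=30.0)
on_target = relative_level(delays, 0.05, 2000.0, listen_deg=30.0)
off_target = relative_level(delays, 0.05, 2000.0, listen_deg=-30.0)
```

With these assumed numbers, the listener sitting in the steered direction gets the full, in-step sum, while a listener on the other side of the room gets a noticeably weaker one. Real systems are far more involved (rooms reflect, and the effect varies with frequency), but this is the basic sense in which sound can, after all, be 'beamed.'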

Echoing the French and Russian theorists, Balázs voices approval of asynchronism as an aesthetic principle.

Asynchronous Sound

In a close-up in which the surroundings are not visible, a sound that seeps into the shot sometimes impresses us as mysterious, simply because we cannot see its source. It produces the tension arising from curiosity and expectation. Sometimes the audience does not know what the sound is they hear, but the character in the film can hear it, turn his face toward the sound, and see its source before the audience does. This handling of picture and sound provides rich opportunities for effects of tension and surprise.

Asynchronous sound (that is, when there is discrepancy between the things heard and the things seen in the film) can acquire considerable importance. If the sound or voice is not tied up with a picture of its source, it may grow beyond the dimensions of the latter…. The surest means by which a director can convey the pathos or symbolical significance of sound or voice is precisely to use it asynchronously.

A more moderate position, between slavishness and asynchronicity, is often expressed by film sound practitioners, who rightly note that these two tendencies are often found used within the same film, alternating across scenes or cuts.

Image and sound are linked together in a dance. And like some kinds of dance, they do not always have to be clasping each other around the waist: they can go off and dance on their own, in a kind of ballet. There are times when they must touch, there must be moments when they make some sort of contact, but then they can be off again…. Out of the juxtaposition of what the sound is telling you and what the picture is telling you, you (the audience) come up with a third idea which is composed of both the picture and the sound and resolves their superficial differences. The more dissimilar you can get between picture and sound, and yet still retain a link of some sort, the more powerful the effect. (Paine, 1985, p. 356)

Synchronous sound establishes several orders of synchronicity. First, as a baseline of audiovisual multimodal technology (sound, image, music, color, light, text, gesture, movement, and all the sensory modalities that sound-film can evoke and make use of as its material), it produces its ‘naturalistic’ or, alternately, ‘illusionistic’ effects of sound and image fusion. Secondly, this solidifies the temporal diegesis so that asynchronous events of contrasting image/sound and objective/subjective renderings all still can be said to be occurring “at the same time.” A third order might be said to be introduced with sound-driven montage, where an auditory rhythm in the soundtrack motivates abrupt changes in image content based on the auditioned beats.

A Matrix for Sound Aesthetics

Similar to the 4-pole (or 2-axis) possibility space above for mapping the aesthetic terrain of moving image practices, we can do something similar with sonic forms.

This matrix combines a low level acoustic spectrum — that between Noise and Pitch — and a higher level artistic or practical distinction, between Similarity and Contrast. The noise vs. pitch difference is a straightforward contrast from acoustics, where the waveform shape is either regularly repeating (periodic) or randomly progressing (aperiodic) in its temporal flow.
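The periodic/aperiodic distinction is easy to check numerically. Here is a minimal sketch, assuming an 8 kHz sample rate and helper names I have made up for illustration: it builds a pure tone and some white noise, then measures how closely each signal resembles a time-shifted copy of itself (its autocorrelation). A periodic waveform matches itself almost perfectly when shifted by one period; noise does not match itself at any shift.

```python
import math
import random

SAMPLE_RATE = 8000  # samples per second (an assumption for this sketch)


def sine_tone(freq_hz, n_samples):
    """A pitched, periodic signal."""
    return [math.sin(2 * math.pi * freq_hz * n / SAMPLE_RATE) for n in range(n_samples)]


def white_noise(n_samples, seed=0):
    """An unpitched, aperiodic signal (seeded so the result is repeatable)."""
    rng = random.Random(seed)
    return [rng.uniform(-1.0, 1.0) for _ in range(n_samples)]


def self_similarity(signal, lag):
    """Normalized autocorrelation at a lag of at least 1 sample (range -1 to 1)."""
    a, b = signal[:-lag], signal[lag:]
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    return num / den if den else 0.0


def periodicity(signal, min_lag=20, max_lag=200):
    """Best self-match over a range of lags: near 1 for pitch, near 0 for noise."""
    return max(self_similarity(signal, lag) for lag in range(min_lag, max_lag + 1))


tone_score = periodicity(sine_tone(200.0, 2000))   # the period is 40 samples
noise_score = periodicity(white_noise(2000))
```

The tone scores very close to 1 and the noise scores close to 0, which is the Noise vs. Pitch axis of the matrix reduced to a single number.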

As it turns out, I’ve published on this topic as well, so I will just cite myself again if you don’t mind:

In The Poetics of Music, Stravinsky borrows the ideas of the Russian philosopher Pierre Souvtchinsky to elucidate music as the structuring of similarity and contrast. Souvtchinsky had defined music as being of two kinds, ontological (based on similarities) and psychological (based on contrast):

Mr. Souvtchinsky thus presents us with two kinds of music: one which evolves parallel to the process of ontological time, embracing and penetrating it, inducing in the mind of the listener a feeling of euphoria and, so to speak, of ‘dynamic calm.’ The other kind runs ahead of, or counter to, this process. It is not self-contained in each momentary tonal unit. It dislocates the centers of attraction and gravity and sets itself up in the unstable; and this fact makes it particularly adaptable to the translation of the composer’s emotive impulses…

Music that is based on ontological time is generally dominated by the principle of similarity. The music that adheres to psychological time likes to proceed by contrast. To these two principles which dominate the creative process correspond the fundamental concepts of variety and unity. (Stravinsky 2007: 31)

Similarity and contrast can be thus discussed as phenomenological manifestations of periodicity in that the similar may be viewed as periodic and the contrasting as aperiodic. These represent a ‘macro’ correspondence to the pitch/noise ‘micro’ feature of periodic and aperiodic waveforms, and together these two sets of oppositions can form a two-axis heuristic. On one continuum similarity and contrast represent the phenomenological manifestations of a/periodicity; on the other, noise and pitch represent the acoustic science of a/periodicity.

While scientific representations of periodicity (orbits, rotations, oscillations, heartbeats, breath, etc.) certainly ‘leak’ into any phenomenological assessment of periodic behaviors, it is important that the acoustic science remain distinct from phenomenological interpretations (the latter describes lived experience; the former is based on scientific method and mathematical modeling). For example, it is possible to acoustically generate an aperiodic waveform so as to exhibit little variety or contrast, and thus be phenomenologically periodic (e.g. monotonous noise).

The musical examples mapped in the matrix above have been selected to be illustrative of this dual-periodic (pitch–noise, similar–contrasting) aspect to articulate the field out of which acoustic images emerge. The selected section of Philip Glass’s Music in Fifths is a series of whole tones repeatedly ascending from the first to the fifth and descending back to the first. The key feature of this example is the repetition of the same five pitches in a similar pattern; however, the progression is not always exactly the same and occasionally skips one, two or three notes while descending, which has the effect of breaking the similarity that is reinforced by its repetition, resulting in slight changes in timing, a sense of incomplete phrases, and interrupted cadence.

Pitch/Similarity Quadrant

Christian Marclay’s Guitar Drag exhibits guitar distortion and overdrive primarily in a single pitch consisting of a rich rhythm of repeated boiling pops and clicks that blend together into a noisy drone. Again the backdrop of the constancy of noise is broken by occasional variance in tonal colour, volume and rhythm attracting the attention of the listener without completely pulling the focus out of immersion in the drone.

Noise/Similarity Quadrant

Unyoga, by the Chlorgeschlecht collective, is segmented into a variety of different short moments of noise. The segments alternate between moments of quiet, to clicks and pops over a high-pitched sine wave, to various loud chaotic moments of noise consisting sometimes of crunchy guitar chords based off a minor third or drones of guitar distortion or vocal screaming or rapid drum beats, or combinations of these overlapping. This piece elicits the noisy contrast, not just in the alternation between quiet and loud but also in the variety of different kinds of noise presented.

Noise/Contrast Quadrant

The selected section of Edgar Varèse’s Poème électronique elaborates a variety of sound events of various pitches and glissandos covering a broad range of electronic timbres from single voice waveforms to rich gong sounds performed in varying lengths from short beeps and chirps to long single tones and cyclical sliding whistles. The contrast in the variety of tones in this sample challenges the ear to assemble a meaningful composition from pitched sounds which have been uncoupled from their traditional associations with each other.

Pitch/Contrast Quadrant

Occupying the centre of this heuristic space is a selection from Duplo Remote’s Cusp, a sample of glitch electronica offering the noisy buzzes, clicks and pops of electronic distortion in a consistent four-beat measure over a smooth electronic organ in a repeated first, fourth, fifth progression. A similarity reinforced by a repetition of cadence and pitch is contrasted by the rhythmically irregular noise of distorted, broken beats.

In the Middle of the Compositional Matrix

I am done citing myself now ; )

Pulse & Drone

One of the most prevalent sonic structures in cinema that comes into play when ‘something odd is happening’ is the Pulse and Drone structure. The odd thing happening might involve aliens, psychopaths, dreams, science experiments, spaceships, drugs, flashbacks and what have you. When such odd things happen, Pulse+Drone is one of the first structural considerations in the sound design for audio-visual aesthetic production. Put succinctly, a drone is a continuous sound that usually extends and slowly evolves over quite some time, while a pulse is a regularly repeating, heard (not implied) interval in the sonic texture.
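To make the two definitions concrete, here is a rough sketch that synthesizes both layers as plain lists of samples. All of the specifics (function names, a 55 Hz drone that swells once every ten seconds, an 880 Hz blip every half second) are my own assumptions, chosen only to illustrate 'continuous and slowly evolving' versus 'regularly repeating.'

```python
import math

SAMPLE_RATE = 8000  # samples per second (an assumption for this sketch)


def drone(duration_s, freq_hz=55.0, swell_hz=0.1):
    """A continuous low tone whose loudness drifts slowly up and down."""
    out = []
    for n in range(int(duration_s * SAMPLE_RATE)):
        t = n / SAMPLE_RATE
        swell = 0.75 + 0.25 * math.sin(2 * math.pi * swell_hz * t)
        out.append(swell * math.sin(2 * math.pi * freq_hz * t))
    return out


def pulse(duration_s, interval_s=0.5, burst_s=0.05, freq_hz=880.0):
    """Short decaying blips at a fixed interval, with silence in between."""
    n_total = int(duration_s * SAMPLE_RATE)
    interval = int(interval_s * SAMPLE_RATE)
    burst = int(burst_s * SAMPLE_RATE)
    out = [0.0] * n_total
    for start in range(0, n_total, interval):
        for k in range(min(burst, n_total - start)):
            decay = 1.0 - k / burst  # each blip fades out linearly
            out[start + k] = decay * math.sin(2 * math.pi * freq_hz * k / SAMPLE_RATE)
    return out


def mix(a, b, gain_a=0.5, gain_b=0.5):
    """Sum two equal-length layers, scaled so the result stays within -1..1."""
    return [gain_a * x + gain_b * y for x, y in zip(a, b)]


bed = drone(2.0)
ticks = pulse(2.0)
pulse_and_drone = mix(bed, ticks)
```

The drone never stops and never quite repeats at the same loudness; the pulse is mostly silence, punctuated at exactly the same spacing each time. Writing the samples out with Python's standard `wave` module would let you actually hear the combination.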

Sound & Music

There’s a significant amount of historical and creative overlap between sonic and musical aesthetics, so a few main themes from that nexus will be highlighted (and in fact, the previous discussion has already done some of this).

Let’s start with this Ear Painful (but intellectually edifying) example of moving image and sonic compositional correspondence, Iannis Xenakis’ Mycenae Alpha. This is a work of electroacoustic music where the visual score is drawn with a light pen on the early UPIC system.

The idea here, obviously, is that what you see is what you get, sort of. There’s clearly a correspondence that’s easy to see between the lines drawn and sounds heard, but not one that you would necessarily be able to guess at ahead of time, unlike traditionally scored music.
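The basic 'drawing becomes sound' correspondence can be sketched as follows. To be clear, this is not how the UPIC itself was implemented; it is a toy illustration under my own assumptions (made-up function names, an 8 kHz sample rate, straight lines only) in which the horizontal position of a drawn line is read as time and its height as pitch, so that a rising line becomes a rising glissando.

```python
import math

SAMPLE_RATE = 8000  # samples per second (an assumption for this sketch)


def line_to_frequencies(start, end):
    """Read one drawn line as a pitch contour.

    start and end are (time_in_seconds, frequency_in_hz) endpoints.
    Returns one frequency value per audio sample along the line.
    """
    (t0, f0), (t1, f1) = start, end
    n_samples = int(round((t1 - t0) * SAMPLE_RATE))
    return [f0 + (f1 - f0) * n / n_samples for n in range(n_samples)]


def render_line(start, end):
    """Turn the pitch contour into a sine glissando."""
    phase = 0.0
    samples = []
    for freq in line_to_frequencies(start, end):
        samples.append(math.sin(phase))
        # Advance the phase by however far this frequency moves in one sample,
        # so the pitch can slide without clicks.
        phase += 2 * math.pi * freq / SAMPLE_RATE
    return samples


# A line drawn from (0 s, 220 Hz) up to (1 s, 440 Hz): a one-octave upward slide.
contour = line_to_frequencies((0.0, 220.0), (1.0, 440.0))
glissando = render_line((0.0, 220.0), (1.0, 440.0))
```

A page full of such lines, each rendered and summed, gets you something in the spirit of a drawn score: you can see roughly what will happen, even if the resulting texture is hard to predict by eye.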

End of Part 1

