Methods of Montage Analysis

Music videos, opening credit sequences, experimental films, depictions of hallucinations and dreams, sequences that compress time and action, and videos that will kill you if you watch them are amongst the main types of moving image making that employ montage techniques. Montages have their own inner logic and workings, which are typically conceptualized as sitting at the opposite end of the practical spectrum from continuity methods. Continuity editing aims to hide the ‘stitched-togetherness’ of edited shots in order to reproduce a more or less seamless sense of naturalistic time and space.

Below I will consider a montage example and explore it initially through frame and content analysis, which is perhaps the most direct way we can analyze montages. More indirect and intellectually lofty methods might be to apply the Deleuzean/Bergsonian contrarieties of Time Image versus Movement Image (e.g. we could analyze montage in relation to the varieties of the latter, namely the Perception Image, the Action Image, and the Affection Image) or the Soviet montage theorists’ typologies (Metric, Rhythmic, Tonal, Overtonal, and Intellectual montage strategies). This more abstract approach is certainly warranted and I will come back to these ideas further below.

Frame and content analysis starts literally with what is most immediately given in a montage sequence: the visual material. We start with the basics: what are the frames, what’s in the frames, and what’s happening in the frames. Generally, the ‘what’s in’ a frame will be a simple listing of the objects shown, whereas happenings or events can consist of elements such as visual effects, camera movements, actions and sounds.

We’ll start with the aforementioned video that won’t kill you, from The Ring (2002), which is essentially a short experimental video at the narrative and visual epicenter of a traditional mainstream horror film. As this montage comes straight from a horror film, there are some brief moments of violent images, so skip the video if you find that kind of content disturbing.

On a practical note, to try this technique yourself, you can download video clips from YouTube via clipconverter or Y2mate. Make sure you have good anti-malware installed on your computer, because these types of video file downloading sites will constantly prompt you to install things you don’t want. On a Mac, you can open the video file in QuickTime Player, hit Command-C on a frame (to copy it), then in the Preview app choose New from Clipboard under File (to paste it), and hit Command-S to save a PNG of the frame. PCs will have similar methods for copying video frames, depending on what app you’re using, e.g. you can use the Snipping Tool to take a screenshot of the screen area occupied by the video frame.
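
If you would rather grab frames from the command line, the same step can be scripted. The sketch below is only one possible alternative to the manual copy-and-paste route: it assumes the free ffmpeg tool is installed, and the file names are hypothetical placeholders. It only builds and prints the command, so nothing is run until you choose to.

```python
import shlex


def frame_grab_command(video_path, seconds, out_path):
    """Build an ffmpeg command that saves one frame at `seconds` as an image."""
    if seconds < 0:
        raise ValueError("seconds must be non-negative")
    return [
        "ffmpeg",
        "-ss", f"{seconds:.3f}",  # seek to the timestamp, given in seconds
        "-i", video_path,         # the downloaded video file
        "-frames:v", "1",         # write exactly one video frame
        out_path,                 # image format follows the extension (.png)
    ]


# Hypothetical file names, for illustration only.
cmd = frame_grab_command("ring_clip.mp4", 12.5, "frame_012500.png")
print(shlex.join(cmd))
```

To actually save the frame, pass the list to subprocess.run(cmd, check=True), or paste the printed line into a terminal.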

The first step of this kind of direct analysis of montage sequences is to put together what will resemble a kind of mood board of the whole clip, where each image is placed in its order of appearance, e.g. top left = the start, bottom right = the end, and your goal is to select the most representative frames.
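
The reading order of such a board (left to right, top to bottom) can be written down as a tiny calculation. This is just a sketch: the function name and the choice of three columns are my own assumptions, and any image editor or layout tool will do the same job by hand.

```python
def board_positions(frame_count, columns):
    """Return (row, column) cells for frames laid out left to right, top to bottom."""
    if columns < 1:
        raise ValueError("columns must be at least 1")
    # divmod gives (index // columns, index % columns), i.e. (row, column)
    return [divmod(index, columns) for index in range(frame_count)]


positions = board_positions(frame_count=7, columns=3)
for index, (row, col) in enumerate(positions):
    print(f"frame {index + 1}: row {row}, column {col}")
```

The first frame lands in the top left cell and the last frame in the final occupied cell, which matches the start-to-end ordering described above.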

Then, perhaps not surprisingly (I did not say this method of analysis is highly difficult : ) you want to indicate the content and happenings within the frames. For example, here are the first three ‘mood board-ish’ frames of The Ring’s death video:

First three main montage ‘moments’ of the death video

You can distinguish content from happenings (you can also call them events) in different ways, whatever serves your understanding best. Below, I will employ non-italicized font for content, and italicized font for events, with a vertical pipe character | separating them. I will also use commas to separate visual from sonic happenings, with visuals first and audio elements after the comma.

If the main event is the sound, that can be listed by itself, as the image may already be well described on the left side of the vertical pipe character. Duration notes can go inside of brackets, and can be either absolute (indicating an exact time in frames or seconds) or relative (based on a subjective sense of how the shots compare to each other). I will also use bullet points to list them, e.g. for the first three frames above, my description will look like this:

  • TV Static | dynamic glitchy, electronic rhythmic hums and tones [medium]
  • Black | intermittent silence (this is a kind of non-event, but this counts, too) [medium]
  • Bright Irregular Circle Shape | sudden loud layered full spectrum electronic sounds [medium]
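
If you keep your notes in a text file, the ‘content | events [duration]’ notation is regular enough to be read by a short script. The following is a minimal sketch, not a required part of the method; the Shot class and parse_shot function are names I have made up for illustration. It leaves the events as a single string, because commas also occur inside descriptions and so cannot reliably separate the visual from the sonic.

```python
import re
from dataclasses import dataclass

# content | events [duration]; a leading bullet character is optional.
LINE_PATTERN = re.compile(
    r"^\s*[–•-]?\s*(?P<content>[^|]+?)\s*\|\s*(?P<events>.*?)\s*"
    r"\[(?P<duration>[^\]]+)\]\s*$"
)


@dataclass
class Shot:
    content: str   # what is in the frame
    events: str    # what happens, visually and sonically
    duration: str  # relative label such as "short" or "medium"


def parse_shot(line):
    """Parse one 'content | events [duration]' line into a Shot."""
    match = LINE_PATTERN.match(line)
    if match is None:
        raise ValueError(f"not in 'content | events [duration]' form: {line!r}")
    return Shot(match["content"], match["events"], match["duration"].strip())


lines = [
    "TV Static | dynamic glitchy, electronic rhythmic hums and tones [medium]",
    "Black | intermittent silence (this is a kind of non-event, but this counts, too) [medium]",
    "Bright Irregular Circle Shape | sudden loud layered full spectrum electronic sounds [medium]",
]
shots = [parse_shot(line) for line in lines]
for shot in shots:
    print(shot.content, "/", shot.duration)
```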

As you can see, this analytical method is rather straightforward, as it is mainly descriptive (it’s not asking yet for any kind of analysis or pattern recognition). So, let’s keep going in the same vein, until we get to about two thirds of the way through the clip:

The analytical description for all the frames after the first three already described comes out to:

  • Static | dynamic glitchy static, electronic rhythmic tones (especially high pitched ones) [short]
  • Dark Organic Substance | watery wave and current motion, bass sounds [short]
  • A Tear or Rip | scraping sound [extremely short]
  • An embryo | scraping sound continues [extremely short]
  • Chair, Shadow, Room | a processed moaning sound [medium]
  • Comb, Hair | sound evolves, like electronic feedback [medium]
  • Woman combs hair in mirror | image warble effects, feedback sound continues [long]
  • Girl in opposite mirror | fades into background, sound keeps evolving [short]
  • Woman in mirror returns | looks towards the girl’s mirror [short]
  • A Tear or Rip again | high pitched sounds [very short]
  • Someone in window | image shakes, sound continues [medium]
  • Shoreline with fly | image distorts rhythmically and has white flashes of overexposure, sound is both mechanical and organic [long]
  • Rope-like thing pulled from mouth of a dirty face | ‘gross’ organic sounds [short]
  • Indeterminate | fast moving unclear object, high pitched sounds [very short]
  • Lid / Eclipse | black circular object cuts across a white disc like an eclipse or lid, bass sounds [very short]
  • Leafless Tree on Fire | bass sound continues, looks like animation instead of film [very short]
  • Box with Fingers Inside | camera zooms in, sound from low to high pitched [long]
  • Nail (looked like a tear or rip before) | finger lowers onto the nail [medium]
  • Cut | skin or animal hide is sliced, scraping sound [very short]
  • Dark droplets | two dark liquid drops, one moves sideways, image dissolves to a blur, bass sounds [long]
  • Globules | out of focus blobs floating in a medium, new bass sounds [medium]
  • Finger Impaled on a Nail | high pitched sounds [very short]
  • Combing Hair | high pitched sound continues [short]
  • Fingers in a Wooden Box | fingers wriggle, airplane-like sounds [medium]
  • Streak of Blood on Cloth | blood flowing in a line, total silence in the soundtrack [medium]
  • Larvae | alive and wriggling, organic sounds and drones [medium]
  • Bugs | presumes a leap forward in time, image values invert, the larvae have hatched [medium]
  • Table, Chair, Glass of Liquid, Giant Millipede | zoom in, wind and scraping sounds [long]

Once we’ve organized the frames side-by-side in this way, and made simple notes as to their visual content, the events happening in time, their sonic content and durations, we start to see aspects of larger scale structuring in the sequence.

The overall sensibility of this video montage is that it represents things of the unconscious realm. The unconscious is famous for bad jokes, such as taking the innocuous word ‘fingernail’ and then showing us a finger being impaled on a nail. That’s a classic Surrealistic maneuver, since dreams and fantasies tend to be overloaded with bad puns (as per Freud and other psychoanalytic theory).

We can also see a soundtrack that is continuously shaping a compositional space of very low, midrange and high pitched sounds, where those sounds alternate between an electronic, organic, environmental and mechanical aesthetic feel.

Looking across the frames, beyond a monochromatic color palette which imparts a strong sense of unity to all of the individual images used in the sequence, we can see a recurrence of key forms, such as a repetition of circles, or contrasts between circles and ovals, or round shapes versus square shapes. Most of the images also make use of large areas of negative space, with a small number of foregrounded objects set against a neutral backdrop.

There are a fair number of images which are indeterminate, i.e. we cannot really tell for sure what the objects are, either because of the strange close-up perspective, motion blur, focus issues or lack of contextual cues.

We can also clearly see patterning in the durations, where we often have a sequence of very brief shots followed by longer ones. There is a gradual temporal movement between long, medium, short and very short duration image sequences which causes time to dilate subjectively, that is, seem to expand and contract.
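
This durational patterning can be checked with a quick tally. The sketch below transcribes the bracketed duration labels from the shot list above, in order, and reports each run of consecutive brief shots together with the label of the shot that follows it. Treating ‘short’, ‘very short’ and ‘extremely short’ as ‘brief’ is my own assumption for the purposes of the count.

```python
from collections import Counter
from itertools import groupby

# Duration labels transcribed in order from the shot list above
# (the first three frames included).
DURATIONS = [
    "medium", "medium", "medium",
    "short", "short", "extremely short", "extremely short",
    "medium", "medium", "long",
    "short", "short", "very short",
    "medium", "long",
    "short", "very short", "very short", "very short",
    "long", "medium", "very short", "long", "medium",
    "very short", "short",
    "medium", "medium", "medium", "medium", "long",
]

# Assumption: these three labels count as "brief" for this tally.
BRIEF = {"extremely short", "very short", "short"}


def brief_runs(labels):
    """Return (run_length, next_label) for each run of consecutive brief shots."""
    runs = []
    position = 0
    for is_brief, group in groupby(labels, key=lambda label: label in BRIEF):
        position += len(list(group))  # index of the first shot after this group
        if is_brief:
            following = labels[position] if position < len(labels) else None
            runs.append((position - sum(1 for _ in ()) - position + len(runs) * 0 + 0, following))
    return runs


def brief_runs(labels):  # simpler, final version used below
    """Return (run_length, next_label) for each run of consecutive brief shots."""
    runs = []
    position = 0
    for is_brief, group in groupby(labels, key=lambda label: label in BRIEF):
        length = len(list(group))
        position += length
        if is_brief:
            following = labels[position] if position < len(labels) else None
            runs.append((length, following))
    return runs


print(Counter(DURATIONS))
print(brief_runs(DURATIONS))
```

Every run of brief shots in this transcription is followed by a medium or long shot, which is one simple way of seeing the expansion and contraction described above.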

The full gamut of cinematographic composition is used, i.e. close-ups, extreme close-ups, medium shots, long shots etc. Generally the close-up and medium wide shot appear to be the most frequently used in this sequence, though how you define those terms at their edge cases may cause some variation in shot composition interpretation. The images also play heavily with balanced versus unbalanced composition, the rule of thirds, centered versus uncentered framing, diagonals, unique points of view and a wide range of image compositional devices.

Intertextually, i.e. referring to the wider world of video art, The Ring’s death video (made in 2002) riffs off Nerve Language, the experimental video by Ed Rankus and Bob Snyder (1995), as is easily demonstrated in the comparison shots below.

Nerve Language (at left) and The Ring (at right)

Intertextuality refers to the interdependence of texts in relation to one another (as well as to the culture at large). Texts can influence, derive from, parody, reference, quote, contrast with, build on, draw from, or even inspire each other. Intertextuality produces meaning. Knowledge does not exist in a vacuum, and neither does literature. (source)

A Soviet Montage Reading

So how might we apply the Russian montage theorists’ concepts to this particular filmic construction? While there are five general categories of montage that they defined, that does not mean you should pick only one strategy when constructing your montage. In fact, it can easily make sense to blend montage principles to create a stronger visual composition.

Metric Montage

There are clear durational categories cycling through the sequence. We don’t need to strictly measure the exact number of frames to check whether several ‘very short’ shots are of exactly the same length. We can go with a general concept of strict and loose metric construction, and note that the images fit a small set of durational categories that we experience as being similar, no matter what the actual clock time may be.
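
For anyone who does want to measure, a loose metric reading amounts to sorting exact shot lengths into a handful of bins. The sketch below shows the idea only: the cut-off values in seconds and the 24 frames per second default are hypothetical numbers chosen for illustration, not measurements taken from the clip.

```python
# Hypothetical upper bounds in seconds; tune them to the clip being analyzed.
CATEGORY_BOUNDS = [
    (0.25, "extremely short"),
    (0.75, "very short"),
    (2.0, "short"),
    (4.0, "medium"),
]


def duration_category(seconds):
    """Map a measured shot length to a relative duration label."""
    for upper_bound, label in CATEGORY_BOUNDS:
        if seconds < upper_bound:
            return label
    return "long"  # anything at or above the last bound


def frames_to_seconds(frame_count, fps=24.0):
    """Convert a frame count to seconds at the given frame rate (assumed 24 fps)."""
    return frame_count / fps


for frames in (5, 12, 40, 80, 150):
    seconds = frames_to_seconds(frames)
    print(f"{frames} frames = {seconds:.2f}s: {duration_category(seconds)}")
```

Two shots of 10 and 14 frames fall into the same bin here, which is the point: they are experienced as the same kind of duration even though their clock times differ.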

Rhythmic Montage

You may have noticed that the most violent imagery is typically presented in the shortest shots. The extreme briefness of a slice through skin or a finger prick, relative to longer shots of a mirror on the wall or a coastline, is part of a coherent rhythmic logic whereby the content of the shot has a logical and regularly patterned relationship to other shots in terms of their respective durations. Both metric and rhythmic approaches respond to the slow, evolving and sudden (and sometimes extreme) changes in the soundtrack, which imparts another logic to shot length and cut points.

Tonal Montage

‘Tone’ in this context means emotion and thematic reinforcement. This is clearly not a bright and cheerful montage sequence! It is clearly preoccupied with death, violence, loss, the past, decay and mystery (amongst others that can be named). These emotional and thematic tonalities are reinforced by recurring images (sometimes the exact same image, sometimes images highly similar to others), as well as by the color palette and the visual effects post-processing.


Intellectual Montage

The visual pun of ‘fingernail’ and ‘finger impaled on a nail’ would be an example of intellectual montage at work, as would be the intertextuality shown above whereby this ‘art film inside of a Hollywood film’ is directly referencing art world experimental filmmaking through its reinterpretation and visual (and somewhat sonic) riffing off the pre-existing film, Nerve Language. There is the additional ‘intellectuality’ implied by this film presenting something of a mystery to the viewer. This mystery solving activity is not just about understanding why this video kills people within seven days of viewing it (the plot’s premise), but also the narrative work involved in figuring out the meaning of the video, whereby each shot is treated as a kind of puzzle piece in an overall backstory that the film will eventually solve for the audience.

Associational (Overtonal) Montage

Since we can see all of the previous montage methods in effect, Overtonal strategy is thus also at work, since this implies the coming together of the other montage strategies to produce a strong aesthetic impact that can work at many levels of experience and interpretation. The sound collage strongly connects these various strategies so that the inter-coherence of shots is accentuated.

A Deleuzean/Bergsonian Reading

Deleuzean readings of anything run the risk of immediately becoming mini-dissertations, given the sheer word count (and the time it takes to read those words) needed to rise to the level of a fully philosophical reading.

Here I will propose a few ways that one might read Deleuze’s Bergsonian film theory concepts against this particular video sequence.

The Movement of Interruption

‘It’s not the fall that kills you, it’s the sudden stop’ as they say in physics classes. A montage such as this is a rhythmic cessation of movement, where we are often brought into the unfolding of an event only to be suddenly yanked out of it.

The Brief Pleasure of Coherences

At the same time, there are moments of everyday motion sense making, such as the implicit spatial logic that a person at frame right may want to look over at another person at frame left, even as the latter is vanishing from the scene. Such everyday movements of life are briefly introduced, only to take us quickly away from them.

Inhuman Movements

The bodily wriggling of insects and even severed fingers introduces themes of involuntary movement, movement not guided by human will but generated at visceral levels of embodied cognition that are not covered by safe and civilized humanistic contexts. This extends to movements produced by gravity onto inanimate objects, such as rolling ocean waves and falling bodies, whether (and from gravity’s perspective, equally) humans or ladders.

Mundane Movements

Perhaps most movements we enact are utterly mundane, such as chewing, breathing, or in the context of this video, brushing our hair. This everyday mundanity, however, becomes ritualized through repetition and intense focus.

Destructive Movements

There are movements in this sequence that seek out pain and violence (given the horror genre, it would be odd if these movements did not appear at some point!). Slicing skin, puncturing a finger, or yanking some kind of rope out of a mouth are all movements that disrupt organic wholeness and unity.

Movement for Movement’s Sake

There are movements of unidentified objects, indeterminate movements of blobby, fast or out of focus things that fill up the film frame’s totality for no other reason than to present some unspecified movements. Movement is displayed in all of its velocities, from ‘so fast you can’t tell what it is’ to bobbing gently in a fluid medium, with ocean waves and millipede legs perhaps somewhere in between.

Biological Time(s)

An embryo does not develop at rapid filmic speed. The division of cellular life typically requires a time lapse treatment to appear in film sequences. We are presented both with a temporal slice of some cellular something, pulsing slowly and not rushing towards its development, and also a fast speed-up from a seething larval mass to a fully grown bug population. Decay is both biological (e.g. the cow rotting in the lapping waves) and technological, in the texture of the VHS analog video tape that is clearly degraded both by time and by somewhat supernatural forces.

This will suffice for my Deleuzean/Bergsonian exposition of The Ring’s death video! Feel free to write proper PhD thesis length treatises of your own analysis of the movement image in montage sequences.

