What is joint attention, and are we defining it wrong?

Most humans have an important skill: the ability to share attention. That is, two people can pay attention to the same thing, each knowing that the other is also attending to and thinking about it. The ability to share attention with another person enables all sorts of other activities, such as having a coherent conversation, or working together to assemble a puzzle or construct a house. It may even help babies learn language. When parents name an object in the environment, babies are more likely to understand the referent if they are paying attention to the same object as the parent. Researchers call the act of sharing attention "joint attention." Not surprisingly, they find it to be an integral part of social and language development in neurotypical people. They also find it to be delayed or absent in young autistic children, and theorize that this may cause all sorts of social and language delays.

When you define an important concept in psychology, you must decide how to measure it. Psychologists who study social and language development have generally settled on one highly specific definition. A person has joint attention if they can look back and forth between another person and an object, and lacks it if they cannot. Notice that this definition focuses on only one sense (vision), and on a highly demanding motor skill: the ability to rapidly move one's eyes between two things.
A boy engages in joint attention with his mother under this definition. He looks at her, sees she's looking at the toy, looks at the toy himself, and understands that he and his mother are looking at the toy together. From Gillespie-Lynch (2013). Response to and initiation of joint attention: Overlapping but distinct roots of development in autism? OA Autism 13, http://www.oapublishinglondon.com/article/596.

The advantage of this definition is that it is a common way that people share attention (in the U.S., at least), and it's easily measured when you bring people into the lab.  The disadvantage is that when researchers start to identify this operationalization of joint attention with joint attention itself, they ignore the myriad other ways people can share attention.  This may not matter much when considering typical development, but it definitely matters when trying to explain why autistic people have trouble sharing attention.

Two researchers, Morton Ann Gernsbacher and Chen Yu, have written two very different critiques arguing that joint attention should not be considered synonymous with alternating eye gaze between another person and an object. Gernsbacher's theoretical paper (Akhtar & Gernsbacher, 2007) explains that such a theory of joint attention cannot explain how people share attention in many of the world's cultures. Yu's experimental study (Yu & Smith, 2013) indicates that even the typical research subjects, middle-class U.S. toddlers, rarely coordinate joint attention through alternating eye gaze, and instead do so through other means.

Vision isn't the only way to share attention
Akhtar and Gernsbacher (2007) lay out a variety of other ways that children and parents in other cultures share attention.

In some cultures (e.g. Kaluli people in New Guinea, Palestinian families in Israel), babies are not only constantly held but are held facing outward, rather than face to face.  This can occur when babies are carried in the mother's arms, on her shoulders, on her back, or in her lap.  As a result, these babies get very little face-to-face experience with their caregivers compared to U.S. babies.

However, in such non-Western cultures, mothers interact with their babies less through eye contact and talking and more through touching and holding. These babies' early social engagement occurs through other sensory modalities more than through vision. Thus, one would expect shared attention to develop out of the other senses more than, or earlier than, it would develop out of vision.

When babies are held, they adjust to the movements of the person holding them, and they're sensitive to changes in posture very early in life.  Changes in the caregiver's posture can convey similar information to changes in their gaze direction--a caregiver is likely to lean towards something they are focused on or interested in.  Thus, babies in these low-eye-gaze cultures have access to the same information that U.S. babies get from eye gaze, but they get it from a different sensory modality.

In addition to postural changes, other tactile cues may be important for establishing shared attention. If a child sits in his mother's lap while they both handle a toy, he can tell from her posture, touch, and hand movements that she is attending to the same toy.

Blind children do not utterly lack the ability to share social engagement and attention with others, as would follow if joint attention were truly nothing but the alternation of visual attention. Instead, they do so through nonvisual modalities. Caregivers also provide tactile cues that direct the child's attention and cue them to engage in intentional communication. They use touch to get the child's attention, either before signing in the child's visual field, or simply to maintain contact when one partner has looked away.

In short:
"Although it is possible that gaze is the primary sense for typically developing, sighted infants in Western middle-class contexts, we cannot assume that gaze is primary without exploring other senses and other populations. By examining variations across cultures and across typical and atypical development, researchers may uncover multiple pathways to achieving social engagement and intentional understanding of others' behaviors."

Alternating gaze may not be the best measure of shared reference
Both Gernsbacher and Yu argue that even for sighted U.S. babies, gaze alternation may not be the best measure of shared attention.

To display joint attention, it's not enough to be looking at the same object as a parent.  A child must also gaze into the caregiver's face.  We've already seen one problem with this--that babies raised in other cultures with less eye contact will be unlikely to do so.  Another problem is that the child might look up at the caregiver for reasons other than trying to share attention.  They might be looking for comfort, if anxious in the unfamiliar laboratory setting.  Or, they might be looking for information, when confronted with odd and ambiguous laboratory toys.  

Furthermore, even U.S. babies who appear to be following eye gaze may not actually be doing so. Changes in gaze direction are usually accompanied by changes in head orientation, body posture, and voice direction, all of which come together to indicate the person's direction of attention.  Studies of joint attention that define joint attention as alternating eye gaze do not necessarily attempt to separate gaze cues from these other visual and auditory cues--which may be the truly informative ones for babies. 

Enter Chen Yu's study, which uses sophisticated real-time measures to determine how U.S. toddlers playing with their mothers really do share attention. These researchers have an innovative procedure: in addition to using an overhead camera to provide a third-person view of the pair's behavior, they have mothers and babies each wear head cameras with built-in eye-tracking equipment so that researchers can literally see the world from each participant's point of view.

Here's what the setup looked like:

They brought in seventeen 11- to 15-month-olds and their parents for a play session in the laboratory. There were six toy objects on the table, displayed in sets of three, with which the pair could play freely. As seen above, the room was white and minimally distracting, which could be a strength or weakness of the study depending on your point of view.

Researchers measured frame by frame where the babies looked, where the parents looked, and when both shifted to look at a new object, which partner led and which one followed.  They suspected that social coordination would involve babies and parents looking at the same object at the same time. If one partner looks at a new object, to maintain coordination, the other will soon follow, but they might do so without making eye contact.

That was, in fact, what they found. Babies and their parents frequently looked at the same object (about 33-42% of the time, depending on the measure), sharing over 23 bouts of joint attention per minute by one measure. Yet babies rarely looked at their parent's face (about 5 times per minute), certainly not often enough to coordinate their looking behavior with parents. Indeed, babies did not consistently look at a parent's face when following their gaze to an object. (It's important to note that babies and parents did look at each other's faces at times; this was just rare and did not seem to relate to their coordinated looking at objects. Also, parents did frequently look at the child's face and used this cue to follow their child's attention. Children just didn't do the same with the parent.)
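To make the frame-by-frame measures concrete, here is a minimal Python sketch of how one might compute them from coded gaze data. It is not the authors' analysis code. The label scheme (object names, "face", or None for each frame), the function names, the choice to count every run of matching frames as a bout, and the rule for deciding who led are all assumptions made for illustration.

```python
from itertools import groupby

def joint_attention_bouts(child, parent):
    """Return (start, end, target) for each run of consecutive frames in
    which both partners look at the same object (end is exclusive).
    Assumption: frames coded "face" or None never count as shared
    attention to an object."""
    shared = [c if (c == p and c not in (None, "face")) else None
              for c, p in zip(child, parent)]
    bouts, frame = [], 0
    for target, run in groupby(shared):
        length = len(list(run))
        if target is not None:
            bouts.append((frame, frame + length, target))
        frame += length
    return bouts

def proportion_shared(child, parent):
    """Fraction of frames in which both partners look at the same object."""
    bouts = joint_attention_bouts(child, parent)
    return sum(end - start for start, end, _ in bouts) / len(child)

def bouts_per_minute(child, parent, fps):
    """Bout rate, given the (assumed) coding rate in frames per second."""
    return len(joint_attention_bouts(child, parent)) * 60 * fps / len(child)

def bout_leader(child, parent, start, target):
    """Hypothetical rule: the leader is whoever was already looking at the
    target on the frame just before the bout began."""
    if start == 0:
        return None
    child_first = child[start - 1] == target
    parent_first = parent[start - 1] == target
    if child_first and not parent_first:
        return "child"
    if parent_first and not child_first:
        return "parent"
    return None

# Toy data: one gaze label per frame for each partner.
child = ["toy_a", "toy_a", "toy_a", "toy_b", "toy_b", "face", "toy_b", "toy_b"]
parent = ["toy_b", "toy_a", "toy_a", "toy_a", "toy_b", "toy_b", "toy_b", "face"]

bouts = joint_attention_bouts(child, parent)
leaders = [bout_leader(child, parent, start, target) for start, _, target in bouts]
```

Note that nothing in this calculation requires either partner to look at the other's face. Two gaze streams can overlap on objects a large share of the time while face looks stay rare, which is the pattern the study reports.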

So what cues were babies using to share attention with their parents? Parents or babies were holding an object almost all the time, and hand cues overlap well with eye gaze cues. Babies tended to look at the hands of whoever was acting on the object, whether that was themselves or their parent.
Above: Gaze and joint attention data from Yu & Smith (2013). a,b) Comparison of where child and parent were gazing, showing that both were often looking at the same thing; that parents looked at the face more than children did; and that children maintained gaze fixation longer than parents, overall (perhaps related to slower attention shifting in this age group). c) and d) are two different ways of measuring the synchrony between child and parent gaze. e) isn't important for the purposes of this post, but it compares the cross-recurrence of parent-child gaze to a random baseline, where the x axis represents time.

Why does it matter how we measure shared attention?
Simon Baron-Cohen and colleagues proposed (Baron-Cohen, Baldwin, & Crowson, 1997) that autistic children's language delays stem from their inability to alternate attention between another person and an object. A line of autism research investigating this relationship has since arisen, based on that assumption. But if joint attention is not identical with triangulated eye movements even in typical development, then our explanations for disabilities in autism rest on a faulty foundation. If this misinformed research informs interventions, then much effort may be spent trying to teach triadic eye movements that may be painful or even impossible for young autistic children (given their difficulties with rapid eye movements in general). Efforts would be better spent developing ways to teach language skills and desired social behavior using more accessible cues.

Stigma also arises when we assume that a person who cannot triangulate eye movements between a person and an object also cannot share attention with another person. From here, people often make the leap to claiming that autistic children cannot be emotionally engaged with others or realize that other people have mental states, too, which leads to viewing them as alien at best and sociopathic at worst. These conclusions do not follow, of course, but given the difficulty even researchers have with recognizing that, it may be best to emphasize the distinction between the ability to perform a particular pattern of eye movements and the ability to share attention.

Nameera Akhtar & Morton Ann Gernsbacher (2007). On privileging the role of gaze in infant social cognition. Child Development Perspectives 2, pp. 59-65.
Chen Yu & Linda Smith (2013). Joint attention without gaze following: Human infants and their parents coordinate visual attention to objects through eye-hand coordination. PLoS ONE 8(11), e79659.
Simon Baron-Cohen, Dare A. Baldwin, & Mary Crowson (1997). Do children with autism use the speaker's direction of gaze strategy to crack the code of language? Child Development 68, pp. 48-57.