Frontiers for Young Minds

Core Concept Engineering and Technology Published: April 9, 2024

How Scientists Use Webcams to Track Human Gaze

Abstract

Eye tracking is a technology that can record people’s eye movements and tell scientists what people look at on screens or out in the world. Scientists use eye tracking to understand what people notice or remember; marketing researchers who create ads use eye tracking to see what type of ads or products capture people’s attention; and video game designers use eye tracking to see what parts of a game are confusing to players, so designers can fix the game. Eye-tracking equipment can be expensive and time-consuming for researchers to use, so is there another way to record eye movements without buying an eye tracker? There is! Computer scientists can use a computer-based method called machine learning to turn an everyday webcam into an eye tracker. They can even do this with mobile phones! In this article, you will learn about how eye trackers work and the advantages and disadvantages of using webcams to track eyes.

Eyes are Windows to the Mind

Have you ever had a conversation with a friend and noticed your friend’s eyes were no longer looking at you but were suddenly looking behind you? What did you do? You probably turned around to see what your friend was looking at. This illustrates that eye movements tell us where people are paying attention. Scientists measure eye movements to understand what people remember and pay attention to, how people read, and even to screen for certain disorders. An eye tracker is a camera that takes pictures of a person’s eyes [1]. Eye trackers study information from these pictures (like the shape of the pupils) to pinpoint where a person is looking. These cameras take hundreds or even thousands of pictures each second! The large number of eye pictures allows eye trackers to be very exact in pinpointing where and when a person looks at something.

If an eye tracker were recording your eye movements while you watched a video, a scientist could use your eye movements to understand what you were paying attention to on the screen and for how long. For example, an eye tracker could detect your fixations: when your eyes seem like they have stopped moving to look at something. Longer fixations (like when you stare at something) might mean that you are really focused on a character in the video, while shorter and frequent fixations may mean you are either distracted by some other characters or objects, or that you are having trouble understanding what is happening on the screen. The tracker may also detect that your eyes follow the movement of the characters without you even noticing (Figure 1). The large, sweeping movements that your eyes make between fixations are called saccades (for more information about eye movements, see this Frontiers for Young Minds article).

Figure 1 - A tablet shows a video with eye scan paths on it.
A scan path refers to the path that the eyes take when a person is looking at something. The large circles represent fixations, where the person’s eyes seem to stop, and the lines show the saccades that the person’s eyes took between fixations. What parts of this video did the person look at?
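To make the idea of fixations and saccades more concrete, here is a minimal Python sketch of one simple way a computer could tell them apart: it measures how fast the gaze point moves between two pictures and calls fast movements saccades and slow ones fixations. The gaze positions, the sampling rate of 100 pictures per second, and the speed threshold of 1,000 pixels per second are all made up for illustration. Real eye-tracking software uses more careful methods and usually measures speed in degrees of visual angle rather than pixels.

```python
from math import hypot

def classify_samples(gaze, sample_rate_hz, speed_threshold=1000.0):
    """Label each step between consecutive gaze samples as 'fixation' or 'saccade'.

    gaze: list of (x, y) screen positions in pixels, one per picture.
    speed_threshold: pixels per second (an assumed value, chosen for this example).
    """
    labels = []
    for (x0, y0), (x1, y1) in zip(gaze, gaze[1:]):
        # Distance moved between two pictures, times pictures per second,
        # gives the speed in pixels per second.
        speed = hypot(x1 - x0, y1 - y0) * sample_rate_hz
        labels.append("saccade" if speed > speed_threshold else "fixation")
    return labels

def fixation_durations(labels, sample_rate_hz):
    """Return the length in seconds of each run of consecutive fixation steps."""
    durations = []
    run = 0
    for label in labels:
        if label == "fixation":
            run += 1
        elif run:
            durations.append(run / sample_rate_hz)
            run = 0
    if run:
        durations.append(run / sample_rate_hz)
    return durations

# Made-up gaze samples: the eyes rest near (100, 100), jump, then rest near (300, 200).
gaze = [(100, 100), (101, 100), (100, 101), (101, 101),
        (200, 150), (300, 200),
        (301, 200), (300, 201), (301, 201)]

labels = classify_samples(gaze, sample_rate_hz=100)
durations = fixation_durations(labels, sample_rate_hz=100)
print(labels)
print(durations)
```

In a scan path like the one in Figure 1, each run of fixation steps would become one circle, and each run of saccade steps would become a line between circles.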

Teaching Computers to Predict Gaze Location

In the lab, scientists use special eye-tracking equipment that is extremely good at figuring out where a person’s eyes are looking on a screen, which is called gaze location (Figure 2). Even though eye trackers are excellent tools, they come with some challenges. First, eye-tracking equipment can be very expensive, so not every scientist who wants to research eye movements can purchase the equipment for their laboratory. Also, eye trackers can only measure eye movements in person and with one person at a time. This means research that requires lots of people can take a long time to conduct. It can also be challenging to find people to participate in research when participants have to go to a laboratory to do so.

Figure 2 - (A) A participant works on a computer with an eye-tracking system.
The eye-tracking system uses a lot of technical equipment and requires the participant to keep her head still on a chin rest. All of this equipment makes the system very accurate in figuring out where the participant is looking on the computer screen. (B) A person works on a laptop with a built-in webcam. The webcam does not require as much equipment, and the participant can sit comfortably and is free to move her head.

These challenges in using eye-tracking equipment can be overcome by using webcams to track eyes. Webcams are in most common personal devices (like phones or laptops), making it easy for scientists to reach a diverse group of people, without participants needing to come to a lab. Webcams are also much less expensive than eye-tracking equipment. Scientists could use webcams to collect eye-movement data remotely, which could save time and money [2]. Webcams were not designed to track eyes, so how do scientists get eye-movement data from them? There are several ways to use webcams as eye trackers, but one popular way is with machine learning [3].

Machine learning is a way for computers to use data (like pictures or numbers) and a set of mathematical calculations to learn from experience and find patterns in the world. Using machine learning, computers can learn from lots of pictures of people’s faces. When you are playing with your friends, have you ever noticed where they were looking, like at a cool toy or a yummy snack? You use clues to figure out where your friend is looking, like their eye movements, how their head is turned, or how close they are to something. Computers can do something similar. They look at thousands of pictures of people’s faces and try to find patterns in those pictures, just like your brain finds patterns in your friends’ actions. Computers then use these patterns to guess where someone is looking when they are shown a new picture of that person’s face. Scientists have improved machine learning to make more accurate predictions of where a person is looking by using other helpful information like eye and face landmarks that point out edges on a face (Figure 3); depth information, like how far away a person is from the webcam; and even information from the scene on the screen [4].

Figure 3 - Webcam images with facial landmarks.
The dots (landmarks) on this woman’s face are on important edges and corners of the face, such as her jaw, mouth, eyebrows, and importantly, her eyes. Machine learning can use landmarks to make better gaze-location predictions from webcam images like these.
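The idea of learning from examples can be sketched in a few lines of Python. This toy example is not the method used in the studies cited here, which rely on far more pictures and much more complex models. It assumes we already have three landmarks for one eye (the two eye corners and the pupil) and a handful of made-up training examples recorded while a person looked at dots at known places on the screen. The computer then fits a straight line that turns "where the pupil sits between the eye corners" into "where on the screen the person is looking." All names and numbers below are invented for illustration.

```python
def fit_line(features, targets):
    """Least-squares fit of: target = slope * feature + intercept."""
    n = len(features)
    mean_f = sum(features) / n
    mean_t = sum(targets) / n
    cov = sum((f - mean_f) * (t - mean_t) for f, t in zip(features, targets))
    var = sum((f - mean_f) ** 2 for f in features)
    slope = cov / var
    return slope, mean_t - slope * mean_f

def pupil_ratio(inner_corner_x, outer_corner_x, pupil_x):
    """Where the pupil sits between the eye corners: 0.0 at one corner, 1.0 at the other."""
    return (pupil_x - inner_corner_x) / (outer_corner_x - inner_corner_x)

# Made-up training examples. Landmark positions are in webcam-image pixels;
# the last number is the horizontal screen position (in screen pixels) of the
# dot the person was looking at when the picture was taken.
training = [
    # (inner corner x, outer corner x, pupil x, screen x of the dot)
    (200, 240, 208, 100),
    (200, 240, 216, 500),
    (200, 240, 224, 900),
    (200, 240, 232, 1300),
]

ratios = [pupil_ratio(inner, outer, pupil) for inner, outer, pupil, _ in training]
screen_xs = [screen_x for _, _, _, screen_x in training]
slope, intercept = fit_line(ratios, screen_xs)

def predict_screen_x(inner_corner_x, outer_corner_x, pupil_x):
    """Guess the horizontal gaze location on the screen from three eye landmarks."""
    return slope * pupil_ratio(inner_corner_x, outer_corner_x, pupil_x) + intercept

# A new picture where the head has shifted sideways in the image,
# but the pupil is still halfway between the eye corners.
print(round(predict_screen_x(300, 340, 320)))
```

Because the prediction uses the pupil position relative to the eye corners, it still works in this toy example when the whole face shifts sideways in the picture. This hints at why landmarks are helpful when people are free to move their heads.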

Challenges with Webcam Eye Tracking

Though webcam eye tracking can help scientists draw conclusions about people’s gaze locations for little cost, it is far from perfect. Webcam eye tracking does not have great precision or accuracy in saying where the eyes are really looking. Compared to a laboratory eye tracker, webcam eye tracking is not very good at separating types of eye movements from each other. This is because the pictures taken on a webcam are of lower quality than those on a laboratory tracker. Also, the frame rates (how quickly cameras can take pictures) are very different. A webcam can take around 30 pictures per second. While that may seem like a lot, laboratory eye trackers can take hundreds or even thousands of images per second! Taking fewer pictures per second means that the webcam cannot capture certain types of eye movements that happen very quickly.
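A little arithmetic shows why frame rate matters so much. The short Python sketch below compares a webcam taking 30 pictures per second with a laboratory eye tracker taking 1,000 pictures per second. The 40-millisecond eye movement is an assumed value chosen for illustration; real quick eye movements vary in how long they last.

```python
def frame_interval_ms(frames_per_second):
    """Time between two consecutive pictures, in milliseconds."""
    return 1000.0 / frames_per_second

def pictures_during(movement_ms, frames_per_second):
    """Approximate number of pictures taken while an eye movement is happening."""
    return movement_ms * frames_per_second / 1000.0

MOVEMENT_MS = 40  # assumed length of one quick eye movement, for illustration only

for name, fps in [("webcam", 30), ("laboratory eye tracker", 1000)]:
    print(f"{name}: one picture every {frame_interval_ms(fps):.1f} ms, "
          f"about {pictures_during(MOVEMENT_MS, fps):.1f} pictures "
          f"during a {MOVEMENT_MS} ms eye movement")
```

With these numbers, the webcam gets roughly one picture of the whole movement, while the laboratory tracker gets about 40. With only one picture, the computer can see that the eyes ended up somewhere new, but not how they got there.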

Scientists can use webcams to track the general pattern of eye movements, but the measurements are not exact for finer eye movements. When someone wants to track eye movements to large characters and scenes in a video or an ad, low precision might not be a big deal. However, when scientists are doing experiments, they need better precision for tracking small or fast eye movements, like the eye movements that happen during reading or searching for small objects in a scene. For instance, say that you are focused on a person talking in a video, then you move your gaze to see an animal moving in the background just behind the person, and then you shift your eyes back to the person talking. Those small shifts in gaze may not be detected in webcam eye tracking. Also, think about where and how you normally watch videos, browse the internet, or use a camera. Are you in the dark, and maybe sometimes moving around? Because webcams have lower image quality compared to laboratory eye trackers, webcam eye tracking works best when people are in rooms with good lighting and sitting still while their eyes are being tracked. It is not always possible to make sure people are doing these things while researchers collect webcam images remotely.

Looking Ahead: The Future of Eye Tracking

Webcam eye tracking can be a cost-effective and time-saving approach for researchers who want to study eye movements. However, there are limitations in using webcams for eye tracking, as they are not as accurate as laboratory eye trackers at predicting where someone is looking. Scientists are working to improve webcam eye-tracking methods, such as by using machine learning, so they can more accurately predict eye movements using images from webcams. This work is important because it helps make eye-tracking technology easy to use for everyone, allowing scientists to learn more about how we see and interact with the world around us, even from the comfort of our own homes.

Glossary

Eye Tracker: Technology that can record people’s eye movements and tell scientists what participants are looking at and for how long.

Fixation: The time between large eye movements when the eyes seem like they have stopped to look at something.

Saccade: A large, sweeping movement that your eyes make between fixations.

Machine Learning: A way of analyzing data that allows computers to learn from experience.

Landmarks: Marks that help a computer understand where edges of important parts of a face are in a picture, like eye corners or the chin.

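Precision: How consistent the tracking system is when it measures the same gaze location again and again. Precision is closely related to accuracy, which is how close the measured gaze location is to where someone is really looking.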

Conflict of Interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Acknowledgments

Written informed consent was obtained from the individual(s) for the publication of any identifiable images or data included in this article.


References

[1] Robbins, A., and Hout, M. C. 2015. Look into my eyes. Sci. Am. Mind 26:54–61. doi: 10.1038/scientificamericanmind0115-54

[2] Papoutsaki, A., Laskey, J., and Huang, J. 2017. “Searchgazer: Webcam eye tracking for remote studies of web search”, in Proceedings of the 2017 Conference on Conference Human Information Interaction and Retrieval (New York, NY: ACM), 17–26.

[3] Valliappan, N., Dai, N., Steinberg, E., He, J., Rogers, K., Ramachandran, V., et al. 2020. Accelerating eye movement research via accurate and affordable smartphone eye tracking. Nat. Commun. 11:4553. doi: 10.1038/s41467-020-18360-5

[4] Park, S., Aksan, E., Zhang, X., and Hilliges, O. 2020. “Towards end-to-end video-based eye-tracking”, in Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part XII 16 (Berlin: Springer International Publishing), 747–63.