What it is, how it differs from “traditional” eye-tracking, its limitations and risks.
Every year a growing number of new models of “predictive eye-tracking” are published (Kummerer, Wallis, & Bethge, 2018) and a comparable number of companies are incorporating them into their workflows as an alternative to traditional eye-tracking.
But what exactly is predictive eye-tracking?
Predictive eye-tracking is an AI-based alternative to “traditional” eye-tracking that relies on pre-collected eye-tracking data to simulate human vision. These pre-collected data are used to train deep learning algorithms, which learn from them to predict where people are likely to look and improve as training proceeds.
With this kind of tool, it is therefore possible to upload a stimulus, such as an advertisement, commercial, shelf display, web page or packaging design, and obtain a heatmap within a few seconds, at a lower price than a “traditional” eye-tracking study.
As previously mentioned, the primary output of this tool is a heatmap that, analogous to those produced by conventional eye-tracking systems, indicates the areas of the stimulus that people are most and least likely to observe. Besides the heatmap, some predictive eye-tracking models also simulate some metrics, presenting them in an easily accessible format to the user.
Providers present these outputs as highly reliable, stating that the estimated accuracy of most predictive eye-tracking models is greater than 90%.
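What an accuracy figure of this kind means depends on how a predicted heatmap is compared with a measured one. As a minimal sketch, assume the comparison is a Pearson correlation between the two heatmaps, each flattened into a list of values. This is one of several similarity scores used for saliency models, and nothing here says it is the score behind the reported figures. The function name and the toy values are illustrative and not taken from any specific tool.

```python
from math import sqrt
from typing import List


def pearson_correlation(predicted: List[float], measured: List[float]) -> float:
    """Pearson correlation between two flattened heatmaps of equal length."""
    if len(predicted) != len(measured) or len(predicted) < 2:
        raise ValueError("heatmaps must have the same length (at least 2 values)")
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_m = sum(measured) / n
    covariance = sum((p - mean_p) * (m - mean_m) for p, m in zip(predicted, measured))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_m = sum((m - mean_m) ** 2 for m in measured)
    if var_p == 0 or var_m == 0:
        raise ValueError("a constant heatmap has no defined correlation")
    return covariance / sqrt(var_p * var_m)


# Toy 2x2 heatmaps flattened row by row (hypothetical values).
predicted_heatmap = [0.1, 0.6, 0.2, 0.1]
measured_heatmap = [0.2, 0.5, 0.2, 0.1]
score = pearson_correlation(predicted_heatmap, measured_heatmap)
print(f"similarity between predicted and measured heatmap: {score:.2f}")
```

A high score of this kind says only that the two heatmaps have a similar shape; it says nothing about saccades, blinks or pupil responses, which are not part of the heatmap.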

But can predictive eye-tracking completely replace “traditional” eye-tracking?
Spoiler: No.
Predictive eye-tracking can’t fully replace “traditional” eye-tracking because:
- It gives information just about fixations and not other eye movements, such as saccades and blinks;
- It simulates just the bottom-up visual process.
Thus, this tool provides only part of the information that an eye-tracking study could provide.
Regarding the first point, these models are based on saliency maps (Kummerer, Wallis, & Bethge, 2018).
Saliency maps, which in these models are produced by neural networks, highlight the salient regions of an image. The goal of a saliency map is to reflect the degree of importance of each pixel to the human visual system and, in this context, the probability of observing a fixation at a given pixel in a given image (Kummerer, Wallis, & Bethge, 2018).
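Reading a saliency map as "the probability of observing a fixation at a given pixel" can be made concrete with a small sketch. It assumes the model's raw output is a grid of non-negative scores, which are simply rescaled so that they sum to 1. The grid values and function names below are hypothetical, and real models may normalize their output differently.

```python
from typing import List, Tuple

Grid = List[List[float]]


def to_fixation_probabilities(saliency: Grid) -> Grid:
    """Rescale non-negative saliency scores so that they sum to 1."""
    total = sum(sum(row) for row in saliency)
    if total <= 0:
        raise ValueError("the saliency map must contain positive values")
    return [[value / total for value in row] for row in saliency]


def most_likely_fixation(probabilities: Grid) -> Tuple[int, int]:
    """Return the (row, column) of the pixel with the highest probability."""
    positions = [
        (r, c)
        for r, row in enumerate(probabilities)
        for c, _ in enumerate(row)
    ]
    return max(positions, key=lambda rc: probabilities[rc[0]][rc[1]])


# Toy 3x3 saliency map with a strong central element (hypothetical scores).
toy_saliency = [
    [0.0, 1.0, 0.0],
    [1.0, 6.0, 1.0],
    [0.0, 1.0, 0.0],
]
probabilities = to_fixation_probabilities(toy_saliency)
peak = most_likely_fixation(probabilities)
print(f"most likely fixation at {peak}")
```

The heatmap a user sees is essentially a colored rendering of such a grid: it describes where fixations are likely to land, not in which order or for how long.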
But what are fixations?
Fixations are brief pauses in eye movements during which the visual system acquires information about the environment. This information is sent to the brain, where it is processed and used to interact with the environment.
Fixations provide very valuable information about visual behavior and cognitive processes to which eye movements are tightly coupled (such as attention, memory, decision making, problem solving and associative learning), but fixations do not exhaust all the analyses that can be made of visual behaviour.
Indeed, predictive eye-tracking does not account for other eye movements such as saccades (rapid eye movements during which visual perception is highly suppressed), blinks (rapid closing and opening movements of the eyelids) and pupillary constriction and dilation mechanisms.
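To illustrate what a recorded gaze signal contains beyond fixation locations, here is a minimal sketch of a velocity-threshold rule that labels each gaze sample as part of a fixation or of a saccade. It is an assumption made for illustration, not the method of any particular eye tracker: samples are taken to be (time in seconds, x, y) with positions in degrees of visual angle, and the 30 degrees-per-second threshold is a placeholder value.

```python
from typing import List, Tuple

# (time in seconds, horizontal position, vertical position), positions in degrees
Sample = Tuple[float, float, float]


def classify_samples(samples: List[Sample], velocity_threshold: float = 30.0) -> List[str]:
    """Label each sample 'fixation' or 'saccade' from its point-to-point velocity."""
    if not samples:
        return []
    labels = ["fixation"]  # the first sample has no predecessor, so it gets a default label
    for (t0, x0, y0), (t1, x1, y1) in zip(samples, samples[1:]):
        dt = t1 - t0
        if dt <= 0:
            raise ValueError("timestamps must be strictly increasing")
        distance = ((x1 - x0) ** 2 + (y1 - y0) ** 2) ** 0.5
        velocity = distance / dt  # degrees per second
        labels.append("saccade" if velocity > velocity_threshold else "fixation")
    return labels


# Hypothetical recording sampled at 100 Hz: a pause, a rapid jump, another pause.
gaze = [
    (0.00, 10.0, 10.0),
    (0.01, 10.1, 10.0),
    (0.02, 10.1, 10.1),
    (0.03, 14.0, 12.0),
    (0.04, 18.0, 14.0),
    (0.05, 18.1, 14.0),
]
labels = classify_samples(gaze)
print(labels)
```

A predicted heatmap contains no such time series, so the timing and sequence of saccades, blinks and pupil changes cannot be derived from it.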
To understand what additional information these movements provide compared to fixations, it is useful to consider a specific construct, such as attention.
For fixations, the location of a fixation indicates a focus of attention, and its duration reflects the processing effort devoted to that spatial location (Holmqvist & Andersson, 2017). Highly salient objects are selected for fixation more frequently than objects with low saliency (Nuthmann et al., 2020). A fixation duration that is longer than the baseline can indicate difficulties in information processing (Seelig et al., 2021), and repeated fixations on the same visual area can suggest an inefficient search strategy (Rayner, 1998).
Saccades are related to the timing at which a stimulus enters cognitive processing during visual scene scanning (Beesley et al., 2019) and to how our previous experiences influence the way we prioritize visual information processing (e.g., initial saccade location): gaze is more likely to be directed first toward stimuli that have in the past signaled a positive reward (Pearson et al., 2016).
Spontaneous blink rate is associated with engagement; in particular, less engaging tasks are followed by increased single blink duration and less frequent blinks (Hollander & Huette, 2022).
Finally, the pupillary constriction and dilation mechanisms are closely related to distinct brain attentional networks (e.g., locus coeruleus, superior colliculus, and basal forebrain) responsible for alerting, orienting, and executive attention control. These tight physiological links make pupil size an integrated readout of different attention states (Strauch et al., 2022).
These are just a few examples that highlight the differences between what predictive and “traditional” eye-tracking can provide and how much extra information the latter can give.
Therefore, the choice between these two kinds of eye-tracking should be made according to the kind of information one wants to acquire.
In addition, it is important to consider that most predictive eye-tracking models can simulate only the bottom-up visual process (Murabito et al., 2018), and those that also integrate the top-down process are trained on specific tasks (Ramanishka et al., 2017; Murabito et al., 2018); therefore, they do not account for the subject’s previous experience, expectations and goals while viewing a stimulus. They may also have problems generalizing to stimuli of a different nature.
But what are the bottom-up and top-down visual processes?
Our visual system functions through two different types of processes that serve to make sense of the stimulus: bottom-up and top-down.
The bottom-up process is activated when we know nothing about what we are looking at, so that we have no preconceived cognitive representation of it. In this case, salient properties of the stimulus such as colour, shape, movement, contrast and brightness can automatically guide our visual perception and direct our cognitive awareness of the object (van Zoest, Donk, & Theeuwes, 2004).

The top-down process, instead, uses our background knowledge, expectations and goals to influence our perception. In this case, stimuli are selected according to the goal that we are trying to pursue, so that the relevance of the stimulus becomes more important than its salience.

These two processes are complementary to each other, but, as mentioned before, most predictive eye-tracking models are able to simulate only the bottom-up one.
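The bottom-up idea, in which a stimulus property such as contrast attracts the gaze regardless of the viewer's goals, can be sketched with a toy example. It assumes a grayscale image given as a grid of intensities between 0 and 1, and scores each pixel by how much it differs from the average of its neighbours. This hand-written rule is purely illustrative: the deep learning models discussed here learn their own features from eye-tracking data and do not use it.

```python
from typing import List, Tuple

Image = List[List[float]]


def local_contrast_saliency(image: Image) -> Image:
    """Score each pixel by its absolute difference from the mean of its neighbours."""
    rows, cols = len(image), len(image[0])
    saliency = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            neighbours = [
                image[nr][nc]
                for nr in range(max(0, r - 1), min(rows, r + 2))
                for nc in range(max(0, c - 1), min(cols, c + 2))
                if (nr, nc) != (r, c)
            ]
            if neighbours:  # a 1x1 image has no neighbours and keeps a score of 0
                saliency[r][c] = abs(image[r][c] - sum(neighbours) / len(neighbours))
    return saliency


def peak_position(grid: Image) -> Tuple[int, int]:
    """Return the (row, column) of the highest score in the grid."""
    positions = [(r, c) for r in range(len(grid)) for c in range(len(grid[0]))]
    return max(positions, key=lambda rc: grid[rc[0]][rc[1]])


# Hypothetical image: a uniform dark background with one bright element.
image = [
    [0.2, 0.2, 0.2, 0.2],
    [0.2, 0.2, 0.9, 0.2],
    [0.2, 0.2, 0.2, 0.2],
]
saliency = local_contrast_saliency(image)
print(f"most salient pixel: {peak_position(saliency)}")
```

Nothing in this computation depends on who is looking or why, which is exactly the component that the top-down process adds and that a saliency-only model leaves out.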
Moreover, it is important to consider that the data used to train these models have been collected mainly through first-impact tests, in which the stimuli are displayed for only 3-5 seconds, especially in the case of static stimuli. In this context, the bottom-up process prevails over the top-down one, because the former is faster, automatic, and involuntary, while the latter is slower, controlled, and voluntary.
Hence, predictive eye-tracking can explain only a small aspect of the visual phenomenon, which cannot be compared to what “traditional” eye-tracking can capture.
Indeed, “traditional” eye-tracking can capture both of these processes, all the more so because the subjects recruited for the experiment usually belong to clusters that are relevant for evaluating the tested product.
Thus, thanks to “traditional” eye-tracking it is possible to evaluate how our customers actually perceive the product under analysis, considering not only which aspects of the stimulus are seen or not, but also the variability and complexity of their exploratory behavior in relation to their past experiences and expectations with the product and the brand.
That’s why targeting is crucial in neuromarketing, and “abstracting” the customer, as predictive eye-tracking does, may provide only a superficial explanation of the perceptual phenomenon.
Further limits
Reflecting on the nature of this tool, we have wondered why this AI-based alternative to “traditional” eye-tracking is becoming so popular.
As mentioned at the beginning of this article, there are economic and time-related reasons behind it, but what is sacrificed is the completeness and variability of the data.
Beyond the limitations noted so far, we have identified several risks that an uninformed use of this tool could lead to.
Below we present four of them.
- 1. Confusing bottom-up perception with the whole visual perceptual process: As remarked above, predictive eye-tracking can simulate only the bottom-up process, not the top-down one or their integration. As a result, what is salient gets confused with what is relevant to the customer, leading us away from understanding the consumer’s visual behavior.

- 2. Forgetting the scientific method: Even when it is not conducted in academia, neuromarketing research should follow the guidelines of the scientific method.
In the scientific method, a phenomenon is studied through observation, questioning, hypothesis formation, testing, analysis, and drawing conclusions.
By just uploading a stimulus, waiting a few seconds to get the output and possibly some insights from an LLM embedded in the software, some very important steps of the scientific method are skipped. In that case, what exactly are we testing? What kind of knowledge are we producing?

- 3. Unreliable data triangulation: Data triangulation refers to the use of multiple independent measures to determine whether a phenomenon is robust: the more a phenomenon is confirmed by multiple independent lines of evidence, the more likely it is to be robust.
Some predictive eye-tracking tools available on the market also allow a stimulus to be analyzed with other AI-based instruments for measuring cognitive processes. Thus, it is theoretically possible to triangulate the predictive eye-tracking data with these measures.
But how reliable are these measures? The accuracy levels reported on the websites of these companies refer only to predictive eye-tracking and not to the other kinds of predicted measures.

- 4. The homogenization risk: The aim of neuromarketing is to optimize products through the study of consumers’ implicit and explicit reactions.
Now, let’s pretend that we are scrapping “traditional” eye-tracking and replacing it with the predictive one.
Since the latter considers only bottom-up attention, its insights will lead designers to modify the product by focusing only on the aspects considered salient, such as color, intensity and orientation. In this way, the algorithm will always interpret the most salient elements in the same way. Hence, what will differentiate one product from another? Are we really optimising the product?
Conclusions
Predictive eye-tracking is not so predictive, because it is based on only one particular eye movement (fixations) and it can simulate only the bottom-up visual process.
Therefore, broadly speaking, it captures customers’ first-impact attention, but not their intention.
Considering the limits of this tool, it is important to remember that:
- It is in the complexity (and in the variability) of the data that the related insights become more interesting;
- Its use carries some risks that should be taken into account, such as forgetting the scientific method or relying on unreliable data triangulation.