1  Introduction

Model diagnostics are critical in evaluating the accuracy and validity of a statistical model. In regression diagnostics, a common practice is to plot residuals against fitted values, which serves as a starting point for evaluating the adequacy of the fit and verifying the underlying assumptions. Visual diagnostics are frequently preferred or recommended because they can reveal abstract and unquantifiable insights; however, they can be subject to over-interpretation or even neglect.

Buja et al. introduced a visual inference framework that formalised hypothesis testing of graphical representations of data (henceforth referred to as the data plot) via the lineup protocol (see Figure 1.1 for an example). The protocol is inspired by the police lineup technique employed in eyewitness identification of criminal suspects. Briefly, the protocol comprises $m$ randomly positioned plots, where one position presents the data plot, while the remaining $m - 1$ positions present plots with the same graphical structure, except that the data has been replaced with data consistent with the null hypothesis $H_0$ (henceforth referred to as null plots). To compute the $p$-value of the visual test, the lineup is independently presented to a number of participants, who are asked to pick the most different plot. Under $H_0$, the data plot is expected to be indistinguishable from the null plots, and the probability that an observer correctly identifies the data plot is $1/m$. If a large number of participants correctly identify the data plot, the corresponding $p$-value will be small, indicating strong evidence against $H_0$. The protocol thus calibrates the data plot against the null plots, guarding against over-interpretation of the data plot.
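To make the $p$-value calculation concrete, the following is a minimal sketch under a simplifying assumption that is not taken from this thesis: every participant evaluates the lineup independently and, under the null hypothesis, picks the data plot with probability 1/m. The number of correct detections is then binomial, and the $p$-value is the upper tail probability. The function name `lineup_p_value` and its parameters are hypothetical, and the visual inference literature also uses other models for this calculation.

```python
from math import comb


def lineup_p_value(n_correct, n_participants, m=20):
    """Upper-tail binomial p-value for a lineup of m plots.

    Assumption (for illustration only): each participant independently
    picks the data plot with probability 1/m when the null hypothesis holds.
    """
    if m < 2:
        raise ValueError("a lineup needs at least two plots")
    if not 0 <= n_correct <= n_participants:
        raise ValueError("n_correct must lie between 0 and n_participants")

    p = 1.0 / m  # chance of picking the data plot under the null hypothesis
    # P(X >= n_correct) for X ~ Binomial(n_participants, 1/m)
    return sum(
        comb(n_participants, k) * p**k * (1.0 - p) ** (n_participants - k)
        for k in range(n_correct, n_participants + 1)
    )


if __name__ == "__main__":
    # Example: 6 of 11 participants pick the data plot in a 20-plot lineup.
    print(lineup_p_value(6, 11, m=20))
```

Under this assumption, the more participants pick the data plot, the smaller the returned value, which matches the interpretation given above.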

The lineup protocol has gained increasing traction in recent years and has already been integrated into data analysis on various topics. However, reliance on human assessment is a fundamental aspect of visual tests, which may restrict their widespread use. The lineup protocol is unsuitable for large-scale applications because of its high labour costs and time requirements. Moreover, it presents significant usability issues for individuals with visual impairments, reducing accessibility.

To address these limitations, this thesis proposes a computer vision-based approach to automate the visual inference process for the assessment of linear regression residual plots. Modern computer vision models often use a convolutional neural network to process digital images and perform various tasks (e.g. object detection, object identification and signal processing). The development of computer vision models has primarily focused on processing natural images, such as photographs and videos; their adaptation to data plots has had some success (e.g. classification of time series images) but remains limited. Developing computer vision models for the assessment of residual plots will make the process more efficient, consistent, and accessible.

Regression diagnostics is a well-established field with extensive literature, and a more detailed discussion is provided in Chapter 2. The field also encompasses a variety of regression models, including generalized linear models, mixed-effects models, panel data models, and survival models. This thesis, however, focuses on the classical normal linear regression model. Extensions of the methods established in this thesis to other types of regression models are discussed later in the thesis.

Figure 1.1: An example lineup used to conduct a visual test. The residual plot of the observed data is placed among 19 null plots generated from a standard error model. Human judges are asked to examine the lineup and identify the plot they find most distinct. The $p$-value is calculated from how often the judges correctly identify the data plot, which is located at position 7 and exhibits heteroskedasticity. A small $p$-value indicates substantial agreement among the judges in selecting the data plot.
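The construction of such a lineup can be sketched as follows. This is an illustrative sketch only, not the procedure used to produce Figure 1.1. It assumes a simple linear regression with one predictor and generates each null panel by drawing normal noise with the estimated residual standard error and taking its least-squares residuals. The function names `ols_residuals` and `build_lineup` are hypothetical, and plotting the panels against fitted values is omitted.

```python
import random


def ols_residuals(x, y):
    """Residuals from a least-squares fit of y on x with an intercept."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    intercept = y_bar - slope * x_bar
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]


def build_lineup(x, y, m=20, seed=None):
    """Return (panels, data_position) for a lineup of m residual panels.

    panels[i] holds the residuals shown at position i + 1. Exactly one
    panel holds the observed residuals; the other m - 1 hold residuals
    simulated under the null of independent normal errors (an assumption
    of this sketch).
    """
    rng = random.Random(seed)
    n = len(x)
    observed = ols_residuals(x, y)
    # Residual standard error with n - 2 degrees of freedom.
    sigma = (sum(r**2 for r in observed) / (n - 2)) ** 0.5

    data_position = rng.randint(1, m)  # 1-based, both ends inclusive
    panels = []
    for position in range(1, m + 1):
        if position == data_position:
            panels.append(observed)
        else:
            noise = [rng.gauss(0.0, sigma) for _ in range(n)]
            panels.append(ols_residuals(x, noise))
    return panels, data_position


if __name__ == "__main__":
    gen = random.Random(0)
    x = [i / 10 for i in range(1, 51)]
    # Error spread grows with x, so the observed panel is heteroskedastic.
    y = [1.0 + 2.0 * xi + gen.gauss(0.0, 0.2 * xi) for xi in x]
    panels, position = build_lineup(x, y, m=20, seed=42)
    print(len(panels), position)
```

Refitting the simulated noise against the same predictor keeps the null residuals orthogonal to the design, so the null panels share the structural constraints of the observed residuals and differ only through the error distribution.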

1.1 Thesis Outline

The thesis is structured as follows.

Chapter 2 provides empirical evidence supporting the indispensability of residual plots through a visual inference experiment using the lineup protocol. By comparing human evaluations of residual plots with conventional statistical tests, this chapter demonstrates the advantages of graphical methods in detecting practical issues with model fit, while also highlighting the tendency of conventional tests to be overly sensitive. The chapter also contains a comprehensive review of the literature on residual diagnostics.

Chapter 3 introduces a computer vision model to automate the assessment of residual plots, addressing the scalability limitations of human-based visual inference. This model is trained to predict a distance measure based on Kullback-Leibler divergence, quantifying the disparity between the residual distribution of a fitted classical normal linear regression model and the reference distribution. The performance of the model is evaluated on the human subject experiment data collected in Chapter 2. The chapter also contains a comprehensive review of the literature on reading data plots with computer vision models.

Chapter 4 introduces a new R package, autovi, and its accompanying web interface, autovi.web, designed to automate the assessment of residual plots in regression analysis. The package uses the computer vision model built in Chapter 3 to predict a measure of visual signal strength (VSS) and provides supporting information to assist analysts in diagnosing model fit. By automating this process, autovi and autovi.web improve the efficiency and consistency of model evaluation, making advanced diagnostic tools accessible to a broader audience.

Chapter 5 summarises the contributions of the work and their potential impact, and discusses future plans.