cqcheck.Rd
Given an additive quantile model fitted using `qgam`, `cqcheck` provides plots that allow one to check what proportion of responses, `y`, falls below the fitted quantile.
```r
cqcheck(obj, v, X = NULL, y = NULL, nbin = c(10, 10), bound = NULL, lev = 0.05, scatter = FALSE, ...)
```
| Argument | Description |
|---|---|
| `obj` | the output of a `qgam` call. |
| `v` | if a 1D plot is required, `v` should be either a single character (a variable name) or a numeric vector. If a 2D plot is required, `v` should be either a vector of two characters or a matrix with two columns. |
| `X` | a dataframe containing the data used to obtain the conditional quantiles. By default it is `NULL`, in which case predictions are made using the model matrix in … |
| `y` | vector of responses. Its i-th entry corresponds to the i-th row of `X`. By default it is `NULL`, in which case it is internally set to … |
| `nbin` | a vector of integers of length one (1D case) or two (2D case) indicating the number of bins to be used in each direction. Used only if `bound` is `NULL`. |
| `bound` | in the 1D case, a numeric vector whose increasing entries represent the bounds of each bin. In the 2D case, a list of two such vectors should be provided. Default is `NULL`. |
| `lev` | the significance level used in the plots; this determines the width of the confidence intervals. Default is 0.05. |
| `scatter` | if `TRUE`, a scatterplot is added (using the …). Default is `FALSE`. |
| `...` | extra graphical parameters to be passed to … |
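To make the difference between `nbin` and `bound` concrete, the base-R sketch below shows one way a `bound` vector (1D) or a list of two bound vectors (2D) partitions the data into bins, and how many responses land in each bin. It only illustrates the idea: we assume right-closed intervals as produced by `cut()`, and the names `bound`, `bound2`, `binCounts` and `grid` are hypothetical. `cqcheck`'s own treatment of bin edges may differ.

```r
# Hypothetical illustration of binning with explicit bounds (not cqcheck's internals).
set.seed(1)
x <- runif(200, 0, 10)                  # covariate to bin along
bound <- c(0, 2.5, 5, 7.5, 10)          # increasing entries: 5 bounds give 4 bins
# cut() builds right-closed intervals (a, b]; include.lowest = TRUE also keeps x == 0
bins <- cut(x, breaks = bound, include.lowest = TRUE)
binCounts <- table(bins)                # number of responses in each bin
print(binCounts)

# 2D analogue: a list of two bound vectors defines a grid of bins
z <- runif(200, -1, 1)
bound2 <- list(c(0, 5, 10), c(-1, 0, 1))
grid <- table(cut(x, bound2[[1]], include.lowest = TRUE),
              cut(z, bound2[[2]], include.lowest = TRUE))
print(grid)                             # 2 x 2 table of counts
```

With `nbin` instead of `bound`, only the number of bins per direction is given and the edges are chosen automatically.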
Simply produces a plot.
Having fitted an additive model for, say, quantile `qu = 0.4`, one would expect that about 40% of the responses fall below the fitted quantile. This function allows one to visually compare the empirical proportion of responses (`qu_hat`) falling below the fit with its theoretical value (`qu`). In particular, the responses are binned, with the bins being constructed along one or two variables (given by argument `v`). Let `qu_hat[i]` be the proportion of responses below the fitted quantile in the i-th bin. This should be approximately equal to `qu`, for every i.

In the 1D case, when `v` is a single character or a numeric vector, `cqcheck` provides a plot where the horizontal line is `qu`, the dots correspond to `qu_hat[i]` and the grey lines are confidence intervals for `qu`. The confidence intervals are based on `qbinom(lev/2, siz, qu)`, where `siz` is the number of responses in the i-th bin; if the dots fall outside them, then `qu_hat[i]` might be deviating too much from `qu`.

In the 2D case, when `v` is a vector of two characters or a matrix with two columns, we plot a grid of bins. The responses are divided between the bins as before, but now we do not plot the confidence intervals. Instead, we report the empirical proportions `qu_hat[i]` for the non-empty bins, and we colour the bins in red if `qu_hat[i] < qu` and in green otherwise. If `qu_hat[i]` falls outside the confidence intervals, we put an `*` next to the numeric `qu_hat[i]` and we use more intense colours.
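The 1D check can be mimicked in a few lines of base R, which may help when reading the plot. This is a sketch under stated assumptions, not `cqcheck`'s source: we simulate Gaussian responses whose true `qu`-quantile is known (so `fitted` stands in for the predictions of a `qgam` fit), use equal-width bins along `x`, and take the band for `qu_hat[i]` to run from `qbinom(lev/2, siz, qu)/siz` to `qbinom(1 - lev/2, siz, qu)/siz` (the upper end is our assumption). The names `fitted`, `lower`, `upper`, `flag` and `colour` are hypothetical.

```r
# Sketch of the 1D calibration check (assumed logic, not cqcheck's source).
set.seed(42)
n    <- 1000
qu   <- 0.4         # quantile of interest
lev  <- 0.05        # significance level
nbin <- 10

x <- runif(n)
y <- x + rnorm(n)
fitted <- x + qnorm(qu)   # true conditional qu-quantile, standing in for a qgam fit

# Equal-width bins along x (assumption: cqcheck may place bins differently)
bins   <- cut(x, breaks = nbin)
siz    <- as.vector(tapply(y, bins, length))          # responses per bin
qu_hat <- as.vector(tapply(y < fitted, bins, mean))   # proportion below the fit

# Binomial band for the proportion under the hypothesis that qu is correct
lower <- qbinom(lev / 2, siz, qu) / siz
upper <- qbinom(1 - lev / 2, siz, qu) / siz

# Bins whose empirical proportion falls outside the band
flag <- qu_hat < lower | qu_hat > upper
print(data.frame(bin = levels(bins), siz, qu_hat = round(qu_hat, 3),
                 lower = round(lower, 3), upper = round(upper, 3), flag))

# Colour rule from the 2D description: red if qu_hat < qu, green otherwise
colour <- ifelse(qu_hat < qu, "red", "green")
```

Because the simulated fit is the true quantile, most bins should fall inside the band; at `lev = 0.05` roughly one bin in twenty may still be flagged by chance.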
```r
####### Bivariate additive model y~1+x+x^2+z+x*z/2+e, e~N(0, 1) ########
library(qgam)
set.seed(15560)
n <- 500
x <- rnorm(n, 0, 1); z <- rnorm(n)
X <- cbind(1, x, x^2, z, x*z)
beta <- c(0, 1, 1, 1, 0.5)
y <- drop(X %*% beta) + rnorm(n)
dataf <- data.frame(cbind(y, x, z))
names(dataf) <- c("y", "x", "z")

#### Fit a constant model for the median
qu <- 0.5
fit <- qgam(y~1, qu = qu, data = dataf)

# Look at what happens along x: clearly there is a non-linear pattern here
cqcheck(obj = fit, v = c("x"), X = dataf, y = y)

#### Add a smooth for x
fit <- qgam(y~s(x), qu = qu, data = dataf)
cqcheck(obj = fit, v = c("x"), X = dataf, y = y) # Better!

# Let's look across x and z. As we move along z (x2 in the plot)
# the colour changes from green to red
cqcheck(obj = fit, v = c("x", "z"), X = dataf, y = y, nbin = c(5, 5))

# The effect looks pretty linear
cqcheck(obj = fit, v = c("z"), X = dataf, y = y, nbin = c(10))

#### Let's add a linear effect for z
fit <- qgam(y~s(x)+z, qu = qu, data = dataf)

# Looks better!
cqcheck(obj = fit, v = c("z"))

# Let's look across x and z again: green prevails on the top-left to bottom-right
# diagonal, while the other diagonal is mainly red.
cqcheck(obj = fit, v = c("x", "z"), nbin = c(5, 5))

### Maybe adding an interaction would help?
fit <- qgam(y~s(x)+z+I(x*z), qu = qu, data = dataf)

# It does! The real model is: y ~ 1 + x + x^2 + z + x*z/2 + e, e ~ N(0, 1)
cqcheck(obj = fit, v = c("x", "z"), nbin = c(5, 5))
```