Because TrialEmulation makes extensive use of S4 classes, users can extend it to fit their own specific needs.
This document gives a quick overview of the extensible classes, their current implementations and the requirements for adding your own child classes.
It describes two areas where new functionality can be implemented: regression model fitting and data storage.
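Three classes are required when implementing a model fitter. In particular, a child class of te_outcome_fitted must be defined for each new model fitter implementation. Both the fitted weight model (te_weights_fitted) and the fitted outcome model (te_outcome_fitted) objects contain a summary made up of the tidy and glance results and the saved file (save_path).
Currently only one model fitter is implemented. It is created by stats_glm_logit() and fits stats::glm(..., family = binomial("logit")). Its fitted outcome model, a te_stats_glm_logit_outcome_fitted object, contains:
- model, the result of glm(),
- vcov, the robust covariance matrix, and
- the summary (tidy and glance) and save_path.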
A user constructor is required to specify the model fitter type in set_censor_weight_model(), set_switch_weight_model() and set_outcome_model(). Each is specified independently. The user constructor should have arguments for any required model fitting (hyper-)parameters as well as a path for saving the model objects. See stats_glm_logit() for a simple implementation.
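The constructor pattern can be sketched as follows for a hypothetical new fitter type (binary regression with a probit link). To keep the sketch runnable without TrialEmulation, it uses stand-in classes: demo_model_fitter, demo_glm_probit and the maxit parameter are all hypothetical. In a real extension the class would inherit from te_model_fitter, and the constructor's return value would be given to set_outcome_model() or one of the weight model setters.

```r
library(methods)

# Stand-in for the package's model fitter parent class (hypothetical name);
# a real extension would inherit from te_model_fitter instead.
setClass("demo_model_fitter", representation("VIRTUAL", save_path = "character"))

# A hypothetical new fitter type: binary regression with a probit link.
# `maxit` is an example of a fitting (hyper-)parameter stored on the object.
setClass("demo_glm_probit", contains = "demo_model_fitter",
         slots = c(maxit = "numeric"))

# User constructor: validates the parameters, creates the save directory if
# one is given, and returns the fitter object.
demo_glm_probit <- function(save_path = NA_character_, maxit = 25) {
  stopifnot(is.numeric(maxit), length(maxit) == 1L, maxit > 0)
  if (!is.na(save_path)) {
    dir.create(save_path, recursive = TRUE, showWarnings = FALSE)
  }
  new("demo_glm_probit", save_path = save_path, maxit = maxit)
}

fitter <- demo_glm_probit(save_path = file.path(tempdir(), "models"), maxit = 50)
```

Keeping parameters in slots means the fitting methods can later read them from the object they dispatch on, so the setters never need to know about fitter-specific options.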
Three generic methods are required when implementing a new model fitter: fit_weights_model(), fit_outcome_model() and predict().
fit_weights_model() uses the model fitter object to fit a model for the probability of censoring and returns the fitted probabilities, which are later combined and used to construct the inverse probability of censoring weights. The method should also save the fitted model object to disk if a save path is specified.
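The computation behind such a method can be sketched in plain R. This is not the package's implementation: demo_fit_weights() is a hypothetical function that returns a plain list instead of a te_weights_fitted object, and it stores the coefficient table in place of the tidy and glance summaries.

```r
# Hypothetical stand-in for a fit_weights_model() method: fits a logistic
# model for the censoring indicator, optionally saves the model to disk, and
# returns the label, a summary and the fitted probabilities.
demo_fit_weights <- function(data, formula, label, save_path = NA_character_) {
  model <- stats::glm(formula, data = data, family = stats::binomial("logit"))
  saved_file <- NA_character_
  if (!is.na(save_path)) {
    saved_file <- tempfile(pattern = "weights_model_", tmpdir = save_path,
                           fileext = ".rds")
    saveRDS(model, saved_file)
  }
  list(
    label = label,
    # coefficient table used here in place of the tidy/glance summaries
    summary = list(coefficients = stats::coef(summary(model)),
                   save_path = saved_file),
    fitted = unname(stats::fitted(model))
  )
}

set.seed(1)
df <- data.frame(x = rnorm(200))
df$censored <- rbinom(200, 1, plogis(-1 + 0.5 * df$x))
res <- demo_fit_weights(df, censored ~ x, label = "censoring model",
                        save_path = tempdir())
```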
- Arguments
  - object: the te_model_fitter object
  - data: a data.frame containing the outcome (here the censoring indicator) and covariate data
  - label: a character label describing the model, to be attached to the result
- Returns: a te_weights_fitted object containing a summary of the fitted model and the fitted probabilities.

fit_outcome_model(object, data, formula, weights = NULL) fits the outcome model.
- Arguments
  - object: the te_model_fitter object
  - data: a data.frame containing the outcome and covariate data
  - formula: the model formula
  - weights: a numeric vector containing weights for all observations in data
- Returns: the fitted model as an object inheriting from the te_outcome_fitted child class that corresponds to the model fitter class used. This object contains a summary of the results as well as the raw result from the model.
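A weighted outcome model with a robust covariance matrix (the vcov stored by the stats_glm_logit() fitter) might be computed as below. Which robust estimator the package uses is not stated here; this sketch assumes a cluster-robust sandwich estimator clustered on a hypothetical id column, and demo_fit_outcome() returns a plain list rather than a te_outcome_fitted object.

```r
# Hypothetical stand-in for a fit_outcome_model() method: a weighted logistic
# regression plus a cluster-robust (sandwich) covariance matrix. Assumes no
# missing values, 0/1 outcomes and an `id` column identifying individuals.
demo_fit_outcome <- function(data, formula, weights = NULL, id = "id") {
  if (is.null(weights)) weights <- rep(1, nrow(data))
  data$.w <- weights # stored in `data` so glm() can find the weights
  model <- suppressWarnings( # non-integer weights make binomial glm() warn
    stats::glm(formula, data = data, weights = .w,
               family = stats::binomial("logit"))
  )
  X <- stats::model.matrix(model)
  mu <- stats::fitted(model)
  # bread: inverse of the weighted information matrix X' diag(w mu (1 - mu)) X
  bread <- solve(crossprod(X, X * (weights * mu * (1 - mu))))
  # meat: outer products of score contributions summed within each individual
  scores <- X * (weights * (model$y - mu))
  meat <- crossprod(rowsum(scores, data[[id]]))
  list(model = model, vcov = bread %*% meat %*% bread)
}

set.seed(2)
df <- data.frame(id = rep(1:50, each = 4), x = rnorm(200))
df$y <- rbinom(200, 1, plogis(-0.5 + df$x))
fit <- demo_fit_outcome(df, y ~ x, weights = runif(200, 0.5, 2))
```

Storing the weights as a column of data avoids a common pitfall: glm() looks up the weights argument in data and then in the formula's environment, not in the calling function's local variables.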
predict() calculates the marginal survival or cumulative incidence based on the outcome model object. The method should take the baseline covariates and construct data for assigned_treatment = 0 and 1 at each of the follow-up times given in predict_times.
- Arguments
  - the fitted outcome model: an object of a te_outcome_fitted child class, e.g. te_stats_glm_logit_outcome_fitted
  - a data.frame containing the baseline covariates to predict probabilities for
  - the type of prediction: "cum_inc" or "survival"
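The core of this calculation can be sketched as follows, assuming (this is not stated above) that the outcome model is a pooled logistic model for the discrete-time hazard. The function demo_predict(), the column name followup_time and the output layout are hypothetical; assigned_treatment and predict_times come from the description.

```r
# Hypothetical sketch of what a predict() method computes for a pooled
# logistic (discrete-time hazard) outcome model. Assumes the model uses the
# columns assigned_treatment and followup_time and that predict_times are
# non-negative whole numbers.
demo_predict <- function(model, newdata, predict_times,
                         type = c("cum_inc", "survival")) {
  type <- match.arg(type)
  fu_times <- seq(0, max(predict_times))
  person <- rep(seq_len(nrow(newdata)), each = length(fu_times))
  arm_survival <- function(arm) {
    # one row per individual and follow-up time, with treatment fixed to `arm`
    grid <- newdata[person, , drop = FALSE]
    grid$assigned_treatment <- arm
    grid$followup_time <- rep(fu_times, times = nrow(newdata))
    hazard <- stats::predict(model, newdata = grid, type = "response")
    # individual survival: cumulative product of (1 - hazard) over time
    surv <- stats::ave(1 - hazard, person, FUN = cumprod)
    # marginal survival: average over individuals at each follow-up time
    marginal <- tapply(surv, grid$followup_time, mean)
    as.numeric(marginal[as.character(predict_times)])
  }
  out <- data.frame(followup_time = predict_times,
                    assigned_treatment_0 = arm_survival(0),
                    assigned_treatment_1 = arm_survival(1))
  if (type == "cum_inc") out[-1] <- 1 - out[-1]
  out$difference <- out$assigned_treatment_1 - out$assigned_treatment_0
  out
}

set.seed(3)
n <- 2000
train <- data.frame(assigned_treatment = rbinom(n, 1, 0.5),
                    followup_time = sample(0:10, n, replace = TRUE),
                    age = rnorm(n, 60, 10))
train$outcome <- rbinom(n, 1, plogis(-3 - 0.5 * train$assigned_treatment +
                                       0.02 * (train$age - 60)))
model <- glm(outcome ~ assigned_treatment + followup_time + age,
             data = train, family = binomial("logit"))
pred <- demo_predict(model, data.frame(age = c(50, 60, 70)),
                     predict_times = 0:10)
```

Averaging the individual survival curves, rather than predicting at the mean covariate values, is what makes the result a marginal estimate.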
The expanded sequence of target trials dataset is much larger than the input longitudinal data. If the original input data are already large compared to the available system memory, an alternative data storage mechanism might be desirable. Currently the package offers data.table, csv and duckdb storage. Here we describe the implementation of “data stores”.
To add a new data store, define a child class that inherits from class te_datastore. You must also add at least a new constructor, save_to_xxx(), as well as new methods for save_expanded_data() and read_expanded_data().
A new method for sample_expanded_data() is optional (e.g. if sampling is not required or the method implemented for te_datastore is sufficient; see sample_expanded_data below), but it is likely to be necessary for large datasets.
A plain te_datastore object is stored in trial_sequence objects before the expansion options are set, and is replaced with the corresponding child class when they are set.
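The required pieces (child class, save_to_xxx()-style constructor, save and read methods) fit together as in the minimal sketch below. To run without TrialEmulation it defines stand-ins: demo_datastore plays the role of te_datastore, and the generics demo_save() and demo_read() play the roles of save_expanded_data() and read_expanded_data(). All names are hypothetical, and the store simply keeps data chunks in a list.

```r
library(methods)

# Stand-in for the te_datastore parent class (hypothetical name), with a
# slot counting the rows saved so far.
setClass("demo_datastore", representation("VIRTUAL", N = "numeric"))

# A new child class: here a hypothetical store keeping data chunks in a list.
setClass("demo_datastore_list", contains = "demo_datastore",
         slots = c(chunks = "list"))

# User constructor in the style of save_to_xxx().
save_to_list_demo <- function() new("demo_datastore_list", N = 0, chunks = list())

# Stand-in generics playing the roles of save_expanded_data() and
# read_expanded_data().
setGeneric("demo_save", function(object, data) standardGeneric("demo_save"))
setGeneric("demo_read", function(object) standardGeneric("demo_read"))

# Saving appends a chunk and updates the row count; reading binds all chunks.
setMethod("demo_save", "demo_datastore_list", function(object, data) {
  object@chunks <- c(object@chunks, list(data))
  object@N <- object@N + nrow(data)
  object
})
setMethod("demo_read", "demo_datastore_list", function(object) {
  do.call(rbind, object@chunks)
})

store <- save_to_list_demo()
store <- demo_save(store, data.frame(id = 1:2, trial_period = 0))
store <- demo_save(store, data.frame(id = 3, trial_period = 1))
expanded <- demo_read(store)
```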
Currently the following data store child classes, all inheriting from te_datastore, are available for saving expanded data:
- CSV files (save_to_csv()): also stores a data.frame that is used as a template when reading the data, to preserve types and attributes.
- data.table (save_to_datatable()): keeps a data.table containing the expanded data in memory, which is only viable for smaller datasets.
- DuckDB (save_to_duckdb()).
The user constructor function is used in set_expansion_options() to replace the te_datastore object in trial_sequence@expansion@datastore with an object of the desired child class. The user constructor allows the user to specify any parameters required for the data store, such as a file path or a username and password. The data are saved later, when expand_trials() is called, which internally calls the corresponding save_expanded_data() method.
For further insights, see the currently available constructor functions: save_to_csv(), save_to_datatable() and save_to_duckdb().
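A constructor for a hypothetical data store that writes the expanded data to .rds files might look as follows. The classes demo_datastore (standing in for te_datastore) and demo_datastore_rds, and the function save_to_rds_demo(), are illustrative names only. The constructor validates the user's parameters and prepares an empty store; no data are written until saving is triggered.

```r
library(methods)

# Stand-ins (hypothetical names) for te_datastore and a file-based child class.
setClass("demo_datastore", representation("VIRTUAL", N = "numeric"))
setClass("demo_datastore_rds", contains = "demo_datastore",
         slots = c(path = "character", files = "character"))

# User constructor in the style of save_to_xxx(): it only checks the user's
# parameters and prepares an empty store; no data is written yet.
save_to_rds_demo <- function(path) {
  stopifnot(is.character(path), length(path) == 1L)
  if (dir.exists(path) && length(list.files(path)) > 0L) {
    stop("`path` must be an empty or non-existent directory")
  }
  dir.create(path, recursive = TRUE, showWarnings = FALSE)
  new("demo_datastore_rds", N = 0, path = normalizePath(path),
      files = character())
}

store <- save_to_rds_demo(file.path(tempdir(), "expanded_demo"))
```

Requiring an empty directory is a design choice of this sketch: it prevents files from an earlier expansion being mixed with the new chunks.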
There are four generic methods relevant to te_datastore classes: show(), save_expanded_data(), read_expanded_data() and sample_expanded_data().
show() prints a simple summary of, or extract from, the data.
Note: since the child classes differ quite significantly from each other, every child class has its own show() method. There is no show() method for the te_datastore parent class.
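A show() method for a hypothetical file-based store could print a short summary and a preview of the first saved file, without reading all files into memory. The classes demo_datastore and demo_datastore_rds are illustrative stand-ins; show() itself is the standard S4 generic from the methods package.

```r
library(methods)

setClass("demo_datastore", representation("VIRTUAL", N = "numeric"))
setClass("demo_datastore_rds", contains = "demo_datastore",
         slots = c(path = "character", files = "character"))

# Summary line plus a preview of the first saved chunk only.
setMethod("show", "demo_datastore_rds", function(object) {
  cat("RDS datastore at", object@path, "\n")
  cat(" ", length(object@files), "file(s),", object@N, "row(s)\n")
  if (length(object@files) > 0L) {
    cat("  Preview of first file:\n")
    print(utils::head(readRDS(object@files[1]), 3))
  }
  invisible(object)
})

store_dir <- tempfile("rds_store_")
dir.create(store_dir)
first_file <- file.path(store_dir, "part_1.rds")
saveRDS(data.frame(id = 1:5, trial_period = 0), first_file)
store <- new("demo_datastore_rds", N = 5, path = store_dir, files = first_file)
store # auto-printing an S4 object calls show()
```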
save_expanded_data() defines how the expanded data are saved. The method is chosen based on the te_datastore child class. It is called internally by expand_trials(). For large datasets, save_expanded_data() may be called multiple times, so the method must be able to “append” data in some way.
- Arguments
  - a te_datastore child class object
  - a data.table to be saved to the data store
- Returns: a te_datastore child class object
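One simple way to support repeated calls is to write each chunk to its own numbered file, as in this sketch for a hypothetical .rds-based store. The generic demo_save_expanded() stands in for save_expanded_data(), and a data.frame stands in for the data.table the package passes; all class and function names are illustrative.

```r
library(methods)

setClass("demo_datastore", representation("VIRTUAL", N = "numeric"))
setClass("demo_datastore_rds", contains = "demo_datastore",
         slots = c(path = "character", files = "character"))

# Stand-in generic with the same role as save_expanded_data().
setGeneric("demo_save_expanded",
           function(object, data) standardGeneric("demo_save_expanded"))

# Each call writes the new chunk to its own numbered file, so repeated calls
# append to the store instead of overwriting earlier chunks.
setMethod("demo_save_expanded", "demo_datastore_rds", function(object, data) {
  file <- file.path(object@path,
                    sprintf("part_%05d.rds", length(object@files) + 1L))
  saveRDS(data, file)
  object@files <- c(object@files, file)
  object@N <- object@N + nrow(data)
  object # the updated store must be returned and reassigned by the caller
})

store_dir <- tempfile("rds_store_")
dir.create(store_dir)
store <- new("demo_datastore_rds", N = 0, path = store_dir, files = character())
for (p in 0:2) {
  store <- demo_save_expanded(store, data.frame(id = 1:4, trial_period = p))
}
```

Because S4 objects are copied on modification, the method returns the updated object and the caller reassigns it; the file list and row count are only kept if that return value is used.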
read_expanded_data() is used for reading the expanded data into memory. The data can be subset by period or by any other subset condition. It is called internally by load_expanded_data() if p_control isn't specified, and by sample_expanded_data() if no specific sampling method exists for a te_datastore child class.
- Arguments
  - a te_datastore child class object
  - the trial periods to read: NULL selects all available trial periods
  - a subset condition: NULL skips subsetting
- Returns: a data.table object
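For the hypothetical .rds store, reading with period and subset filtering might look like the sketch below. The generic demo_read_expanded() stands in for read_expanded_data(); the column names trial_period and followup_time are assumptions, as is treating the subset condition as a character string holding an R expression.

```r
library(methods)

setClass("demo_datastore", representation("VIRTUAL", N = "numeric"))
setClass("demo_datastore_rds", contains = "demo_datastore",
         slots = c(path = "character", files = "character"))

setGeneric("demo_read_expanded",
           function(object, period = NULL, subset_condition = NULL)
             standardGeneric("demo_read_expanded"))

setMethod("demo_read_expanded", "demo_datastore_rds",
          function(object, period = NULL, subset_condition = NULL) {
  chunks <- lapply(object@files, function(f) {
    d <- readRDS(f)
    # period = NULL selects all available trial periods
    if (!is.null(period)) d <- d[d$trial_period %in% period, , drop = FALSE]
    # subset_condition = NULL skips subsetting; otherwise it is assumed to be a
    # string holding an R expression evaluated within the data (NA -> dropped)
    if (!is.null(subset_condition)) {
      keep <- eval(str2lang(subset_condition), d)
      d <- d[which(keep), , drop = FALSE]
    }
    d
  })
  do.call(rbind, chunks)
})

# Example store with one file per trial period.
store_dir <- tempfile("rds_store_")
dir.create(store_dir)
files <- vapply(0:2, function(p) {
  f <- file.path(store_dir, sprintf("part_%d.rds", p))
  saveRDS(data.frame(id = 1:4, trial_period = p, followup_time = 0:3), f)
  f
}, character(1))
store <- new("demo_datastore_rds", N = 12, path = store_dir, files = files)

subset_data <- demo_read_expanded(store, period = 1:2,
                                  subset_condition = "followup_time <= 1")
```

Filtering each file as it is read keeps the peak memory use closer to the size of the result than to the size of the whole store.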
sample_expanded_data() is used for reading and sampling the expanded data. The data can be subset by period or by any other subset condition, and can also be sampled using the p_control argument. It is called internally by load_expanded_data() if p_control is specified.
If no method exists for the child class, the method of the parent class is used instead: it reads and subsets the data using read_expanded_data() and then samples it in bulk, which might cause problems for large datasets. For speed or memory reasons, it might be necessary to implement a more efficient method for a new child class.
- Arguments
  - a te_datastore child class object
  - the trial periods to read: NULL selects all available trial periods
  - a subset condition: NULL skips subsetting
- Returns: a data.table object
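A more memory-efficient method can sample each chunk as it is read instead of sampling in bulk. The sketch below assumes, as the argument name suggests, that p_control is the probability of keeping each non-event (control) row while all event rows are kept. The generic demo_sample_expanded(), the classes and the columns outcome and trial_period are hypothetical.

```r
library(methods)

setClass("demo_datastore", representation("VIRTUAL", N = "numeric"))
setClass("demo_datastore_rds", contains = "demo_datastore",
         slots = c(path = "character", files = "character"))

setGeneric("demo_sample_expanded",
           function(object, p_control, period = NULL, seed = NULL)
             standardGeneric("demo_sample_expanded"))

# Reads and samples one file at a time, so only a single full chunk plus the
# sampled rows are held in memory, instead of reading everything first.
setMethod("demo_sample_expanded", "demo_datastore_rds",
          function(object, p_control, period = NULL, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  chunks <- lapply(object@files, function(f) {
    d <- readRDS(f)
    if (!is.null(period)) d <- d[d$trial_period %in% period, , drop = FALSE]
    # keep every event; keep each non-event with probability p_control
    keep <- d$outcome == 1 | stats::runif(nrow(d)) < p_control
    d[keep, , drop = FALSE]
  })
  do.call(rbind, chunks)
})

store_dir <- tempfile("rds_store_")
dir.create(store_dir)
files <- vapply(0:2, function(p) {
  f <- file.path(store_dir, sprintf("part_%d.rds", p))
  saveRDS(data.frame(id = 1:100, trial_period = p,
                     outcome = rep(c(1, 0), c(5, 95))), f)
  f
}, character(1))
store <- new("demo_datastore_rds", N = 300, path = store_dir, files = files)

sampled <- demo_sample_expanded(store, p_control = 0.1, seed = 42)
```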