
tmerge {survival}    R Documentation

Time based merge for survival data


A common task in survival analysis is the creation of start, stop data sets which have multiple intervals for each subject, along with the covariate values that apply over each interval. This function aids in the creation of such data sets.





data1
    the primary data set, to which new variables and/or observations will be added

data2
    second data set in which all the other arguments will be found

id
    subject identifier

...
    operations that add new variables or intervals, see below

tstart
    optional variable to define the valid time range for each subject, only used on an initial call

tstop
    optional variable to define the valid time range for each subject, only used on an initial call

options
    a list of options. Valid ones are idname, tstartname, tstopname, delay, na.rm, and tdcstart. See the explanation below.



The program is often run in multiple passes, the first of which defines the basic structure, and subsequent ones that add new variables to that structure. For a more complete explanation of how this routine works refer to the vignette on time-dependent variables.
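The multi-pass pattern can be sketched as follows. The data frames `base` and `trt` and their columns are hypothetical toy data invented for illustration; only `tmerge`, `event` and `tdc` come from the survival package.

```r
library(survival)

# Hypothetical baseline data: one row per subject
base <- data.frame(id = 1:3, futime = c(10, 20, 15), status = c(1, 0, 1))
# Hypothetical second data set: day on which treatment started
trt <- data.frame(id = c(1L, 2L), tx_time = c(4, 12))

# Pass 1: define the basic structure, one interval per subject ending at futime
d1 <- tmerge(base, base, id = id, death = event(futime, status))

# Pass 2: add a time-dependent covariate, splitting intervals where needed
d2 <- tmerge(d1, trt, id = id, treated = tdc(tx_time))

print(d2[, c("id", "tstart", "tstop", "death", "treated")])
```

Subjects 1 and 2 each have their single interval split at the treatment time, while subject 3, who has no row in `trt`, keeps one interval.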

There are 4 types of operational arguments: a time dependent covariate (tdc), cumulative count (cumtdc), event (event) or cumulative event (cumevent). Time dependent covariates change their values before an event; events are outcomes.

  • newname = tdc(y, x, init) A new time dependent covariate variable will be created. The argument y is assumed to be on the scale of the start and end time, and each instance describes the occurrence of a 'condition' at that time. The second argument x is optional. In the case where x is missing the count variable starts at 0 for each subject and becomes 1 at the time of the event. If x is present the value of the time dependent covariate is initialized to the value of init, if present, or the tdcstart option otherwise, and is updated to the value of x at each observation. If the option na.rm=TRUE missing values of x are first removed, i.e., the update will not create missing values.

  • newname = cumtdc(y, x, init) Similar to tdc, except that the event count is accumulated over time for each subject. The variable x must be numeric.

  • newname = event(y, x) Mark an event at time y. In the usual case that x is missing the new 0/1 variable will be similar to the 0/1 status variable of a survival time.

  • newname = cumevent(y, x) Cumulative events.
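As a rough illustration of the operators above, the sketch below combines `event`, `tdc` with a value, and `cumtdc` without a value. The data frames `base`, `labs` and `visits` and all their values are made up for the example.

```r
library(survival)

# Hypothetical baseline data
base <- data.frame(id = 1:2, futime = c(30, 40), status = c(0, 1))
# Hypothetical repeated lab values (subject 2 has no measurement before day 20)
labs <- data.frame(id = c(1L, 1L, 2L), day = c(0, 10, 20),
                   bili = c(1.2, 2.5, 0.9))
# Hypothetical clinic visits
visits <- data.frame(id = c(1L, 1L, 2L), day = c(5, 15, 25))

# event(): mark the outcome at the end of follow-up
d <- tmerge(base, base, id = id, death = event(futime, status))
# tdc(y, x): covariate takes the value of x from time y onward
d <- tmerge(d, labs, id = id, bili = tdc(day, bili))
# cumtdc(y): running count of occurrences for each subject
d <- tmerge(d, visits, id = id, nvisit = cumtdc(day))

print(d[, c("id", "tstart", "tstop", "death", "bili", "nvisit")])
```

Each new time point that falls inside an existing interval splits it, and covariates created in earlier passes are carried into both pieces.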

The function adds three new variables to the output data set: tstart, tstop, and id. The options argument can be used to change these names. If, in the first call, the id argument is a simple name, that variable name will be used as the default for the idname option. If data1 contains the tstart variable then that is used as the starting point for the created time intervals, otherwise the initial interval for each id will begin at 0 by default. This will lead to an invalid interval and subsequent error if, say, a death time were <= 0.
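A minimal sketch of renaming the interval variables through the options argument, assuming the same kind of hypothetical toy data as elsewhere; the names `start` and `stop` are arbitrary choices for illustration.

```r
library(survival)

# Hypothetical baseline data
base <- data.frame(id = 1:3, futime = c(10, 20, 15), status = c(1, 0, 1))

# Use "start" and "stop" in place of the default tstart and tstop names
d <- tmerge(base, base, id = id, death = event(futime, status),
            options = list(tstartname = "start", tstopname = "stop"))

print(names(d))
```

Because the names are stored with the result, later calls that use `d` as data1 should not need the options repeated.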

The na.rm option affects creation of time-dependent covariates. Should a data row in data2 that has a missing value for the variable be ignored (na.rm=TRUE, the default) or should it generate an observation with a value of NA (na.rm=FALSE)? The default value leads to 'last value carried forward' behavior. The delay option causes a time-dependent covariate's new value to be delayed, see the vignette for an example.



a data frame with two extra attributes tname and tcount. The first contains the names of the key variables; its persistence from call to call allows the user to avoid constantly reentering the options argument. The tcount variable contains counts of the match types. New time values that occur before the first interval for a subject are 'early', those after the last interval for a subject are 'late', and those that fall into a gap are of type 'gap'. All these are considered to be outside the specified time frame for the given subject. An event of this type will be discarded. An observation in data2 whose identifier matches no rows in data1 is of type 'missid' and is also discarded. A time-dependent covariate value will be applied to later intervals but will not generate a new time point in the output.


The most common type will usually be 'within', corresponding to those new times that fall inside an existing interval and cause it to be split into two. Observations that fall exactly on the edge of an interval but within the (min, max] time for a subject are counted as being on a 'leading' edge, 'trailing' edge or 'boundary'. The first corresponds for instance to an occurrence at 17 for someone with intervals of (0,15] and (17, 35]. A tdc at time 17 will affect this interval but an event at 17 would be ignored. An event occurrence at 15 would count in the (0,15] interval. The last case is where the main data set has touching intervals for a subject, e.g. (17, 28] and (28,35], and a new occurrence lands at the join. Events will go to the earlier interval and counts to the latter one. A last column shows the number of additions where the id and time point were identical. When this occurs, the tdc and event operators will use the final value in the data (last edit wins), but ignoring missing, while cumtdc and cumevent operators add up the values.

These extra attributes are ephemeral and will be discarded if the data frame is modified. This is intentional, since they will become invalid if for instance a subset were selected.
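The tname and tcount attributes described above can be inspected with `attr()`. This is a small sketch on hypothetical toy data (`base` and `trt` are invented for the example), in which both treatment times fall strictly inside an existing interval.

```r
library(survival)

# Hypothetical baseline data and treatment start times
base <- data.frame(id = 1:3, futime = c(10, 20, 15), status = c(1, 0, 1))
trt  <- data.frame(id = c(1L, 2L), tx_time = c(4, 12))

d <- tmerge(base, base, id = id, death = event(futime, status))
d <- tmerge(d, trt, id = id, treated = tdc(tx_time))

# Counts of match types, one row per created variable
tc <- attr(d, "tcount")
print(tc)

# Names of the key variables, remembered between calls
print(attr(d, "tname"))
```

Checking tcount after each pass is a quick way to spot unexpected 'early', 'late' or 'missid' rows in data2.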



Terry Therneau


See Also
