Construct the information on among-site variation in mutation rates that will be used in create_variants.

site_var(reference, shape = NULL, region_size = NULL, invariant = 0,
  mats = NULL, out_prefix = NULL, compress = FALSE,
  comp_method = "bgzip")

Arguments

reference

A ref_genome object from which you will eventually generate variants.

shape

Shape parameter for the Gamma distribution that generates gamma distances. The variance of the distribution is 1 / shape, and its mean is fixed to 1. Values <= 0 are not allowed. Defaults to NULL.
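
As a rough illustration of this parameterization, a Gamma distribution with mean 1 and variance 1 / shape can be drawn in base R by setting the rate equal to the shape. This is a sketch, not the package's internal sampler, and the variable names are hypothetical.

```r
# Sketch only: base R, not the package's internal code.
set.seed(1)
shape <- 0.5   # same value as in the Examples section

# Gamma(shape, rate = shape) has mean shape / rate = 1
# and variance shape / rate^2 = 1 / shape.
gamma_dists <- rgamma(1e5, shape = shape, rate = shape)

mean(gamma_dists)   # close to 1
var(gamma_dists)    # close to 1 / shape
```

Smaller shape values give a wider spread of gamma distances around the same mean of 1.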

region_size

Size of regions to break the genome into, where all sites within a region have the same gamma distance. Defaults to NULL.

invariant

Proportion of regions that are invariant. Must be in range [0,1). Defaults to 0.

mats

List of matrices, one for each sequence in the genome. Each matrix should have two columns. The first should contain the end points for each region. The second should contain the gamma distances for each region. Note that if gamma distances don't have a mean (weighted by sequence length for each gamma-distance value) equal to 1, you're essentially changing the overall mutation rate. If this argument is provided, shape and region_size are ignored. Defaults to NULL.
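
The note about the weighted mean can be checked before passing matrices in. The sketch below uses base R only and hypothetical names (ends, gd, one_mat); it assumes the first region starts at position 1 and that each region runs up to its end point.

```r
# Sketch only: base R; all object names here are hypothetical.
set.seed(2)
ends <- seq(10, 100, 10)        # region end points for a 100-bp sequence
gd <- rgamma(10, shape = 0.9)   # raw gamma distances (mean is not 1 here)
one_mat <- cbind(ends, gd)      # column 1: end points, column 2: gamma distances

# Region lengths, assuming the first region starts at position 1
region_len <- diff(c(0, one_mat[, 1]))

# Mean gamma distance, weighted by region length
w_mean <- sum(region_len * one_mat[, 2]) / sum(region_len)

# Rescale so the weighted mean is 1 and the overall mutation rate is unchanged
one_mat[, 2] <- one_mat[, 2] / w_mean
```

A list holding one such matrix per sequence has the shape that this argument describes.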

out_prefix

String specifying the file name prefix for an output BED file that will be generated by this function and that will specify the gamma distances for each region. If NULL, no output file is produced. Defaults to NULL.

compress

Logical specifying whether to compress the output file, or an integer from 1 to 9 specifying the level of compression. If TRUE, a compression level of 6 is used. Defaults to FALSE.

comp_method

Character specifying which type of compression to use if any is desired. Options include "gzip" and "bgzip". This is ignored if compress is FALSE. Defaults to "bgzip".

Value

A site_var_mats object, which is a wrapper around a list of matrices, one for each sequence in the reference genome. Although the print method is different, you can otherwise treat these objects the same as you would a list (e.g., x[[1]], x[1:2], length(x)).

Details

A site's deviation from the average mutation rate is determined by its "gamma distance". A site's overall mutation rate is the mutation rate for that nucleotide (substitution + indel) multiplied by the site's gamma distance. There are two options for specifying gamma distances:

  1. Generate gamma distances from a Gamma distribution. This method will be used if the shape and region_size arguments are provided. If the mats argument is also provided, this method will NOT be used. See argument descriptions for more info.

  2. Manually input matrices that specify the end point of each region and the gamma distance that applies to it. This method will be used if the mats argument is provided. See argument descriptions for more info.
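
To make the first option and the rate calculation concrete, here is a minimal base R sketch of splitting one sequence into regions, drawing a gamma distance per region, and scaling a per-site rate. It is not the package's implementation; seq_len_bp and base_rate are hypothetical values, and the handling of invariant regions is left out.

```r
# Sketch only: base R, not the package's implementation.
set.seed(3)
seq_len_bp <- 100    # hypothetical sequence length
region_size <- 5
shape <- 0.5

# End points of consecutive regions; the last region ends at the sequence end
ends <- unique(c(seq(region_size, seq_len_bp, by = region_size), seq_len_bp))

# One gamma distance per region (mean 1, variance 1 / shape)
gd <- rgamma(length(ends), shape = shape, rate = shape)
region_mat <- cbind(ends, gd)

# All sites within a region share that region's gamma distance, and a
# site's overall rate is its base rate multiplied by its gamma distance.
base_rate <- 1e-8    # hypothetical per-site mutation rate
site_rates <- base_rate * rep(gd, times = diff(c(0, ends)))
```

Here region_mat has the same two-column layout that the mats argument expects for a single sequence.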

Examples

ref <- create_genome(3, 100)

# generating from Gamma distribution
gamma_mats <- site_var(ref, shape = 0.5, region_size = 5)

# with custom matrices
gamma_mats <- site_var(ref,
                       mats = replicate(3,
                                        cbind(seq(10, 100, 10),
                                              rgamma(10, 0.9)),
                                        simplify = FALSE))