Construct necessary information for among-site variation in mutation rates that will
be used in create_variants.
site_var(reference, shape = NULL, region_size = NULL, invariant = 0, mats = NULL, out_prefix = NULL, compress = FALSE, comp_method = "bgzip")
reference | A |
---|---|
shape | Shape parameter for the Gamma distribution that generates gamma distances,
The variance of the distribution is |
region_size | Size of regions to break the genome into,
where all sites within a region have the same gamma distance.
Defaults to |
invariant | Proportion of regions that are invariant.
Must be in range |
mats | List of matrices, one for each sequence in the genome.
Each matrix should have two columns.
The first should contain the end points for each region.
The second should contain the gamma distances for each region.
Note that if gamma distances don't have a mean (weighted by
sequence length for each gamma-distance value) equal to 1,
you're essentially changing the overall mutation rate.
If this argument is provided, |
out_prefix | String specifying the file name prefix for an output BED file that
will be generated by this function and that will specify the
gamma distances for each region.
If |
compress | Logical specifying whether to compress the output file, or
an integer specifying the level of compression, from 1 to 9.
If |
comp_method | Character specifying which type of compression to use if any
is desired. Options include |
A site_var_mats object, which is a wrapper around a list of matrices,
one for each sequence in the reference genome.
Although the print method is different, you can otherwise treat these objects
the same as you would a list (e.g., x[[1]], x[1:2], length(x)).
A site's deviation from the average mutation rate is determined by its "gamma distance". A site's overall mutation rate is the mutation rate for that nucleotide (substitution + indel) multiplied by the site's gamma distance. There are two options for specifying gamma distances:
1. Generate gamma distances from a Gamma distribution.
   This method will be used if the shape and region_size arguments are provided.
   If the mats argument is also provided, this method will NOT be used.
   See argument descriptions for more info.
2. Manually input matrices that specify the gamma distance and end points
   for regions each gamma distance refers to.
   This method will be used if the mats argument is provided.
   See argument descriptions for more info.
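To make the first option more concrete, here is a minimal base-R sketch of how one gamma distance per region could be drawn. It is not the package's implementation. It assumes the Gamma distribution is parameterized with rate equal to shape (so its mean is 1) and that invariant regions receive a gamma distance of zero. The values of `seq_len_bp` and `invariant` are hypothetical and chosen only for illustration.

```r
# Sketch only: not the package's implementation.
set.seed(1)

seq_len_bp  <- 100   # hypothetical sequence length
region_size <- 5
shape       <- 0.5
invariant   <- 0.2   # hypothetical proportion of invariant regions

# End point of each region; the last region always ends at the sequence end.
ends <- unique(c(seq(region_size, seq_len_bp, by = region_size), seq_len_bp))
n_regions <- length(ends)

# Assumption: rate = shape, so the Gamma distribution has mean 1.
gammas <- rgamma(n_regions, shape = shape, rate = shape)

# Assumption: invariant regions get a gamma distance of zero.
n_inv <- round(invariant * n_regions)
gammas[sample.int(n_regions, n_inv)] <- 0

# Two-column layout described for the mats argument:
# column 1 = region end points, column 2 = gamma distances.
mat <- cbind(ends, gammas)
head(mat)
```

In this sketch, zeroing out regions pulls the mean gamma distance below 1, so the remaining values would need rescaling if the overall mutation rate is meant to stay unchanged.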
ref <- create_genome(3, 100)

# generating from Gamma distribution
gamma_mats <- site_var(ref, shape = 0.5, region_size = 5)

# with custom matrices
gamma_mats <- site_var(ref,
                       mats = replicate(3,
                                        cbind(seq(10, 100, 10), rgamma(10, 0.9)),
                                        simplify = FALSE))
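The custom matrices in the example draw gamma distances with `rgamma(10, 0.9)`, so their weighted mean will generally not equal 1, which, as the description of mats notes, effectively changes the overall mutation rate. The base-R sketch below shows one way to rescale such matrices before passing them in. `rescale_gammas` is a hypothetical helper, not part of the package. It assumes that region end points are 1-based positions, that the first region starts at position 1, and that rescaling each sequence separately is acceptable.

```r
set.seed(1)

# One matrix per sequence: column 1 = region end points, column 2 = gamma distances.
mats <- replicate(3, cbind(seq(10, 100, 10), rgamma(10, 0.9)), simplify = FALSE)

# Hypothetical helper: rescale gamma distances so that their mean,
# weighted by region length, equals 1 within each sequence.
rescale_gammas <- function(m) {
  region_len <- diff(c(0, m[, 1]))   # region sizes derived from end points
  w_mean <- weighted.mean(m[, 2], region_len)
  m[, 2] <- m[, 2] / w_mean
  m
}

mats <- lapply(mats, rescale_gammas)

# Weighted mean per sequence after rescaling (1, up to floating-point error).
w_means <- sapply(mats, function(m) weighted.mean(m[, 2], diff(c(0, m[, 1]))))
print(w_means)
```

The rescaled list can then be supplied through the mats argument in place of the raw matrices.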