From either a reference genome or a set of variant haplotypes, create PacBio reads and write them to FASTQ output file(s).
I encourage you to cite the reference below in addition to jackalope if you use this function.
pacbio(obj, out_prefix, n_reads,
       chi2_params_s = c(0.01214, 5.12, 675, 48303.0732881, 1.4691051212330266),
       chi2_params_n = c(0.00189237136, 2.53944970, 5500),
       max_passes = 40, sqrt_params = c(0.5, 0.2247), norm_params = c(0, 0.2),
       prob_thresh = 0.2, ins_prob = 0.11, del_prob = 0.04, sub_prob = 0.01,
       min_read_length = 50,
       lognorm_read_length = c(0.200110276521, 10075.4363813, 17922.611306),
       custom_read_lengths = NULL, prob_dup = 0.0, haplotype_probs = NULL,
       sep_files = FALSE, compress = FALSE, comp_method = "bgzip",
       n_threads = 1L, read_pool_size = 100L, show_progress = FALSE,
       overwrite = FALSE)
obj  Sequencing object: either a reference genome or a set of variant haplotypes.
out_prefix  Prefix for the output file(s), including entire path except for the file extension. 
n_reads  Number of reads you want to create. 
chi2_params_s  Vector containing the 5 parameters for the curve determining
the scale parameter for the chi^2 distribution.
Defaults to c(0.01214, 5.12, 675, 48303.0732881, 1.4691051212330266).
chi2_params_n  Vector containing the 3 parameters for the function
determining the n parameter for the chi^2 distribution.
Defaults to c(0.00189237136, 2.53944970, 5500).
max_passes  Maximum number of passes for one molecule.
Defaults to 40.
sqrt_params  Vector containing the 2 parameters for the square root
function for the quality increase.
Defaults to c(0.5, 0.2247).
norm_params  Vector containing the 2 parameters for the normally distributed
noise added to the quality-increase square root function.
Defaults to c(0, 0.2).
prob_thresh  Upper bound for the modified total error probability.
Defaults to 0.2.
ins_prob  Probability of insertions for reads with one pass.
Defaults to 0.11.
del_prob  Probability of deletions for reads with one pass.
Defaults to 0.04.
sub_prob  Probability of substitutions for reads with one pass.
Defaults to 0.01.
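To make the single-pass error rates concrete: with the defaults, the total single-pass error probability is 0.11 + 0.04 + 0.01 = 0.16. The base-R sketch below walks along a sequence and, at each base, independently applies an insertion, deletion, or substitution with the given probabilities. It is an illustration only, assuming independent per-base errors and uniformly random inserted or substituted nucleotides; it is not jackalope's or SimLoRD's actual error model, and `add_single_pass_errors` is a hypothetical helper.

```r
# Hypothetical helper: simplified single-pass error sketch (NOT jackalope's model).
# Assumes each base independently suffers at most one error event.
add_single_pass_errors <- function(seq, ins_prob = 0.11, del_prob = 0.04,
                                   sub_prob = 0.01) {
  bases <- c("A", "C", "G", "T")
  stopifnot(ins_prob + del_prob + sub_prob <= 1)
  input <- strsplit(seq, "")[[1]]
  out <- character(0)
  for (b in input) {
    u <- runif(1)
    if (u < ins_prob) {
      # insertion: keep the base and add a random extra nucleotide after it
      out <- c(out, b, sample(bases, 1))
    } else if (u < ins_prob + del_prob) {
      # deletion: drop the base
      next
    } else if (u < ins_prob + del_prob + sub_prob) {
      # substitution: replace the base with a different nucleotide
      out <- c(out, sample(setdiff(bases, b), 1))
    } else {
      # no error: copy the base unchanged
      out <- c(out, b)
    }
  }
  paste(out, collapse = "")
}

set.seed(1)
read <- add_single_pass_errors(strrep("ACGT", 25))
nchar(read)
```

Because insertions lengthen and deletions shorten the read, `nchar(read)` will usually differ from the 100 input bases.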
min_read_length  Minimum read length for the lognormal distribution.
Defaults to 50.
lognorm_read_length  Vector containing the 3 parameters for the lognormal
read length distribution.
Defaults to c(0.200110276521, 10075.4363813, 17922.611306).
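How jackalope maps these three numbers onto the distribution is not spelled out here. One common shape/location/scale parameterization (the one used by SciPy's `scipy.stats.lognorm`) is `loc + scale * exp(shape * Z)` with `Z` standard normal. The sketch below assumes that form, takes the location as an explicit argument (its sign convention in jackalope is not stated here), and redraws any value shorter than the minimum read length. `sample_read_lengths` is a hypothetical helper, not part of jackalope.

```r
# Hypothetical helper: draw read lengths from a shifted lognormal,
#   loc + scale * exp(shape * Z), Z ~ N(0, 1)   (assumed parameterization),
# redrawing anything shorter than min_len.
sample_read_lengths <- function(n, shape, loc, scale, min_len = 50) {
  out <- numeric(0)
  while (length(out) < n) {
    # rlnorm(meanlog = log(scale), sdlog = shape) equals scale * exp(shape * Z)
    draws <- loc + rlnorm(n, meanlog = log(scale), sdlog = shape)
    out <- c(out, draws[draws >= min_len])  # keep only long-enough draws
  }
  round(out[seq_len(n)])  # integer lengths, exactly n of them
}

set.seed(7)
sample_read_lengths(5, shape = 0.2, loc = 0, scale = 1000)
```

With `loc = 0` and `scale = 1000`, the lengths follow a plain lognormal with median 1000 (before truncation at `min_len`).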
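custom_read_lengths  Sample read lengths from a vector or column in a
matrix; if a matrix, the second column specifies the sampling weights.
If NULL, read lengths are instead drawn from the lognormal distribution
described by lognorm_read_length and min_read_length.
Defaults to NULL.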
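As a sketch of the matrix form, the first column below holds candidate read lengths and the second their relative sampling weights (the specific lengths and weights are made up for illustration). The `sample()` call only shows what weighted sampling from such a matrix means; it is not a claim about how jackalope samples internally.

```r
# Candidate read lengths (column 1) and relative sampling weights (column 2).
len_mat <- cbind(c(1000, 5000, 10000, 20000), c(1, 4, 4, 1))

# Illustration only: weighted sampling of lengths, using column 2 as weights.
# sample() normalizes prob, so the weights need not sum to 1.
set.seed(2)
lens <- sample(len_mat[, 1], size = 1000, replace = TRUE, prob = len_mat[, 2])
table(lens) / length(lens)
```

A matrix like `len_mat` could then be passed as `custom_read_lengths = len_mat` in a `pacbio()` call, in which case the lognormal read-length parameters are not used.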
prob_dup  A single number indicating the probability of duplicates.
Defaults to 0.0.
haplotype_probs  Relative probability of sampling each haplotype.
This is ignored if sequencing a reference genome.
Defaults to NULL.
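Because the values are relative, they need not sum to 1. For three haplotypes, c(1, 1, 2) would mean the third haplotype is sampled about twice as often as each of the others, i.e. with probability 0.5 once normalized. A small base-R sketch of that arithmetic, assuming "relative" simply means dividing by the sum:

```r
# Relative weights for three haplotypes
haplotype_probs <- c(1, 1, 2)

# Normalize to probabilities (assumed interpretation of "relative")
normalized <- haplotype_probs / sum(haplotype_probs)
normalized
# [1] 0.25 0.25 0.50

# Simulated draw of which haplotype each of 8 reads comes from
set.seed(3)
sample(seq_along(haplotype_probs), size = 8, replace = TRUE, prob = haplotype_probs)
```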
sep_files  Logical indicating whether to make separate files for each haplotype.
This argument is coerced to 
Defaults to FALSE.
compress  Logical specifying whether or not to compress the output file(s), or
an integer specifying the level of compression, from 1 to 9.
If 
Defaults to FALSE.
comp_method  Character specifying which type of compression to use if any
is desired. Options include "bgzip" (the default).
n_threads  The number of threads to use in processing.
If 
Defaults to 1L.
read_pool_size  The number of reads to store before writing to disk.
Increasing this number should improve speed but use more memory.
Defaults to 100L.
show_progress  Logical for whether to show a progress bar.
Defaults to FALSE.
overwrite  Logical for whether to overwrite existing FASTQ file(s) of the same name, if they exist.
Defaults to FALSE.
Nothing is returned.
The ID lines for FASTQ files are formatted as follows:
@<genome name><chromosome name><starting position><strand>
where genome name is always REF for reference genomes (as opposed to haplotypes).
Stöcker, B. K., J. Köster, and S. Rahmann. 2016. SimLoRD: simulation of long read data. Bioinformatics 32:2704–2706.
# \donttest{
library(jackalope)
rg <- create_genome(10, 100e3, 100)
dir <- tempdir(TRUE)
pacbio(rg, paste0(dir, "/pacbio_reads"), n_reads = 100)
# }