Note: This class wraps a pointer to a C++ object, so do NOT change fields in this class directly. Doing so can crash your R session. (Ever seen the bomb popup in RStudio? Manually mess with these fields and you likely will.) For safe ways to manipulate the reference genome, see the "Methods" section.



An R6Class generator object


An object of class ref_genome.



An externalptr to a C++ object storing the sequences representing the genome.


Viewing information:


View the number of sequences.


View a vector of sequence sizes.


View a vector of sequence names.


View the string of the sequence at index seq_ind.

gc_prop(seq_ind, start, end)

View the GC proportion of the range from start to end within the reference sequence at index seq_ind.
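To make the calculation concrete, here is a minimal Python sketch of a GC proportion over a range. It is not the package's C++ implementation. It assumes 1-based, inclusive start and end coordinates (the usual R convention) and counts every base in the range, including any Ns, in the denominator; the package may handle either point differently.

```python
def gc_prop(seq: str, start: int, end: int) -> float:
    """GC proportion of seq over [start, end], assuming 1-based inclusive coordinates."""
    if not 1 <= start <= end <= len(seq):
        raise ValueError("need 1 <= start <= end <= len(seq)")
    window = seq[start - 1:end].upper()  # convert to a 0-based Python slice
    gc = sum(1 for base in window if base in "GC")
    return gc / len(window)  # Ns (if any) count toward the denominator here


print(gc_prop("AAGCTT", 2, 5))  # window "AGCT" -> 0.5
```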

nt_prop(nt, seq_ind, start, end)

View the proportion of bases in the range from start to end of the reference sequence at index seq_ind that are nucleotide nt.
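The same idea generalizes to a single nucleotide. This Python sketch (again assuming 1-based, inclusive coordinates and case-insensitive matching, neither of which is confirmed by the text above) also shows that, under those assumptions, the GC proportion is the sum of the G and C proportions over the same range.

```python
def nt_prop(nt: str, seq: str, start: int, end: int) -> float:
    """Proportion of bases in seq[start..end] equal to nt (1-based, inclusive: an assumption)."""
    if len(nt) != 1:
        raise ValueError("nt must be a single nucleotide character")
    if not 1 <= start <= end <= len(seq):
        raise ValueError("need 1 <= start <= end <= len(seq)")
    window = seq[start - 1:end].upper()
    return window.count(nt.upper()) / len(window)


seq = "AAGCTT"
# GC proportion recovered as the sum of two single-nucleotide proportions
print(nt_prop("G", seq, 2, 5) + nt_prop("C", seq, 2, 5))  # -> 0.5
```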

Editing information:


Set names for all sequences. new_names is a character vector of the new names, and it must have the same length as the number of sequences.


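Clean sequence names by replacing each occurrence of any character in " :;=%,\|/\"\'" (space, colon, semicolon, equals sign, percent sign, comma, backslash, pipe, slash, double quote, and single quote) with "_".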
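A minimal Python sketch of the name cleaning, using a translation table built from the character list above. It illustrates the replacement rule only; it is not the package's code.

```python
# Characters listed in the docs: space : ; = % , \ | / " '
BAD_CHARS = " :;=%,\\|/\"'"
_TABLE = str.maketrans({c: "_" for c in BAD_CHARS})


def clean_names(names: list) -> list:
    """Replace every listed character in each name with an underscore."""
    return [name.translate(_TABLE) for name in names]


print(clean_names(["chr1 len=100", "scaffold|7/a"]))  # -> ['chr1_len_100', 'scaffold_7_a']
```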

add_seqs(new_seqs, new_names = NULL)

Add one or more sequences directly. They can optionally be named (using new_names). Otherwise, their names are auto-generated.


Remove one or more sequences whose names are in the seq_names vector.


Merge all sequences into one after first shuffling their order.
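A minimal Python sketch of shuffle-then-merge. It assumes the sequences are simply concatenated with no spacer between them, and it takes an optional seeded random.Random for reproducibility; both are assumptions for illustration, not documented behavior.

```python
import random
from typing import List, Optional


def merge_seqs(seqs: List[str], rng: Optional[random.Random] = None) -> str:
    """Shuffle the order of the sequences, then concatenate them into one."""
    rng = rng or random.Random()
    order = list(seqs)  # copy so the caller's list is not reordered
    rng.shuffle(order)
    return "".join(order)


merged = merge_seqs(["AAA", "CC", "G"], random.Random(1))
print(len(merged))  # total length is preserved: 6
```

Because only the order changes, the merged sequence always has the same length and base composition as the inputs combined.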

filter_seqs(threshold, method)

Filter sequences by size (method = "size") or for a proportion of total bases (method = "prop"). For method = "prop", sequences are first sorted by size, and then the fewest largest sequences are retained such that at least threshold * sum(<all sequence sizes>) base pairs remain after filtering.
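The method = "prop" rule can be sketched in Python as a greedy pass over sizes sorted from largest to smallest, stopping as soon as the retained total reaches the target. The function name and the restriction of threshold to (0, 1] are assumptions for this illustration. It returns the indices of the kept sequences.

```python
def filter_by_prop(sizes: list, threshold: float) -> list:
    """Indices of the fewest largest sequences whose total size >= threshold * total."""
    if not 0 < threshold <= 1:
        raise ValueError("threshold must be in (0, 1]")
    target = threshold * sum(sizes)
    # Largest first; sorted() is stable, so ties keep their original order
    order = sorted(range(len(sizes)), key=lambda i: sizes[i], reverse=True)
    kept, running = [], 0
    for i in order:
        if running >= target:
            break  # enough base pairs retained already
        kept.append(i)
        running += sizes[i]
    return kept


print(filter_by_prop([100, 500, 50, 350], 0.8))  # -> [1, 3] (500 + 350 >= 800)
```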

replace_Ns(pi_tcag, n_threads = 1, show_progress = FALSE)

Replace Ns in the reference sequences with nucleotides sampled using the probabilities given in pi_tcag (in the order T, C, A, G). You can optionally use multiple threads (n_threads argument) and/or show a progress bar (show_progress).
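A single-threaded Python sketch of the sampling step for one sequence. It draws a replacement for every uppercase N from T, C, A, G weighted by pi_tcag; random.choices normalizes the weights, so they need not sum to 1 here. Only uppercase N is handled, and n_threads and show_progress are not modeled; the optional rng argument is a hypothetical addition for reproducibility.

```python
import random
from typing import Optional, Sequence


def replace_ns(seq: str, pi_tcag: Sequence[float],
               rng: Optional[random.Random] = None) -> str:
    """Replace each 'N' with a base drawn from T, C, A, G weighted by pi_tcag."""
    if len(pi_tcag) != 4 or any(p < 0 for p in pi_tcag) or sum(pi_tcag) <= 0:
        raise ValueError("pi_tcag must be 4 non-negative weights with a positive sum")
    rng = rng or random.Random()
    out = list(seq)
    n_positions = [i for i, base in enumerate(out) if base == "N"]
    # One weighted draw per N position; other positions are left untouched
    draws = rng.choices("TCAG", weights=pi_tcag, k=len(n_positions))
    for i, nt in zip(n_positions, draws):
        out[i] = nt
    return "".join(out)


print(replace_ns("NNN", [0, 0, 1, 0]))  # all weight on A -> AAA
```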

See also