Bonito is an experimental basecaller for Oxford Nanopore reads. I wanted to assess whether, and by how much, this basecaller improves read accuracy on data from one of the crops we work with, namely potato. Our initial focus was on improving basecalling accuracy using data from a nearly homozygous diploid potato cultivar named Solyntus, for which we developed and published the genome sequence. Besides improving read accuracy, which could help to further improve that genome assembly, our main ambition is to improve basecalling accuracy in tetraploid potato, which would allow us to perform phase-specific genome assemblies.
ONT provides a set of experimental models via its rerio repository. Rerio contains "research release" basecalling models and configuration files.
ONT Bonito allows training of (crop-specific) basecalling models.
We tested the current state-of-the-art basecaller (guppy v4.5.2) with the High Accuracy (HAC) model and three rerio models, and compared these to several self-trained potato models using bonito. Data were obtained from R9.4.1 pores.
Self-trained models are more accurate, but more accurate basecalling comes at the cost of longer processing time.
A subset of 40K ONT reads was taken, basecalled with either guppy or bonito, and mapped against the Solyntus reference genome. Per-read mapping accuracy was measured and plotted.
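The per-read accuracy measurement can be illustrated with a minimal Python sketch. It is not the script used for the plots here. It assumes that the mapper writes SAM records carrying the `NM` (edit distance) tag, as minimap2 does, and it assumes accuracy is defined as alignment identity: (aligned columns - edit distance) / aligned columns. The function names `read_accuracy` and `accuracies_from_sam` are hypothetical.

```python
import re

# One CIGAR operation: a length followed by an operation letter.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def read_accuracy(cigar: str, nm: int) -> float:
    """Alignment identity, assumed here to be (columns - NM) / columns.

    Columns are counted over M, =, X, I and D operations; clipping (S, H),
    padding (P) and skipped regions (N) are ignored.
    """
    columns = sum(int(length) for length, op in CIGAR_OP.findall(cigar)
                  if op in "M=XID")
    if columns == 0:
        raise ValueError("CIGAR has no aligned columns")
    return (columns - nm) / columns


def accuracies_from_sam(lines):
    """Return {read name: accuracy} for primary alignments in SAM lines."""
    result = {}
    for line in lines:
        if line.startswith("@"):  # header line
            continue
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])
        # Skip unmapped (0x4), secondary (0x100) and supplementary (0x800).
        if flag & (0x4 | 0x100 | 0x800):
            continue
        nm = next((int(tag[5:]) for tag in fields[11:]
                   if tag.startswith("NM:i:")), None)
        if nm is None:  # no edit distance available for this record
            continue
        result[fields[0]] = read_accuracy(fields[5], nm)
    return result
```

Passing an open SAM file to `accuracies_from_sam` yields one accuracy value per mapped read, which can then be plotted per model. Other identity definitions (for example, excluding gaps) give different numbers, so the definition should be kept the same across all models being compared.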
*Note: the crf model required a smaller chunks parameter during basecalling with guppy; otherwise the NVIDIA card (RTX 2080, 8 GB) ran out of memory.
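As a rough sketch of how such a run could be set up, the Python helper below assembles a guppy_basecaller command line with a reduced chunk setting. The note above does not name the exact option, so this assumes it refers to guppy's `--chunks_per_runner`. The helper name `guppy_command`, the paths, the config file name and the value 256 are all hypothetical placeholders, not the settings used here.

```python
import shlex


def guppy_command(input_dir, save_dir, config, device="cuda:0",
                  chunks_per_runner=None):
    """Build a guppy_basecaller argument list (hypothetical helper).

    chunks_per_runner is only added when given; lowering it is assumed
    to reduce GPU memory use at the cost of throughput.
    """
    args = [
        "guppy_basecaller",
        "--input_path", input_dir,
        "--save_path", save_dir,
        "--config", config,
        "--device", device,
    ]
    if chunks_per_runner is not None:
        args += ["--chunks_per_runner", str(chunks_per_runner)]
    return args


# Placeholder paths and config name; 256 is an illustrative value only.
cmd = guppy_command("fast5_subset", "basecalls_crf",
                    "res_dna_r941_min_crf_v032.cfg", chunks_per_runner=256)
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` on a machine where guppy is installed. A suitable value has to be found by trial on the GPU at hand.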
Guppy models clearly outperform the self-trained bonito models, though using the guppy-compatible crf model increases basecalling time as well.
Method / model
Guppy v4.5.2 HAC
Guppy v4.5.2 min_modbases_5mC
Guppy v4.5.2 min_modbases_5mC_5hmC
Guppy v4.5.2 crf 0.3.2