Bonito is an experimental basecaller for Oxford Nanopore reads. I wanted to assess whether, and by how much, this basecaller improves read accuracy on data from one of the crops we work with, namely potato. Our initial focus was on improving basecalling accuracy using data from a nearly homozygous diploid potato cultivar named Solyntus, whose genome sequence we developed and published. Besides improving read accuracy, which could help to further improve that genome assembly, our main ambition is to improve basecalling accuracy in tetraploid potato, which would allow us to perform phase-specific genome assemblies.
ONT provides a set of experimental models via its rerio repository. Rerio is comprised of "research release" base calling models and configuration files.
ONT Bonito allows training of (crop-specific) basecalling models.
We tested the current state-of-the-art basecaller (guppy v4.5.2) with the High Accuracy (HAC) model and three rerio models, and compared these to several self-trained potato models using bonito. Data was obtained from R9.4.1 pores.
Conclusion
Self-trained models are more accurate, but the more accurate basecalling comes at the cost of longer processing time.
Accuracy
A subset of 40K ONT reads was taken, basecalled with either guppy or bonito, and mapped against the Solyntus reference genome. Per-read mapping accuracy was measured and plotted.
*Note: the crf model required a smaller chunks parameter during basecalling with guppy; otherwise the NVIDIA card (RTX2080, 8 GB) ran out of memory.
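Per-read accuracy can be derived from alignments in several ways. The sketch below shows one common definition, assuming SAM output that carries the NM (edit distance) tag: accuracy = 1 - NM / aligned columns, where aligned columns are the M/=/X, I and D CIGAR operations. The function names are illustrative, and this is not necessarily the exact calculation behind the plotted values.

```python
import re
from typing import Optional

# One CIGAR operation: a length followed by an operation code.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def read_accuracy(cigar: str, nm: int) -> float:
    """Return 1 - NM / aligned columns for a single alignment.

    Aligned columns are matches/mismatches (M, =, X), insertions (I)
    and deletions (D). Clipping (S, H), skips (N) and padding (P)
    are ignored.
    """
    aligned_columns = sum(
        int(length) for length, op in CIGAR_OP.findall(cigar) if op in "MIDX="
    )
    if aligned_columns == 0:
        raise ValueError("CIGAR has no aligned columns")
    return 1.0 - nm / aligned_columns


def accuracy_from_sam_line(line: str) -> Optional[float]:
    """Accuracy for one SAM record, or None if it should be skipped.

    Skips unmapped (0x4), secondary (0x100) and supplementary (0x800)
    records, and records without an NM tag.
    """
    fields = line.rstrip("\n").split("\t")
    flag = int(fields[1])
    cigar = fields[5]
    if flag & (0x4 | 0x100 | 0x800) or cigar == "*":
        return None
    for tag in fields[11:]:
        if tag.startswith("NM:i:"):
            return read_accuracy(cigar, int(tag[5:]))
    return None
```

With this definition, a read aligned as `5S90M5I5D` with `NM:i:10` spans 100 aligned columns and scores an accuracy of 0.9. Soft-clipped bases are not penalised, so heavily clipped reads can look more accurate than they are.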
Speed
Guppy models are clearly faster than the self-trained bonito models, although the guppy-compatible crf model also increases basecalling time considerably (40 min versus 8 min for the other guppy models).
| Method / model | Time |
| --- | --- |
| Guppy v4.5.2 HAC | 8 min |
| Guppy v4.5.2 min_modbases_5dmc | 8 min |
| Guppy v4.5.2 min_modbases 5mc_5hmC | 8 min |
| Guppy v4.5.2 crf 0.3.2 | 40 min |
| Bonito_05 | 140 min |
| Bonito_15 | 140 min |
| Bonito_05adjust | 74 min |
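
To put the timings in perspective, the short sketch below expresses each run time relative to the Guppy HAC baseline. The numbers are copied from the table; the variable and function names are only illustrative.

```python
# Basecalling wall-clock times in minutes, as listed in the table.
TIMES_MIN = {
    "Guppy v4.5.2 HAC": 8,
    "Guppy v4.5.2 crf 0.3.2": 40,
    "Bonito_05": 140,
    "Bonito_05adjust": 74,
    "Bonito_15": 140,
}


def slowdown(times, baseline="Guppy v4.5.2 HAC"):
    """Return each run time divided by the baseline run time."""
    base = times[baseline]
    return {name: minutes / base for name, minutes in times.items()}


if __name__ == "__main__":
    for name, factor in slowdown(TIMES_MIN).items():
        print(f"{name}: {factor:.2f}x")
```

On these figures the crf model is 5x slower than HAC, Bonito_05adjust about 9x, and Bonito_05 and Bonito_15 17.5x.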