Plant Health Bioinformatics Network (PHBN)
Authors
Haegeman, Annelies; Rott, Michael; Nicolaisen, Morgen; Remenant, Benoît; Candresse, Thierry; Sela, Noa; Kutnjak, Denis; De Cal, Antonieta; Margaria, Paolo; Miozzi, Laura; Chiumenti, Michela; Okinyi Sewe, Steven; van Duivenbode, Inge; van Gurp, Thomas; Schumpp, Olivie
Description
Plant disease detection by high-throughput sequencing (HTS) is a relatively new and fast-developing discipline, and levels of expertise vary widely among phytopathology and diagnostics laboratories across the world. The overall goal of the “Plant Health Bioinformatics Network” (PHBN) project was to bring together laboratories that apply HTS to plant disease diagnostics and to stimulate the exchange of information on HTS data analysis and on the interpretation of HTS results in a plant diagnostic context.
In a collaborative effort of more than 20 scientists from 11 countries, open-source training materials were developed. This resulted in the guide ‘A primer on the analysis of high-throughput sequencing data for detection of plant viruses’, which is useful for both beginners and experts. The guide includes a glossary of terms, a flowchart showing the typical workflow of an analysis, a checklist of things to keep in mind during data processing, a checklist of points to consider during taxonomic classification, and a quick-start guide. Data analysis pipelines were converted to training materials and compiled with other, already well-documented pipelines to make them publicly available.
Training alone is not sufficient to develop good bioinformatics skills. People may follow a tutorial meticulously, but if the steps or parameters used are not suited to the specific case they are investigating, they may misinterpret the results. To make virologists and bioinformaticians more aware of the strengths and weaknesses of their pipelines, (semi-)artificial datasets were designed and tested. Nine data analysis challenges that can arise when analyzing HTS datasets for the detection and identification of plant viruses were identified. Based on these challenges, several plant-derived Illumina RNA-seq datasets were selected from different international partners. Three of them already exhibited one of the challenges and were not modified. For 7 other datasets, artificial reads were added as a spike-in, with known read numbers, to mimic one of the challenges. Finally, 8 completely artificial datasets were made for haplotype reconstruction. The (semi-)artificial datasets were made publicly available and were recommended by “Peer Community in Genomics”. A VIROMOCK challenge was then launched to encourage scientists to analyze the data and upload their results. Although only 29 reports were received (i.e. on average 3 per dataset), we observed that most differences between participants were due to mapping settings and the choice of reference genome(s).
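The spike-in idea described above can be illustrated with a minimal Python sketch. It is not the procedure used in the project: the function names (`simulate_reads`, `spike_in`, `to_fastq`), the toy sequences, and the error-free reads with a constant placeholder quality string are all assumptions made for illustration. Dedicated read simulators additionally model sequencing errors and realistic quality profiles.

```python
import random


def simulate_reads(reference, n_reads, read_len, seed=0):
    """Draw n_reads error-free substrings of length read_len from reference.

    Hypothetical helper: returns FASTQ-like 4-tuples
    (header, sequence, separator, quality).
    """
    if read_len > len(reference):
        raise ValueError("read_len exceeds reference length")
    rng = random.Random(seed)
    reads = []
    for i in range(n_reads):
        start = rng.randint(0, len(reference) - read_len)
        seq = reference[start:start + read_len]
        # Constant placeholder quality; real data would have varying scores.
        reads.append((f"@spike_{i}_pos{start}", seq, "+", "I" * read_len))
    return reads


def spike_in(background, spike, seed=0):
    """Mix spike-in records into the background records in random order."""
    mixed = list(background) + list(spike)
    random.Random(seed).shuffle(mixed)
    return mixed


def to_fastq(records):
    """Serialize 4-tuples as FASTQ text (four lines per record)."""
    return "".join("\n".join(record) + "\n" for record in records)


# Toy data (assumed, not taken from the PHBN datasets).
background = [
    ("@plant_0", "ACGTACGTAC", "+", "IIIIIIIIII"),
    ("@plant_1", "TTGACCATGA", "+", "IIIIIIIIII"),
]
virus_ref = "ATGGCTAGCTAGGATCCGATCGATTACG"

spike = simulate_reads(virus_ref, n_reads=5, read_len=10)
mixed = spike_in(background, spike)
fastq_text = to_fastq(mixed)
```

Because the number of spiked reads is fixed in advance, the output of a detection pipeline on the mixed dataset can be compared against a known ground truth, which is what makes such datasets useful for benchmarking.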
Finally, we wanted to demonstrate the potential of HTS for detecting non-viral plant pathogens and pests by re-analyzing existing RNA-seq datasets in an RNA-seq screening effort. More specifically, we asked the plant virology community to re-analyze some of their existing datasets to check whether traces of non-viral pathogens could be found. Such traces are often overlooked because most virologists compare the reads or contigs only with plant virus sequence databases. In total, 15 scientists participated in the screening, together analyzing 101 datasets, of which 37 were selected for detailed analysis at ILVO (BE). Of these 37 datasets, 29 revealed the potential presence of non-viral plant pathogens, with fungi, insects and mites being the most frequently observed organism categories. These results show that RNA-seq data generated by virologists can also be used to investigate the possible presence of other harmful organisms or virus vectors.