NGS Science Case: identifying genes that correlate to disease
Life Sciences, Genomics
High-throughput experimental methods in molecular biology, such as next-generation sequencing (NGS), provide the quantitative basis for gaining a better understanding of human disease. However, multifactorial diseases such as diabetes and atherosclerosis are complex disorders involving hundreds of genes and many developmental and environmental factors. Computational methods are needed that can uncover the molecular networks perturbed by disease. The set of WS-PGRADE workflows developed during the ER-flow project provides a toolbox for analysing DNA sequencing data. The workflows execute on the Dutch grid using resources of the VLEMED VO, and can be launched from the generic interface of the SHIWA Simulation Platform or from the AMC generic WS-PGRADE portal.
NGS is often regarded as a holy grail of clinical research because it promises to reveal otherwise hidden mechanisms of disease. Massive amounts of NGS data are being collected around the globe, and these workflows help with the challenging task of extracting information from such big data.
The analysis of NGS data involves a large number of steps that have been implemented in individual workflows, such as alignment of sequences to a reference genome, sequence assembly, identification of insertions/deletions, and identification of SNPs (single nucleotide polymorphisms).
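To make the last two steps more concrete, the following is a minimal sketch of how SNPs and insertions/deletions can be read off once a sample sequence has been aligned to a reference. It is a toy illustration only, not the ER-flow implementation: the function name call_variants, the gapped-string input format and the example sequences are assumptions made here, and production workflows rely on dedicated aligners (such as BWA) and variant callers that also weigh read depth and base quality.

```python
from typing import List, Tuple

# (reference position, kind, reference allele, sample allele); hypothetical record layout
Variant = Tuple[int, str, str, str]


def call_variants(ref_aln: str, sample_aln: str) -> List[Variant]:
    """Compare two gapped, equal-length aligned sequences column by column.

    '-' marks a gap. Positions are 1-based reference coordinates; an
    insertion is reported at the reference base that precedes it.
    """
    if len(ref_aln) != len(sample_aln):
        raise ValueError("aligned sequences must have equal length")

    variants: List[Variant] = []
    ref_pos = 0  # number of reference bases consumed so far
    for r, s in zip(ref_aln, sample_aln):
        if r != "-":
            ref_pos += 1
        if r == s:
            continue  # identical column, nothing to report
        if r == "-":
            variants.append((ref_pos, "insertion", "-", s))
        elif s == "-":
            variants.append((ref_pos, "deletion", r, "-"))
        else:
            variants.append((ref_pos, "SNP", r, s))
    return variants


# Made-up example alignment (not real data)
ref = "ACGT-ACGTA"
sample = "ACCTGAC-TA"
for variant in call_variants(ref, sample):
    print(variant)
```

Each alignment column is treated independently, so a multi-base insertion or deletion would appear as several single-base records; merging adjacent gap columns is left out to keep the sketch short.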
Note: this is the most prestigious example of grid workflow usage for genomics in our organization so far. In this case, the sequence alignment workflow (BWA) was executed to validate the results against a larger population, at the request of the journal reviewers of the study described below. The authors chose to validate their findings against the Genome of the Netherlands dataset (750 individuals). The processing had to be completed within the rebuttal period, and using the grid allowed the authors to respond before the deadline. The workflow suite developed in ER-flow covers much more than sequence alignment.
Nicolaides-Baraitser syndrome (NBS) is characterized by sparse hair, distinctive facial morphology, distal-limb anomalies and intellectual disability. In a study conducted at the AMC, researchers sequenced the exomes of ten individuals with NBS and identified heterozygous variants in SMARCA2 in eight of them. The exome is the part of the genome formed by exons, the sequences that remain within the mature RNA after introns are removed by RNA splicing. Because exons contain the sequences that are translated into proteins, variants in them are considered to relate more directly to disease manifestation.
Extended molecular screening identified non-synonymous SMARCA2 mutations in 36 of 44 individuals with NBS. These mutations were confirmed to be de novo when parental samples were available. SMARCA2 encodes the core catalytic unit of the SWI/SNF ATP-dependent chromatin re-modelling complex that is involved in the regulation of gene transcription. The identification of SMARCA2 mutations in humans provides insight into the function of the Snf2 helicase family.
- Van Houdt JK, Nowakowska BA, Sousa SB, van Schaik BD, Seuntjens E, Avonce N, Sifrim A, Abdul-Rahman OA, van den Boogaard MJ, Bottani A, Castori M, Cormier-Daire V, Deardorff MA, Filges I, Fryer A, Fryns JP, Gana S, Garavelli L, Gillessen-Kaesbach G, Hall BD, Horn D, Huylebroeck D, Klapecki J, Krajewska-Walasek M, Kuechler A, Lines MA, Maas S, Macdermot KD, McKee S, Magee A, de Man SA, Moreau Y, Morice-Picard F, Obersztyn E, Pilch J, Rosser E, Shannon N, Stolte-Dijkstra I, Van Dijck P, Vilain C, Vogels A, Wakeling E, Wieczorek D, Wilson L, Zuffardi O, van Kampen AH, Devriendt K, Hennekam R, Vermeesch JR. (2012) Heterozygous missense mutations in SMARCA2 cause Nicolaides-Baraitser syndrome. Nature Genetics, 44(4), 445-9