Plenary speaker Nicola Hall, University of Oxford, who presented on revealing mRNA alternative splicing complexity in the human brain.

The Rise of Long-Read Transcriptome Sequencing

July 11, 2019

Review written by Wilfried Haerty, Earlham Institute, from London Calling 2019

London Calling is an annual conference hosted by Oxford Nanopore Technologies, dedicated to sharing the latest nanopore sequencing research from scientists around the world.

Oxford Nanopore offers scalable DNA/RNA sequencing devices from portable to benchtop. Our goal is to enable the analysis of any living thing, by any person, in any environment.

Sponsored content brought to you by Oxford Nanopore Technologies

Nearly 95% of the genes in humans undergo alternative splicing, the process through which exons are differentially included in the mature transcripts, leading to the production of many different protein isoforms from a single gene.

For the past decade, researchers have relied on existing annotations and short-read sequencing to reconstruct alternatively spliced transcripts and quantify their expression across individuals, development, tissues, and single cells. This approach has led to fundamental new biological insights into the prevalence and importance of splicing. The power of short-read sequencing stems from its large read output and high base-calling accuracy, which enable exon and splice junction quantification, but end-to-end transcript reconstruction remains a significant challenge.¹ Because reads are short and gene models are often complex, most existing transcript prediction algorithms have high false-positive rates.

The rise of long-read technologies now makes it possible to capture a full-length transcript within a single read and thereby accurately annotate and quantify transcripts. The need for such technologies is exemplified by a single fruit fly gene: Dscam. This gene is predicted to have nearly 36,000 isoforms due to the presence of blocks of mutually exclusive exons.² The application of long-read sequencing and long-range PCR allowed validation of nearly 18,000 of these transcripts, in an organism predicted to have almost 14,000 protein-coding genes.
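To see why blocks of mutually exclusive exons inflate isoform counts so quickly, the arithmetic can be sketched in a few lines of Python. Exactly one exon is chosen from each block, so the number of possible transcripts is the product of the block sizes. The block sizes below are the commonly cited values for Dscam; they are not given in this article and should be read as an assumption, and the resulting product lands close to, though not exactly at, the prediction quoted above.

```python
from math import prod

# Number of mutually exclusive alternatives in each exon block.
# ASSUMPTION: commonly cited Dscam block sizes, not taken from this article.
DSCAM_BLOCKS = {"exon 4": 12, "exon 6": 48, "exon 9": 33, "exon 17": 2}


def count_isoforms(block_sizes):
    """One exon is picked per block, so the independent choices multiply."""
    return prod(block_sizes)


total = count_isoforms(DSCAM_BLOCKS.values())
print(f"Possible isoforms from {len(DSCAM_BLOCKS)} blocks: {total}")
```

Under these assumed sizes, a single gene yields more combinations than the roughly 14,000 protein-coding genes in the fly genome, which is why reads that span every block in one molecule are needed to tell the combinations apart.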

In the past few years, similar studies of disease-relevant genes have revealed significantly underappreciated transcript diversity, even in organisms such as humans that are considered to have gold-standard annotations. For individual genes, hundreds of novel transcripts containing previously unannotated exons, novel splice junctions, and/or novel splice sites have been added, while only a fraction of previously annotated transcripts could be confirmed.³,⁴
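One simple way to picture how a full-length read is judged against an existing annotation is to compare its chain of introns (splice junctions) with those of annotated transcripts. The sketch below is illustrative only and is not the method used in the cited studies; the function names, the three category labels, and the toy coordinates are all hypothetical, and real tools also handle strand, alignment error, and truncated reads.

```python
def introns(exons):
    """Introns implied by a sorted list of (start, end) exon intervals."""
    return [(left_end, right_start)
            for (_, left_end), (right_start, _) in zip(exons, exons[1:])]


def classify(read_exons, annotated_transcripts):
    """Compare a read's intron chain with the annotation (toy categories)."""
    read_chain = introns(read_exons)
    known_chains = [introns(t) for t in annotated_transcripts]
    known_junctions = {j for chain in known_chains for j in chain}
    if read_chain in known_chains:
        return "annotated"  # transcript ends may differ, junctions do not
    if all(j in known_junctions for j in read_chain):
        return "novel combination of known junctions"
    return "novel junction"


# Hypothetical annotation with two transcripts of one gene.
ANNOTATION = [
    [(100, 200), (300, 400), (500, 600), (700, 800)],
    [(100, 200), (500, 600), (900, 1000)],
]

print(classify([(120, 200), (300, 400), (500, 600), (700, 790)], ANNOTATION))
print(classify([(100, 200), (500, 600), (700, 800)], ANNOTATION))
print(classify([(100, 200), (300, 400), (450, 600)], ANNOTATION))
```

Because a long read carries the whole intron chain, the comparison is a lookup rather than an inference, which is the step short reads cannot provide on their own.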

With increased sequencing throughput and cost reductions, long-read full-transcriptome sequencing is now being performed to generate highly accurate annotations for newly assembled genomes, to expand and improve existing annotations, or to conduct differential splicing analyses. Alternative splicing is well known to be regulated across development, tissues, and cell types.

With the rise of single-cell sequencing, long-read sequencing is now being applied to identify cell-specific splicing patterns. Current single-cell approaches that rely on 3′-end sequencing enable the characterization of the transcriptomes of thousands of single cells in a single experiment, but prevent transcript-level analysis. The first studies that applied long-read sequencing used PCR amplification to reach the required input. However, new protocols take advantage of microfluidic approaches to isolate cells and barcode the endogenous RNA prior to pooling for long-read sequencing, relying on computational identification of cells. This approach allows the screening of thousands of cells and, despite the relatively low read count per cell, has already enabled the annotation of 10,691 novel transcripts affecting 4,859 genes and the identification of cell type-specific splicing.⁵ With the increased sequencing output of new platforms and lower input requirements, single-cell full transcriptomes will soon be attained.
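The computational identification of cells in a pooled run can be pictured as matching the barcode found in each long read against a list of known cell barcodes. The following is a minimal sketch under stated assumptions rather than the pipeline of the cited study: the function names, the one-mismatch tolerance, and the toy barcodes are hypothetical, and the barcode is assumed to have already been extracted from each read.

```python
from collections import defaultdict


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def assign_cell(read_barcode, known_barcodes, max_mismatches=1):
    """Return the matching cell barcode, or None if absent or ambiguous."""
    if read_barcode in known_barcodes:
        return read_barcode  # an exact match always wins
    hits = [bc for bc in known_barcodes
            if len(bc) == len(read_barcode)
            and hamming(bc, read_barcode) <= max_mismatches]
    return hits[0] if len(hits) == 1 else None


def group_reads_by_cell(reads, known_barcodes):
    """reads: iterable of (read_id, barcode_sequence) pairs."""
    cells = defaultdict(list)
    for read_id, barcode in reads:
        cell = assign_cell(barcode, known_barcodes)
        if cell is not None:
            cells[cell].append(read_id)
    return dict(cells)


# Hypothetical barcodes and reads for illustration.
KNOWN = {"ACGTACGT", "TTGGCCAA"}
READS = [("read1", "ACGTACGT"), ("read2", "ACGTACGA"),
         ("read3", "GGGGGGGG"), ("read4", "TTGGCCAA")]

print(group_reads_by_cell(READS, KNOWN))
```

Allowing a small number of mismatches matters for long reads in particular, since a single base-calling error in the barcode would otherwise discard the read; rejecting ambiguous matches keeps reads from being credited to the wrong cell.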

The democratization of long-read technologies means that novel applications pertaining to RNA are appearing. RNA modifications⁶ as well as RNA secondary structure can now be probed through direct RNA sequencing. This revolution in RNA sequencing technologies opens new areas of investigation in biology.

Find out more about transcriptome analysis with nanopore sequencing at
www.nanoporetech.com/transcriptome

References
1. Steijger, T. et al. Assessment of transcript reconstruction methods for RNA-seq. Nat Methods. 10:1177–1184 (2013).
2. Sun, W. et al. Ultra-deep profiling of alternatively spliced Drosophila Dscam isoforms by circularization-assisted multi-segment sequencing. EMBO J. 32(14):2029–2038 (2013).
3. Clark, M. et al. Long-read sequencing reveals the splicing profile of the calcium channel gene CACNA1C in human brain. bioRxiv. 260562 (2018).
5. Gupta, I. et al. Single-cell isoform RNA sequencing characterizes isoforms in thousands of cerebellar cells. Nat Biotechnol. 36(12):1197–1202 (2018).
6. Wongsurawat, T. et al. Rapid sequencing of multiple RNA viruses in their native form. Front Microbiol. 10:260 (2019).
