The advent of cost-effective high-throughput nucleotide sequencing means that information about the transcriptome is accruing at an exponential rate, rapidly refining our understanding of the diversity of gene products. It is important that these findings are readily accessible to the wider scientific community to maximise their impact. However, there are multiple barriers to their efficient dissemination and their translation into functional insights. Here, we outline how the status quo can result in information becoming siloed and/or ambiguous, using the CACNA1C gene, which encodes a voltage-gated calcium channel, as an example. We highlight three areas that pose potential barriers to effective information transfer and offer suggestions as to how these may be addressed: firstly, a lack of clarity about the strength of the evidence for individual transcripts in current annotations; secondly, limitations to the transfer of information between nucleotide and protein databases; and thirdly, challenges relating to the nomenclature used for transcriptional events and RNA modifications, both for genomic researchers and the wider scientific community.
Many projects have produced, or are aiming to produce, a reference transcriptome to synthesise the wealth of (highly redundant) sequencing information, although the resulting annotations vary due to differences in the algorithms applied and the extent to which annotations are manually curated. However, although annotations continue to improve, inaccuracies are introduced by the need to computationally reconstruct full-length transcript isoforms from short-read data. Thus, it is possible that some currently annotated full-length isoforms are either incomplete or represent false positives. Conversely, biases in the types of samples that have been historically sequenced mean that false negatives, i.e. transcripts that exist but are not currently annotated, are also likely. Technical biases can be introduced by sample preparation. However, even if it were possible to prepare perfect sequencing libraries, annotations are inherently biased by the relative unavailability of many types of relevant input material, particularly in the case of human tissues. For example, in the case of human brain, outside of rarefied cellular populations, large-scale sequencing efforts necessarily focus either on bulk tissue, which contains a mixture of diverse cellular populations, or on single nucleus sequencing, which does not necessarily reflect the total transcript pool.
Novel long-read RNA sequencing technologies, such as Oxford Nanopore Technologies and PacBio, allow full-length transcript isoforms to be sequenced, thereby providing the potential to eliminate false positive isoforms arising from reconstruction errors. In addition, sequencing at depth and/or combining this technology with enrichment approaches also provides a means to identify novel, full-length transcripts. For example, targeted long-read sequencing of CACNA1C transcripts from just one start exon identified 38 novel exons and 241 novel transcript isoforms, as well as abundant splice site variations. As the use of long-read sequencing becomes more prominent it is likely that many novel exons and isoforms will be discovered for other genes. Clearly, it is possible that many of the minor isoforms reflect transcriptional noise. However, it is also possible that transcripts that appear minor in studies of bulk tissue are more prominent in cellular subpopulations. In support of this assertion, ~90% of the population of CACNA1C transcripts sequenced in human brain are predicted to encode functional voltage-gated calcium channels (i.e. they predict full-length channels that include all domains critical for function). This is far higher than would be expected if they simply represented transcriptional noise, which would, by definition, be expected to induce frame shifts in two thirds of transcripts.
Long-read sequencing studies may also require current annotations to be re-evaluated to remove false positives. Despite detecting a total of 251 different CACNA1C isoforms, there was strong support for only 10 of the 31 previously annotated in GENCODE (v27) and only one of these was amongst the ten most abundant isoforms. It is likely that some of the annotated isoforms that were not found in adult brain are expressed in other tissues and/or at other stages of development and ageing, but some may be false positives.
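The expectation that transcriptional noise would induce frame shifts in two thirds of transcripts follows from the triplet nature of the genetic code: a change in coding length preserves the reading frame only if it is a multiple of three nucleotides. The short sketch below illustrates this arithmetic. It assumes, purely for illustration, that noisy splicing events change transcript length by amounts that are evenly distributed with respect to the reading frame; the function name `frameshift_fraction` and the range of lengths are hypothetical and are not taken from the CACNA1C data.

```python
from fractions import Fraction


def frameshift_fraction(max_len: int) -> Fraction:
    """Fraction of length changes in 1..max_len that disrupt the reading frame.

    Assumption (illustrative only): every length change from 1 to max_len
    nucleotides is equally likely. A change is frame-preserving only when
    its length is a multiple of 3.
    """
    shifting = sum(1 for n in range(1, max_len + 1) if n % 3 != 0)
    return Fraction(shifting, max_len)


# With lengths spanning a whole number of codons, two thirds shift the frame.
expected_shifted = frameshift_fraction(300)
expected_in_frame = 1 - expected_shifted

print(expected_shifted)   # 2/3
print(expected_in_frame)  # 1/3
```

Under this simplified model only about one third of noise-derived transcripts would remain in frame, which is the baseline against which the ~90% of CACNA1C transcripts predicted to encode functional channels can be compared.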