Genome annotation by high-throughput 5′ RNA end determination
Byung Joon Hwang, Hans-Michael Müller, and Paul W. Sternberg

Paul W. Sternberg
- B.A. degree from Hampshire College
- Ph.D. degree from M.I.T. under Robert Horvitz
- Postdoctoral research with Ira Herskowitz at U.C. San Francisco
- Professor of Biology at the California Institute of Technology and Adjunct Professor of Cell and Neurobiology at the University of Southern California School of Medicine, Los Angeles

Sternberg Lab
- Byung Joon Hwang: postdoc
- Hans-Michael Müller: WormBase

Trans-spliced Exon Coupled RNA End Determination (TEC-RED)
- Identifies 5′ ends of expressed genes
- Can distinguish coding regions from regulatory regions
- Useful for identifying genes with alternatively spliced 5′ ends
- Developed in nematodes, but can work for any organism that uses a spliced leader sequence (Sarcomastigophora, cnidarians, nematodes, acoelomate flatworms, and ascidians)

Trans-splicing
- 70% of mRNAs have one of two spliced leader sequences (SL1, SL2) trans-spliced onto the 5′ end
- Spliced leader sequences are transcribed independently as snRNAs
- Trans-splicing with spliced leader sequences produces single-gene mRNAs

TEC-RED

Sequential Concentration
- Eliminates the large-scale PCR reactions and the gel purification of small oligonucleotides found in SAGE protocols
- Because the 5′ tags are directionally concentrated, the 5′ end of each 5′ tag is found next to the first anchor restriction enzyme (RE) cut site

DNA Sequence Analysis

Data
- 13,525 5′ tags (9,401 with SL1 and 4,124 with SL2), obtained from 800 sequencing reactions, were matched to the genome
- These represent 2,159 different sequences: 1,639 for SL1 and 520 for SL2
- 90% of tags corresponded to unique sites
- Of the remaining 10%, 90% matched 2 or 3 sites

Data (cont.)
- To remove false positives, they analyzed the region just 5′ of the 5′ tag sites for a conserved splice acceptor consensus sequence
- 93% of the 5′ tag sequences remained as true positives
- 75% of tag sequences matched known 5′ ends in WS100
- 99 new genes identified
- 32 previously unknown operons identified

Conclusion
- Method for annotating the 5′ ends of genes
- This protocol works only for organisms with a spliced leader sequence
- Sequential concentration method applied to SAGE protocols
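
The tag-matching and false-positive filtering steps from the Data slides can be illustrated with a small sketch. This is not the authors' pipeline. The function names, the toy genome, and the use of a bare "AG" dinucleotide as the splice acceptor motif are all assumptions made for illustration; the slides only say that a conserved splice acceptor consensus was checked just 5′ of each tag site. The sketch also searches the forward strand only, whereas a real analysis would need to consider both strands.

```python
from typing import Dict, List


def find_tag_sites(genome: str, tag: str) -> List[int]:
    """Return 0-based start positions of every exact forward-strand match of tag."""
    sites = []
    start = genome.find(tag)
    while start != -1:
        sites.append(start)
        # Step by one base so overlapping matches are not missed.
        start = genome.find(tag, start + 1)
    return sites


def has_splice_acceptor(genome: str, site: int, acceptor: str = "AG") -> bool:
    """Check whether the bases immediately 5' of the tag site equal the acceptor motif.

    The default "AG" is an assumed, minimal stand-in for the consensus.
    """
    if site < len(acceptor):
        return False  # not enough upstream sequence to test
    return genome[site - len(acceptor):site] == acceptor


def classify_tag(genome: str, tag: str) -> Dict[str, object]:
    """Summarize where a 5' tag matches and which sites pass the acceptor filter."""
    sites = find_tag_sites(genome, tag)
    accepted = [s for s in sites if has_splice_acceptor(genome, s)]
    return {"sites": sites, "unique": len(sites) == 1, "accepted": accepted}


if __name__ == "__main__":
    # Toy sequence (hypothetical): the tag occurs twice, but only the first
    # occurrence is preceded by "AG".
    toy_tag = "ATGGCTAAAC"
    toy_genome = "GGTTTTCAG" + toy_tag + "CCGGTC" + toy_tag + "TT"
    print(classify_tag(toy_genome, toy_tag))
```

In this toy case the tag matches two sites, so it would count among the non-unique tags, and the upstream check keeps only the site that follows an "AG". A stricter motif can be passed through the `acceptor` parameter.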