Genomics

Abstract

To characterize somatic alterations in colorectal carcinoma, we conducted a genome-scale analysis of samples, analysing exome sequence, DNA copy number, promoter methylation and messenger RNA and microRNA expression. A subset of these samples (97) underwent low-depth-of-coverage whole-genome sequencing. Excluding the hypermutated cancers, colon and rectum cancers were found to have considerably similar patterns of genomic alteration. Recurrent copy-number alterations include potentially drug-targetable amplifications of ERBB2 and newly discovered amplification of IGF2. Integrative analyses suggest new markers for aggressive colorectal carcinoma and an important role for MYC-directed transcriptional activation and repression.

Author: Kigul Gagal
Language: English (Spanish)
Published (Last): 17 March 2015

Abstract

Comparative analysis of multiple genomes in a phylogenetic framework dramatically improves the precision and sensitivity of evolutionary inference, producing more robust results than single-genome analyses can provide.

The genomes of 12 Drosophila species, ten of which are presented here for the first time (sechellia, simulans, yakuba, erecta, ananassae, persimilis, willistoni, mojavensis, virilis and grimshawi), illustrate how rates and patterns of sequence divergence across taxa can illuminate evolutionary processes on a genomic scale.

These genome sequences augment the formidable genetic tools that have made Drosophila melanogaster a pre-eminent model for animal genetics, and will further catalyse fundamental research on mechanisms of development, cell biology, genetics, disease, neurobiology, behaviour, physiology and evolution. Despite remarkable similarities among these Drosophila species, we identified many putatively non-neutral changes in protein-coding genes, non-coding RNA genes, and cis-regulatory regions.

These may prove to underlie differences in the ecology and behaviour of these diverse species.

Main

As one might expect from a genus with species living in deserts, in the tropics, on chains of volcanic islands and, often, commensally with humans, Drosophila species vary considerably in their morphology, ecology and behaviour (ref. 1).

Species in this genus span a wide range of global distributions: the 12 sequenced species originate from Africa, Asia, the Americas and the Pacific Islands, and also include cosmopolitan species that have colonized the planet D. A variety of behavioural strategies is also encompassed by the sequenced species, ranging in feeding habit from generalist, such as D. Despite this wealth of phenotypic diversity, Drosophila species share a distinctive body plan and life cycle. Although only D. Thus, in addition to providing an extensive resource for the study of the relationship between sequence and phenotypic diversity, the genomes of these species provide an excellent model for studying how conserved functions are maintained in the face of sequence divergence.

These genome sequences provide an unprecedented dataset to contrast genome structure, genome content, and evolutionary dynamics across the well-defined phylogeny of the sequenced species (Fig. 1).

Figure 1: Phylogram of the 12 sequenced species of Drosophila. Phylogram derived using pairwise genomic mutation distances and the neighbour-joining method. Numbers below nodes indicate the percentage of genes supporting a given relationship, based on evolutionary distances estimated from fourfold-degenerate sites (left of solidus) and second codon positions (right of solidus).

Coloured blocks indicate support from Bayesian posterior probability (PP; upper blocks) and maximum parsimony (MP; bootstrap values; lower blocks) analyses of data partitioned by chromosome arm. Branch lengths indicate the number of mutations per site at fourfold-degenerate sites using the ordinary least squares method.
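The phylogram is built with the neighbour-joining method from pairwise distances. The sketch below is a minimal textbook version of neighbour-joining, written for illustration only: it is not the pipeline used for Figure 1, it does not compute the ordinary-least-squares branch lengths mentioned in the caption, and the five taxa (a to e) and their distances are a toy, perfectly additive example rather than Drosophila data. The function name `neighbour_joining` is hypothetical.

```python
from itertools import combinations


def neighbour_joining(names, matrix):
    """Build an unrooted tree (Newick string) from a symmetric distance matrix."""
    # d[x][y] holds the current distance between (possibly merged) nodes x and y.
    d = {a: {b: matrix[i][j] for j, b in enumerate(names)} for i, a in enumerate(names)}
    nodes = list(names)
    while len(nodes) > 2:
        n = len(nodes)
        # Row sums r(x) used by the Q criterion.
        r = {a: sum(d[a][b] for b in nodes) for a in nodes}
        # Join the pair minimising Q(x, y) = (n - 2) d(x, y) - r(x) - r(y).
        a, b = min(combinations(nodes, 2),
                   key=lambda p: (n - 2) * d[p[0]][p[1]] - r[p[0]] - r[p[1]])
        # Branch lengths from the new internal node to a and b.
        la = 0.5 * d[a][b] + (r[a] - r[b]) / (2 * (n - 2))
        lb = d[a][b] - la
        new = f"({a}:{la:g},{b}:{lb:g})"
        d[new] = {new: 0.0}
        for c in nodes:
            if c not in (a, b):
                # Distance from the new node to every remaining node.
                dc = 0.5 * (d[a][c] + d[b][c] - d[a][b])
                d[new][c] = dc
                d[c][new] = dc
        nodes = [c for c in nodes if c not in (a, b)] + [new]
    a, b = nodes
    # Connect the last two nodes; the whole edge length is placed on one side.
    return f"({a}:{d[a][b]:g},{b}:0);"


taxa = ["a", "b", "c", "d", "e"]
toy = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
print(neighbour_joining(taxa, toy))  # prints ((c:4,(a:2,b:3):3):2,(d:2,e:1):0);
```

Because the toy distances are additive, every path length in the returned tree reproduces the corresponding input distance (for example, a to d is 2 + 3 + 2 + 0 + 2 = 9).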

See ref.

Genome assembly, annotation and alignment

Genome sequencing and assembly

We used the previously published sequence and updated assemblies for two Drosophila species, D. These species were chosen to span a wide variety of evolutionary distances, from closely related pairs such as D. Whereas the time to the most recent common ancestor of the sequenced species may seem small on an evolutionary timescale, the evolutionary divergence spanned by the genus Drosophila exceeds that of the entire mammalian radiation when generation time is taken into account, as discussed further in ref.

We sequenced seven of the new species D. We sequenced two species, D. Finally, seven inbred strains of D. Further details of the sequencing strategy can be found in Table 1, Supplementary Table 1 and section 1 in Supplementary Information.

Table 1: A summary of sequencing and assembly properties of each new genome

We generated an initial draft assembly for each species using one of three different whole-genome shotgun assembly programs (Table 1).

For D. We improved the initial 2. This integration markedly improved the D. Finally, one advantage of sequencing genomes of multiple closely related species is that these evolutionary relationships can be exploited to dramatically improve assemblies.

For the remaining species, comparative syntenic information, and in some cases linkage information, were also used to pinpoint locations of probable genome mis-assembly, to assign assembly scaffolds to chromosome arms and to infer their order and orientation along euchromatic chromosome arms, supplementing experimental analysis based on known markers (A. Bhutkar, S. Russo, S. Schaeffer, T. Smith and W. Gelbart, personal communication; Supplementary Information section 2). The mitochondrial (mt) DNA of D. For the remaining species except D. In addition, the genome sequences of three Wolbachia endosymbionts (Wolbachia wSim, Wolbachia wAna and Wolbachia wWil) were assembled from trace archives, in D. All of the genome sequences described here are available in FlyBase www.

Repeat and transposable element annotation

Repetitive DNA sequences such as transposable elements pose challenges for whole-genome shotgun assembly and annotation.

Previously curated transposable element libraries in D. We assessed the accuracy of each method by calibration with the estimated 5. On the basis of our results, we suggest a hybrid strategy for new genome sequences, employing translated BLAST with general transposable element libraries and RepeatMasker with species-specific ReAS libraries to estimate the upper and lower bounds on transposable element content.
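One way to read the proposed hybrid strategy is that each annotation method yields its own estimate of how much of the genome is transposable element sequence, and the two estimates bracket the true value. The sketch below assumes each method's hits are available as half-open (start, end) intervals on a single sequence; it merges overlapping hits so that no base is counted twice, then reports the smaller and larger coverage fractions as the lower and upper bounds. The method labels and coordinates are invented, and which method supplies which bound is taken from the data rather than assumed.

```python
def merge_intervals(intervals):
    """Merge overlapping or touching half-open [start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def covered_bases(intervals):
    """Number of bases covered by at least one interval."""
    return sum(end - start for start, end in merge_intervals(intervals))


def te_content_bounds(genome_length, hits_by_method):
    """Return (lower, upper) transposable element fractions across methods."""
    fractions = [covered_bases(hits) / genome_length for hits in hits_by_method.values()]
    return min(fractions), max(fractions)


# Toy example on a hypothetical 1,000-bp sequence.
hits = {
    "translated_blast_general_library": [(0, 100), (50, 200), (500, 600)],
    "repeatmasker_reas_library": [(0, 150), (520, 580)],
}
print(te_content_bounds(1000, hits))  # prints (0.21, 0.3)
```

A real genome would need intervals keyed by scaffold, and coverage summed per scaffold before dividing by total genome length.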

These gene prediction sets were combined using GLEAN, a gene model combiner that chooses the most probable combination of start, stop, donor and acceptor sites from the input predictions (ref. 27). All analyses reported here, unless otherwise noted, relied on a reconciled consensus set of predicted gene models, the GLEAN-R set (Table 2 and Supplementary Information section 4).

Table 2: A summary of annotated features across all 12 genomes

Quality of gene models

As the first step in assessing the quality of the GLEAN-R gene models, we used expression data from microarray experiments on adult flies, with arrays custom-designed for D.
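GLEAN chooses the most probable combination of sites from its input predictions. The sketch below substitutes a plain majority vote for GLEAN's model, purely to illustrate combining predictions site by site: each hypothetical predictor reports positions for each feature type (start, stop, donor, acceptor), and a site is kept if more than half of the predictors report it. Unlike GLEAN, it does not check that the surviving sites assemble into a valid gene model; the function name and inputs are invented.

```python
from collections import Counter


def consensus_sites(predictions, min_fraction=0.5):
    """Keep (feature, position) sites reported by more than min_fraction of predictors."""
    votes = Counter()
    for model in predictions:
        # Each model maps a feature type to positions; count each site once per model.
        votes.update({(feature, pos) for feature, positions in model.items() for pos in positions})
    threshold = min_fraction * len(predictions)
    return sorted(site for site, n in votes.items() if n > threshold)


# Three hypothetical predictors disagreeing on the start site.
p1 = {"start": [100], "donor": [250, 400], "acceptor": [300, 450], "stop": [900]}
p2 = {"start": [100], "donor": [250], "acceptor": [300], "stop": [900]}
p3 = {"start": [120], "donor": [250, 400], "acceptor": [300, 450], "stop": [900]}
print(consensus_sites([p1, p2, p3]))
```

Here the start at 100 wins two votes to one, and the second intron (donor 400, acceptor 450) survives because two of three predictors include it.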

Evolutionarily conserved gene models are much more likely to be expressed than lineage-specific ones (Fig. 2). Although these data cannot confirm the detailed structure of gene models, they do suggest that the majority of GLEAN-R models contain sequence that is part of a polyadenylated transcript.

Thus, transcript abundance cannot conclusively establish the presence or absence of a protein-coding gene. Nonetheless, we believe these expression data increase our confidence in the reliability of the GLEAN-R models, particularly those supported by homology evidence (Fig. 2).

Figure 2: Gene models in 12 Drosophila genomes. Number of gene models that fall into one of five homology classes: single-copy orthologues in all species (single-copy orthologues), conserved in all species as orthologues or paralogues (conserved homologues), a D.

For those species with expression data (ref. 29), pie charts indicate the fraction of genes in each homology class that fall into one of four evidence classes (see text for details).

Because the GLEAN-R gene models were built using assemblies that were not repeat masked, it is likely that some proportion of gene models are false positives corresponding to coding sequences of transposable elements.

These procedures suggest that 5. Transposable element-contaminated gene models are excluded from the final gene prediction set used for subsequent analysis, unless otherwise noted.

Homology assignment

Two independent approaches were used to assign orthology and paralogy relationships among euchromatic D.

Because the FRB algorithm does not integrate syntenic information, we also used a second approach based on Synpipe (Supplementary Information section 5). To generate a reconciled set of homology calls, pairwise Synpipe calls between each species and D. There were 8, genes with single-copy orthologues in the melanogaster group and 6, genes with single-copy orthologues in all 12 species; similar numbers of genes were also obtained with an independent approach. Most single-copy orthologues are expressed and are free from potential transposable element contamination, suggesting that the reconciled orthologue set contains robust and high-quality gene models (Fig. 2).
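Once homology calls have been reconciled, counting genes with single-copy orthologues in a set of species reduces to a simple filter. The sketch below assumes a hypothetical table mapping each D. melanogaster gene to the orthologue IDs called in each other species; a gene qualifies for a species set only if it has exactly one orthologue in every member of that set. The species codes and gene IDs are invented for illustration.

```python
def single_copy_genes(orthologues, species):
    """Reference genes with exactly one orthologue in each of `species`."""
    return sorted(
        gene
        for gene, calls in orthologues.items()
        if all(len(calls.get(sp, [])) == 1 for sp in species)
    )


# Hypothetical reconciled calls: reference gene -> species -> orthologue IDs.
calls = {
    "geneA": {"dsim": ["s1"], "dvir": ["v1"]},
    "geneB": {"dsim": ["s2"], "dvir": []},
    "geneC": {"dsim": ["s3", "s4"], "dvir": ["v3"]},
    "geneD": {"dsim": ["s5"], "dvir": ["v5"]},
}
print(single_copy_genes(calls, ["dsim"]))          # prints ['geneA', 'geneB', 'geneD']
print(single_copy_genes(calls, ["dsim", "dvir"]))  # prints ['geneA', 'geneD']
```

Adding more distantly related species to the list can only shrink the result, which is why the all-species count is smaller than the melanogaster-group count.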

Moreover, assembly gaps and poor-quality sequence may lead to erroneous inferences of gene loss. To validate putative gene absences, we used a synteny-based GeneWise pipeline to find potentially missed homologues of D. Of the 21, cases in which a D.

Because this approach is conservative and only confirms strongly supported absences, we are probably underestimating the number of genuine absences.

Coding gene alignment and filtering

Investigating the molecular evolution of orthologous and paralogous genes requires accurate multi-species alignments.

To reduce biases in downstream analyses, a simple computational screen was developed to identify and mask problematic regions of each alignment (Supplementary Information section 6). Overall, 2. The vast majority of masked bases are masked in no more than one species (Supplementary Fig.). We find an appreciably higher frequency of masked bases in lower-quality D. We used masked versions of the alignments, including only the longest D. This suggests that ncRNA pseudogenes are largely absent from Drosophila genomes, which is consistent with the low number of protein-coding pseudogenes in Drosophila. The relatively low numbers of some classes of ncRNA genes (for example, small nucleolar (sno)RNAs) in the Drosophila subgenus are likely to be an artefact of rapid rates of evolution in these types of genes and of the limitations of the homology-based methods used to annotate distantly related species.
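The masking screen itself is described only in the Supplementary Information. As an illustration of the general idea, and not a reconstruction of that screen, the sketch below masks every sliding window of an aligned sequence whose identity to a reference row falls below a threshold, then reports the fraction of aligned bases that ended up masked. The window size, threshold, mask character and function names are all assumptions.

```python
def mask_low_identity(reference, sequence, window=10, min_identity=0.5, mask_char="N"):
    """Mask windows of `sequence` whose identity to `reference` is below min_identity.

    Both strings are rows of the same alignment (equal length, '-' for gaps).
    """
    if len(reference) != len(sequence):
        raise ValueError("aligned sequences must have equal length")
    masked = list(sequence)
    for start in range(max(len(sequence) - window + 1, 1)):
        ref_win = reference[start:start + window]
        seq_win = sequence[start:start + window]
        # Compare only columns where neither row has a gap.
        compared = [(r, s) for r, s in zip(ref_win, seq_win) if r != "-" and s != "-"]
        if not compared:
            continue
        identity = sum(r == s for r, s in compared) / len(compared)
        if identity < min_identity:
            for i in range(start, start + len(seq_win)):
                if masked[i] != "-":
                    masked[i] = mask_char
    return "".join(masked)


def masked_fraction(rows, mask_char="N"):
    """Fraction of non-gap characters across rows that are masked."""
    bases = sum(1 for row in rows for c in row if c != "-")
    masked = sum(row.count(mask_char) for row in rows)
    return masked / bases if bases else 0.0


print(mask_low_identity("AAAAAAAA", "AAAACCCC", window=4))  # prints AAANNNNN
```

Whole windows are masked, so a matching base next to a divergent stretch (position 3 here) is masked too; that trades some lost sequence for fewer misaligned columns reaching downstream analyses.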

Evolution of genome structure

Coarse-level similarities among Drosophilids

At a coarse level, genome structure is well conserved across the 12 sequenced species. Total genome size estimated by flow cytometry varies less than threefold across the phylogeny, ranging from Mb D. Total protein-coding sequence ranges from Intronic DNA content is also largely conserved, ranging from To investigate overall conservation of genome architecture at an intermediate scale, we analysed synteny relationships across species using Synpipe (ref. 32; Supplementary Information section 9).

Synteny block size and average number of genes per block vary across the phylogeny as expected, with the number of blocks increasing and the average size of blocks decreasing with increasing evolutionary distance from D. Russo, T. Gelbart, personal communication Supplementary Fig. We inferred syntenic blocks between D. Similarity across genomes is largely recapitulated at the level of individual genes, with roughly comparable numbers of predicted protein-coding genes across the 12 species (Table 2).
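A syntenic block can be pictured as a run of genes whose orthologues stay adjacent, in the same or reversed order, in the other genome. The sketch below is a deliberately strict, hypothetical version of that idea rather than Synpipe's algorithm: it walks the reference gene order and starts a new block whenever the next orthologue is not the immediate neighbour of the previous one on the same target chromosome arm. Real pipelines tolerate small gaps and unannotated genes, so this version will over-split blocks. Gene IDs, arm names and positions are invented.

```python
def synteny_blocks(reference_order, target_position):
    """Split a reference gene order into strictly collinear blocks.

    reference_order: reference gene IDs in chromosomal order.
    target_position: gene ID -> (arm, index) of its orthologue in the target genome.
    Genes without an orthologue are skipped.
    """
    blocks = []
    last = None      # (arm, index) of the previous placed gene
    direction = 0    # +1 or -1 once the current block holds two genes
    for gene in reference_order:
        if gene not in target_position:
            continue
        arm, idx = target_position[gene]
        step = idx - last[1] if last is not None and arm == last[0] else None
        if step in (1, -1) and direction in (0, step):
            blocks[-1].append(gene)  # orthologue is adjacent and in a consistent direction
            direction = step
        else:
            blocks.append([gene])    # breakpoint: start a new block
            direction = 0
        last = (arm, idx)
    return blocks


order = ["g1", "g2", "g3", "g4", "g5", "g6"]
positions = {
    "g1": ("2L", 10), "g2": ("2L", 11), "g3": ("2L", 12),
    "g4": ("2L", 40), "g5": ("2L", 39), "g6": ("3R", 5),
}
blocks = synteny_blocks(order, positions)
print(blocks)                                   # prints [['g1', 'g2', 'g3'], ['g4', 'g5'], ['g6']]
print(sum(map(len, blocks)) / len(blocks))      # prints 2.0 (mean genes per block)
```

With more rearrangements between two genomes, runs break more often, so the block count rises and the mean number of genes per block falls, matching the trend with evolutionary distance described above.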

The majority of predicted genes in each species have homologues in D. Moreover, most of the 13, protein-coding genes in D. The number of functional non-coding RNA genes predicted in each Drosophila genome is also largely conserved, ranging from in D. There are several possible explanations for the observed interspecific variation in gene content.

First, approximately D. Second, because low-coverage genomes tend to have more predicted gene models, we suspect that artefactual duplication of genomic segments due to assembly errors inflates the number of predicted genes in some species.

Finally, the non-melanogaster species have many more predicted lineage-specific genes than D. In the absence of experimental evidence, it is difficult to distinguish genuine lineage-specific genes from putative artefacts. Future experimental work will be required to fully disentangle the causes of interspecific variation in gene number.

Abundant genome rearrangements during Drosophila evolution

To study the structural relationships among genomes on a finer scale, we analysed gene-level synteny between species pairs. These synteny maps allowed us to infer the history and locations of fixed genomic rearrangements between species.

Although Drosophila species vary in their number of chromosomes, there are six fundamental chromosome arms common to all species. These arms are known as Muller elements, after H. J. Muller, and are denoted A–F.

Although most pairs of orthologous genes are found on the same Muller element, there is extensive gene shuffling within Muller elements between even moderately diverged genomes Fig. The horizontal axis shows D.



Background and early life

Robert Drummond was born on 13 November , the third surviving son of at least 13 children of William Drummond, 4th Viscount Strathallan, and his wife Margaret, daughter of Lord William Murray. William Drummond was a prominent Jacobite. He had taken part in the Rising, and had been taken prisoner at Sheriffmuir. Robert Drummond was brought up on the family estate at Machany in Perthshire, but in he and his younger brother Henry were sent to London to live with their uncle Andrew Drummond.

