Current Genomics (v.12, #5)

Shaping the Genome with Non-Coding RNAs by Xue Q.D. Wang, Jennifer L. Crutchley, Josee Dostie (307-321).
The human genome must be tightly packaged in order to fit inside the nucleus of a cell. Genome organization is functional rather than random, which allows for the proper execution of gene expression programs and other biological processes. Recently, three-dimensional chromatin organization has emerged as an important transcriptional control mechanism. For example, enhancers were shown to regulate target genes by physically interacting with them regardless of their linear distance, and even when located on different chromosomes. These chromatin contacts can be measured with the chromosome conformation capture (3C) technology and other 3C-related techniques. Given how recently 3C-derived approaches were introduced, it is not surprising that we still know very little about the structure of our genome at high resolution. Even less well understood is whether distinct types of chromatin contacts exist and, importantly, what regulates them. A new form of regulation involving the expression of long non-coding RNAs (lncRNAs) was recently identified. lncRNAs are a very abundant class of non-coding RNAs that are often expressed in a tissue-specific manner. While their diverse subcellular localizations point to roles in numerous cellular processes, lncRNAs clearly play an important role in regulating gene expression. How they control transcription, however, is mostly unknown. In this review, we provide an overview of known lncRNA transcription regulation activities. We also discuss potential mechanisms by which ncRNAs might exert three-dimensional transcriptional control and what recent studies have revealed about their role in shaping our genome.

Genetics and Mitochondrial Abnormalities in Autism Spectrum Disorders: A Review by Sukhbir Dhillon, Jessica A. Hellings, Merlin G. Butler (322-332).
We review the current status of the role and function of mitochondrial DNA (mtDNA) in the etiology of autism spectrum disorders (ASD) and the interaction of nuclear and mitochondrial genes. High lactate levels, reported in about one in five children with ASD, may indicate mitochondrial involvement in energy metabolism and brain development. Mitochondrial disturbances include depletion, decreased quantity or mutations of mtDNA that produce defects in biochemical reactions within the mitochondria. A subset of individuals with ASD manifests copy number variation or small DNA deletions/duplications, but fewer than 20 percent are diagnosed with a single-gene condition such as fragile X syndrome. The remaining individuals with ASD have chromosomal abnormalities (e.g., 15q11-q13 duplications), other genetic or multigenic causes, or epigenetic defects. Next-generation DNA sequencing techniques will enable better characterization of genetic and molecular anomalies in ASD, including defects in the mitochondrial genome, particularly in younger children.

The Illusion of Distribution-Free Small-Sample Classification in Genomics by Edward R. Dougherty, Amin Zollanvari, Ulisses M. Braga-Neto (333-341).
Classification has emerged as a major area of investigation in bioinformatics owing to the desire to discriminate phenotypes, in particular disease conditions, using high-throughput genomic data. While many classification rules have been proposed, there is a paucity of error estimation rules and an even greater paucity of theory concerning error estimation accuracy. This is problematic because the worth of a classifier depends mainly on its error rate. It is commonplace in bioinformatics papers for a classification rule to be applied to a small labeled data set and for the error of the resulting classifier to be estimated on the same data set, most often via cross-validation, without any assumptions being made about the underlying feature-label distribution. Concomitant with this lack of distributional assumptions is the absence of any statement regarding the accuracy of the error estimate. Without such a measure of accuracy, the most common one being the root-mean-square (RMS) deviation between the estimated and the true error, the error estimate is essentially meaningless and the worth of the entire paper is questionable. In small-sample settings, the absence of distributional assumptions guarantees the absence of an accuracy measure, because even when distribution-free bounds exist (which is rare), the sample sizes they require are so large as to make them useless for small samples. Thus, distributional bounds are necessary and the distributional assumptions need to be stated. Owing to the epistemological dependence of classifiers on the accuracy of their estimated errors, scientifically meaningful distribution-free classification in high-throughput, small-sample biology is an illusion.
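The point that a cross-validation estimate only has meaning alongside its RMS under a stated distribution can be illustrated with a small simulation. The sketch below is not the authors' analysis; it assumes a hypothetical model of two equally likely one-dimensional Gaussian classes, N(−μ, 1) and N(μ, 1), a nearest-mean classifier, and leave-one-out cross-validation. Because the distribution is stated, the true error ε of each trained classifier is known in closed form, so RMS = sqrt(E[(ε̂ − ε)²]) can be estimated by averaging over many simulated data sets. All function names and default parameters (`mu=0.5`, `reps=1000`) are illustrative choices.

```python
import math
import random


def phi(z):
    """Standard normal CDF, computed with the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def train_nearest_mean(xs, ys):
    """Fit a 1-D nearest-mean classifier; returns (threshold, orientation)."""
    m0 = sum(x for x, y in zip(xs, ys) if y == 0) / ys.count(0)
    m1 = sum(x for x, y in zip(xs, ys) if y == 1) / ys.count(1)
    return (m0 + m1) / 2.0, (1 if m1 >= m0 else -1)


def predict(model, x):
    """Label 1 if x lies on the class-1 side of the threshold, else 0."""
    t, sign = model
    return 1 if sign * (x - t) > 0 else 0


def true_error(model, mu):
    """Exact error under the assumed model: N(-mu, 1) vs N(mu, 1), equal priors."""
    t, sign = model
    p0_above = 1.0 - phi(t + mu)  # P(X > t | class 0)
    p1_above = 1.0 - phi(t - mu)  # P(X > t | class 1)
    if sign == 1:  # class 1 is predicted above the threshold
        return 0.5 * p0_above + 0.5 * (1.0 - p1_above)
    return 0.5 * (1.0 - p0_above) + 0.5 * p1_above


def loo_error(xs, ys):
    """Leave-one-out cross-validation estimate of the error rate."""
    wrong = 0
    for i in range(len(xs)):
        model = train_nearest_mean(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        wrong += predict(model, xs[i]) != ys[i]
    return wrong / len(xs)


def loo_rms(n_per_class, mu=0.5, reps=1000, seed=0):
    """Monte Carlo estimate of sqrt(E[(loo_estimate - true_error)^2])."""
    if n_per_class < 2:
        raise ValueError("need at least 2 points per class for leave-one-out")
    rng = random.Random(seed)
    ys = [0] * n_per_class + [1] * n_per_class
    total = 0.0
    for _ in range(reps):
        xs = [rng.gauss(-mu, 1.0) for _ in range(n_per_class)]
        xs += [rng.gauss(mu, 1.0) for _ in range(n_per_class)]
        est = loo_error(xs, ys)
        true = true_error(train_nearest_mean(xs, ys), mu)
        total += (est - true) ** 2
    return math.sqrt(total / reps)


if __name__ == "__main__":
    for n in (5, 10, 25):
        print(n, round(loo_rms(n), 3))
```

The subtraction `est - true` is only computable because the model is fixed in advance; with the distribution left unspecified, as in the practice the abstract criticizes, no such accuracy figure exists. Rerunning with larger `n_per_class` typically shows the RMS shrinking.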

Since the time of Darwin, biologists have studied the origin and evolution of the Orchidaceae, one of the largest families of flowering plants. In the last two decades, the extreme diversity and specialization of floral morphology and the uncoupled rates of morphological and molecular evolution observed in some orchid species have spurred interest in the genes involved in flower development in this plant family. Within the complex network of regulatory genes driving the formation of flower organs, the MADS-box genes are the most studied gene family, from both functional and evolutionary perspectives. Despite the absence of a published orchid genome, comparative genetic analyses are clarifying the functional role and the evolutionary pattern of the MADS-box genes in orchids. Various evolutionary forces act on these genes, such as diffuse purifying selection and the relaxation of selective constraints, sometimes revealing a heterogeneous selective pattern between coding and non-coding regions. The emerging theory regarding the evolution of floral diversity in orchids proposes that the diversification of the orchid perianth was a consequence of duplication events and changes in the regulatory regions of the MADS-box genes, followed by sub- and neo-functionalization. This specific developmental-genetic code is termed the orchid code.

Stemming Epigenetics in Marine Stramenopiles by Florian Maumus, Pablo Rabinowicz, Chris Bowler, Maximo Rivarola (357-370).
Epigenetic mechanisms include DNA methylation, modifications of histone tails that affect chromatin states, and small RNAs involved in establishing and maintaining chromatin modifications. Marine stramenopiles (MAS), a diverse assemblage of algae that acquired photosynthesis through secondary endosymbiosis, include single-celled organisms such as diatoms as well as multicellular forms such as brown algae. The recent publication of two diatom genomes that diverged ~90 million years ago (Mya), as well as that of a brown alga that diverged from diatoms ~250 Mya, provides a valuable system of related yet diverged organisms in which to compare epigenetic marks and their relationships. For example, putative DNA methyltransferase homologues were found in diatoms, while none could be identified in the brown algal genome. Conversely, no canonical DICER-like protein was found in diatoms, in contrast to what is observed in brown algae. A key interest lies in understanding the adaptive nature of epigenetics and its heritability. In contrast to yeast, which lacks DNA methylation, homogeneous cultures of diatoms constitute an attractive system for studying epigenetic changes in response to environmental conditions such as transitions from nutrient-rich to nutrient-poor conditions, which is especially relevant given the ecological importance of diatoms. The diatom P. tricornutum is also of outstanding interest because it occurs as three different morphotypes and thus constitutes a simple and promising model for studying the epigenetic phenomena that accompany cellular differentiation. In this review we focus on the insights obtained from MAS comparative genomics and epigenomic analyses.

The Genetics of Vitamin C Loss in Vertebrates by Guy Drouin, Jean-Remi Godin, Benoit Page (371-378).
Vitamin C (ascorbic acid) plays important roles as an antioxidant and in collagen synthesis. These important roles, and the relatively large amounts of vitamin C required daily, likely explain why most vertebrate species are able to synthesize this compound. Surprisingly, many species, such as teleost fishes, anthropoid primates, guinea pigs, and some bat and Passeriformes bird species, have lost the capacity to synthesize it. Here, we review the genetic bases behind these repeated losses of the ability to synthesize vitamin C, as well as their implications. In all cases studied so far, the inability to synthesize vitamin C is due to mutations in the L-gulono-γ-lactone oxidase (GLO) gene, which codes for the enzyme that catalyzes the last step of vitamin C biosynthesis. The bias toward mutations in this particular gene is likely due to the fact that losing it affects only vitamin C production. Whereas the GLO gene mutations in fish, anthropoid primates and guinea pigs are irreversible, some of the GLO pseudogenes found in bat species have been shown to be reactivated during evolution. The same phenomenon is thought to have occurred in some Passeriformes bird species. Interestingly, these GLO gene losses and reactivations are unrelated to the diet of the species involved. This suggests that losing the ability to make vitamin C is a neutral trait.