BBA - General Subjects (v.1860, #11PB)

Developmental dental anomalies are common forms of congenital defects. The molecular mechanisms of dental anomalies are poorly understood. Systematic approaches such as clustering genes based on similar expression patterns could identify novel genes involved in dental anomalies and provide a framework for understanding molecular regulatory mechanisms of these genes during tooth development (odontogenesis). A Python package (pySAPC) implementing a sparse affinity propagation clustering algorithm for large datasets was developed. Whole-genome pairwise similarity was calculated from expression patterns across 45 microarrays covering several stages of odontogenesis. pySAPC identified 743 gene clusters based on expression pattern similarity during mouse tooth development. Three clusters are significantly enriched for genes associated with dental anomalies (FDR < 0.1). The three clusters of genes have distinct expression patterns during odontogenesis. Clustering genes based on similar expression profiles recovered several known regulatory relationships for genes involved in odontogenesis, as well as many novel genes that may be involved in the same genetic pathways as genes already shown to contribute to dental defects. By using a sparse similarity matrix, pySAPC uses much less memory and CPU time than the original affinity propagation program, which uses a full similarity matrix. This Python package will be useful for many applications where datasets are too large to use a full similarity matrix. This article is part of a Special Issue entitled “System Genetics” Guest Editor: Dr. Yudong Cai and Dr. Tao Huang.
Keywords: Sparse affinity propagation clustering; Time series microarray; Dental anomalies; pySAPC;
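
The memory saving from a sparse similarity matrix can be illustrated with a minimal sketch. It is not the pySAPC API: the function names, the Pearson-correlation similarity, the 0.9 threshold, and the toy gene profiles are all assumptions made for illustration. Only gene pairs whose similarity reaches the threshold are stored, so the number of entries grows with the number of similar pairs rather than with the square of the number of genes.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a flat profile carries no correlation information
    return sxy / sqrt(sxx * syy)


def sparse_similarity(profiles, threshold=0.9):
    """Keep only gene pairs whose correlation is at least `threshold`.

    Returns a dict {(gene_a, gene_b): similarity}; pairs below the
    threshold are simply absent, which is what makes the matrix sparse.
    """
    names = sorted(profiles)
    entries = {}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = pearson(profiles[a], profiles[b])
            if r >= threshold:
                entries[(a, b)] = r
    return entries


# Hypothetical expression profiles over four time points.
profiles = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.0, 6.0, 8.1],
    "geneC": [4.0, 3.0, 2.0, 1.0],
}
sim = sparse_similarity(profiles)
```

In this toy input only the geneA/geneB pair is retained; a sparse clustering routine would then pass messages only along stored pairs.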

Analysis of the chemical toxicity effects using the enrichment of Gene Ontology terms and KEGG pathways by Lei Chen; Yu-Hang Zhang; Quan Zou; Chen Chu; Zhiliang Ji (2619-2626).
Chemical toxicity is one of the major barriers for designing and detecting new chemical entities during drug discovery. Unexpected toxicity of an approved drug may lead to withdrawal from the market and significant loss of the associated costs. Better understanding of the mechanisms underlying various toxicity effects can help eliminate unqualified candidate drugs in early stages, allowing researchers to focus their attention on other more viable candidates. In this study, we aimed to understand the mechanisms underlying several toxicity effects using Gene Ontology (GO) terms and KEGG pathways. GO term and KEGG pathway enrichment theories were adopted to encode each chemical, and the minimum redundancy maximum relevance (mRMR) method was used to analyze the GO terms and the KEGG pathways. Based on the feature list obtained by the mRMR method, the most related GO terms and KEGG pathways were extracted. Some important GO terms and KEGG pathways were uncovered, which were concluded to be significant for determining chemical toxicity effects. Several GO terms and KEGG pathways are highly related to all investigated toxicity effects, while some are specific to a certain toxicity effect. The findings in this study have the potential to further our understanding of different chemical toxicity mechanisms and to assist scientists in developing new chemical toxicity prediction algorithms. This article is part of a Special Issue entitled “System Genetics” Guest Editor: Dr. Yudong Cai and Dr. Tao Huang.
Keywords: Chemical toxicity effect; Minimum redundancy maximum relevance; Gene Ontology; KEGG pathway; Enrichment score;
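
As a rough illustration of what an enrichment score can look like, the sketch below uses one common formulation: the negative base-10 logarithm of a hypergeometric tail probability. The abstract does not give the exact definition the authors used, so the formula, the function names, and the numbers in the example are assumptions.

```python
from math import comb, log10


def hypergeom_tail(total_genes, term_genes, set_size, overlap):
    """P(X >= overlap) when drawing `set_size` genes without replacement
    from `total_genes`, of which `term_genes` carry the GO/KEGG annotation."""
    upper = min(term_genes, set_size)
    denominator = comb(total_genes, set_size)
    numerator = sum(
        comb(term_genes, i) * comb(total_genes - term_genes, set_size - i)
        for i in range(overlap, upper + 1)
    )
    return numerator / denominator


def enrichment_score(total_genes, term_genes, set_size, overlap):
    """Larger score means the overlap is less likely to arise by chance."""
    return -log10(hypergeom_tail(total_genes, term_genes, set_size, overlap))


# Toy example: 10 genes in total, 5 annotated to a term, and a
# chemical-related gene set of 5 genes that are all annotated.
p_value = hypergeom_tail(10, 5, 5, 5)
score = enrichment_score(10, 5, 5, 5)
```

Computing one such score per GO term or pathway would turn each chemical into a numeric vector, which is the kind of input a feature-ranking method such as mRMR needs.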

Drug-induced drug resistance in cancer has been attributed to diverse biological mechanisms at the individual cell or cell population scale, relying on stochastically or epigenetically varying expression of phenotypes at the single cell level, and on the adaptability of tumours at the cell population level. We focus on intra-tumour heterogeneity, namely between-cell variability within cancer cell populations, to account for drug resistance. To shed light on such heterogeneity, we review evolutionary mechanisms that encompass the great evolution that has designed multicellular organisms, as well as smaller windows of evolution on the time scale of human disease. We also present mathematical models used to predict drug resistance in cancer and optimal control methods that can circumvent it in combined therapeutic strategies. Plasticity in cancer cells, i.e., partial reversal to a stem-like status in individual cells and resulting adaptability of cancer cell populations, may be viewed as backward evolution making cancer cell populations resistant to drug insult. This reversible plasticity is captured by mathematical models that incorporate between-cell heterogeneity through continuous phenotypic variables. Such models have the benefit of being compatible with optimal control methods for the design of optimised therapeutic protocols involving combinations of cytotoxic and cytostatic treatments with epigenetic drugs and immunotherapies. Gathering knowledge from cancer and evolutionary biology with physiologically based mathematical models of cell population dynamics should provide oncologists with a rationale to design optimised therapeutic strategies to circumvent drug resistance, which remains a major pitfall of cancer therapeutics. This article is part of a Special Issue entitled “System Genetics” Guest Editor: Dr. Yudong Cai and Dr. Tao Huang.
Keywords: Heterogeneity; Cancer cell populations; Evolution; Drug resistance; Cancer therapeutics; Optimal control;

TRAF3 signaling: Competitive binding and evolvability of adaptive viral molecular mimicry by Emine Guven-Maiorov; Ozlem Keskin; Attila Gursoy; Carter VanWaes; Zhong Chen; Chung-Jung Tsai; Ruth Nussinov (2646-2655).
Background: The tumor necrosis factor receptor (TNFR) associated factor 3 (TRAF3) is a key node in innate and adaptive immune signaling pathways. TRAF3 negatively regulates the activation of the canonical and non-canonical NF-κB pathways and is one of the key proteins in antiviral immunity. Scope of Review: Here we provide a structural overview of TRAF3 signaling in terms of its competitive binding and consequences to the cellular network. For completeness, we also include molecular mimicry of TRAF3 physiological partners by some viral proteins. Major Conclusions: By out-competing host partners, viral proteins aim to subvert TRAF3 antiviral action. Mechanistically, dynamic, competitive binding by the organism's own proteins and same-site adaptive pathogen mimicry follow the same conformational selection principles. General Significance: Our premise is that irrespective of the eliciting event – physiological or acquired pathogenic trait – pathway activation (or suppression) may embrace similar conformational principles. However, even though here we largely focus on competitive binding at a shared site, other pathogen subversion mechanisms, similar to those of physiological signaling, can also be at play. This article is part of a Special Issue entitled “System Genetics” Guest Editor: Dr. Yudong Cai and Dr. Tao Huang.
Keywords: Inflammation; Cancer; Antiviral immunity; Evolvable; Structure; Host-pathogen interactions;

Understanding the genetic and epigenetic basis of common variable immunodeficiency disorder through omics approaches by Jin Li; Zhi Wei; Yun R. Li; S. Melkorka Maggadottir; Xiao Chang; Akshatha Desai; Hakon Hakonarson (2656-2663).
Common variable immunodeficiency disorder (CVID) is the most frequently encountered symptomatic primary immunodeficiency, characterized by highly heterogeneous immunological features and clinical presentations. As better targeted therapies are urgently needed for CVID, improved understanding of the genetic and epigenetic basis for the development of CVID presents the most promising avenue for improvement. Several genomic and epigenomic studies of CVID have recently been carried out on cohorts of sporadic cases of CVID. Using high-throughput array and sequencing technologies, these studies identified several loci associated with the disease. Here, we review the omics approaches used in these studies and the resulting discoveries. We also discuss how these findings lead to improved understanding of the molecular basis of CVID and possible future directions to pursue. High-throughput omics approaches have been productive in genetic and epigenetic studies of CVID, leading to the identification of several significantly associated loci of different variant types, as well as genes and pathways elucidating the shared genetic basis of CVID and autoimmunity. A complex polygenic model of inheritance, together with interplay between genetic components and environmental factors, may account for the etiology of CVID and various associated comorbidities. The genetic and epigenetic basis of CVID, when further translated through functional studies, will allow for improved understanding of the CVID etiology and will provide new insights into the development of potential new therapeutic approaches for this devastating condition. This article is part of a Special Issue entitled “System Genetics” Guest Editor: Dr. Yudong Cai and Dr. Tao Huang.
Keywords: Common variable immunodeficiency disorder; Genome-wide association study; Copy number variation; Meta-analysis; Epigenetic;

Estimation of elimination half-lives of organic chemicals in humans using gradient boosting machine by Jing Lu; Dong Lu; Xiaochen Zhang; Yi Bi; Keguang Cheng; Mingyue Zheng; Xiaomin Luo (2664-2671).
Elimination half-life is an important pharmacokinetic parameter that determines exposure duration to approach steady state of drugs and regulates drug administration. The experimental evaluation of half-life is time-consuming and costly. Thus, it is attractive to build an accurate prediction model for half-life. In this study, several machine learning methods, including gradient boosting machine (GBM), support vector regressions (RBF-SVR and Linear-SVR), local lazy regression (LLR), SA, SR, and GP, were employed to build high-quality prediction models. Two strategies of building consensus models were explored to improve the accuracy of prediction. Moreover, the applicability domains (ADs) of the models were determined by using a distance-based threshold. Among the seven individual models, GBM showed the best performance (R² = 0.820 and RMSE = 0.555 for the test set), and Linear-SVR produced inferior prediction accuracy (R² = 0.738 and RMSE = 0.672). The use of distance-based ADs effectively determined the scope of the QSAR models. However, the consensus models built by combining the individual models could not improve the prediction performance. Some essential descriptors relevant to half-life were identified and analyzed. An accurate prediction model for elimination half-life was built by GBM, which was superior to the reference model (R² = 0.723 and RMSE = 0.698). Encouraged by the promising results, we expect that the GBM model for elimination half-life would have potential applications in early pharmacokinetic evaluations and provide guidance for designing drug candidates with favorable in vivo exposure profiles. This article is part of a Special Issue entitled “System Genetics” Guest Editor: Dr. Yudong Cai and Dr. Tao Huang.
Keywords: Elimination half-life; Gradient boosting machine; Applicability domain; Consensus model;
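
A distance-based applicability domain can be sketched as follows. This is one common formulation, assumed here for illustration only: the threshold is the mean nearest-neighbour distance within the training set plus `z` times its standard deviation, and a query compound is considered inside the domain if its nearest training compound is no farther than that threshold. The descriptor vectors, the Euclidean metric, and the value of `z` are hypothetical and are not taken from the paper.

```python
from math import dist
from statistics import mean, pstdev


def nearest_distance(point, others):
    """Euclidean distance from `point` to its closest neighbour in `others`."""
    return min(dist(point, other) for other in others)


def ad_threshold(train, z=0.5):
    """Mean nearest-neighbour distance in the training set plus z standard
    deviations (z is a tunable assumption, not a value from the paper)."""
    distances = [
        nearest_distance(p, train[:i] + train[i + 1:])
        for i, p in enumerate(train)
    ]
    return mean(distances) + z * pstdev(distances)


def in_domain(query, train, threshold):
    """True if the query lies within the distance-based applicability domain."""
    return nearest_distance(query, train) <= threshold


# Hypothetical two-descriptor training set.
train = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
threshold = ad_threshold(train)
inside = in_domain((0.5, 0.5), train, threshold)
outside = in_domain((5.0, 5.0), train, threshold)
```

Predictions for compounds flagged as outside the domain would be reported as unreliable rather than trusted at face value.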

The dominant feature in neurodegenerative diseases is protein aggregations that lead to neuronal loss. Immunotherapies using antibodies or antibody fragments to target the aggregations are a highly pursued approach. The molecular mechanisms underlying the amyloid-based immunotherapy are complex. Deciphering the properties of amyloidogenic proteins responsible for these diseases is essential to obtain insights into antibody recognition of the amyloid antigens. We systematically explore all available crystal structures of antibody-amyloid complexes related to neurodegenerative diseases, including antibodies that recognize the Aβ peptide, tau protein, prion protein, alpha-synuclein, huntingtin protein (mHTT), and polyglutamine. We found that antibodies mostly use the conformational selection mechanism to recognize the highly flexible amyloid antigens. In particular, solanezumab bound to Aβ12–28 tripeptide motif conformation (F19F20A21), which is shared with the Aβ42 fibril. This motif, which is trapped by the antibody, may provide the missing link in amyloid formation. Water molecules often bridge between the antibody and amyloid, contributing to the recognition. This paper provides the structural basis for antibody recognition of amyloidogenic proteins. The analysis and discussion of known structures are expected to help in the design and optimization of antibodies in neurodegenerative diseases. This article is part of a Special Issue entitled “System Genetics” Guest Editor: Dr. Yudong Cai and Dr. Tao Huang.
Keywords: Antibody; Biological drug; Amyloid; Neurodegenerative diseases; Alzheimer's disease; Prion; Antigen;

Metabolomic-based biomarker discovery for non-invasive lung cancer screening: A case study by Keiron O'Shea; Simon J.S. Cameron; Keir E. Lewis; Chuan Lu; Luis A.J. Mur (2682-2687).
Lung cancer (LC) is one of the leading lethal cancers worldwide, with an estimated 18.4% of all cancer deaths being attributed to the disease. Despite developments in cancer diagnosis and treatment over the previous thirty years, LC has seen little to no improvement in the overall five-year survival rate after initial diagnosis. In this paper, we extended a recent study which profiled the metabolites in sputum from patients with lung cancer and age-matched smoking volunteer controls using flow infusion electrospray ion mass spectrometry. We selected key metabolites for distinguishing between different classes of lung cancer, and employed artificial neural networks and leave-one-out cross-validation to evaluate the predictive power of the identified biomarkers. The neural network model showed excellent performance in classification between lung cancer and control groups, with an area under the receiver operating characteristic curve of 0.99. The sensitivity and specificity for detecting cancer from controls were 96% and 94%, respectively. Furthermore, we have identified six putative metabolites that were able to discriminate between sputum samples derived from patients suffering from small cell lung cancer (SCLC) and non-small cell lung cancer. These metabolites achieved excellent cross-validation performance, with a sensitivity of 80% and specificity of 100% for predicting SCLC. These results indicate that sputum metabolic profiling may have potential for screening of lung cancer and lung cancer recurrence, and may greatly improve the effectiveness of clinical intervention. This article is part of a Special Issue entitled “System Genetics” Guest Editor: Dr. Yudong Cai and Dr. Tao Huang.
Keywords: Lung cancer; Small vs non-small cell lung cancer; Sputum; Metabolomics; Biomarkers; Artificial neural networks;

dbPHCC: a database of prognostic biomarkers for hepatocellular carcinoma that provides online prognostic modeling by Jian Ouyang; Ying Sun; Wei Li; Wen Zhang; Dandan Wang; Xiangqiong Liu; Yong Lin; Baofeng Lian; Lu Xie (2688-2695).
Hepatocellular carcinoma (HCC) is one of the most common malignant cancers with a poor prognosis. Over the past decades, more and more biomarkers have been found to affect HCC prognosis, but these studies were scattered and there were no unified identifiers. Therefore, we built the database of prognostic biomarkers and models for hepatocellular carcinoma (dbPHCC). dbPHCC focuses on biomarkers that were related to HCC prognosis by traditional experiments rather than high-throughput technology. All of the prognostic biomarkers came from literature published between 2002 and 2014 in PubMed and were manually selected. dbPHCC collects comprehensive information on candidate biomarkers and HCC prognosis. dbPHCC provides a comprehensive and convenient search and analysis platform for HCC prognosis research. dbPHCC is the first database to focus on experimentally verified individual biomarkers that are related to HCC prognosis. Prognostic markers in dbPHCC have the potential to be therapeutic drug targets and may help in designing new treatments to improve survival of HCC patients. This article is part of a Special Issue entitled “System Genetics” Guest Editor: Dr. Yudong Cai and Dr. Tao Huang.
Keywords: Hepatocellular carcinoma; Prognostic biomarkers; Survival analysis; Database;

A heuristic model for working memory deficit in schizophrenia by Zhen Qi; Gina P. Yu; Felix Tretter; Oliver Pogarell; Anthony A. Grace; Eberhard O. Voit (2696-2705).
The life of schizophrenia patients is severely affected by deficits in working memory. In various brain regions, the reciprocal interactions between excitatory glutamatergic neurons and inhibitory GABAergic neurons are crucial. Other neurotransmitters, in particular dopamine, serotonin, acetylcholine, and norepinephrine, modulate the local balance between glutamate and GABA and therefore regulate the function of brain regions. Persistent alterations in the balances between the neurotransmitters can result in working memory deficits. Here we present a heuristic computational model that accounts for interactions among neurotransmitters across various brain regions. The model is based on the concept of a neurochemical interaction matrix at the biochemical level and combines this matrix with a mobile model representing physiological dynamic balances among neurotransmitter systems associated with working memory. The comparison of clinical and simulation results demonstrates that the model output is qualitatively very consistent with the available data. In addition, the model captured how perturbations migrated through different neurotransmitters and brain regions. Results showed that chronic administration of ketamine can cause a variety of imbalances, and application of an antagonist of the D2 receptor in PFC can also induce imbalances but in a very different manner. The heuristic computational model permits a variety of assessments of genetic, biochemical, and pharmacological perturbations and serves as an intuitive tool for explaining clinical and biological observations. The heuristic model is more intuitive than biophysically detailed models. It can serve as an important tool for interdisciplinary communication and even for psychiatric education of patients and relatives. This article is part of a Special Issue entitled “System Genetics” Guest Editor: Dr. Yudong Cai and Dr. Tao Huang.
Keywords: Interaction matrix; Mesoscopic model; Mobile; Neurotransmitter; Schizophrenia; Systems biology; Working memory deficit;

Deciphering hallmark processes of aging from interaction networks by Suchi Smita; Falko Lange; Olaf Wolkenhauer; Rüdiger Köhling (2706-2715).
Aging is broadly considered to be a dynamic process that accumulates unfavourable structural and functional changes in a time-dependent fashion, leading to a progressive loss of physiological integrity of an organism, which eventually leads to age-related diseases and finally to death. The majority of aging-related studies are based on reductionist approaches, focusing on single genes/proteins or on individual pathways without considering possible interactions between them. Over the last few decades, several such genes/proteins were independently analysed and linked to a role in the longevity of an organism. However, an isolated analysis of genes and proteins largely fails to provide mechanistic insight into a complex phenotype, because multiple factors are involved and integrated. Technological advances make it possible to generate high-throughput temporal and spatial data that provide an opportunity to use computer-based methods. These techniques allow us to go beyond reductionist approaches to analyse large-scale networks that provide deeper understanding of the processes that drive aging. In this review, we focus on systems biology approaches based on network inference methods to understand the dynamics of hallmark processes leading to aging phenotypes. We also describe computational methods for the interpretation and identification of important molecular hubs involved in the mechanistic linkage between aging-related processes. This article is part of a Special Issue entitled “System Genetics” Guest Editor: Dr. Yudong Cai and Dr. Tao Huang.
Keywords: Aging; Data integration; Network inference; Target hubs; Systems biology;

A network-based method for the identification of putative genes related to infertility by ShaoPeng Wang; GuoHua Huang; Qinghua Hu; Quan Zou (2716-2724).
Infertility has become one of the major health problems worldwide, with its incidence having risen markedly in recent decades. There is an urgent need to investigate the pathological mechanisms behind infertility and to design effective treatments. However, this is made difficult by the fact that various biological factors have been identified to be related to infertility, including genetic factors. A network-based method was established to identify new genes potentially related to infertility. A network constructed using human protein–protein interactions based on previously validated infertility-related genes enabled the identification of some novel candidate genes. These genes were then filtered by a permutation test and their functional and structural associations with infertility-related genes. Our method identified 23 novel genes, which have strong functional and structural associations with previously validated infertility-related genes. Substantial evidence indicates that the identified genes are strongly related to dysfunction of the four main biological processes of fertility: reproductive development and physiology, gametogenesis, meiosis and recombination, and hormone regulation. The newly discovered genes may provide new directions for investigating infertility. This article is part of a Special Issue entitled “System Genetics” Guest Editor: Dr. Yudong Cai and Dr. Tao Huang.
Keywords: Infertility; Protein–protein interaction; Shortest path; BLAST;

The use of Gene Ontology terms and KEGG pathways for analysis and prediction of oncogenes by Zhihao Xing; Chen Chu; Lei Chen; Xiangyin Kong (2725-2734).
Oncogenes are genes that have the potential to cause cancer. Most normal cells undergo programmed cell death, namely apoptosis, but activated oncogenes can help cells avoid apoptosis and survive. Thus, studying oncogenes is helpful for obtaining a good understanding of the formation and development of various types of cancers. In this study, we proposed a computational method, called OPM, for investigating oncogenes from the view of Gene Ontology (GO) and biological pathways. All investigated genes, including validated oncogenes retrieved from some public databases and other genes that have not been reported to be oncogenes thus far, were encoded into numeric vectors according to the enrichment theory of GO terms and KEGG pathways. Some popular feature selection methods (minimum redundancy maximum relevance and incremental feature selection) and an advanced machine learning algorithm, random forest, were adopted to analyze the numeric vectors to extract key GO terms and KEGG pathways. Along with the oncogenes, GO terms and KEGG pathways were discussed in terms of their relevance in this study. Some important GO terms and KEGG pathways were extracted using feature selection methods and were confirmed to be highly related to oncogenes. Additionally, the importance of these terms and pathways in predicting oncogenes was further demonstrated by finding new putative oncogenes based on them. This study investigated oncogenes based on GO terms and KEGG pathways. Some important GO terms and KEGG pathways were confirmed to be highly related to oncogenes. We hope that these GO terms and KEGG pathways can provide new insight for the study of oncogenes, particularly for building more effective prediction models to identify novel oncogenes. The program is available upon request. We hope that the new findings listed in this study may provide new insight for the investigation of oncogenes.
This article is part of a Special Issue entitled “System Genetics” Guest Editor: Dr. Yudong Cai and Dr. Tao Huang.
Keywords: Oncogenes; Gene Ontology; KEGG pathway; Minimum redundancy maximum relevance; Incremental feature selection; Random forest;
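
The minimum redundancy maximum relevance idea named in this abstract can be sketched with a greedy ranking over discrete features. This is a simplified illustration, not the authors' program: the difference form (relevance minus mean redundancy), the mutual-information estimator from raw counts, and the toy feature names are assumptions.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Mutual information (in bits) between two discrete sequences."""
    n = len(xs)
    count_x = Counter(xs)
    count_y = Counter(ys)
    count_xy = Counter(zip(xs, ys))
    return sum(
        (c / n) * log2((c / n) / ((count_x[x] / n) * (count_y[y] / n)))
        for (x, y), c in count_xy.items()
    )


def mrmr_rank(features, target):
    """Greedy mRMR ranking.

    At each step, pick the feature maximising
    relevance(feature, target) - mean redundancy(feature, already selected).
    `features` maps a feature name to its list of discrete values.
    """
    remaining = set(features)
    ranked = []

    def score(name):
        relevance = mutual_information(features[name], target)
        if not ranked:
            return relevance
        redundancy = sum(
            mutual_information(features[name], features[s]) for s in ranked
        ) / len(ranked)
        return relevance - redundancy

    while remaining:
        # sorted() makes tie-breaking deterministic (alphabetical order).
        best = max(sorted(remaining), key=score)
        ranked.append(best)
        remaining.remove(best)
    return ranked


# Hypothetical data: "a_copy" duplicates "a", so it is penalised as redundant.
target = [0, 0, 0, 0, 1, 1, 1, 1]
features = {
    "a": [0, 0, 0, 1, 1, 1, 1, 1],
    "a_copy": [0, 0, 0, 1, 1, 1, 1, 1],
    "b": [0, 1, 0, 1, 0, 1, 0, 1],
}
ranking = mrmr_rank(features, target)
```

An incremental feature selection step would then evaluate a classifier on the top 1, top 2, and so on features of this ranking and keep the prefix that performs best.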

Prediction and validation of association between microRNAs and diseases by multipath methods by Xiangxiang Zeng; Xuan Zhang; Yuanlu Liao; Linqiang Pan (2735-2739).
Deciphering the genetic basis of human diseases is an important goal in biomedical research. There is increasing evidence suggesting that microRNAs play critical roles in many key biological processes. The identification of microRNAs associated with disease is therefore very important for understanding the pathogenesis of diseases. Two multipath methods are introduced to predict the associations between microRNAs and diseases based on a microRNA-disease heterogeneous network. The first method, HeteSim_MultiPath (HSMP), uses the HeteSim measure to calculate the similarity between objects and combines the HeteSim scores of different paths with a constant that dampens the contributions of longer paths. The second one, HeteSim_SVM (HSSVM), uses the HeteSim measure and a machine learning method, instead of a constant, to combine the HeteSim scores. We use leave-one-out cross-validation to evaluate our novel methods and find that they perform better than other methods. We achieve an area under the ROC curve of 0.981 and 0.984, respectively. We also check the top 10 microRNA-disease associations ranked by similarity and find that our predictions are reasonable and credible. The encouraging results suggest that multipath methods can help identify novel microRNA-disease associations and guide biological experiments for scientific research. This article is part of a Special Issue entitled “System Genetics”. Guest Editor: Dr. Yudong Cai and Dr. Tao Huang.
Keywords: MicroRNA-disease association prediction; Link prediction; Multipath method; HeteSim;
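
The idea of combining per-path scores with a constant that dampens longer paths can be sketched as a weighted sum. The exact HSMP formula is not given in the abstract, so the weighting `beta ** length`, the value of `beta`, and the example scores below are assumptions for illustration only.

```python
def combine_path_scores(path_scores, beta=0.5):
    """Combine per-path similarity scores into one association score.

    `path_scores` is a list of (path_length, score) pairs. Each score is
    weighted by beta ** path_length, so with 0 < beta < 1 a longer path
    contributes less than a shorter path with the same score.
    """
    return sum(beta ** length * score for length, score in path_scores)


# Hypothetical scores for one microRNA-disease pair over two path types.
combined = combine_path_scores([(1, 0.8), (3, 0.4)], beta=0.5)
```

Here the length-1 path contributes 0.4 and the length-3 path only 0.05, which shows the damping effect. A learned combination, as described for HSSVM, would replace the fixed weights with weights fitted from known associations.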

Mining for genes related to choroidal neovascularization based on the shortest path algorithm and protein interaction information by Jian Zhang; Yan Suo; Yu-Hang Zhang; Qing Zhang; XiJia Chen; Xun Xu; WenCong Lu (2740-2749).
Background: Choroidal neovascularization (CNV) is a serious eye disease that may cause visual loss, especially for older people. Many factors have been proven to induce this disease, including age, gender, obesity, and so on. However, until now, we have had limited knowledge of CNV's pathogenic mechanism. Discovering the genes that underlie this disease and performing extensive studies on them can help us to understand how CNV occurs and design effective treatments. Methods: In this study, we designed a computational method to identify novel CNV-related genes in a large protein network constructed using the protein–protein interaction information in STRING. The candidate genes were first extracted from the shortest paths connecting any two known CNV-related genes and then filtered by a permutation test and using knowledge of their linkages to known CNV-related genes. A list of putative CNV-related candidate genes was obtained by our method. These genes are deemed to have strong relationships with CNV. Extensive analyses of several of the putative genes such as ANK1, ITGA4, CD44 and others indicate that they are related to specific biological processes involved in CNV, implying they may be novel CNV-related genes. General significance: The newfound putative CNV-related genes may provide new insights into CNV and help design more effective treatments. This article is part of a Special Issue entitled “System Genetics” Guest Editor: Dr. Yudong Cai and Dr. Tao Huang.
Keywords: Disease gene; Choroidal neovascularization; Protein–protein interaction; Protein sequence; BLAST; Dijkstra's algorithm;
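
The first step described in this abstract, extracting candidates from shortest paths between known disease genes, can be sketched with Dijkstra's algorithm. The gene names, edge weights, and helper functions below are hypothetical, and the sketch follows a single shortest path per pair; how the authors derived edge weights from STRING and handled ties is not stated here.

```python
import heapq
from itertools import combinations
from math import inf


def add_edge(graph, a, b, weight):
    """Insert an undirected weighted edge into an adjacency dict."""
    graph.setdefault(a, {})[b] = weight
    graph.setdefault(b, {})[a] = weight


def dijkstra_path(graph, source, target):
    """Return one shortest path from source to target, or None."""
    best = {source: 0.0}
    previous = {}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == target:
            break
        if d > best.get(u, inf):
            continue  # stale heap entry
        for v, w in graph.get(u, {}).items():
            nd = d + w
            if nd < best.get(v, inf):
                best[v] = nd
                previous[v] = u
                heapq.heappush(heap, (nd, v))
    if target not in best:
        return None
    path = [target]
    while path[-1] != source:
        path.append(previous[path[-1]])
    return path[::-1]


def candidate_genes(graph, known):
    """Interior nodes of shortest paths between every pair of known genes."""
    candidates = set()
    for a, b in combinations(sorted(known), 2):
        path = dijkstra_path(graph, a, b)
        if path:
            candidates.update(path[1:-1])
    return candidates - set(known)


# Hypothetical network: K1 and K2 are known genes, X lies on the cheap route.
graph = {}
add_edge(graph, "K1", "X", 1.0)
add_edge(graph, "X", "K2", 1.0)
add_edge(graph, "K1", "Y", 5.0)
add_edge(graph, "Y", "K2", 5.0)
add_edge(graph, "K2", "Z", 1.0)
candidates = candidate_genes(graph, {"K1", "K2"})
```

A permutation test would then compare how often each candidate appears on such paths for the real gene set against randomly drawn gene sets of the same size.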

Classification of cancers based on copy number variation landscapes by Ning Zhang; Meng Wang; Peiwei Zhang; Tao Huang (2750-2755).
Genomic alterations in DNA can cause human cancer. DNA copy number variants (CNVs), as one type of DNA mutation, have been considered to be associated with various human cancers. CNVs vary in size from 1 bp up to one complete chromosome arm. To understand how human cancers differ in their CNVs, in this study we developed a method to computationally classify six human cancer types by using only CNV level values. The CNVs of 23,082 genes were used as features to construct the classifier. The features were then carefully selected by the mRMR (minimum Redundancy Maximum Relevance Feature Selection) and IFS (Incremental Feature Selection) methods. An accuracy of over 0.75 was reached by using only the CNVs of 19 genes with the Dagging method in 10-fold cross-validation. This indicates that these 19 genes may play important roles in differentiating cancer types. We also analyzed the biological functions of several top genes within the 19-gene list. The statistical results and biological analysis of these genes from this work might further help understand different human cancer types and provide guidance for related validation experiments. This article is part of a Special Issue entitled “System Genetics” Guest Editor: Dr. Yudong Cai and Dr. Tao Huang.
Keywords: DNA copy number variants; Cancer type; Sequential minimal optimization; Feature selection; IFS;

Hepatitis is a type of infectious disease that induces inflammation of the liver; the term does not pinpoint a particular pathogen or pathogenesis. Type C hepatitis, as a type of hepatitis, has been reported to induce cirrhosis and hepatocellular carcinoma within a very short amount of time. It is a great threat to human health. Some studies have revealed that trace elements are associated with infection with and immune rejection against hepatitis C virus (HCV). However, the mechanism underlying this phenomenon is still unclear. In this study, we aimed to expand our knowledge of this phenomenon by designing a computational method to identify genes that may be related to both HCV and trace element metabolic processes. The searching procedure included three stages. First, a shortest path algorithm was applied to a large network, constructed from protein-protein interactions, to identify potential genes of interest. Second, a permutation test was executed to exclude false discoveries. Finally, some rules based on the betweenness and associations between candidate genes and HCV and trace elements were built to select core genes among the remaining genes. Twelve lists of genes, corresponding to 12 types of trace elements, were obtained. These genes are deemed to be associated with HCV infection and trace element metabolism. The analyses indicate that some genes may be related to both HCV and trace element metabolic processes, further confirming the associations between HCV and trace elements. The method was further tested on another set of HCV genes; the results indicate that this method is quite robust. The newly found genes may partially reveal unknown mechanisms between HCV infection and trace element metabolism. This article is part of a Special Issue entitled “System Genetics” Guest Editor: Dr. Yudong Cai and Dr. Tao Huang.
Keywords: HCV; Trace element; Shortest path algorithm; Protein-protein interaction; Permutation test;