The sequencing of the human genome has brought human genetics into a new era of study and generated an explosive growth of information. Application of genomic, proteomic, and bioinformatics technologies to the study of human genetics has made it possible for human genetic diseases to be studied on an unprecedented scale, both in silico and in the wet lab. This volume provides up-to-date coverage of the broad range of research topics in this fascinating area. In the first part of the book, a whole spectrum of approaches to human genetics research is reviewed for both background and the latest progress. In the second, important topics related to genetic research of various complex human diseases are discussed. The robust content and diverse array of subjects allow the book to serve as both a concise “encyclopedia” that introduces basic and essential concepts of human genetics and an in-depth review of the current understanding of genetic research in human diseases.
Sample Chapter(s)
Chapter 1: Regression-based Linkage Analysis Methods (162 KB)
https://doi.org/10.1142/9789812790811_fmatter
The following sections are included:
https://doi.org/10.1142/9789812790811_0001
Regression-based methods of model-free linkage analysis offer a valuable framework for mapping both quantitative and qualitative traits. Beginning with the method proposed by Haseman and Elston,1 these methods have been widely used in practice because of their simplicity and robustness. Furthermore, the newer methods can utilize full information from trait values, and they are applicable to any type of pedigree data. With the availability of the denser markers and appropriate sampling, these methods give hope that they may play an important role in mapping complex genetic traits. The information yielded by such an analysis can guide and facilitate the design and result inference of further association studies.
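As a rough illustration of the idea behind the original Haseman and Elston method, the sketch below regresses the squared trait difference of each sib pair on the estimated proportion of marker alleles the pair shares identical by descent (IBD); a negative slope is the pattern expected under linkage. The function name and the toy data are hypothetical, and the sketch omits the significance test and the refinements of the newer methods.

```python
from statistics import mean

def haseman_elston_slope(pairs):
    """Least-squares slope of squared sib-pair trait difference on IBD sharing.

    pairs: list of (trait_sib1, trait_sib2, ibd_proportion) tuples.
    """
    x = [ibd for _, _, ibd in pairs]
    y = [(t1 - t2) ** 2 for t1, t2, _ in pairs]
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return sxy / sxx

# Hypothetical toy data: pairs sharing more alleles IBD have more similar traits.
sib_pairs = [
    (10.0, 14.0, 0.0),
    (12.0, 9.0, 0.0),
    (11.0, 13.0, 0.5),
    (15.0, 14.0, 0.5),
    (10.0, 10.5, 1.0),
    (13.0, 12.0, 1.0),
]
slope = haseman_elston_slope(sib_pairs)
print(slope)  # negative for this toy data set
```

In practice the slope would be tested against zero with a one-sided test, which this sketch leaves out.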
https://doi.org/10.1142/9789812790811_0002
Linkage disequilibrium arising from the recent admixture of genetically distinct populations can be used to map genes influencing complex diseases. Mapping genes based on locus ancestral information is called admixture mapping. Admixture mapping requires far fewer markers than association studies, and is less affected by allelic heterogeneity. Theoretical work demonstrates that admixture mapping can be more powerful, with a higher mapping resolution, than traditional linkage studies. In this chapter, I will review the recent theoretical developments, the applications to localize genes influencing complex traits, and the technological challenges faced in using this method.
https://doi.org/10.1142/9789812790811_0003
Mapping genes for complex human diseases is a challenging problem because many such diseases are the result of both genetic and environmental risk factors. Many also exhibit phenotypic heterogeneity, such as variable age of onset. Information on the variable age of disease onset is often a good indicator for disease heterogeneity. The incorporation of such information together with environmental risk factors in genetic analysis should lead to more powerful tests. Because of the problem of censoring, survival analysis methods have proved to be very useful for genetic analysis. In this paper, I review some recent methodological developments on integrating modern survival analysis methods and human genetics in order to rigorously incorporate both the age of onset and the environmental covariate data into aggregation analysis, segregation analysis, linkage analysis, association analysis, and gene risk characterization. I also briefly discuss the issue of ascertainment correction and survival analysis methods for high-dimensional genomic data. Finally, I outline several areas that need further methodological developments.
https://doi.org/10.1142/9789812790811_0004
The following sections are included:
https://doi.org/10.1142/9789812790811_0005
Linkage disequilibrium (LD) plays a central role in association studies for identifying the genetic variation responsible for complex human diseases. Recent studies on LD patterns of the human genome using several large-scale genotype data have suggested that the human genome has a haplotype block structure: it can be decomposed into long regions with strong LD and relatively few haplotypes, separated by short regions of low LD. This observation has practical and important implications in association studies because a small fraction of single nucleotide polymorphisms (SNPs), referred to as “tag SNPs”, can be chosen in each block to map the genetic variation responsible for complex human diseases. The use of tag SNPs can significantly reduce genotyping effort without much loss of power. Therefore, it has recently drawn considerable attention and become a very active research field. Many methods have been developed, and new methods for tag SNP selection are continuously being developed. In this chapter, we review and discuss recent developments for haplotype block partitioning and tag SNP selection, and their applications in association studies. The aim of this chapter is not to enumerate and detail all available methods for haplotype block partitioning and tag SNP selection, but rather to focus on how to use the available methods, tools, and resources to facilitate tag SNP selection in association studies.
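To make the notion of LD between two SNPs concrete, the sketch below computes the commonly used pairwise measure r² from haplotype and allele frequencies. This is only one of several LD measures and is not necessarily the one used by the methods reviewed in the chapter; the function name and the example frequencies are hypothetical.

```python
def ld_r2(p_ab, p_a, p_b):
    """Pairwise LD measure r^2 for two biallelic SNPs.

    p_ab: frequency of the haplotype carrying allele A at SNP 1 and allele B at SNP 2
    p_a, p_b: frequencies of allele A and allele B (each strictly between 0 and 1)
    """
    d = p_ab - p_a * p_b  # disequilibrium coefficient D
    return d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))

# Hypothetical frequencies: alleles A and B always occur together.
r2_complete = ld_r2(0.5, 0.5, 0.5)
# Hypothetical frequencies: alleles A and B occur independently.
r2_none = ld_r2(0.25, 0.5, 0.5)
```

When r² between two SNPs is close to 1, genotyping one of them captures most of the information in the other, which is the intuition behind choosing a small set of tag SNPs.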
https://doi.org/10.1142/9789812790811_0006
Haplotype, the combination of marker alleles on the same chromosome that were inherited as a unit from one parent, plays a very important role in the study of the genetic basis of diseases and population genetics. Because haplotypes may carry more information than single markers, a variety of statistical methods, such as those for assessing haplotype–disease associations, have been proposed for studying haplotypes. With unambiguous haplotype information, haplotypes may be treated as alleles of a single locus, and many existing methods for single markers can be used in analysis. Because obtaining haplotype information directly from experiments can be costly, haplotype analyses present many unique challenges and opportunities for human geneticists. In this chapter, we focus on two issues: haplotype inference and haplotype association analysis. We first review several methods for haplotype inference, including Clark's algorithm, methods based on the expectation-maximization algorithm, Bayesian methods, and others. In the context of haplotype association analysis, we cover a number of statistical methods. In addition, we discuss the advantages and limitations of the different methods.
https://doi.org/10.1142/9789812790811_0007
Two grand challenges in the postgenomic era are the development of a detailed understanding of heritable variation in the human genome, and robust strategies for identifying genetic contribution to diseases and drug responses.1 Haplotypes of single nucleotide polymorphisms (SNPs) have been suggested as an effective representation of human variation, and various haplotype-based association mapping methods for complex traits have been proposed in the literature. However, humans are diploid and, in practice, genotype data instead of haplotype data are collected directly. Therefore, efficient and accurate computational methods for haplotype reconstruction are needed and have recently been investigated intensively, especially for tightly linked markers such as SNPs. This paper reviews statistical and combinatorial haplotyping algorithms using pedigree data, unrelated individuals, or pooled samples.
https://doi.org/10.1142/9789812790811_0008
DNA pooling is a cost-effective approach for estimating genetic marker allele frequencies. It is often advocated as a screening tool to identify candidate markers, followed by individual genotyping. This paper reviews recent developments in DNA pooling methods for association studies. Various DNA pooling designs and association tests using pooled DNA samples are introduced. Guidelines are provided on using this strategy, and open problems warranting further research are also discussed.
https://doi.org/10.1142/9789812790811_0009
The following sections are included:
https://doi.org/10.1142/9789812790811_0010
Although genetic association studies using unrelated individuals may be subject to bias caused by population stratification, alternative family-based association methods that are robust to population stratification may be less powerful. This chapter discusses methods that use unrelated individuals to identify associations between candidate markers and traits of interest (both qualitative and quantitative), while controlling for population stratification through a set of genomic markers of the same individual. These methods can control for population stratification and are more powerful than family-based methods. We first introduce association tests based on population samples in a homogeneous population, and discuss why population stratification can cause false-positive results in association studies. Then, we review established methods, which include the genomic control, structured association, and semiparametric approaches, for controlling false-positive results by using a set of unlinked markers of the same individual instead of using family members in family-based association studies. Finally, we discuss some possible extensions of the existing methods and some topics that need further investigation.
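A minimal sketch of the genomic control idea, as it is commonly described, follows: the inflation of 1-df chi-square association statistics at unlinked null markers is summarized by a factor λ (their median divided by the median of a 1-df chi-square distribution, roughly 0.455), and the statistic at the candidate marker is divided by λ. The function name and the toy statistics are hypothetical, and the chapter may present variants of this correction.

```python
from statistics import median

# Approximate median of a chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549

def genomic_control(null_stats, candidate_stat):
    """Return (lambda, adjusted statistic) for a candidate-marker chi-square test.

    null_stats: 1-df chi-square statistics at unlinked (null) markers.
    Lambda is bounded below by 1 so the correction never inflates the statistic.
    """
    lam = max(1.0, median(null_stats) / CHI2_1DF_MEDIAN)
    return lam, candidate_stat / lam

# Hypothetical statistics from five null markers and one candidate marker.
lam, adjusted = genomic_control([0.2, 0.5, 0.91, 1.3, 2.0], 8.0)
print(lam, adjusted)
```

The adjusted statistic is then compared with the usual 1-df chi-square reference distribution.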
https://doi.org/10.1142/9789812790811_0011
Many complex diseases, such as obesity, diabetes, hypertension, and cancer, are determined by multiple genetic and environmental factors. The identification of quantitative trait loci (QTLs) is critical for understanding the biochemical bases of complex diseases, and thus for the identification of drug targets. Animal models have proved to be powerful in elucidating the genetic architectures and etiologies of common human diseases. The ability to control both genotype and environment in inbred populations of animals greatly simplifies the analysis of a complex genetic architecture. This chapter reviews the Bayesian statistical methods and computer software for mapping multiple QTLs in experimental crosses, and comments on several of the statistical issues to consider in the application of these methods.
https://doi.org/10.1142/9789812790811_0012
The following sections are included:
https://doi.org/10.1142/9789812790811_0013
Gene expression profiling using microarray technology has become an important genetic tool in the study of complex diseases. However, this technology is prone to noise and the accuracy of its measurements is often in question. Here, we describe the technological and analytical advancements that we have made in microarrays to overcome this problem. We have extended the conventional dual-color spotted microarray technology to a novel three-color microarray platform, where the probes on printed arrays are labeled with a third dye for quality control (QC) in array fabrication. We have also developed a microarray image analysis package called Matarray, which achieves quantitative QC of data acquisition through the definition of a set of quality scores. With these advances, a better dissection of the sources of data variability and more efficient QC are achieved. In addition, our data QC approach has led to a new weighted statistical procedure for evaluating the significance of microarray findings that can more sensitively detect changes in gene expression. This resolves the missing value problem that has often plagued microarray data analysis. Finally and most importantly, we show that with our analytical and technical advancements, where a comprehensive and efficient QC procedure is in place, accurate gene expression measurements comparable in quality to those by quantitative reverse transcription–polymerase chain reaction (RT-PCR) can be achieved with microarrays fabricated in academic laboratories.
https://doi.org/10.1142/9789812790811_0014
DNA microarrays are a unique cost-effective method for simultaneously assessing the expression levels of thousands of genes. Microarrays can be fabricated by robotic spotting of gene-specific cDNAs or oligonucleotides and by in situ synthesis of oligonucleotides. More recent approaches include piezoelectric inkjets for noncontact printing and maskless light-directed synthesis of oligonucleotides. Although Affymetrix GeneChip arrays occupy the majority of the microarray marketplace, their drawbacks of less flexibility and high cost leave much room for the development of robotic spotting and other novel technologies. Here, we review recent technological advances in spotted arrays, in situ synthesized arrays, and their applications in genetics and other life science areas.
https://doi.org/10.1142/9789812790811_0015
With the recent advances in genomewide expression microarray technology, combining the power of gene expression profiling and genetics is a natural step forward. Jansen and Nap5 first formally proposed a new research area termed “genetical genomics”, which describes the combined study of expression variations and DNA variations in segregating populations. The gene expression levels (i.e. mRNA transcript abundance) are treated as quantitative traits potentially affected by multiple genes and other factors. Traditional methods for detecting quantitative trait loci (QTLs) could be utilized to detect chromosomal regions affecting expression levels; these regions are referred to as expression quantitative trait loci (eQTLs). Recent studies have demonstrated the utility of this approach in unraveling many features of the genetic basis of variation in gene expression. Despite its great potential, there are many limitations to the current eQTL methods that demand statistical and computational novelties. Some of the issues are inherited from the traditional multiple-trait methods in QTL analysis and microarray technology. More importantly, the unique challenge is the joint consideration of tens of thousands of correlated phenotypes (i.e. transcription levels) with hundreds or thousands of genotypes. In this context, the issue of multiple testing needs to be better addressed not only to control the overall rate of false discoveries, but also to fully take advantage of the correlated expression patterns. In this chapter, we will summarize statistical methods that have been applied to eQTL studies, review the knowledge and patterns emerging from these studies, and discuss future research directions.
https://doi.org/10.1142/9789812790811_0016
High-throughput molecular profiling technologies such as microarrays have recently been used in the genetic analysis of complex traits. Genetic loci influencing gene expression can be identified using quantitative trait locus (QTL) mapping. These genetic loci may also influence higher-order traits. The coregulation of gene expression traits and higher-order traits indicates that some of the gene expression traits may be involved in the causal pathways connecting the genetic loci and higher-order traits. Strong associations between the modules of gene expression traits, higher-order traits, and the genetic loci regulating them have been found, indicating that genetic loci may influence higher-order traits by affecting gene expression traits. In this chapter, I discuss the recent trend in combining genetic and genomic methods to analyze complex traits, and introduce a computational approach to studying the causal pathways from DNA to complex traits.
https://doi.org/10.1142/9789812790811_0017
Proteomics, viewed as postgenomics, is the large-scale study of proteins, especially their structures and functions. It is much more complicated than genomics: whereas the genome is a rather constant entity, the proteome differs from cell to cell and is constantly changing through its biochemical interactions with the genome and the environment. Over the last several years, the emphasis has shifted from genomics to proteomics, since transcriptional regulation is often difficult to reconcile with protein abundance and the transcriptome correlates poorly with the proteome in a cell.1 Interest has grown in using proteomics to analyze protein expression directly at the posttranslational level because it permits the qualitative and quantitative assessment of a broad spectrum of proteins that can be related to specific cellular responses, including the response to oxidative stress.2–4 Moreover, identified protein markers can offer more effective validation protocols (e.g. tissue microarray). Analysis of the proteome provides a picture of biological processes at the level at which they occur, allowing the comparison of physiological and pathological states of a cell line or tissue.
https://doi.org/10.1142/9789812790811_0018
Biomedical informatics can be defined as a multidisciplinary subject applying information-based, i.e. computational and statistical, technologies to clinical, genomic, and proteomic studies. One of its major components involves healthcare informatics, as it focuses on advancing the practice of medicine. The other component focuses on basic and applied research, integrating clinical perspectives into genomic and proteomic studies to address issues of clinical significance. Biomedical informatics is a comprehensive subject in the study of human disease, encompassing a cycle from bedside to bench and then from bench to bedside. This chapter will give an overview of human genetic studies in the genomic era from the biomedical informatics perspective. Included topics are carefully selected to minimize overlap with other chapters, while at the same time presenting a relatively comprehensive picture. It is expected that, after reading this chapter and others on bioinformatics in this book, readers will have a deeper understanding of the importance of bioinformatics and biomedical informatics in the study of human genetics as well as the differences in their approach. Conscientiously keeping up with and open-mindedly adopting new technologies developed in bioinformatics and biomedical informatics has proven invaluable to modern human genetic studies.
https://doi.org/10.1142/9789812790811_0019
The publication of the draft human genome consisting of 30 000 genes is merely the beginning of genome biology. A new way to understand the complexity and richness of the molecular and cellular functions of proteins in human biology is through an understanding of large collections of protein–protein interactions and the network that they form. In this chapter, we first introduce various concepts involved in the study of protein–protein interaction data. Then, we introduce different experimental methods that can be used to measure protein interactions. Next, we review public data sources that provide protein interaction data and compare their coverage. We also review the computational tools and methods used to predict, validate, and interpret protein interaction data. Finally, we present our perspectives on how systems-scale protein interaction and network analysis represent an essential step towards understanding human biology and identifying novel targets for drug development.
https://doi.org/10.1142/9789812790811_0020
The following sections are included:
https://doi.org/10.1142/9789812790811_0021
Diseases related to abnormal RNA metabolism are an understudied field. RNA plays a central role in translating genetic information into proteins, and in many other catalytic and regulatory tasks. Recent advances in the study of RNA metabolism revealed complicated pathways for the generation and maintenance of functional RNA. Defects in RNA are detrimental to cells and cause diseases. This chapter aims to summarize the current understanding of various aspects of RNA metabolism, and to highlight recent findings of diseases related to defects in RNA processing and quality control.
https://doi.org/10.1142/9789812790811_0022
Osteoporosis is a complex multifactorial disease, determined by genetic and environmental factors as well as their interactions. Extensive efforts have been made to identify the genetic determinants of osteoporosis. Studies using multiple approaches, including linkage analysis, association analysis, functional genomic analysis, as well as genetic and functional studies in animal models, have greatly advanced our understanding of the genetic basis of osteoporosis. This review summarizes recently published important and representative molecular genetics studies of gene identification for osteoporosis. Evidence from candidate gene association studies and genomewide linkage studies in humans, and from quantitative trait locus (QTL) mapping in animal models, is reviewed separately. We also summarize the recent progress of functional genomic studies (including DNA microarrays and proteomics) on osteogenesis and osteoporosis in light of the rapid advances and promising prospects of this field.
https://doi.org/10.1142/9789812790811_0023
The following sections are included:
https://doi.org/10.1142/9789812790811_0024
Type 1 diabetes (T1D) is an autoimmune disorder characterized by specific destruction of the insulin-secreting beta cells of the pancreatic islets. It is believed that susceptibility to T1D is determined by the interactions of multiple genes with unknown environmental factors. Because the onset of the autoimmune process occurs many years before the onset of clinical diabetes, it is difficult to ascertain the nature of possible environmental triggers. Therefore, for the past two decades, a great deal of research has been focused on identifying T1D susceptibility genes. To date, only two susceptibility loci, the HLA region on chromosome 6p21 (IDDM1) and the insulin gene (INS) on chromosome 11p15 (IDDM2), have been well characterized. These two loci only contribute a portion of the familial clustering (∼40% for IDDM1 and ∼10% for INS), suggesting that other susceptibility loci must exist. The next confirmed locus for T1D was the PTPN22 gene, which contributes to multiple autoimmune disorders with a relative risk (RR) of 1.67–2.3. Recent studies have also provided evidence for the existence of two additional susceptibility genes, the SUMO4 gene on chromosome 6q25 (IDDM5) and the CTLA4 gene on chromosome 2q33 (IDDM12). In addition, extensive association studies and linkage analyses using various analytical methods have suggested a large number of putative genomic intervals that may contribute genetic susceptibility to T1D. This chapter summarizes the current state of genetic linkage and association studies in T1D, and discusses challenges and strategies for future studies.
https://doi.org/10.1142/9789812790811_0025
Obesity is a condition of excess body fat that causes or exacerbates several major public health problems. There has been considerable success in elucidating the molecular basis of monogenic forms of obesity in both rodents and humans. The most common form of obesity, however, is considered to be a polygenic disorder arising from the interaction of multiple genetic and environmental factors. The identification and characterization of susceptibility genes for obesity will contribute to a greater understanding of the pathogenesis of obesity, and will ultimately help in developing better strategies for prevention and therapeutic intervention. In this review article, we provide an overview of the recent findings in the genetic dissection of obesity. We also consider emerging strategies for gene identification. Some major problems are addressed, concluding with a brief discussion on the future perspective of research in the genetics of obesity.
https://doi.org/10.1142/9789812790811_0026
The following sections are included:
https://doi.org/10.1142/9789812790811_0027
Breast cancer is a major disease affecting women in industrialized countries, for whom the lifetime risk exceeds 10%. In the US, breast cancer affects one in eight women.1 Ovarian cancer is substantially less common than breast cancer, with a lifetime risk of 1.6% for women in industrialized countries2; however, it is one of the most lethal gynecological malignancies. It is estimated that 5%–10% of breast and invasive ovarian cancer cases are hereditary and attributable to mutations in several highly penetrant susceptibility genes.3,4 Inherited susceptibility has already been recognized as a significant risk factor for cancers of the breast and female genital organs.
https://doi.org/10.1142/9789812790811_0028
The Wnt signaling pathway regulates cell growth and differentiation, and plays an important role in embryonic development. Aberrant activation of Wnt signaling results in the transactivation of β-catenin/T-cell factor/lymphoid enhancer factor 1, which leads to downstream gene transcription and causes tumor formation and development. As a key element in the Wnt pathway, β-catenin can be upregulated in its activity by stabilizing its protein expression via the inhibition of glycogen synthase kinase 3β or via the mutation of Wnt pathway components such as β-catenin, adenomatous polyposis coli, and Axin. β-catenin activity can also be enhanced by increasing its nuclear translocation without altering its protein turnover.
https://doi.org/10.1142/9789812790811_0029
The contribution of major histocompatibility complex (MHC) genes to autoimmune diseases has been well demonstrated. However, less is known regarding the roles of non-MHC susceptibility genes in these diseases. Nonetheless, recent studies have not only suggested their critical roles in pathogenesis, but have also identified several characteristics of their influence on the diseases. First, because multiple genes participate in the pathogenesis of an autoimmune disease, only a statistically weak linkage to any particular gene in a disease could be detected. Second, because these genes exhibit complicated epistatic interactions, expression of a disease phenotype may only be correlated with susceptibility genes using the “liability threshold” model. Third, it remains unclear which factors, genetic or environmental, play a more dominant role in pathogenesis. These characteristics continue to make mapping susceptibility genes in autoimmune diseases very challenging. Although only a few disease-associated genes have been identified thus far, thanks to the Human Genome Project and other new technologies, the pace of gene discovery is accelerating.
https://doi.org/10.1142/9789812790811_0030
The following sections are included:
https://doi.org/10.1142/9789812790811_0031
Metabolic syndrome (MS) is a common disorder that affects 25% of the American population. MS is characterized by obesity, hyperglycemia, hypertriglyceridemia, low HDL cholesterol levels, and hypertension. In general, MS is the result of insulin resistance and is linked to the pathogenesis of type 2 diabetes and cardiovascular disease (CVD). Genetic factors play important roles in the pathogenesis of MS and its endophenotypes. Numerous studies have been carried out to dissect the genetics of MS, including hundreds of genome scans, animal quantitative trait loci (QTLs), and association studies. As with other complex traits, gene–gene and gene–environment interactions, low penetrance, and genetic heterogeneity have made the hunt for MS genes very challenging; on the other hand, factor analyses, replicated linkage results, and high-throughput association studies have yielded promising results. In this chapter, we integrate pieces of the genetics of MS; technical issues such as linkage disequilibrium (LD) mapping and the parent-of-origin effect are also discussed.
https://doi.org/10.1142/9789812790811_0032
The following sections are included:
https://doi.org/10.1142/9789812790811_bmatter
The following sections are included: