Ebook: Data Mining in Bioinformatics
- Tags: Database Management, Programming Techniques, Information Systems Applications (incl. Internet), Data Structures, Data Storage Representation, Bioinformatics
- Series: Advanced Information and Knowledge Processing
- Year: 2005
- Publisher: Springer-Verlag London
- Edition: 1
- Language: English
- Format: pdf
8.1.1 Protein Subcellular Location
The life sciences have entered the post-genome era where the focus of biological research has shifted from genome sequences to protein functionality. With whole-genome drafts of mouse and human in hand, scientists are putting more and more effort into obtaining information about the entire proteome in a given cell type. The properties of a protein include its amino acid sequences, its expression levels under various developmental stages and in different tissues, its 3D structure and active sites, its functional and structural binding partners, and its subcellular location. Protein subcellular location is important for understanding protein function inside the cell. For example, the observation that the product of a gene is localized in mitochondria will support the hypothesis that this protein or gene is involved in energy metabolism. Proteins localized in the cytoskeleton are probably involved in intracellular trafficking and support. The context of protein functionality is well represented by protein subcellular location.
Proteins have various subcellular location patterns [250]. One major category of proteins is synthesized on free ribosomes in the cytoplasm. Soluble proteins remain in the cytoplasm after their synthesis and function as small factories catalyzing cellular metabolites. Other proteins that have a target signal in their sequences are directed to their target organelle (such as mitochondria) via posttranslational transport through the organelle membrane. Nuclear proteins are transferred through pores on the nuclear envelope to the nucleus and mostly function as regulators. The second major category of proteins is synthesized on endoplasmic reticulum (ER)-associated ribosomes and passes through the reticuloendothelial system, consisting of the ER and the Golgi apparatus.
The goal of this book is to help readers understand state-of-the-art techniques in biological data mining and data management. It covers topics such as:
- preprocessing tasks such as data cleaning and data integration as applied to biological data
- classification and clustering techniques for microarrays
- comparison of RNA structures based on string properties and energetics
- discovery of the sequence characteristics of different parts of the genome
- mining of haplotypes to find disease markers
- sequencing of events leading to the folding of a protein
- inference of the subcellular location of protein activity
- classification of chemical compounds based on structure
- special purpose metrics and index structures for phylogenetic applications
- a new query language for protein searching based on the shape of proteins
- very fast indexing schemes for sequences and pathways
The book is aimed at computer scientists, and the necessary biology is explained.
Content:
- Front Matter (pages i-xi)
- Introduction to Data Mining in Bioinformatics (pages 3-8)
- Survey of Biodata Analysis from a Data Mining Perspective (pages 9-39)
- AntiClustAl: Multiple Sequence Alignment by Antipole Clustering (pages 43-57)
- RNA Structure Comparison and Alignment (pages 59-81)
- Piecewise Constant Modeling of Sequential Data Using Reversible Jump Markov Chain Monte Carlo (pages 85-103)
- Gene Mapping by Pattern Discovery (pages 105-126)
- Predicting Protein Folding Pathways (pages 127-141)
- Data Mining Methods for a Systematics of Protein Subcellular Location (pages 143-187)
- Mining Chemical Compounds (pages 189-215)
- Phyloinformatics: Toward a Phylogenetic Database (pages 219-241)
- Declarative and Efficient Querying on Protein Secondary Structures (pages 243-273)
- Scalable Index Structures for Biological Data (pages 275-296)
- Back Matter (pages 297-340)