
Conserved Domains Database (CDD) and Resources
Conserved Domains and Protein Classification

CDD is a protein annotation resource that consists of a collection of well-annotated multiple sequence alignment models for ancient domains and full-length proteins. These are available as position-specific scoring matrices (PSSMs) for fast identification of conserved domains in protein sequences via RPS-BLAST. CDD content includes NCBI-curated domains, which use 3D-structure information to explicitly define domain boundaries and provide insights into sequence/structure/function relationships, as well as domain models imported from a number of external source databases (Pfam, SMART, COG, PRK, TIGRFAMs).

CD-Search is NCBI's interface to searching the Conserved Domain Database with protein or nucleotide query sequences. It uses RPS-BLAST, a variant of PSI-BLAST, to quickly scan a set of pre-calculated PSSMs with a protein query. The results of CD-Search are presented as an annotation of protein domains on the user query sequence, and can be visualized as domain multiple sequence alignments with embedded user queries. High confidence associations between a query sequence and conserved domains are shown as specific hits. The CD-Search Help provides additional details, including information about running CD-Search locally.

Batch CD-Search serves as both a web application and a script interface for a conserved domain search on multiple protein sequences, accepting up to 4,000 proteins in a single job. It enables you to view a graphical display of the concise or full search result for any individual protein from your input list, or to download the results for the complete set of proteins. The Batch CD-Search Help provides additional details.

Subfamily Protein Architecture Labeling Engine (SPARCLE) is a resource for the functional characterization and labeling of protein sequences that have been grouped by their characteristic domain architecture. To use SPARCLE, you can either: (1) enter a query protein sequence into CD-Search, which will display a "Protein Classification" on the results page if the query protein has a hit to a curated domain architecture in the SPARCLE database (for example, using NP_387887 as the query sequence), or (2) search the SPARCLE database by keyword to retrieve domain architectures that contain the term(s) of interest in their descriptions (for example, searching for the words "chloride" and "channel" in the domain architecture record, and limiting the results to curated architectures). With either approach, the corresponding SPARCLE record(s) will display the name and functional label of the architecture, supporting evidence, and links to other proteins with the same architecture.

Conserved Domain Architecture Retrieval Tool (CDART) performs similarity searches of the Entrez Protein database based on domain architecture, defined as the sequential order of conserved domains in protein queries.
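
The idea of a domain architecture, described in this section as the sequential order of conserved domains in a protein, can be illustrated with a small Python sketch. It is not how SPARCLE or CDART are implemented. The hit tuples, protein IDs, domain names, and the helper functions `domain_architecture` and `group_by_architecture` are all hypothetical and chosen only for illustration. The sketch orders each protein's domain hits by start position and then groups proteins that share the same ordered list of domains.

```python
from collections import defaultdict

# Hypothetical domain hits: (protein ID, domain name, start, end).
# These values are invented for illustration; they are not real CD-Search output.
HITS = [
    ("protA", "kinase", 120, 380),
    ("protA", "SH2", 10, 100),
    ("protB", "SH2", 5, 95),
    ("protB", "kinase", 110, 370),
    ("protC", "kinase", 20, 280),
]


def domain_architecture(start_and_name):
    """Return domain names ordered by start position (N- to C-terminus)."""
    ordered = sorted(start_and_name, key=lambda pair: pair[0])
    return tuple(name for _start, name in ordered)


def group_by_architecture(hits):
    """Map each architecture (tuple of domain names) to the proteins that have it."""
    per_protein = defaultdict(list)
    for protein, domain, start, _end in hits:
        per_protein[protein].append((start, domain))

    groups = defaultdict(list)
    for protein, domains in per_protein.items():
        groups[domain_architecture(domains)].append(protein)
    return dict(groups)


groups = group_by_architecture(HITS)
for architecture, proteins in groups.items():
    print(" - ".join(architecture), "->", sorted(proteins))
```

In this toy data set, `protA` and `protB` share the same architecture even though their hits were listed in different orders, while `protC` forms its own group. A real comparison would also need to handle overlapping hits and repeated domains, which this sketch ignores.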
