Uniclust Downloads

The following gzipped tar files are available for download:

uniclust##_yyyy_mm.tar.gz

This archive contains the following files, where ## is the target sequence identity of the clustering and yyyy_mm is the corresponding UniProt release. It will be updated every two months.

uniclust##_yyyy_mm_seed.fasta:

Representative (=seed) sequences of every cluster in FASTA format.

uniclust##_yyyy_mm_consensus.fasta:

Consensus sequences of every cluster in FASTA format. The sequence header starts with the Uniclust cluster identifier uc##-yymm-<number>, followed by the UniProt accession code of the representative sequence, the size of the cluster, up to five of the best functional annotations from cluster members, and the UniProt identifiers of all cluster members.

uniclust##_yyyy_mm_cluster_mapping.tsv:

Tab-separated list with two columns of UniProt accession codes: the first column holds the representative sequence of the cluster and the second a member sequence of that cluster.
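
As an illustration of how the files in this archive could be read, here is a minimal Python sketch. It assumes plain uncompressed text input after extraction; the function names read_fasta and read_cluster_mapping are hypothetical and not part of any Uniclust tooling, and the accession codes in the usage note are placeholders.

```python
from collections import defaultdict


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.rstrip("\r\n")
        if line.startswith(">"):
            # A new record starts: emit the previous one, if any.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif line and header is not None:
            chunks.append(line.strip())
    if header is not None:
        yield header, "".join(chunks)


def read_cluster_mapping(lines):
    """Group member accessions by representative accession.

    Each non-empty line is expected to hold two tab-separated columns:
    representative accession, then member accession.
    """
    clusters = defaultdict(list)
    for line in lines:
        line = line.rstrip("\r\n")
        if not line:
            continue
        representative, member = line.split("\t")[:2]
        clusters[representative].append(member)
    return dict(clusters)
```

Both functions accept any iterable of lines, so an open file handle can be passed directly, for example read_cluster_mapping(open(path)) with path pointing at an extracted cluster mapping file.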

uniclust_yyyy_mm_annotation.tar.gz

Archive containing three files with Pfam, SCOP and PDB annotations, each formatted as a tab-separated list with nine columns: (1, 2) identifiers of query and target; (3-5, 6-8) domain start position, domain end position and total sequence length for the UniProt sequence and the database sequence, respectively; (9) HHblits E-value.
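
The nine-column layout can be parsed along the following lines. This is a sketch only: the field names are hypothetical, and it assumes that the query is the UniProt sequence (columns 3-5) and the target is the database sequence (columns 6-8). The identifiers in the usage note are placeholders.

```python
from typing import NamedTuple


class AnnotationHit(NamedTuple):
    query_id: str        # column 1 (assumed: UniProt side)
    target_id: str       # column 2 (assumed: Pfam/SCOP/PDB side)
    query_start: int     # column 3
    query_end: int       # column 4
    query_length: int    # column 5
    target_start: int    # column 6
    target_end: int      # column 7
    target_length: int   # column 8
    evalue: float        # column 9, HHblits E-value


def parse_annotation_line(line):
    """Parse one tab-separated annotation line into an AnnotationHit."""
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) != 9:
        raise ValueError(f"expected 9 tab-separated columns, got {len(fields)}")
    positions = [int(value) for value in fields[2:8]]
    return AnnotationHit(fields[0], fields[1], *positions, evalue=float(fields[8]))
```

A line such as "P12345<TAB>PF00001<TAB>10<TAB>120<TAB>350<TAB>1<TAB>110<TAB>115<TAB>1.5e-20" would then be available as named fields, which makes it straightforward to filter hits by E-value.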

uniboost##_yyyy_mm.tar.gz

Uniboost database files in compressed A3M alignment format, with additional support files for HH-suite version 3.

uniclust30_yyyy_mm_hhsuite.tar.gz

Archive containing Uniclust multiple sequence alignments for all clusters in A3M format, generated with Clustal Omega, and additional support files for use with the legacy HH-suite version 2 and the current version 3.
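
Any of the gzipped tar archives listed above can be inspected before extraction with Python's standard tarfile module. The helper below is a minimal sketch; the function name list_archive_members is hypothetical, and the archive path is whatever file was actually downloaded.

```python
import tarfile


def list_archive_members(archive_path):
    """Return the names of all regular files in a gzipped tar archive."""
    with tarfile.open(archive_path, mode="r:gz") as archive:
        return [member.name for member in archive.getmembers() if member.isfile()]
```

Listing the members first shows which of the files described on this page a given archive contains, without unpacking it.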