About us
The Orengo group at University College London has developed and maintained the CATH database since the mid-1990s [Orengo et al., 1997]. The CATH database (https://www.cathdb.info) is one of ELIXIR’s Core Data Resources. It provides an up-to-date, systematic, structure-based classification of protein 3D structures in the PDB. CATH classifies protein domains into the following hierarchical levels: Class (C), Architecture (A), Topology (T) and Homologous Superfamily (H) [Sillitoe et al., 2018]. CATH also classifies domain sequences predicted to belong to CATH superfamilies. An additional layer of classification, Functional Families (FunFams) within Homologous Superfamilies, was introduced in 2012 [Lee et al., 2012; Sillitoe et al., 2013; Das et al., 2015].
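To make the four-level hierarchy concrete, here is a minimal sketch of how a superfamily identifier can be split into its C, A, T and H levels. It assumes the conventional dotted numeric form of a CATH identifier (for example `3.40.50.300`); the names `CathId`, `parse_cath_id` and `hierarchy_levels` are hypothetical helpers written for this illustration and are not part of any CATH software.

```python
from typing import List, NamedTuple


class CathId(NamedTuple):
    """One number per level of the hierarchy (hypothetical helper type)."""
    class_: int
    architecture: int
    topology: int
    superfamily: int


def parse_cath_id(cath_id: str) -> CathId:
    """Split a dotted superfamily identifier into its four levels."""
    parts = cath_id.split(".")
    if len(parts) != 4:
        raise ValueError(f"expected four dot-separated numbers, got {cath_id!r}")
    return CathId(*(int(part) for part in parts))


def hierarchy_levels(cath_id: str) -> List[str]:
    """Return the identifier of each ancestor level, from Class down to Superfamily."""
    parts = cath_id.split(".")
    return [".".join(parts[: depth + 1]) for depth in range(len(parts))]


example = parse_cath_id("3.40.50.300")
levels = hierarchy_levels("3.40.50.300")
```

Here `levels` lists the Class, Architecture, Topology and Homologous Superfamily identifiers in order, so two domains share a level whenever the corresponding prefixes match.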
We now introduce the Functional Variation (FunVar) platform (https://funvar.cathdb.info), which exploits the FunFams and structural data in CATH. FunVar has been designed to facilitate analysis of population or pathogenic variants in human proteins, or in pathogen and host genes. Variants (specifically non-synonymous polymorphisms, i.e. residue mutations) are mapped to protein structures, where available, to allow assessment of their proximity to functional sites and therefore their possible impact on protein function. Variant data is obtained from publicly available sources. We hope that visualising these mutations on the protein structure, and so illustrating their proximity to functional sites, can help guide diagnostics and therapeutics. Future editions of these pages will add quantitative data suggesting the likelihood of functional impact.
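As a rough illustration of what "proximity to functional sites" can mean in practice, the sketch below measures the shortest distance between a variant residue and a set of functional-site residues. It is not FunVar's actual method: the function names, the use of one C-alpha coordinate per residue, the toy coordinates and the 8 Å cutoff are all assumptions made for this example.

```python
import math
from typing import Dict, Iterable, Tuple

Coord = Tuple[float, float, float]


def min_distance_to_site(
    ca_coords: Dict[int, Coord],
    variant_residue: int,
    site_residues: Iterable[int],
) -> float:
    """Shortest C-alpha distance (in angstroms) from the variant to any site residue."""
    variant_xyz = ca_coords[variant_residue]
    return min(math.dist(variant_xyz, ca_coords[res]) for res in site_residues)


def is_near_site(
    ca_coords: Dict[int, Coord],
    variant_residue: int,
    site_residues: Iterable[int],
    cutoff: float = 8.0,  # illustrative threshold, not a value taken from FunVar
) -> bool:
    """True if the variant lies within `cutoff` angstroms of the functional site."""
    return min_distance_to_site(ca_coords, variant_residue, site_residues) <= cutoff


# Toy coordinates keyed by residue number (made up for the example).
toy_coords = {
    10: (0.0, 0.0, 0.0),
    50: (3.0, 4.0, 0.0),
    90: (30.0, 0.0, 0.0),
    200: (60.0, 0.0, 0.0),
}
functional_site = [50, 90]

near = is_near_site(toy_coords, 10, functional_site)
far = is_near_site(toy_coords, 200, functional_site)
```

In this toy set, residue 10 falls within the cutoff of site residue 50, while residue 200 does not come close to either site residue. A real analysis would read coordinates from a structure file and would likely consider all atoms rather than C-alpha positions alone.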
Currently, CATH FunVar provides two use cases:
- annotations for proteins of the SARS-CoV-2 virus and for its human host interactors, i.e. human proteins that interact with viral proteins
- human proteins with mutations implicated in cancer, taken from The Cancer Genome Atlas (TCGA).
In the future, data on other important infections, such as tuberculosis, will also be made available in CATH FunVar.