Computational Biology is unusually accessible as an applied science in part because so much can be done by an individual on modest hardware without access to a laboratory or computing cluster. All you need to bring is your brain.
A large part of the reason for the accessibility of the topic is the sustained drive for Open Science practised by bioinformaticians, computational biologists, and other scientists. They have encouraged, and sometimes demanded, open, free, FAIR (findable, accessible, interoperable, reusable) data, which has benefited us all.
This page lists some of the incredibly valuable, open data resources that might be of use to you in your project. It is not an exhaustive list.
1 Sequence data repositories (including annotated genome data)
- NCBI - the repository of record for many datasets, not just sequence data
  - Assembly - assembled genomes and associated metadata
  - GenBank - all publicly available DNA sequences
  - Nucleotide - aggregated data from GenBank, RefSeq, and elsewhere
  - RefSeq - curated, non-redundant genomic DNA, transcript, and protein sequences
  - SRA - sequencing read data
- UniProt - protein sequence and annotation data
- Ensembl - vertebrate genome data
- Ensembl Bacteria - bacterial genome data
- Ensembl Fungi - fungal genome data
- Ensembl Plants - plant genome data
- Ensembl Protists - protist genome data
- InterPro - protein families and sequence domains
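Several of the repositories above can also be queried programmatically, which is often more convenient than downloading files by hand. The following is a minimal sketch, assuming the NCBI E-utilities `efetch` endpoint and the UniProt REST endpoint; the helper names (`ncbi_fasta_url`, `uniprot_fasta_url`, `fetch_text`) and the example accessions are illustrative choices, not part of either service. Check each provider's current documentation and usage policies before relying on it.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Base URLs for the two services (assumed current at the time of writing).
EUTILS_EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"
UNIPROT_REST = "https://rest.uniprot.org/uniprotkb"


def ncbi_fasta_url(accession, db="nucleotide"):
    """Build an E-utilities efetch URL that returns a FASTA record."""
    params = {"db": db, "id": accession, "rettype": "fasta", "retmode": "text"}
    return f"{EUTILS_EFETCH}?{urlencode(params)}"


def uniprot_fasta_url(accession):
    """Build a UniProt REST URL that returns a FASTA record."""
    return f"{UNIPROT_REST}/{accession}.fasta"


def fetch_text(url, timeout=30):
    """Download a URL and return the response body as text (needs network access)."""
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    # Printing the URLs does not touch the network.
    print(ncbi_fasta_url("NC_000913.3"))
    print(uniprot_fasta_url("P69905"))
```

Calling `fetch_text(uniprot_fasta_url("P69905"))` should return the FASTA record as a string, provided the service is reachable. For anything beyond a handful of requests, NCBI asks users to limit their request rate, so add a pause between calls.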
2 Structural data repositories
- RCSB PDB - the repository of record for biomolecular structure data
- EMBL AlphaFold - AlphaFold structure predictions for multiple organisms, hosted by EMBL-EBI
3 Transcriptome data repositories
4 Molecular interaction databases
5 Biological models
- BioModels - mathematical models of biological systems
6 Specialised functional databases
7 Taxonomic and other classification resources
- NCBI Taxonomy
  - Widely used, but not as widely trusted, because it is often at odds with other classification databases - LP
- GTDB
  - Excellent genome-based microbial taxonomy and classification database and resource - LP
- genomeRxiv
  - Genome-based, taxonomy-independent classification. I work on this - LP
- Enterobase
  - The central resource for enteric bacteria genomic variation and classification - LP
- PhytoBacExplorer
  - Like Enterobase, but for plant pathogenic bacteria - LP