Skip to main content
  • Home
  • Development
  • Documentation
  • Donate
  • Operational login
  • Browse the archive

swh logo
SoftwareHeritage
Software
Heritage
Archive
Features
  • Search

  • Downloads

  • Save code now

  • Add forge now

  • Help

https://github.com/delalamo/af2_conformations
11 March 2022, 09:41:56 UTC
  • Code
  • Branches (1)
  • Releases (0)
  • Visits
    • Branches
    • Releases
    • HEAD
    • refs/heads/main
    No releases to show
  • 732ab36
  • /
  • notebooks
  • /
  • choose_templates.ipynb
Raw File Download
Take a new snapshot of a software origin

If the archived software origin currently browsed is not synchronized with its upstream version (for instance when new commits have been issued), you can explicitly request Software Heritage to take a new snapshot of it.

Use the form below to proceed. Once a request has been submitted and accepted, it will be processed as soon as possible. You can then check its processing state by visiting this dedicated page.
swh spinner

Processing "take a new snapshot" request ...

To reference or cite the objects present in the Software Heritage archive, permalinks based on SoftWare Hash IDentifiers (SWHIDs) must be used.
Select below a type of object currently browsed in order to display its associated SWHID and permalink.

  • content
  • directory
  • revision
  • snapshot
origin badgecontent badge
swh:1:cnt:c796851bd4b8328ea2e1c9a0679e9f2512bf069d
origin badgedirectory badge
swh:1:dir:8c14b282022ddf4d3a689d9fcae874d275765607
origin badgerevision badge
swh:1:rev:d60db86886186e80622deaa91045caccaf4103d3
origin badgesnapshot badge
swh:1:snp:cb0ae8b45df2c5f548867364bbfe52debb482bda

This interface enables to generate software citations, provided that the root directory of browsed objects contains a citation.cff or codemeta.json file.
Select below a type of object currently browsed in order to generate citations for them.

  • content
  • directory
  • revision
  • snapshot
Generate software citation in BibTex format (requires biblatex-software package)
Generating citation ...
Generate software citation in BibTex format (requires biblatex-software package)
Generating citation ...
Generate software citation in BibTex format (requires biblatex-software package)
Generating citation ...
Generate software citation in BibTex format (requires biblatex-software package)
Generating citation ...
Tip revision: d60db86886186e80622deaa91045caccaf4103d3 authored by Diego del Alamo on 17 February 2022, 16:16:20 UTC
Update predict.py
Tip revision: d60db86
choose_templates.ipynb
{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "1-Hs48uFzsYx"
   },
   "source": [
    "[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/delalamo/af2_conformations/blob/main/notebooks/choose_templates.ipynb)\n",
    "\n",
    "# Conformationally selective AlphaFold predictions\n",
    "\n",
    "This notebook provides an interface for predicting the structures of proteins using AlphaFold [1]. It simplifies the use of custom templates for the prediction of specific conformations. **Its intended audience are users familiar with Python.** The code borrows heavily from ColabFold [2], and makes use of the same MMSeqs2 API for retrieval of sequence alignments and templates [3,4]. Users of this notebook should cite these publications (listed below).\n",
    "\n",
    "The fundamental differences between this notebook and those provided by DeepMind and ColabFold are that 1) it simplifies the tuning of specific parameters by exposing them directly to the user, and 2) it allows users to specify which templates should be retrieved from the PDB and used for modeling. The former is useful when various parameters need to be chosen (e.g. MSA depth), while the latter allows targeting of specific conformational subspaces.\n",
    "\n",
    "Some notes and caveats:\n",
    "* Template subsampling is turned on by default. This should have no impact for predictions using four or fewer total templates (turned off in AlphaFold and ColabFold).\n",
    "* Currently only the structures of monomers can be predicted.\n",
    "* Relax is disabled. If you plan on evaluating these structures using an energy function, be sure to minimize them using OpenMM [5] or Rosetta [6] beforehand.\n",
    "* Not all PDBs are in the MMSeqs2 template database. There is a chance that PDBs of interest will not be retrieved.\n",
    "* Templates are aligned based on sequence similarity, not structural similarity. This may pose a problem when using distantly related proteins as templates.\n",
    "* We removed many of the bells and whistles of other colab notebooks, including pLDDT-based model ranking, visualization of sequence alignment coverage, progress bars, etc.\n",
    "\n",
    "Models can be downloaded either at the end of the run or incrementally while the program is still running. For the latter, click the folder icon on the left sidebar, hovering over the file of interest and click the three vertical dots, and select \"download\"."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "cellView": "form",
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "7iS51RcjffzJ",
    "outputId": "8322a12c-3a48-41c5-8c8d-751e285eb5d0"
   },
   "outputs": [],
   "source": [
    "#@title Set up Colab environment (1 of 2)\n",
    "%%bash\n",
    "\n",
    "pip install biopython dm-haiku==0.0.5 ml-collections\n",
    "\n",
    "# get templates\n",
    "git clone https://github.com/delalamo/af2_conformations.git\n",
    "\n",
    "# get AF2\n",
    "git clone https://github.com/deepmind/alphafold.git\n",
    "\n",
    "mv alphafold alphafold_\n",
    "mv alphafold_/alphafold .\n",
    "rm -r alphafold_\n",
    "# remove \"END\" from PDBs, otherwise biopython complains\n",
    "sed -i \"s/pdb_lines.append('END')//\" /content/alphafold/common/protein.py\n",
    "sed -i \"s/pdb_lines.append('ENDMDL')//\" /content/alphafold/common/protein.py\n",
    "\n",
    "# download model params (~1 min)\n",
    "mkdir params\n",
    "curl -fsSL https://storage.googleapis.com/alphafold/alphafold_params_2021-07-14.tar | tar x -C params\n",
    "\n",
    "# download libraries for interfacing with MMseqs2 API\n",
    "apt-get -y update\n",
    "apt-get -y install jq curl zlib1g gawk\n",
    "\n",
    "# setup conda\n",
    "wget -qnc https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh\n",
    "bash Miniconda3-latest-Linux-x86_64.sh -bfp /usr/local  2>&1 1>/dev/null\n",
    "rm Miniconda3-latest-Linux-x86_64.sh\n",
    "\n",
    "# setup template search\n",
    "conda install -q -y  -c conda-forge -c bioconda kalign3=3.2.2 hhsuite=3.3.0 python=3.7"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "cellView": "form",
    "id": "zBcw1SX3ZGJH"
   },
   "outputs": [],
   "source": [
    "#@title Set up Colab environment (2 of 2)\n",
    "\n",
    "from google.colab import files\n",
    "\n",
    "from af2_conformations.scripts import predict\n",
    "from af2_conformations.scripts import util\n",
    "from af2_conformations.scripts import mmseqs2\n",
    "\n",
    "import random\n",
    "import os\n",
    "\n",
    "from absl import logging\n",
    "logging.set_verbosity(logging.DEBUG)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "gJx1y_Fx3iMz"
   },
   "source": [
    "Once everything has been installed, the code below can be modified and executed."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "qREPMYc5UTYm",
    "outputId": "1d60963a-d5f3-499b-e342-8d7133bc3180"
   },
   "outputs": [],
   "source": [
    "jobname = 'T4_lysozyme'\n",
    "sequence = (\"MNIFEMLRIDEGLRLKIYKDTEGYYTIGIGHLLTKSPSLNAAKSELDKAIGRNCNGVIT\"\n",
    "            \"KDEAEKLFNQDVDAAVRGILRNAKLKPVYDSLDAVRRCALINMVFQMGETGVAGFTNSL\"\n",
    "            \"RMLQQKRWDEAAVNLAKSRWYNQTPNRAKRVITTFRTGTWDAYKNL\" )\n",
    "\n",
    "# PDB IDs, written uppercase with chain ID specified\n",
    "pdbs = [\"6LB8_A\",\n",
    "        \"6LB8_C\",\n",
    "        \"4PK0_A\",\n",
    "        \"6FW2_A\"]\n",
    "\n",
    "# The MMSeqs2Runner object submits the amino acid sequence to\n",
    "# the MMSeqs2 server, generates a directory, and populates it with\n",
    "# data retrieved from the server. Templates may be specified by the user.\n",
    "# All templates are fetched if none are provided or the list is empty.\n",
    "mmseqs2_runner = mmseqs2.MMSeqs2Runner( jobname, sequence )\n",
    "\n",
    "# Fetch sequences and download data\n",
    "a3m_lines, template_path = mmseqs2_runner.run_job( templates = pdbs )\n",
    "\n",
    "# A nested loop in which 5 models are generated per MSA depth value\n",
    "# In our manuscript we use three MSA depths: 32 sequences, 128, and 5120\n",
    "for nseq in range( 16, 34 ):\n",
    "  for n_model in range( 5 ):\n",
    "\n",
    "    # Randomly choose one of the two AlphaFold neural\n",
    "    # networks capable of using templates.\n",
    "    # In our experience, model 1 is more sensitive to input templates.\n",
    "    # However, this observation is purely anecdotal and not backed up by\n",
    "    # hard numbers.\n",
    "    model_id = random.choice( ( 1, 2 ) )\n",
    "\n",
    "    # Specify the name of the output PDB\n",
    "    outname = f\"{ n_model }_{ nseq }.pdb\"\n",
    "\n",
    "    # Run the job and save as a PDB\n",
    "    predict.predict_structure_from_templates(\n",
    "        mmseqs2_runner.seq, # NOTE mmseqs2_runner removes whitespace from seq\n",
    "        outname,\n",
    "        a3m_lines,\n",
    "        template_path = template_path,\n",
    "        model_id = model_id,\n",
    "        max_msa_clusters = nseq // 2,\n",
    "        max_extra_msa = nseq,\n",
    "        max_recycles = 1\n",
    "    )\n",
    "\n",
    "    # Alternatively, users can run a template-free prediction by uncommenting\n",
    "    # the line below:\n",
    "\n",
    "    '''\n",
    "    predict.predict_structure_no_templates( sequence, outname,\n",
    "         a3m_lines, model_id = model_id, max_msa_clusters = nseq // 2,\n",
    "         max_extra_msa = nseq, max_recycles = 1 )\n",
    "    '''\n",
    "\n",
    "# To download predictions:\n",
    "!zip -FSr \"af2.zip\" *\".pdb\"\n",
    "files.download( \"af2.zip\" )"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "diiyiNsm8WQZ"
   },
   "source": [
    "# References:\n",
    "1. Jumper et al \"Highly accurate protein structure prediction with AlphaFold\" Nature (2021)\n",
    "2. Mirdita et al \"ColabFold - making protein folding accessible to all\" biorXiv (2021)\n",
    "3. Steinegger & Söding \"MMseqs2 enables sensitive protein sequence searching for the analysis of massive data sets\" Nature Biotechnology (2017)\n",
    "4. Mirdita et al \"MMseqs2 desktop and local web server app for fast, integrative sequence searches\" Bioinformatics (2019)\n",
    "5. Eastman et al \"OpenMM 7: Rapid development of high performance algorithms for molecular dynamics\" Plos Comp Bio (2017)\n",
    "6. Koehler-Leman et al \"Macromolecular modeling and design in Rosetta: recent methods and frameworks\" Nature Methods (2020)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "OaYh1cy9cIIr"
   },
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "ih2OPyWtoGRu"
   },
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "accelerator": "GPU",
  "colab": {
   "collapsed_sections": [],
   "machine_shape": "hm",
   "name": "choose_templates.ipynb",
   "provenance": []
  },
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}

back to top

Software Heritage — Copyright (C) 2015–2026, The Software Heritage developers. License: GNU AGPLv3+.
The source code of Software Heritage itself is available on our development forge.
The source code files archived by Software Heritage are available under their own copyright and licenses.
Terms of use: Archive access, API— Content policy— Contact— JavaScript license information— Web API