Skip to main content
  • Home
  • Development
  • Documentation
  • Donate
  • Operational login
  • Browse the archive

swh logo
SoftwareHeritage
Software
Heritage
Archive
Features
  • Search

  • Downloads

  • Save code now

  • Add forge now

  • Help

https://github.com/delalamo/af2_conformations
11 March 2022, 09:41:56 UTC
  • Code
  • Branches (1)
  • Releases (0)
  • Visits
    • Branches
    • Releases
    • HEAD
    • refs/heads/main
    No releases to show
  • 732ab36
  • /
  • notebooks
  • /
  • mutate_msa.ipynb
Raw File Download
Take a new snapshot of a software origin

If the archived software origin currently browsed is not synchronized with its upstream version (for instance when new commits have been issued), you can explicitly request Software Heritage to take a new snapshot of it.

Use the form below to proceed. Once a request has been submitted and accepted, it will be processed as soon as possible. You can then check its processing state by visiting this dedicated page.
swh spinner

Processing "take a new snapshot" request ...

To reference or cite the objects present in the Software Heritage archive, permalinks based on SoftWare Hash IDentifiers (SWHIDs) must be used.
Select below a type of object currently browsed in order to display its associated SWHID and permalink.

  • content
  • directory
  • revision
  • snapshot
origin badgecontent badge
swh:1:cnt:d98af4891201835ce862ae361eca4b23dae462e7
origin badgedirectory badge
swh:1:dir:8c14b282022ddf4d3a689d9fcae874d275765607
origin badgerevision badge
swh:1:rev:d60db86886186e80622deaa91045caccaf4103d3
origin badgesnapshot badge
swh:1:snp:cb0ae8b45df2c5f548867364bbfe52debb482bda

This interface enables to generate software citations, provided that the root directory of browsed objects contains a citation.cff or codemeta.json file.
Select below a type of object currently browsed in order to generate citations for them.

  • content
  • directory
  • revision
  • snapshot
Generate software citation in BibTex format (requires biblatex-software package)
Generating citation ...
Generate software citation in BibTex format (requires biblatex-software package)
Generating citation ...
Generate software citation in BibTex format (requires biblatex-software package)
Generating citation ...
Generate software citation in BibTex format (requires biblatex-software package)
Generating citation ...
Tip revision: d60db86886186e80622deaa91045caccaf4103d3 authored by Diego del Alamo on 17 February 2022, 16:16:20 UTC
Update predict.py
Tip revision: d60db86
mutate_msa.ipynb
{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "1-Hs48uFzsYx"
   },
   "source": [
    "[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/delalamo/af2_conformations/blob/main/notebooks/mutate_msa.ipynb)\n",
    "\n",
    "# Conformationally selective AlphaFold predictions by mutagenesis\n",
    "\n",
    "This notebook provides an interface for predicting the structures of proteins using AlphaFold [1]. It simplifies the introduction of mutations in the query sequence and multiple sequence alignment (MSA). **Its intended audience are users familiar with Python.** The code borrows heavily from ColabFold [2], and makes use of the same MMSeqs2 API for retrieval of sequence alignments and templates [3,4]. Additionally, the principle outline in this notebook is introduced and described in ref [5]. Users of this notebook should cite these publications (listed below).\n",
    "\n",
    "This notebook provides an approach, described independently by [Sergey Ovchinnikov](https://twitter.com/sokrypton/status/1464748132852547591) and [Richard A. Stein and Hassane S. Mchaourab](https://www.biorxiv.org/content/10.1101/2021.11.29.470469v1) [5], at concealing and/or modifying interresidue relationships across the MSA by mutagenesis. This has the effect of causing AlphaFold2 to sample alternative conformations.\n",
    "\n",
    "Some notes and caveats:\n",
    "* Currently only the structures of monomers can be predicted.\n",
    "* Relax is disabled. If you plan on evaluating these structures using an energy function, be sure to minimize them using OpenMM [6] or Rosetta [7] beforehand.\n",
    "* We removed many of the bells and whistles of other colab notebooks, including pLDDT-based model ranking, visualization of sequence alignment coverage, progress bars, etc.\n",
    "\n",
    "Models can be downloaded either at the end of the run or incrementally while the program is still running. For the latter, click the folder icon on the left sidebar, hovering over the file of interest and click the three vertical dots, and select \"download\"."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "cellView": "form",
    "collapsed": true,
    "id": "7iS51RcjffzJ",
    "jupyter": {
     "outputs_hidden": true
    }
   },
   "outputs": [],
   "source": [
    "#@title Set up Colab environment (1 of 2)\n",
    "%%bash\n",
    "\n",
    "pip install biopython dm-haiku==0.0.5 ml-collections\n",
    "\n",
    "# get templates\n",
    "git clone https://github.com/delalamo/af2_conformations.git\n",
    "\n",
    "# get AF2\n",
    "git clone https://github.com/deepmind/alphafold.git\n",
    "\n",
    "mv alphafold alphafold_\n",
    "mv alphafold_/alphafold .\n",
    "rm -r alphafold_\n",
    "# remove \"END\" from PDBs, otherwise biopython complains\n",
    "sed -i \"s/pdb_lines.append('END')//\" /content/alphafold/common/protein.py\n",
    "sed -i \"s/pdb_lines.append('ENDMDL')//\" /content/alphafold/common/protein.py\n",
    "\n",
    "# download model params (~1 min)\n",
    "mkdir params\n",
    "curl -fsSL https://storage.googleapis.com/alphafold/alphafold_params_2021-07-14.tar | tar x -C params\n",
    "\n",
    "# download libraries for interfacing with MMseqs2 API\n",
    "apt-get -y update\n",
    "apt-get -y install jq curl zlib1g gawk\n",
    "\n",
    "# setup conda\n",
    "wget -qnc https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh\n",
    "bash Miniconda3-latest-Linux-x86_64.sh -bfp /usr/local  2>&1 1>/dev/null\n",
    "rm Miniconda3-latest-Linux-x86_64.sh\n",
    "\n",
    "# setup template search\n",
    "conda install -q -y  -c conda-forge -c bioconda kalign3=3.2.2 hhsuite=3.3.0 python=3.7"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "cellView": "form",
    "id": "zBcw1SX3ZGJH"
   },
   "outputs": [],
   "source": [
    "#@title Set up Colab environment (2 of 2)\n",
    "\n",
    "from google.colab import files\n",
    "\n",
    "from af2_conformations.scripts import predict\n",
    "from af2_conformations.scripts import util\n",
    "from af2_conformations.scripts import mmseqs2\n",
    "\n",
    "import random\n",
    "import os\n",
    "\n",
    "from absl import logging\n",
    "logging.set_verbosity(logging.DEBUG)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "gJx1y_Fx3iMz"
   },
   "source": [
    "Once everything has been installed, the code below can be modified and executed."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "qREPMYc5UTYm"
   },
   "outputs": [],
   "source": [
    "# In this example we introduce alanine mutations into the sequence of MCT1\n",
    "# MCT1 exclusively adopts the inward-facing conformation when no templates are used\n",
    "# Alanine mutations were placed at the extracellular gate\n",
    "jobname = 'MCT1'\n",
    "sequence = (\"MPPAVGGPVGYTPPDGGWGWAVVIGAFISIGFSYAFPKSITVFFKEIEGIFHATTSEVSWISS\"\n",
    "            \"IMLAVMYGGGPISSILVNKYGSRIVMIVGGCLSGCGLIAASFCNTVQQLYVCIGVIGGLGLAF\"\n",
    "            \"NLNPALTMIGKYFYKRRPLANGLAMAGSPVFLCTLAPLNQVFFGIFGWRGSFLILGGLLLNCC\"\n",
    "            \"VAGALMRPIGPKPTKAGKDKSKASLEKAGKSGVKKDLHDANTDLIGRHPKQEKRSVFQTINQF\"\n",
    "            \"LDLTLFTHRGFLLYLSGNVIMFFGLFAPLVFLSSYGKSQHYSSEKSAFLLSILAFVDMVARPS\"\n",
    "            \"MGLVANTKPIRPRIQYFFAASVVANGVCHMLAPLSTTYVGFCVYAGFFGFAFGWLSSVLFETL\"\n",
    "            \"MDLVGPQRFSSAVGLVTIVECCPVLLGPPLLGRLNDMYGDYKYTYWACGVVLIISGIYLFIGM\"\n",
    "            \"GINYRLLAKEQKANEQKKESKEEETSIDVAGKPNEVTKAAESPDQKDTDGGPKEEESPV\" )\n",
    "\n",
    "# The MMSeqs2Runner object submits the amino acid sequence to\n",
    "# the MMSeqs2 server, generates a directory, and populates it with\n",
    "# data retrieved from the server.\n",
    "mmseqs2_runner = mmseqs2.MMSeqs2Runner( jobname, sequence )\n",
    "\n",
    "# Fetch sequences and download data\n",
    "a3m_lines, _ = mmseqs2_runner.run_job()\n",
    "\n",
    "# Define the mutations and introduce into the sequence and MSA\n",
    "muts = { x: \"A\" for x in [ 41,42,45,46,56,59,60,63,281,282,285,286,403,407 ] }\n",
    "\n",
    "mutated_msa = util.mutate_msa( a3m_lines, muts )\n",
    "mutated_seq = util.mutate_msa( sequence, muts )\n",
    "\n",
    "for n_model in range( 5 ):\n",
    "\n",
    "  # Specify the name of the output PDB\n",
    "  outname = f\"model_{ n_model }.pdb\"\n",
    "\n",
    "  predict.predict_structure_no_templates( mutated_seq, outname, mutated_msa )\n",
    "\n",
    "# To download predictions:\n",
    "!zip -FSr \"af2.zip\" *\".pdb\"\n",
    "files.download( \"af2.zip\" )"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "diiyiNsm8WQZ"
   },
   "source": [
    "# References:\n",
    "1. Jumper et al \"Highly accurate protein structure prediction with AlphaFold\" Nature (2021)\n",
    "2. Mirdita et al \"ColabFold - making protein folding accessible to all\" biorXiv (2021)\n",
    "3. Steinegger & Söding \"MMseqs2 enables sensitive protein sequence searching for the analysis of massive data sets\" Nature Biotechnology (2017)\n",
    "4. Mirdita et al \"MMseqs2 desktop and local web server app for fast, integrative sequence searches\" Bioinformatics (2019)\n",
    "5. Stein & Mchaourab \"Modeling Alternate Conformations with Alphafold2 via Modification of the Multiple Sequence Alignment\" bioRxiv (2021)\n",
    "6. Eastman et al \"OpenMM 7: Rapid development of high performance algorithms for molecular dynamics\" Plos Comp Bio (2017)\n",
    "7. Koehler-Leman et al \"Macromolecular modeling and design in Rosetta: recent methods and frameworks\" Nature Methods (2020)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "OaYh1cy9cIIr"
   },
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "ih2OPyWtoGRu"
   },
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "accelerator": "GPU",
  "colab": {
   "collapsed_sections": [],
   "machine_shape": "hm",
   "name": "mutate_msa.ipynb",
   "provenance": []
  },
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}

back to top

Software Heritage — Copyright (C) 2015–2026, The Software Heritage developers. License: GNU AGPLv3+.
The source code of Software Heritage itself is available on our development forge.
The source code files archived by Software Heritage are available under their own copyright and licenses.
Terms of use: Archive access, API— Content policy— Contact— JavaScript license information— Web API