Quick tests of ModelAngelo

Last week, Sjors Scheres at MRC-LMB announced on Twitter that ModelAngelo, developed in his group, has been released on GitHub. It uses a deep-learning approach to build protein structures into cryo-EM maps from scratch. ModelAngelo supports two modes for building 3D structures: with and without a sequence.
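As a rough sketch of the two modes, the snippet below assembles the two command lines in Python. The file and directory names are made up for illustration, and the flags (`-v`, `-f`, `-o`) are my assumption based on the README at the time of release; later versions may differ, so check `model_angelo build --help` before relying on them.

```python
import shutil
import subprocess

# Hypothetical file names, used only for illustration.
MAP_PATH = "gs_map.mrc"
FASTA_PATH = "gs_sequence.fasta"


def build_command(map_path, output_dir, fasta_path=None):
    """Return the ModelAngelo command line (as a list) for either mode."""
    if fasta_path is None:
        # Mode without sequence.
        return ["model_angelo", "build_no_seq", "-v", map_path, "-o", output_dir]
    # Mode with sequence. Assumption: the FASTA flag is "-f".
    return ["model_angelo", "build", "-v", map_path, "-f", fasta_path, "-o", output_dir]


def run_if_available(command):
    """Run the command only if the executable is found on PATH."""
    if shutil.which(command[0]) is None:
        print("not installed, skipping:", " ".join(command))
        return None
    return subprocess.run(command, check=True)


with_seq = build_command(MAP_PATH, "out_with_seq", FASTA_PATH)
without_seq = build_command(MAP_PATH, "out_no_seq")
print(" ".join(with_seq))
print(" ".join(without_seq))
```

Passing either list to `run_if_available` would start the job on a machine where ModelAngelo is installed.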

The installation was simple. The Docker-based Anaconda/Miniconda setup makes it easy for users to run ModelAngelo from their own home directory. ModelAngelo recommends Nvidia RTX 2080 GPU cards or newer for computation, and CUDA 11 is also a requirement. My workstation has been running Nvidia driver 438 (CUDA 10.1) for 3 years. Because it has been quite stable for all users and all programs, I initially hesitated to upgrade to Nvidia driver 480 or newer for CUDA 11. I gave it a try anyway and found that CPUs alone also work, although much more slowly: the CPU-only run took about 10.5 hours to finish the job. After I installed a new Nvidia driver with CUDA 11, the computation time dropped to 0.56 hours.
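To put the two timings side by side, here is a small back-of-the-envelope calculation. It uses only the two numbers quoted above; the variable names are mine.

```python
# Wall-clock times quoted in the text, in hours.
cpu_hours = 10.5   # CPU-only run
gpu_hours = 0.56   # GPU run with the new driver and CUDA 11

speedup = cpu_hours / gpu_hours   # how many times faster the GPU run was
gpu_minutes = gpu_hours * 60      # the GPU run expressed in minutes

print(f"GPU run: {gpu_minutes:.1f} min, speedup over CPU: {speedup:.2f}x")
```

In other words, the GPU run took a little over half an hour and was roughly 19 times faster than the CPU-only run on my workstation.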

Two maps were used to test ModelAngelo. One is a 2.4 Å map of E. coli glutamine synthetase (GS). The other is a 2.9 Å map of the 723-aa malate synthase G (MSG). Both tests gave amazing results, and I describe the E. coli GS case here.

E. coli GS is a dodecamer (12-mer) with D6 symmetry. I supplied a single copy of the GS sequence as a quick test of how ModelAngelo handles multimers. It turns out the quality is pretty good. The structures below are E. coli GS cryo-EM structures built by ModelAngelo (red) and by myself (blue). The backbone RMSD is 0.8 Å (CE align in PyMOL). Most flexible loops, especially the loops on the outer surface of the hexameric ring, are well built by ModelAngelo. I am very surprised by, and also pleased with, how powerful ModelAngelo is.
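For readers unfamiliar with RMSD, the sketch below shows how the number is defined once two structures have been superposed and their atoms paired. This is not the CE algorithm that PyMOL uses to find the alignment; it is only the final distance formula, and the coordinates are toy values, not taken from the GS models.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two paired, superposed coordinate lists."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = 0.0
    for (ax, ay, az), (bx, by, bz) in zip(coords_a, coords_b):
        # Squared distance between one pair of equivalent atoms.
        total += (ax - bx) ** 2 + (ay - by) ** 2 + (az - bz) ** 2
    return math.sqrt(total / len(coords_a))


# Toy backbone coordinates in Å (made up): model_b is model_a shifted by 1 Å along z.
model_a = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0)]
model_b = [(0.0, 0.0, 1.0), (1.5, 0.0, 1.0), (3.0, 0.0, 1.0)]

toy_rmsd = rmsd(model_a, model_b)
print(f"RMSD = {toy_rmsd:.2f} Å")
```

Every atom in the toy example is displaced by exactly 1 Å, so the RMSD is 1 Å; a backbone RMSD of 0.8 Å means the paired backbone atoms of the two GS models deviate by less than that on a root-mean-square basis.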

Red: built by ModelAngelo; blue: built by myself (manually curated)

Although ModelAngelo did a good job of model building, the structural quality still needs improvement: the model has a clash score of 76.43, 7.5% rotamer outliers, and 2.76% Ramachandran outliers. I therefore ran a quick one-cycle Phenix refinement of this model against the map, and the refined structure is much improved (see the cyan GS structure below). The clash score is now 12.97, and the Ramachandran and rotamer outliers are 0.93% and 1.81%, respectively.

Red: built by ModelAngelo; cyan: ModelAngelo structure refined with Phenix
Blue: manually curated; red: ModelAngelo; cyan: red + Phenix refinement

The monomeric subunits of the 3 E. coli GS structures shown above are highly consistent in both the structured regions and the loops. I have not yet checked the side chains residue by residue, but the quality is impressive. One can get a publishable structure within half a day by combining live data processing, ModelAngelo, and automatic refinement.

Here is a summary of the built structures.

Analysis                 ModelAngelo   ModelAngelo + Phenix   Manually curated
Chains                   12            12                     12
Protein residues         5559          5559                   5628
MolProbity score         3.48          2.04                   1.56
Clash score              76.43         12.97                  7.61
Ramachandran favored     92.77%        95.56%                 97.22%
Ramachandran outliers    2.76%         0.93%                  0
Rotamer outliers         7.52%         1.81%                  0
CaBLAM outliers          0%            0.14%                  0
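
A few derived numbers help when reading this summary. The short script below uses only values from the table above; the dictionary layout and variable names are my own.

```python
# Values copied from the summary table.
CHAINS = 12
models = {
    "ModelAngelo": {"residues": 5559, "clash": 76.43, "molprobity": 3.48},
    "ModelAngelo + Phenix": {"residues": 5559, "clash": 12.97, "molprobity": 2.04},
    "Manually curated": {"residues": 5628, "clash": 7.61, "molprobity": 1.56},
}

auto = models["ModelAngelo"]
refined = models["ModelAngelo + Phenix"]
manual = models["Manually curated"]

# Fraction of the manually built residues that ModelAngelo also built.
completeness = auto["residues"] / manual["residues"]
# Average number of residues per chain that ModelAngelo left unbuilt.
missing_per_chain = (manual["residues"] - auto["residues"]) / CHAINS
# Relative drop in clash score after the Phenix refinement.
clash_reduction = (auto["clash"] - refined["clash"]) / auto["clash"]

print(f"Residues per chain (manual): {manual['residues'] // CHAINS}")
print(f"Completeness vs. manual model: {completeness:.1%}")
print(f"Unbuilt residues per chain (average): {missing_per_chain:.2f}")
print(f"Clash score reduction after Phenix: {clash_reduction:.1%}")
```

So ModelAngelo built about 98.8% of the residues in my manually curated model, leaving fewer than 6 residues per chain on average, and the quick refinement removed about 83% of the clash score.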