Surflex 2.7 includes Surflex-Dock and Surflex-Sim. The former combines a refined descendant of the Hammerhead scoring function with the alignment/conformation optimization procedures implemented for morphological similarity. The latter addresses aspects of computational drug design in the absence of a protein structure.

Aligning complexes

# Getting Started with Surflex-Dock

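• For virtual screening, the “-pscreen” or “-lscreen” option is typically used; for optimal pose prediction, the “-pgeom” option is typically used. In both cases, ring search (“+ring”) can improve the results, at the cost of more computation time.
• The basic parameter sets are -pscreen and -pgeom. The former is mostly used for virtual screening and automatically adds +premin and +remin, which turn on ligand minimization prior to docking and all-atom in-pocket minimization after docking, respectively.
-pgeom includes everything in -pscreen and adds three parameters: -multistart 4, -ndock_final 20, and -div_rms 0.5.
• Sybyl mol2 structures are the recommended input; MDL or SD format (small molecules) and PDB (proteins, error-prone) can also be used. All input files must have hydrogens added (including nonpolar hydrogens).
• Surflex-Dock supports multi-structure docking (multiple protein conformations for a single target) and protein pocket flexibility.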
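The relationship between the two presets can be written down as a small lookup table. The sketch below only models what these notes state (-pscreen implies +premin and +remin; -pgeom implies -pscreen plus three more options); the names `PRESETS` and `expand_preset` are hypothetical helpers, not part of Surflex-Dock.

```python
# Illustrative model of the presets as described in these notes.
# PRESETS and expand_preset are hypothetical names, not Surflex-Dock code.
PRESETS = {
    "-pscreen": ["+premin", "+remin"],
    "-pgeom": ["-pscreen", "-multistart", "4",
               "-ndock_final", "20", "-div_rms", "0.5"],
}


def expand_preset(flag):
    """Recursively expand a preset flag into the options it implies."""
    if flag not in PRESETS:
        return [flag]  # already a primitive option or a value
    expanded = []
    for item in PRESETS[flag]:
        expanded.extend(expand_preset(item))
    return expanded


print(expand_preset("-pscreen"))
print(expand_preset("-pgeom"))
```

Writing the presets this way makes it easy to see which individual options would need to be overridden when a preset is almost, but not quite, what a run requires.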

• Build the protomol
• Dock
• Analyze and post-process the results

## proto command: Protomol Generation

Notes: When using a ligand to specify an active site, by default, the voxels occupied by the ligand are explored by the protomol even if they are not highly buried. This can be turned off by employing the -mark_lig switch.

The none option makes Surflex explore the pockets on the protein surface and write a component file for each pocket (p1-comp-*.mol2), which is used to build the protomol:

proto_thresh (default 0.5, range 0-1): determines the degree of buriedness for the primary volume used to generate the protomol. Larger values give a smaller volume. The files p1-marked-thresh-<n>.pdb show the volumes corresponding to different threshold values.

proto_bloat (default 0): indicates how far beyond the primary volume (in Angstroms) the protomol volume should be expanded.

Smaller protomols yield faster searches, and it is not the case that the docked ligands are strictly limited to the volume of the protomol.

With this procedure, Surflex-Dock will build a protomol that gives a fixed degree of coverage against the residues that are proximal to the ligand (as specified) or explicitly listed residues (using the resproto command). Higher values of adapt_thresh yield higher degrees of coverage.

## dock command: Docking a Single Molecule

1. -multistart <n>: docks from n initial positions and returns the best result.
For flexible molecules, since the search is not exhaustive, using multiple starting points will frequently yield higher-scoring and more consistent results, independent of the initial starting pose. Generally speaking, -multistart 10 is as high as one sees returns on the investment of time, with the plateau beginning at -multistart 4.
2. -ndock_final <n>: sets the number of top-scoring poses written to the final output.
3. The third option affects the density of the alignment search. The new method's search density can be controlled with the -spindense parameter (higher numbers indicate a denser search) and the -nspin parameter (higher numbers indicate denser sampling of axial rotations).
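Why several starting points help a non-exhaustive search can be seen on a toy problem. The sketch below is not Surflex-Dock's search; it is a generic hill climber on a made-up one-dimensional score (`toy_score`, `local_search`, and `multistart` are all hypothetical names) that shows a single start getting stuck on a lower peak while multiple starts find the higher one.

```python
import math
import random


def toy_score(x):
    """Made-up 1-D score: a local peak near x=-2 and a higher peak near x=3."""
    return math.exp(-(x + 2) ** 2) + 2 * math.exp(-(x - 3) ** 2)


def local_search(score, x0, rng, step=0.5, iters=200):
    """Greedy hill climbing: keep a random perturbation only if it scores higher."""
    best_x, best_s = x0, score(x0)
    for _ in range(iters):
        cand = best_x + rng.uniform(-step, step)
        cand_s = score(cand)
        if cand_s > best_s:
            best_x, best_s = cand, cand_s
    return best_x, best_s


def multistart(score, starts, seed=0):
    """Run the local search from every start and return the best (x, score)."""
    rng = random.Random(seed)
    return max((local_search(score, x0, rng) for x0 in starts),
               key=lambda r: r[1])


single = multistart(toy_score, [-2.0])                # one start, stuck on the low peak
multi = multistart(toy_score, [-2.0, 0.0, 2.0, 4.0])  # four starts, as in -multistart 4
print(single, multi)
```

The cost grows linearly with the number of starts, which is consistent with the note above that returns diminish beyond a handful of starting points.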

## -fmatch option: Docking using Placed Fragments

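Surflex can dock based on the positions of placed molecular fragments, by aligning fragments of the target molecule (ligand.mol2) onto the fragments given in frag.mol2.

frag.mol2 may contain multiple fragments.

The -cpen option sets the strength of the constraint applied during fragment alignment (e.g. -cpen 100).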

The fragment options allow for coarser conformational search (+fcoarse), varying the depth of initial conformational search from the placed fragment (-fidepth [default 3]), and varying the depth of successive conformational enumerations (-frdepth [default 3]).

## mdock_list command: Docking to Multiple Protein Conformations

The Targets file describes the proteins and their corresponding protomols; each protein conformation may use its own protomol. The file format is as follows:

## Protein Pocket Alignment

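Each line of the ProteinList file contains the paths of a protein and its corresponding ligand; the ligand is used to define the protein's pocket. The output file is palign-results, which contains the alignment information. The next step is to build the optimal tree of alignments.

out-ligands-aligned.mol2 and out-proteins-aligned.mol2 contain the aligned structures (for the proteins, only the binding pocket), with multiple structures in a single file.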

psim_one and psim_list offer the opportunity to align one protein to another or a list of proteins to a single one and function analogously to above, without the requirement to separately build a final alignment.

Note also that the psim_buildtree command produces a file suitable for generating a visual depiction of the alignment tree using the Dot program from the GraphViz collection (out-tree.dot in the example above).
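Rendering the tree file is a single call to GraphViz. The sketch below assumes the `dot` executable is installed and on the PATH; the output name out-tree.png and the helper names `dot_command` and `render_tree` are hypothetical choices, not something the Surflex documentation specifies.

```python
import os
import shutil
import subprocess


def dot_command(dot_file, image_file, fmt="png"):
    """Build the GraphViz command line: dot -T<fmt> <input> -o <output>."""
    return ["dot", f"-T{fmt}", dot_file, "-o", image_file]


def render_tree(dot_file="out-tree.dot", image_file="out-tree.png"):
    """Render the alignment tree; return False if dot or the input is missing."""
    if shutil.which("dot") is None or not os.path.exists(dot_file):
        return False
    subprocess.run(dot_command(dot_file, image_file), check=True)
    return True


print(" ".join(dot_command("out-tree.dot", "out-tree.png")))
```

Calling `render_tree()` in the directory that holds out-tree.dot would write the image next to it; other GraphViz output formats such as svg or pdf can be requested through the `fmt` argument of `dot_command`.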

## Protein Pocket Preparation

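Frequently, a protein structure will exhibit significant clashes with a cognate ligand. The pprep_protons or pprep_all command can therefore be used to remove these clashes by adjusting the positions of the hydrogen atoms or of all atoms in the binding pocket, respectively.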

## Scoring Function Optimization

Surflex-Dock Version 2.4 and higher offers the opportunity to tune the Surflex-Dock scoring function based on additional data. The data can be positive, which includes protein/ligand complexes with known affinity. The data can also include negative information.

## Ring Flexibility

Surflex-Dock implements in-lined ring flexibility in a general way. The +ring option turns this procedure on. The behavior is modified by the -rthresh parameter (kcals above global energy minimum beyond which ring conformations are not kept).
Turning on ring search (“+ring”) can yield improved results for both screening and pose prediction at the expense of some additional computational cost.

## Post-Processing Results

### logprocess command

The logprocess command provides 1) a derived combination score, and 2) a method for placing thresholds on the crash and polar scores.

A combination score is given first, which combines the reported score and crash values.
The threshold that is supplied on the command line for crash (-1.0 in the example above) is allowed “for free” so the amount of crash that is below that level is given back to the affinity score. So, with an observed crash of -2.0, a threshold of -1.0, and an affinity score of 7.1, the combination score would be 8.1.
The smaller the crash threshold, generally the better able Surflex is to reject false positives. However, this may come at the expense of some true positives that have particularly tight fits into the protein active site in question.
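The worked numbers above can be reproduced with a few lines of arithmetic. The sketch below encodes one reading of the rule, namely that the part of the crash penalty covered by the threshold is added back to the score; the exact formula used by logprocess is not given in these notes, so `combination_score` is a hypothetical helper under that assumption.

```python
def combination_score(score, crash, crash_threshold):
    """Give back the portion of the crash penalty that the threshold allows.

    Assumption: crash and crash_threshold are <= 0 (more negative means
    more clash), and at most |crash_threshold| of the penalty is refunded.
    """
    refunded = min(abs(crash), abs(crash_threshold))
    return score + refunded


# Example from the text: crash -2.0, threshold -1.0, affinity score 7.1
print(round(combination_score(7.1, -2.0, -1.0), 2))
```

Under this reading, a threshold of 0 refunds nothing and leaves the reported score unchanged.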

With Version 1.31 and later, we recommend running the logprocess command with all default parameters. This yields no change from the reported score in the log file and has been used for the results in all protein screening enrichment benchmark tests.

### mget command

The mget option has the same format, but its second argument is a file containing the names of the molecules to extract.

## Other commands:

### prot: protonating molecules

It is suggested to make use of +misc_remin in order to eliminate conformational bias that may affect different scaffolds differently and lead to bias in results.
Note: +misc_ring will generate ring conformations as well. If -misc_outconfs is set to greater than 1, then each molecule will generate an individual *-ring.mol2 file in addition to the single conformation in the molecule archive.
The -fp option removes all hydrogen atoms and then adds hydrogens back.

### rms

The rms command computes the RMSD between mol1 and mol2.
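For reference, the quantity being reported is the root-mean-square deviation over matched atoms. The sketch below is a plain in-place RMSD over two equally ordered coordinate lists; it assumes no superposition and no symmetry correction, and whether the rms command applies either is not stated in these notes. The function name `rmsd` is a hypothetical helper.

```python
import math


def rmsd(coords_a, coords_b):
    """In-place RMSD between two lists of (x, y, z) tuples in matching order."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


# Two atoms, each displaced by exactly 1 Angstrom along z
print(rmsd([(0, 0, 0), (1, 0, 0)], [(0, 0, 1), (1, 0, 1)]))
```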

## Other options:

-multiproc # Specifies running on n processors.

-lparam # This parameter takes a file as input that contains alternate scoring function parameters for Surflex-Dock. The file “default.param” contains the default parameters for Surflex-Dock, and it is at the top level of the software distribution.

# Surflex-Sim:

1. Form a ligand-based hypothesis from multiple ligands
2. Use the hypothesis to screen molecules by similarity
3. Post-process the results

## Pre-Searching Molecules

The search_library command converts multiple molecular structures into a library file in one pass, which can later be used by -lscreen-lscreenopt.

Each line of the mol_list file is the path of one input molecule, for example:
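A list file of this shape (one path per line) can be generated rather than typed by hand. The sketch below is a minimal helper under that assumption; the function name `write_mol_list`, the directory name, and the `*.mol2` pattern are hypothetical and only illustrate the layout described above.

```python
from pathlib import Path


def write_mol_list(mol_dir, list_path, pattern="*.mol2"):
    """Write one molecule path per line, sorted for reproducibility."""
    paths = sorted(str(p) for p in Path(mol_dir).glob(pattern))
    Path(list_path).write_text("".join(p + "\n" for p in paths))
    return paths


# Hypothetical usage: collect every mol2 file under ./ligands into mol_list
# write_mol_list("ligands", "mol_list")
```

Sorting the paths keeps the list stable between runs, which makes it easier to match log entries back to input files.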

## Aligning Molecules

Generally, the first hypothesis (hypo-0) is the most sensible to use. However, when something is known about the SAR of analogs of the ligands used to generate a hypothesis, a better selection of hypothesis may be possible by browsing the top-scoring ones.

Lists of molecules can be aligned to either a single molecule or to mutually aligned sets of molecules using align_list:

The -nsim_final parameter controls the number of final poses written to the output.

The -fscreen option yields very fast similarity computations. It does this by turning off pre-minimization and all-atom post-alignment optimization, eliminating molecular fragmentation in favor of conformer sampling, and limiting the degree of local pose optimization performed in the last stages of molecular alignment. This can be an especially useful parameterization for going through large databases quickly, especially if followed by a more thorough screen of the top fraction of ligands using more thorough settings. The -pscreen option is the recommended setting for screening, and it is fast enough for large databases in situations where multiple processors are available. For detailed studies of relative alignments, the -pgeom option is suggested, and where ring geometries are also to be considered the -pgeomx option is preferred.
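The two-stage idea (a fast pass over everything, then a thorough pass over the best part) comes down to selecting a top fraction by score. The sketch below assumes the fast pass has already produced a name-to-score mapping in which higher similarity is better; `top_fraction` and the example scores are hypothetical, and parsing of actual Surflex log files is left out.

```python
import math


def top_fraction(scores, fraction):
    """Return the names of the best-scoring fraction, highest score first."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    keep = max(1, math.ceil(len(scores) * fraction))
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:keep]


# Made-up scores from a fast first pass
fast_scores = {"mol_a": 1.0, "mol_b": 3.0, "mol_c": 2.0, "mol_d": 0.5}
print(top_fraction(fast_scores, 0.5))
```

The returned names are the ones that would be passed on to the slower, more thorough settings.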

## Reference Sets and Similarity Vectors

liglist contains the molecules to be compared (its contents are the file paths of the small molecules).
Reflist is a well-chosen set of (usually 20) molecules in fixed conformations and alignments.

-multiproc enables the use of multiple cores; -fscreen can improve speed and accuracy.

### Choosing an appropriate reference set is important

1) the molecules must be orthogonal (i.e. dissimilar from each other); and 2) the molecules must come from the population of molecules for which comparisons are to be made

In the case of small-molecule drugs, selecting a diverse reference set from the CMC database (available from MDL) has worked well.

The basis set of molecules used in Cleves and Jain (2006) is included in the distribution (see the BasisMols folder under the Similarity examples). Note that we do not suggest that this set will be generally useful for all populations of molecules. We have found that the set works well for reproducing similarities within the space of approved therapeutics. It will probably be the case that for specific collections, different sets may be better.

# SURFLEX-QMOD TECHNICAL MANUAL

The Surflex-QMOD set of algorithms integrates underlying ideas and algorithms from molecular similarity [1–6], molecular docking [7–14], and multiple-instance learning [7, 11, 13, 15–18] in order to permit the construction of protein binding site analogs. The theory and use of the method for binding affinity prediction and iterative lead optimization are discussed in the companion book to this manual as well as in several papers [19–22].