af_analysis package
Submodules
af_analysis.data module
- class af_analysis.data.Data(directory=None, data_dict=None, csv=None, verbose=True, format=None)[source]
Bases: object
Data class
- Parameters:
- verbosebool
Print progress bar during analysis.
- dirstr
Path to the directory containing the log.txt file.
- formatstr
Format of the data.
- dfpandas.DataFrame
Dataframe containing the information extracted from the log.txt file.
- chainsdict
Dictionary containing the chains of each query.
- chain_lengthdict
Dictionary containing the length of each chain of each query.
Methods
read_directory(directory, keep_recycles=False)
Read a directory.
export_csv(path)
Export the dataframe to a csv file.
import_csv(path)
Import a csv file to the dataframe.
add_json()
Add json files to the dataframe.
extract_data()
Extract json/npz files to the dataframe.
add_pdb()
Add pdb files to the dataframe.
add_fasta(csv)
Add fasta sequence to the dataframe.
keep_last_recycle()
Keep only the last recycle for each query.
plot_maxscore_as_col(score, col, hue='query')
Plot the maxscore as a function of a column.
plot_pae(index, cmap=cm.vik)
Plot the PAE matrix.
plot_plddt(index_list)
Plot the pLDDT.
show_3d(index)
Show the 3D structure.
plot_msa(filter_qid=0.15, filter_cov=0.4)
Plot the msa from the a3m file.
show_plot_info()
Show the plot info.
- add_fasta(csv)[source]
Add fasta sequence to the dataframe.
- Parameters:
- csvstr
Path to the csv file containing the fasta sequence.
- Returns:
- None
- count_msa_seq()[source]
Count for each chain the number of sequences in the MSA.
- Parameters:
- None
- Returns:
- None
Warning: only tested with ColabFold 1.5.
- export_csv(path)[source]
Export the dataframe to a csv file.
- Parameters:
- pathstr
Path to the csv file.
- Returns:
- None
- extract_fields(fields, disable=False)[source]
Extract fields from data files to the dataframe.
- Parameters:
- fieldslist
List of fields to extract.
- disablebool
Disable the progress bar.
- Returns:
- None
- get_plddt(index)[source]
Extract the pLDDT array either from the pdb file or from the json/plddt files.
- Parameters:
- indexint
Index of the dataframe.
- Returns:
- np.array
pLDDT array.
- import_csv(path)[source]
Import a csv file to the dataframe.
- Parameters:
- pathstr
Path to the csv file.
- Returns:
- None
- plot_msa(filter_qid=0.15, filter_cov=0.4)[source]
Plot the msa from the a3m file.
- Parameters:
- filter_qidfloat
Minimal sequence identity to keep a sequence.
- filter_covfloat
Minimal coverage to keep a sequence.
- Returns:
- None
Warning: only tested with ColabFold 1.5.
- read_directory(directory, keep_recycles=False, verbose=True, format=None)[source]
Read a directory.
If the directory contains a log.txt file, the format is set to colabfold_1.5.
- Parameters:
- directorystr
Path to the directory containing the log.txt file.
- keep_recyclesbool
Whether to keep all recycles. When False (the default), only the last recycle of each query is kept.
- verbosebool
Print information about the directory.
- formatstr
Format of the data.
- Returns:
- None
af_analysis.plot module
- af_analysis.plot.plot_msa_v2(feature_dict, sort_lines=True, dpi=100)[source]
Taken from: https://github.com/sokrypton/ColabFold/blob/main/colabfold/plot.py
af_analysis.sequence module
- af_analysis.sequence.parse_a3m(a3m_lines=None, a3m_file=None, filter_qid=0.15, filter_cov=0.5, N=100000)[source]
Parses an A3M file or list of A3M lines and filters sequences based on sequence identity and coverage.
- Parameters:
- a3m_lines: list of str, optional
List of lines from an A3M file. Default is None.
- a3m_file: str, optional
Path to an A3M file. Default is None.
- filter_qid: float, optional
Minimum sequence identity threshold for filtering. Default is 0.15.
- filter_cov: float, optional
Minimum coverage threshold for filtering. Default is 0.5.
- N: int, optional
Maximum number of sequences to return. Default is 100000.
- Returns:
- tuple: A tuple containing:
seqs (list of str): List of filtered sequences.
mtx (list of list of int): List of deletion matrices corresponding to the sequences.
nams (list of str): List of sequence names.
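The identity and coverage filtering described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the package's implementation: `read_a3m` and `filter_a3m` are hypothetical helpers, the first record is assumed to be the query, lowercase letters are treated as insertions (the A3M convention) and dropped, identity is taken as the fraction of query positions with a matching residue, and coverage as the fraction of non-gap positions. The deletion matrix returned by `parse_a3m` is left out.

```python
def read_a3m(lines):
    """Yield (name, aligned_sequence) pairs with lowercase insertions removed."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and header comments
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:], []
        else:
            chunks.append("".join(c for c in line if not c.islower()))
    if name is not None:
        yield name, "".join(chunks)


def filter_a3m(lines, filter_qid=0.15, filter_cov=0.5, max_seqs=100000):
    """Keep sequences whose identity and coverage reach the thresholds."""
    records = list(read_a3m(lines))
    if not records or not records[0][1]:
        return [], []
    query = records[0][1]  # assumption: the first record is the query
    seqs, names = [], []
    for name, seq in records:
        if len(seq) != len(query):
            continue  # malformed row, not aligned to the query
        identity = sum(a == b for a, b in zip(seq, query)) / len(query)
        coverage = sum(c != "-" for c in seq) / len(query)
        if identity >= filter_qid and coverage >= filter_cov:
            seqs.append(seq)
            names.append(name)
        if len(seqs) >= max_seqs:
            break
    return seqs, names
```

Raising `filter_cov` discards short fragments, while raising `filter_qid` discards distant homologs; the query itself always passes both filters.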
af_analysis.docking module
- af_analysis.docking.LIS_pep(my_data, pae_cutoff=12.0, fun=<function max>)[source]
Compute the LIS score for the peptide-peptide interface.
- Parameters:
- my_dataAF2Data
object containing the data
- pae_cutofffloat
cutoff for native contacts, default is 12.0 A
- funfunction
function to apply to the LIS matrix
- Returns:
- None
The my_data.df dataframe is modified in place.
- af_analysis.docking.cLIS_lig(my_data, pae_cutoff=12.0, dict_cutoff=8.0, fun=<function max>)[source]
Compute the cLIS score for the peptide-peptide interface.
- Parameters:
- my_dataAF2Data
object containing the data
- pae_cutofffloat
cutoff for native contacts, default is 12.0 A
- dict_cutofffloat
cutoff for distance contacts, default is 8.0 A
- funfunction
function to apply to the LIS matrix
- Returns:
- None
The my_data.df dataframe is modified in place.
- af_analysis.docking.iLIS_lig(my_data, pae_cutoff=12.0, dict_cutoff=8.0, fun=<function max>)[source]
Compute the iLIS score for the peptide-peptide interface.
- Parameters:
- my_dataAF2Data
object containing the data
- pae_cutofffloat
cutoff for native contacts, default is 12.0 A
- dict_cutofffloat
cutoff for distance contacts, default is 8.0 A
- funfunction
function to apply to the LIS matrix
- Returns:
- None
The my_data.df dataframe is modified in place.
- af_analysis.docking.ipSAE_lig(my_data, weight_avg=False)[source]
Compute the ipSAE score for the receptor-ligand interface.
- Parameters:
- my_dataAF2Data
object containing the data
- Returns:
- None
The my_data.df dataframe is modified in place.
- af_analysis.docking.ipTM_between_chains(my_data, chain_groups)[source]
Extract the ipTM between user-specified chain groups from the pair_chain_iptm array.
- Parameters:
- my_dataAF2Data
object containing the data
- chain_groupslist
list of length 2 giving the chain groups as concatenated chain IDs, between which the ipTM is extracted
- af_analysis.docking.ipTM_d0_interface_lig(my_data, weight_avg=False)[source]
Compute the ipTM_d0 score for the receptor-ligand interface.
- Parameters:
- my_dataAF2Data
object containing the data
- weight_avgbool
whether to weight the ipTM_d0 by the receptor chain lengths
- Returns:
- None
The my_data.df dataframe is modified in place.
- af_analysis.docking.ipTM_d0_lig(my_data, weight_avg=False)[source]
Compute the ipTM_d0 score for the receptor-ligand interface.
- Parameters:
- my_dataAF2Data
object containing the data
- weight_avgbool
whether to weight the ipTM_d0 by the receptor chain lengths
- Returns:
- None
The my_data.df dataframe is modified in place.
- af_analysis.docking.pae_contact_pep(my_data, fun=<function mean>, cutoff=8.0, max_pae=30.98)[source]
Extract the PAE score for the receptor(s)-peptide interface.
- Parameters:
- my_dataAF2Data
object containing the data
- funfunction
function to apply to the PAE scores
- Returns:
- None
The my_data.df dataframe is modified in place.
- af_analysis.docking.pae_pep(my_data, fun=<function mean>)[source]
Extract the PAE score for the receptor(s)-peptide interface.
- Parameters:
- my_dataAF2Data
object containing the data
- funfunction
function to apply to the PAE scores
- Returns:
- None
The my_data.df dataframe is modified in place.
- af_analysis.docking.pdockq2_lig(my_data)[source]
Compute the pDockQ2 score for the receptor-ligand interface.
- Parameters:
- my_dataAF2Data
object containing the data
- Returns:
- None
The my_data.df dataframe is modified in place.
- af_analysis.docking.plddt_contact_pep(my_data, fun=<function mean>, cutoff=8.0)[source]
Extract the pLDDT score for the peptide-peptide interface.
- Parameters:
- my_dataAF2Data
object containing the data
- funfunction
function to apply to the pLDDT scores
- Returns:
- None
The my_data.df dataframe is modified in place.
af_analysis.analysis module
- af_analysis.analysis.LIS_matrix(data, pae_cutoff=12.0)[source]
Compute the LIS score as defined in [2].
Implementation was inspired from implementation in:
- Parameters:
- dataAFData
object containing the data
- pae_cutofffloat
cutoff for PAE matrix values, default is 12.0 A
- Returns:
- None
The dataframe is modified in place.
- af_analysis.analysis.PAE_matrix(data, fun=<function average>)[source]
Compute the average PAE matrix, or any other aggregate selected with fun.
- Parameters:
- dataAFData
object containing the data
- funfunction
function to apply to the PAE scores
- Returns:
- None
The dataframe is modified in place.
- af_analysis.analysis.cLIS_matrix(data, pae_cutoff=12.0, dist_cutoff=8.0)[source]
Compute the cLIS score from the PAE matrix and pdb file.
Implementation is based on the cLIS from https://github.com/flyark/AFM-LIS
Cite: Dunbrack RL Jr. “Rēs ipSAE loquunt: What’s wrong with AlphaFold’s ipTM score and how to fix it” bioRxiv (2025).
- Parameters:
- dataAFData
object containing the data
- pae_cutofffloat
cutoff for PAE matrix values, default is 12.0 A
- dist_cutofffloat
cutoff for distance contacts, default is 8.0 A
- Returns:
- None
The dataframe is modified in place.
- af_analysis.analysis.chain_plddt(data)[source]
Compute for each chain the average plddt from the pdb file.
- Parameters:
- dataAFData
object containing the data
- Returns:
- None
The data.df dataframe is modified in place.
- af_analysis.analysis.compute_LIS_matrix(pae_array, chain_length, pae_cutoff=12.0)[source]
Compute the LIS score as defined in [1].
Implementation was inspired from implementation in https://github.com/flyark/AFM-LIS
- Parameters:
- pae_arraynp.array
array of predicted PAE
- chain_lengthlist
list of chain lengths
- pae_cutofffloat
cutoff for PAE matrix values, default is 12.0 A
- Returns:
- list
LIS scores
References
[1]Kim AR, Hu Y, Comjean A, Rodiger J, Mohr SE, Perrimon N. “Enhanced Protein-Protein Interaction Discovery via AlphaFold-Multimer” bioRxiv (2024). https://www.biorxiv.org/content/10.1101/2024.02.19.580970v1
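The LIS computation can be illustrated with a short, dependency-free sketch. It assumes the scheme used in the AFM-LIS repository as understood here: PAE values below the cutoff are rescaled to `1 - pae / pae_cutoff`, values at or above the cutoff are ignored, and each chain pair receives the mean of its retained values (0 when none is retained). The names `chain_bounds` and `lis_matrix` are hypothetical and plain nested lists stand in for the `np.array` input; this is not the code of `compute_LIS_matrix`.

```python
def chain_bounds(chain_lengths):
    """Return (start, end) residue index ranges for consecutive chains."""
    bounds, start = [], 0
    for length in chain_lengths:
        bounds.append((start, start + length))
        start += length
    return bounds


def lis_matrix(pae, chain_lengths, pae_cutoff=12.0):
    """Chain-by-chain matrix of mean rescaled PAE values below the cutoff."""
    bounds = chain_bounds(chain_lengths)
    scores = [[0.0] * len(bounds) for _ in bounds]
    for i, (r0, r1) in enumerate(bounds):
        for j, (c0, c1) in enumerate(bounds):
            kept = [
                1.0 - pae[r][c] / pae_cutoff
                for r in range(r0, r1)
                for c in range(c0, c1)
                if pae[r][c] < pae_cutoff
            ]
            # Pairs with no confident residue pair score 0.
            scores[i][j] = sum(kept) / len(kept) if kept else 0.0
    return scores


# Two chains of two residues each; PAE values in angstroms.
example_pae = [
    [0, 0, 6, 30],
    [0, 0, 30, 30],
    [3, 30, 0, 0],
    [30, 30, 0, 0],
]
example_lis = lis_matrix(example_pae, [2, 2])
```

Because the PAE matrix is not symmetric, the resulting matrix is not symmetric either: entry (0, 1) and entry (1, 0) are computed from different blocks.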
- af_analysis.analysis.compute_cLIS_matrix(pdb: str, pae_array: ndarray, chain_ids: list, chain_length: dict, pae_cutoff: float = 12.0, dist_cutoff: float = 8.0, sel: str = "name CB C3' or (resname GLY and name CA) or (not resname ALA ARG ASN ASP CYS GLU GLN GLY HIS ILE LEU LYS MET PHE PRO SER THR TRP TYR VAL PTR and not resname DA DC DG DT A T G C U and noh)") → ndarray[source]
Compute the cLIS score from the PAE matrix and pdb file.
- Parameters:
- pdbstr
path to the pdb file
- pae_arraynp.array
array of predicted PAE
- pae_cutofffloat
cutoff for PAE matrix values, default is 12.0 A
- dist_cutofffloat
cutoff for distance between atoms, default is 8.0 A
- chain_idslist
list of chain IDs
- chain_lengthdict
dictionary of chain lengths
- selstr
selection string for the atoms to consider in the distance calculation, default is TOKEN_SEL_CB
- Returns:
- ndarray
cLIS score matrix
- af_analysis.analysis.compute_dockq(data, ref_dict, fun=<function average>, dockq_thresold=0.3)[source]
Compute the DockQ score from the PAE matrix.
- Parameters:
- dataAFData
object containing the data
- ref_dictdict
dictionary containing the reference PAE matrix for each query
- funfunction
function to apply to the PAE scores
- dockq_thresoldfloat
threshold with multiple chain to recompute DockQ score, default is 0.3
- Returns:
- None
The dataframe is modified in place.
- af_analysis.analysis.compute_ftdmp(my_data, ftdmp_path=None, out_path='tmp_ftdmp', score_list=['raw_scoring_results_without_ranks.txt'], env=None, keep_tmp=False)[source]
Compute ftdmp scores
- Parameters:
- ftdmp_pathstr
Path to the ftdmp output directory
- Returns:
- my_dataAFData
object containing the data
- af_analysis.analysis.compute_ipSAE_matrix(pae_array, pae_cutoff, chain_ids, chain_length, chain_type)[source]
Compute the ipSAE score from the PAE matrix.
- Parameters:
- pae_arraynp.array
array of predicted PAE
- pae_cutofffloat
cutoff for PAE matrix values in A (required; this function defines no default)
- chain_idslist
list of chain IDs
- chain_lengthlist
list of chain lengths
- chain_typelist
list of chain types (e.g. “protein”, “nucleic_acid”)
- Returns:
- list
ipSAE score matrix
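A rough, dependency-free sketch of the idea behind an ipSAE-style score for one ordered chain pair follows. It is an illustration under assumptions, not the code of `compute_ipSAE_matrix`: the helper names are hypothetical, the TM-score normalisation `d0 = 1.24 * (L - 15)^(1/3) - 1.8` is applied with a floor of 27 on `L` and a minimum `d0` of 1.0 (values recalled from the IPSAE script and assumed here for protein chains), and `L` is taken per aligned residue as the number of scored residues whose PAE is below the cutoff.

```python
def calc_d0(n_res, min_value=1.0):
    """TM-score d0 for n_res residues (assumed floor of 27 residues)."""
    n = max(float(n_res), 27.0)
    return max(min_value, 1.24 * (n - 15.0) ** (1.0 / 3.0) - 1.8)


def ptm_term(pae_value, d0):
    """TM-score-like transform of a single PAE value."""
    return 1.0 / (1.0 + (pae_value / d0) ** 2)


def ipsae_asym(pae, rows, cols, pae_cutoff=10.0):
    """Score chain `cols` when aligning on residues of chain `rows`.

    For each aligned residue, only PAE values below the cutoff are kept,
    d0 is derived from how many were kept, and the transformed values are
    averaged. The best residue gives the pair score (0 if nothing is kept).
    """
    best = 0.0
    for i in rows:
        kept = [pae[i][j] for j in cols if pae[i][j] < pae_cutoff]
        if not kept:
            continue
        d0 = calc_d0(len(kept))
        score = sum(ptm_term(value, d0) for value in kept) / len(kept)
        best = max(best, score)
    return best
```

Calling `ipsae_asym(pae, rows_A, cols_B)` and `ipsae_asym(pae, rows_B, cols_A)` generally gives different values, so a symmetric pair score would need an extra reduction such as the maximum of the two directions.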
- af_analysis.analysis.compute_iptm_d0_interface_values(pdb, pae_array, chain_ids, chain_length, chain_type, sel="name CB C3' or (resname GLY and name CA) or (not resname ALA ARG ASN ASP CYS GLU GLN GLY HIS ILE LEU LYS MET PHE PRO SER THR TRP TYR VAL PTR and not resname DA DC DG DT A T G C U and noh)")[source]
Compute the ipTM_d0 score from the PAE matrix.
- Parameters:
- pdbstr
path to the pdb file
- pae_arraynp.array
array of predicted PAE
- chain_idslist
list of chain IDs
- chain_lengthlist
list of chain lengths
- chain_typelist
list of chain types (e.g. “protein”, “nucleic_acid”)
- selstr
selection string for the atoms to consider in the distance calculation, default is TOKEN_SEL_CB
- Returns:
- list
ipTM_d0 score
- af_analysis.analysis.compute_iptm_d0_values(pae_array, chain_ids, chain_length, chain_type)[source]
Compute the ipTM_d0 score from the PAE matrix.
- Parameters:
- pae_arraynp.array
array of predicted PAE
- chain_idslist
list of chain IDs
- chain_lengthlist
list of chain lengths
- chain_typelist
list of chain types (e.g. “protein”, “nucleic_acid”)
- Returns:
- list
ipTM_d0 score
- af_analysis.analysis.compute_pdockQ(coor, rec_chains=None, lig_chains=None, cutoff=8.0, L=0.724, x0=152.611, k=0.052, b=0.018)[source]
- af_analysis.analysis.compute_pdockQ2(coor, pae_array, cutoff=8.0, L=1.31034849, x0=84.7326239, k=0.0747157696, b=0.00501886443, d0=10.0, sel='(resname ALA ARG ASN ASP CYS GLU GLN GLY HIS ILE LEU LYS MET PHE PRO SER THR TRP TYR VAL PTR and name CA) or (resname DA DC DG DT A T G C U and name P) or ions or (not resname ALA ARG ASN ASP CYS GLU GLN GLY HIS ILE LEU LYS MET PHE PRO SER THR TRP TYR VAL PTR and not resname DA DC DG DT A T G C U and noh)')[source]
- af_analysis.analysis.extract_fields_file(data_file, fields)[source]
Extract the requested fields from a json/pickle file.
- Parameters:
- data_filestr
Path to the json/pickle file.
- fieldslist
List of fields to extract.
- Returns:
- value
- af_analysis.analysis.extract_ftdmp(ftdmp_result_path, score_list=['raw_scoring_results_without_ranks.txt'])[source]
Read ftdmp output files
- Parameters:
- ftdmp_result_pathstr
Path to the ftdmp output directory
- Returns:
- my_dataAFData
object containing the data
- af_analysis.analysis.extract_pae_json(json_file)[source]
Get the PAE matrix from a json file.
- Parameters:
- json_filestr
Path to the json file.
- Returns:
- np.array
PAE matrix.
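As an illustration of what reading a PAE matrix from JSON can involve, here is a minimal standard-library sketch. The helper name `load_pae_json` is hypothetical and the key names are assumptions: ColabFold score files are assumed to store the matrix under `pae`, and AlphaFold Database files under `predicted_aligned_error` inside a one-element list. The sketch returns nested lists, whereas `extract_pae_json` is documented to return an `np.array`.

```python
import json


def load_pae_json(json_file, keys=("pae", "predicted_aligned_error")):
    """Return the PAE matrix stored in a JSON file as nested lists."""
    with open(json_file) as handle:
        content = json.load(handle)
    # Assumption: some writers wrap the payload in a one-element list.
    if isinstance(content, list):
        content = content[0]
    for key in keys:
        if key in content:
            return content[key]
    raise KeyError(f"none of {keys} found in {json_file}")
```

Trying several keys in order keeps a single entry point for files produced by different prediction tools, at the cost of failing only at read time when a file uses an unknown layout.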
- af_analysis.analysis.extract_pae_npy(npy_file)[source]
Get the PAE matrix from a npy file.
- Parameters:
- npy_filestr
Path to the npy file.
- Returns:
- np.array
PAE matrix.
- af_analysis.analysis.extract_pae_npz(npz_file)[source]
Get the PAE matrix from a npz file.
- Parameters:
- npz_filestr
Path to the npz file.
- Returns:
- np.array
PAE matrix.
- af_analysis.analysis.extract_pae_pkl(pkl_file)[source]
Get the PAE matrix from a pkl file.
- Parameters:
- pkl_filestr
Path to the pkl file.
- Returns:
- np.array
PAE matrix.
- af_analysis.analysis.get_pae(data_file)[source]
Get the PAE matrix from a json/npz file.
- Parameters:
- data_filestr
Path to the json/npz file.
- Returns:
- np.array
PAE matrix.
- af_analysis.analysis.inter_chain_pae(data, fun=<function mean>)[source]
Read the PAE matrix and extract the average inter chain PAE.
- Parameters:
- dataAFData
object containing the data
- funfunction
function to apply to the PAE scores
- Returns:
- None
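The inter-chain PAE can be pictured as applying `fun` to each off-diagonal block of the PAE matrix. The sketch below is a simplified stand-in, not the package's code: `inter_chain_blocks` is a hypothetical helper, chains are assumed to be stored consecutively along both axes, and a dictionary is returned instead of modifying a dataframe.

```python
from statistics import mean


def inter_chain_blocks(pae, chain_lengths, fun=mean):
    """Apply `fun` to the PAE values of every ordered pair of distinct chains."""
    bounds, start = [], 0
    for length in chain_lengths:
        bounds.append((start, start + length))
        start += length

    result = {}
    for i, (r0, r1) in enumerate(bounds):
        for j, (c0, c1) in enumerate(bounds):
            if i == j:
                continue  # intra-chain blocks are skipped
            block = [pae[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            result[(i, j)] = fun(block)
    return result
```

Passing `fun=min` instead of the default mean reports the most confident residue pair of each interface rather than its average.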
- af_analysis.analysis.ipSAE(data, pae_cutoff=10.0)[source]
Compute the ipSAE score from the PAE matrix.
Implementation is based on the ipTM_d0 function from the IPSAE package https://github.com/DunbrackLab/IPSAE/blob/main/ipsae.py
Cite: Dunbrack RL Jr. “Rēs ipSAE loquunt: What’s wrong with AlphaFold’s ipTM score and how to fix it” bioRxiv (2025).
- Parameters:
- dataAFData
object containing the data
- pae_cutofffloat
cutoff for PAE matrix values, default is 10.0 A
- Returns:
- None
The dataframe is modified in place.
- af_analysis.analysis.ipTM_d0(data)[source]
Compute the ipTM_d0 score from the PAE matrix.
Implementation is based on the ipTM_d0 function from the IPSAE package https://github.com/DunbrackLab/IPSAE/blob/main/ipsae.py
Cite: Dunbrack RL Jr. “Rēs ipSAE loquunt: What’s wrong with AlphaFold’s ipTM score and how to fix it” bioRxiv (2025).
- Parameters:
- dataAFData
object containing the data
- Returns:
- None
The dataframe is modified in place.
- af_analysis.analysis.ipTM_d0_interface(data)[source]
Compute the ipTM_d0 score from the PAE matrix.
Implementation is based on the ipTM_d0 function from the IPSAE package https://github.com/DunbrackLab/IPSAE/blob/main/ipsae.py
Cite: Dunbrack RL Jr. “Rēs ipSAE loquunt: What’s wrong with AlphaFold’s ipTM score and how to fix it” bioRxiv (2025).
- Parameters:
- dataAFData
object containing the data
- Returns:
- None
The dataframe is modified in place.
- af_analysis.analysis.iplddt(data, sel='(resname ALA ARG ASN ASP CYS GLU GLN GLY HIS ILE LEU LYS MET PHE PRO SER THR TRP TYR VAL PTR and name CB) or (resname GLY and name CA) or (resname DA DC DG DT A T G C U and name P) or ions or (not resname ALA ARG ASN ASP CYS GLU GLN GLY HIS ILE LEU LYS MET PHE PRO SER THR TRP TYR VAL PTR and not resname DA DC DG DT A T G C U and noh)', cutoff=10.0)[source]
Compute the iplddt from the pdb file.
Implementation was inspired from https://github.com/piercelab/alphafold_v2.2_customize/blob/master/get_interface_plddt.pl
If the number of contacts is zero, the iplddt score is set to 0.
- Parameters:
- dataAFData
object containing the data
- selstr
selection string for the atoms to consider in the distance calculation, default is TOKEN_SEL_IPLDDT
- cutofffloat
distance cutoff to define interface residues, default is 10.0 A
- Returns:
- None
The data.df dataframe is modified in place.
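The interface pLDDT can be sketched without any structure library. The example assumes one representative atom per residue (as the selection string suggests), defines interface residues as those within `cutoff` of a residue from another chain, and returns 0 when there is no contact. `interface_plddt` and its tuple-based input are hypothetical and only illustrate the principle.

```python
import math


def interface_plddt(residues, cutoff=10.0):
    """Mean pLDDT of residues in contact with another chain.

    residues: list of (chain_id, (x, y, z), plddt) tuples,
    one representative atom per residue.
    """
    interface = set()
    for a in range(len(residues)):
        chain_a, xyz_a, _ = residues[a]
        for b in range(a + 1, len(residues)):
            chain_b, xyz_b, _ = residues[b]
            if chain_a != chain_b and math.dist(xyz_a, xyz_b) <= cutoff:
                interface.update((a, b))  # both partners are interface residues
    if not interface:
        return 0.0  # no contact: score set to 0
    return sum(residues[i][2] for i in interface) / len(interface)
```

The double loop is quadratic in the number of residues, which is acceptable for a sketch; a neighbour search structure would be preferable for large complexes.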
- af_analysis.analysis.mpdockq(data)[source]
Compute the mpDockQ [2] from the pdb file.
\[mpDockQ = \frac{L}{1 + e^{-k (x-x_{0})}} + b\]
where:
\[x = \overline{plDDT_{interface}} \cdot \log(number \: of \: interface \: contacts)\]
\(L = 0.728\), \(x_{0} = 309.375\), \(k = 0.098\) and \(b = 0.262\).
Implementation was inspired from https://gitlab.com/ElofssonLab/FoldDock/-/blob/main/src/pdockq.py
- Parameters:
- dataAFData
object containing the data
- Returns:
- None
The data.df dataframe is modified in place.
References
[2] Bryant P, Pozzati G, Zhu W, Shenoy A, Kundrotas P & Elofsson A. Predicting the structure of large protein complexes using AlphaFold and Monte Carlo tree search. Nature Communications. vol. 13, 6028 (2022) https://www.nature.com/articles/s41467-022-33729-4
- af_analysis.analysis.pdockq(data)[source]
Compute the pDockQ [1] from the pdb file.
\[pDockQ = \frac{L}{1 + e^{-k (x-x_{0})}} + b\]
where:
\[x = \overline{plDDT_{interface}} \cdot \log(number \: of \: interface \: contacts)\]
\(L = 0.724\) is the amplitude of the sigmoid, \(k = 0.052\) is the slope of the sigmoid, \(x_{0} = 152.611\) is the midpoint of the sigmoid, and \(b = 0.018\) is the baseline offset of the sigmoid.
Implementation was inspired from https://gitlab.com/ElofssonLab/FoldDock/-/blob/main/src/pdockq.py
- Parameters:
- dataAFData
object containing the data
- Returns:
- None
The data.df dataframe is modified in place.
References
[1] Bryant P, Pozzati G and Elofsson A. Improved prediction of protein-protein interactions using AlphaFold2. Nature Communications. vol. 13, 1265 (2022) https://www.nature.com/articles/s41467-022-28865-w
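The sigmoid shared by pDockQ and mpDockQ can be evaluated with a few lines of standard-library Python. This is a sketch under assumptions rather than the package's code: `sigmoid_score` is a hypothetical helper, the logarithm is taken in base 10 (the formula above only says log), and a score of 0 is returned when there is no interface contact. The constants are the ones quoted in the two docstrings.

```python
import math

# Constants quoted in the pdockq and mpdockq descriptions.
PDOCKQ = dict(L=0.724, x0=152.611, k=0.052, b=0.018)
MPDOCKQ = dict(L=0.728, x0=309.375, k=0.098, b=0.262)


def sigmoid_score(mean_interface_plddt, n_contacts, L, x0, k, b):
    """Evaluate L / (1 + exp(-k (x - x0))) + b with x = pLDDT * log10(contacts)."""
    if n_contacts < 1:
        return 0.0  # assumption: no interface contact gives a zero score
    x = mean_interface_plddt * math.log10(n_contacts)
    return L / (1.0 + math.exp(-k * (x - x0))) + b


# Same interface scored with both parameter sets.
pdockq_value = sigmoid_score(90.0, 100, **PDOCKQ)
mpdockq_value = sigmoid_score(90.0, 100, **MPDOCKQ)
```

With these parameters the pDockQ value approaches `L + b` for large, confident interfaces and falls back towards `b` for small or low-confidence ones.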