af_analysis package

Submodules

af_analysis.data module

class af_analysis.data.Data(directory=None, data_dict=None, csv=None, verbose=True, format=None)[source]

Bases: object

Data class

Parameters:
verbose: bool

Print progress bar during analysis.

dir: str

Path to the directory containing the log.txt file.

format: str

Format of the data.

df: pandas.DataFrame

Dataframe containing the information extracted from the log.txt file.

chains: dict

Dictionary containing the chains of each query.

chain_length: dict

Dictionary containing the length of each chain of each query.

Methods

read_directory(directory, keep_recycles=False)

Read a directory.

export_csv(path)

Export the dataframe to a csv file.

import_csv(path)

Import a csv file to the dataframe.

add_json()

Add json files to the dataframe.

extract_data()

Extract json/npz files to the dataframe.

add_pdb()

Add pdb files to the dataframe.

add_fasta(csv)

Add fasta sequence to the dataframe.

keep_last_recycle()

Keep only the last recycle for each query.

plot_maxscore_as_col(score, col, hue='query')

Plot the maxscore as a function of a column.

plot_pae(index, cmap=cm.vik)

Plot the PAE matrix.

plot_plddt(index_list)

Plot the pLDDT.

show_3d(index)

Show the 3D structure.

plot_msa(filter_qid=0.15, filter_cov=0.4)

Plot the msa from the a3m file.

show_plot_info()

Show the plot info.

add_fasta(csv)[source]

Add fasta sequence to the dataframe.

Parameters:
csvstr

Path to the csv file containing the fasta sequence.

Returns:
None
add_json(verbose=True)[source]

Add json files to the dataframe.

Parameters:
None
Returns:
None
add_pdb(verbose=True)[source]

Add pdb files to the dataframe.

Parameters:
None
Returns:
None
count_msa_seq()[source]

Count for each chain the number of sequences in the MSA.

Parameters:
None
Returns:
None
Warning: only tested with ColabFold 1.5.
export_csv(path)[source]

Export the dataframe to a csv file.

Parameters:
pathstr

Path to the csv file.

Returns:
None
extract_data()[source]

Extract json/npz files to the dataframe.

Parameters:
None
Returns:
None
extract_fields(fields, disable=False)[source]

Extract fields from data files to the dataframe.

Parameters:
fieldslist

List of fields to extract.

disablebool

Disable the progress bar.

Returns:
None
get_plddt(index)[source]

Extract the pLDDT array either from the pdb file or from the json/plddt files.

Parameters:
indexint

Index of the dataframe.

Returns:
np.array

pLDDT array.

import_csv(path)[source]

Import a csv file to the dataframe.

Parameters:
pathstr

Path to the csv file.

Returns:
None
keep_last_recycle()[source]

Keep only the last recycle for each query.

plot_maxscore_as_col(score, col, hue='query')[source]
plot_msa(filter_qid=0.15, filter_cov=0.4)[source]

Plot the msa from the a3m file.

Parameters:
filter_qidfloat

Minimal sequence identity to keep a sequence.

filter_covfloat

Minimal coverage to keep a sequence.

Returns:
None
Warning: only tested with ColabFold 1.5.
plot_pae(index, cmap=<matplotlib.colors.ListedColormap object>)[source]
plot_plddt(index_list=None)[source]
read_directory(directory, keep_recycles=False, verbose=True, format=None)[source]

Read a directory.

If the directory contains a log.txt file, the format is set to colabfold_1.5.

Parameters:
directorystr

Path to the directory containing the log.txt file.

keep_recyclesbool

Keep only the last recycle for each query.

verbosebool

Print information about the directory.

Returns:
None
set_chain_length()[source]

Find chain information from the dataframe.

Parameters:
None
Returns:
None
show_3d(index)[source]
show_plot_info(cmap=<matplotlib.colors.ListedColormap object>)[source]

Known issue with ` %matplotlib ipympl `: plots don't update when changing the model number.

af_analysis.data.concat_data(data_list)[source]

Concatenate data from a list of Data objects.

Parameters:
data_listlist

List of Data objects.

Returns:
Data

Concatenated Data object.

af_analysis.data.read_multiple_alphapulldown(directory)[source]

Read multiple directories containing AlphaPulldown data.

Parameters:
directorystr

Path to the directory containing the directories.

Returns:
Data

Concatenated Data object.

af_analysis.plot module

af_analysis.plot.plot_msa_v2(feature_dict, sort_lines=True, dpi=100)[source]

Taken from: https://github.com/sokrypton/ColabFold/blob/main/colabfold/plot.py

af_analysis.plot.show_info(data_af, cmap=<matplotlib.colors.ListedColormap object>, score_list=['pLDDT', 'pTM', 'ipTM', 'ranking_confidence'])[source]

Use with ` %matplotlib widget `

af_analysis.sequence module

af_analysis.sequence.convert_aa_msa(seqs)[source]

Convert amino acid sequences to numbers.

af_analysis.sequence.parse_a3m(a3m_lines=None, a3m_file=None, filter_qid=0.15, filter_cov=0.5, N=100000)[source]

Parses an A3M file or list of A3M lines and filters sequences based on sequence identity and coverage.

Parameters:
a3m_lines: list of str, optional

List of lines from an A3M file. Default is None.

a3m_file: str, optional

Path to an A3M file. Default is None.

filter_qid: float, optional

Minimum sequence identity threshold for filtering. Default is 0.15.

filter_cov: float, optional

Minimum coverage threshold for filtering. Default is 0.5.

N: int, optional

Maximum number of sequences to return. Default is 100000.

Returns:
tuple: A tuple containing:
  • seqs (list of str): List of filtered sequences.

  • mtx (list of list of int): List of deletion matrices corresponding to the sequences.

  • nams (list of str): List of sequence names.
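
The filtering step can be illustrated with a minimal, pure-Python sketch. It is not the package's implementation: the helper name `filter_a3m_sketch`, the inclusive (`>=`) thresholds, and the choice to measure identity and coverage against the first (query) sequence are assumptions made for illustration, and the deletion matrices returned by `parse_a3m` are not reproduced.

```python
def filter_a3m_sketch(a3m_lines, filter_qid=0.15, filter_cov=0.5):
    """Return (names, seqs) for the A3M entries that pass both filters.

    Hypothetical helper: thresholds are applied as >=, which is an assumption.
    """
    names, seqs = [], []
    for line in a3m_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and header/comment lines
        if line.startswith(">"):
            names.append(line[1:])
            seqs.append("")
        elif seqs:
            # Lowercase letters mark insertions relative to the query in A3M,
            # so they are dropped to keep every row aligned to the query.
            seqs[-1] += "".join(c for c in line if not c.islower())
    if not seqs:
        return [], []

    query = seqs[0]
    kept_names, kept_seqs = [], []
    for name, seq in zip(names, seqs):
        if len(seq) != len(query):
            continue  # malformed row, ignored in this sketch
        identity = sum(a == b for a, b in zip(seq, query)) / len(query)
        coverage = sum(c != "-" for c in seq) / len(query)
        if identity >= filter_qid and coverage >= filter_cov:
            kept_names.append(name)
            kept_seqs.append(seq)
    return kept_names, kept_seqs
```

With the default thresholds, a hit that aligns to only a quarter of the query is dropped by the coverage filter even if every aligned position matches.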

af_analysis.docking module

af_analysis.docking.LIS_pep(my_data, pae_cutoff=12.0, fun=<function max>)[source]

Compute the LIS score for the peptide-peptide interface.

Parameters:
my_dataAF2Data

object containing the data

pae_cutofffloat

cutoff for native contacts, default is 12.0 A

funfunction

function to apply to the LIS matrix

Returns:
None

The log_pd dataframe is modified in place.

af_analysis.docking.cLIS_lig(my_data, pae_cutoff=12.0, dict_cutoff=8.0, fun=<function max>)[source]

Compute the cLIS score for the peptide-peptide interface.

Parameters:
my_dataAF2Data

object containing the data

pae_cutofffloat

cutoff for native contacts, default is 12.0 A

dist_cutofffloat

cutoff for distance contacts, default is 8.0 A

funfunction

function to apply to the LIS matrix

Returns:
None

The log_pd dataframe is modified in place.

af_analysis.docking.iLIS_lig(my_data, pae_cutoff=12.0, dict_cutoff=8.0, fun=<function max>)[source]

Compute the iLIS score for the peptide-peptide interface.

Parameters:
my_dataAF2Data

object containing the data

pae_cutofffloat

cutoff for native contacts, default is 12.0 A

dist_cutofffloat

cutoff for distance contacts, default is 8.0 A

funfunction

function to apply to the LIS matrix

Returns:
None

The log_pd dataframe is modified in place.

af_analysis.docking.ipSAE_lig(my_data, weight_avg=False)[source]

Compute the ipSAE score for the receptor-ligand interface.

Parameters:
my_dataAF2Data

object containing the data

Returns:
None

The my_data.df dataframe is modified in place.

af_analysis.docking.ipTM_between_chains(my_data, chain_groups)[source]

Extract ipTM from pair_chain_iptm’s array between user-specified chain groups.

dataAF2Data

object containing the data

chains: list

list of length 2 for the chain groups in the form of concatenated chain ids, between which the ipTM is extracted

af_analysis.docking.ipTM_d0_interface_lig(my_data, weight_avg=False)[source]

Compute the ipTM_d0 score for the receptor-ligand interface.

Parameters:
my_dataAF2Data

object containing the data

weight_avgbool

whether to weight the ipTM_d0 by the receptor chain lengths

Returns:
None

The my_data.df dataframe is modified in place.

af_analysis.docking.ipTM_d0_lig(my_data, weight_avg=False)[source]

Compute the ipTM_d0 score for the receptor-ligand interface.

Parameters:
my_dataAF2Data

object containing the data

weight_avgbool

whether to weight the ipTM_d0 by the receptor chain lengths

Returns:
None

The my_data.df dataframe is modified in place.

af_analysis.docking.pae_contact_pep(my_data, fun=<function mean>, cutoff=8.0, max_pae=30.98)[source]

Extract the PAE score for the receptor(s)-peptide interface.

Parameters:
my_dataAF2Data

object containing the data

funfunction

function to apply to the PAE scores

Returns:
None

The log_pd dataframe is modified in place.

af_analysis.docking.pae_pep(my_data, fun=<function mean>)[source]

Extract the PAE score for the receptor(s)-peptide interface.

Parameters:
my_dataAF2Data

object containing the data

funfunction

function to apply to the PAE scores

Returns:
None

The log_pd dataframe is modified in place.

af_analysis.docking.pdockq2_lig(my_data)[source]

Compute the pDockQ2 score for the receptor-ligand interface.

Parameters:
my_dataAF2Data

object containing the data

pae_cutofffloat

cutoff for native contacts, default is 8.0 A

funfunction

function to apply to the LIS matrix

Returns:
None

The log_pd dataframe is modified in place.

af_analysis.docking.plddt_contact_pep(my_data, fun=<function mean>, cutoff=8.0)[source]

Extract the pLDDT score for the peptide-peptide interface.

Parameters:
my_dataAF2Data

object containing the data

funfunction

function to apply to the pLDDT scores

Returns:
None

The log_pd dataframe is modified in place.

af_analysis.docking.plddt_pep(my_data, fun=<function mean>)[source]

Extract the pLDDT score for the peptide-peptide interface.

Parameters:
my_dataAF2Data

object containing the data

funfunction

function to apply to the pLDDT scores

Returns:
None

The log_pd dataframe is modified in place.

af_analysis.analysis module

af_analysis.analysis.LIS_matrix(data, pae_cutoff=12.0)[source]

Compute the LIS score as defined in [2].

Implementation was inspired from implementation in:

Parameters:
dataAFData

object containing the data

pae_cutofffloat

cutoff for PAE matrix values, default is 12.0 A

Returns:
None

The dataframe is modified in place.

af_analysis.analysis.PAE_matrix(data, fun=<function average>)[source]

Compute the average (or something else) PAE matrix.

Parameters:
dataAFData

object containing the data

funfunction

function to apply to the PAE scores

Returns:
None

The dataframe is modified in place.

af_analysis.analysis.cLIS_matrix(data, pae_cutoff=12.0, dist_cutoff=8.0)[source]

Compute the cLIS score from the PAE matrix and pdb file.

Implementation is based on the cLIS from the IPSAE package https://github.com/flyark/AFM-LIS

Cite: Dunbrack RL Jr. “Rēs ipSAE loquunt: What’s wrong with AlphaFold’s ipTM score and how to fix it” bioRxiv (2025).

Parameters:
dataAFData

object containing the data

ref_dictdict

dictionary containing the reference PAE matrix for each query

Returns:
None

The dataframe is modified in place.

af_analysis.analysis.chain_plddt(data)[source]

Compute for each chain the average plddt from the pdb file.

Parameters:
dataAFData

object containing the data

Returns:
None

The data.df dataframe is modified in place.
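
As an illustration of the per-chain average, the sketch below reads pLDDT values from the B-factor column of PDB `ATOM` records, which is where AlphaFold-style PDB files store them. The helper name `chain_plddt_sketch` and the choice to average over CA atoms only are assumptions for this example; the package may select atoms differently.

```python
def chain_plddt_sketch(pdb_lines):
    """Average the B-factor column (pLDDT) of CA atoms for each chain.

    Hypothetical helper; CA-only averaging is an assumption of this sketch.
    """
    totals = {}  # chain ID -> [sum of pLDDT, number of CA atoms]
    for line in pdb_lines:
        # Fixed-width PDB columns: atom name 13-16, chain ID 22, B-factor 61-66.
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            chain = line[21]
            entry = totals.setdefault(chain, [0.0, 0])
            entry[0] += float(line[60:66])
            entry[1] += 1
    return {chain: total / count for chain, (total, count) in totals.items()}
```

Calling it as `chain_plddt_sketch(open(path))` (with a hypothetical `path` to a model file) returns a dictionary keyed by chain ID.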

af_analysis.analysis.compute_LIS_matrix(pae_array, chain_length, pae_cutoff=12.0)[source]

Compute the LIS score as defined in [1].

Implementation was inspired from implementation in https://github.com/flyark/AFM-LIS

Parameters:
pae_arraynp.array

array of predicted PAE

chain_lengthlist

list of chain lengths

pae_cutofffloat

cutoff for native contacts, default is 12.0 A

Returns:
list

LIS scores

References

[1]

Kim AR, Hu Y, Comjean A, Rodiger J, Mohr SE, Perrimon N. “Enhanced Protein-Protein Interaction Discovery via AlphaFold-Multimer” bioRxiv (2024). https://www.biorxiv.org/content/10.1101/2024.02.19.580970v1
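
A minimal sketch of the idea, assuming that the LIS for a chain pair is the mean of the rescaled values 1 - PAE/cutoff over the residue pairs whose PAE is below the cutoff, and 0 when no pair qualifies. The function name `lis_matrix_sketch` and the use of nested lists instead of NumPy arrays are choices made for this example, not the package's implementation.

```python
def lis_matrix_sketch(pae, chain_lengths, pae_cutoff=12.0):
    """Return an n_chain x n_chain nested list of LIS-like scores.

    pae is a square nested list indexed by residue; chain_lengths gives the
    number of residues of each chain, in the order they appear along the axes.
    """
    # Start/end residue index of each chain along the PAE matrix axes.
    bounds = []
    start = 0
    for length in chain_lengths:
        bounds.append((start, start + length))
        start += length

    scores = []
    for i0, i1 in bounds:
        row = []
        for j0, j1 in bounds:
            # Rescale every PAE value below the cutoff to the range (0, 1].
            kept = [
                1.0 - pae[i][j] / pae_cutoff
                for i in range(i0, i1)
                for j in range(j0, j1)
                if pae[i][j] < pae_cutoff
            ]
            row.append(sum(kept) / len(kept) if kept else 0.0)
        scores.append(row)
    return scores
```

The result is not necessarily symmetric, because the PAE matrix itself is not: the block for chains (A, B) and the block for chains (B, A) are scored separately.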

af_analysis.analysis.compute_cLIS_matrix(pdb: str, pae_array: ndarray, chain_ids: list, chain_length: dict, pae_cutoff: float = 12.0, dist_cutoff: float = 8.0, sel: str = "name CB C3' or (resname GLY and name CA) or (not resname ALA ARG ASN ASP CYS GLU GLN GLY HIS ILE LEU LYS MET PHE PRO SER THR TRP TYR VAL PTR and not resname DA DC DG DT A T G C U and noh)") → ndarray[source]

Compute the cLIS score from the PAE matrix and pdb file.

Parameters:
pdbstr

path to the pdb file

pae_arraynp.array

array of predicted PAE

pae_cutofffloat

cutoff for PAE matrix values, default is 12.0 A

dist_cutofffloat

cutoff for distance between atoms, default is 8.0 A

chain_idslist

list of chain IDs

chain_lengthlist

list of chain lengths

selstr

selection string for the atoms to consider in the distance calculation, default is TOKEN_SEL_CB

Returns:
list

LIA score matrix

af_analysis.analysis.compute_dockq(data, ref_dict, fun=<function average>, dockq_thresold=0.3)[source]

Compute the DockQ score from the PAE matrix.

Parameters:
dataAFData

object containing the data

ref_dictdict

dictionary containing the reference PAE matrix for each query

funfunction

function to apply to the PAE scores

dockq_thresoldfloat

threshold with multiple chain to recompute DockQ score, default is 0.3

Returns:
None

The dataframe is modified in place.

af_analysis.analysis.compute_ftdmp(my_data, ftdmp_path=None, out_path='tmp_ftdmp', score_list=['raw_scoring_results_without_ranks.txt'], env=None, keep_tmp=False)[source]

Compute ftdmp scores

Parameters:
ftdmp_pathstr

Path to the ftdmp output directory

Returns:
my_dataAFData

object containing the data

af_analysis.analysis.compute_ipSAE_matrix(pae_array, pae_cutoff, chain_ids, chain_length, chain_type)[source]

Compute the ipSAE score from the PAE matrix.

Parameters:
pae_arraynp.array

array of predicted PAE

pae_cutofffloat

cutoff for PAE matrix values, default is 10.0 A

chain_idslist

list of chain IDs

chain_lengthlist

list of chain lengths

chain_typelist

list of chain types (e.g. “protein”, “nucleic_acid”)

Returns:
list

ipSAE score matrix

af_analysis.analysis.compute_iptm_d0_interface_values(pdb, pae_array, chain_ids, chain_length, chain_type, sel="name CB C3' or (resname GLY and name CA) or (not resname ALA ARG ASN ASP CYS GLU GLN GLY HIS ILE LEU LYS MET PHE PRO SER THR TRP TYR VAL PTR and not resname DA DC DG DT A T G C U and noh)")[source]

Compute the ipTM_d0 score from the PAE matrix.

Parameters:
pdbstr

path to the pdb file

pae_arraynp.array

array of predicted PAE

chain_idslist

list of chain IDs

chain_lengthlist

list of chain lengths

chain_typelist

list of chain types (e.g. “protein”, “nucleic_acid”)

selstr

selection string for the atoms to consider in the distance calculation, default is TOKEN_SEL_CB

Returns:
list

ipTM_d0 score

af_analysis.analysis.compute_iptm_d0_values(pae_array, chain_ids, chain_length, chain_type)[source]

Compute the ipTM_d0 score from the PAE matrix.

Parameters:
pae_arraynp.array

array of predicted PAE

chain_idslist

list of chain IDs

chain_lengthlist

list of chain lengths

chain_typelist

list of chain types (e.g. “protein”, “nucleic_acid”)

Returns:
list

ipTM_d0 score

af_analysis.analysis.compute_pdockQ(coor, rec_chains=None, lig_chains=None, cutoff=8.0, L=0.724, x0=152.611, k=0.052, b=0.018)[source]
af_analysis.analysis.compute_pdockQ2(coor, pae_array, cutoff=8.0, L=1.31034849, x0=84.7326239, k=0.0747157696, b=0.00501886443, d0=10.0, sel='(resname ALA ARG ASN ASP CYS GLU GLN GLY HIS ILE LEU LYS MET PHE PRO SER THR TRP TYR VAL PTR and name CA) or (resname DA DC DG DT A T G C U and name P) or ions or (not resname ALA ARG ASN ASP CYS GLU GLN GLY HIS ILE LEU LYS MET PHE PRO SER THR TRP TYR VAL PTR and not resname DA DC DG DT A T G C U and noh)')[source]
af_analysis.analysis.extract_fields_file(data_file, fields)[source]

Get the PAE matrix from a json/pickle file.

Parameters:
filestr

Path to the json file.

fieldslist

List of fields to extract.

Returns:
value
af_analysis.analysis.extract_ftdmp(ftdmp_result_path, score_list=['raw_scoring_results_without_ranks.txt'])[source]

Read ftdmp output files

Parameters:
ftdmp_result_pathstr

Path to the ftdmp output directory

Returns:
my_dataAFData

object containing the data

af_analysis.analysis.extract_pae_json(json_file)[source]

Get the PAE matrix from a json file.

Parameters:
json_filestr

Path to the json file.

Returns:
np.array

PAE matrix.
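
For orientation, a small sketch of reading a PAE matrix from JSON. It assumes a ColabFold-style scores file that stores the matrix under a top-level `pae` key; other producers use different layouts. The helper names are hypothetical, and the sketch returns a nested list rather than the `np.array` documented above.

```python
import json


def parse_pae_json_sketch(text):
    """Extract the PAE matrix from JSON text (assumes a top-level "pae" key)."""
    scores = json.loads(text)
    pae = scores["pae"]
    # A PAE matrix is residue-by-residue, so it should be square.
    n = len(pae)
    if any(len(row) != n for row in pae):
        raise ValueError("PAE matrix is not square")
    return pae


def read_pae_json_sketch(json_file):
    """Read the file at json_file and return its PAE matrix as a nested list."""
    with open(json_file) as handle:
        return parse_pae_json_sketch(handle.read())
```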

af_analysis.analysis.extract_pae_npy(npy_file)[source]

Get the PAE matrix from a npy file.

Parameters:
npy_filestr

Path to the npy file.

Returns:
np.array

PAE matrix.

af_analysis.analysis.extract_pae_npz(npz_file)[source]

Get the PAE matrix from a npz file.

Parameters:
npz_filestr

Path to the npz file.

Returns:
np.array

PAE matrix.

af_analysis.analysis.extract_pae_pkl(pkl_file)[source]

Get the PAE matrix from a pkl file.

Parameters:
pkl_filestr

Path to the pkl file.

Returns:
np.array

PAE matrix.

af_analysis.analysis.get_pae(data_file)[source]

Get the PAE matrix from a json/npz file.

Parameters:
data_filestr

Path to the json/npz file.

Returns:
np.array

PAE matrix.

af_analysis.analysis.inter_chain_pae(data, fun=<function mean>)[source]

Read the PAE matrix and extract the average inter chain PAE.

Parameters:
dataAFData

object containing the data

funfunction

function to apply to the PAE scores

Returns:
None
af_analysis.analysis.ipSAE(data, pae_cutoff=10.0)[source]

Compute the ipSAE score from the PAE matrix.

Implementation is based on the ipTM_d0 function from the IPSAE package https://github.com/DunbrackLab/IPSAE/blob/main/ipsae.py

Cite: Dunbrack RL Jr. “Rēs ipSAE loquunt: What’s wrong with AlphaFold’s ipTM score and how to fix it” bioRxiv (2025).

Parameters:
dataAFData

object containing the data

ref_dictdict

dictionary containing the reference PAE matrix for each query

Returns:
None

The dataframe is modified in place.

af_analysis.analysis.ipTM_d0(data)[source]

Compute the ipTM_d0 score from the PAE matrix.

Implementation is based on the ipTM_d0 function from the IPSAE package https://github.com/DunbrackLab/IPSAE/blob/main/ipsae.py

Cite: Dunbrack RL Jr. “Rēs ipSAE loquunt: What’s wrong with AlphaFold’s ipTM score and how to fix it” bioRxiv (2025).

Parameters:
dataAFData

object containing the data

ref_dictdict

dictionary containing the reference PAE matrix for each query

Returns:
None

The dataframe is modified in place.

af_analysis.analysis.ipTM_d0_interface(data)[source]

Compute the ipTM_d0 score from the PAE matrix.

Implementation is based on the ipTM_d0 function from the IPSAE package https://github.com/DunbrackLab/IPSAE/blob/main/ipsae.py

Cite: Dunbrack RL Jr. “Rēs ipSAE loquunt: What’s wrong with AlphaFold’s ipTM score and how to fix it” bioRxiv (2025).

Parameters:
dataAFData

object containing the data

ref_dictdict

dictionary containing the reference PAE matrix for each query

Returns:
None

The dataframe is modified in place.

af_analysis.analysis.iplddt(data, sel='(resname ALA ARG ASN ASP CYS GLU GLN GLY HIS ILE LEU LYS MET PHE PRO SER THR TRP TYR VAL PTR and name CB) or (resname GLY and name CA) or (resname DA DC DG DT A T G C U and name P)  or ions or (not resname ALA ARG ASN ASP CYS GLU GLN GLY HIS ILE LEU LYS MET PHE PRO SER THR TRP TYR VAL PTR and not resname DA DC DG DT A T G C U and noh)', cutoff=10.0)[source]

Compute the iplddt from the pdb file.

Parameters:
dataAFData

object containing the data

selstr

selection string for the atoms to consider in the distance calculation, default is TOKEN_SEL_IPLDDT

cutofffloat

distance cutoff to define interface residues, default is 10.0 A

Implementation was inspired from https://github.com/piercelab/alphafold_v2.2_customize/blob/master/get_interface_plddt.pl
If contact number is zero, the iplddt score is set to 0.
Returns:
None

The data.df dataframe is modified in place.
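
The interface definition can be sketched in a few lines of pure Python. This is a simplified illustration, not the package's code: the helper name `interface_plddt_sketch` and the input layout (one tuple per already-selected atom) are assumptions, and the pairwise loop is quadratic in the number of atoms.

```python
import math


def interface_plddt_sketch(atoms, cutoff=10.0):
    """Mean pLDDT of atoms lying within `cutoff` of an atom from another chain.

    atoms: list of (chain_id, (x, y, z), plddt) tuples, one per selected atom.
    Returns 0.0 when no inter-chain contact is found.
    """
    interface = set()  # indices of atoms that take part in a contact
    for i, (chain_i, xyz_i, _) in enumerate(atoms):
        for j in range(i + 1, len(atoms)):
            chain_j, xyz_j, _ = atoms[j]
            if chain_i != chain_j and math.dist(xyz_i, xyz_j) <= cutoff:
                interface.update((i, j))
    if not interface:
        return 0.0
    return sum(atoms[i][2] for i in interface) / len(interface)
```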

af_analysis.analysis.mpdockq(data)[source]

Compute the mpDockQ [2] from the pdb file.

\[pDockQ = \frac{L}{1 + e^{-k (x-x_{0})}} + b\]

where:

\[x = \overline{plDDT_{interface}} \cdot log(number \: of \: interface \: contacts)\]

\(L = 0.728\), \(x_{0} = 309.375\), \(k = 0.098\) and \(b = 0.262\).

Implementation was inspired from https://gitlab.com/ElofssonLab/FoldDock/-/blob/main/src/pdockq.py

Parameters:
dataAFData

object containing the data

Returns:
None

The log_pd dataframe is modified in place.

References

[2]

Bryant P, Pozzati G, Zhu W, Shenoy A, Kundrotas P & Elofsson A. Predicting the structure of large protein complexes using AlphaFold and Monte Carlo tree search. Nature Communications. vol. 13, 6028 (2022) https://www.nature.com/articles/s41467-022-33729-4

af_analysis.analysis.pdockq(data)[source]

Compute the pDockQ [1] from the pdb file.

\[pDockQ = \frac{L}{1 + e^{-k (x-x_{0})}} + b\]

where:

\[x = \overline{plDDT_{interface}} \cdot log(number \: of \: interface \: contacts)\]

\(L = 0.724\) is the maximum value of the sigmoid, \(k = 0.052\) is the slope of the sigmoid, \(x_{0} = 152.611\) is the midpoint of the sigmoid, and \(b = 0.018\) is the y-intercept of the sigmoid.

Implementation was inspired from https://gitlab.com/ElofssonLab/FoldDock/-/blob/main/src/pdockq.py

Parameters:
dataAFData

object containing the data

Returns:
None

The log_pd dataframe is modified in place.

References

[1]

Bryant P, Pozzati G and Elofsson A. Improved prediction of protein-protein interactions using AlphaFold2. Nature Communications. vol. 13, 1265 (2022) https://www.nature.com/articles/s41467-022-28865-w
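
The pDockQ sigmoid can be evaluated directly from the two interface quantities. The sketch below uses the constants listed for pDockQ; the function name `pdockq_sketch` is hypothetical. Two points are assumptions: the formula writes "log" without a base, and base 10 is used here; and the score is set to 0.0 when there is no contact, since the logarithm is undefined in that case.

```python
import math


def pdockq_sketch(mean_interface_plddt, n_contacts,
                  L=0.724, x0=152.611, k=0.052, b=0.018):
    """Evaluate the pDockQ sigmoid for one model (illustrative only)."""
    if n_contacts < 1:
        return 0.0  # assumption of this sketch: no contacts, no score
    # x combines interface confidence with interface size (log base 10 assumed).
    x = mean_interface_plddt * math.log10(n_contacts)
    return L / (1.0 + math.exp(-k * (x - x0))) + b
```

Passing other constants through the keyword arguments, for example the \(L\), \(x_{0}\), \(k\) and \(b\) values listed for mpDockQ, evaluates the same sigmoid with a different parameterization.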

af_analysis.analysis.pdockq2(data)[source]

Compute pdockq2 from the pdb file [3].

\[pDockQ_2 = \frac{L}{1 + exp [-k*(X_i-X_0)]} + b\]

with

\[X_i = \langle \frac{1}{1+(\frac{PAE_{int}}{d_0})^2} \rangle * \langle pLDDT \rangle_{int}\]
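
A minimal numerical sketch of the two formulas above, using the default constants from the `compute_pdockQ2` signature. The function name `pdockq2_sketch` is hypothetical, and the inputs are assumed to be non-empty lists of interface PAE values and interface pLDDT values that have already been selected for one chain.

```python
import math


def pdockq2_sketch(interface_pae, interface_plddt,
                   L=1.31034849, x0=84.7326239, k=0.0747157696,
                   b=0.00501886443, d0=10.0):
    """Evaluate pDockQ2 from interface PAE and pLDDT values (illustrative)."""
    # Map each PAE value to (0, 1]: low error gives a value close to 1.
    norm_pae = [1.0 / (1.0 + (pae / d0) ** 2) for pae in interface_pae]
    mean_norm_pae = sum(norm_pae) / len(norm_pae)
    mean_plddt = sum(interface_plddt) / len(interface_plddt)
    # X_i is the product of the two interface averages.
    x = mean_norm_pae * mean_plddt
    return L / (1.0 + math.exp(-k * (x - x0))) + b
```

Because the PAE term multiplies the mean pLDDT, a confident structure (high pLDDT) with a poorly placed partner (high interface PAE) still receives a low score.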

References:

af_analysis.analysis.read_ftdmp_raw_score(raw_path)[source]

Read raw ftdmp score files

Parameters:
raw_pathstr

Path to the raw score file

Returns:
raw_scorepandas.DataFrame

Dataframe containing the raw score data

af_analysis.format module