af_analysis package

Submodules

af_analysis.data module

class af_analysis.data.Data(directory=None, data_dict=None, csv=None, verbose=True, format=None)[source]

Bases: object

Data class

Parameters:

verbose (bool): Print progress bar during analysis.
dir (str): Path to the directory containing the log.txt file.
format (str): Format of the data.
df (pandas.DataFrame): Dataframe containing the information extracted from the log.txt file.
chains (dict): Dictionary containing the chains of each query.
chain_length (dict): Dictionary containing the length of each chain of each query.

Methods

read_directory(directory, keep_recycles=False)	Read a directory.
export_csv(path)	Export the dataframe to a csv file.
import_csv(path)	Import a csv file to the dataframe.
add_json()	Add json files to the dataframe.
extract_data()	Extract json/npz files to the dataframe.
add_pdb()	Add pdb files to the dataframe.
add_fasta(csv)	Add fasta sequence to the dataframe.
keep_last_recycle()	Keep only the last recycle for each query.
plot_maxscore_as_col(score, col, hue='query')	Plot the maxscore as a function of a column.
plot_pae(index, cmap=cm.vik)	Plot the PAE matrix.
plot_plddt(index_list)	Plot the pLDDT.
show_3d(index)	Show the 3D structure.
plot_msa(filter_qid=0.15, filter_cov=0.4)	Plot the msa from the a3m file.
show_plot_info()	Show the plot info.

add_fasta(csv)[source]

Add fasta sequence to the dataframe.

Parameters:

csv (str): Path to the csv file containing the fasta sequence.

Returns:

None

add_json(verbose=True)[source]

Add json files to the dataframe.

Parameters:

None

Returns:

None

add_pdb(verbose=True)[source]

Add pdb files to the dataframe.

Parameters:

None

Returns:

None

count_msa_seq()[source]

Count for each chain the number of sequences in the MSA.

Parameters:

None

Returns:

None
Warning: only tested with ColabFold 1.5.

export_csv(path)[source]

Export the dataframe to a csv file.

Parameters:

path (str): Path to the csv file.

Returns:

None

extract_data()[source]

Extract json/npz files to the dataframe.

Parameters:

None

Returns:

None

extract_fields(fields, disable=False)[source]

Extract fields from data files to the dataframe.

Parameters:

fields (list): List of fields to extract.
disable (bool): Disable the progress bar.

Returns:

None

get_plddt(index)[source]

Extract the pLDDT array either from the pdb file or from the json/plddt files.

Parameters:

index (int): Index of the dataframe.

Returns:

np.array: pLDDT array.

import_csv(path)[source]

Import a csv file to the dataframe.

Parameters:

path (str): Path to the csv file.

Returns:

None

keep_last_recycle()[source]: Keep only the last recycle for each query.

plot_maxscore_as_col(score, col, hue='query')[source]

plot_msa(filter_qid=0.15, filter_cov=0.4)[source]

Plot the msa from the a3m file.

Parameters:

filter_qid (float): Minimal sequence identity to keep a sequence.
filter_cov (float): Minimal coverage to keep a sequence.

Returns:

None
Warning: only tested with ColabFold 1.5.

plot_pae(index, cmap=<matplotlib.colors.ListedColormap object>)[source]

plot_plddt(index_list=None)[source]

read_directory(directory, keep_recycles=False, verbose=True, format=None)[source]

Read a directory.

If the directory contains a log.txt file, the format is set to colabfold_1.5.

Parameters:

directory (str): Path to the directory containing the log.txt file.
keep_recycles (bool): If False (the default), keep only the last recycle for each query.
verbose (bool): Print information about the directory.

Returns:

None

set_chain_length()[source]

Find chain information from the dataframe.

Parameters:

None

Returns:

None

show_3d(index)[source]

show_plot_info(cmap=<matplotlib.colors.ListedColormap object>)[source]

Known issue: with ` %matplotlib ipympl `, plots don't update when changing the model number.

af_analysis.data.concat_data(data_list)[source]

Concatenate data from a list of Data objects.

Parameters:

data_list (list): List of Data objects.

Returns:

Data: Concatenated Data object.

af_analysis.data.read_multiple_alphapulldown(directory)[source]

Read multiple directories containing AlphaPulldown data.

Parameters:

directory (str): Path to the directory containing the directories.

Returns:

Data: Concatenated Data object.

af_analysis.plot module

af_analysis.plot.plot_msa_v2(feature_dict, sort_lines=True, dpi=100)[source]: Taken from: https://github.com/sokrypton/ColabFold/blob/main/colabfold/plot.py

af_analysis.plot.show_info(data_af, cmap=<matplotlib.colors.ListedColormap object>, score_list=['pLDDT', 'pTM', 'ipTM', 'ranking_confidence'])[source]: Use with ` %matplotlib widget `

af_analysis.sequence module

af_analysis.sequence.convert_aa_msa(seqs)[source]: Convert amino acid sequences to numbers.

af_analysis.sequence.parse_a3m(a3m_lines=None, a3m_file=None, filter_qid=0.15, filter_cov=0.5, N=100000)[source]

Parses an A3M file or list of A3M lines and filters sequences based on sequence identity and coverage.

Parameters:

a3m_lines: list of str, optional: List of lines from an A3M file. Default is None.
a3m_file: str, optional: Path to an A3M file. Default is None.
filter_qid: float, optional: Minimum sequence identity threshold for filtering. Default is 0.15.
filter_cov: float, optional: Minimum coverage threshold for filtering. Default is 0.5.
N: int, optional: Maximum number of sequences to return. Default is 100000.

Returns:

tuple: A tuple containing:

seqs (list of str): List of filtered sequences.
mtx (list of list of int): List of deletion matrices corresponding to the sequences.
nams (list of str): List of sequence names.
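
The identity and coverage filters described above can be illustrated with a small standalone sketch. It is not the library's implementation: the function name filter_a3m_sketch is hypothetical, lowercase letters are assumed to mark insertions relative to the query (the usual A3M convention), the thresholds are assumed to be inclusive, and the deletion matrix and the N limit are left out.

```python
def filter_a3m_sketch(a3m_lines, filter_qid=0.15, filter_cov=0.5):
    """Hypothetical helper: keep sequences that pass identity and coverage filters."""
    names, seqs = [], []
    for line in a3m_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        if line.startswith(">"):
            names.append(line[1:])
            seqs.append("")
        elif names:
            # Assumption: lowercase letters are insertions and are dropped.
            seqs[-1] += "".join(c for c in line if not c.islower())
    if not seqs:
        return [], []

    query = seqs[0]  # the first record is taken as the query
    kept_names, kept_seqs = [], []
    for name, seq in zip(names, seqs):
        if len(seq) != len(query):
            continue  # not aligned to the query, ignore
        # Fraction of query positions with an identical, non-gap residue.
        identity = sum(a == b and a != "-" for a, b in zip(seq, query)) / len(query)
        # Fraction of query positions covered by a non-gap residue.
        coverage = sum(c != "-" for c in seq) / len(query)
        if identity >= filter_qid and coverage >= filter_cov:
            kept_names.append(name)
            kept_seqs.append(seq)
    return kept_seqs, kept_names


example = [
    ">query", "ACDEFGHIKL",
    ">hit1", "ACDEFGHIKL",
    ">hit2", "A---------",    # 10 % identity and coverage, filtered out
    ">hit3", "ACDEFghGHIKL",  # lowercase insertion removed before comparison
]
kept_seqs, kept_names = filter_a3m_sketch(example)
print(kept_names)
```

In this toy alignment only hit2 falls below both thresholds, so the query, hit1 and hit3 are retained.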

af_analysis.docking module

af_analysis.docking.LIS_pep(my_data, pae_cutoff=12.0, fun=<function max>)[source]

Compute the LIS score for the peptide-peptide interface.

Parameters:

my_data (AF2Data): object containing the data
pae_cutoff (float): cutoff for native contacts, default is 12.0 Å
fun (function): function to apply to the LIS matrix

Returns:

None: The log_pd dataframe is modified in place.

af_analysis.docking.cLIS_lig(my_data, pae_cutoff=12.0, dict_cutoff=8.0, fun=<function max>)[source]

Compute the cLIS score for the peptide-peptide interface.

Parameters:

my_data (AF2Data): object containing the data
pae_cutoff (float): cutoff for native contacts, default is 12.0 Å
dist_cutoff (float): cutoff for distance contacts, default is 8.0 Å
fun (function): function to apply to the LIS matrix

Returns:

None: The log_pd dataframe is modified in place.

af_analysis.docking.iLIS_lig(my_data, pae_cutoff=12.0, dict_cutoff=8.0, fun=<function max>)[source]

Compute the iLIS score for the peptide-peptide interface.

Parameters:

my_data (AF2Data): object containing the data
pae_cutoff (float): cutoff for native contacts, default is 12.0 Å
dist_cutoff (float): cutoff for distance contacts, default is 8.0 Å
fun (function): function to apply to the LIS matrix

Returns:

None: The log_pd dataframe is modified in place.

af_analysis.docking.ipSAE_lig(my_data, weight_avg=False)[source]

Compute the ipSAE score for the receptor-ligand interface.

Parameters:

my_data (AF2Data): object containing the data

Returns:

None: The my_data.df dataframe is modified in place.

af_analysis.docking.ipTM_between_chains(my_data, chain_groups)[source]

Extract ipTM from pair_chain_iptm’s array between user-specified chain groups.

my_data (AF2Data): object containing the data
chain_groups (list): list of length 2 for the chain groups in the form of concatenated chain ids, between which the ipTM is extracted

af_analysis.docking.ipTM_d0_interface_lig(my_data, weight_avg=False)[source]

Compute the ipTM_d0 score for the receptor-ligand interface.

Parameters:

my_data (AF2Data): object containing the data
weight_avg (bool): whether to weight the ipTM_d0 by the receptor chain lengths

Returns:

None: The my_data.df dataframe is modified in place.

af_analysis.docking.ipTM_d0_lig(my_data, weight_avg=False)[source]

Compute the ipTM_d0 score for the receptor-ligand interface.

Parameters:

my_data (AF2Data): object containing the data
weight_avg (bool): whether to weight the ipTM_d0 by the receptor chain lengths

Returns:

None: The my_data.df dataframe is modified in place.

af_analysis.docking.pae_contact_pep(my_data, fun=<function mean>, cutoff=8.0, max_pae=30.98)[source]

Extract the PAE score for the receptor(s)-peptide interface.

Parameters:

my_data (AF2Data): object containing the data
fun (function): function to apply to the PAE scores

Returns:

None: The log_pd dataframe is modified in place.

af_analysis.docking.pae_pep(my_data, fun=<function mean>)[source]

Extract the PAE score for the receptor(s)-peptide interface.

Parameters:

my_data (AF2Data): object containing the data
fun (function): function to apply to the PAE scores

Returns:

None: The log_pd dataframe is modified in place.

af_analysis.docking.pdockq2_lig(my_data)[source]

Compute the pDockQ2 score for the receptor-ligand interface.

Parameters:

my_data (AF2Data): object containing the data
pae_cutoff (float): cutoff for native contacts, default is 8.0 Å
fun (function): function to apply to the LIS matrix

Returns:

None: The log_pd dataframe is modified in place.

af_analysis.docking.plddt_contact_pep(my_data, fun=<function mean>, cutoff=8.0)[source]

Extract the pLDDT score for the peptide-peptide interface.

Parameters:

my_data (AF2Data): object containing the data
fun (function): function to apply to the pLDDT scores

Returns:

None: The log_pd dataframe is modified in place.

af_analysis.docking.plddt_pep(my_data, fun=<function mean>)[source]

Extract the pLDDT score for the peptide-peptide interface.

Parameters:

my_data (AF2Data): object containing the data
fun (function): function to apply to the pLDDT scores

Returns:

None: The log_pd dataframe is modified in place.

af_analysis.analysis module

af_analysis.analysis.LIS_matrix(data, pae_cutoff=12.0)[source]

Compute the LIS score as defined in [2].

The implementation was inspired by the one in:

[2]

https://github.com/flyark/AFM-LIS

Parameters:

data (AFData): object containing the data
pae_cutoff (float): cutoff for PAE matrix values, default is 12.0 Å

Returns:

None: The dataframe is modified in place.

af_analysis.analysis.PAE_matrix(data, fun=<function average>)[source]

Compute the average (or something else) PAE matrix.

Parameters:

data (AFData): object containing the data
fun (function): function to apply to the PAE scores

Returns:

None: The dataframe is modified in place.

af_analysis.analysis.cLIS_matrix(data, pae_cutoff=12.0, dist_cutoff=8.0)[source]

Compute the cLIS score from the PAE matrix and pdb file.

Implementation is based on the cLIS from the IPSAE package https://github.com/flyark/AFM-LIS

Cite: Dunbrack RL Jr. “Rēs ipSAE loquunt: What’s wrong with AlphaFold’s ipTM score and how to fix it” bioRxiv (2025).

Parameters:

data (AFData): object containing the data
ref_dict (dict): dictionary containing the reference PAE matrix for each query

Returns:

None: The dataframe is modified in place.

af_analysis.analysis.chain_plddt(data)[source]

Compute for each chain the average plddt from the pdb file.

Parameters:

data (AFData): object containing the data

Returns:

None: The data.df dataframe is modified in place.

af_analysis.analysis.compute_LIS_matrix(pae_array, chain_length, pae_cutoff=12.0)[source]

Compute the LIS score as defined in [1].

The implementation was inspired by the one in https://github.com/flyark/AFM-LIS

Parameters:

pae_array (np.array): array of predicted PAE
chain_length (list): list of chain lengths
pae_cutoff (float): cutoff for native contacts, default is 12.0 Å

Returns:

list: LIS scores

References

[1]

Kim AR, Hu Y, Comjean A, Rodiger J, Mohr SE, Perrimon N. “Enhanced Protein-Protein Interaction Discovery via AlphaFold-Multimer” bioRxiv (2024). https://www.biorxiv.org/content/10.1101/2024.02.19.580970v1
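
The LIS computation can be sketched in plain Python. The sketch follows the definition commonly given for the score (PAE values below the cutoff are rescaled to (cutoff - PAE) / cutoff and averaged per chain pair). It is an assumption about the method, not a copy of the library code. The name lis_matrix_sketch and the nested-list output layout are hypothetical.

```python
def lis_matrix_sketch(pae, chain_lengths, pae_cutoff=12.0):
    """Hypothetical helper: per chain-pair mean of rescaled PAE values below the cutoff."""
    # First residue index of every chain, plus the end of the last chain.
    starts = [0]
    for length in chain_lengths:
        starts.append(starts[-1] + length)

    n_chains = len(chain_lengths)
    lis = [[0.0] * n_chains for _ in range(n_chains)]
    for a in range(n_chains):
        for b in range(n_chains):
            # Rescale PAE in [0, cutoff) to a score in (0, 1]; ignore the rest.
            scores = [
                (pae_cutoff - pae[i][j]) / pae_cutoff
                for i in range(starts[a], starts[a + 1])
                for j in range(starts[b], starts[b + 1])
                if pae[i][j] < pae_cutoff
            ]
            if scores:  # a pair with no value below the cutoff keeps a score of 0
                lis[a][b] = sum(scores) / len(scores)
    return lis


# Two single-residue chains: confident in one direction, at the cutoff in the other.
toy_pae = [[0.0, 6.0],
           [12.0, 0.0]]
toy_lis = lis_matrix_sketch(toy_pae, chain_lengths=[1, 1])
print(toy_lis)
```

The matrix is not symmetric because the PAE matrix itself is not: the entry for chains (A, B) and the entry for (B, A) are computed from different blocks.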

af_analysis.analysis.compute_cLIS_matrix(pdb: str, pae_array: ndarray, chain_ids: list, chain_length: dict, pae_cutoff: float = 12.0, dist_cutoff: float = 8.0, sel: str = "name CB C3' or (resname GLY and name CA) or (not resname ALA ARG ASN ASP CYS GLU GLN GLY HIS ILE LEU LYS MET PHE PRO SER THR TRP TYR VAL PTR and not resname DA DC DG DT A T G C U and noh)") → ndarray[source]

Compute the cLIS score from the PAE matrix and pdb file.

Parameters:

pdb (str): path to the pdb file
pae_array (np.array): array of predicted PAE
pae_cutoff (float): cutoff for PAE matrix values, default is 12.0 Å
dist_cutoff (float): cutoff for distance between atoms, default is 8.0 Å
chain_ids (list): list of chain IDs
chain_length (list): list of chain lengths
sel (str): selection string for the atoms to consider in the distance calculation, default is TOKEN_SEL_CB

Returns:

list: LIA score matrix

af_analysis.analysis.compute_dockq(data, ref_dict, fun=<function average>, dockq_thresold=0.3)[source]

Compute the DockQ score from the PAE matrix.

Parameters:

data (AFData): object containing the data
ref_dict (dict): dictionary containing the reference PAE matrix for each query
fun (function): function to apply to the PAE scores
dockq_thresold (float): threshold with multiple chain to recompute DockQ score, default is 0.3

Returns:

None: The dataframe is modified in place.

af_analysis.analysis.compute_ftdmp(my_data, ftdmp_path=None, out_path='tmp_ftdmp', score_list=['raw_scoring_results_without_ranks.txt'], env=None, keep_tmp=False)[source]

Compute ftdmp scores

Parameters:

ftdmp_path (str): Path to the ftdmp output directory

Returns:

my_data (AFData): object containing the data

af_analysis.analysis.compute_ipSAE_matrix(pae_array, pae_cutoff, chain_ids, chain_length, chain_type)[source]

Compute the ipSAE score from the PAE matrix.

Parameters:

pae_array (np.array): array of predicted PAE
pae_cutoff (float): cutoff for PAE matrix values, default is 10.0 Å
chain_ids (list): list of chain IDs
chain_length (list): list of chain lengths
chain_type (list): list of chain types (e.g. “protein”, “nucleic_acid”)

Returns:

list: ipSAE score matrix
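
For orientation, the ipSAE idea for a single ordered chain pair can be sketched as follows. This is a simplified reading of the method described by Dunbrack (2025), not the library's code: the helper names calc_d0_sketch and ipsae_pair_sketch are hypothetical, chain types are ignored (a protein-like minimum d0 of 1.0 is assumed), and d0 is assumed to be derived from the number of residues whose PAE falls below the cutoff.

```python
def calc_d0_sketch(n_res, min_d0=1.0):
    """TM-score style d0; assumption: length is clamped at 27 and d0 at min_d0."""
    n = max(n_res, 27)
    return max(min_d0, 1.24 * (n - 15) ** (1.0 / 3.0) - 1.8)


def ipsae_pair_sketch(pae, rows, cols, pae_cutoff=10.0):
    """Hypothetical helper: best per-residue score from chain `rows` towards chain `cols`."""
    best = 0.0
    for i in rows:
        # Keep only confident residue pairs for this aligned residue.
        good = [pae[i][j] for j in cols if pae[i][j] < pae_cutoff]
        if not good:
            continue
        d0 = calc_d0_sketch(len(good))
        # Mean of the TM-like term 1 / (1 + (PAE / d0)^2) over the kept pairs.
        score = sum(1.0 / (1.0 + (p / d0) ** 2) for p in good) / len(good)
        best = max(best, score)
    return best


confident_pae = [[0.0, 0.0], [0.0, 0.0]]
uncertain_pae = [[0.0, 20.0], [20.0, 0.0]]
confident_score = ipsae_pair_sketch(confident_pae, rows=[0], cols=[1])
uncertain_score = ipsae_pair_sketch(uncertain_pae, rows=[0], cols=[1])
print(confident_score, uncertain_score)
```

Because pairs above the cutoff are discarded before averaging, a chain pair with no confident contact gets a score of 0 rather than a small positive value.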

af_analysis.analysis.compute_iptm_d0_interface_values(pdb, pae_array, chain_ids, chain_length, chain_type, sel="name CB C3' or (resname GLY and name CA) or (not resname ALA ARG ASN ASP CYS GLU GLN GLY HIS ILE LEU LYS MET PHE PRO SER THR TRP TYR VAL PTR and not resname DA DC DG DT A T G C U and noh)")[source]

Compute the ipTM_d0 score from the PAE matrix.

Parameters:

pdb (str): path to the pdb file
pae_array (np.array): array of predicted PAE
chain_ids (list): list of chain IDs
chain_length (list): list of chain lengths
chain_type (list): list of chain types (e.g. “protein”, “nucleic_acid”)
sel (str): selection string for the atoms to consider in the distance calculation, default is TOKEN_SEL_CB

Returns:

list: ipTM_d0 score

af_analysis.analysis.compute_iptm_d0_values(pae_array, chain_ids, chain_length, chain_type)[source]

Compute the ipTM_d0 score from the PAE matrix.

Parameters:

pae_array (np.array): array of predicted PAE
chain_ids (list): list of chain IDs
chain_length (list): list of chain lengths
chain_type (list): list of chain types (e.g. “protein”, “nucleic_acid”)

Returns:

list: ipTM_d0 score

af_analysis.analysis.compute_pdockQ(coor, rec_chains=None, lig_chains=None, cutoff=8.0, L=0.724, x0=152.611, k=0.052, b=0.018)[source]

af_analysis.analysis.compute_pdockQ2(coor, pae_array, cutoff=8.0, L=1.31034849, x0=84.7326239, k=0.0747157696, b=0.00501886443, d0=10.0, sel='(resname ALA ARG ASN ASP CYS GLU GLN GLY HIS ILE LEU LYS MET PHE PRO SER THR TRP TYR VAL PTR and name CA) or (resname DA DC DG DT A T G C U and name P) or ions or (not resname ALA ARG ASN ASP CYS GLU GLN GLY HIS ILE LEU LYS MET PHE PRO SER THR TRP TYR VAL PTR and not resname DA DC DG DT A T G C U and noh)')[source]

af_analysis.analysis.extract_fields_file(data_file, fields)[source]

Extract the requested fields from a json/pickle file.

Parameters:

data_file (str): Path to the json/pickle file.
fields (list): List of fields to extract.

Returns:

value

af_analysis.analysis.extract_ftdmp(ftdmp_result_path, score_list=['raw_scoring_results_without_ranks.txt'])[source]

Read ftdmp output files

Parameters:

ftdmp_result_path (str): Path to the ftdmp output directory

Returns:

my_data (AFData): object containing the data

af_analysis.analysis.extract_pae_json(json_file)[source]

Get the PAE matrix from a json file.

Parameters:

json_file (str): Path to the json file.

Returns:

np.array: PAE matrix.

af_analysis.analysis.extract_pae_npy(npy_file)[source]

Get the PAE matrix from a npy file.

Parameters:

npy_file (str): Path to the npy file.

Returns:

np.array: PAE matrix.

af_analysis.analysis.extract_pae_npz(npz_file)[source]

Get the PAE matrix from a npz file.

Parameters:

npz_file (str): Path to the npz file.

Returns:

np.array: PAE matrix.

af_analysis.analysis.extract_pae_pkl(pkl_file)[source]

Get the PAE matrix from a pkl file.

Parameters:

pkl_file (str): Path to the pkl file.

Returns:

np.array: PAE matrix.

af_analysis.analysis.get_pae(data_file)[source]

Get the PAE matrix from a json/npz file.

Parameters:

data_file (str): Path to the json/npz file.

Returns:

np.array: PAE matrix.

af_analysis.analysis.inter_chain_pae(data, fun=<function mean>)[source]

Read the PAE matrix and extract the average inter chain PAE.

Parameters:

data (AFData): object containing the data
fun (function): function to apply to the PAE scores

Returns:

None

af_analysis.analysis.ipSAE(data, pae_cutoff=10.0)[source]

Compute the ipSAE score from the PAE matrix.

Implementation is based on the ipTM_d0 function from the IPSAE package https://github.com/DunbrackLab/IPSAE/blob/main/ipsae.py

Cite: Dunbrack RL Jr. “Rēs ipSAE loquunt: What’s wrong with AlphaFold’s ipTM score and how to fix it” bioRxiv (2025).

Parameters:

data (AFData): object containing the data
ref_dict (dict): dictionary containing the reference PAE matrix for each query

Returns:

None: The dataframe is modified in place.

af_analysis.analysis.ipTM_d0(data)[source]

Compute the ipTM_d0 score from the PAE matrix.

Implementation is based on the ipTM_d0 function from the IPSAE package https://github.com/DunbrackLab/IPSAE/blob/main/ipsae.py

Cite: Dunbrack RL Jr. “Rēs ipSAE loquunt: What’s wrong with AlphaFold’s ipTM score and how to fix it” bioRxiv (2025).

Parameters:

data (AFData): object containing the data
ref_dict (dict): dictionary containing the reference PAE matrix for each query

Returns:

None: The dataframe is modified in place.

af_analysis.analysis.ipTM_d0_interface(data)[source]

Compute the ipTM_d0 score from the PAE matrix.

Implementation is based on the ipTM_d0 function from the IPSAE package https://github.com/DunbrackLab/IPSAE/blob/main/ipsae.py

Cite: Dunbrack RL Jr. “Rēs ipSAE loquunt: What’s wrong with AlphaFold’s ipTM score and how to fix it” bioRxiv (2025).

Parameters:

data (AFData): object containing the data
ref_dict (dict): dictionary containing the reference PAE matrix for each query

Returns:

None: The dataframe is modified in place.

af_analysis.analysis.iplddt(data, sel='(resname ALA ARG ASN ASP CYS GLU GLN GLY HIS ILE LEU LYS MET PHE PRO SER THR TRP TYR VAL PTR and name CB) or (resname GLY and name CA) or (resname DA DC DG DT A T G C U and name P) or ions or (not resname ALA ARG ASN ASP CYS GLU GLN GLY HIS ILE LEU LYS MET PHE PRO SER THR TRP TYR VAL PTR and not resname DA DC DG DT A T G C U and noh)', cutoff=10.0)[source]

Compute the iplddt from the pdb file.

Parameters:

data (AFData): object containing the data
sel (str): selection string for the atoms to consider in the distance calculation, default is TOKEN_SEL_IPLDDT
cutoff (float): distance cutoff to define interface residues, default is 10.0 Å

Implementation was inspired by https://github.com/piercelab/alphafold_v2.2_customize/blob/master/get_interface_plddt.pl
If the number of contacts is zero, the iplddt score is set to 0.

Returns:

None: The data.df dataframe is modified in place.
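
The interface pLDDT described above can be illustrated without any structure library. The sketch assumes one representative atom per residue, given as a (chain id, coordinates, pLDDT) tuple, and averages the pLDDT of residues that lie within the cutoff of a residue from another chain. The name iplddt_sketch and the input layout are hypothetical, and the library may select atoms and average differently.

```python
import math


def iplddt_sketch(residues, cutoff=10.0):
    """Hypothetical helper: mean pLDDT over residues in contact with another chain."""
    interface = set()
    for i, (chain_i, xyz_i, _) in enumerate(residues):
        for j in range(i + 1, len(residues)):
            chain_j, xyz_j, _ = residues[j]
            # A contact is a pair from different chains within the distance cutoff.
            if chain_i != chain_j and math.dist(xyz_i, xyz_j) <= cutoff:
                interface.add(i)
                interface.add(j)
    if not interface:
        return 0.0  # no contact: the score is set to 0, as stated above
    return sum(residues[i][2] for i in interface) / len(interface)


toy_complex = [
    ("A", (0.0, 0.0, 0.0), 90.0),   # in contact with chain B
    ("A", (50.0, 0.0, 0.0), 30.0),  # far from chain B, not part of the interface
    ("B", (5.0, 0.0, 0.0), 70.0),   # in contact with chain A
]
toy_iplddt = iplddt_sketch(toy_complex)
no_contact = iplddt_sketch(toy_complex, cutoff=1.0)
print(toy_iplddt, no_contact)
```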

af_analysis.analysis.mpdockq(data)[source]

Compute the mpDockq [2] from the pdb file.

\[pDockQ = \frac{L}{1 + e^{-k (x-x_{0})}} + b\]

where:

\[x = \overline{pLDDT_{interface}} \cdot log(number \: of \: interface \: contacts)\]

\(L = 0.728\), \(x_{0} = 309.375\), \(k = 0.098\) and \(b = 0.262\).

Implementation was inspired by https://gitlab.com/ElofssonLab/FoldDock/-/blob/main/src/pdockq.py
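
As a worked check of the constants listed for this function: at the midpoint \(x = x_{0}\) the exponential term equals 1, so

\[pDockQ(x_{0}) = \frac{L}{1 + e^{0}} + b = \frac{0.728}{2} + 0.262 = 0.626\]

and for large \(x\) the score saturates at \(L + b = 0.990\).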

Parameters:

data (AFData): object containing the data

Returns:

None: The log_pd dataframe is modified in place.

References

[2]

Bryant P, Pozzati G, Zhu W, Shenoy A, Kundrotas P & Elofsson A. Predicting the structure of large protein complexes using AlphaFold and Monte Carlo tree search. Nature Communications. vol. 13, 6028 (2022) https://www.nature.com/articles/s41467-022-33729-4

af_analysis.analysis.pdockq(data)[source]

Compute the pDockq [1] from the pdb file.

\[pDockQ = \frac{L}{1 + e^{-k (x-x_{0})}} + b\]

where:

\[x = \overline{pLDDT_{interface}} \cdot log(number \: of \: interface \: contacts)\]

\(L = 0.724\) is the maximum value of the sigmoid, \(k = 0.052\) is the slope of the sigmoid, \(x_{0} = 152.611\) is the midpoint of the sigmoid, and \(b = 0.018\) is the y-intercept of the sigmoid.

Implementation was inspired by https://gitlab.com/ElofssonLab/FoldDock/-/blob/main/src/pdockq.py
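
The sigmoid above is easy to evaluate directly. The sketch below only implements the formula with the constants quoted in this section. The helper name sigmoid_score is hypothetical, and the value of x (mean interface pLDDT times the log of the contact count) is assumed to have been computed beforehand.

```python
import math


def sigmoid_score(x, L=0.724, x0=152.611, k=0.052, b=0.018):
    """Evaluate L / (1 + exp(-k * (x - x0))) + b for a given x."""
    return L / (1.0 + math.exp(-k * (x - x0))) + b


# At x == x0 the exponential is exp(0) = 1, so the score is L / 2 + b.
midpoint = sigmoid_score(152.611)
# Far below and far above the midpoint the curve flattens towards b and L + b.
low = sigmoid_score(0.0)
high = sigmoid_score(1000.0)
print(midpoint, low, high)
```

Passing other values for L, x0, k and b reuses the same function for any score in this module that has the same functional form.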

Parameters:

data (AFData): object containing the data

Returns:

None: The log_pd dataframe is modified in place.

References

[1]

Bryant P, Pozzati G and Elofsson A. Improved prediction of protein-protein interactions using AlphaFold2. Nature Communications. vol. 13, 1265 (2022) https://www.nature.com/articles/s41467-022-28865-w

af_analysis.analysis.pdockq2(data)[source]

Compute pdockq2 from the pdb file [3].

\[pDockQ_2 = \frac{L}{1 + exp [-k*(X_i-X_0)]} + b\]

with

\[X_i = \langle \frac{1}{1+(\frac{PAE_{int}}{d_0})^2} \rangle * \langle pLDDT \rangle_{int}\]
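
The two formulas above can be combined in a short sketch. It assumes the interface PAE and pLDDT values have already been collected into plain lists, and it uses the default constants that appear in the compute_pdockQ2 signature. The name pdockq2_sketch is hypothetical, and the selection of interface residues is not shown.

```python
import math


def pdockq2_sketch(interface_pae, interface_plddt,
                   L=1.31034849, x0=84.7326239, k=0.0747157696,
                   b=0.00501886443, d0=10.0):
    """Hypothetical helper: pDockQ2 from pre-selected interface PAE and pLDDT values."""
    if not interface_pae or not interface_plddt:
        raise ValueError("at least one interface PAE and pLDDT value is required")
    # <1 / (1 + (PAE_int / d0)^2)>: mean normalised PAE term over the interface.
    norm_pae = sum(1.0 / (1.0 + (p / d0) ** 2) for p in interface_pae) / len(interface_pae)
    # <pLDDT>_int: mean pLDDT over the interface.
    mean_plddt = sum(interface_plddt) / len(interface_plddt)
    x_i = norm_pae * mean_plddt
    return L / (1.0 + math.exp(-k * (x_i - x0))) + b


confident = pdockq2_sketch([2.0, 3.0], [90.0, 85.0])
uncertain = pdockq2_sketch([20.0, 25.0], [90.0, 85.0])
print(confident, uncertain)
```

With identical pLDDT values, the interface with the larger PAE gets the lower score, because the normalised PAE term shrinks \(X_i\).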

References

[3]

https://academic.oup.com/bioinformatics/article/39/7/btad424/7219714

af_analysis.analysis.read_ftdmp_raw_score(raw_path)[source]

Read raw ftdmp score files

Parameters:

raw_path (str): Path to the raw score file

Returns:

raw_score (pandas.DataFrame): Dataframe containing the raw score data

af_analysis.format module

af_analysis.format package