Frequently Asked Questions
What are the main advantages of FuncLib?
- FuncLib designs combinations of highly epistatic active-site mutations that might be inaccessible to natural and laboratory evolution, or that would otherwise require high-throughput screening or selection methods.
- FuncLib does not require high-throughput screens - we recommend testing the top 50 ranked designs.
- FuncLib can be used without a ligand or transition-state analog to obtain a repertoire of functionally diverse enzymes that can be screened for a substrate of interest.
- If the target ligand or transition state analog position in the active site is known, it can be included in the calculations. See below.
What type of proteins can be submitted?
FuncLib calculations work best on soluble, single-domain proteins. If using a multimer, be careful when mutating interface residues.
We highly recommend working on a PROSS-stabilized variant of your protein.
Can FuncLib be used without a structure?
FuncLib must receive an X-ray structure as input.
If using a model (e.g., homology, AlphaFold2), make sure to mutate only residues located in high-confidence regions.
Can I submit an NMR structure?
FuncLib does not support NMR structures. To use one anyway, edit the NMR-based PDB file so that it resembles a crystal-structure file (with every atom appearing only once).
However, we strongly recommend avoiding NMR structures, as they are typically not accurate enough for this kind of calculation.
How long does a FuncLib calculation take?
FuncLib usually runs for a day or two. It might be longer when our servers are overloaded.
What do the result files include?
When the results are ready, you will get an email with a link to download your results. It will include a file named ReadMe.txt with a detailed explanation of the results. Please read the ReadMe file carefully.
A shorter description of the results:
- A CSV (*.csv) file, which can be opened in Excel, with the clustered designs sorted by Rosetta score (in Rosetta energy units).
- Design clustering: we select the best designs (in terms of predicted Rosetta score) that differ from one another by at least N mutations (by default, N=2).
- When a ligand is included, the CSV file also contains the ligand's score. You may use it in combination with the Rosetta score to select the best mutants. In that case, we advise not clustering the results (i.e., N=1).
- A zip file with the structures of the top 50 ranked designs.
- A text file with the sequence space, i.e., the allowed residues at each position
- A text file with the parameters you used to create the run
- The PSSM file used in the design calculations
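The clustering step described above can be pictured as a greedy filter. The Python sketch below is only an illustration under stated assumptions, not FuncLib's actual implementation: each design is represented by its residues at the diversified positions, the scores are made-up values, and the names select_diverse_designs and count_differences are hypothetical. Lower Rosetta scores are better, so designs are visited from the lowest score upward and kept only if they differ from every design already kept at N or more positions.

```python
def count_differences(design_a, design_b):
    """Number of diversified positions at which two designs differ."""
    return sum(a != b for a, b in zip(design_a, design_b))


def select_diverse_designs(scored_designs, min_differences=2, limit=50):
    """Greedily keep the best-scoring designs that are mutually diverse.

    scored_designs: iterable of (residues_at_diversified_positions, score).
    Lower (more negative) Rosetta scores are treated as better.
    """
    selected = []
    for design, score in sorted(scored_designs, key=lambda item: item[1]):
        if all(count_differences(design, kept) >= min_differences
               for kept, _ in selected):
            selected.append((design, score))
        if len(selected) == limit:
            break
    return selected


# Made-up designs over four diversified positions, with made-up scores.
designs = [
    ("LFMI", -310.2),
    ("LFMR", -309.8),  # differs from LFMI at one position only
    ("IFLI", -309.1),
    ("CLML", -305.0),
]

for design, score in select_diverse_designs(designs, min_differences=2):
    print(design, score)
```

With min_differences=1 nothing is filtered out as long as all designs are distinct, which corresponds to the unclustered case (N=1) suggested for runs that include a ligand.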
How to interpret the different variants' names?
One of the results files is the sequence space. This file contains a list of allowed amino acids at each diversified position (the WT identity is placed first). For example:
106 A PIKAA ICHLM
132 A PIKAA FL
217 A PIKAA ML
271 A PIKAA LIR
In the other output files, the name of each mutant relates to the sequence space: each diversified position is represented by two digits that give the index of the chosen residue in that position's list. For the sequence space shown above, a mutant named 04010202 carries the mutations I106L, M217L and L271I, while position 132 is not mutated (numbered 01). The mutant whose name contains only 01 codes (01010101) is the WT.
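The naming scheme can also be decoded programmatically. The following Python sketch is an illustration rather than a FuncLib tool: the function names are hypothetical, and the sequence space is an assumed example chosen to be consistent with the mutations named above (I106L, M217L, L271I), with the WT residue listed first at each position.

```python
# Assumed example sequence space; the WT residue comes first on each line.
SEQUENCE_SPACE = """\
106 A PIKAA ICHLM
132 A PIKAA FL
217 A PIKAA ML
271 A PIKAA LIR
"""


def parse_sequence_space(text):
    """Return a list of (residue_number, chain, allowed_residues) tuples."""
    positions = []
    for line in text.splitlines():
        if not line.strip():
            continue
        number, chain, keyword, allowed = line.split()
        if keyword != "PIKAA":
            raise ValueError(f"unexpected keyword in line: {line!r}")
        positions.append((number, chain, allowed))
    return positions


def decode_design_name(name, positions):
    """Translate a two-digits-per-position name into mutations like 'I106L'."""
    if len(name) != 2 * len(positions):
        raise ValueError("name length does not match the sequence space")
    mutations = []
    for i, (number, _chain, allowed) in enumerate(positions):
        index = int(name[2 * i:2 * i + 2])  # 1-based index into 'allowed'
        if not 1 <= index <= len(allowed):
            raise ValueError(f"code {index:02d} is out of range at {number}")
        wild_type, chosen = allowed[0], allowed[index - 1]
        if chosen != wild_type:  # code 01 means the position is not mutated
            mutations.append(f"{wild_type}{number}{chosen}")
    return mutations


positions = parse_sequence_space(SEQUENCE_SPACE)
print(decode_design_name("04010202", positions))
print(decode_design_name("01010101", positions))  # WT: no mutations
```

Residue numbers are kept as strings so that the sketch does not break on numbering with insertion codes.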
How to calculate mutants for a user-defined sequence space?
In some cases you might want to test multipoint mutants containing specific mutations that were not included in the sequence space. Alternatively, you might have experimental mutational data and want to calculate combinations of those mutations.
In both cases you should use the Sequence space file option.
If this is a re-run of FuncLib (i.e., you ran FuncLib once, edited the sequence space, and want to calculate the new multipoint mutants), in order to get comparable results between different runs you need to:
- For a PDB structure, upload the non-mutated variant (i.e., the 0101* mutant) you received in the results from the first FuncLib run
- Check the "do not refine the structure" box
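Before uploading a hand-edited sequence space file, it can help to check its format and size locally. The following Python sketch assumes the four-column PIKAA layout shown in the example above and counts the full combinatorial product of allowed residues. How the server itself counts mutants against the 500,000 limit mentioned in the error list below is not specified here, so treat the number as a rough upper bound. The function name and the example positions are illustrative.

```python
from math import prod

AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")
MAX_MUTANTS = 500_000  # limit quoted in the error list of this FAQ


def check_sequence_space(text):
    """Validate PIKAA lines and return the number of residue combinations."""
    options_per_position = []
    seen = set()
    for line_number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue
        fields = line.split()
        if len(fields) != 4 or fields[2] != "PIKAA":
            raise ValueError(
                f"line {line_number}: expected '<number> <chain> PIKAA <residues>'")
        number, chain, _keyword, allowed = fields
        if (number, chain) in seen:
            raise ValueError(f"line {line_number}: position listed twice")
        seen.add((number, chain))
        unknown = set(allowed) - AMINO_ACIDS
        if unknown:
            raise ValueError(
                f"line {line_number}: unknown residues {sorted(unknown)}")
        if len(set(allowed)) != len(allowed):
            raise ValueError(f"line {line_number}: residue listed twice")
        options_per_position.append(len(allowed))
    return prod(options_per_position)


# Assumed example sequence space (WT residue first on each line).
EXAMPLE = """\
106 A PIKAA ICHLM
132 A PIKAA FL
217 A PIKAA ML
271 A PIKAA LIR
"""

size = check_sequence_space(EXAMPLE)
print(size, "combinations;", "too large" if size > MAX_MUTANTS else "within limit")
```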
How to proceed toward experimental validation?
- We strongly recommend ordering the full genes rather than inserting mutations one by one
- Select the designs for experimental testing: we recommend ordering the top 50 designs, but you can order more or fewer variants according to your screening capacity
- For each selected design, copy the WT protein's sequence and change the relevant positions according to the variant's name (see "How to interpret the different variants' names?" above). After generating all sequences, align them to the original WT sequence, check again that the mutations are in the correct places, and verify that there are no gaps.
- Structures with missing densities: if your structure has missing densities, align each mutant sequence to the full sequence of your protein and fill in the missing residues. We highly recommend checking the attached PDB structures of the top 50 mutants to make sure the mutations were introduced at the intended positions.
- Back-translate the amino acid sequences to DNA sequences using the DNAWorks or EMBOSS websites.
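Generating the mutant sequences by hand is error-prone, so a small script can help. The Python sketch below is illustrative only: the function name apply_mutations and the toy WT sequence are hypothetical, and it assumes that PDB residue numbers map onto the sequence through a constant offset (first_residue_number), which will not hold if the numbering has gaps or insertion codes. It refuses to apply a mutation when the WT residue found at that position is not the expected one, which catches most numbering mistakes.

```python
import re

MUTATION_PATTERN = re.compile(r"([A-Z])(\d+)([A-Z])")


def apply_mutations(wild_type, mutations, first_residue_number=1):
    """Return the WT sequence with mutations such as 'I106L' applied.

    first_residue_number is the PDB number of the first residue in
    'wild_type' (an assumption: numbering is contiguous from there).
    """
    sequence = list(wild_type)
    for mutation in mutations:
        match = MUTATION_PATTERN.fullmatch(mutation)
        if match is None:
            raise ValueError(f"cannot parse mutation {mutation!r}")
        original, number, replacement = match.groups()
        index = int(number) - first_residue_number
        if not 0 <= index < len(sequence):
            raise ValueError(f"{mutation}: position is outside the sequence")
        if sequence[index] != original:
            raise ValueError(
                f"{mutation}: sequence has {sequence[index]} at this position")
        sequence[index] = replacement
    return "".join(sequence)


# Toy WT sequence, not a real protein.
wild_type = "MKTAYIAKQR"
mutant = apply_mutations(wild_type, ["A4G", "I6L"])
print(mutant)

# The mutant must have the same length as the WT (no gaps) and differ
# from it at exactly the mutated positions.
assert len(mutant) == len(wild_type)
assert sum(a != b for a, b in zip(wild_type, mutant)) == 2
```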
How to proceed when receiving an error email?
Errors are most commonly due to wrong input.
In the error email you have a list of all the parameters you submitted to FuncLib. Go over each parameter and make sure it is correct.
Below is a list of common user errors:
- PDB id
- The desired PDB id contains the letter 'o' but you typed the digit 0 or vice versa
- Wrong ID - copy the ID reported in the email and paste it into the RCSB website. Make sure it is your protein of interest
- The ID corresponds to an NMR structure - FuncLib is incompatible with NMR structures
- The PDB file contains negative residue numbers. FuncLib is incompatible with negative numbers.
- One of the residues has more than one conformation - edit the file manually and keep only one conformation.
- (rare) The PDB file contains a residue for which one or more of the backbone atoms are missing. Note that if an entire residue is missing, this is not a problem. However, if some atoms of a residue exist, all of its backbone atoms should also exist. To solve this problem, remove all other atom lines related to the specific residue (i.e., turn it into a missing density).
- Chain identifier to design
- Missing identifier - one of the chain identifiers used does not appear in the PDB file
- Wrong identifier - not of a protein chain but of DNA or RNA chains
- Wrong identifier - the identifier is of a protein chain, but the wrong one and is incompatible with the rest of the parameters
- Numerical identifier - FuncLib is incompatible with numerical identifiers (1, 2...). You can change the identifiers to letters and resubmit the query using the upload files option
- You included a chain containing only non-amino acid residues (e.g., ligands and ions). Such chains should not be specified here. Instead, if you'd like to keep a ligand or ion during simulations, use the ligands or ions to keep during simulations options.
- Amino acid positions to diversify
- One or more of the positions (a pair of residue number and chain, e.g. 106A) does not exist in the PDB file
- A missing comma leads to the merge of 2 numbers
- Using a numbering system from a paper or database that does not match the numbering system in the PDB file
- Trying to specify a position for which there is missing X-ray density
- Specifying the number of a non-native amino acid
- Some residue positions appear in both Amino acid positions to diversify and Essential amino acid residues
- Missing chain - one of the residues you specified is located on a chain not specified in the chain identifier option
- Using the amino acid instead of the chain - The list should be of residue positions and chain, not of positions and amino acid letter. For example, if you want to specify a Valine in residue 56 on chain B, you should specify 56B and not 56V
- We suggest opening the PDB file in PyMOL, then going over all the position numbers you entered here and making sure you can select them on the designed chain
- Essential amino acid residues
- Non-protein residues (e.g., ligands, co-factors) cannot be specified in this option. If needed, specify the protein residues in contact with these other atoms
- Do not include here any of the amino acid positions to diversify, as they should be mutated. The essential amino acid residues will be kept fixed during simulations
- See also in Amino acid positions to diversify
- Make sure you used a pair of the residue number and the chain (e.g., 1X)
- Do not use the three-letter name of the ligand in the PDB file
- Single atom ligands cannot be used in this option
- Do not designate a ligand and an ion in the same residue number. Please separate them into different residues.
- See also in Amino acid positions to diversify
- Sequence space
- Use the exact format for a sequence space (see example here)
- The sequence space includes more than 500,000 mutants. Remove some of the diversifiable positions or remove some of the suggested mutations in some of the positions
- Not all positions in the sequence space were mentioned in the Amino acid positions to diversify
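Several of the errors listed above can be caught before submission by scanning the PDB file. The Python sketch below is a rough pre-check under stated assumptions, not part of FuncLib: the function names are hypothetical, only ATOM records are read (using the fixed column layout of the PDB format), and the demo records are built by a simplified helper rather than taken from a real structure. It reports negative residue numbers, numerical chain identifiers, alternate conformations, residues with an incomplete backbone, and requested positions (such as 106A) that do not exist in the file.

```python
import re

BACKBONE_ATOMS = {"N", "CA", "C", "O"}


def check_pdb(pdb_text, positions=()):
    """Return a list of human-readable problems found in ATOM records."""
    atoms = {}                 # (chain, residue number) -> atom names seen
    alt_conformations = set()  # residues with a non-blank altLoc field
    for line in pdb_text.splitlines():
        if not line.startswith("ATOM"):
            continue
        key = (line[21], int(line[22:26]))  # chain ID, residue number
        atoms.setdefault(key, set()).add(line[12:16].strip())
        if line[16] != " ":
            alt_conformations.add(key)

    problems = []
    for chain, number in sorted(atoms):
        label = f"{number}{chain}"
        if number < 0:
            problems.append(f"{label}: negative residue number")
        if chain.isdigit():
            problems.append(f"{label}: numerical chain identifier")
        if (chain, number) in alt_conformations:
            problems.append(f"{label}: more than one conformation")
        missing = BACKBONE_ATOMS - atoms[(chain, number)]
        if missing:
            problems.append(f"{label}: missing backbone atoms {sorted(missing)}")
    for position in positions:
        match = re.fullmatch(r"(\d+)([A-Za-z])", position)
        if match is None:
            problems.append(f"{position}: expected number plus chain, e.g. 106A")
        elif (match.group(2), int(match.group(1))) not in atoms:
            problems.append(f"{position}: not found in the ATOM records")
    return problems


def atom_line(serial, atom, res_name, chain, res_number, alt_loc=" "):
    """Build a simplified ATOM record for the demo (coordinates omitted)."""
    return (f"ATOM  {serial:>5} {atom:<4}{alt_loc}{res_name:>3} "
            f"{chain}{res_number:>4}    ")


demo = "\n".join([
    atom_line(1, "N", "ILE", "A", 106),
    atom_line(2, "CA", "ILE", "A", 106),
    atom_line(3, "C", "ILE", "A", 106),
    atom_line(4, "O", "ILE", "A", 106),
    atom_line(5, "N", "PHE", "A", 132, alt_loc="A"),
    atom_line(6, "CA", "PHE", "A", 132, alt_loc="A"),
])

for problem in check_pdb(demo, positions=["106A", "56V"]):
    print(problem)
```

To check a real file, read its contents into a string and pass it to check_pdb together with the positions you plan to enter in the form.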