ProbeMaker User Guide
- Introduction
- Definitions
- Program operation
- Input and output
- Installation
- User interface
- Appendices
- References
Introduction
ProbeMaker is a software framework for computer-assisted design and analysis of oligonucleotide probes, developed by Johan Stenberg of the Molecular Medicine research group at the Department of Genetics and Pathology, Uppsala University, Uppsala, Sweden.
ProbeMaker is intended as a general-purpose platform for the design and analysis of sets of oligonucleotide probe sequences. The focus is on the design of probes consisting of separate functional elements. Examples of such oligonucleotides are padlock probes, probes for the oligonucleotide ligation assay, selector probes, and tagged mini-sequencing primers.
The many currently available programs for oligonucleotide design define criteria for primer or probe selection, but most are rather limited in scope. The ProbeMaker framework is flexible enough to allow the design of many different types of probes, and it is extensible. The idea is that application-specific extensions are created and incorporated into the framework as the need arises, thus increasing the utility of the software over time. ProbeMaker is free, open-source software, and all users are encouraged to contribute to its development.
A number of extensions, most of which are intended for the design of padlock probes and selectors, are provided with the first release of the software.
This document describes the ProbeMaker software and how to install and use it. The description is fairly lengthy, but reading it is recommended. See the tutorial for a guide to getting started.
Definitions
Some of the terms used throughout this text and within the software are explained here.
- Target:
- A target is a sequence for which a probe is to be designed. The target determines what the target-specific sequences will be by defining two subsequences of itself as templates.
- Probe:
- A probe is a nucleotide sequence made up of several subsequences, or blocks. Each block is either a target-specific sequence or a tag. Blocks are ordered in the 5’ to 3’ direction. Each probe has two target-specific sequences (5’ and 3’) and a number of tags.
- Tag:
- A tag is a nucleotide sequence that may be incorporated into a probe. Tags generally represent functional probe elements, such as sequences for hybridization, priming of polymerase reactions, restriction digestion, et cetera.
- Target-specific sequence:
- A target-specific sequence (TSS) is a sequence determined by the sequence of a target. A TSS is designed to be complementary to a template defined by a target.
- Template:
- A template is a sequence defined by a target. The templates are generally some subsequences of the target sequence. Each target defines two templates, 5’ and 3’. One TSS will be designed to be complementary to each of the two templates.
Program operation
This section describes the operation of the ProbeMaker program.
Overview
ProbeMaker takes a set of targets and a number of sets of tags as input. A set of probes is then generated for the target set according to specified design parameters. This may all be done through the graphical user interface. Targets, tags, and parameter settings are kept together in a project. Only one such project can be open at a time.
Probe generation is performed in two steps, construction of target-specific sequences, and tag allocation. This is described in detail below.
Target sequences
The target sequences determine what type of probe will be generated, and what the target-specific sequences of each probe will be. Each target defines a 5’ and a 3’ template sequence. These sequences are used in the construction of the target-specific sequences. The choice of these templates will thus determine what type of probe will be generated.
The type of target is determined at input-time by the use of different input formats, as described in the Input and output section, below. Target types provided with the program include targets for padlock probes, gap-fill padlock probes/molecular inversion probes, and selector probes. Other probe types may be defined and used through the plug-in feature of the program.
Target sequences may include variations, such as single nucleotide polymorphisms (SNPs) or insertions/deletions (InDels). Targets containing such variations may be expanded, meaning that a new target is created for each possible variant of a selected variation, creating a group of targets. This way it is possible to design groups of probes for related target sequences, such as for SNP genotyping.
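As a rough illustration of target expansion, the Python sketch below expands the first variation found in a sequence into a group of variant sequences. The function name `expand_target`, the bracket notation for insertions, and the small IUPAC table are assumptions made for this example; they do not reflect ProbeMaker's internal classes.

```python
# Hypothetical sketch of target expansion; not ProbeMaker's own code.
# Two-base IUPAC ambiguity codes only, to keep the example short.
IUPAC = {"R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC"}


def expand_target(name, sequence):
    """Return a group of (name, sequence) variants for the first variation found."""
    # Indel written as [INSERT]: one variant with the insertion, one without.
    start = sequence.find("[")
    if start != -1:
        end = sequence.index("]", start)
        insert = sequence[start + 1:end]
        with_insert = sequence[:start] + insert + sequence[end + 1:]
        without_insert = sequence[:start] + sequence[end + 1:]
        return [(f"{name} ins", with_insert), (f"{name} del", without_insert)]
    # SNP written as an IUPAC ambiguity code: one variant per possible base.
    for pos, base in enumerate(sequence):
        if base in IUPAC:
            return [
                (f"{name} {b}", sequence[:pos] + b + sequence[pos + 1:])
                for b in IUPAC[base]
            ]
    return [(name, sequence)]  # nothing to expand


print(expand_target("Target 1", "GGASTGC"))
print(expand_target("Target 2", "AAAA[CCCC]AGAC"))
```

Each returned list corresponds to one group of targets, which is the unit that matters later during tag allocation.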
It is possible to edit target sequences through the user interface, and to change the selected target position or variation, thus changing the template definitions. Sequences may be defined as unchangeable at input-time.
Probes
The generation of probe sequences is the main purpose of the program. A ProbeMaker probe is made up of subsequences, or blocks. Each probe has at least two blocks, the two target-dependent parts of the probe (the TSSs). Tags may be put to the 5’ of, in between, or to the 3’ of the TSSs, as determined by setting the appropriate design parameters.
Tag sequence sets
Tag sequences are loaded into the program in sets. Each tag set may be assigned to a certain tag position, ordered from 5’ to 3’. Each set of tags is also assigned a selection mode that determines how tags will be incorporated into probes during tag allocation. Five selection modes exist:
- Same mode:
- The same tag is used in all probes
- Unique mode:
- A different tag is used for every probe
- Target-specific mode:
- All probes that belong to the same group, as defined by the grouping of target sequences, use the same tag
- Variant-specific mode:
- Each probe in a group uses a different tag, but the same set of tags is reused in every group
- Any mode:
- Any tag may be used by a probe, regardless of the use in other probes.
Tag sequences may not be edited after input. If the same set is used in more than one position, the tag usage is not well defined.
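One possible reading of the five selection modes is sketched below in Python. The function `allowed_tags` and the layout of the `state` dictionary are hypothetical, and the interpretation of the target-specific and variant-specific modes (a group's tag is not shared with other groups; tags are distinct only within a group) is an assumption based on the descriptions in this guide.

```python
# Hypothetical sketch of how a selection mode could restrict the tags
# that one probe may try at one position. Not ProbeMaker's own code.
def allowed_tags(mode, tags, group, state):
    same = state.get("same")              # tag already fixed for all probes
    used = state.get("used", set())       # tags taken by any earlier probe
    by_group = state.get("by_group", {})  # group -> tag shared by that group
    in_group = state.get("in_group", {})  # group -> tags taken within it

    if mode == "same":
        return [same] if same else list(tags)
    if mode == "unique":
        return [t for t in tags if t not in used]
    if mode == "target-specific":
        if group in by_group:
            return [by_group[group]]
        taken = set(by_group.values())
        return [t for t in tags if t not in taken]
    if mode == "variant-specific":
        taken = in_group.get(group, set())
        return [t for t in tags if t not in taken]
    if mode == "any":
        return list(tags)
    raise ValueError(f"unknown selection mode: {mode}")


tags = ["t1", "t2", "t3"]
state = {"used": {"t1"}, "by_group": {"g1": "t2"}, "in_group": {"g1": {"t1"}}}
for mode in ("same", "unique", "target-specific", "variant-specific", "any"):
    print(mode, allowed_tags(mode, tags, "g1", state))
```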
Target-specific sequence construction
The first step of a probe design job is the construction of the target-specific sequences. These are determined by the templates defined by the target sequences, the selected variants or positions of the targets, and the TSS design settings, which specify the maximum and minimum lengths and the preferred melting temperature (Tm) of the TSS-target hybrid for each of the two TSSs. The two TSSs are constructed independently of each other. Each template sequence defines an end to keep fixed when changing the length of the TSS. Optionally, both ends may be kept fixed, effectively locking the TSS sequence to a particular length. This is useful if the optimal lengths are already known, e.g. if the sequences have been selected by an upstream program for target sequence selection or primer design.
TSS construction proceeds as follows. The appropriate template sequence is retrieved from the target, and a TSS is created as the reverse complement of this sequence. The TSS is now at the maximum allowed length. The Tm is then calculated for the TSS-target hybrid, and nucleotides are removed one at a time from the non-fixed end of the TSS until either the minimum length is reached, or the Tm closest to the preferred value is reached. When calculating the Tm of TSS hybridization to the target, a nearest-neighbor (NN) model is used as described by e.g. Owczarzy et al., using parameters from SantaLucia et al. The NN-model implementation currently used supports calculation of Tm for perfectly matched sequences only and does not take dangling ends or other end effects into account, since the nature of these varies with the probe application and the tag sequences that will be selected.
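The trimming procedure can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the Tm function is the simple Wallace rule (2 °C per A/T, 4 °C per G/C), used here only as a stand-in for the nearest-neighbor model that ProbeMaker uses, and the names `build_tss` and `fixed_end` are hypothetical. Instead of stopping at the first length where Tm starts to move away from the preferred value, the sketch checks every allowed length and keeps the closest one, which gives the same result whenever Tm falls steadily as the sequence is shortened.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def simple_tm(seq):
    """Wallace-rule estimate; a stand-in for a nearest-neighbor model."""
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


def build_tss(template, min_len, max_len, preferred_tm, fixed_end="5"):
    """Pick the TSS length whose Tm is closest to the preferred value.

    fixed_end names the TSS end that is kept in place ("5" or "3");
    nucleotides are dropped from the other end.
    """
    full = reverse_complement(template)
    candidates = []
    for length in range(max_len, min_len - 1, -1):  # longest first
        tss = full[:length] if fixed_end == "5" else full[-length:]
        candidates.append(tss)
    # min() returns the first best match, so ties favor the longer TSS.
    return min(candidates, key=lambda s: abs(simple_tm(s) - preferred_tm))


print(build_tss("AAAACCCCGGGG", min_len=4, max_len=12, preferred_tm=32))
```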
Tag allocation
After TSS construction, probes that have acceptable TSS pairs will go through the next step of tag allocation. For each probe, candidates are generated for all valid tag combinations until a candidate is fully accepted, as determined by the current acceptor, or until all valid candidates have been tried. Then, if applicable, one candidate is selected, as determined by the current selector. The acceptor and selector concepts are described below.
Each tag position is allocated tags from a set of tags as specified by the user. The manner in which tags are allocated from a given set depends on the selection mode. For each probe, a number of possible candidates exist, depending on the number and sizes of tag sets and the selection modes used.
A special ‘spacer’ tag may be used if equally long probes are desired. This spacer is added in between two positions and its length is adjusted to give the probe the desired length. Spacers may be variable (all possible spacers of the appropriate length are generated) or be repeats of a given sequence.
During candidate testing, tags may be found unsuitable for use in a particular position. This is then noted, and these tags are not used for the generation of subsequent candidates, thus reducing the number of possible candidates.
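The effect of excluding unsuitable tags can be illustrated with a short Python sketch. Candidates are enumerated lazily as combinations of one tag per position, and any combination containing a tag that has been flagged for its position is skipped. The function names and the `unsuitable` set are hypothetical; in the program itself, tags are flagged by the tests described in the next section.

```python
from itertools import product


def generate_candidates(tag_sets, rejected):
    """Yield tag combinations, skipping tags rejected for their position."""
    for combo in product(*tag_sets):
        if any(tag in rejected[pos] for pos, tag in enumerate(combo)):
            continue
        yield combo


def design(tag_sets, unsuitable):
    """Try candidates in order; 'unsuitable' stands in for failed tag tests."""
    rejected = [set() for _ in tag_sets]
    tried = []
    for combo in generate_candidates(tag_sets, rejected):
        tried.append(combo)
        for pos, tag in enumerate(combo):
            if (pos, tag) in unsuitable:
                rejected[pos].add(tag)  # never offered again at this position
    return tried


# Tag A1 turns out to be unsuitable at position 0, so ("A1", "B2") is skipped.
print(design([["A1", "A2"], ["B1", "B2"]], {(0, "A1")}))
```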
Testing and evaluation
Each TSS pair and probe candidate is evaluated by performing a number of tests. These tests are performed in three stages. Each TSS pair is tested prior to tag allocation. Each candidate that is generated is then tested in two stages. Candidates that pass stage 1 are tested in stage 2. The user specifies the tests to perform in each stage by allocating analysis modules, each of which defines one or more tests. The two stages of candidate testing allow the user to allocate tests in such a manner that the results of fast tests are evaluated and acted upon before more time-consuming tests are performed. Optionally, test results may be evaluated after each group of tests, so that fewer tests need to be carried out, at the cost of less information being acquired for each probe.
Tests may be limited to the current probe candidate, or may involve the comparison of this candidate with other probes and targets. Tests are defined by analysis modules that may be created and added to the program as extensions.
The tests check for a number of possible problems with the probe. When possible problems are found, messages are added to the candidates. These messages are of different types, depending on the type of condition, e.g. TSS length, and each message has a description text, such as "5' TSS shorter than minimum length". Each message also has a level of severity: alert, warning, error, or fatal. The messages added to a candidate or TSS pair are used to calculate a quality value. This value is either undetermined (before it has been calculated), poor, fair, or good. Alert messages have no effect on the quality value, but the more severe messages do. A probe with at least one fatal or error message gets a 'poor' quality value. A probe with at least one warning message, but no error or fatal messages, gets a 'fair' quality value, and a probe with no warning, error or fatal messages gets a 'good' quality value.
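The mapping from message severities to a quality value can be written as a few lines of Python. The function name `quality_value` and the use of plain strings for severities are assumptions made for this sketch.

```python
def quality_value(severities):
    """Derive a quality value from the severities of a candidate's messages."""
    if "fatal" in severities or "error" in severities:
        return "poor"
    if "warning" in severities:
        return "fair"
    return "good"  # alerts alone do not lower the quality


print(quality_value(["alert", "warning"]))
print(quality_value(["alert"]))
```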
Acceptors
After each round of testing, a TSS pair or probe candidate is evaluated by an acceptor. This acceptor determines whether the subject should be rejected or accepted, generally as a function of its quality value. If a TSS pair is rejected, no candidates are generated for this probe and it is counted as a failure. If a candidate is rejected, the next candidate is generated. In the last test stage, a probe candidate may also be temporarily accepted. If so, it is stored in a list while candidate generation and testing continues. If a candidate is accepted in the last stage, candidate generation stops, and the current candidate is added to the list of stored candidates.
The user selects what acceptor to use by selecting one from a list of available acceptors. More acceptors may be added as plug-ins. The provided acceptors include one that will accept candidates of good quality immediately while temporarily storing those of fair quality, and one that will store all candidates of fair or good quality. The latter acceptor will thus force the generation of all possible candidates, to allow selection of the best one.
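The behavior of the two provided acceptors in the last test stage can be sketched in Python as below. The function names and the three decision strings ("accept", "store", "reject") are hypothetical; the real acceptors are plug-in classes.

```python
def accept_good_store_fair(quality):
    """Accept good candidates at once, keep fair ones for later selection."""
    return {"good": "accept", "fair": "store"}.get(quality, "reject")


def store_fair_or_good(quality):
    """Never accept outright, so every possible candidate gets generated."""
    return "store" if quality in ("fair", "good") else "reject"


def run_last_stage(candidates, acceptor):
    """Return the stored candidates; stop as soon as one is fully accepted."""
    stored = []
    for name, quality in candidates:
        decision = acceptor(quality)
        if decision == "accept":
            stored.append(name)
            break
        if decision == "store":
            stored.append(name)
    return stored


candidates = [("c1", "poor"), ("c2", "fair"), ("c3", "good"), ("c4", "good")]
print(run_last_stage(candidates, accept_good_store_fair))
print(run_last_stage(candidates, store_fair_or_good))
```

With the first acceptor, generation stops at c3; with the second, c4 is also generated and stored.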
A pair-wise acceptation mode may be used when designing probes that are in groups of two. In this case, an acceptable candidate is found for the first probe in the usual way. Candidates are then generated for the second probe until one is accepted. If no acceptable candidate is found for the second probe, then new candidates are generated for the first probe until one is accepted, and so on until acceptable candidates are found for both probes or there are no more candidates to try. In the latter case, both of the probes are failures. For all pairs of probes except the first pair, the second probe of the pair will have only one allowed tag combination once the first probe of the pair has been decided. This method will allow changing the first probe if that combination happens to be a bad one.
It is possible to declare that TSS pairs and probe candidates with warnings should be accepted in the first two stages of testing. This declaration may be ignored or considered by the current acceptor.
Selectors
After the candidate generation and testing is completed, either because no more possible candidates exist, or because a candidate was accepted in the last stage, a probe is selected from the list of temporarily stored candidates. This may be done based on the quality value or any other properties of the candidates. The provided selector selects one of the candidates of the best quality available, and if more than one, selects the one that has the lowest secondary structure stability score (see below for details), if such scores have been determined.
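A selector of the kind described above might look like the following Python sketch. The dictionary keys `quality` and `structure_score` and the function name `select_probe` are assumptions for the example.

```python
QUALITY_RANK = {"good": 2, "fair": 1, "poor": 0}


def select_probe(stored):
    """Pick a candidate of the best quality, preferring the lowest structure score."""
    if not stored:
        return None
    best = max(QUALITY_RANK[c["quality"]] for c in stored)
    top = [c for c in stored if QUALITY_RANK[c["quality"]] == best]
    scored = [c for c in top if c.get("structure_score") is not None]
    if scored:
        return min(scored, key=lambda c: c["structure_score"])
    return top[0]  # no scores determined: take the first of the best


stored = [
    {"name": "c2", "quality": "fair", "structure_score": 1.0},
    {"name": "c3", "quality": "good", "structure_score": 7.5},
    {"name": "c4", "quality": "good", "structure_score": 3.2},
]
print(select_probe(stored)["name"])
```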
Input and output
The input and output system of ProbeMaker has been designed to allow input of target and tag sequences from many different sources, and the output of probe sequences in different formats. Some formats are provided, and more may be defined and added as extensions.
Target input
Different target input formats are used to import sets of target sequences to ProbeMaker. These formats describe the type of sequence file format to read from (such as FASTA or text table files), any special annotation format used (such as [C/G] or (C/G) for SNPs), and the type of targets to generate (such as padlock probe targets or selector probe targets).
When importing targets from a file, a target input format is used to parse that file and create the appropriate type of target sequences. If a project template is used, this defines the target input format; otherwise, the user will be asked for a target type, a file format, a sequence converter and a modifier. A target type and a file format are required. The file format (FASTA format, text table format, …) defines how sequences are read from file, and the target type determines what kind of target is created from the sequence data read from file. Use a sequence converter if the input file uses unusual notation within the sequence data, such as parentheses or brackets. Use a modifier if you wish to manipulate the targets after loading, e.g. to group or sort the targets, or to select a target polymorphism for each target.
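To give an idea of what a sequence converter does, the Python sketch below rewrites SNP annotations such as [C/G] or (C/G) as single IUPAC ambiguity codes. The function name `convert_snp_notation` is hypothetical, and whether ProbeMaker's own converters produce IUPAC codes internally is an assumption.

```python
import re

# Two-base IUPAC ambiguity codes, keyed by the pair of bases they stand for.
PAIR_TO_IUPAC = {
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S",
    frozenset("AT"): "W", frozenset("GT"): "K", frozenset("AC"): "M",
}

# Matches [X/Y] or (X/Y) where X and Y are single bases.
SNP_PATTERN = re.compile(r"[\[(]([ACGT])/([ACGT])[\])]")


def convert_snp_notation(sequence):
    """Replace bracketed two-allele SNP annotations with IUPAC codes."""
    def to_code(match):
        first, second = match.group(1), match.group(2)
        if first == second:
            return first  # not really a variation
        return PAIR_TO_IUPAC[frozenset(first + second)]

    return SNP_PATTERN.sub(to_code, sequence)


print(convert_snp_notation("GGA[C/G]TGC"))
print(convert_snp_notation("AA(A/G)TT"))
```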
After input, target sequences may (or may not, depending on the target type) be edited through the user interface. It may e.g. be possible to change the target position, or to introduce new variations.
Tag input
Currently, tag sets may be read from sequence files in FASTA format or text table format. Each set of tags should be in a separate file.
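For readers unfamiliar with the FASTA layout, a minimal reader is sketched below in Python. It is only an illustration of the file format (a header line starting with '>' followed by one or more sequence lines) and is not the parser used by ProbeMaker; the function name `parse_fasta` is hypothetical.

```python
def parse_fasta(text):
    """Return a list of (name, sequence) records from FASTA-formatted text."""
    records = []
    name, parts = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(parts)))
            name, parts = line[1:].strip(), []
        elif name is not None:
            parts.append(line)  # sequences may span several lines
    if name is not None:
        records.append((name, "".join(parts)))
    return records


example = """>Tag 1
TTTAGTCGTTTGCCCGAGGC
>Tag 2
GAGTAGCCTT
CCCGAGCATT
"""
print(parse_fasta(example))
```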
Probe output
Probe output formats define how probe sets are written to file. Provided formats include FASTA, text table, and a HTML format, which lists both sequence and other properties for each probe.
Saving and loading projects
Projects may be stored to and retrieved from disk. Each project is stored in a single text file that contains the project information and can be edited with an ordinary text editor. For a project to load successfully, the target and analysis module classes used in the stored project must be available.
Installation
In order to be able to run ProbeMaker on a computer, a Java Runtime Environment (JRE) must be installed. It is recommended that the latest version is used, and at least a version compliant with Java 1.4. JREs may be downloaded for free from Sun Microsystems' Java download site (http://java.com). Furthermore, the ProbeMaker class files and the MolTools class library must be installed. In the following examples, we will assume that the program is being installed on a computer running Microsoft Windows. On other systems, the procedure should be similar but may differ in details.
The ProbeMaker Java archive file (probemaker.jar) should be downloaded and placed in a suitable directory of the file system, e.g. in 'C:\ProbeMaker\'. A subdirectory named 'log' may also be created in the ProbeMaker directory; this is recommended but not required. The plug-in and project template configuration files supplied with the program should be put in the ProbeMaker directory. Assuming the PATH environment variable is properly set for the JRE and that the current directory is 'ProbeMaker', the following command from a command window should start the ProbeMaker application:
java -cp ProbeMaker.jar org.moltools.apps.probemaker.ProbeMaker
Double-clicking the ProbeMaker.jar icon should also start the application if the system is properly configured.
The first time the program is run, the user should be asked to set the default configuration. This is done by selecting a number of files and directories as described below. The names are recommendations, not requirements.
- Home directory:
- Set this to be the ProbeMaker directory. This is the home directory of the program and the default directory for storing and loading files.
- Settings directory:
- Set this to a subdirectory of the ProbeMaker directory named ‘settings’. This is the location where settings files are stored.
- Default settings file:
- Set this to a file named ‘default.par’ in the settings subdirectory. This file defines the default settings for new projects that you create.
- Plug-in file:
- Set this to a file named ‘plugins.txt’ in the settings subdirectory. This file defines which plug-in extensions should be loaded on start-up. If the file is not set, all available plug-ins will be loaded.
- Project template file:
- Set this to a file named ‘templates.xml’ in the settings subdirectory. Project templates that you create are stored here, and loaded when you restart the program.
- Log directory:
- Set this to a subdirectory of the ProbeMaker directory named ‘log’. Each design job will create a log file in this directory.
- Error log file:
- Set this to a file named ‘errors.log’ in the ProbeMaker directory. If any error occurs while running ProbeMaker, it will be logged to the error log file. If this file is not set, error messages will be written to the console instead.
Before closing the preferences dialog, click the save button to save these settings. The configuration can be changed later through the Preferences->User preferences menu option.
The program may of course also be started by adding a shortcut or creating a batch file that provides the appropriate command.
Extensions
ProbeMaker uses plug-ins to extend the functionality of parts of the program without updating or re-installing the program itself. This plug-in feature facilitates the addition of new analysis modules, target input formats, acceptors and other things.
When a new extension (in the form of a Java class implementing the appropriate plug-in interface) has been created, the class file or a jar file containing the class should be added to the Java interpreter’s class path when running ProbeMaker.
New extensions may be loaded while ProbeMaker is running through the Preferences->Plug-ins menu option.
User interface
ProbeMaker may be run from the command prompt or through a graphical user interface (GUI), which facilitates the choice of targets and tags and the setting of parameters. It also provides the possibility to display and search among the resulting probes. The GUI consists of a menu bar with file handling options, and four tabbed panels, named 'Targets', 'Tags', 'Settings', and 'Probes'.
The 'Targets' and 'Probes' panels each consist of a table showing the sequences, and buttons for specific functions. In the sequence tables, additional sequence-specific functions may be called up through a pop-up menu (right-click on Windows systems). The 'Tags' panel allows importing tag sets and assigning selection modes and tag positions. The 'Settings' panel is used to set the design parameters and configure and allocate analysis modules.
Menu and Toolbar
The menus have options for creating new projects, saving projects, opening projects, changing the configuration, exiting the program, and for calling up the log viewer. The log viewer is a separate window which can show HTML files, intended for the log files that are created during design, but which can of course also be used for viewing other HTML files.
Some of the above options are also available from the toolbar. Probe design is started by using one of the two design buttons on the toolbar, which either delete any old probes or append the new probes to the old ones.
Targets tab
As target sequences are loaded from files, they are displayed in the target table. Before designing a probe for a target sequence, the target variant has to be set. Any position can be set from the menu that is made available by left-clicking the 'variant' column of the table entry. For variable target sequences, the menu also contains the different possible variants (single nucleotide variations and insertions/deletions are supported). For variable targets, probes are usually designed for each of the sequence variants. To do this, use the 'Expand targets' option. This will expand any target sequence with exactly one variable position to new target sequences, one for each of the possible variants (usually two variants). If a target sequence contains more than one variable position, one of these has to be selected prior to using the expand option. The expand function will keep the variants of a target together in a group, which is important during tag allocation.
Tags tab
Tag sets that are loaded appear in the 'available libraries' box, where they can be viewed by double-clicking. A tag position is assigned to a tag set by dragging the set from the available-box and dropping it in the appropriate position. The selection mode is then set in the combo-box. A tag library is unassigned by clicking on the position box. Tag libraries may be reset between design jobs (this clears all information on which tags have been used). Information about which tags have been used is stored in the project file, not in the tag library files. The tag settings are applied to the current project by pressing the 'Apply' button.
Settings tab
On the 'Settings' panel, it is possible to view and set the different parameters and other settings that are used in a design job. Settings are stored in the current project when the 'Apply' button is pressed and may be saved for reuse in later projects by using the 'Save parameters' button. The default parameters file can be set to contain the current settings by pressing the 'Set as default' button and be loaded again with the 'Reset' button (this requires that the default parameters file has been set in the configuration file).
The list of available analysis modules displays all default and all plugged-in analysis modules. Clicking on a module's name brings up a description window. If a module is dragged to one of the analysis stage lists, a list of the tests performed by this module will appear. It is then possible to set which tests to perform. Clicking the name of the module will now bring up a parameter settings dialog for the analysis module.
Probes tab
From the 'Probes' panel it is possible to view probe data in different ways, and to reanalyze them with new parameters. This function can also be used to evaluate probes from other sources. To get proper results from such an analysis, the parameters and tag settings have to be carefully set to reflect the intended pattern of tags. When importing probes from other sources, target sequence data may not be available and some tests thus not possible to carry out.
Appendices
Examples
To test ProbeMaker, create a target file (targets.lib) and a tag library file (tags.lib) in FASTA format using any text editor or the sequence library tool. The files might look as shown below. Target 1 contains a variable nucleotide, S, which is read as C/G. Target 2 is an insertion/deletion (indel) target with the insertion surrounded by brackets.
targets.lib:
    >Target 1
    ACGAGCGACGGCAGACTACTATCTCGAAGCGAGCGCAGGASTGCATGCGACTATCTACT
    TACGGACTACTATC
    >Target 2
    TTAGAGAGAGAGTATATATATCGAGGCAGCTACTAAAA[CCCC]AGACGGACGACTATC
    TACGACTAGCATCGAGC
tags.lib:
    >Tag 1
    TTTAGTCGTTTGCCCGAGGC
    >Tag 2
    GAGTAGCCTTCCCGAGCATT
    >Tag 3
    AGACGGACTTACCGCGTATG
    >Tag 4
    CTTAACTATTAGCACTAAAA
    >Tag 5
    TGTCTACCTTTCCGTCAAGA
    >Tag 6
    AAACCATCGACTCACGGGAT
    >Tag 7
    ATTGACCAAACTGCGGTGCG
    >Tag 8
    ATTAACTCGACTGCCGCGTG
    >Tag 9
    ATTTGATCGTAACTCGGGTG
    >Tag 10
    AACAACGATGAGACCGGGCT
Now start the program as described above; this may take a few seconds while all the Java classes are loaded. Create a new project by selecting the appropriate option in the File menu. Name the project 'test'. On the 'Targets' tab, select 'Import targets' and open the 'targets.lib' file using the “Fasta format. Select first polymorphism, padlock targets” input format. The two target sequences should now appear in the target table. Now, expand the targets to their different variants (they are both variable sequences as mentioned above) by clicking the 'Expand targets' button at the top of the panel. Now select the 'Tags' tab and load the tag library 'tags.lib' by using the 'Add libraries' button. Select that library for the first tag position and set it to 'Use per target' mode in the appropriate combo box. Also select the library for the second position and set it to 'Use per variant' mode. Now click the 'Apply' button and go back to the 'Targets' panel. Click the 'Design new probes' button and press OK in the two dialog boxes. The first design job should now start and will select probes specific for each of the two variants of each of the two targets in the file.
Time issues
The time required for ProbeMaker to complete a design job depends on the total number of candidates that are generated and the time required for the selected tests to be performed on each generated candidate.
The maximum number of candidates generated depends on the size and selection mode of the tag sets used in the design. Consider allocating tags to a set of probes for 100 SNP targets (two probes per target, i.e. 100 pairs of probes) in the following pattern:
Tag position | Library size | Selection mode |
---|---|---|
1 | 4 | Any |
2 | 1 | Same |
3 | 2 | Per variant |
4 | 200 | Per target |
The number of possible candidates to test for the first probe is 4 * 1 * 2 * 200 = 1600, while for the second probe, being designed for the other variant of the same target, only 4 possible candidates exist as the target- and variant-specific tags are determined by the tag selection made for the first probe. With 100 pairs of probes, this yields a maximum number of candidates of 160400. If probe design is successful for every probe pair, the number of available target-specific tags is reduced by one for the next pair, reducing the number of possible candidates to 120800. The design constraints chosen, the nature of target and tag sequences, and the acceptor scheme used determine how many of the possible candidates have to be generated before an acceptable one is found. Using the acceptor scheme that will only accept candidates for the temporary list will force the generation of all possible candidates, while use of other schemes may reduce the number of candidates tried, depending on the nature of targets and tags and thus of the generated candidates. The pair-wise acceptation scheme will result in a total of 640000 candidates being tested in the worst case. The choice of acceptation scheme is thus a trade-off between design time and quality of designed probes.
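The figures quoted above for 100 probe pairs can be reproduced with a few lines of Python. The variable names are chosen for this example only.

```python
pairs = 100

# Library sizes at tag positions 1-4: Any (4), Same (1), Per variant (2),
# Per target (200).
first_probe = 4 * 1 * 2 * 200   # all four positions are free
second_probe = 4                # only the 'Any' position is still free

# Upper bound when the per-target library is never depleted.
max_total = pairs * (first_probe + second_probe)

# One per-target tag is used up by each successfully designed pair.
with_depletion = sum(4 * 1 * 2 * (200 - i) + second_probe for i in range(pairs))

# Pair-wise acceptation: every first-probe candidate may be combined with
# every candidate for the second probe.
pairwise_worst = pairs * first_probe * second_probe

print(first_probe, max_total, with_depletion, pairwise_worst)
```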
The time required to test each candidate depends on how many probes are being designed, which tests are performed and how the tests are divided between stages 1 and 2. If the most time-consuming tests are done in stage 2, these are avoided for candidates that already fail in stage 1. The most time-consuming tests among the ones described in this text are the secondary structure estimation, which includes doing an alignment of a long sequence, and the tag occurrence tests, which compare each tag in a candidate to all other probes by doing alignments. To reduce the design time, tags that have been found unsuitable for use in a probe, because they interfere with probe-target hybridization or because the tag sequence or a similar sequence is already present in another probe, are not used when generating new candidates for that probe, thus avoiding testing candidates that are predicted to fail. In the above example this will reduce the number of possible candidates by approximately 40%.
Also, the tag occurrence test can be skipped for some or all tag sets during design, and performed after completion of the design instead. For example, the probability that a given 20-mer sequence appears at random in a set of 100 probes, each 100 nucleotides long, is less than 1 in 100 million, and even if a few mismatches are allowed, the risk is very low.
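The order of magnitude of this estimate can be checked with a short Python calculation. It assumes uniformly random, independent bases, an exact match, and a search on one strand only; the function name `expected_random_hits` is hypothetical.

```python
def expected_random_hits(tag_length, n_probes, probe_length):
    """Expected number of exact occurrences of one tag in random probes."""
    windows_per_probe = probe_length - tag_length + 1
    windows = n_probes * windows_per_probe
    return windows / 4 ** tag_length  # each window matches with p = 4^-L


estimate = expected_random_hits(tag_length=20, n_probes=100, probe_length=100)
print(f"{estimate:.2e}")  # below 1e-8, i.e. less than 1 in 100 million
```

For such small values the expected number of hits is practically equal to the probability of at least one hit.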
References
Owczarzy, R., Vallone, P.M., Gallo, F.J., Paner, T.M., Lane, M.J. and Benight, A.S. (1997) Predicting sequence-dependent melting stability of short duplex DNA oligomers. Biopolymers, 44, 217-239.
SantaLucia, J. Jr. and Hicks, D. (2004) The thermodynamics of DNA structural motifs. Annu. Rev. Biophys. Biomol. Struct., 33, 415-440.