DNA sequencing using the fluorescence-based Sanger method involves interpreting a sequence of signal peaks of varying size, whose colour indicates which base is present. We have established that the ability to predict these variations in peak size effectively makes available novel error-correction information, which will improve sequencing efficacy. Our experiments so far have used basic models of the Sanger reaction chemistry together with machine learning techniques. These have enabled us to make base calls using context information alone, specifically ignoring the peak data at the base-calling position. The 80% success rate of our blind experiments is striking, and will be improved by a more accurate model of trace behaviour. To this end, and to integrate the information into mainstream basecalling, we wish to develop an enzyme kinetics model whose component rates can be calibrated so that trace data can be accurately predicted. We describe DNA sequencing trace data, outline the requirements that the trace prediction problem places on the model, and discuss model construction and calibration issues.