Friday, 16 May 2014

SBOL for gene design?

The Synthetic Biology Open Language (SBOL) is, in its developers' own words, a data exchange standard for descriptions of genetic parts, devices, modules, and systems. More details on the format are available on the SBOL website.

I've started to support SBOL in some recent work of mine, which involves the development of a web application for designing and optimising oligomers for gene design. The application, GeneGenie, is running and the work has recently been published in Nucleic Acids Research.

Typically, synthetic genes are constructed from a number of shorter single-stranded oligomers, on alternating forward and reverse strands, which are assembled through their overlapping sequences, as illustrated in the following screen capture from GeneGenie (figure 1):

Figure 1: Synthetic gene assembly from short single-stranded oligomers. See full interactive results.

(As an aside, large synthetic genes are assembled from smaller oligomers in this way because manufacturers typically supply oligomers of around 60 nucleotides. Synthesis of longer oligomers is less reliable, as the likelihood of introducing errors increases with sequence length.)
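To make the tiling idea concrete, here is a minimal Python sketch of how a gene could be split into overlapping oligomers on alternating strands. It is not GeneGenie's algorithm: the names `Oligo`, `design_oligos` and `reverse_complement` are hypothetical, and the fixed 20-nucleotide overlap is an assumption made purely for illustration (a real design tool would choose overlaps by other criteria, such as melting temperature). Only the 60mer length comes from the discussion above.

```python
from dataclasses import dataclass
from typing import List

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


@dataclass
class Oligo:
    start: int      # 0-based start on the forward strand of the target gene
    end: int        # exclusive end on the forward strand
    strand: str     # "+" for forward, "-" for reverse
    sequence: str   # the oligomer itself, written 5' to 3'


def design_oligos(gene: str, oligo_len: int = 60, overlap: int = 20) -> List[Oligo]:
    """Tile a gene with fixed-length oligomers on alternating strands.

    Each oligomer shares `overlap` nucleotides with the previous one.
    The overlap size is an illustrative assumption, not a recommendation.
    """
    if not gene:
        raise ValueError("gene must not be empty")
    if not 0 < overlap < oligo_len:
        raise ValueError("overlap must be positive and shorter than oligo_len")

    step = oligo_len - overlap
    oligos: List[Oligo] = []
    start = 0
    while True:
        end = min(start + oligo_len, len(gene))
        window = gene[start:end]
        forward = len(oligos) % 2 == 0  # even-indexed oligomers are forward
        oligos.append(
            Oligo(
                start=start,
                end=end,
                strand="+" if forward else "-",
                sequence=window if forward else reverse_complement(window),
            )
        )
        if end == len(gene):
            break
        start += step
    return oligos


if __name__ == "__main__":
    for oligo in design_oligos("ACGT" * 30):
        print(oligo.strand, oligo.start, oligo.end, oligo.sequence)
```

Each reverse-strand oligomer is stored 5' to 3', so its sequence is the reverse complement of the region it covers. The regions shared by neighbouring oligomers are the short double-stranded stretches that hold the assembly together.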

The assembly shown in Figure 1 can be exported in SBOL, thanks to the very nifty libSBOLj library, and doing so produces this SBOL file. The SBOL is valid, and can be visualised in SBOL Designer as follows (figure 2):

Figure 2: SBOL Designer view of the same synthetic gene assembly.

This is close to what I want to capture, but not close enough. I was lucky enough to discuss this at the recent Harmony 2014 meeting with Chris Myers, Neil Wipat, Chris Madsen and Goksel Misirli, and we came to the following conclusions:

  • Although the SBOL (as generated by GeneGenie) does capture the concept of alternating forward and reverse strands, the representation in Figure 2 captures coding sequences on alternating forward and reverse strands of double-stranded DNA.
  • What I'm attempting to capture is alternating forward and reverse strands of single-stranded DNA, with overlapping segments of double-stranded DNA, as in Figure 1.

Ideally, I'd then like to superimpose features on top of this structure (such as overhangs, promoters, coding sequences, etc.). I consider this can of worms separately in the next post.

The question is: can such a system be captured by SBOL in its current form? If so, how? And if not, then what do we have to do to make it do so?

Although the GeneGenie system is new, the basic concept of synthesising large genes from shorter oligos is a common use case (see also DNAWorks, a web application that has existed since 2002). I therefore think that it would be useful for SBOL if this could be handled.

Oh, and finally, welcome SBOL to the COMBINE community!
