Next, you will write a function longestORF(DNA) that takes a DNA string and returns the sequence of the longest open reading frame on it, in any of the three possible frames. This function will not consider the reverse complement of DNA.
It shouldn’t take much work to write longestORF, given that you’ve already written oneFrame.
Consider the example sequence from above:
>>> String DNA="CAGCTCCAATGTTTTAACCCCCCCC";
We can look at the three frames of this sequence by slicing off 0, 1 or 2 base pairs at the start:
>>> oneFrame(DNA)
[]
>>> oneFrame(DNA.substring(1)) // frame offset by one: skips the first character
[]
>>> oneFrame(DNA.substring(2)) // frame offset by two: skips the first two characters
['ATGTTT']
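To see why the start codon only shows up in the third frame, it can help to print the codons of each frame. The following is a minimal sketch, not part of the assignment: the class name FrameDemo and the helper codons() are hypothetical names introduced only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

class FrameDemo {
    // Hypothetical helper: split a sequence into consecutive codons (triplets),
    // dropping any leftover bases at the end that do not fill a whole codon.
    static List<String> codons(String dna) {
        List<String> result = new ArrayList<>();
        for (int i = 0; i + 3 <= dna.length(); i += 3) {
            result.add(dna.substring(i, i + 3));
        }
        return result;
    }

    public static void main(String[] args) {
        String DNA = "CAGCTCCAATGTTTTAACCCCCCCC";
        for (int offset = 0; offset < 3; offset++) {
            // substring(offset) drops the first `offset` characters,
            // which shifts the reading frame by that many bases.
            System.out.println("frame " + offset + ": " + codons(DNA.substring(offset)));
        }
    }
}
```

Only the frame produced by DNA.substring(2) contains the codon ATG, which is why that is the only call to oneFrame that returns an ORF.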
Each call to oneFrame will produce an ArrayList or an array (up to you, but stay consistent: if you’re using ArrayLists, then always use ArrayLists). You can then keep track of the longest ORF that you have found so far and eventually return it.
Remember that if you have a string and want to know its length, you can use the built-in .length() method.
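The "longest so far" idea can be sketched on its own, independent of ORFs. This is one possible shape, assuming the candidates arrive as a List of Strings; the class name LongestSoFar and the method longest() are hypothetical and not part of the assignment.

```java
import java.util.Arrays;
import java.util.List;

class LongestSoFar {
    // Return the longest string in the list, or "" if the list is empty.
    // On ties the earliest string wins, because we only replace the current
    // best when a candidate is strictly longer.
    static String longest(List<String> candidates) {
        String best = "";
        for (String s : candidates) {
            if (s.length() > best.length()) {
                best = s;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        System.out.println(longest(Arrays.asList("ATGTTT", "ATGAAAATGACT", "ATG")));
    }
}
```

Starting from the empty string means that an empty list naturally yields '', which matches the behavior expected when no ORF is found.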
Here are some examples of longestORF:
longestORF('ATGAAATAG')
'ATGAAA'
longestORF('CATGAATAGGCCCA')
'ATGAATAGGCCCA'
longestORF('CTGTAA')
''
longestORF('ATGCCCTAACATGAAAATGACTTAGG')
'ATGAAAATGACT'
So what exactly did we do here? We are still reading codons in steps of three, but we are now looking at substrings that start at different offsets. In the first example, we have a simple ORF that starts at index 0. In the second example, we have an ORF that starts at an index that is not a multiple of three. In the third example, there is no start codon at all, so the result is the empty string. In the last one, we have two ORFs, one that starts at a multiple of three and another that doesn’t, and the longer of the two is returned.
How should you approach this? The .substring() method will definitely help! Make sure to use oneFrame() as well.
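One possible way to put these pieces together is sketched below. It is not the required solution: the versions of oneFrame and restOfORF shown here are stand-ins written only so that the example runs on its own, and they assume that an ORF starts at ATG, ends just before the first in-frame stop codon (TAG, TAA or TGA), and runs to the end of the string if no stop codon is found. Your own oneFrame may differ in its details.

```java
import java.util.ArrayList;
import java.util.List;

class LongestOrfSketch {
    // Stand-in helper (assumption): given a string that begins with ATG, return
    // everything up to, but not including, the first in-frame stop codon.
    // If there is no in-frame stop codon, return the whole string.
    static String restOfORF(String dna) {
        for (int i = 0; i + 3 <= dna.length(); i += 3) {
            String codon = dna.substring(i, i + 3);
            if (codon.equals("TAG") || codon.equals("TAA") || codon.equals("TGA")) {
                return dna.substring(0, i);
            }
        }
        return dna;
    }

    // Stand-in for oneFrame (assumption): collect the ORFs found when reading
    // from index 0 in steps of three, skipping past each ORF once it is found.
    static List<String> oneFrame(String dna) {
        List<String> orfs = new ArrayList<>();
        int i = 0;
        while (i + 3 <= dna.length()) {
            if (dna.startsWith("ATG", i)) {
                String orf = restOfORF(dna.substring(i));
                orfs.add(orf);
                i += orf.length(); // an ORF is at least "ATG", so i always advances
            } else {
                i += 3;
            }
        }
        return orfs;
    }

    // Try the three frames by slicing off 0, 1 or 2 characters, and keep the
    // longest ORF seen so far. Returns "" if no frame contains an ORF.
    static String longestORF(String dna) {
        String best = "";
        for (int offset = 0; offset < 3 && offset <= dna.length(); offset++) {
            for (String orf : oneFrame(dna.substring(offset))) {
                if (orf.length() > best.length()) {
                    best = orf;
                }
            }
        }
        return best;
    }

    public static void main(String[] args) {
        System.out.println(longestORF("ATGCCCTAACATGAAAATGACTTAGG"));
    }
}
```

The check offset <= dna.length() guards against calling substring with an offset past the end of a very short string.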
Let’s take the sequence ‘CTATTTCATG’ as an example. What would be the reverse complement of this sequence?
'GATAAAGTAC' // this is the complement
'CATGAAATAG' // this is the reverse complement
We’ve already written a createCompliment() function before (which works with 3 strings at a time). You can use the same strategy to make a reverseComplement() method.
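As a rough sketch of the idea, the method below walks the string from the last character to the first and appends the complement of each base. It does not reuse createCompliment(), whose exact signature is not shown here; instead it relies on a hypothetical single-character helper named complement(), and it assumes the input contains only the uppercase letters A, C, G and T.

```java
class ReverseComplementSketch {
    // Hypothetical helper: complement of a single base (A<->T, C<->G).
    static char complement(char base) {
        switch (base) {
            case 'A': return 'T';
            case 'T': return 'A';
            case 'C': return 'G';
            case 'G': return 'C';
            default: throw new IllegalArgumentException("Not a DNA base: " + base);
        }
    }

    // Build the reverse complement by reading the input backwards and
    // complementing each base as it is appended.
    static String reverseComplement(String dna) {
        StringBuilder sb = new StringBuilder(dna.length());
        for (int i = dna.length() - 1; i >= 0; i--) {
            sb.append(complement(dna.charAt(i)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(reverseComplement("CTATTTCATG"));
    }
}
```

Complementing and reversing in a single backwards pass avoids building the intermediate complement string, though doing it in two separate steps is just as valid.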
A gene might appear on this strand or on its reverse complement. Thus, our next function is a very short one called longestORFBothStrands(DNA). This function takes a DNA string as input and finds the longest ORF on that DNA string or its reverse complement. You can use the longestORF function you have already written (pass DNA to your reverseComplement method to find the reverse complement). For example,
>>> longestORFBothStrands('CTATTTCATG')
'ATGAAA'
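The function itself can be only a few lines long, as the sketch below suggests. To keep the example runnable on its own, it includes compact stand-ins for longestORF and reverseComplement; these are assumptions made for illustration, and in your own code you would call the versions you have already written. Preferring the forward strand when both ORFs have the same length is also a choice made here, not something the assignment specifies.

```java
class BothStrandsSketch {
    // Stand-in (assumption): reverse complement, for input made only of A, C, G, T.
    static String reverseComplement(String dna) {
        StringBuilder sb = new StringBuilder(dna.length());
        for (int i = dna.length() - 1; i >= 0; i--) {
            // "ACGT" and "TGCA" are aligned so that each base maps to its complement.
            sb.append("TGCA".charAt("ACGT".indexOf(dna.charAt(i))));
        }
        return sb.toString();
    }

    static boolean isStop(String codon) {
        return codon.equals("TAG") || codon.equals("TAA") || codon.equals("TGA");
    }

    // Stand-in (assumption): longest ORF over all three frames, found by trying
    // every ATG in the string and reading codons until a stop codon or the end.
    static String longestORF(String dna) {
        String best = "";
        for (int start = dna.indexOf("ATG"); start >= 0; start = dna.indexOf("ATG", start + 1)) {
            int end = start;
            while (end + 3 <= dna.length() && !isStop(dna.substring(end, end + 3))) {
                end += 3;
            }
            // If a stop codon was found, cut just before it; otherwise keep the rest.
            String orf = (end + 3 <= dna.length()) ? dna.substring(start, end) : dna.substring(start);
            if (orf.length() > best.length()) {
                best = orf;
            }
        }
        return best;
    }

    // Compare the longest ORF on the given strand with the longest ORF on its
    // reverse complement, and return the longer (forward strand wins ties).
    static String longestORFBothStrands(String dna) {
        String forward = longestORF(dna);
        String reverse = longestORF(reverseComplement(dna));
        return reverse.length() > forward.length() ? reverse : forward;
    }

    public static void main(String[] args) {
        System.out.println(longestORFBothStrands("CTATTTCATG"));
    }
}
```

In the example above, the forward strand only contains the three-base ORF 'ATG' at its very end, so the six-base ORF on the reverse complement is the one returned.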
Make sure to use the examples above to show that you have passed the necessary tests for these functions.