
Part 2c: Longest ORF and longestORFBothStrands

longestORF

Next, you will write a function longestORF(DNA) that takes a DNA string and returns the sequence of the longest open reading frame on it, in any of the three possible frames. This function will not consider the reverse complement of DNA.

It shouldn’t take much work to write longestORF given that you’ve already written oneFrame.

Consider the example sequence from above:

>>> String DNA="CAGCTCCAATGTTTTAACCCCCCCC";

We can look at the three frames of this sequence by slicing off 0, 1 or 2 base pairs at the start:

>>> oneFrame(DNA)
[]
>>> oneFrame(DNA.substring(1)) // shift the frame by one: skip the first character
[]
>>> oneFrame(DNA.substring(2)) // shift the frame by two: skip the first two characters
['ATGTTT']
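To see what "slicing off 0, 1 or 2 base pairs" does, the sketch below prints the codons of each frame of the example sequence. The class name `ReadingFrames` and the helper `codons` are hypothetical, used only for illustration; they are not part of the assignment.

```java
import java.util.ArrayList;

class ReadingFrames {
    // Splits a string into complete codons (groups of three bases), starting at index 0.
    // Any leftover one or two bases at the end are ignored.
    static ArrayList<String> codons(String dna) {
        ArrayList<String> result = new ArrayList<>();
        for (int i = 0; i + 3 <= dna.length(); i += 3) {
            result.add(dna.substring(i, i + 3));
        }
        return result;
    }

    public static void main(String[] args) {
        String DNA = "CAGCTCCAATGTTTTAACCCCCCCC";
        // Frame k is what you get by slicing off the first k characters.
        for (int offset = 0; offset < 3; offset++) {
            System.out.println("frame " + offset + ": " + codons(DNA.substring(offset)));
        }
    }
}
```

Only frame 2 contains ATG as a whole codon (followed by TTT and the stop codon TAA), which is why only `oneFrame(DNA.substring(2))` finds `ATGTTT`.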

Each call to oneFrame will produce an ArrayList or an array (your choice, but stay consistent: if you’re using ArrayLists, always use ArrayLists). You can then keep track of the longest ORF you have found so far and return it at the end.
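A minimal sketch of the "longest so far" pattern, assuming you chose ArrayLists: `longestAcross` (a hypothetical helper, not something the assignment requires) scans every ORF in the lists returned by your three `oneFrame` calls and remembers the longest one it has seen. Replacing only on a strictly longer string means ties keep whichever ORF was found first.

```java
import java.util.ArrayList;
import java.util.List;

class LongestSoFar {
    // Returns the longest string across several lists, or "" if every list is empty.
    // Ties keep the first one found, because we only replace on a strictly longer string.
    static String longestAcross(List<ArrayList<String>> lists) {
        String longest = "";
        for (ArrayList<String> list : lists) {
            for (String s : list) {
                if (s.length() > longest.length()) {
                    longest = s;
                }
            }
        }
        return longest;
    }

    public static void main(String[] args) {
        // Hypothetical results of the three oneFrame calls on the example above.
        List<ArrayList<String>> frames = new ArrayList<>();
        frames.add(new ArrayList<String>());
        frames.add(new ArrayList<String>());
        ArrayList<String> third = new ArrayList<>();
        third.add("ATGTTT");
        frames.add(third);
        System.out.println(longestAcross(frames)); // prints ATGTTT
    }
}
```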

Remember that if you have a string and want to know its length, you can use the built-in .length() method.

Here are some examples of longestORF:

longestORF('ATGAAATAG')
'ATGAAA'
longestORF('CATGAATAGGCCCA')
'ATGAATAGGCCCA'
longestORF('CTGTAA')
''
longestORF('ATGCCCTAACATGAAAATGACTTAGG')
'ATGAAAATGACT'

So what exactly did we do here? We are still reading three bases at a time, but now we start reading from different positions in the string. In the first example, we have a simple ORF that starts at index 0. In the second example, the ORF starts at index 1, which is not a multiple of three. In the third example, there is no ATG in any frame, so the result is the empty string. In the last example, there are two ORFs: 'ATGCCC', which starts at a multiple of three (index 0), and the longer 'ATGAAAATGACT', which does not (index 10).

How should you approach this? .substring() will definitely help! Make sure to use oneFrame() as well.
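Here is one possible sketch of longestORF, under stated assumptions. Because your oneFrame comes from an earlier part, the block includes a stand-in `oneFrame` whose behavior is assumed: the stop codons are TAA, TAG and TGA; an ORF runs up to, but not including, the next in-frame stop codon; an ORF with no stop codon runs to the end of the string (which is what the 'CATGAATAGGCCCA' example requires); and scanning resumes after each stop codon. Swap in your own oneFrame; the class name `LongestORF` is hypothetical.

```java
import java.util.ArrayList;

class LongestORF {
    static boolean isStopCodon(String codon) {
        return codon.equals("TAA") || codon.equals("TAG") || codon.equals("TGA");
    }

    // Stand-in for your oneFrame (assumed behavior, see above). Reads from index 0
    // in steps of three; each in-frame ATG starts an ORF.
    static ArrayList<String> oneFrame(String dna) {
        ArrayList<String> orfs = new ArrayList<>();
        int i = 0;
        while (i + 3 <= dna.length()) {
            if (dna.startsWith("ATG", i)) {
                int j = i;
                // Advance codon by codon until an in-frame stop codon or the end.
                while (j + 3 <= dna.length() && !isStopCodon(dna.substring(j, j + 3))) {
                    j += 3;
                }
                if (j + 3 <= dna.length()) {
                    orfs.add(dna.substring(i, j)); // stop codon at j is excluded
                    i = j + 3;                     // resume after the stop codon
                } else {
                    orfs.add(dna.substring(i));    // no stop codon: ORF runs to the end
                    i = dna.length();
                }
            } else {
                i += 3;
            }
        }
        return orfs;
    }

    // Try the three frames by slicing off 0, 1 or 2 characters; keep the longest ORF.
    static String longestORF(String dna) {
        String longest = "";
        for (int offset = 0; offset < 3 && offset <= dna.length(); offset++) {
            for (String orf : oneFrame(dna.substring(offset))) {
                if (orf.length() > longest.length()) {
                    longest = orf;
                }
            }
        }
        return longest;
    }

    public static void main(String[] args) {
        System.out.println(longestORF("ATGCCCTAACATGAAAATGACTTAGG")); // prints ATGAAAATGACT
    }
}
```

The `offset <= dna.length()` guard keeps `substring` from throwing on strings shorter than two characters.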

longestORFBothStrands

Let’s take the sequence ‘CTATTTCATG’ as an example. What would be the reverse complement of this sequence?

'GATAAAGTAC' // this is the complement
'CATGAAATAG' // this is the reverse complement

We’ve already written a createCompliment() function (which works with 3 strings at a time). You can use the same strategy to write a reverseComplement() method: complement each base, then reverse the result.
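One way to sketch reverseComplement, mirroring the two steps shown above: build the complement base by base, then reverse it with `StringBuilder.reverse()`. The names `ReverseComplement`, `complementBase` and `complement` are hypothetical. If your createCompliment() can already complement a whole strand, only the reversal step is new.

```java
class ReverseComplement {
    // Complement of a single base: A <-> T, C <-> G.
    static char complementBase(char base) {
        switch (base) {
            case 'A': return 'T';
            case 'T': return 'A';
            case 'C': return 'G';
            case 'G': return 'C';
            default: throw new IllegalArgumentException("Not a DNA base: " + base);
        }
    }

    // Step 1: complement every base, keeping the original order.
    static String complement(String dna) {
        StringBuilder sb = new StringBuilder(dna.length());
        for (int i = 0; i < dna.length(); i++) {
            sb.append(complementBase(dna.charAt(i)));
        }
        return sb.toString();
    }

    // Step 2: reverse the complement.
    static String reverseComplement(String dna) {
        return new StringBuilder(complement(dna)).reverse().toString();
    }

    public static void main(String[] args) {
        System.out.println(complement("CTATTTCATG"));        // prints GATAAAGTAC
        System.out.println(reverseComplement("CTATTTCATG")); // prints CATGAAATAG
    }
}
```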

A gene might appear on this strand or on its reverse complement. Thus, our next function is a very short one called longestORFBothStrands(DNA). It takes a DNA string as input and returns the longest ORF found on either that DNA string or its reverse complement. You can build it from the longestORF function you have already written:

  1. Ask longestORF for the longest ORF in the given DNA.
  2. Then ask it for the longest ORF on the reverse complement (use your reverseComplement method to compute the reverse complement).
  3. The longer of those two is the longest ORF on either strand (break ties arbitrarily).
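Putting the three steps together, a sketch of longestORFBothStrands might look like the following. So that the block compiles on its own, it repeats an assumed stand-in `oneFrame` (same assumptions as before: stop codons TAA, TAG, TGA; an ORF without a stop codon runs to the end) and a one-pass `reverseComplement` that walks the string backwards, complementing each base. In your project you would call the methods you already wrote; the class name `BothStrands` is hypothetical. Ties go to the forward strand, which is one arbitrary choice the assignment allows.

```java
import java.util.ArrayList;

class BothStrands {
    static boolean isStopCodon(String codon) {
        return codon.equals("TAA") || codon.equals("TAG") || codon.equals("TGA");
    }

    // Stand-in for your oneFrame (assumed behavior).
    static ArrayList<String> oneFrame(String dna) {
        ArrayList<String> orfs = new ArrayList<>();
        int i = 0;
        while (i + 3 <= dna.length()) {
            if (dna.startsWith("ATG", i)) {
                int j = i;
                while (j + 3 <= dna.length() && !isStopCodon(dna.substring(j, j + 3))) {
                    j += 3;
                }
                if (j + 3 <= dna.length()) {
                    orfs.add(dna.substring(i, j)); // ends just before the stop codon
                    i = j + 3;
                } else {
                    orfs.add(dna.substring(i));    // no stop codon: runs to the end
                    i = dna.length();
                }
            } else {
                i += 3;
            }
        }
        return orfs;
    }

    static String longestORF(String dna) {
        String longest = "";
        for (int offset = 0; offset < 3 && offset <= dna.length(); offset++) {
            for (String orf : oneFrame(dna.substring(offset))) {
                if (orf.length() > longest.length()) {
                    longest = orf;
                }
            }
        }
        return longest;
    }

    // Read the strand from the last base to the first, appending each base's complement.
    static String reverseComplement(String dna) {
        StringBuilder sb = new StringBuilder(dna.length());
        for (int i = dna.length() - 1; i >= 0; i--) {
            char base = dna.charAt(i);
            switch (base) {
                case 'A': sb.append('T'); break;
                case 'T': sb.append('A'); break;
                case 'C': sb.append('G'); break;
                case 'G': sb.append('C'); break;
                default: throw new IllegalArgumentException("Not a DNA base: " + base);
            }
        }
        return sb.toString();
    }

    static String longestORFBothStrands(String dna) {
        String forward = longestORF(dna);                    // step 1
        String reverse = longestORF(reverseComplement(dna)); // step 2
        return reverse.length() > forward.length() ? reverse : forward; // step 3
    }

    public static void main(String[] args) {
        System.out.println(longestORFBothStrands("CTATTTCATG")); // prints ATGAAA
    }
}
```

On 'CTATTTCATG', the forward strand only yields 'ATG' (no stop codon follows it), while the reverse complement 'CATGAAATAG' yields 'ATGAAA', so the reverse strand wins.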

For example,

>>> longestORFBothStrands('CTATTTCATG')
'ATGAAA'

Testing

Use the examples above to show that your functions pass the necessary tests.