Part 2b: restOfORF and oneFrame

restOfORF

Recall that an open reading frame (ORF) is the stretch of sequence between a start codon (with the sequence “ATG”) and the next in-frame stop codon (“TAG”, “TAA”, or “TGA”). By “in frame” we mean that the stop codon begins at a position that is a multiple of 3 nucleotides away from the start of the start codon.

For example, take a look at the string:

ATGCATAATTAGCT

There is an “ATG” at the beginning. Then there are two codons, “CAT” and “AAT”, and then a stop codon “TAG”. So “ATGCATAAT” is an open reading frame in this string. You might think for a moment that the “TAA” beginning at the sixth nucleotide ends the reading frame earlier, giving “ATGCA” as an answer. It does not, because that “TAA” is not “in frame” with the start codon “ATG”: it begins five nucleotides after the start of the leading “ATG”, and five is not a multiple of three.
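
To make the idea of "in frame" concrete, here is a minimal sketch that splits the example string into codons and checks where each candidate stop codon begins. The class name CodonFrameDemo and the helper codonsInFrame are hypothetical names chosen for illustration; they are not part of the assignment.

```java
import java.util.ArrayList;
import java.util.List;

class CodonFrameDemo {
    // Hypothetical helper: split a DNA string into consecutive 3-letter
    // codons starting at index 0. Leftover nucleotides that do not fill
    // a whole codon are ignored.
    static List<String> codonsInFrame(String dna) {
        List<String> codons = new ArrayList<>();
        for (int i = 0; i + 3 <= dna.length(); i += 3) {
            codons.add(dna.substring(i, i + 3));
        }
        return codons;
    }

    public static void main(String[] args) {
        String dna = "ATGCATAATTAGCT";
        System.out.println(codonsInFrame(dna));      // [ATG, CAT, AAT, TAG]
        System.out.println(dna.indexOf("TAA") % 3);  // 2, so this TAA is out of frame
        System.out.println(dna.indexOf("TAG") % 3);  // 0, so this TAG is in frame
    }
}
```

A stop codon counts only when its starting index is a multiple of 3 relative to the start codon, which here sits at index 0.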

Your first challenge is to write a function that finds the open reading frame given a sequence that begins with an “ATG”. For now, you need only operate on the forward sequence, and not consider its reverse complement. We’ll get to the complement in a short bit.

In the GeneFinder class:

Write a function called restOfORF(DNA) that takes as input a DNA sequence as a string. It assumes that this DNA sequence begins with a start codon “ATG”. It then finds the next in-frame stop codon and returns the ORF from the start to that stop codon. The sequence that is returned should include the start codon but NOT the stop codon. If there is no in-frame stop codon, restOfORF should assume that the reading frame extends through the end of the sequence and simply return the entire sequence as a string.

To this end, you will need to determine if a particular codon is a stop codon. Imagine that you have a string named codon and you wish to test if it is a stop codon, that is, one of “TAG”, “TAA”, or “TGA”. You could do this:

if (codon.equals("TAG") || codon.equals("TAA") || codon.equals("TGA")) {
   // handle the stop codon here
}

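Or, better yet, you could put the stop codons in an ArrayList and use its contains method in this way:

// needs java.util.ArrayList and java.util.Arrays
ArrayList<String> stopCodons = new ArrayList<>(Arrays.asList("TAG", "TAA", "TGA"));
if (stopCodons.contains(codon)) {
   // handle the stop codon here
}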
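
As a self-contained illustration of the stop codon test, the sketch below wraps the check in a small helper. In Java, string contents are compared with equals (which contains uses internally) rather than with ==. The class name StopCodonCheck and the helper name isStopCodon are assumptions made for this example; the assignment does not require them.

```java
import java.util.Arrays;
import java.util.List;

class StopCodonCheck {
    // The three stop codons named in the text.
    static final List<String> STOP_CODONS = Arrays.asList("TAG", "TAA", "TGA");

    // Hypothetical helper: true if the given 3-letter string is a stop codon.
    // contains compares elements with equals, so string contents are compared.
    static boolean isStopCodon(String codon) {
        return STOP_CODONS.contains(codon);
    }

    public static void main(String[] args) {
        System.out.println(isStopCodon("TAA")); // true
        System.out.println(isStopCodon("ATG")); // false
    }
}
```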

Here are some examples of restOfORF:

>>> restOfORF("ATGTGAA")
'ATG'
>>> restOfORF("ATGAGATAAG")
'ATGAGA'
>>> restOfORF("ATGAGATAGG")
'ATGAGA'
>>> restOfORF("ATGAGATAGGGGTAA")
'ATGAGA'
>>> restOfORF("ATGAAATT")
'ATGAAATT'

Note that in the last example there is no in-frame stop codon, so we got back the whole string.

Your restOfORF can be written with a for loop. It will be a nice short function.
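
One possible shape for restOfORF, following the description above, is sketched here. It is one way to do it under the stated assumption that the input begins with “ATG”, not the only acceptable solution. The constant name STOP_CODONS is a choice made for this example.

```java
import java.util.Arrays;
import java.util.List;

class GeneFinder {
    // Assumed constant name; holds the three stop codons from the text.
    static final List<String> STOP_CODONS = Arrays.asList("TAG", "TAA", "TGA");

    // Assumes dna begins with the start codon "ATG".
    // Returns the ORF including the start codon but not the stop codon.
    static String restOfORF(String dna) {
        // Step through the sequence one whole codon at a time.
        for (int i = 0; i + 3 <= dna.length(); i += 3) {
            String codon = dna.substring(i, i + 3);
            if (STOP_CODONS.contains(codon)) {
                return dna.substring(0, i); // everything before the stop codon
            }
        }
        // No in-frame stop codon: the frame runs to the end of the sequence.
        return dna;
    }

    public static void main(String[] args) {
        System.out.println(restOfORF("ATGTGAA"));         // ATG
        System.out.println(restOfORF("ATGAGATAAG"));      // ATGAGA
        System.out.println(restOfORF("ATGAGATAGGGGTAA")); // ATGAGA
        System.out.println(restOfORF("ATGAAATT"));        // ATGAAATT
    }
}
```

The loop condition i + 3 <= dna.length() keeps substring from reading past the end when the sequence length is not a multiple of 3.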

oneFrame

Write a function called oneFrame(DNA) that takes a DNA string as input. It searches that string from left to right in multiples of 3 nucleotides, that is, in a single reading frame. When it hits a start codon “ATG”, it calls restOfORF on the slice of the string beginning at that codon to get back an ORF. That ORF is added to an ArrayList (or array), and then the function skips ahead in the DNA string to the point right after the ORF that we just found. This is repeated until we’ve traversed the entire DNA string. A while loop will be very convenient here! Make sure that this function returns an ArrayList or an array; the choice is up to you.

Here’s an example of this function in action:

>>> oneFrame("AATGCCATGTGAATGCCCTAA") #returns an arraylist or array
['ATG', 'ATGCCC']  #this is the contents of the arraylist or array returned

Note that the first 'ATG' did not get returned here. This is because that ATG is not in the frame that oneFrame is searching. Remember that we’re searching in multiples of three, so the string above would be searched as “AAT”, then “GCC”, then “ATG”, then “TGA”, and so on. (When we do gene finding, we will search the other two reading frames with additional calls to oneFrame.)

Here’s another example of oneFrame:

>>> oneFrame("ATGCCCATGGGGAAATTTTGACCC")
['ATGCCCATGGGGAAATTT']

In this case, there is a second 'ATG' in the sequence. This 'ATG' is part of an open reading frame which ends in the same stop codon as the first open reading frame (that is, it is a smaller nested open reading frame). oneFrame skipped this second ORF when it jumped ahead to the end of the first. This is actually what we want oneFrame to do: here we’re focusing on large open reading frames and will skip small nested ones.
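
The search-and-skip behavior described in this section could be sketched as follows. This is one possible approach, assuming an ArrayList return type; it repeats a restOfORF sketch only so that the block stands alone, and the constant name STOP_CODONS is an assumption made for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class GeneFinder {
    // Assumed constant name; holds the three stop codons from the text.
    static final List<String> STOP_CODONS = Arrays.asList("TAG", "TAA", "TGA");

    // Assumes dna begins with "ATG"; returns the ORF without its stop codon.
    static String restOfORF(String dna) {
        for (int i = 0; i + 3 <= dna.length(); i += 3) {
            if (STOP_CODONS.contains(dna.substring(i, i + 3))) {
                return dna.substring(0, i);
            }
        }
        return dna; // no in-frame stop codon
    }

    // Collects the ORFs found in the single reading frame starting at index 0.
    static ArrayList<String> oneFrame(String dna) {
        ArrayList<String> orfs = new ArrayList<>();
        int i = 0;
        while (i + 3 <= dna.length()) {
            if (dna.startsWith("ATG", i)) {
                String orf = restOfORF(dna.substring(i));
                orfs.add(orf);
                i += orf.length(); // skip to the point right after this ORF
            } else {
                i += 3;            // not a start codon: move to the next codon
            }
        }
        return orfs;
    }

    public static void main(String[] args) {
        System.out.println(oneFrame("AATGCCATGTGAATGCCCTAA"));    // [ATG, ATGCCC]
        System.out.println(oneFrame("ATGCCCATGGGGAAATTTTGACCC")); // [ATGCCCATGGGGAAATTT]
    }
}
```

Because every ORF starts with “ATG”, its length is at least 3, so the index always advances and the while loop terminates. Skipping by the ORF's length is also what causes nested ORFs to be passed over.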

Testing

Use the examples above to test your two functions, restOfORF and oneFrame. You should show these functions passing those tests in your final video.