Recall that an open reading frame (ORF) is the stretch of sequence between a start codon (with the sequence “ATG”) and the next in-frame stop codon (“TAG”, “TAA”, or “TGA”). By “in frame” we mean that the stop codon begins a multiple of 3 nucleotides away from the start codon.
For example, take a look at the string:
ATGCATAATTAGCT
There is an “ATG” at the beginning. Then there are two codons, “CAT” and “AAT”, and then a stop codon “TAG”. So “ATGCATAAT” is an open reading frame in this string. You might think for a moment that “ATGCATAA” is also an answer, but it’s not an open reading frame because the “TAA” at the end of it is not “in frame” with the start codon “ATG” – that is, it’s not a multiple of three nucleotides away from the leading “ATG”.
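To make the “in frame” idea concrete, here is a minimal sketch that checks the two candidate stop codons in the example string. The class name InFrameSketch and the helper isInFrame are hypothetical names made up for this illustration; they are not part of the assignment.

```java
class InFrameSketch {
    // Hypothetical helper: true if a codon starting at index pos is in frame
    // with a start codon at index start (a multiple of 3 nucleotides away).
    static boolean isInFrame(int start, int pos) {
        return pos >= start && (pos - start) % 3 == 0;
    }

    public static void main(String[] args) {
        String dna = "ATGCATAATTAGCT";
        int start = dna.indexOf("ATG"); // index 0
        int taa = dna.indexOf("TAA");   // index 5
        int tag = dna.indexOf("TAG");   // index 9
        System.out.println(isInFrame(start, taa)); // prints false: 5 is not a multiple of 3
        System.out.println(isInFrame(start, tag)); // prints true: 9 is a multiple of 3
    }
}
```

The “TAA” starts 5 nucleotides after the “ATG”, so it is out of frame, while the “TAG” starts 9 nucleotides after it and therefore ends the ORF.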
Your first challenge is to write a function that finds the open reading frame given a sequence that begins with an “ATG”. For now, you need only operate on the forward sequence, and not consider its reverse complement. We’ll get to the complement in a short bit.
In the GeneFinder class:
Write a function called restOfORF(DNA) that takes as input a DNA sequence as a string. It assumes that this DNA sequence begins with a start codon “ATG”. It then finds the next in-frame stop codon and returns the ORF from the start to that stop codon. The sequence that is returned should include the start codon but NOT the stop codon. If there is no in-frame stop codon, restOfORF should assume that the reading frame extends through the end of the sequence and simply return the entire sequence as a string.
To this end, you will need to determine if a particular codon is a stop codon. Imagine that you have a string named codon and you wish to test if it is a stop codon, that is, one of "TAG", "TAA", or "TGA". You could do this:
if (codon.equals("TAG") || codon.equals("TAA") || codon.equals("TGA")) { /* blah, blah, blah */ }
Or, better yet, you could use contains this way:
List<String> stopCodons = Arrays.asList("TAG", "TAA", "TGA"); if (stopCodons.contains(codon)) { /* blah, blah, blah */ }
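As a runnable version of the list-based test, here is a small sketch in Java. The class name StopCodonSketch, the constant STOP_CODONS, and the helper isStopCodon are hypothetical names chosen for this illustration. Note that in Java the contents of two strings are compared with equals (which contains uses internally), not with ==, which compares object references.

```java
import java.util.Arrays;
import java.util.List;

class StopCodonSketch {
    // The three stop codons, kept in one list so the test is a single lookup.
    static final List<String> STOP_CODONS = Arrays.asList("TAG", "TAA", "TGA");

    // Hypothetical helper: true if the given 3-letter string is a stop codon.
    static boolean isStopCodon(String codon) {
        return STOP_CODONS.contains(codon);
    }

    public static void main(String[] args) {
        System.out.println(isStopCodon("TAA")); // prints true
        System.out.println(isStopCodon("ATG")); // prints false
    }
}
```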
Here are some examples of restOfORF:
>>> restOfORF("ATGTGAA")
'ATG'
>>> restOfORF("ATGAGATAAG")
'ATGAGA'
>>> restOfORF("ATGAGATAGG")
'ATGAGA'
>>> restOfORF("ATGAGATAGGGGTAA")
'ATGAGA'
>>> restOfORF("ATGAAATT")
'ATGAAATT'
Note that in the last example there is no in-frame stop codon, so we got back the whole string.
Your restOfORF can be written with a for loop. It will be a nice short function.
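One possible shape for such a for loop is sketched below. This is only an illustration under the stated rules (the input starts with “ATG”, and the stop codon is excluded from the result), not the required solution; the class name RestOfORFSketch and the constant STOP_CODONS are assumptions made for the example.

```java
import java.util.Arrays;
import java.util.List;

class RestOfORFSketch {
    static final List<String> STOP_CODONS = Arrays.asList("TAG", "TAA", "TGA");

    // Assumes DNA begins with the start codon "ATG".
    static String restOfORF(String DNA) {
        // Step through the sequence one codon (3 nucleotides) at a time,
        // stopping before any incomplete codon at the end.
        for (int i = 0; i + 3 <= DNA.length(); i += 3) {
            String codon = DNA.substring(i, i + 3);
            if (STOP_CODONS.contains(codon)) {
                // Return everything before the stop codon.
                return DNA.substring(0, i);
            }
        }
        // No in-frame stop codon: the frame runs to the end of the sequence.
        return DNA;
    }

    public static void main(String[] args) {
        System.out.println(restOfORF("ATGAGATAAG")); // prints ATGAGA
        System.out.println(restOfORF("ATGAAATT"));   // prints ATGAAATT
    }
}
```

Because the loop index only ever advances by 3, an out-of-frame stop codon is never examined.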
Write a function called oneFrame(DNA) that takes a DNA string as input. It searches that string from left to right in multiples of 3 nucleotides, that is, in a single reading frame. When it hits a start codon “ATG” it calls restOfORF on the slice of the string beginning at that codon to get back an ORF. That ORF is added to an ArrayList (or array), and then the function skips ahead in the DNA string to the point right after the ORF that we just found. This is repeated until we’ve traversed the entire DNA string. A while loop will be very convenient here! Make sure that this function returns an ArrayList or an array; the choice is up to you.
Here’s an example of this function in action:
>>> oneFrame("AATGCCATGTGAATGCCCTAA") #returns an arraylist or array
['ATG', 'ATGCCC'] #this is the contents of the arraylist or array returned
Note that the first 'ATG' did not get returned here. This is because that ATG is not in the frame that oneFrame is searching. Remember that we’re searching in multiples of three, so the string above would be searched as ‘AAT’, then ‘GCC’, then ‘ATG’, then ‘TGA’, and so on. (When we do gene finding, we will search the other two reading frames with additional calls to oneFrame.)
Here’s another example of oneFrame:
>>> oneFrame("ATGCCCATGGGGAAATTTTGACCC")
['ATGCCCATGGGGAAATTT']
In this case, there is a second 'ATG' in the sequence. This 'ATG' is part of an open reading frame which ends in the same stop codon as the first open reading frame (that is, it is a smaller nested open reading frame). oneFrame skipped this second ORF when it jumped ahead to the end of the first. This is actually what we want oneFrame to do: here we’re focusing on large open reading frames and will skip small nested ones.
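The scan-and-skip behavior described for oneFrame could be sketched roughly as follows. This is one possible approach under the rules stated in this section, not the required solution; the class name OneFrameSketch and the constant STOP_CODONS are assumptions for the example, and restOfORF is repeated here only so the sketch compiles on its own.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class OneFrameSketch {
    static final List<String> STOP_CODONS = Arrays.asList("TAG", "TAA", "TGA");

    // Assumes DNA begins with "ATG"; returns the ORF without its stop codon.
    static String restOfORF(String DNA) {
        for (int i = 0; i + 3 <= DNA.length(); i += 3) {
            if (STOP_CODONS.contains(DNA.substring(i, i + 3))) {
                return DNA.substring(0, i);
            }
        }
        return DNA;
    }

    static ArrayList<String> oneFrame(String DNA) {
        ArrayList<String> orfs = new ArrayList<>();
        int i = 0;
        // Walk the string one codon at a time, in a single reading frame.
        while (i + 3 <= DNA.length()) {
            if (DNA.startsWith("ATG", i)) {
                String orf = restOfORF(DNA.substring(i));
                orfs.add(orf);
                // Jump to the point right after the ORF; this is what
                // skips any smaller ORF nested inside it.
                i += orf.length();
            } else {
                i += 3;
            }
        }
        return orfs;
    }

    public static void main(String[] args) {
        System.out.println(oneFrame("AATGCCATGTGAATGCCCTAA"));    // prints [ATG, ATGCCC]
        System.out.println(oneFrame("ATGCCCATGGGGAAATTTTGACCC")); // prints [ATGCCCATGGGGAAATTT]
    }
}
```

After the jump, the index lands on the stop codon itself, which is not an “ATG”, so the next pass of the loop simply steps over it.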
Use the examples above to test your two functions, restOfORF and oneFrame. You should show these functions passing those tests in your final video.