Another thing a user might want to know is the coordinates of a gene in the original DNA sequence. For this purpose, we provide you the following function. Cut and paste it into your project file for GeneFinder:
    public Coordinate getCoordinates(String gene, String DNA) {
        // First look for the gene on the forward strand.
        String geneToUse = gene;
        int start = DNA.indexOf(geneToUse);
        if (start == -1) {
            // Not found: the ORF is on the reverse strand, so search for its reverse complement.
            geneToUse = reverseCompliment(gene);
            start = DNA.indexOf(geneToUse);
        }
        int end = start + gene.length();
        return new Coordinate(start, end);
    }
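getCoordinates follows the convention of reporting everything in forward-strand coordinates (even if the ORF is actually on the reverse-complemented strand). The start index is inclusive and the end index is exclusive, since end is computed as start plus the length of the gene.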
Here are some examples of how getCoordinates is used on some simple inputs:
    >>> getCoordinates("GTT", "ACGTTCGA")
    [2, 5] // Coordinate.getStart() and Coordinate.getEnd()
    >>> getCoordinates("CGAA", "ACGTTCGA")
    [3, 7] // Coordinate.getStart() and Coordinate.getEnd()
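If you want to check these two examples yourself, here is a minimal, self-contained sketch. The Coordinate class and the reverseCompliment method shown here are hypothetical stand-ins (the real ones come from your own project and may look different); only the body of getCoordinates is taken from the function given above, made static so it can run from main.

```java
class CoordinateDemo {
    // Assumed stand-in for the reverseCompliment you wrote earlier in the project.
    static String reverseCompliment(String dna) {
        StringBuilder sb = new StringBuilder();
        // Walk the strand backwards, swapping each base for its complement.
        for (int i = dna.length() - 1; i >= 0; i--) {
            char c = dna.charAt(i);
            switch (c) {
                case 'A': sb.append('T'); break;
                case 'T': sb.append('A'); break;
                case 'C': sb.append('G'); break;
                case 'G': sb.append('C'); break;
                default:  sb.append(c);   // leave unknown characters unchanged
            }
        }
        return sb.toString();
    }

    // Same logic as the provided function.
    static Coordinate getCoordinates(String gene, String DNA) {
        String geneToUse = gene;
        int start = DNA.indexOf(geneToUse);
        if (start == -1) {
            geneToUse = reverseCompliment(gene);
            start = DNA.indexOf(geneToUse);
        }
        int end = start + gene.length();
        return new Coordinate(start, end);
    }

    public static void main(String[] args) {
        Coordinate forward = getCoordinates("GTT", "ACGTTCGA");
        System.out.println("[" + forward.getStart() + ", " + forward.getEnd() + "]"); // [2, 5]

        // "CGAA" is not in the forward strand; its reverse complement "TTCG" is.
        Coordinate reverse = getCoordinates("CGAA", "ACGTTCGA");
        System.out.println("[" + reverse.getStart() + ", " + reverse.getEnd() + "]"); // [3, 7]
    }
}

// Hypothetical shape for Coordinate: two ints with getters.
class Coordinate {
    private final int start;
    private final int end;

    Coordinate(int start, int end) {
        this.start = start;
        this.end = end;
    }

    int getStart() { return start; }
    int getEnd() { return end; }
}
```

Note that in the second call the reported coordinates describe where "TTCG" sits in the forward strand, which is what the forward-strand convention means in practice.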
Finally, write a function called geneFinder(DNA, minLen) that identifies ORFs longer than minLen and returns a list with information about each.
geneFinder should first call findORFsBothStrands to obtain a list of ORFs in the input DNA. It should then run through this list, keeping only those ORFs that are longer than minLen.
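The filtering step might look something like the sketch below. It assumes that the ORFs come back as a list of Strings and that length is measured in nucleotides; the helper name keepLongOrfs and the sample ORFs are made up for illustration, and your findORFsBothStrands may return a different type.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class OrfFilterDemo {
    // Hypothetical helper: keep only the ORFs strictly longer than minLen.
    static List<String> keepLongOrfs(List<String> orfs, int minLen) {
        List<String> kept = new ArrayList<>();
        for (String orf : orfs) {
            if (orf.length() > minLen) {
                kept.add(orf);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Made-up ORFs standing in for the output of findORFsBothStrands.
        List<String> orfs = Arrays.asList("ATGAAATAG", "ATGTAG", "ATGCCCGGGTAA");
        // "ATGTAG" has length 6, which is not longer than 6, so it is dropped.
        System.out.println(keepLongOrfs(orfs, 6)); // [ATGAAATAG, ATGCCCGGGTAA]
    }
}
```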
For each ORF that is long enough, geneFinder should calculate the ORF's coordinates in DNA using getCoordinates and its protein sequence using createProtein(). However, before you use createProtein(), you'll need to modify it so that it can take a String as opposed to an ArrayList (or you can turn the ORF into an ArrayList of codons!). These should then be placed in an ArrayList:
[beginningCoord, endCoord, proteinSequence] // int, int, string
There will be an ArrayList like this for every ORF that is long enough. You will collect these ArrayLists in another ArrayList (an ArrayList of ArrayLists).
Say our final ArrayList of ArrayLists is called finalOutputList. Its elements are ArrayLists that hold: [beginningCoord, endCoord, proteinSequence].
Creating your own class that can hold all of this information will be helpful.
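As one possible starting point, the sketch below shows a small class that holds the three values for one ORF, plus a helper that splits an ORF String into an ArrayList of codons in case you would rather not modify createProtein(). The names GeneRecord and toCodons are hypothetical, and the protein string in the example is a placeholder rather than the output of a real createProtein() call.

```java
import java.util.ArrayList;

class GeneRecordDemo {
    // Hypothetical helper: split an ORF into consecutive 3-letter codons.
    // Any trailing 1 or 2 leftover bases are ignored.
    static ArrayList<String> toCodons(String orf) {
        ArrayList<String> codons = new ArrayList<>();
        for (int i = 0; i + 3 <= orf.length(); i += 3) {
            codons.add(orf.substring(i, i + 3));
        }
        return codons;
    }

    public static void main(String[] args) {
        System.out.println(toCodons("ATGAAATAG")); // [ATG, AAA, TAG]

        // One record per ORF that is long enough; the values here are placeholders.
        ArrayList<GeneRecord> finalOutputList = new ArrayList<>();
        finalOutputList.add(new GeneRecord(2, 11, "MK"));
        System.out.println(finalOutputList); // [[2, 11, MK]]
    }
}

// Hypothetical class holding [beginningCoord, endCoord, proteinSequence] for one ORF.
class GeneRecord {
    private final int beginningCoord;
    private final int endCoord;
    private final String proteinSequence;

    GeneRecord(int beginningCoord, int endCoord, String proteinSequence) {
        this.beginningCoord = beginningCoord;
        this.endCoord = endCoord;
        this.proteinSequence = proteinSequence;
    }

    int getBeginningCoord() { return beginningCoord; }
    int getEndCoord() { return endCoord; }
    String getProteinSequence() { return proteinSequence; }

    @Override
    public String toString() {
        return "[" + beginningCoord + ", " + endCoord + ", " + proteinSequence + "]";
    }
}
```

Using a class like this instead of an ArrayList of mixed values means each field keeps its own type (int, int, String), so you do not need to cast when reading the results back out.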