Part 2f: Using the Gene Finder

Now it's time to apply our method to data and see what we get.

First, in Main, read X73525.fna as we saw previously. You can find this code in Main under the PART 2 section. Also notice that we have created a GeneFinder object for you called myFinder.

Now, apply our gene-finding strategy to it.

The first step is to decide on a threshold for geneFinder. Our strategy is to find the length of the longest ORF we see in noncoding sequence, and then to treat ORFs longer than this as putative genes. Run myFinder.longestORFNoncoding(X73525, 1500). This should be a relatively conservative way to pick a threshold.
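The thresholding idea can be sketched on its own, separately from the GeneFinder class. The example below is only an illustration: the class ThresholdSketch, the methods pickThreshold and longerThan, and all of the lengths are made up for this sketch and are not part of the course code. It takes the longest ORF length seen in noncoding sequence as the cutoff and keeps only candidates that are strictly longer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical illustration of the thresholding strategy; not the course's GeneFinder.
public class ThresholdSketch {

    // Returns the longest ORF length observed in noncoding sequence, or 0 if none were seen.
    static int pickThreshold(List<Integer> noncodingOrfLengths) {
        return noncodingOrfLengths.isEmpty() ? 0 : Collections.max(noncodingOrfLengths);
    }

    // Keeps only the candidate ORF lengths that are strictly longer than minLen.
    static List<Integer> longerThan(List<Integer> candidateLengths, int minLen) {
        List<Integer> kept = new ArrayList<>();
        for (int len : candidateLengths) {
            if (len > minLen) {
                kept.add(len);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Made-up ORF lengths, for illustration only.
        List<Integer> noncoding = Arrays.asList(90, 312, 150);
        List<Integer> candidates = Arrays.asList(120, 312, 900, 1500);

        int minLen = pickThreshold(noncoding);
        System.out.println("minLen = " + minLen);
        System.out.println("putative gene lengths = " + longerThan(candidates, minLen));
    }
}
```

Note that a candidate whose length equals the threshold is dropped, which matches the "longer than this" wording and keeps the cutoff conservative.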

Now run geneFinder using the threshold value you get for minLen.

ArrayList geneList = myFinder.geneFinder(X73525, put_your_minLen_value_here);

Next, write a short method printGenes(geneList) that prints geneList in a human-readable form (e.g., it should print the coordinates and the protein sequence for each gene in a way that’s easy for a user to read). Use it to print your results.
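One possible shape for such a method is sketched below. It is a sketch under assumptions, not the expected solution: the Gene class with start, end, and protein fields is hypothetical, since the element type that geneFinder actually returns may be different, and the helper formatGenes and the sample values are invented for illustration. Adapt the field access to whatever your geneList really contains.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; the real geneList elements may be stored differently.
public class PrintGenesSketch {

    // Assumed container for one gene: its coordinates and its protein sequence.
    static class Gene {
        final int start;
        final int end;
        final String protein;

        Gene(int start, int end, String protein) {
            this.start = start;
            this.end = end;
            this.protein = protein;
        }
    }

    // Builds the report as a string: one header line and one sequence line per gene.
    static String formatGenes(List<Gene> geneList) {
        StringBuilder sb = new StringBuilder();
        int number = 1;
        for (Gene g : geneList) {
            sb.append("Gene ").append(number)
              .append(": start=").append(g.start)
              .append(", end=").append(g.end)
              .append(", protein length=").append(g.protein.length())
              .append(System.lineSeparator());
            sb.append("  ").append(g.protein).append(System.lineSeparator());
            number++;
        }
        return sb.toString();
    }

    // Prints the formatted report to standard output.
    static void printGenes(List<Gene> geneList) {
        System.out.print(formatGenes(geneList));
    }

    public static void main(String[] args) {
        List<Gene> geneList = new ArrayList<>();
        geneList.add(new Gene(10, 40, "MKTAYIAKQR")); // made-up example values
        printGenes(geneList);
    }
}
```

Building the text in formatGenes and printing it in printGenes keeps the formatting easy to check separately from the output.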

Note that our gene-finding strategy is very simple, and is also relatively conservative. As a result, we are likely to miss some true genes which are short. But we hope that the genes our method does find are real ones.

Next, pick one of the genes from your output and blast its protein sequence. To start, open this link in a separate window: NCBI Blast. (On a Mac, you can do that by holding down the “control” key while clicking on the link. You’ll then have the option to open the link in a new window.)

1. Go down to Basic Blast and follow the link shown below:

2. There should be a large box into which you can now input a protein sequence. You can put the whole string from X73525.fna in this box.

3. Then specify the coordinates of the gene that you found.

4. Then scroll down to the bottom of the page and click the “Blast” button.

When you blast a sequence, the application carries out a search through its databases for known sequences that either match your sequence or are close to it, and returns those to you in a list. This might take 30 or more seconds.

The blast output page will contain some graphics at the top. Scroll down past these to the section labelled “Descriptions”. The first links in this section will be the closest sequences blast could find to your original input. Clicking on one of these will take you to a page with information about that sequence, including the name and what organism it can be found in. The right pane on that page has many useful links. Based on what you learn from these top blast hits, briefly describe the likely function of your gene in your evaluation file.