The Perl programming language would be perfect for this task. Install ActivePerl on your PC and write a Perl script to extract numbers from the text file and perform your calculation.
Output the full oligo sequence and its GC content to the screen (5 points). This ensures that students can use the print statement and, with the assistance from part a, perform a couple of operations (i.e. calculate GC content) and output the result as well. 1 point for the print statement (which is essentially given in the skeleton code).
Feb 02, 2015. Just so you know, the task is to open and read a .fasta file (I think I've finally nailed something pretty well, hallelujah!), read each sequence, compute the relative G+C nucleotide content, and then write the names of the genes and their respective G+C content to a tab-delimited file.
A perl module that calculates GC content of a DNA sequence, using Carp::Assert and throws_ok():
Let's write a perl module GC.pm that contains a subroutine to calculate the GC content of a DNA sequence, and test it.
package GC;
use strict;
use warnings;
use Math::Round; # has the nearest() function
use Carp::Assert; # has the assert() function
use Scalar::Util qw(looks_like_number);
use Error; # has Error::Simple, which is thrown below
use base 'Exporter';
our @EXPORT_OK = qw( gc_content_one_seq );
sub gc_content_one_seq
{
my $seq = $_[0];
my $g_or_c = 0;
my $gc;
# throw an exception if the sequence is uninitialised (undefined), or an empty string:
throw Error::Simple('sequence not defined') if (!(defined($seq)));
throw Error::Simple('sequence does not exist') if ($seq eq '');
# calculate the GC content:
$seq =~ tr/a-z/A-Z/; # convert the sequence to uppercase
$g_or_c = ($seq =~ tr/GC//); # counts number of Gs or Cs in the sequence
$gc = $g_or_c*100/length($seq);
$gc = nearest(0.01, $gc); # round to 0.01 precision
# die if the GC content is not between 0 and 100:
assert($gc >= 0.00 && $gc <= 100.00); # this should never happen, program will die if it does
# die if the GC content is not numeric:
assert(looks_like_number($gc)); # this should never happen, program will die if it does
return $gc;
}
1;
Testing the perl module using ok(), use_ok(), can_ok(), and throws_ok():
Then you can use the testing script GC.t to test the subroutines in module GC.pm:
#!perl
use strict;
use warnings;
use Test::More tests => 7;
use Error; # has Error::Simple
use Test::Exception; # has throws_ok()
# Specify the subroutines to import:
my @subs = qw ( gc_content_one_seq );
# Check we can import the subroutines:
use_ok( 'GC', @subs);
can_ok( __PACKAGE__, 'gc_content_one_seq');
# Test the gc_content_one_seq() subroutine:
my $seq = 'AAAAAAAAAAGGGGGGGGGGAAAAAAAAAAAAAAAAAAAAGGGGGGGGGGAAAAAAAAAA';
my $gc_seq = GC::gc_content_one_seq($seq);
ok ($gc_seq == 33.33, 'Test 3: check gc_content_one_seq() correctly gives GC=33.33');
$seq = 'AAAAAAAAAAGGGGGGGGGGAAAAAAAAAAAA';
$gc_seq = GC::gc_content_one_seq($seq);
ok ($gc_seq == 31.25, 'Test 4: check gc_content_one_seq() correctly gives GC=31.25');
$seq = 'AAAAAAAAA';
$gc_seq = GC::gc_content_one_seq($seq);
ok ($gc_seq == 0.00, 'Test 5: check gc_content_one_seq() correctly gives GC=0.00');
# Check an error is thrown if the sequence is an empty string:
$seq = '';
throws_ok { $gc_seq = GC::gc_content_one_seq($seq) } 'Error::Simple','Test 6: sequence not defined';
# Check an error is thrown if the sequence is undefined:
my $seq2;
throws_ok { $gc_seq = GC::gc_content_one_seq($seq2) } 'Error::Simple','Test 7: sequence does not exist';
When you run the tests you see:
% prove GC.t
GC.t .. ok
All tests successful.
Files=1, Tests=7, 0 wallclock secs ( 0.04 usr 0.01 sys + 0.04 cusr 0.02 csys = 0.11 CPU)
Result: PASS
% prove -v GC.t
GC.t ..
1..7
ok 1 - use GC;
ok 2 - main->can('gc_content_one_seq')
ok 3 - Test 3: check gc_content_one_seq() correctly gives GC=33.33
ok 4 - Test 4: check gc_content_one_seq() correctly gives GC=31.25
ok 5 - Test 5: check gc_content_one_seq() correctly gives GC=0.00
ok 6 - Test 6: sequence not defined
ok 7 - Test 7: sequence does not exist
ok
All tests successful.
Files=1, Tests=7, 0 wallclock secs ( 0.04 usr 0.01 sys + 0.02 cusr 0.02 csys = 0.09 CPU)
Result: PASS
Using the perl module in a perl script:
We can use the subroutine gc_content_one_seq() in a perl script gc.pl like this:
#!/usr/bin/perl
use strict;
use warnings;
use GC;
my $gc = GC::gc_content_one_seq('ACGT');
print "GC = $gc\n";
% perl -w gc.pl
GC = 50
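For the multi-sequence task mentioned at the top (read a FASTA file, compute the G+C content of each sequence, write a tab-delimited table), a minimal sketch might look like the following. The helper names gc_percent and parse_fasta are hypothetical, the input is an in-memory string rather than a real .fasta file, and the GC calculation is inlined with sprintf instead of calling GC.pm so that the sketch runs on its own.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: percentage of G or C in a sequence, to 2 decimal places.
sub gc_percent {
    my ($seq) = @_;
    die "sequence is empty or undefined\n" unless defined $seq && length($seq);
    my $g_or_c = ($seq =~ tr/GCgc//);   # tr returns the number of matching characters
    return sprintf('%.2f', 100 * $g_or_c / length($seq));
}

# Hypothetical helper: read FASTA records from a filehandle and
# return a list of [name, sequence] pairs in file order.
sub parse_fasta {
    my ($fh) = @_;
    my @records;
    while (my $line = <$fh>) {
        chomp $line;
        next if $line =~ /^\s*$/;        # skip blank lines
        if ($line =~ /^>(\S+)/) {        # header line: start a new record
            push @records, [$1, ''];
        }
        elsif (@records) {               # sequence line: append to the current record
            $line =~ s/\s+//g;
            $records[-1][1] .= $line;
        }
    }
    return @records;
}

# Example input held in a string; a real script would open a .fasta file instead.
my $fasta = ">gene1\nACGT\nACGT\n>gene2\nAAAAAAAAAAGGGGGGGGGGAAAAAAAAAAAA\n";
open(my $in, '<', \$fasta) or die "cannot open in-memory FASTA: $!";

# One tab-delimited line per gene: name, then G+C percentage.
for my $rec (parse_fasta($in)) {
    my ($name, $seq) = @$rec;
    print join("\t", $name, gc_percent($seq)), "\n";
}
close($in);
```

Redirecting the output to a file (for example with > on the command line) gives the tab-delimited table; for the two example records the values are 50.00 and 31.25.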
Thanks to my colleagues Daria Gordon and Bhavana Harsha for lots of helpful discussion about this.
In this part of the Perl Tutorial we are going to talk about the for loop in Perl. Some people also call it the C-style for loop, but this construct is actually available in many programming languages.
Perl for loop
The for keyword in Perl can work in two different ways. It can work just as a foreach loop works, and it can act as a 3-part C-style for loop. It is called C-style though it is available in many languages.
I'll describe how this works, although I prefer to write the foreach style loop as described in the section about perl arrays.
The two keywords for and foreach can be used as synonyms. Perl will work out which meaning you had in mind.
The C-style for loop has 3 parts in the controlling section (INITIALIZE, TEST and STEP), followed by the BODY. In general it has the form for (INITIALIZE; TEST; STEP) { BODY }, though you can omit any of the 4 parts.
As an example see this code:
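A minimal sketch of such a loop, counting from 0 to 9, might look like this. The subroutine name count_c_style and the variable names are illustrative choices, not taken from the original example.

```perl
use strict;
use warnings;

# General shape: for (INITIALIZE; TEST; STEP) { BODY }
sub count_c_style {
    my @seen;
    for (my $i = 0; $i <= 9; $i++) {   # INITIALIZE; TEST; STEP
        push @seen, $i;                # BODY
    }
    return @seen;
}

print join(' ', count_c_style()), "\n";   # prints: 0 1 2 3 4 5 6 7 8 9
```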
The INITIALIZE part will be executed once when the execution reaches that point.
Then, immediately after that, the TEST part is executed. If this is false, the whole loop is skipped. If the TEST part is true, then the BODY is executed, followed by the STEP part.
(For the real meaning of TRUE and FALSE, check the boolean values in Perl.)
Then comes the TEST again, and it goes on and on as long as the TEST evaluates to some true value. So it looks like this:
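To make the order concrete, here is a small sketch that records each part as it runs, for a loop whose BODY executes twice. The trace() helper and the @log array are assumptions added purely for illustration.

```perl
use strict;
use warnings;

my @log;

# Hypothetical helper: record which part ran, then pass its value through.
sub trace {
    my ($part, $value) = @_;
    push @log, $part;
    return $value;
}

sub run_traced_loop {
    @log = ();
    for (my $i = trace('INITIALIZE', 0); trace('TEST', $i < 2); trace('STEP', $i++)) {
        trace('BODY', 1);
    }
    return @log;
}

print join(' ', run_traced_loop()), "\n";
# prints: INITIALIZE TEST BODY STEP TEST BODY STEP TEST
```

The final TEST is the one that evaluates to false and ends the loop.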
foreach
The above loop - going from 0 to 9 - can also be written as a foreach loop, and I think the intention is much clearer:
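A sketch of the foreach version, assuming the same 0 to 9 range; the subroutine name count_foreach is hypothetical.

```perl
use strict;
use warnings;

sub count_foreach {
    my @seen;
    foreach my $i (0 .. 9) {   # the range operator builds the list 0, 1, ..., 9
        push @seen, $i;
    }
    return @seen;
}

print join(' ', count_foreach()), "\n";   # prints: 0 1 2 3 4 5 6 7 8 9
```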
As I wrote, the two are actually synonyms, so some people use the for keyword but write a foreach style loop like this:
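For instance, a list-style loop spelled with the for keyword might look like this (the subroutine name count_for_as_foreach is hypothetical):

```perl
use strict;
use warnings;

sub count_for_as_foreach {
    my @seen;
    for my $i (0 .. 9) {       # "for" keyword, but iterating over a list
        push @seen, $i;
    }
    return @seen;
}

print join(' ', count_for_as_foreach()), "\n";   # prints: 0 1 2 3 4 5 6 7 8 9
```

Perl decides which kind of loop this is from what is inside the parentheses: two semicolons mean the C-style form, a list means the foreach form.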
The parts of the perl for loop
INITIALIZE is of course to initialize some variable. It is executed exactly once.
TEST is some kind of boolean expression that tests whether the loop should stop or go on. It is executed at least once. TEST is executed one more time than either BODY or STEP is.
BODY is a set of statements, usually the thing we want to do repeatedly, though in some cases an empty BODY can also make sense. Well, probably all those cases can be rewritten in some nicer way.
STEP is another set of actions, usually used to increment or decrement some kind of index. This too can be left empty if, for example, we make that change inside the BODY.
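As a sketch of leaving parts empty, the loop below omits both INITIALIZE and STEP and changes the counter inside the BODY instead. The countdown() subroutine is a hypothetical example, not part of the original tutorial.

```perl
use strict;
use warnings;

sub countdown {
    my ($n) = @_;
    my @seen;
    for (; $n > 0; ) {      # INITIALIZE and STEP are empty; only TEST remains
        push @seen, $n;
        $n--;               # the "step" happens inside the BODY
    }
    return @seen;
}

print join(' ', countdown(3)), "\n";   # prints: 3 2 1
```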
Infinite loop
You can write an infinite loop using the for loop:
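A sketch of that form: with all three controlling parts empty the loop has no TEST to stop it, so this example leaves it explicitly with last. The subroutine name and the power-of-two task are assumptions chosen only to keep the example runnable.

```perl
use strict;
use warnings;

sub first_power_of_two_above {
    my ($limit) = @_;
    my $n = 1;
    for (;;) {                   # no INITIALIZE, TEST or STEP: loops forever
        last if $n > $limit;     # explicit exit, otherwise this never ends
        $n *= 2;
    }
    return $n;
}

print first_power_of_two_above(100), "\n";   # prints: 128
```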
People usually write it with a while statement such as:
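A sketch of the while form, again with an explicit last so that the example terminates. The halvings() subroutine is a hypothetical illustration.

```perl
use strict;
use warnings;

# Count how many times $n can be halved (integer division) before it reaches 1.
sub halvings {
    my ($n) = @_;
    my $count = 0;
    while (1) {                  # the condition is always true
        last if $n <= 1;         # explicit exit from the otherwise endless loop
        $n = int($n / 2);
        $count++;
    }
    return $count;
}

print halvings(10), "\n";   # prints: 3
```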
It is described in the part about the while loop in perl.
perldoc
You can find the official description of the for-loop in the perlsyn section of the Perl documentation.
Published on 2013-03-26