Debugging Techniques
On this page, I will post aids and tools that Perl provides which allow you to debug your Perl code more efficiently. I will post updates as we cover the material necessary for understanding the tools mentioned.
Benchmark
As you know by now, one of Perl's mottos is "There's More Than One Way To Do It" (TMTOWTDI ©). This is usually a Good Thing, but can occasionally lead to confusion. One of the most common forms of confusion that Perl's versatility causes is wondering which of several ways will get the job done most quickly.
Analyzing two or more chunks of code to see how they compare time-wise is known as "Benchmarking". Perl provides
a standard module that will Benchmark your code for you. It is named, unsurprisingly, Benchmark.
Benchmark provides several helpful subroutines, but the most common is called cmpthese().
This subroutine takes two arguments: The number of iterations to run each method, and a hashref containing
the code blocks (subroutines) you want to compare, keyed by a label for each block. It will run each subroutine
the number of times specified, and then print out statistics telling you how they compare.
For example, four different students might think up four different
ways of creating a two dimensional array. Which one of these ways is "best"? Let's have
Benchmark tell us:
#!/usr/bin/perl
use strict;
use warnings;
use Benchmark 'cmpthese';

sub explicit {
    my @two_d = ([ ('x') x 10 ],
                 [ ('x') x 10 ],
                 [ ('x') x 10 ],
                 [ ('x') x 10 ],
                 [ ('x') x 10 ]);
}

sub new_per_loop {
    my @two_d;
    for (0..4){
        my @inner = ('x') x 10;
        push @two_d, \@inner;
    }
}

sub anon_ref_per_loop {
    my @two_d;
    for (0..4){
        push @two_d, [ ('x') x 10 ];
    }
}

sub nested {
    my @two_d;
    for my $i (0..4){
        for my $j (0..9){
            $two_d[$i][$j] = 'x';
        }
    }
}

cmpthese (10_000, {
    'Explicit'           => \&explicit,
    'New Array Per Loop' => \&new_per_loop,
    'Anon. Ref Per Loop' => \&anon_ref_per_loop,
    'Nested Loops'       => \&nested,
});
The above code will print out the following statistics (numbers may be slightly off, of course):
Benchmark: timing 10000 iterations of Anon. Ref Per Loop, Explicit, Nested Loops, New Array Per Loop...
Anon. Ref Per Loop: 2 wallclock secs ( 1.53 usr + 0.00 sys = 1.53 CPU) @ 6535.95/s (n=10000)
Explicit: 1 wallclock secs ( 1.24 usr + 0.00 sys = 1.24 CPU) @ 8064.52/s (n=10000)
Nested Loops: 4 wallclock secs ( 4.01 usr + 0.00 sys = 4.01 CPU) @ 2493.77/s (n=10000)
New Array Per Loop: 2 wallclock secs ( 1.76 usr + 0.00 sys = 1.76 CPU) @ 5681.82/s (n=10000)
                     Rate Nested Loops New Array Per Loop Anon. Ref Per Loop Explicit
Nested Loops       2494/s           --               -56%               -62%     -69%
New Array Per Loop 5682/s         128%                 --               -13%     -30%
Anon. Ref Per Loop 6536/s         162%                15%                 --     -19%
Explicit           8065/s         223%                42%                23%       --
The benchmark first tells us how many iterations of which subroutines it's running. It then tells us how long
each method took to run the given number of iterations. Finally, it prints out the statistics table, sorted
from slowest to fastest. The Rate column tells us how many iterations each subroutine was able to perform
per second. The remaining columns tell us how fast each method was in comparison to each of the other methods,
reading from the row to the column. (For example, 'Explicit' was 223% faster than 'Nested Loops', while
'New Array Per Loop' was 13% slower than 'Anon. Ref Per Loop'.) From the above, we can see that 'Explicit' is
the fastest of the four methods. It is, however, only 23% faster than 'Anon. Ref Per Loop', which requires far
less typing and is much more easily maintainable. (If your boss suddenly tells you he'd rather have the
two-dimensional array be 20x17, with each cell initialized to 'X' rather than 'x', which of the two would you
rather had been used?)
You can, of course, read more about this module, and see its other options, by reading: perldoc Benchmark
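When it is hard to guess a sensible iteration count, cmpthese() also accepts a negative count, which Benchmark treats as a minimum number of CPU seconds to run each block. The following is only a minimal sketch: the @words array and the two string-building subs are made up for illustration and are not part of the example above.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Benchmark 'cmpthese';

# Hypothetical test data: 100 one-character strings to glue together.
my @words = ('x') x 100;

# A negative count (-1 here) asks Benchmark to run each sub for at
# least that many CPU seconds instead of a fixed number of iterations.
my $rows = cmpthese(-1, {
    'join'   => sub { my $s = join '', @words; return $s },
    'concat' => sub { my $s = ''; $s .= $_ for @words; return $s },
});

# cmpthese() also returns the chart as an array ref of rows (a header
# row plus one row per sub), in case you want to post-process it.
printf "%d rows in the chart\n", scalar @$rows;
```

The timings will differ from machine to machine, so treat any single run as a rough guide rather than a verdict.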
Command-Line Options
Perl provides several command-line options which make it possible to
write very quick and very useful "one-liners". For more information
on all the options available, refer to perldoc perlrun
-e: This option takes a string and evaluates the Perl code within. This is the primary means of executing a one-liner.
perl -e'print qq{Hello World\n};'
(On Windows, you may have to use double quotes rather than single quotes. Either way, it's probably better to use q// and qq// within your one-liner than to remember to escape the quotes.)
-l: This option has two distinct effects that work in conjunction.
First, it sets $\ (the output record separator) to the current
value of $/ (the input record separator). In effect, this means
that every print statement will automatically have a newline
appended. Secondly, when combined with -n or -p (both described below), it
auto-chomps each line of input, saving you the typing necessary to do it.
perl -lne '$_ .= q{testing}; print;'
The above automatically chomps $_ and then adds the newline back at the print statement, so that "testing" appears on the same line as the entered string.
-w: This is the standard way to enable warnings in your one-liners.
This saves you from having to type use warnings; in the code itself.
-M: This option automatically uses a given module.
perl -MData::Dumper -le'my @foo=(1..10); print Dumper(\@foo);'
-n: This disturbingly powerful option wraps your entire one-liner in a while (<>) { ... } loop. That is, your one-liner will be executed once for each line of each file specified on the command line, each time setting $_ to the current line and $. to the current line number.
perl -ne 'print if /^\d/' foo.txt beta.txt
The above one-liner loops through foo.txt and beta.txt, printing all the lines that start with a digit. ($_ is assigned via the implicit while (<>) loop, and both print and m// operate on $_ if an explicit argument isn't given.)
-p: This is essentially the same thing as -n, except that it places a continue { print; } block after the while (<>) { ... } loop in which your code is wrapped. This is useful for reading through a list of files, making some sort of modification, and printing the results.
perl -pe 's/Paul/John/' email.txt
The above opens the file email.txt, loops through each line, replaces the first instance of "Paul" on each line with "John", and prints every line (modified or not) to STDOUT.
-i: People are sometimes astounded that this is possible with so little typing. -i is used in conjunction with either -n
or -p. It causes the files specified on the command line to be
edited "in-place", meaning that while you're looping through the
lines of the files, all print statements are directed back to the
original files. (That goes for both explicit prints and the print in the continue block added by -p.)
If you give -i a string, this string will be used as the extension of a backup copy of the original file, like so:
perl -pi.bkp -e's/Paul/John/' email.txt msg.txt
The above opens email.txt, replaces the first instance of "Paul" on each line with "John", and prints the results back to email.txt. The original email.txt is saved as email.txt.bkp. The same is then done for msg.txt.
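The same in-place behaviour is available inside a full script through the special variable $^I, which is the variable that -i sets. What follows is only a sketch under stated assumptions: the sub name replace_in_place is hypothetical, and the Paul/John substitution is simply borrowed from the one-liner above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: edit the given files in place, keeping a
# backup of each original with a .bkp extension.
sub replace_in_place {
    my @files = @_;

    # $^I holds the backup extension, just as -i.bkp would set it;
    # <> reads from the files listed in @ARGV.
    local ($^I, @ARGV) = ('.bkp', @files);

    while (<>) {
        s/Paul/John/;    # first "Paul" on each line only (no /g)
        print;           # goes to the file being edited, not the terminal
    }
    return;
}

# Edit whatever files were named on the command line, if any.
replace_in_place(@ARGV) if @ARGV;
```

Localizing $^I and @ARGV keeps the in-place behaviour confined to this one sub, so the rest of the script can still print to STDOUT normally.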
Remember that most of the command-line options listed here can also
be given at the end of the shebang line in non-one-liners. (-M is an exception:
Perl rejects it there with a "Too late for -M" error.) Please do
not start using -w in your real programs, though: use warnings;
is still preferred because of its lexical scope and configurability.
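To make the loops that -n and -p wrap around your code more concrete, here is a rough sketch of what the two earlier one-liners amount to when written out by hand. The sub names like_dash_n and like_dash_p, the explicit filehandle arguments, and the sample text are assumptions made so the example is self-contained; Perl itself reads from <> and prints to the currently selected handle.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Roughly what  perl -ne 'print if /^\d/'  does:
# a plain read loop, with no automatic print.
sub like_dash_n {
    my ($in, $out) = @_;
    local $_;
    while (<$in>) {
        print {$out} $_ if /^\d/;
    }
    return;
}

# Roughly what  perl -pe 's/Paul/John/'  does:
# the same loop, plus a continue block that prints every line.
sub like_dash_p {
    my ($in, $out) = @_;
    local $_;
    while (<$in>) {
        s/Paul/John/;
    }
    continue {
        print {$out} $_;
    }
    return;
}

# Made-up sample input, read through an in-memory filehandle.
my $sample = "1 apple\nDear Paul\n2 cherries\n";

open my $in, '<', \$sample or die "Cannot open in-memory file: $!\n";
like_dash_n($in, \*STDOUT);
close $in;

open $in, '<', \$sample or die "Cannot open in-memory file: $!\n";
like_dash_p($in, \*STDOUT);
close $in;
```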
Data::Dumper
The standard Data::Dumper module is very useful for examining
exactly what is contained in your data structure (be it a hash, an
array, or, when we come to them, an object). When you
use this module, it exports one function, named
Dumper. This function takes a reference to a data
structure and returns a nicely formatted description of what that
structure contains.
#!/usr/bin/env perl
use strict;
use warnings;
use Data::Dumper;
my @foo = (5..10);
#add one element to the end of the array
#do you see the error?
$foo[@foo+1] = 'last';
print Dumper(\@foo);
When run, this program shows you exactly what is inside @foo:
$VAR1 = [
          5,
          6,
          7,
          8,
          9,
          10,
          undef,
          'last'
        ];
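The undef in the dump exposes the bug. In numeric context @foo is the number of elements (6), which is already the index of the next free slot, so $foo[@foo+1] skips index 6 and writes to index 7. A minimal corrected sketch follows; push is just one of several ways to append.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use Data::Dumper;

my @foo = (5..10);

# push appends at the next free index, so no gap is created.
# ($foo[@foo] = 'last'; would work as well.)
push @foo, 'last';

print Dumper(\@foo);
```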
__DATA__ & <DATA>
Perl uses the __DATA__ marker as a pseudo-datafile. You can use this marker to write quick tests which would otherwise involve finding a file name, opening that file, and reading from it. If you just want to test a piece of code that requires a file to be read (but don't want to test the actual file opening and reading), place the data that would be in the input file under the __DATA__ marker. You can then read from this pseudo-file using <DATA>, without bothering to open an actual file:
#!/usr/bin/env perl
use strict;
use warnings;
while (my $line = <DATA>) {
    chomp $line;
    print "Size of line $.: ", length $line, "\n";
}
__DATA__
hello world
42
abcde
The above program would print:
Size of line 1: 11
Size of line 2: 2
Size of line 3: 5
$.
The $. variable keeps track of the line numbers of
the file currently being processed via a while (<$fh>) {
... } loop. More explicitly, it is the number of the last
line read from the filehandle most recently read.
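As a small illustration of $. in use, the sketch below numbers each line as it is read. The in-memory filehandle and the sample text are assumptions made so that the example runs without a real file.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Made-up sample text, read through an in-memory filehandle.
my $text = "alpha\nbeta\ngamma\n";
open my $fh, '<', \$text or die "Cannot open in-memory file: $!\n";

my @numbered;
while (my $line = <$fh>) {
    chomp $line;
    # $. is the number of the line just read from $fh.
    push @numbered, "$.: $line";
}
close $fh;

print "$_\n" for @numbered;
```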
__FILE__ & __LINE__
These are two special markers that return, respectively, the name of the file Perl is currently executing and the line number on which the marker appears. These can be used in your own debugging statements, to remind yourself where in the source code your output came from:
print "On line " . __LINE__ . " of file " . __FILE__ . ", \$foo = $foo\n";
Note that neither of these markers is a variable, so they cannot be interpolated in a double-quoted string.
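A short sketch of the difference, with $foo and its value made up for the example: inside the quotes the marker is just text, while outside the quotes it is evaluated.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

my $foo = 42;    # hypothetical value being debugged

# Inside a double-quoted string the marker is not interpolated;
# it stays as the literal text __LINE__.
my $inside = "line __LINE__: \$foo = $foo";

# Concatenated outside the quotes, the marker is evaluated.
my $outside = 'line ' . __LINE__ . ": \$foo = $foo";

print "$inside\n$outside\n";
```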
$!
The special $! variable contains the last error
reported by the operating system. If a system call (such as opening
a file, or changing a directory) fails, this variable will contain
a message informing the user of why it failed.
This variable should be included in all
warn/die messages having to do with system
calls.
open my $fh, '<', $file or die "Could not open file: $!\n";
If the open fails, this will print out, for example: "Could not open file: Permission denied"
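The following sketch shows $! being captured right after a failed open. The sub name open_error and the path are hypothetical (the path is simply assumed not to exist), and the exact wording of the message depends on your operating system and locale.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: returns an error message if the file cannot be
# opened for reading, or an empty string if it can.
sub open_error {
    my ($file) = @_;

    # Use $! immediately: a later system call may overwrite it.
    open my $fh, '<', $file
        or return "Could not open '$file': $!";

    close $fh;
    return '';
}

# A path assumed not to exist, purely for demonstration.
my $message = open_error('/no/such/dir/input.txt');
warn "$message\n" if $message;
```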
warn() & die()
These are the most basic of all debugging techniques.
warn() takes a list of strings, and prints them to
STDERR. If the last element of the list does not end in a newline,
warn() will also print the current filename and line
number on which the warning occurred. Execution then proceeds as
normal.
die() is identical to warn(), with
one major exception: the program exits after printing the list of
strings.
All debugging statements should make use of either
warn() or die() rather than
print(). This will ensure you see your debugging
output even if STDOUT has been redirected, and will give you the
helpful clues of exactly where in your code the warning
occurred.
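A brief sketch of the newline rule in practice. The sub check_positive and its messages are made up for illustration, and eval is used here only so that the script can show die()'s message without actually exiting.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical validation sub: dies when given a non-positive number.
sub check_positive {
    my ($n) = @_;
    # No trailing newline, so Perl appends " at FILE line N."
    die "Value must be positive, got $n" if $n <= 0;
    return $n;
}

# Trailing newline: printed to STDERR exactly as written.
warn "Starting checks\n";

# eval traps the die so the message can be inspected in $@.
my $ok = eval { check_positive(-3); 1 };
warn "Caught: $@" unless $ok;
```

Leaving the newline off is usually the better default while debugging, since the file name and line number come for free.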
