Perl

Menu

Debugging Techniques

On this page, I will post aides and tools that Perl provides which allow you to more efficently debug your Perl code. I will post updates as we cover material necessary for understanding the tools mentioned.

Benchmark

As you know by now, one of Perl's mottos is "There's More Than One Way To Do It" (TMTOWTDI ©). This is usually a Good Thing, but can occasionally lead to confusion. One of the most common forms of confusion that Perl's verstaility causes is wondering which of multiple ways one should use to get the job done most quickly.

Analyzing two or more chunks of code to see how they compare time-wise is known as "Benchmarking". Perl provides a standard module that will Benchmark your code for you. It is named, unsurprisingly, Benchmark. Benchmark provides several helpful subroutines, but the most common is called cmpthese(). This subroutine takes two arguments: The number of iterations to run each method, and a hashref containing the code blocks (subroutines) you want to compare, keyed by a label for each block. It will run each subroutine the number of times specified, and then print out statistics telling you how they compare.

For example, four different students might think up four different ways of creating a two dimensional array. Which one of these ways is "best"? Let's have Benchmark tell us:

#!/usr/bin/perl
use strict;
use warnings;
use Benchmark 'cmpthese';

sub explicit {
    my @two_d = ([ ('x') x 10 ],
                 [ ('x') x 10 ],
                 [ ('x') x 10 ],
                 [ ('x') x 10 ],
                 [ ('x') x 10 ]);
}

sub new_per_loop {
    my @two_d;
    for (0..4){
        my @inner = ('x') x 10;
        push @two_d, \@inner;
    }
}

sub anon_ref_per_loop {
    my @two_d;
    for (0..4){
        push @two_d, [ ('x') x 10 ];
    }
}

sub nested {
    my @two_d;
    for my $i (0..4){
        for my $j (0..9){
            $two_d[$i][$j] = 'x';
        }
    }
}
cmpthese (10_000, {
                 'Explicit'           => \&explicit,
                 'New Array Per Loop' => \&new_per_loop,
                 'Anon. Ref Per Loop' => \&anon_ref_per_loop,
                 'Nested Loops'       => \&nested,
             }
      );
The above code will print out the following statistics (numbers may be slightly off, of course):
Benchmark: timing 10000 iterations of Anon. Ref Per Loop, Explicit, Nested Loops, New Array Per Loop...
Anon. Ref Per Loop:  2 wallclock secs ( 1.53 usr +  0.00 sys =  1.53 CPU) @ 6535.95/s (n=10000)
Explicit:  1 wallclock secs ( 1.24 usr +  0.00 sys =  1.24 CPU) @ 8064.52/s (n=10000)
Nested Loops:  4 wallclock secs ( 4.01 usr +  0.00 sys =  4.01 CPU) @ 2493.77/s (n=10000)
New Array Per Loop:  2 wallclock secs ( 1.76 usr +  0.00 sys =  1.76 CPU) @ 5681.82/s (n=10000)
                     Rate Nested Loops New Array Per Loop Anon. Ref Per Loop Explicit
Nested Loops       2494/s           --               -56%               -62%     -69%
New Array Per Loop 5682/s         128%                 --               -13%     -30%
Anon. Ref Per Loop 6536/s         162%                15%                 --     -19%
Explicit           8065/s         223%                42%                23%       --

The benchmark first tells us how many iterations of which subroutines it's running. It then tells us how long each method took to run the given number of iterations. Finally, it prints out the statistics table, sorted from slowest to fastest. The Rate column tells us how many iterations each subroutine was able to perform per second. The remaining colums tells us how fast each method was in comparison to each of the other methods. (For example, 'Explicit' was 223% faster than 'Nested Loops', while 'New Array Per Loop' is 13% slower than 'Anon. Ref Per Loop'). From the above, we can see that 'Explicit' is by far the fastest of the four methods. It is, however, only 23% faster than 'Ref Per Loop', which requires far less typing and is much more easily maintainable (if your boss suddenly tells you he'd rather have the two-d array be 20x17, and each cell init'ed to 'X' rather than 'x', which of the two would you rather had been used?).

You can, of course, read more about this module, and see its other options, by reading: perldoc Benchmark

Command-line options

Perl provides several command-line options which make it possible to write very quick and very useful "one-liners". For more information on all the options available, refer to perldoc perlrun

-e
This option takes a string and evaluates the Perl code within. This is the primary means of executing a one-liner
perl -e'print qq{Hello World\n};'
(In windows, you may have to use double-quotes rather than single. Either way, it's probably better to use q// and qq// within your one liner, rather than remembering to escape the quotes).
-l
This option has two distinct effects that work in conjunction. First, it sets $\ (the output record terminator) to the current value of $/ (the input record separator). In effect, this means that every print statement will automatically have a newline appended. Secondly, it auto-chomps any input read via the <> operator, saving you the typing necessary to do it.
perl -le 'while (<>){ $_ .= q{testing};  print; }'
The above would automatically chomp $_, and then add the newline back on at the print statement, so that "testing" appears on the same line as the entered string.
-w
This is the standard way to enable warnings in your one liners. This saves you from having to type use warnings;
-M
This option auto-uses a given module.
perl -MData::Dumper -le'my @foo=(1..10); print Dumper(\@foo);'
-n
This disturbingly powerful option wraps your entire one-liner in a while (<>) { ... } loop. That is, your one-liner will be executed once for each line of each file specified on the command line, each time setting $_ to the current line and $. to current line number.
perl -ne 'print if /^\d/' foo.txt beta.txt
The above one-line of code would loop through foo.txt and beta.txt, printing out all the lines that start with a digit. ($_ is assigned via the implicit while (<>) loop, and both print and m// operate on $_ if an explict argument isn't given).
-p
This is essentially the same thing as -n, except that it places a continue { print; } block after the while (<>) { ... } loop in which your code is wrapped. This is useful for reading through a list of files, making some sort of modification, and printing the results.
perl -pe 's/Paul/John/' email.txt
Open the file email.txt, loop through each line, replacing any instance of "Paul" with "John", and print every line (modified or not) to STDOUT
-i
This one sometimes astounds people that such a thing is possible with so little typing. -i is used in conjunction with either -n or -p. It causes the files specified on the command line to be edited "in-place", meaning that while you're looping through the lines of the files, all print statements are directed back to the original files. (That goes for both explicit prints, as well as the print in the continue block added by -p.)
If you give -i a string, this string will be used to create a back-up copy of the original file. Like so:
perl -pi.bkp -e's/Paul/John/' email.txt msg.txt
The above opens email.txt, replaces each line's instance of "Paul" with "John", and prints the results back to email.txt. The original email.txt is saved as email.txt.bkp. The same is then done for msg.txt

Remember that any of the command-line options listed here can also be given at the end of the shebang in non-oneliners. (But please do not start using -w in your real programs - use warnings; is still preferred because of its lexical scope and configurability).

Data::Dumper

The standard Data::Dumper module is very useful for examining exactly what is contained in your data structure (be it hash, array, or object (when we come to them) ). When you use this module, it exports one function, named Dumper. This function takes a reference to a data structure and returns a nicely formatted description of what that structure contains.

#!/usr/bin/env perl
use strict;
use warnings;
use Data::Dumper;

my @foo = (5..10);
#add one element to the end of the array
#do you see the error?
$foo[@foo+1] = 'last';

print Dumper(\@foo);

When run, this program shows you exactly what is inside @foo:

$VAR1 = [
          5,
          6,
          7,
          8,
          9,
          10,
          undef,
          'last'
        ];
__DATA__ & <DATA>

Perl uses the __DATA__ marker as a pseudo-datafile. You can use this marker to write quick tests which would involve finding a file name, opening that file, and reading from that file. If you just want to test a piece of code that requires a file to be read (but don't want to test the actual file opening and reading), place the data that would be in the input file under the __DATA__ marker. You can then read from this pseudo-file using <DATA>, without bothering to open an actual file:

#!/usr/bin/env perl
use strict;
use warnings;

while (my $line = <DATA>) {
  chomp $line;
  print "Size of line $.:  ", length $line, "\n";
}

__DATA__
hello world
42
abcde

The above program would print:

Size of line 1: 11
Size of line 2: 2
Size of line 3: 5
$.

The $. variable keeps track of the line numbers of the file currently being processed via a while (<$fh>) { ... } loop. More explicitly, it is the number of the last line read of the last file read.

__FILE__ & __LINE__

These are two special markers that return, respectively, the name of the file Perl is currently executing, and the Line number where it resides. These can be used in your own debugging statements, to remind yourself where your outputs were in the source code:

  print "On line " . __LINE__ . " of file " . __FILE__ . ", \$foo = $foo\n";
    

Note that neither of these markers are variables, so they cannot be interpolated in a double-quoted string

$!

The special $! variable contains the last error reported by the operating system. If a system call (such as opening a file, or changing a directory) fails, this variable will contain a message informing the user of why it failed. This variable should be included in all warn/die messages having to do with system calls.

open my $fh, '<', $file or die "Could not open file: $!\n";
will print out, for example: "Could not open file: Permission denied"

warn() & die()

These are the most basic of all debugging techniques. warn() takes a list of strings, and prints them to STDERR. If the last element of the list does not end in a newline, warn() will also print the current filename and line number on which the warning occurred. Execution then proceeds as normal.

die() is identical to warn(), with one major exception - the program exits after printing the list of strings.

All debugging statements should make use of either warn() or die() rather than print(). This will insure you see your debugging output even if STDOUT has been redirected, and will give you the helpful clues of exactly where in your code the warning occurred.

Perl Quotes
Perl Quotes