Javadoc-style Perl Documentation Generator

I’m not a huge fan of Java, but I really do appreciate its standards for code comments. I use them in my PHP, C++, and Perl code. Some adapting is obviously needed because those languages don’t all use the same comment syntax, but for the most part it works really well.

Today I needed to write up a document on how one of my larger scripts worked. I wanted to include the script architecture, but didn’t have a good way to generate it. Then I remembered something one of my favorite open source projects does. MediaWiki practices continuous integration, so (like other OSS projects) they use Jenkins for post-commit validation. Relevant to this post, they use Jenkins jobs to verify that the comments for each function are in the right format, list the right data types, and so on. Applied to my Perl scripts, such a comment looks something like this:

# This subroutine does something cool
# @param $_[0] string This is a test parameter
# @param $_[1] array This is an array reference of mic checks
# @return bool Success or failure of this function's awesomeness

The commit validation scripts that Jenkins runs would check that the subroutine really does take two parameters and that it returns a boolean. Granted, since Perl doesn’t declare parameter or return types, this has to be a bit looser than it would be for other languages (C++, C#, etc.), but you get the idea. This documentation style is still awesome (at least, I think it is).
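To make the idea concrete, here is a minimal sketch of that kind of check for Perl. I don’t know exactly how MediaWiki’s Jenkins jobs implement it; this assumes the $_[N] argument style from the example above, and the name check_param_count() is hypothetical. It counts the @param tags in a doc comment, compares that to the highest $_[N] index the subroutine body uses, and complains if there is no @return tag.

```perl
use strict;
use warnings;

# Hypothetical checker (not MediaWiki's code): compare the number of @param
# tags in a doc comment with the highest $_[N] index used in the sub body.
# Returns an empty list when everything agrees, otherwise a list of problems.
sub check_param_count {
    my ( $comment, $body ) = @_;

    # Count the "@param" tags in the comment block
    my $documented = () = $comment =~ /\@param\b/g;

    # Find the highest $_[N] index referenced in the body (-1 if none)
    my $highest = -1;
    while ( $body =~ /\$_\[(\d+)\]/g ) {
        $highest = $1 if $1 > $highest;
    }
    my $used = $highest + 1;

    my @problems;
    push @problems, "documents $documented param(s) but uses $used"
        if $documented != $used;
    push @problems, 'missing @return tag'
        unless $comment =~ /\@return\b/;
    return @problems;
}

my $comment = <<'END';
# This subroutine does something cool
# @param $_[0] string This is a test parameter
# @param $_[1] array This is an array reference of mic checks
# @return bool Success or failure of this function's awesomeness
END

my $body = 'my $name = $_[0]; my $checks = $_[1]; return 1;';

my @problems = check_param_count( $comment, $body );
print @problems ? join( "\n", @problems ) . "\n" : "OK\n";
```

Perl code often unpacks arguments with my ( $a, $b ) = @_ or shift instead, so a real check would need to handle those forms too; this sketch only looks at $_[N].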

What I needed today, though, was a script that parsed my other scripts, read in all the subroutines (Perl, remember?), parsed out the comments for each one, and returned HTML using inline styles so I could copy it into a Word (well, LibreOffice Writer) doc without losing formatting. So here’s the quick and dirty version.

Note: Ironically, I just realized that this script isn’t commented.
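Before the full listing, here is the core trick in isolation, minus the HTML: find each sub line, then walk backwards over the full-line # comments sitting directly above it. This is a minimal sketch; collect_doc_comments() is a hypothetical name, and it assumes the doc comment touches the sub line with no blank line in between.

```perl
use strict;
use warnings;

# Hypothetical helper: map each subroutine name to the comment lines
# found directly above its "sub" line, in original (top-down) order.
sub collect_doc_comments {
    my @lines = @_;
    my %docs;
    for my $i ( 0 .. $#lines ) {
        next unless $lines[$i] =~ /^\s*sub\s+(\w+)/;
        my $name = $1;
        my @comment;
        # Walk backwards while the previous line is a full-line comment,
        # prepending so the lines end up in reading order
        for ( my $n = $i - 1; $n >= 0 && $lines[$n] =~ /^\s*#/; $n-- ) {
            unshift @comment, $lines[$n];
        }
        $docs{$name} = \@comment;
    }
    return %docs;
}

my @source = (
    "# Adds two numbers\n",
    "# \@param \$_[0] int First number\n",
    "# \@param \$_[1] int Second number\n",
    "# \@return int The sum\n",
    "sub add {\n",
    "  return \$_[0] + \$_[1];\n",
    "}\n",
);

my %docs = collect_doc_comments(@source);
print scalar( @{ $docs{add} } ), " comment lines for add()\n";
```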

#!/usr/bin/env perl
use warnings;
use strict;

# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
# GNU General Public License for more details.
# You should have received a copy of the GNU General Public License
# along with this program.  If not, see <>.

if( scalar( @ARGV ) < 1 ) {
  print STDERR "\nPlease specify a file to parse.\n\n";
  exit( 1 );
}

main( @ARGV );

sub main {
  my $path = $_[0];
  # Open our file and do some science!
  open( my $fh, '<', $path ) or die "Cannot open $path: $!";
  my @lines = <$fh>;
  close( $fh );
  my $body = '';
  for( my $i = 0; $i < scalar( @lines ); $i++ ) {
    my $line = $lines[$i];
    # Remove leading whitespace
    $line =~ s/^\s+//;
    # Collapse every run of inner whitespace into a single space
    $line =~ s/\s+/ /g;
    # Match "sub name", optionally followed by an opening brace
    if( $line =~ /^sub (\w+)[\s{]*$/ ) {
      # We've found one!
      my $name = $1;
      my $h2 = "<h2 style=\"margin:0px; padding:0px; display:inline; font-size:1.2em; color:#444;\">";
      $body .= '<br />' . $h2 . $name . "()</h2>\n";
      my $comments = '';
      # Now we go backwards, nabbing the comments as we go
      for( my $n = $i - 1; $n >= 0; $n-- ) {
        # Stop at the first line that isn't a full-line comment
        # (code, a blank line) or that is a #! shebang line
        last if $lines[$n] !~ /^\s*#/ || $lines[$n] =~ /^#!/;
        # Because we're reading backwards, we need to prepend
        $comments = lineToHtml( $lines[$n] ) . $comments;
      }
      my $pStyle = "<p style=\"display:block; background-color:#eee; margin:0px;";
      $pStyle .= " padding:5px; border:1px dashed #aaa; width:90%; font-size:9pt;\">";
      $comments = $pStyle . $comments . "</p>\n";
      $body .= $comments;
    }
  }
  $body .= "\n\n";
  print bodyToHtml( $body );
  exit( 0 );
}

sub bodyToHtml {
  my $body = $_[0];
  my $bodyHeader = '<!DOCTYPE html>';
  $bodyHeader .= '<html><head>';
  $bodyHeader .= '</head><body style="font-family:sans-serif;">';

  my $bodyFooter = '</body></html>';
  return $bodyHeader . $body . $bodyFooter;
}

sub lineToHtml {
  my $line = $_[0];

  my $formatted = $line;
  # Strip the leading "#" and whitespace, and the trailing newline
  $formatted =~ s/^[#\s]+//;
  $formatted =~ s/\s+$//;
  # Escape HTML special characters so comment text can't break the markup
  $formatted =~ s/&/&amp;/g;
  $formatted =~ s/</&lt;/g;
  $formatted =~ s/>/&gt;/g;
  if( $formatted =~ /^\@param/ ) {
    $formatted =~ s/\@param/<strong>\@param<\/strong>/;
    $formatted = '<br /><span style="display:block; color:#499;">' . $formatted . '</span>';
  } elsif( $formatted =~ /^\@return/ ) {
    $formatted =~ s/\@return/<strong>\@return<\/strong>/;
    $formatted = '<br /><span style="display:block; color:#494; margin-top:10px;">' . $formatted . '</span>';
  }
  # Italicize the first data-type keyword on the line
  $formatted =~ s/ (int|hash|array|string|boolean|bool) / <span style="color:#949; font-style:italic;">$1<\/span> /i;
  $formatted .= "\n";
  return $formatted;
}
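lineToHtml() only formats one comment line at a time, which is all the HTML output needs. If you would rather have the tags as data, say to feed a Jenkins-style check like the one described earlier, a small parser along these lines would do it. This is a sketch, not part of the script above: parse_doc_comment() and the shape of the hash it returns (description, params, returns) are my own assumptions.

```perl
use strict;
use warnings;

# Hypothetical parser: turn doc-comment lines into a hash holding a
# free-text description, a list of params, and an optional return entry.
sub parse_doc_comment {
    my %doc = ( description => '', params => [], returns => undef );
    for my $line (@_) {
        ( my $text = $line ) =~ s/^\s*#\s*//;    # drop the leading "#"
        $text =~ s/\s+$//;                      # drop trailing whitespace/newline
        if ( $text =~ /^\@param\s+(\S+)\s+(\w+)\s*(.*)$/ ) {
            push @{ $doc{params} }, { name => $1, type => lc($2), desc => $3 };
        }
        elsif ( $text =~ /^\@return\s+(\w+)\s*(.*)$/ ) {
            $doc{returns} = { type => lc($1), desc => $2 };
        }
        elsif ( length $text ) {
            # Anything else counts as description text
            $doc{description} .= ( length( $doc{description} ) ? ' ' : '' ) . $text;
        }
    }
    return \%doc;
}

my $doc = parse_doc_comment(
    "# This subroutine does something cool\n",
    "# \@param \$_[0] string This is a test parameter\n",
    "# \@param \$_[1] array This is an array reference of mic checks\n",
    "# \@return bool Success or failure of this function's awesomeness\n",
);

for my $p ( @{ $doc->{params} } ) {
    print "$p->{name} ($p->{type}): $p->{desc}\n";
}
print "returns $doc->{returns}{type}\n" if $doc->{returns};
```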