Newseum: Today’s Front Pages

The other day, I caught a post on Signal Vs. Noise about using Automator in OS X to grab your favorite newspaper’s front pages from Newseum as PDFs and join them. This sounded like a great idea, so I set out to do just that.

Unfortunately for me, this particular workflow only works in OS X 10.5, and I have yet to upgrade my iMac past 10.4.11.

Undaunted, I replicated the same thing in Perl. It turns out that the “Combine PDF Pages” Automator action is a simple Python script. It looked semi-useful on its own, so I copied it from its normal home (/System/Library/Automator/Combine PDF Pages.action/Contents/Resources/) to my personal bin directory for good measure. The Perl source follows.

#! /usr/bin/perl -w

use locale;
use strict;
use warnings;

#############################
#  [01-31]      [yyyy-mm-dd]
my ($day,       $date,      $tmpdir,              $tojoin) =
   (`date +%d`, `date +%F`, '/Users/shelton/tmp', '');
chomp($day); chomp($date);

my ($join) =
   ("python /Users/shelton/bin/join.py -o '/Users/shelton/Desktop/$date.pdf'");

########################
# Define PDF Links Here
#
my (%papers) = (
    # Boston Globe
    0=>"http://www.newseum.org/media/dfp/pdf$day/MA_BG.pdf",
    # Chicago Tribune
    1=>"http://www.newseum.org/media/dfp/pdf$day/IL_CT.pdf",
    # Buffalo News
    2=>"http://www.newseum.org/media/dfp/pdf$day/NY_BN.pdf",
    # NY Times
    3=>"http://www.newseum.org/media/dfp/pdf$day/NY_NYT.pdf",
    # Wall Street Journal
    4=>"http://www.newseum.org/media/dfp/pdf$day/WSJ.pdf"
    );

######################################
# Loop through %papers
#  -> Determine tmp output file
#  -> Download the File
#  -> Add file to string for join cmd
#
foreach my $page (sort(keys %papers)) {
    my $output = "$tmpdir/$page.pdf";
    `wget -q -O $output $papers{$page}`;
    $tojoin .= "'$output' ";
}

##################
# Join The Files!
#
`$join $tojoin`;

#############################
# Delete All Downloaded PDFs
#
foreach my $file (sort(keys %papers)) {
    my $rm = "$tmpdir/$file.pdf";
    `rm -f $rm`;
}

I suppose I could have picked more interesting papers (like, for instance, the Amazônia Hoje from Belém, Brazil). I stuck to papers from my current and previous home towns (and the NYC area, because they have interesting papers). Boring? I know…

I run this daily from cron at about 7am. Of note, I also redirect all of the script’s output to /dev/null, because the Python script throws a ton of meaningless errors that don’t affect the resulting PDF.
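For reference, a crontab entry along these lines would do it. This is only a sketch: the script name frontpages.pl and its location in my bin directory are assumptions for illustration, and "about 7am" is taken to mean 7:00 sharp.

```crontab
# min hour day-of-month month day-of-week  command
# Run at 7:00 every day; discard both stdout and stderr.
# (frontpages.pl is a hypothetical name for the Perl script above.)
0 7 * * * /Users/shelton/bin/frontpages.pl > /dev/null 2>&1
```

The `2>&1` matters here, since the errors from the Python script arrive on stderr and cron would otherwise try to mail them.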