The other day, I caught a post on Signal vs. Noise about using Automator in OS X to grab your favorite newspapers’ front pages from Newseum as PDFs and combine them into a single document. This sounded like a great idea, so I set out to do just that.
Unfortunately for me, this particular workflow only works in OS X 10.5, and I have yet to upgrade my iMac past 10.4.11.
Undaunted, I replicated the same thing in Perl. It turns out that the “Combine PDF Pages” Automator action is just a simple Python script. It looked semi-useful on its own, so I copied it from its normal home (/System/Library/Automator/Combine PDF Pages.action/Contents/Resources/) to my personal bin directory as join.py, for good measure. Perl source follows…
#!/usr/bin/perl
use strict;
use warnings;
use POSIX qw(strftime);

#############################
# Day of month [01-31] and today's date [yyyy-mm-dd]
my $day    = strftime('%d', localtime);
my $date   = strftime('%Y-%m-%d', localtime);
my $tmpdir = '/Users/shelton/tmp';

# Join command (list form, so paths never pass through a shell)
my @join = ('python', '/Users/shelton/bin/join.py',
            '-o', "/Users/shelton/Desktop/$date.pdf");

########################
# Define PDF Links Here
#
my %papers = (
    # Boston Globe
    0 => "http://www.newseum.org/media/dfp/pdf$day/MA_BG.pdf",
    # Chicago Tribune
    1 => "http://www.newseum.org/media/dfp/pdf$day/IL_CT.pdf",
    # Buffalo News
    2 => "http://www.newseum.org/media/dfp/pdf$day/NY_BN.pdf",
    # NY Times
    3 => "http://www.newseum.org/media/dfp/pdf$day/NY_NYT.pdf",
    # Wall Street Journal
    4 => "http://www.newseum.org/media/dfp/pdf$day/WSJ.pdf",
);

######################################
# Loop through %papers in numeric key order
# -> Determine tmp output file
# -> Download the file
# -> Add file to the list for the join command
#
my @tojoin;
foreach my $page (sort { $a <=> $b } keys %papers) {
    my $output = "$tmpdir/$page.pdf";
    system('wget', '-q', '-O', $output, $papers{$page});
    # Skip papers that failed to download (wget -O can leave an empty file)
    push @tojoin, $output if -s $output;
}

##################
# Join The Files!
#
system(@join, @tojoin) if @tojoin;

#############################
# Delete All Downloaded PDFs
#
unlink map { "$tmpdir/$_.pdf" } keys %papers;
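Every URL in %papers follows the same pattern: `pdf` plus the two-digit day of the month, then a per-paper code. A minimal sketch, assuming that pattern holds for every paper, could build the list from the codes alone, so adding a paper means adding one code. The helper name `newseum_url` and the `@codes` array are hypothetical, not part of the script above.

```perl
use strict;
use warnings;
use POSIX qw(strftime);

# Hypothetical helper: build a Newseum front-page URL from a paper code,
# assuming the pdf<day>/<CODE>.pdf pattern used in %papers.
sub newseum_url {
    my ($code, $day) = @_;
    # Default to today's day of month, zero-padded [01-31]
    $day = strftime('%d', localtime) unless defined $day;
    return "http://www.newseum.org/media/dfp/pdf$day/$code.pdf";
}

# Paper codes in the order they should appear in the combined PDF
my @codes = qw(MA_BG IL_CT NY_BN NY_NYT WSJ);
my @urls  = map { newseum_url($_) } @codes;
print "$_\n" for @urls;
```

Keeping the codes in an array also preserves their order directly, which the numeric hash keys only do once they are sorted.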
I suppose I could have picked more interesting papers (for instance, the Amazônia Hoje from Belém, Brazil), but I stuck to papers from my current and previous hometowns (plus the NYC area, because its papers are interesting). Boring? I know…
I run this daily from cron at about 7 a.m. I also redirect all of the script’s output to /dev/null, because the Python script throws a ton of meaningless errors that don’t affect the combined PDF.
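Redirecting everything to /dev/null also hides messages you might want, such as a failed download. As an alternative, a minimal sketch could silence only the join step by pointing STDERR at /dev/null for that one `system` call and restoring it afterward. The helper name `run_quietly` is hypothetical, and this assumes the noise from the Python script goes to STDERR rather than STDOUT.

```perl
use strict;
use warnings;

# Hypothetical helper: run a command with its STDERR discarded,
# then restore STDERR and return system()'s raw status.
sub run_quietly {
    my @cmd = @_;
    # Save a duplicate of the current STDERR so it can be restored
    open(my $saved_err, '>&', \*STDERR) or die "Can't dup STDERR: $!";
    # The child process inherits this redirected STDERR
    open(STDERR, '>', '/dev/null')      or die "Can't redirect STDERR: $!";
    my $status = system(@cmd);
    # Put the original STDERR back for the rest of the script
    open(STDERR, '>&', $saved_err)      or die "Can't restore STDERR: $!";
    close($saved_err);
    return $status;
}
```

In the script, the join line would become `run_quietly(@join, @tojoin) if @tojoin;`, and the exit code is available as `$status >> 8`.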