Newseum: Today's Front Pages

The other day, I caught a post on Signal vs. Noise about using Automator in OS X to grab your favorite newspapers' front pages from Newseum as PDFs and join them. This sounded like a great idea, so I set out to do just that.

Unfortunately for me, this particular workflow only works in OS X 10.5, and I have yet to upgrade my iMac past 10.4.11.

Undaunted, I replicated the same thing in Perl. It turns out that the "Combine PDF Pages" Automator action is a simple Python script. It looked semi-useful on its own, so I copied it from its normal home (/System/Library/Automator/Combine PDF Pages.action/Contents/Resources/) to my personal bin directory for good measure. Perl source follows.
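If you want to grab it yourself, the copy is a one-liner, and the script can then be run by hand. A couple of assumptions here: that the script inside the bundle is actually named join.py (that's what mine ended up as in ~/bin), and the sample filenames are made up. The -o flag names the combined output, and the input PDFs just follow as arguments:

cp "/System/Library/Automator/Combine PDF Pages.action/Contents/Resources/join.py" ~/bin/
python ~/bin/join.py -o combined.pdf first.pdf second.pdf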

#! /usr/bin/perl -w

use locale;
use strict;
use warnings;

#############################
#  [01-31]    [yyyy-mm-dd]
my ($day, $date, $tmpdir, $tojoin) =
    (`date +%d`, `date +%F`, '/Users/shelton/tmp', '');
chomp($day);
chomp($date);

# The join command: the Python script lifted from the Automator action
my $join =
    "python /Users/shelton/bin/join.py -o '/Users/shelton/Desktop/$date.pdf'";

########################
# Define PDF Links Here
#
my %papers = (
    # Boston Globe
    0 => "http://www.newseum.org/media/dfp/pdf$day/MA_BG.pdf",
    # Chicago Tribune
    1 => "http://www.newseum.org/media/dfp/pdf$day/IL_CT.pdf",
    # Buffalo News
    2 => "http://www.newseum.org/media/dfp/pdf$day/NY_BN.pdf",
    # NY Times
    3 => "http://www.newseum.org/media/dfp/pdf$day/NY_NYT.pdf",
    # Wall Street Journal
    4 => "http://www.newseum.org/media/dfp/pdf$day/WSJ.pdf",
);

######################################
# Loop through %papers in numeric order, so pages stay in order
#  -> Determine tmp output file
#  -> Download the file
#  -> Add file to string for join cmd
#
foreach my $page (sort { $a <=> $b } keys %papers) {
    my $output = "$tmpdir/$page.pdf";
    `wget -q -O '$output' '$papers{$page}'`;
    $tojoin .= "'$output' ";
}

##################
# Join The Files!
#
`$join $tojoin`;

#############################
# Delete All Downloaded PDFs
#
foreach my $file (keys %papers) {
    unlink "$tmpdir/$file.pdf";
}

I suppose I could have picked more interesting papers (like Amazônia Hoje from Belém, Brazil), but I stuck to papers from my current and previous hometowns (and the NYC area, because they have interesting papers). Boring, I know...

I run this daily from cron at about 7am. Of note, I also redirect all of the script's output to /dev/null, because the Python join script throws a ton of meaningless errors that don't actually affect the combined PDF.
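For the curious, the crontab entry looks something like this; frontpages.pl is a stand-in name for wherever you saved the Perl script:

0 7 * * * /Users/shelton/bin/frontpages.pl > /dev/null 2>&1

The 2>&1 sends stderr along with stdout to /dev/null, which is where the Python noise actually shows up.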

Apr 20th, 2008
