Posted on Apr 20, 2008

Newseum: Today’s Front Pages

The other day, I caught a post on Signal Vs. Noise about using Automator in OS X to grab your favorite newspaper’s front pages from Newseum as PDFs and join them. This sounded like a great idea, so I set out to do just that.

Unfortunately for me, this particular workflow only works in OS X 10.5, and I have yet to upgrade my iMac past 10.4.11.

Undaunted, I replicated the same thing in Perl. It turns out that the “Combine PDF Pages” Automator action is a simple Python script. It looked semi-useful on its own, so I copied it from its normal home (/System/Library/Automator/Combine PDF Pages.action/Contents/Resources/) to my personal bin directory for good measure. Perl source follows… please note that the >’s are rendering as literal &gt;’s. Silly Code Highlighter!

#! /usr/bin/perl -w

use locale;
use strict;
use warnings;

#############################
#  [01-31]      [yyyy-mm-dd]
my ($day,       $date,      $tmpdir,              $tojoin) =
(`date +%d`, `date +%F`, '/Users/shelton/tmp', '');
chomp($day); chomp($date);

my ($join) =
("python /Users/shelton/bin/join.py -o '/Users/shelton/Desktop/$date.pdf'");

########################
# Define PDF Links Here
#
my (%papers) = (
	# Boston Globe
	0=>"http://www.newseum.org/media/dfp/pdf$day/MA_BG.pdf",
	# Chicago Tribune
	1=>"http://www.newseum.org/media/dfp/pdf$day/IL_CT.pdf",
	# Buffalo News
	2=>"http://www.newseum.org/media/dfp/pdf$day/NY_BN.pdf",
	# NY Times
	3=>"http://www.newseum.org/media/dfp/pdf$day/NY_NYT.pdf",
	# Wall Street Journal
	4=>"http://www.newseum.org/media/dfp/pdf$day/WSJ.pdf"
);

######################################
# Loop through %papers
#  -> Determine tmp output file
#  -> Download the File
#  -> Add file to string for join cmd
#
foreach my $page (sort(keys %papers)) {
	my $output = "" . $tmpdir . "/" . $page . ".pdf";
	`wget -q -O $output $papers{$page}`;
	$tojoin .= "'" . $output . "' ";
}

##################
# Join The Files!
#
`$join $tojoin`;

#############################
# Delete All Downloaded PDFs
#
foreach my $file (sort(keys %papers)) {
	my $rm = "" . $tmpdir . "/" . $file . ".pdf";
	`rm -f $rm`;
}
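
The URL pattern and date handling in the script above can also be written without shelling out to date. What follows is only a sketch under a few assumptions: front_page_url and the @papers array are hypothetical names that are not part of the original script, and the URL layout is assumed to be exactly what the hash above hard-codes.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use POSIX qw(strftime);

# Hypothetical helper: build a front-page URL from a paper code
# (e.g. "MA_BG") and a two-digit day of the month, following the
# pattern hard-coded in the %papers hash above.
sub front_page_url {
    my ($code, $day) = @_;
    return sprintf 'http://www.newseum.org/media/dfp/pdf%s/%s.pdf', $day, $code;
}

# An ordered list instead of a hash with numeric keys, so the join
# order is simply the order written here.
my @papers = qw(MA_BG IL_CT NY_BN NY_NYT WSJ);

my @now  = localtime;
my $day  = strftime('%d', @now);          # zero-padded, 01-31
my $date = strftime('%Y-%m-%d', @now);    # same as `date +%F`

print "Front pages for $date:\n";
print front_page_url($_, $day), "\n" for @papers;
```

Using an array sidesteps the string sort on the hash keys, which would start misordering papers once there are more than ten of them ("10" sorts before "2").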

I suppose I could have picked more interesting papers (like, for instance, the Amazônia Hoje from Belém, Brazil). I stuck to papers from my current and previous home towns (and the NYC area, because they have interesting papers). Boring? I know…

I run this daily from cron at about 7am. Of note, I also redirect all of the script’s output to /dev/null, because the Python script throws a ton of meaningless errors that don’t mar the output.
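
An alternative to silencing the whole cron job would be to silence only the noisy join step from inside Perl. The sketch below assumes a Unix-like system with /dev/null; run_quietly is a hypothetical helper, not something from the original script, and the command it runs here is just a stand-in for the join call.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use IO::Handle;

# Hypothetical helper: run a command with stdout and stderr sent to
# /dev/null and return its exit status.
sub run_quietly {
    my @cmd = @_;

    STDOUT->flush;    # keep buffered output from being repeated by the child
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;

    if ($pid == 0) {
        # Child: discard both output streams, then become the command.
        open STDOUT, '>', '/dev/null' or die "cannot silence stdout: $!";
        open STDERR, '>', '/dev/null' or die "cannot silence stderr: $!";
        exec { $cmd[0] } @cmd or die "exec failed: $!";
    }

    waitpid($pid, 0);
    return $? >> 8;
}

# Stand-in for the noisy join command: writes to stderr, exits with 3.
my $status = run_quietly($^X, '-e', 'print STDERR "noise\n"; exit 3');
print "exit status: $status\n";
```

In the front-page script the call might look like run_quietly('python', '/Users/shelton/bin/join.py', '-o', $outfile, @files), where $outfile and @files are hypothetical names. Passing the arguments as a list bypasses the shell, so the hand-built quoting in $tojoin would no longer be needed, and the exit status stays available even though the noise is gone.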

Posted on Jan 28, 2008

Testing Perl Support for Google SyntaxHighlighter

I’ve been enjoying playing with Google’s SyntaxHighlighter. Another user came up with a Perl brush file, so I thought I’d give it a shot. One note from the author of the brush file is that -> gets converted to -&gt; rather than being rendered properly. It’s definitely the highlighter that does it, too, because when you disable the brush it shows up properly.

#!/usr/bin/perl

use strict;
use Mail::Box::Manager;

#################################
# User Variables
#
# Set $password to your password (encoded):
# . only the low four bits of each character count
# . character 0x3f gets split into 0x?3 and 0x?f,
# . fill in the ?'s with anything you like
# . man ascii =)
#

my $server = 'pop.server.name';
my $username = 'username';
my $password = 'tSvx6!f>v7V%d=fU'; # the word "ChangeMe" encoded

#################################
# Get em!
#

my $mgr = Mail::Box::Manager->new;
my $pop = $mgr->open(type => 'pop3',
	username => $username,
	password => decodePassword($password),
	server_name => $server);

my @messages = $pop->messages;

print
"\033[00;30;1m:\033[00;36m:\033[00;36;1m: ", scalar @messages, " messages in mailbox :\033[00;36m:\033[00;30;1m:\033[m\n";

foreach my $message (@messages) {
	my $fr = $message->get('From');
	$fr =~ s/\s+<.*$//; $fr =~ s/"//g;

	printf "\033#7\033[1m %-28.28s %-45.45s\033[0m\n", $fr, $message->get('Subject');
	my $lbls = $message->labels;
	while( my ($k, $v) = each %$lbls ) {
		print "key: $k, value: $v.\n";
	}
}

$pop->close;

#################################
# methods
#

sub decodePassword($) {
	my ($password_split) = @_;
	my $password = "";
	my ($high, $low);

	$password_split = reverse($password_split);
	while (length($password_split) > 1) {
		$high = (ord(chop($password_split)) & 0xf) << 4;
		$low = ord(chop($password_split)) & 0xf;
		$password .= chr($high|$low);
	}

	return $password;
}
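
The comments at the top of the script describe the encoding only in words, so here is a rough sketch of what an encoder could look like. encode_password is a hypothetical counterpart that does not appear in the original script, and decode_password is a restatement of decodePassword using unpack. The encoder assumes single-byte characters and always uses 0x4 as the filler for the ignored high bits, where the original scheme lets you pick anything.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical encoder: split each byte into its high and low nibble
# and emit one character per nibble. Only the low four bits of each
# output character matter; 0x40 is an arbitrary filler that keeps the
# result printable ('@' through 'O').
sub encode_password {
    my ($plain) = @_;
    my $encoded = '';
    for my $byte (unpack 'C*', $plain) {
        $encoded .= chr(0x40 | ($byte >> 4));
        $encoded .= chr(0x40 | ($byte & 0xf));
    }
    return $encoded;
}

# Same rule as decodePassword above: take the low nibble of each
# character and glue consecutive pairs back into one byte.
sub decode_password {
    my ($encoded) = @_;
    my @nibbles = map { $_ & 0xf } unpack 'C*', $encoded;
    my $plain = '';
    while (@nibbles >= 2) {
        my ($high, $low) = splice @nibbles, 0, 2;
        $plain .= chr(($high << 4) | $low);
    }
    return $plain;
}

my $encoded = encode_password('ChangeMe');
print "encoded: $encoded\n";
print "decoded: ", decode_password($encoded), "\n";

# The sample string from the script decodes the same way.
print "sample:  ", decode_password('tSvx6!f>v7V%d=fU'), "\n";
```

With a fixed filler the output is easy to read back by eye, which is presumably why the original comments suggest filling in the ?'s with anything you like. Either way this is obfuscation against a casual glance, not encryption.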

About three years ago I was living in the shell. I had a script aliased to ‘m’ that would run a Perl script for each of my IMAP accounts and show me all of the unread messages. The script had originally been written by a fellow Help Desk supervisor at UB and then given colorization support by another fellow supervisor, Doug, who has since become a Perl and Python guru.

The trouble then was that, while this worked for my home email server, I had to change some of the socket code to work on all of the other IMAP servers I used. That wasn’t too tough since IMAP is a relatively straightforward protocol. Getting POP support to work looked to be a lot harder, however. I couldn’t find anyone who had written something similar for me to hack on, so I wrote my own.

The script above uses the Perl module Mail::Box, which installs straight from CPAN. It’s probably been updated since this was written, and I don’t even know if the script still works (who still uses POP??).