Bloated RailsConf Presentation Downloader 2

Posted by Lee Marlow Mon, 21 May 2007 22:49:00 GMT

I’ve updated my downloader from earlier to include all sorts of fancy options. It no longer requires wget; it just uses open-uri. It can give the files descriptive names built from the speaker, date, and title. It can be told where to download the files to. It will report and skip files that fail to download. It will even butter your toast if you can find the correct command line switch.

It’s about 3 times bigger than the previous one. But maybe you can learn a little more about optparse, hpricot, file handling, and error handling along the way.
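
As a quick aside before the full script: the option handling leans on optparse's --[no-]name switch style, which gives each boolean both an on and an off form. Here is a minimal sketch of that pattern on its own. The parse_flags helper and the lowercase option keys are made up for illustration and are not part of the downloader.

```ruby
require 'optparse'

# Hypothetical helper for illustration only; the downloader parses ARGV directly.
def parse_flags(argv)
  options = { :verbose => false, :path => '.' }
  OptionParser.new do |opts|
    # "--[no-]verbose" registers both --verbose (true) and --no-verbose (false).
    opts.on('-v', '--[no-]verbose', 'Run verbosely') { |v| options[:verbose] = v }
    # "--path PATH" declares a required argument, handed to the block as a string.
    opts.on('-p', '--path PATH', 'Path to download to') { |p| options[:path] = p }
  end.parse!(argv) # parse! strips the recognized switches out of argv
  options
end

p parse_flags(%w[-v --path slides])
p parse_flags(%w[--no-verbose])
```

Passing an explicit array to parse! instead of relying on ARGV makes the pattern easy to poke at in irb. The full script uses the same calls with a few more switches.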

Here it is:

#!/usr/bin/env ruby

require 'optparse'

OPTIONS = { :Verbose => false,
            :Force => false,
            :DownloadDir => '.',
            :DescriptiveFilenames => true
          }
OptionParser.new do |opts|
  opts.banner = "Usage: #{$0} [options]"

  opts.on("-v", "--[no-]verbose", "Run verbosely, default #{OPTIONS[:Verbose]}") do |verbose|
    OPTIONS[:Verbose] = verbose
  end
  opts.on("-f", "--[no-]force", "Force downloads, default #{OPTIONS[:Force]}") do |force|
    OPTIONS[:Force] = force
  end
  opts.on("-d", "--[no-]descriptive", "Use long descriptive filenames, default #{OPTIONS[:DescriptiveFilenames]}") do |long|
    OPTIONS[:DescriptiveFilenames] = long
  end
  opts.on("-p", "--path PATH", "Path to download to, default #{OPTIONS[:DownloadDir]}") do |path|
    OPTIONS[:DownloadDir] = path
  end
  opts.on_tail("-h", "--help", "Print help message") do |help|
    puts opts
    exit
  end
end.parse!

require 'rubygems'
require 'hpricot'
require 'open-uri'
require 'fileutils'
require 'date' # Date.parse is used when building descriptive filenames

BASE_URL = 'http://www.web2expo.com'

def log(str)
  puts str if OPTIONS[:Verbose]
end

def download(href, filename)
  url = "#{BASE_URL}#{URI.escape(href)}"
  download_file = File.join(OPTIONS[:DownloadDir], filename)
  if OPTIONS[:Force] || !File.exist?(download_file)
    log "downloading #{File.basename(href)}..."
    begin
      # Binary mode keeps the downloaded bytes intact on every platform.
      File.open(download_file, 'wb') { |f| f.write(open(url).read) }
      log "\tsaved as #{download_file}"
    rescue StandardError => e
      # Remove any partial file so a later run retries it instead of skipping it.
      FileUtils.rm_f(download_file)
      $stderr.puts "ERROR downloading #{url}: #{e.message}"
    end
  else
    log "skipping #{File.basename(href)}... already downloaded as #{download_file}"
  end
end

FileUtils.mkdir_p(OPTIONS[:DownloadDir])
h = Hpricot(open("#{BASE_URL}/pub/w/51/presentations.html"))
h.search('div.presentation').each do |presentation_node|
  href = presentation_node.at('a[@href^="/presentations/rails2007/"]')[:href]
  if OPTIONS[:DescriptiveFilenames]
    name = presentation_node.at('b a').inner_text.strip
    text = presentation_node.inner_text
    speaker = text[/Speaker\(s\):\s+(.*)\s*$/, 1]
    # A missing or unparseable date becomes nil and is dropped by compact below.
    date = (Date.parse(text[/Presentation Date:\s+(.*)\s*$/, 1]) rescue nil)
    filename = [speaker, date, name, File.basename(href)].compact.map { |s| s.to_s.strip.gsub(/[^\w\.]/, '_').squeeze('_') }.join('-')
  else
    filename = File.basename(href)
  end
  download(href, filename)
end
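
The long gsub line that builds the descriptive filename is easier to follow in isolation. A rough sketch, with a hypothetical descriptive_filename helper and made-up sample values standing in for what Hpricot pulls off the page:

```ruby
require 'date'

# Hypothetical helper mirroring the filename-building line in the script.
def descriptive_filename(speaker, date, title, basename)
  # compact drops any part that could not be scraped (nil).
  [speaker, date, title, basename].compact.map do |part|
    # Replace everything except word characters and dots with "_",
    # then collapse runs of "_" into a single one.
    part.to_s.strip.gsub(/[^\w\.]/, '_').squeeze('_')
  end.join('-')
end

# Sample values are invented for the example.
puts descriptive_filename('Jane Doe', Date.new(2007, 5, 18), 'Intro to Rails!', 'intro.pdf')
puts descriptive_filename(nil, Date.new(2007, 5, 18), 'A  B', 'x.pdf')
```

Dates come out with underscores because the hyphens in a date like 2007-05-18 are not word characters, which conveniently leaves the hyphen free as the separator between parts.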

Comments

  1. Solomon Tue, 22 May 2007 16:23:21 GMT

    Hmm… Thought you were ruby’d out? Find your second wind? :)

  2. Nicholas Wright Thu, 24 May 2007 02:27:06 GMT

    Wow, when I saw the word bloated I assumed you were joking! I’m pleased to see you coding on some side projects, even if you will only use them a few times a year! Lee, your mother and I are proud of you this day.
