Railscasts crawler (Download all screencasts easily)


I wrote this script because I found it really annoying to download each screencast by hand every time I needed one. So I ended up writing a crawler that downloads all the screencasts automatically, without the hassle.

It's in Ruby, of course 🙂 and requires only the Hpricot gem.
If you don't have it, just type this command in your terminal:

$ gem install hpricot

– Then put this script in your project's /lib folder.
– Change the path in the script ($path) to the directory where you want the screencasts to be saved.
– Go to your project's development console (script/console) and run the script with these commands:
video = Railscasts.new # new Railscasts object
video.start # starts downloading all screencasts from Railscasts
Note:
  1. If you stop the script manually partway through, the next run will skip the screencasts that are already on your hard disk.
  2. All logs are written to Railsproject/log/railscasts.log.
  3. Exceptions raised while fetching pages or downloading files are rescued and logged.
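
The skip-if-already-downloaded behaviour in note 1 can also be sketched without shelling out to ls and grep. The snippet below is only a minimal illustration, not part of the script: `already_downloaded?` is a hypothetical helper name, the file names and example.com URLs are made up, and it assumes the file on disk is named after the last segment of the download URL, which is how the script derives file names.

```ruby
require 'tmpdir'
require 'fileutils'

# Hypothetical helper (not in the original script): true when a file named
# after the last segment of the download URL already exists in dir.
def already_downloaded?(dir, url)
  file = url.split('/').last
  File.exist?(File.join(dir, file))
end

# Small demonstration in a temporary directory, with made-up file names.
Dir.mktmpdir do |dir|
  FileUtils.touch(File.join(dir, "001_example.mov"))
  puts already_downloaded?(dir, "http://example.com/videos/001_example.mov") # true
  puts already_downloaded?(dir, "http://example.com/videos/002_example.mov") # false
end
```

As with the script's own check, a partially downloaded file would still count as already present, so an interrupted download may need to be deleted by hand before the next run.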

Improvements and suggestions are appreciated.

Thanks
And yes, here is the script:
<pre># Author : Akshay Gupta</pre>
<pre>#file: railscasts.rb
# First check that you have all the gems installed. Place the script in the /lib folder and run it.
# I am not an expert in Ruby, so please tell me how it can be improved.
# Change $path below to the directory where you want to save the screencasts.
# My working environment is Mac OS; some changes are needed to run this on Windows.
<code>
require 'rubygems'
require 'hpricot'
require 'open-uri'
require 'logger'
$log = Logger.new('log/railscasts.log')
$path = "/Users/akshaygupta/railsvideo/railscasts/"
$stop = false

class Railscasts
attr_accessor :url

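def initialize
  @@page = 1
  @@url  = "http://railscasts.com/episodes?page="
  # Downloading begins only when #start is called explicitly (video.start),
  # so creating the object does not kick off a second run.
end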

def url
  @url = @@url+@@page.to_s
end

def start
  url
  build_doc
  screencasts_links
  download_screencasts
  next_page
  if !$stop
    start
  else
    puts "Successfully done :) Enjoy all the screencasts"
  end
end

def build_doc
  begin
    $log.info("*********Fetching #{@url}***********")
    @doc = Hpricot(open(@url))
  rescue StandardError => e  # StandardError, so Ctrl-C (Interrupt) can still stop the script
    $log.debug("Problem fetching #{e}")
  end
end

def screencasts_links
  begin
    @download_links =
      (@doc/".download/a[1]").collect {|a| (a.search("[@href]").first[:href])}
    $log.info(" All Download links on this page :\n #{@download_links}")
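  rescue StandardError => e
    $log.info("Problem in download links: #{e}")
    @download_links = []  # keeps download_screencasts from iterating over nil or stale links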
  end
end

def download_screencasts
  @download_links.each do |mov|
    begin
      file = mov.split('/').last
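      # Backticks always return a String, which is truthy even when empty,
      # so test for empty output instead of negating it.
      res = `cd #{$path}; ls | grep "#{file}"`
      if res.empty?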
        $log.info("Now downloading file #{file}")
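        `cd #{$path}; wget "#{mov}"`
        # The returned String is always truthy; the exit status tells whether wget succeeded.
        if $?.success?
          $log.info("Successfully Downloaded #{file}")
        else
          $log.info("Download failed for #{file}")
        end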
      else
        $log.info("Already downloaded #{file}")
      end
    rescue StandardError => e
      $log.info("problem downloading file #{e}")
    end
  end
end

  def next_page
    if @@page < 17
      @@page += 1
    else
      $log.info("All screencasts downloaded :-), Mission accomplished!!")
      $stop = true
    end
  end
end</code></pre>
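
A note on checks like the one on the `ls | grep` output in download_screencasts: a backtick command always returns a String, and in Ruby every String is truthy, even an empty one. The sketch below uses plain strings in place of real command output, and the method names are hypothetical, chosen only for this illustration. It shows why `empty?` can tell "file found" from "file not found" while negation cannot.

```ruby
# Stand-ins for what `ls | grep "file"` returns: matching lines, or "" for no match.
found     = "001_example.mov\n"
not_found = ""

# Hypothetical helpers, named only for this illustration.
def missing_by_negation?(listing)
  !listing          # false for every String, empty or not
end

def missing_by_empty?(listing)
  listing.empty?    # true only when grep printed nothing
end

puts missing_by_negation?(found)      # false
puts missing_by_negation?(not_found)  # false
puts missing_by_empty?(found)         # false
puts missing_by_empty?(not_found)     # true
```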

3 thoughts on “Railscasts crawler (Download all screencasts easily)”

  1. Simple and effective. I just changed the "if !res" line to "if res.empty?" because the script was telling me that I already had all the casts! Thank you for the script, and keep up the good work!
