Railscasts crawler (Download all screencasts easily)
I wrote this script because downloading each screencast by hand every time I needed one was really annoying. So I ended up writing a crawler that downloads all the screencasts automatically, without the hassle.
The script depends on the hpricot gem. If you don't have it, just type this command in your terminal:
$ gem install hpricot
–Then put this script in your project's /lib folder
–Change the path in the script to the directory where you want the screencasts saved
–Open your project's development console (script/console) and run the script with these commands:
video = Railscasts.new #new Railscasts object
video.start #will start downloading all screencasts from Railscasts
Note:
- If you stop the script partway through and run it again, it skips the screencasts that are already on your hard disk.
- All logs are written to Railsproject/log/railscasts.log.
- Exceptions raised while fetching pages or downloading files are caught and logged.
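The "skip what is already on disk" behaviour in the first note can be sketched in plain Ruby, without shelling out to `ls` and `grep`. This is only a minimal sketch: the helper name `already_downloaded?` and the example.com link are made up for illustration and are not part of the script.

```ruby
require 'tmpdir'

# Hypothetical helper (not in the original script): true when the file
# named at the end of `link` already exists inside `dir`.
def already_downloaded?(dir, link)
  File.exist?(File.join(dir, File.basename(link)))
end

# Small demonstration in a throwaway directory.
Dir.mktmpdir do |dir|
  link = "http://example.com/videos/001_intro.mov" # made-up link
  puts already_downloaded?(dir, link)              # nothing saved yet
  File.write(File.join(dir, "001_intro.mov"), "placeholder")
  puts already_downloaded?(dir, link)              # the file is there now
end
```

A check like this only looks at the file name, so a download that was interrupted halfway would still count as "already downloaded".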
Improvements/Suggestions are appreciated.
Thanks
And yes, here is the script:
</pre>
<pre># Author : Akshay Gupta</pre>
<pre>#file: railscasts.rb
# First check that you have all the gems installed. Place the script in the /lib folder and run it.
# I'm no Ruby expert, so please tell me how it can be improved.
# Change the path below to wherever you want to save the screencasts.
# My working environment is Mac OS; you will need to make some changes to run it on Windows.
<code>
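require 'rubygems'
require 'hpricot'
require 'open-uri'
require 'logger'

$log  = Logger.new('log/railscasts.log')
# Change this to the directory where the screencasts should be saved.
$path = "/Users/akshaygupta/railsvideo/railscasts/"
$stop = false

class Railscasts
  attr_accessor :url

  def initialize
    @@page = 1
    @@url  = "http://railscasts.com/episodes?page="
    # Downloading begins only when #start is called explicitly.
  end

  # Builds the URL of the listing page currently being crawled.
  def url
    @url = @@url + @@page.to_s
  end

  # Crawls one listing page after another until the last page is done.
  def start
    until $stop
      url
      build_doc
      screencasts_links
      download_screencasts
      next_page
    end
    puts "Successfully done :) Enjoy all the screencasts"
  end

  # Fetches and parses the current listing page.
  def build_doc
    $log.info("*********Fetching #{@url}***********")
    @doc = Hpricot(open(@url))
  rescue StandardError => e
    # Forget the previous page so its links are not processed again.
    @doc = nil
    $log.debug("Problem fetching #{e}")
  end

  # Collects the first download link of every episode on the page.
  def screencasts_links
    @download_links =
      (@doc/".download/a[1]").collect { |a| a.search("[@href]").first[:href] }
    $log.info(" All Download links on this page :\n #{@download_links}")
  rescue StandardError
    # No usable page (for example, the fetch failed): nothing to download.
    @download_links = []
    $log.info("Problem in download links")
  end

  def download_screencasts
    @download_links.each do |mov|
      begin
        file = mov.split('/').last
        # Backticks return a String, which is truthy even when empty,
        # so test for the file directly instead of using ls | grep.
        if File.exist?(File.join($path, file))
          $log.info("Already downloaded #{file}")
        else
          $log.info("Now downloading file #{file}")
          # system returns true only when wget exits successfully;
          # -P tells wget to save the file into $path.
          if system("wget", "-P", $path, mov)
            $log.info("Successfully Downloaded #{file}")
          else
            $log.info("wget failed for #{file}")
          end
        end
      rescue StandardError => e
        $log.info("problem downloading file #{e}")
      end
    end
  end

  # Moves on to the next listing page, or flags the crawl as finished.
  def next_page
    if @@page < 17
      @@page += 1
    else
      $log.info("All screencasts downloaded :-), Mission accomplished!!")
      $stop = true
    end
  end
end</code></pre>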
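One thing worth keeping in mind when shelling out from Ruby, as `download_screencasts` does for wget: backticks hand back the command's output as a String, and every String, even an empty one, counts as true in a condition. `Kernel#system`, on the other hand, returns true only when the command exits with status zero. Here is a small sketch of the difference; the helper name `command_succeeded?` is made up for illustration, and the Ruby interpreter itself stands in for wget so the example runs anywhere.

```ruby
require 'rbconfig'

# Hypothetical helper: runs a command and reports success as true/false.
# system returns true on exit status 0, false on a non-zero status and
# nil when the command could not be started at all.
def command_succeeded?(*cmd)
  system(*cmd) == true
end

ruby = RbConfig.ruby # path of the running Ruby interpreter

puts command_succeeded?(ruby, "-e", "exit 0") # command exits cleanly
puts command_succeeded?(ruby, "-e", "exit 1") # command reports failure

# An empty String, like the output of a command that printed nothing,
# is still truthy, so it cannot signal "nothing found" or "failed".
empty_output = ""
puts(empty_output ? "truthy" : "falsy")
```

So a test like `if result` on backtick output tells you nothing about whether the command worked; checking the return value of `system` (or `$?.success?` after backticks) is the safer route.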
<pre><span style="font-family: monospace;">