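require 'flickraw'
require 'net/http'
require 'fileutils'

FlickRaw.api_key = "api_key"
FlickRaw.shared_secret = "shared_secret"

# Make sure the output directory exists before writing to it
FileUtils.mkdir_p("flickr")

# Fetch the current interestingness list, up to 500 photos per page
photos = flickr.interestingness.getList(:per_page => 500)

# Frob-based auth URL; it is not used below, since this script only reads public data
frob = flickr.auth.getFrob
auth_url = FlickRaw.auth_url :frob => frob, :perms => 'read'

photos.each do |pic|
  # Look up the full photo record and build the URL of its large-size version
  photo_info = flickr.photos.getInfo(:photo_id => pic.id)
  photo_url = FlickRaw.url_b(photo_info)
  puts "Downloading #{photo_url}"
  File.open("flickr/" + pic.id + ".jpg", "wb") { |file|
    file.write(Net::HTTP.get_response(URI.parse(photo_url)).body)
  }
end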
This is a personal web page. Things said here do not represent the position of my employer.
Saturday, August 06, 2011
Flickr interestingness downloader in Ruby
This time, here is the Ruby code that uses the FlickRaw gem to download the large-size versions of Flickr's interesting photos.
S3 file bucket downloader in Ruby
Today I wanted to download files from a website that, as I discovered, stores all of its files in S3. When I accessed the website root, I realized that the response was simply the output of an S3 ListBucket API call. For instance:
<ListBucketResult xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
<Name>foo.com</Name>
<Prefix/>
<Marker/>
<MaxKeys>1000</MaxKeys>
<IsTruncated>true</IsTruncated>
<Contents>
<Key>file/1</Key>
<LastModified>2011-06-09T06:29:02.000Z</LastModified>
<ETag>"5cb3930839817ff4a5c1ddf08e3fea1e"</ETag>
<Size>1440231</Size>
<StorageClass>STANDARD</StorageClass>
</Contents>
<Contents>
<Key>file/2</Key>
<LastModified>2011-06-09T06:29:18.000Z</LastModified>
<ETag>"96fdc94d14b6d9817f80ac1e9e2049b4"</ETag>
<Size>1310</Size>
<StorageClass>STANDARD</StorageClass>
</Contents>
</ListBucketResult>
To download all the files more quickly, I wrote the following Ruby program, which fetches every file named in the listing. I hope it is useful to others:
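require 'net/http'
require 'uri'
require 'fileutils'
require 'rexml/document'

baseurl = 'foo.com'

# Get the bucket listing as an XML string
xml_data = Net::HTTP.get_response(URI.parse("http://" + baseurl)).body

# Parse the ListBucketResult document
doc = REXML::Document.new(xml_data)

# Make sure the output directory exists before writing to it
FileUtils.mkdir_p("images")

# Reuse one HTTP connection for all downloads
Net::HTTP.start(baseurl) do |http|
  # Only the keys on this page of the listing are fetched (at most MaxKeys)
  doc.elements.each('ListBucketResult/Contents/Key') do |ele|
    puts "Downloading " + ele.text
    resp = http.get("/" + ele.text)
    # Flatten the key into one file name; ".jpg" assumes the files are JPEG images
    File.open("images/" + ele.text.gsub("/", "_") + ".jpg", "wb") { |file|
      file.write(resp.body)
    }
  end
end
puts "Done"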
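Note that the sample listing has IsTruncated set to true, so a single request to the website root only returns the first page of keys. The sketch below shows one way to detect that and work out where the next page starts. It assumes the bucket answers the original ListBucket call, where the last key of a truncated page can be sent back as the marker query parameter; parse_listing and SAMPLE_PAGE are names I made up for this example, and the sample XML is a shortened copy of the response shown earlier.

```ruby
require 'rexml/document'

# Hypothetical helper: parse one ListBucketResult page and return the keys
# on that page plus the marker to request next (nil when nothing is left).
def parse_listing(xml_data)
  doc = REXML::Document.new(xml_data)
  keys = []
  doc.elements.each('ListBucketResult/Contents/Key') { |ele| keys << ele.text }
  truncated = doc.elements['ListBucketResult/IsTruncated']
  more = !truncated.nil? && truncated.text == 'true'
  # Assumption: with no delimiter in use, the last key of a truncated page
  # serves as the marker for the next request.
  [keys, more ? keys.last : nil]
end

# Shortened copy of the sample response, used here instead of a live request
SAMPLE_PAGE = <<XML
<ListBucketResult xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
<IsTruncated>true</IsTruncated>
<Contents><Key>file/1</Key></Contents>
<Contents><Key>file/2</Key></Contents>
</ListBucketResult>
XML

keys, next_marker = parse_listing(SAMPLE_PAGE)
puts "Keys on this page: " + keys.join(", ")
puts "Next marker: " + next_marker.inspect
```

To walk the whole bucket, the download loop could keep requesting "/?marker=" followed by the URL-encoded marker until parse_listing returns nil for it.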