Sunday, September 07, 2008

flickr: Download photos from a group pool in bulk

It is sometimes useful to download all photos from a group pool in one fell swoop. Rather than clicking through all the photos by hand in a web browser, we can use the Flickr API to grab them quickly. Using the flickrapi Python package, the script looks something like the following:

# 
#  flickr_groupdump.py
#  Download photos from the Flickr group pool in bulk
#  
#  Created by Jakob van Santen on 2008-09-07.
# 
import flickrapi, os, re, urllib

# api key and secret
api_key = 'your api key'
api_secret = 'your api key secret'
flickr_username = 'your flickr (yahoo) username'

# the url of the group pool to be dumped
group_url = 'http://www.flickr.com/groups/876344@N22/pool/'

# initialize and get authentication token
flickr = flickrapi.FlickrAPI(api_key,api_secret,username=flickr_username)
(token, frob) = flickr.get_token_part_one(perms='read')
if not token: raw_input("Press ENTER after you authorized this program")
flickr.get_token_part_two((token, frob))

# look up the group
group = flickr.urls_lookupGroup(url=group_url)
group_id = group.group[0]['id']
group_name = group.group[0].groupname[0].text

# get all the photos in the pool
page = 1
pages = page+1
group_list = []
while page <= pages:
    photos = flickr.groups_pools_getPhotos(group_id=group_id,extras='date_taken,original_format',page=page)
    page = int(photos.photos[0]['page'])
    pages = int(photos.photos[0]['pages'])
    print 'Got page', page, 'of', pages
    page += 1
    photolist = photos.photos[0].photo
    group_list += [p.attrib for p in photolist]
    
# classify the photo list by user
owners = {}
for photo in group_list:
    o = photo['ownername']
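    # create the owner's list on first sight, then always append,
    # so the first photo of each owner is not dropped
    if o not in owners:
        owners[o] = []
    owners[o].append(photo)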

# for each user who uploaded photos to the pool:
for owner_name in owners.keys():
    owners[owner_name].sort(key=lambda p: p['datetaken'])
    target = owners[owner_name]
    try:
        os.makedirs(group_name + '/' + owner_name)
    except OSError:
        pass # the folder already exists
    # dump every photo in the pool to a file
    for index,photo in enumerate(target):
        existing_fname = filter(lambda fn: re.match("^%s .*" % photo['id'],fn),os.listdir(group_name + '/' + owner_name))
        if existing_fname == []: #photo doesn't yet exist, so download it
            sizes = flickr.photos_getSizes(photo_id=photo['id'])
            biggest = sizes.sizes[0].size[-1].attrib
            url = biggest['source']
            format = re.match(r".*\.(\w+)$",url).group(1) # file extension, e.g. jpg or jpeg
            fname = group_name + '/' + owner_name + '/' + '%s %s (%s).%s' % (photo['id'],photo['title'],owner_name,format)
            def reporter(block_count,block_size,total_size):
                if block_count == 0:
                    reporter.datasize = total_size
            urllib.urlretrieve(url,fname,reporter)
            print os.path.basename(fname), 'downloaded', '%.3f MB' % (reporter.datasize/2.0**20)
        else: # rename the file for giggles
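            # escape the owner name (it may contain regex metacharacters) and build the
            # replacement in a function so backslashes in the title are taken literally
            new_fname = re.sub(r"^(%s) .* (\(%s\))(\.\w+)$" % (re.escape(photo['id']),re.escape(owner_name)),lambda m: '%s %s %s%s' % (m.group(1),photo['title'],m.group(2),m.group(3)),existing_fname[0])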
            os.rename(group_name + '/' + owner_name + '/' + existing_fname[0], group_name + '/' + owner_name + '/' + new_fname)
            print os.path.basename(new_fname), 'exists'

You’ll need to get your own API key and secret from Flickr and insert them, along with your Flickr username, at the top of the script. Next, paste in the URL of the group photo pool. On the first run, you’ll have to authorize the key to access your Flickr account if you haven’t already done so.

The script looks up the group based on its URL and builds a list of all photos in the pool. Then, it classifies the photos by owner and sorts each owner's photos by the date taken. For each photo, the script fetches the largest available size and downloads it. Photos from each user are put in a separate subfolder. The file name for each photo starts with the photo ID from Flickr, so if a file for that ID already exists, the photo is not downloaded again; the file is only renamed to match the photo's current title. If you run the script multiple times, it will only fetch the newly added photos from the group pool.
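The grouping and skip-if-present steps can be tried out in isolation, without touching the Flickr API. The following is only a minimal sketch: the helper names `group_by_owner` and `already_downloaded` are made up for illustration, and the sample dictionaries merely imitate the `id`, `ownername` and `datetaken` attributes the script reads from each photo.

```python
import os
import re
import tempfile


def group_by_owner(photos):
    """Group photo dicts by 'ownername', each group sorted by 'datetaken'."""
    owners = {}
    for photo in photos:
        owners.setdefault(photo['ownername'], []).append(photo)
    for owner_photos in owners.values():
        owner_photos.sort(key=lambda p: p['datetaken'])
    return owners


def already_downloaded(folder, photo_id):
    """Return True if a file in folder starts with '<photo_id> '."""
    pattern = re.compile(r'^%s ' % re.escape(photo_id))
    return any(pattern.match(name) for name in os.listdir(folder))


# made-up sample data standing in for the attributes Flickr returns
sample = [
    {'id': '2', 'ownername': 'alice', 'datetaken': '2008-09-02 10:00:00'},
    {'id': '1', 'ownername': 'alice', 'datetaken': '2008-09-01 10:00:00'},
    {'id': '3', 'ownername': 'bob', 'datetaken': '2008-09-03 10:00:00'},
]
grouped = group_by_owner(sample)

# pretend photo 1 was fetched on an earlier run
folder = tempfile.mkdtemp()
open(os.path.join(folder, '1 Sunset (alice).jpg'), 'w').close()

# only photos without a matching file still need to be downloaded
to_fetch = [p['id'] for p in grouped['alice']
            if not already_downloaded(folder, p['id'])]
```

Matching on the ID followed by a space keeps photo 1 from being confused with, say, photo 11, which is why the file names put the ID first.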

The photo-gathering section can easily be modified to download photos from a particular user or set instead of a group.
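One way to make that swap easier is to separate the paging loop from the API call that fills it. The sketch below is an assumption about how this could be structured, not part of the original script: `fetch_all` and the `fetch_page` callback are hypothetical names, and the fake callback stands in for a real wrapper around whichever Flickr method you want to page through (the response layout differs per method, so unpacking it would be the callback's job).

```python
def fetch_all(fetch_page):
    """Collect the items from every page.

    fetch_page(page) is a hypothetical callback that returns
    (items, total_pages) for the given 1-based page number.
    """
    items = []
    page, pages = 1, 1
    while page <= pages:
        batch, pages = fetch_page(page)
        items.extend(batch)
        page += 1
    return items


# stand-in for an API call: three pages of fake photo ids
FAKE_PAGES = {1: ['a', 'b'], 2: ['c', 'd'], 3: ['e']}


def fake_fetch(page):
    return FAKE_PAGES[page], len(FAKE_PAGES)


all_items = fetch_all(fake_fetch)
```

With the loop factored out like this, switching from a group pool to a user or a set would only mean writing a different callback; the classification and download code can stay as it is, provided each photo still carries the attributes the script reads.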
