I maintain a couple of small "planet" sites. If you are not familiar with planets, they are sites that aggregate the RSS/Atom feeds of a group of somehow-related people, which makes for a nice, single, thematic feed. Recently, when moving them from one server to another, everything broke: old posts showed up as new, feeds that had not been updated in two years kept bubbling all their posts back to the top... a disaster.

I could have gone to the old server and started debugging why rawdog was doing that, or switched to planet, or looked for other software, or used an online aggregator. Instead, I started thinking... I had written a few RSS aggregators in the past... feedparser is again under active development... rawdog and planet seem to be pretty much abandoned... how *hard* could it be to implement the minimal planet software?

Well, not all that hard, that's how hard it was. It took me about 4 hours, and it was not even difficult. One reason this was easier than what planet and rawdog achieve is that I am not writing a static site generator, because `I already have one`_. So all I need this program (I called it Smiljan) to do is:

* Parse a list of feeds and store it in a database if needed.
* Download those feeds (respecting ETag and If-Modified-Since).
* Parse those feeds looking for entries (feedparser does that).
* Load those entries (or rather, a tiny subset of their data) into the database.
* Use the entries to generate a set of files to feed Nikola.
* Use Nikola to generate and deploy the site.

So, here is the final result: http://planeta.python.org.ar which still needs theming and a lot of other stuff, but works.

I implemented Smiljan as 3 doit tasks, which makes it very easy to integrate with Nikola: if you know Nikola, just add "from smiljan import \*" to your dodo.py, plus a feeds file with the feed list in rawdog format, and voilà, running this updates the planet (``deploy`` is Nikola's own task)::

    doit load_feeds update_feeds generate_posts deploy
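If you have never used rawdog: going by how ``task_load_feeds`` below parses the file, each feed is declared by a ``feed`` line (keyword, polling period, URL) followed by a ``define_name`` line with the title to display, so a minimal feeds file would look something like this (period, URL and name are made-up examples)::

    feed 1h http://example.com/posts.atom
    define_name Example Author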
Here is the code for smiljan.py, currently at the "gross hack that kinda works" stage. Enjoy!

.. code-block:: python

    # -*- coding: utf-8 -*-
    import codecs
    import datetime
    import glob
    import os
    import sys

    import feedparser
    import peewee


    class Feed(peewee.Model):
        """A feed we aggregate, with the data needed for conditional fetching."""
        name = peewee.CharField()
        url = peewee.CharField(max_length=200)
        last_status = peewee.CharField()
        etag = peewee.CharField(max_length=200)
        last_modified = peewee.DateTimeField()


    class Entry(peewee.Model):
        """The tiny subset of an entry's data we keep around."""
        date = peewee.DateTimeField()
        feed = peewee.ForeignKeyField(Feed)
        content = peewee.TextField(max_length=20000)
        link = peewee.CharField(max_length=200)
        title = peewee.CharField(max_length=200)
        guid = peewee.CharField(max_length=200)

    Feed.create_table(fail_silently=True)
    Entry.create_table(fail_silently=True)


    def task_load_feeds():
        """One doit task per feed in the 'feeds' file that is not yet in the DB."""
        def add_feed(name, url):
            f = Feed.create(
                name=name,
                url=url,
                etag='caca',  # dummy etag so the first fetch never matches
                last_modified=datetime.datetime(1970, 1, 1),
            )
            f.save()

        feed = name = None
        for line in open('feeds'):
            line = line.strip()
            if line.startswith('feed'):
                feed = line.split(' ')[2]
            if line.startswith('define_name'):
                name = ' '.join(line.split(' ')[1:])
            if feed and name:  # We have a complete feed declaration
                f = Feed.select().where(name=name, url=feed)
                if not list(f):
                    yield {
                        'name': name,
                        'actions': ((add_feed, (name, feed)),),
                        'file_dep': ['feeds'],
                    }
                name = feed = None


    def task_update_feeds():
        """Fetch every feed and store its new entries."""
        def update_feed(feed):
            modified = feed.last_modified.timetuple()
            etag = feed.etag
            # feedparser turns these into If-None-Match / If-Modified-Since
            # headers, so unchanged feeds are not downloaded again.
            parsed = feedparser.parse(feed.url, etag=etag, modified=modified)
            try:
                feed.last_status = str(parsed.status)
            except AttributeError:
                # No status at all: probably a timeout. TODO: log failure
                return
            if parsed.feed.get('title'):
                print parsed.feed.title
            else:
                print feed.url
            feed.etag = parsed.get('etag', 'caca')
            modified = tuple(parsed.get('date_parsed', (1970, 1, 1)))[:6]
            print "==========>", modified
            modified = datetime.datetime(*modified)
            feed.last_modified = modified
            feed.save()
            # No point in adding items from missing feeds
            if parsed.status > 400:
                # TODO: log failure
                return
            for entry_data in parsed.entries:
                print "========================================="
                date = entry_data.get('updated_parsed', None)
                if date is None:
                    date = entry_data.get('published_parsed', None)
                if date is None:
                    print "Can't parse date from:"
                    print entry_data
                    return False
                date = datetime.datetime(*(date[:6]))
                title = "%s: %s" % (feed.name,
                                    entry_data.get('title', 'Sin título'))
                content = entry_data.get('description',
                                         entry_data.get('summary', 'Sin contenido'))
                guid = entry_data.get('guid', entry_data.link)
                link = entry_data.link
                print repr([date, title])
                entry = Entry.get_or_create(
                    date=date,
                    title=title,
                    content=content,
                    guid=guid,
                    feed=feed,
                    link=link,
                )
                entry.save()

        for feed in Feed.select():
            yield {
                'name': feed.name.encode('utf8'),
                'actions': ((update_feed, (feed,)),),
            }


    def task_generate_posts():
        """Write a Nikola post (.meta + .txt pair) for every stored entry."""
        def generate_post(entry):
            meta_path = os.path.join('posts', str(entry.id) + '.meta')
            post_path = os.path.join('posts', str(entry.id) + '.txt')
            with codecs.open(meta_path, 'wb+', 'utf8') as fd:
                fd.write(u'%s\n' % entry.title.replace('\n', ' '))
                fd.write(u'%s\n' % entry.id)
                fd.write(u'%s\n' % entry.date.strftime('%Y/%m/%d %H:%M'))
                fd.write(u'\n')
                fd.write(u'%s\n' % entry.link)
            with codecs.open(post_path, 'wb+', 'utf8') as fd:
                fd.write(u'.. raw:: html\n\n')
                content = entry.content
                if not content:
                    content = 'Sin contenido'
                for line in content.splitlines():
                    # Indent so the HTML stays inside the raw directive
                    fd.write(u'   %s\n' % line)

        for entry in Entry.select().order_by(('date', 'desc')):
            yield {
                'name': entry.id,
                'actions': ((generate_post, (entry,)),),
            }
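For reference, this is roughly what ``generate_post`` leaves in ``posts/`` for Nikola to pick up. For a hypothetical entry with id 42, ``posts/42.meta`` holds the title, the slug (the entry id), the date, an empty line (presumably the tags field) and the original link::

    Example Author: Some post title
    42
    2012/01/30 10:50

    http://example.com/some-post

while ``posts/42.txt`` just wraps whatever HTML came in the feed entry in a reST ``raw`` directive::

    .. raw:: html

       <p>The entry content, as it arrived in the feed.</p>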