# Copyright (C) 2013 Arun Persaud <apersaud@lbl.gov>
#                    W. Trevor King <wking@tremily.us>
#
# This file is part of rss2email.
#
# rss2email is free software: you can redistribute it and/or modify it under
# the terms of the GNU General Public License as published by the Free Software
# Foundation, either version 2 of the License, or (at your option) version 3 of
# the License.
#
# rss2email is distributed in the hope that it will be useful, but WITHOUT ANY
# WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR
# A PARTICULAR PURPOSE. See the GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License along with
# rss2email. If not, see <http://www.gnu.org/licenses/>.

"""Simple example of a post-process filter in rss2email

A post-process hook can change the content of each entry before
rss2email sends the email out. You can use this to add filters that,
for example, remove advertising, links to Facebook/Google+, or other
unwanted information. Or you could add those links in case you want
them. ;)

A hook is added by defining the variable ``post-process`` in the
config file. It takes two arguments, the module and the function to
call. For example::

  post-process = rss2email.post_process.prettify process

There's nothing special about the ``rss2email.post_process`` package.
If you write your own post-processing hooks, you can put them in any
package you like. If Python can run::

  from some.package import some_hook

then you can use::

  post-process = some.package some_hook

This means that your hook can live in any package in your
``PYTHONPATH``, in a package in your per-user site-packages
(:pep:`370`), etc.

The hook function itself takes five arguments (``feed``, ``parsed``,
``entry``, ``guid``, and ``message``) and needs to return a ``message``,
or ``None`` to skip the feed item.

The ``post-process`` variable can be defined globally or on a per-feed
basis.

Examples in this file:

pretty
  a filter that prettifies the HTML

process
  the actual post-process function that you need to name in
  the config file
"""


# import the modules you need
from bs4 import BeautifulSoup

import rss2email.email


def pretty(feed, message):
    """Use BeautifulSoup to pretty-print the HTML

    A very simple function that decodes the entry into a unicode
    string, calls BeautifulSoup on it, and afterwards re-encodes
    the result.
    """
    # decode the payload using the charset declared in the message
    encoding = message.get_charsets()[0]
    content = str(message.get_payload(decode=True), encoding)

    # modify the content; the parser is named explicitly so the result
    # does not depend on which optional parsers happen to be installed
    soup = BeautifulSoup(content, 'html.parser')
    content = soup.prettify()

    # BeautifulSoup uses unicode, so we perhaps have to adjust the encoding.
    # It's easy to get into encoding problems and this step helps avoid them.
    # The candidate encodings come from the feed, so pretty() needs the feed.
    encoding = rss2email.email.guess_encoding(
        content, encodings=feed.encodings)

    # clear CTE and set message. It can be important to clear the CTE
    # before setting the payload, since the payload is only re-encoded
    # if CTE is not already set.
    del message['Content-Transfer-Encoding']
    message.set_payload(content, charset=encoding)
    return message


def process(feed, parsed, entry, guid, message):
    message = pretty(feed, message)
    # you could add several filters in here if you want to

    # we need to return the message; if we return None,
    # the feed item will be skipped
    return message
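

# The module docstring says that a hook may return ``None`` to skip a
# feed item. The sketch below shows one way such a hook could look. It
# is an illustration only, not part of rss2email: ``BLOCKED_WORDS`` and
# ``skip_blocked`` are hypothetical names, and the code assumes that
# ``message`` behaves like a standard ``email.message.Message`` whose
# ``Subject`` header holds the entry title in readable form.
BLOCKED_WORDS = ('sponsored', 'advertisement')


def skip_blocked(feed, parsed, entry, guid, message):
    """Return ``None`` for messages whose subject has a blocked word"""
    # a missing Subject header is treated as an empty subject
    subject = str(message.get('Subject', '')).lower()
    if any(word in subject for word in BLOCKED_WORDS):
        # returning None asks rss2email to skip this feed item
        return None
    return message

# If this function lived in a package of your own, you would select it
# in the config file in the same way as ``process``, for example::
#
#   post-process = some.package skip_blocked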