author lnu <lnu@f70e237a-67f3-0310-a06c-d2b8a7116972> 2005-09-21 12:38:57 +0000
committer lnu <lnu@f70e237a-67f3-0310-a06c-d2b8a7116972> 2005-09-21 12:38:57 +0000
commit 37baef8f604bd509047d580b5dfb33d9c3f38031
tree 850e4449b96a9ad8865f61940bee397df1c333b9 /test
parent 07c53c0c1d03721c68b3cf292f0fa8402c82c006
fix another problem with escaped html
git-svn-id: svn+ssh://svn.gna.org/svn/feed2imap/trunk/feed2imap@67 f70e237a-67f3-0310-a06c-d2b8a7116972
Diffstat (limited to 'test')
-rw-r--r-- test/parserdata/bmc_escapedhtml.output 16
-rw-r--r-- test/parserdata/gnomefr_escapedhtml.output 4
-rw-r--r-- test/parserdata/hadess.output 184
-rw-r--r-- test/parserdata/hadess.xml 209
-rw-r--r-- test/parserdata/rss091_utf8_dirtyhtml_advogato.output 4
-rw-r--r-- test/parserdata/rss1_makii.output 4
-rw-r--r-- test/parserdata/rss1_utf8_html_planet.output 20
-rwxr-xr-x test/tc_converters_text2html.rb 11
-rwxr-xr-x test/tc_parser.rb 1
9 files changed, 429 insertions, 24 deletions
diff --git a/test/parserdata/bmc_escapedhtml.output b/test/parserdata/bmc_escapedhtml.output
index 8e3b998..4b37ec7 100644
--- a/test/parserdata/bmc_escapedhtml.output
+++ b/test/parserdata/bmc_escapedhtml.output
@@ -9,17 +9,17 @@ Creator: Bryan Cantrill
Subject:
Category: Solaris
Content:
-<p>So MIT's
-<a href="<a href="http://www.techreview.com/">http://www.techreview.com/</a>">Technology Review</a> has named me as one of their
-<a href="<a href="http://www.technologyreview.com/articles/05/10/issue/feature_tr35.asp">http://www.technologyreview.com/articles/05/10/issue/feature_tr35.asp</a>">TR35</a> -- the top 35 innovators under the age of thirty-five. It's a great honor, especially because the other
+So MIT's
+<a href="http://www.techreview.com/">Technology Review</a> has named me as one of their
+<a href="http://www.technologyreview.com/articles/05/10/issue/feature_tr35.asp">TR35</a> -- the top 35 innovators under the age of thirty-five. It's a great honor, especially because the other
honorees are <i>actually</i> working on things like
-<a href="<a href="http://www.wi.mit.edu/research/fellows/brummelkamp.html">http://www.wi.mit.edu/research/fellows/brummelkamp.html</a>">cures for cancer</a>
+<a href="http://www.wi.mit.edu/research/fellows/brummelkamp.html">cures for cancer</a>
and
-<a href="<a href="http://www.pw.utc.com/shock-system/popsci.html">http://www.pw.utc.com/shock-system/popsci.html</a>">rocket science</a> -- domains
+<a href="http://www.pw.utc.com/shock-system/popsci.html">rocket science</a> -- domains
that I have known only as rhetorical flourish.
Should you like to hear me make a jackass out of myself on the subject, you might
want to check out
-<a href="<a href="http://blogs.sun.com/roller/page/rgiles">http://blogs.sun.com/roller/page/rgiles</a>">Richard Giles</a>'s
-<a href="<a href="http://blogs.sun.com/roller/page/rgiles?entry=i_o_podcast_0003_bryan">http://blogs.sun.com/roller/page/rgiles?entry=i_o_podcast_0003_bryan</a>">latest I/O podcast</a>,
+<a href="http://blogs.sun.com/roller/page/rgiles">Richard Giles</a>'s
+<a href="http://blogs.sun.com/roller/page/rgiles?entry=i_o_podcast_0003_bryan">latest I/O podcast</a>,
in which he interviewed me about the award.
- </p>
+
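The removed lines in the hunk above show the symptom this commit addresses. A URL that already sat inside an `href` attribute was autolinked a second time, which produced nested `<a href="<a href=...">` markup. The following is a minimal sketch of an autolinker that avoids this. It is written in Ruby and uses a simple regex tokenizer. `autolink` and `URL_RE` are hypothetical names and do not come from feed2imap's converter code. The sketch links bare URLs only in text that lies outside tags and outside existing `<a>...</a>` elements.

```ruby
# Hypothetical sketch, not feed2imap's implementation.
# A bare http(s) URL ends at whitespace, an angle bracket or a double quote.
URL_RE = %r{https?://[^\s<>"]+}

# Wrap bare URLs in <a> tags. Tags (and therefore their href/src attributes)
# and the text of existing links are left untouched.
def autolink(html)
  in_anchor = false
  # Tokens: a whole tag, a run of text, or a stray "<" with no closing ">".
  html.scan(/<[^>]*>|[^<]+|</).map do |tok|
    if tok.start_with?("<")
      in_anchor = true if tok.match?(/\A<a[\s>]/i)
      in_anchor = false if tok.match?(%r{\A</a\s*>}i)
      tok
    elsif in_anchor
      tok
    else
      tok.gsub(URL_RE) { |url| %(<a href="#{url}">#{url}</a>) }
    end
  end.join
end

# Usage: only the second, bare URL gets wrapped.
puts autolink(%(see <a href="http://hadess.net">my blog</a> or http://hadess.net))
```

A regex tokenizer like this is not an HTML parser. It cuts a tag at the first `>`, even when that `>` sits inside a quoted attribute value or a comment. It also keeps trailing punctuation as part of a URL. These limits are acceptable for a sketch but not for arbitrary feed content.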
diff --git a/test/parserdata/gnomefr_escapedhtml.output b/test/parserdata/gnomefr_escapedhtml.output
index 36c001c..3b532e7 100644
--- a/test/parserdata/gnomefr_escapedhtml.output
+++ b/test/parserdata/gnomefr_escapedhtml.output
@@ -9,8 +9,8 @@ Creator:
Subject:
Category:
Content:
-<p><div>How are you supposed to trust these guys ?<img src="<a href="http://members.cox.net/vnoel/weblog/uploaded_images/Screenshot-Flight%20Details-798819.png">http://members.cox.net/vnoel/weblog/uploaded_images/Screenshot-Flight%20Details-798819.png</a>" />
-</div></p>
+<div>How are you supposed to trust these guys ?<img src="http://members.cox.net/vnoel/weblog/uploaded_images/Screenshot-Flight%20Details-798819.png" />
+</div>
--------------------------------
Title: Vincent Noel: nautilus and gworkspace
Link: http://members.cox.net/vnoel/weblog/2005/09/nautilus-and-gworkspace.html
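In this hunk the expected output also drops the `<p>...</p>` wrapper that had been added around content that was already HTML. One way to get that behaviour is to check whether the unescaped feed content already contains markup before applying any text-to-HTML conversion. The sketch below uses a simple tag-name heuristic as an assumption. `looks_like_html?`, `to_html` and `HTML_TAG_RE` are hypothetical names, and the tag list is illustrative rather than what feed2imap actually checks.

```ruby
# Hypothetical heuristic, not feed2imap's actual logic.
# Matches an opening or closing tag for a few common inline/block elements.
HTML_TAG_RE = %r{</?(a|p|br|i|b|em|strong|img|div|span|table|tr|td|ul|ol|li)\b[^>]*>}i

# Characters that must be escaped when plain text is turned into HTML.
ESCAPES = { "&" => "&amp;", "<" => "&lt;", ">" => "&gt;", '"' => "&quot;" }.freeze

def looks_like_html?(text)
  text.match?(HTML_TAG_RE)
end

def to_html(text)
  if looks_like_html?(text)
    text # already markup: pass it through, no extra <p> wrapper
  else
    # Plain text: escape special characters, keep line breaks, wrap in <p>.
    "<p>" + text.gsub(/[&<>"]/, ESCAPES).gsub("\n", "<br/>\n") + "</p>"
  end
end
```

A heuristic like this can misfire on plain text that happens to mention a tag such as `<b>`. The trade-off is that it avoids a full HTML parse for every feed item.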
diff --git a/test/parserdata/hadess.output b/test/parserdata/hadess.output
new file mode 100644
index 0000000..e3ef52a
--- /dev/null
+++ b/test/parserdata/hadess.output
@@ -0,0 +1,184 @@
+Title: Bastien's Blog
+Link: http://hadess.net
+
+--------------------------------
+Title: Work In Progress
+Link: http://hadess.net/?start=577
+Date: Wed Sep 21 01:16:39 UTC 2005
+Creator: Bastien Nocera
+Subject:
+Category:
+Content:
+Yay! Got <a href="http://pilot-link.org/">pilot-link</a> to sync over Bluetooth, without the crappy 'Set up a PPP server' bit. Now to download <a href="http://palmsource.palmgear.com/index.cfm?fuseaction=software.showsoftware&prodID=52957">BtSync</a>.
+
+--------------------------------
+Title: Student again
+Link: http://hadess.net/?start=576
+Date: Mon Sep 19 23:04:58 UTC 2005
+Creator: Bastien Nocera
+Subject:
+Category:
+Content:
+First day at <a href="http://www.surrey.ac.uk/">Uni</a>, registration, seeing the faces of my new classmates. Definitely the eldest one, the Frenchest one (not the foreignest one, there are 2 Russian girls in the class), and the only one doing it part-time. So I started skipping this afternoon (<i>Welcome to the bestest Uni in England</i> and <i>Where do I do sports</i> that's mostly interesting to 18-year olds who've never left home, like, hmm, most of them). Only the fresher's drinks to go to tomorrow, so I can try and get over the flu I picked up this week-end.<p>
+Done a bit of work on Bluetooth, mainly research, and general updating. <a href="http://noring.nocrew.org/">Fredrik Noring</a> has joined in to help with the ever-distant <a href="/?start=566">gnome-bluetooth-manager</a>. Busy adding new features to libbtctl for now, the fun work will come a bit later.<p>
+Watched <a href="http://us.imdb.com/title/tt0372588/">Team America</a>. Would
+be a funny film if it didn't feel so true.<p>
+
+
+--------------------------------
+Title: /summon
+Link: http://hadess.net/?start=575
+Date: Wed Sep 14 23:22:35 UTC 2005
+Creator: Bastien Nocera
+Subject:
+Category:
+Content:
+Lazy Web, oh Lazy Web, can you tell me a good Obex FTP client for Palm OS?<p>
+PS: <a href="http://www.imdb.com/title/tt0072271/">Texas Chainsaw Massacre</a> sucks. And <a href="http://blogs.gnome.org/view/calum?reverse=1">Calum</a> will be happy we only managed <a href="http://news.bbc.co.uk/sport1/hi/football/europe/4234188.stm">a draw</a>. Glory hunter, my arse.<p>
+
+--------------------------------
+Title: Won it!
+Link: http://hadess.net/?start=574
+Date: Mon Sep 12 22:16:44 UTC 2005
+Creator: Bastien Nocera
+Subject:
+Category:
+Content:
+You'd think that Brits would get behind the England and Wales team <a href="http://news.bbc.co.uk/sport1/hi/cricket/england/4237610.stm">winning the Ashes</a>, but <a href="http://blogs.gnome.org/view/calum/2005/09/12/0">no</a>. Calum, you're cheap. I hope Motherwell loses (again)[1].<p>
+All hail to McGratthie!<p>
+[1]: it's low, but you deserved it.
+
+--------------------------------
+Title: Film Feast
+Link: http://hadess.net/?start=573
+Date: Mon Sep 12 00:04:51 UTC 2005
+Creator: Bastien Nocera
+Subject:
+Category:
+Content:
+Could have been a fest, was a feast. Good and bad together.<br>
+<a href="http://www.imdb.com/title/tt0364569/">Oldboy</a>, as advised by <a href="http://jimmac.musichall.cz/weblog.php/Movies/OldBoy.php">Jakub</a>, a real treat, and a reminescence of Shinobi.<br>
+<a href="http://us.imdb.com/title/tt0107977/">Robin Hood: Men In Tights</a>, another Mel Brooks, not as funny as I remembered when I watched it more than 10 years ago.<br>
+<a href="http://www.imdb.com/title/tt0094625/">Akira</a>, which I'm probably one of the few to have watched after <a href="http://www.imdb.com/title/tt0293416/">Metropolis</a>, definitely one of the good mangas, although not really my style (I'd choose <a href="http://www.imdb.com/title/tt0113568/">Ghost In The Shell</a> or <a href="http://www.imdb.com/title/tt0245429/">Spirited Away</a> over <i>Akira</i>).<br>
+<a href="http://www.imdb.com/title/tt0357413/">Anchorman</a>, you can't believe the romance for a minute, and the jokes are rubbish.<br>
+<a href="http://us.imdb.com/title/tt0145653/">Angela's Ashes</a>, very much an inspiration for later films like <a href="http://www.imdb.com/title/tt0249462/">Billy Elliot</a> or <a href="http://www.imdb.com/title/tt0298845/">In America</a>.<p>
+I also took some time to get DAAP sharing working with Rhythmbox, thanks <a href="http://ishamael.tunkeymicket.com/">ish</a> for the work, I can now stream good music over the waves. Some more <a href="http://usefulinc.com/software/phonemgr">g-p-m</a> love, and we might be ready for another release soon.<p>
+
+<table align="center"><tr><td align=center>
+<img
+src="http://www.bbc.co.uk/weather/images/symbols/fiveday_sym/3.gif" align="center" border=0><br>
+<p align="center"><i>Can England fail to play in that nice weather?</i></p>
+</td></tr>
+</table><p>
+
+
+--------------------------------
+Title: McGratthie's on a roll
+Link: http://hadess.net/?start=572
+Date: Thu Sep 08 21:23:39 UTC 2005
+Creator: Bastien Nocera
+Subject:
+Category:
+Content:
+He got the 10 wickets out yesterday already! (Cricket lovers, don't despair, it's a private joke).<p>
+
+<table align="center"><tr><td align=center>
+<img
+src="/blog/images/30-08-05_2210.jpg" align="center" border=0><br>
+<p align="center"><i>Since <a href="http://www.fluendo.com">Fluendo</a>'s inception, xine lovers have gone underground</i></p>
+</td></tr>
+</table><p>
+
+Breakthrough in <a href="http://usefulinc.com/software/phonemgr/">gnome-phone-manager</a> hacking. No need for a phone, I can have a dummy one! Implemented the dummy backend, which allowed me to fix one bug, and reproduce another one. Now to bug Ross about the contact-lookup-applet's widget disabling itself when there's no completion in Evolution, and waiting for a restart to enable itself.<p>
+
+<table align="center"><tr><td>
+<img
+src="/blog/images/07-09-05_1930.jpg" align="middle" border=0><br>
+<p align="center"><i>No, Alan Cox isn't the Creeper, he just plays in it</i></p>
+</td></tr>
+</table><p>
+
+
+--------------------------------
+Title: It's all about timing
+Link: http://hadess.net/?start=571
+Date: Tue Sep 06 20:06:18 UTC 2005
+Creator: Bastien Nocera
+Subject:
+Category:
+Content:
+And time! It's about time that I finish blogging about my <a href="">trip to Barcelona</a>. Day 4, nice, got to visit the <a href="/files/photos/Barcelona_2005_day_4/">Sagrada Familia</a>. Very nice, convoluted, interweaved, and really overdone. After that, it all goes downhill.<p>
+
+<table align="center"><tr><td>
+<img
+src="/files/photos/Barcelona_2005_day_4/lq/img-2.jpg" align="center" border=0>
+</td></tr>
+<tr><td>
+<p align="center"><i>Over-engineered</i></p>
+</tr></td></table><p>
+
+Puketastic on the Thursday, don't know if it's the Barcelona water, or Christian's fridge contents, but the result wasn't anything for the weak.<p>
+I started my Friday recovery with the <a href="http://www.fluendo.com/">Fluendo</a> guys. I left, shortly after a failed attempt at setting a projector: if you can't get a room dark, a projector isn't very useful. Spent the evening with Andy and his bunch of cosmopolitan housemates. Meeting Cyril was funny, I was speaking English, and him Spanish before Andy, in his well-known style, blurted out <i>"But you're both French, you idiots"</i>. Out to <a href="http://www.bcn-nightlife.com/en/clubs/67">Danzatoria</a>: I can find Radio 2 DJs that can mix hip-hop better than the guy there. Good to know that young Israelis make jokes about the Holocaust. That still doesn't allow you to repeat the anti-semitic jokes your grand-father Helmut was telling you when you were a kid.<p>
+Saturday, footie with Christian, in an Irish pub managed by Frenchmen. England won it by the skin of their teeth. Then a good BBQ at <a href="http://thomas.apestaart.org/log">Thomas</a>', he's got the gear, but you can pick on him for using xmms on his entertainment systems. Welcome to the 1990's. And time to say good bye to everyone, on my way back on the Sunday.<p>
+I wouldn't live in Barcelona, but it's certainly a good place to have friends in, thanks guys.<p>
+<b>Back to the routine</b><p>
+Or not quite. I've received my welcome pack for <a href="http://www.surrey.ac.uk/">UniS</a>, starts Monday 19th. And I'll be going to Boston for the <a href="http://live.gnome.org/Boston2005">Summit</a> (thanks Havoc for sorting it out), and after that to Raleigh to meet my new boss.<p>
+<b>Random bits</b><p>
+Watched <a href="http://us.imdb.com/title/tt0094012/">Spaceballs</a> (full of one-liners, but was certainly more entertaining to watch in the '80s) and <a href="http://www.imdb.com/title/tt0386588/">Hitch</a> (got no excuses, again). And Christian, in addition to having a <a href="http://blogs.gnome.org/view/uraeus/2005/08/27/0">Crazy Frog Background</a>, also has the <a href="http://www.linuxrising.org/screenshots/rootbert.png">Crazy Frog music in his collection</a>.
+
+--------------------------------
+Title: Ouch
+Link: http://hadess.net/?start=570
+Date: Thu Sep 01 03:50:00 UTC 2005
+Creator: Bastien Nocera
+Subject:
+Category:
+Content:
+<p>So the evening was a tiny bit more eventful than prepared. First, Andrew Patrick (the good guy Wingo's mom name) got us down to a very nice tapas bar. Then, it got more furious. We ended up in Moog, and on the way I got offered unaldurated sex (for a fee), hachish, my balls grabbed (gratis), and 20 euros nicked out of my backpocket (gratis, again). The club was good, easy 80's upstairs, and hardcore noughts downstairs. The skinny really white guy standing next to me near the end was, well, Scottish. Would you believe. It's close to 6, and the second part of the night didn't go through. Shame, time to sleep now. Tada.
+</p>
+--------------------------------
+Title: Can cause excessive fan speed
+Link: http://hadess.net/?start=569
+Date: Tue Aug 30 16:17:51 UTC 2005
+Creator: Bastien Nocera
+Subject:
+Category:
+Content:
+After my problems with plane tickets, I still managed to make it to sunny Bacelona. I say sunny, but it's an understatement. For some reason, I'm a sweaty guy, and whether or not the thermometer in <a href="http://blogs.gnome.org/view/uraeus?reverse=1">Christian</a>'s flat tells the truth (stuck as it is on 30 celsius), it is damn hot, and my two showers a day compulsory if I want to go into the world.<p>
+I started my integration on Sunday, when after a few sips at a beach bar, we headed to an Irish pub, to watch <a href="http://news.bbc.co.uk/sport1/hi/football/eng_prem/4168930.stm">Newcastle v. ManU</a>. My home studies (read: the FIFA tournament on Friday evenings) came to use when I called Luque a <i>Tronco</i>, and the barkeep laughed. Headed towards a Thai restaurant (the original plan having failed, see below).<p>
+<table align="center"><tr><td>
+<img
+src="/files/photos/Barcelona_2005_day_3/lq/img-5.jpg" align="center" border=0>
+</td></tr>
+<tr><td>
+<p align="center"><i>Totum, not Totem, and it was closed. So close Andy...</i></p>
+</tr></td></table><p>
+Next day, woke up late, tried to go to the Picasso museum. Closed on Mondays. My debit card to buy a <i>Metro</i> travelcard, nearly eaten by the machine (saved by the keys, McGyver-style). Instead, I hung out at the <a href="http://fluendo.com">Fluendo</a> offices, and made a trip to the aquarium. I couldn't find how to disable the flash on my damn camera, so ended up taking only one picture. The other picture I got was from me standing next to a guy in a shark suit. It's like Disneyland in Catalan.<p>
+<table align="center"><tr><td>
+<img
+src="/files/photos/Barcelona_2005_day_3/lq/img-21.jpg" align="center" border=0>
+</td></tr>
+<tr><td>
+<p align="center"><i>Not a nudie booth</i></p>
+</tr></td></table><p>
+Yesterday evening, we went to a Mexican (apparently a bad one, it wasn't actually so bad), and then to see <a href="http://imdb.com/title/tt0399201/">La Isla</a>. I found the film really bad, and Thomas not so much. I guess Thomas has the culinary taste, and I have the film one.<p>
+<table align="center"><tr><td>
+<img
+src="/files/photos/Barcelona_2005_day_3/lq/img-24.jpg" align="center" border=0>
+</td></tr>
+<tr><td>
+<p align="center"><i>Well high</i></p>
+</tr></td></table><p>
+Today was a late one again, and I strolled to <i>Monjuic</i> via the <i>Teleferic</i> after lunch at the Italian restaurant.<p>
+Another couple of days still, so it's going to be fun. Much thanks to Christian for the hosting, and the Fluendo guys (even the Aussies!) for company.
+
+--------------------------------
+Title: I hate booking plane tickets
+Link: http://hadess.net/?start=568
+Date: Sat Aug 27 10:39:30 UTC 2005
+Creator: Bastien Nocera
+Subject:
+Category:
+Content:
+It's happened to me twice that I bought my tickets for the wrong week. One week off on the booking, and you end up paying more because you're not careful. I manage to top it this time, by booking my trip to Barcelona in the wrong direction. 160 quid later, I'm going in the right direction...<p>
+
diff --git a/test/parserdata/hadess.xml b/test/parserdata/hadess.xml
new file mode 100644
index 0000000..7536603
--- /dev/null
+++ b/test/parserdata/hadess.xml
@@ -0,0 +1,209 @@
+<?xml version="1.0"?><rdf:RDF
+ xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
+ xmlns="http://purl.org/rss/1.0/"
+ xmlns:dc="http://purl.org/dc/elements/1.1/"
+>
+<channel rdf:about="http://www.hadess.net/blog/index.php">
+ <title>Bastien's Blog</title>
+ <link>http://hadess.net</link>
+ <description>Day In - Rock Out</description>
+ <dc:date>2005-09-20T20:16:39-05:00</dc:date>
+ <language>en</language>
+ <items>
+ <rdf:Seq>
+ <rdf:li rdf:resource="http://hadess.net" />
+ <rdf:li rdf:resource="http://hadess.net" />
+ <rdf:li rdf:resource="http://hadess.net" />
+ <rdf:li rdf:resource="http://hadess.net" />
+ <rdf:li rdf:resource="http://hadess.net" />
+ <rdf:li rdf:resource="http://hadess.net" />
+ <rdf:li rdf:resource="http://hadess.net" />
+ <rdf:li rdf:resource="http://hadess.net" />
+ <rdf:li rdf:resource="http://hadess.net" />
+ <rdf:li rdf:resource="http://hadess.net" />
+ </rdf:Seq>
+ </items>
+</channel>
+<item rdf:about="http://hadess.net/?start=577">
+<title>Work In Progress
+</title>
+<dc:creator>Bastien Nocera</dc:creator>
+<dc:date>2005-09-20T20:16:39-05:00</dc:date>
+<dc:link>http://hadess.net/?start=577</dc:link>
+<link>http://hadess.net/?start=577</link>
+<description>Yay! Got &lt;a href=&quot;http://pilot-link.org/&quot;&gt;pilot-link&lt;/a&gt; to sync over Bluetooth, without the crappy 'Set up a PPP server' bit. Now to download &lt;a href=&quot;http://palmsource.palmgear.com/index.cfm?fuseaction=software.showsoftware&amp;prodID=52957&quot;&gt;BtSync&lt;/a&gt;.
+</description>
+</item>
+<item rdf:about="http://hadess.net/?start=576">
+<title>Student again
+</title>
+<dc:creator>Bastien Nocera</dc:creator>
+<dc:date>2005-09-19T18:04:58-05:00</dc:date>
+<dc:link>http://hadess.net/?start=576</dc:link>
+<link>http://hadess.net/?start=576</link>
+<description>First day at &lt;a href=&quot;http://www.surrey.ac.uk/&quot;&gt;Uni&lt;/a&gt;, registration, seeing the faces of my new classmates. Definitely the eldest one, the Frenchest one (not the foreignest one, there are 2 Russian girls in the class), and the only one doing it part-time. So I started skipping this afternoon (&lt;i&gt;Welcome to the bestest Uni in England&lt;/i&gt; and &lt;i&gt;Where do I do sports&lt;/i&gt; that's mostly interesting to 18-year olds who've never left home, like, hmm, most of them). Only the fresher's drinks to go to tomorrow, so I can try and get over the flu I picked up this week-end.&lt;p&gt;
+Done a bit of work on Bluetooth, mainly research, and general updating. &lt;a href=&quot;http://noring.nocrew.org/&quot;&gt;Fredrik Noring&lt;/a&gt; has joined in to help with the ever-distant &lt;a href=&quot;/?start=566&quot;&gt;gnome-bluetooth-manager&lt;/a&gt;. Busy adding new features to libbtctl for now, the fun work will come a bit later.&lt;p&gt;
+Watched &lt;a href=&quot;http://us.imdb.com/title/tt0372588/&quot;&gt;Team America&lt;/a&gt;. Would
+be a funny film if it didn't feel so true.&lt;p&gt;
+
+</description>
+</item>
+<item rdf:about="http://hadess.net/?start=575">
+<title>/summon
+</title>
+<dc:creator>Bastien Nocera</dc:creator>
+<dc:date>2005-09-14T18:22:35-05:00</dc:date>
+<dc:link>http://hadess.net/?start=575</dc:link>
+<link>http://hadess.net/?start=575</link>
+<description>Lazy Web, oh Lazy Web, can you tell me a good Obex FTP client for Palm OS?&lt;p&gt;
+PS: &lt;a href=&quot;http://www.imdb.com/title/tt0072271/&quot;&gt;Texas Chainsaw Massacre&lt;/a&gt; sucks. And &lt;a href=&quot;http://blogs.gnome.org/view/calum?reverse=1&quot;&gt;Calum&lt;/a&gt; will be happy we only managed &lt;a href=&quot;http://news.bbc.co.uk/sport1/hi/football/europe/4234188.stm&quot;&gt;a draw&lt;/a&gt;. Glory hunter, my arse.&lt;p&gt;
+</description>
+</item>
+<item rdf:about="http://hadess.net/?start=574">
+<title>Won it!
+</title>
+<dc:creator>Bastien Nocera</dc:creator>
+<dc:date>2005-09-12T17:16:44-05:00</dc:date>
+<dc:link>http://hadess.net/?start=574</dc:link>
+<link>http://hadess.net/?start=574</link>
+<description>You'd think that Brits would get behind the England and Wales team &lt;a href=&quot;http://news.bbc.co.uk/sport1/hi/cricket/england/4237610.stm&quot;&gt;winning the Ashes&lt;/a&gt;, but &lt;a href=&quot;http://blogs.gnome.org/view/calum/2005/09/12/0&quot;&gt;no&lt;/a&gt;. Calum, you're cheap. I hope Motherwell loses (again)[1].&lt;p&gt;
+All hail to McGratthie!&lt;p&gt;
+[1]: it's low, but you deserved it.
+</description>
+</item>
+<item rdf:about="http://hadess.net/?start=573">
+<title>Film Feast
+</title>
+<dc:creator>Bastien Nocera</dc:creator>
+<dc:date>2005-09-11T19:04:51-05:00</dc:date>
+<dc:link>http://hadess.net/?start=573</dc:link>
+<link>http://hadess.net/?start=573</link>
+<description>Could have been a fest, was a feast. Good and bad together.&lt;br&gt;
+&lt;a href=&quot;http://www.imdb.com/title/tt0364569/&quot;&gt;Oldboy&lt;/a&gt;, as advised by &lt;a href=&quot;http://jimmac.musichall.cz/weblog.php/Movies/OldBoy.php&quot;&gt;Jakub&lt;/a&gt;, a real treat, and a reminescence of Shinobi.&lt;br&gt;
+&lt;a href=&quot;http://us.imdb.com/title/tt0107977/&quot;&gt;Robin Hood: Men In Tights&lt;/a&gt;, another Mel Brooks, not as funny as I remembered when I watched it more than 10 years ago.&lt;br&gt;
+&lt;a href=&quot;http://www.imdb.com/title/tt0094625/&quot;&gt;Akira&lt;/a&gt;, which I'm probably one of the few to have watched after &lt;a href=&quot;http://www.imdb.com/title/tt0293416/&quot;&gt;Metropolis&lt;/a&gt;, definitely one of the good mangas, although not really my style (I'd choose &lt;a href=&quot;http://www.imdb.com/title/tt0113568/&quot;&gt;Ghost In The Shell&lt;/a&gt; or &lt;a href=&quot;http://www.imdb.com/title/tt0245429/&quot;&gt;Spirited Away&lt;/a&gt; over &lt;i&gt;Akira&lt;/i&gt;).&lt;br&gt;
+&lt;a href=&quot;http://www.imdb.com/title/tt0357413/&quot;&gt;Anchorman&lt;/a&gt;, you can't believe the romance for a minute, and the jokes are rubbish.&lt;br&gt;
+&lt;a href=&quot;http://us.imdb.com/title/tt0145653/&quot;&gt;Angela's Ashes&lt;/a&gt;, very much an inspiration for later films like &lt;a href=&quot;http://www.imdb.com/title/tt0249462/&quot;&gt;Billy Elliot&lt;/a&gt; or &lt;a href=&quot;http://www.imdb.com/title/tt0298845/&quot;&gt;In America&lt;/a&gt;.&lt;p&gt;
+I also took some time to get DAAP sharing working with Rhythmbox, thanks &lt;a href=&quot;http://ishamael.tunkeymicket.com/&quot;&gt;ish&lt;/a&gt; for the work, I can now stream good music over the waves. Some more &lt;a href=&quot;http://usefulinc.com/software/phonemgr&quot;&gt;g-p-m&lt;/a&gt; love, and we might be ready for another release soon.&lt;p&gt;
+
+&lt;table align=&quot;center&quot;&gt;&lt;tr&gt;&lt;td align=center&gt;
+&lt;img
+src=&quot;http://www.bbc.co.uk/weather/images/symbols/fiveday_sym/3.gif&quot; align=&quot;center&quot; border=0&gt;&lt;br&gt;
+&lt;p align=&quot;center&quot;&gt;&lt;i&gt;Can England fail to play in that nice weather?&lt;/i&gt;&lt;/p&gt;
+&lt;/td&gt;&lt;/tr&gt;
+&lt;/table&gt;&lt;p&gt;
+
+</description>
+</item>
+<item rdf:about="http://hadess.net/?start=572">
+<title>McGratthie's on a roll
+</title>
+<dc:creator>Bastien Nocera</dc:creator>
+<dc:date>2005-09-08T16:23:39-05:00</dc:date>
+<dc:link>http://hadess.net/?start=572</dc:link>
+<link>http://hadess.net/?start=572</link>
+<description>He got the 10 wickets out yesterday already! (Cricket lovers, don't despair, it's a private joke).&lt;p&gt;
+
+&lt;table align=&quot;center&quot;&gt;&lt;tr&gt;&lt;td align=center&gt;
+&lt;img
+src=&quot;/blog/images/30-08-05_2210.jpg&quot; align=&quot;center&quot; border=0&gt;&lt;br&gt;
+&lt;p align=&quot;center&quot;&gt;&lt;i&gt;Since &lt;a href=&quot;http://www.fluendo.com&quot;&gt;Fluendo&lt;/a&gt;'s inception, xine lovers have gone underground&lt;/i&gt;&lt;/p&gt;
+&lt;/td&gt;&lt;/tr&gt;
+&lt;/table&gt;&lt;p&gt;
+
+Breakthrough in &lt;a href=&quot;http://usefulinc.com/software/phonemgr/&quot;&gt;gnome-phone-manager&lt;/a&gt; hacking. No need for a phone, I can have a dummy one! Implemented the dummy backend, which allowed me to fix one bug, and reproduce another one. Now to bug Ross about the contact-lookup-applet's widget disabling itself when there's no completion in Evolution, and waiting for a restart to enable itself.&lt;p&gt;
+
+&lt;table align=&quot;center&quot;&gt;&lt;tr&gt;&lt;td&gt;
+&lt;img
+src=&quot;/blog/images/07-09-05_1930.jpg&quot; align=&quot;middle&quot; border=0&gt;&lt;br&gt;
+&lt;p align=&quot;center&quot;&gt;&lt;i&gt;No, Alan Cox isn't the Creeper, he just plays in it&lt;/i&gt;&lt;/p&gt;
+&lt;/td&gt;&lt;/tr&gt;
+&lt;/table&gt;&lt;p&gt;
+
+</description>
+</item>
+<item rdf:about="http://hadess.net/?start=571">
+<title>It's all about timing
+</title>
+<dc:creator>Bastien Nocera</dc:creator>
+<dc:date>2005-09-06T15:06:18-05:00</dc:date>
+<dc:link>http://hadess.net/?start=571</dc:link>
+<link>http://hadess.net/?start=571</link>
+<description>And time! It's about time that I finish blogging about my &lt;a href=&quot;&quot;&gt;trip to Barcelona&lt;/a&gt;. Day 4, nice, got to visit the &lt;a href=&quot;/files/photos/Barcelona_2005_day_4/&quot;&gt;Sagrada Familia&lt;/a&gt;. Very nice, convoluted, interweaved, and really overdone. After that, it all goes downhill.&lt;p&gt;
+
+&lt;table align=&quot;center&quot;&gt;&lt;tr&gt;&lt;td&gt;
+&lt;img
+src=&quot;/files/photos/Barcelona_2005_day_4/lq/img-2.jpg&quot; align=&quot;center&quot; border=0&gt;
+&lt;/td&gt;&lt;/tr&gt;
+&lt;tr&gt;&lt;td&gt;
+&lt;p align=&quot;center&quot;&gt;&lt;i&gt;Over-engineered&lt;/i&gt;&lt;/p&gt;
+&lt;/tr&gt;&lt;/td&gt;&lt;/table&gt;&lt;p&gt;
+
+Puketastic on the Thursday, don't know if it's the Barcelona water, or Christian's fridge contents, but the result wasn't anything for the weak.&lt;p&gt;
+I started my Friday recovery with the &lt;a href=&quot;http://www.fluendo.com/&quot;&gt;Fluendo&lt;/a&gt; guys. I left, shortly after a failed attempt at setting a projector: if you can't get a room dark, a projector isn't very useful. Spent the evening with Andy and his bunch of cosmopolitan housemates. Meeting Cyril was funny, I was speaking English, and him Spanish before Andy, in his well-known style, blurted out &lt;i&gt;&quot;But you're both French, you idiots&quot;&lt;/i&gt;. Out to &lt;a href=&quot;http://www.bcn-nightlife.com/en/clubs/67&quot;&gt;Danzatoria&lt;/a&gt;: I can find Radio 2 DJs that can mix hip-hop better than the guy there. Good to know that young Israelis make jokes about the Holocaust. That still doesn't allow you to repeat the anti-semitic jokes your grand-father Helmut was telling you when you were a kid.&lt;p&gt;
+Saturday, footie with Christian, in an Irish pub managed by Frenchmen. England won it by the skin of their teeth. Then a good BBQ at &lt;a href=&quot;http://thomas.apestaart.org/log&quot;&gt;Thomas&lt;/a&gt;', he's got the gear, but you can pick on him for using xmms on his entertainment systems. Welcome to the 1990's. And time to say good bye to everyone, on my way back on the Sunday.&lt;p&gt;
+I wouldn't live in Barcelona, but it's certainly a good place to have friends in, thanks guys.&lt;p&gt;
+&lt;b&gt;Back to the routine&lt;/b&gt;&lt;p&gt;
+Or not quite. I've received my welcome pack for &lt;a href=&quot;http://www.surrey.ac.uk/&quot;&gt;UniS&lt;/a&gt;, starts Monday 19th. And I'll be going to Boston for the &lt;a href=&quot;http://live.gnome.org/Boston2005&quot;&gt;Summit&lt;/a&gt; (thanks Havoc for sorting it out), and after that to Raleigh to meet my new boss.&lt;p&gt;
+&lt;b&gt;Random bits&lt;/b&gt;&lt;p&gt;
+Watched &lt;a href=&quot;http://us.imdb.com/title/tt0094012/&quot;&gt;Spaceballs&lt;/a&gt; (full of one-liners, but was certainly more entertaining to watch in the '80s) and &lt;a href=&quot;http://www.imdb.com/title/tt0386588/&quot;&gt;Hitch&lt;/a&gt; (got no excuses, again). And Christian, in addition to having a &lt;a href=&quot;http://blogs.gnome.org/view/uraeus/2005/08/27/0&quot;&gt;Crazy Frog Background&lt;/a&gt;, also has the &lt;a href=&quot;http://www.linuxrising.org/screenshots/rootbert.png&quot;&gt;Crazy Frog music in his collection&lt;/a&gt;.
+</description>
+</item>
+<item rdf:about="http://hadess.net/?start=570">
+<title>Ouch
+</title>
+<dc:creator>Bastien Nocera</dc:creator>
+<dc:date>2005-08-30T22:50:00-05:00</dc:date>
+<dc:link>http://hadess.net/?start=570</dc:link>
+<link>http://hadess.net/?start=570</link>
+<description>So the evening was a tiny bit more eventful than prepared. First, Andrew Patrick (the good guy Wingo's mom name) got us down to a very nice tapas bar. Then, it got more furious. We ended up in Moog, and on the way I got offered unaldurated sex (for a fee), hachish, my balls grabbed (gratis), and 20 euros nicked out of my backpocket (gratis, again). The club was good, easy 80's upstairs, and hardcore noughts downstairs. The skinny really white guy standing next to me near the end was, well, Scottish. Would you believe. It's close to 6, and the second part of the night didn't go through. Shame, time to sleep now. Tada.
+</description>
+</item>
+<item rdf:about="http://hadess.net/?start=569">
+<title>Can cause excessive fan speed
+</title>
+<dc:creator>Bastien Nocera</dc:creator>
+<dc:date>2005-08-30T11:17:51-05:00</dc:date>
+<dc:link>http://hadess.net/?start=569</dc:link>
+<link>http://hadess.net/?start=569</link>
+<description>After my problems with plane tickets, I still managed to make it to sunny Bacelona. I say sunny, but it's an understatement. For some reason, I'm a sweaty guy, and whether or not the thermometer in &lt;a href=&quot;http://blogs.gnome.org/view/uraeus?reverse=1&quot;&gt;Christian&lt;/a&gt;'s flat tells the truth (stuck as it is on 30 celsius), it is damn hot, and my two showers a day compulsory if I want to go into the world.&lt;p&gt;
+I started my integration on Sunday, when after a few sips at a beach bar, we headed to an Irish pub, to watch &lt;a href=&quot;http://news.bbc.co.uk/sport1/hi/football/eng_prem/4168930.stm&quot;&gt;Newcastle v. ManU&lt;/a&gt;. My home studies (read: the FIFA tournament on Friday evenings) came to use when I called Luque a &lt;i&gt;Tronco&lt;/i&gt;, and the barkeep laughed. Headed towards a Thai restaurant (the original plan having failed, see below).&lt;p&gt;
+&lt;table align=&quot;center&quot;&gt;&lt;tr&gt;&lt;td&gt;
+&lt;img
+src=&quot;/files/photos/Barcelona_2005_day_3/lq/img-5.jpg&quot; align=&quot;center&quot; border=0&gt;
+&lt;/td&gt;&lt;/tr&gt;
+&lt;tr&gt;&lt;td&gt;
+&lt;p align=&quot;center&quot;&gt;&lt;i&gt;Totum, not Totem, and it was closed. So close Andy...&lt;/i&gt;&lt;/p&gt;
+&lt;/tr&gt;&lt;/td&gt;&lt;/table&gt;&lt;p&gt;
+Next day, woke up late, tried to go to the Picasso museum. Closed on Mondays. My debit card to buy a &lt;i&gt;Metro&lt;/i&gt; travelcard, nearly eaten by the machine (saved by the keys, McGyver-style). Instead, I hung out at the &lt;a href=&quot;http://fluendo.com&quot;&gt;Fluendo&lt;/a&gt; offices, and made a trip to the aquarium. I couldn't find how to disable the flash on my damn camera, so ended up taking only one picture. The other picture I got was from me standing next to a guy in a shark suit. It's like Disneyland in Catalan.&lt;p&gt;
+&lt;table align=&quot;center&quot;&gt;&lt;tr&gt;&lt;td&gt;
+&lt;img
+src=&quot;/files/photos/Barcelona_2005_day_3/lq/img-21.jpg&quot; align=&quot;center&quot; border=0&gt;
+&lt;/td&gt;&lt;/tr&gt;
+&lt;tr&gt;&lt;td&gt;
+&lt;p align=&quot;center&quot;&gt;&lt;i&gt;Not a nudie booth&lt;/i&gt;&lt;/p&gt;
+&lt;/tr&gt;&lt;/td&gt;&lt;/table&gt;&lt;p&gt;
+Yesterday evening, we went to a Mexican (apparently a bad one, it wasn't actually so bad), and then to see &lt;a href=&quot;http://imdb.com/title/tt0399201/&quot;&gt;La Isla&lt;/a&gt;. I found the film really bad, and Thomas not so much. I guess Thomas has the culinary taste, and I have the film one.&lt;p&gt;
+&lt;table align=&quot;center&quot;&gt;&lt;tr&gt;&lt;td&gt;
+&lt;img
+src=&quot;/files/photos/Barcelona_2005_day_3/lq/img-24.jpg&quot; align=&quot;center&quot; border=0&gt;
+&lt;/td&gt;&lt;/tr&gt;
+&lt;tr&gt;&lt;td&gt;
+&lt;p align=&quot;center&quot;&gt;&lt;i&gt;Well high&lt;/i&gt;&lt;/p&gt;
+&lt;/tr&gt;&lt;/td&gt;&lt;/table&gt;&lt;p&gt;
+Today was a late one again, and I strolled to &lt;i&gt;Monjuic&lt;/i&gt; via the &lt;i&gt;Teleferic&lt;/i&gt; after lunch at the Italian restaurant.&lt;p&gt;
+Another couple of days still, so it's going to be fun. Much thanks to Christian for the hosting, and the Fluendo guys (even the Aussies!) for company.
+</description>
+</item>
+<item rdf:about="http://hadess.net/?start=568">
+<title>I hate booking plane tickets
+</title>
+<dc:creator>Bastien Nocera</dc:creator>
+<dc:date>2005-08-27T05:39:30-05:00</dc:date>
+<dc:link>http://hadess.net/?start=568</dc:link>
+<link>http://hadess.net/?start=568</link>
+<description>It's happened to me twice that I bought my tickets for the wrong week. One week off on the booking, and you end up paying more because you're not careful. I manage to top it this time, by booking my trip to Barcelona in the wrong direction. 160 quid later, I'm going in the right direction...&lt;p&gt;
+</description>
+</item>
+
+</rdf:RDF> \ No newline at end of file
diff --git a/test/parserdata/rss091_utf8_dirtyhtml_advogato.output b/test/parserdata/rss091_utf8_dirtyhtml_advogato.output
index 8ed6e73..778e239 100644
--- a/test/parserdata/rss091_utf8_dirtyhtml_advogato.output
+++ b/test/parserdata/rss091_utf8_dirtyhtml_advogato.output
@@ -68,7 +68,7 @@ Creator:
Subject:
Category:
Content:
-<p><a href="<a href="http://www.benandjerrys.com/">http://www.benandjerrys.com/</a>" >Ben &amp; Jerry's</a> annual Free Cone Day. Tuesday, April 27 12pm-8pm. Follow the link to find out where.</p>
+<a href="http://www.benandjerrys.com/" >Ben &amp; Jerry's</a> annual Free Cone Day. Tuesday, April 27 12pm-8pm. Follow the link to find out where.
--------------------------------
Title: 21 Apr 2004
Link: http://www.advogato.org/person/sopwith/diary.html?start=5
@@ -100,7 +100,7 @@ Creator:
Subject:
Category:
Content:
-<p><a href="<a href="http://people.redhat.com/sopwith/">http://people.redhat.com/sopwith/</a>" >My home page</a></p>
+<a href="http://people.redhat.com/sopwith/" >My home page</a>
--------------------------------
Title: 12 Apr 2001
Link: http://www.advogato.org/person/sopwith/diary.html?start=2
diff --git a/test/parserdata/rss1_makii.output b/test/parserdata/rss1_makii.output
index 8aeb78d..843e86d 100644
--- a/test/parserdata/rss1_makii.output
+++ b/test/parserdata/rss1_makii.output
@@ -9,7 +9,7 @@ Creator:
Subject:
Category:
Content:
-Graficznie <b>Final Lap R</b> nareszcie nic nie można zarzucić dzięki pracy <b>ElSemi</b>\'ego.
+<p>Graficznie <b>Final Lap R</b> nareszcie nic nie można zarzucić dzięki pracy <b>ElSemi</b>\'ego.</p>
--------------------------------
Title: VBA Smooth v6.4
Link: http://emu.makii.pl/posthead.php3?shownews=12583
@@ -18,4 +18,4 @@ Creator:
Subject:
Category:
Content:
-Klon <b>VisualBoy Advance</b> potrafiący wykorzystać dobrodziejstwo plugin\'ów graficznych.
+<p>Klon <b>VisualBoy Advance</b> potrafiący wykorzystać dobrodziejstwo plugin\'ów graficznych.</p>
diff --git a/test/parserdata/rss1_utf8_html_planet.output b/test/parserdata/rss1_utf8_html_planet.output
index d06c7fb..46ee1dd 100644
--- a/test/parserdata/rss1_utf8_html_planet.output
+++ b/test/parserdata/rss1_utf8_html_planet.output
@@ -20,12 +20,12 @@ Creator:
Subject:
Category:
Content:
-<p><img src="<a href="http://planet.gnome.org/heads/hub.png">http://planet.gnome.org/heads/hub.png</a>" align="right" alt=""><ul>
-<li>I'm <a href="<a href="http://www.figuiere.net/hub/blog/?2005/02/09/83-all-mailers-suck">http://www.figuiere.net/hub/blog/?2005/02/09/83-all-mailers-suck</a>">ranting about Evolution, its speed and its antispam</a>, I'm <a href="<a href="http://www.advogato.org/person/robertc/diary.html?start=31">http://www.advogato.org/person/robertc/diary.html?start=31</a>">not alone</a>. Someone should really start to replace this antispam by what we find in Thunderbird / Mozilla, including flagging as <code>\Junk</code> over IMAP (I couldn't find any reference in RFCs and couldn't find the code in Thunderbird, anyone has an idea?) <a href="<a href="http://bugzilla.ximian.com/show_bug.cgi?id=72547">http://bugzilla.ximian.com/show_bug.cgi?id=72547</a>">see bug 72547</a>. For reference, marking one message as junk on my laptop take 15 to 20sec, even if the message is local.</li>
+<img src="http://planet.gnome.org/heads/hub.png" align="right" alt=""><ul>
+<li>I'm <a href="http://www.figuiere.net/hub/blog/?2005/02/09/83-all-mailers-suck">ranting about Evolution, its speed and its antispam</a>, I'm <a href="http://www.advogato.org/person/robertc/diary.html?start=31">not alone</a>. Someone should really start to replace this antispam by what we find in Thunderbird / Mozilla, including flag