The RSS feed that the Seafax sample pulls from by default (BBC Technology news) can contain some Unicode characters. Specifically, I noticed that the smart right single quotation mark (`\u2019`) shows up as a literal escape sequence in the headlines. For example, at the time of reporting this, the feed contains this:
```xml
<item>
<title><![CDATA[Google Pixel’s face-altering photo tool sparks AI manipulation debate]]></title>
```
This is then displayed on-screen as:
```
Google Pixel\u2019s face-altering photo tool sparks AI manipulation debate
```
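For reference, a quick check in plain Python shows that the character in the feed is a single code point, U+2019, stored as three bytes in UTF-8, and that it is not the ASCII apostrophe. The byte string below is just the start of the headline, typed in by hand for illustration:

```python
# The start of the headline as raw UTF-8 bytes, as it arrives in the feed.
raw = b"Google Pixel\xe2\x80\x99s face-altering photo tool"
title = raw.decode("utf-8")

# The three bytes \xe2\x80\x99 decode to one character at index 12.
print(hex(ord(title[12])))  # 0x2019, RIGHT SINGLE QUOTATION MARK
print("\u2019" == "'")      # False: it is not the ASCII apostrophe
```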
I've hacked together a couple of simple (but horrible) solutions that address this single situation.
The first is around line 125:
```python
# Populate our result dict
if top_tag in accept_tags:
    # this replaces unicode RIGHT_SINGLE_QUOTATION_MARK with basic mark
    current[top_tag.decode("utf-8")] = text.decode("utf-8").replace("\u2019", "'")
```
An alternative, which may be better(?), is to do the replacement in the `get_rss()` function instead (this is the one I'm using for now).
There's probably a better way to clean up the string data and handle other potential Unicode invaders, but so far the smart right quotation mark is the only one I'm seeing in the RSS data. The feed isn't consistent about it either: elsewhere in the BBC feed I'm seeing basic single quotes in exactly the same context as the smart quote here.
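One possible generalization, sketched here as an assumption rather than a tested fix for the sample: a hypothetical `clean_text()` helper that maps a handful of common typographic characters to ASCII and replaces anything else outside ASCII with `?`. The character table is a guess at what a news feed might contain; only `\u2019` has actually been seen in the BBC feed. It could be called wherever the title text is decoded, for example in place of the `.replace("\u2019","'")` call or inside `get_rss()`.

```python
# Hypothetical helper: map common "smart" punctuation that RSS feeds
# may contain to plain ASCII equivalents a simple display font can show.
# The table is an assumption, not a list observed in the BBC feed.
SMART_PUNCTUATION = {
    "\u2018": "'",    # LEFT SINGLE QUOTATION MARK
    "\u2019": "'",    # RIGHT SINGLE QUOTATION MARK
    "\u201c": '"',    # LEFT DOUBLE QUOTATION MARK
    "\u201d": '"',    # RIGHT DOUBLE QUOTATION MARK
    "\u2013": "-",    # EN DASH
    "\u2014": "-",    # EM DASH
    "\u2026": "...",  # HORIZONTAL ELLIPSIS
    "\u00a0": " ",    # NO-BREAK SPACE
}


def clean_text(text):
    """Replace known smart punctuation, then mark any remaining non-ASCII."""
    for smart, plain in SMART_PUNCTUATION.items():
        text = text.replace(smart, plain)
    # Anything still outside ASCII becomes "?" so it stays visible
    # instead of showing up as an escape sequence.
    return "".join(ch if ord(ch) < 128 else "?" for ch in text)


headline = "Google Pixel\u2019s face-altering photo tool sparks AI manipulation debate"
print(clean_text(headline))
# Google Pixel's face-altering photo tool sparks AI manipulation debate
```

Replacing unknown characters with `?` rather than dropping them is a judgment call: it makes new "invaders" easy to spot on screen, at the cost of a slightly uglier headline until they're added to the table.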
I'm happy to submit a PR if that would be useful, though there's a good chance this is such a niche case that it isn't warranted.