Minor bug in Seafax example #80

andypiper · 2023-10-22T13:59:34Z

The RSS feed that the Seafax sample pulls from by default (BBC Technology news) can contain Unicode characters. Specifically, I noticed that smart single quotes (RIGHT SINGLE QUOTATION MARK) show up as \u2019 within the displayed headlines. For example, at the time of reporting this, the feed contains:

    <item>
      <title><![CDATA[Google Pixel’s face-altering photo tool sparks AI manipulation debate]]></title>

This is then displayed on-screen as:

Google Pixel\u2019s face-altering photo tool sparks AI manipulation debate

I've hacked together a couple of simple (but horrible) solutions that address this single situation.

Either around line 125:

                # Populate our result dict
                if top_tag in accept_tags:
                    current[top_tag.decode("utf-8")] = text.decode("utf-8").replace("\u2019","'")
                    # this replaces unicode RIGHT_SINGLE_QUOTATION_MARK with basic mark

An alternative which may be better(?) is to do the replacement in the get_rss() function instead (this is the one I'm using for now).

def get_rss():
    try:
        stream = urequest.urlopen(URL)
        output = list(parse_xml_stream(stream, [b"title", b"description", b"guid", b"pubDate"], b"item"))

        # Replace smart quotes with basic ones in titles.
        # The loop variable is named "item" to avoid shadowing the built-in dict.
        for item in output:
            if "title" in item:
                item["title"] = item["title"].replace("\u2019", "'")

        return output

    except OSError as e:
        print(e)
        return False

There's probably a better way to clean up the string data and handle other potential Unicode invaders, but so far the smart right quotation mark is the only one I'm seeing in the RSS data. It is also not consistent, as elsewhere in the BBC feed I'm seeing basic single quotes in the exact same context as the smart quote in this situation.
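One possible way to handle more than the single right quotation mark is a small lookup table applied to whichever fields get displayed. The following is only a sketch: `ASCII_REPLACEMENTS`, `to_ascii()` and `clean_items()` are hypothetical names that are not part of the Seafax example, and the list of characters is a guess at what a news feed might contain rather than something observed in the BBC data. It sticks to `str.replace` in a loop because I have not checked whether `str.translate` is available on the MicroPython build in use.

```python
# Hypothetical mapping of typographic characters to plain ASCII.
# Only \u2019 has actually been seen in the feed; the rest are assumptions.
ASCII_REPLACEMENTS = {
    "\u2018": "'",    # LEFT SINGLE QUOTATION MARK
    "\u2019": "'",    # RIGHT SINGLE QUOTATION MARK
    "\u201c": '"',    # LEFT DOUBLE QUOTATION MARK
    "\u201d": '"',    # RIGHT DOUBLE QUOTATION MARK
    "\u2013": "-",    # EN DASH
    "\u2026": "...",  # HORIZONTAL ELLIPSIS
}


def to_ascii(text):
    """Return text with each known typographic character replaced."""
    for fancy, plain in ASCII_REPLACEMENTS.items():
        text = text.replace(fancy, plain)
    return text


def clean_items(items, keys=("title", "description")):
    """Clean the given keys of each parsed RSS item dict in place."""
    for item in items:
        for key in keys:
            if key in item:
                item[key] = to_ascii(item[key])
    return items
```

With something like this in place, get_rss() could return `clean_items(output)` instead of looping over the titles itself. Characters that are not in the table are left untouched, so anything else the display font cannot render would still need to be added by hand.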

I'm happy to submit a PR if that would be useful, though there's a good chance that this is such a niche case that it's not warranted.
