
Can't get images from enclosure element

Open Dynamix72 opened this issue 4 years ago • 5 comments

I use the feed: https://www.nu.nl/rss/Algemeen

The images are in an enclosure element. Could support for this be added in a new release?

<atom:link href="https://www.nu.nl/rss/Algemeen" rel="self"/>
<language>nl-nl</language>
<copyright>Copyright (c) 2021, NU</copyright>
<lastBuildDate>Mon, 20 Dec 2021 16:39:45 +0100</lastBuildDate>
<ttl>60</ttl>
<atom:logo>https://www.nu.nl/algemeenstatic/img/atoms/images/logos/rss-logo-250x40.png</atom:logo>
<item>
  <title>Novavax-vaccin kan volgens CBG sommige prikweigeraars alsnog overtuigen</title>
  <link>https://www.nu.nl/coronavirus/6174197/novavax-vaccin-kan-volgens-cbg-sommige-prikweigeraars-alsnog-overtuigen.html</link>
  <description>Het nieuwe coronavaccin van Novavax, dat &lt;a href="https://www.nu.nl/coronavirus/6174179/coronavaccin-van-novavax-als-vijfde-goedgekeurd-voor-gebruik-in-europa.html" target="_blank"&gt;maandag&lt;/a&gt; een positieve beoordeling van het Europese geneesmiddelenbureau (EMA) kreeg, kan helpen om ongevaccineerden alsnog over de streep te trekken. Het vaccin is namelijk gebaseerd op een oude techniek, met eiwitten. Dat zegt Ton de Boer, voorzitter van het College ter Beoordeling van Geneesmiddelen (CBG).</description>
  <pubDate>Mon, 20 Dec 2021 15:51:55 +0100</pubDate>
  <guid isPermaLink="false">https://www.nu.nl/-/6174197/</guid>
  **<enclosure length="0" type="image/jpeg" url="https://media.nu.nl/m/rnlxvbaa9mz9_sqr256.jpg/novavax-vaccin-kan-volgens-cbg-sommige-prikweigeraars-alsnog-overtuigen.jpg"/>**
  <category>Algemeen</category>
  <category>Binnenland</category>
  <category>Coronavirus</category>
  <dc:creator>NU.nl/ANP</dc:creator>
  <dc:rights>copyright photo: EPA</dc:rights>
  <atom:link href="https://nu.nl/coronavirus/6174179/coronavaccin-van-novavax-als-vijfde-goedgekeurd-voor-gebruik-in-europa.html" rel="related" title="Coronavaccin van Novavax als vijfde goedgekeurd voor gebruik in Europa" type="text/html"/>
  <atom:link href="https://nu.nl/coronavirus/6174077/speciale-omikronvaccins-komen-eraan-maar-onzeker-of-ze-worden-ingezet.html" rel="related" title="Speciale omikronvaccins komen eraan, maar onzeker of ze worden ingezet" type="text/html"/>
  <atom:link href="https://nu.nl/coronavirus/6173729/mensen-bellen-massaal-telefoonlijn-voor-vaccintwijfelaars-limiet-bijna-bereikt.html" rel="related" title="Mensen bellen massaal telefoonlijn voor vaccintwijfelaars, limiet bijna bereikt" type="text/html"/>
</item>
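
To make it concrete where the image lives in an item like this, here is a minimal sketch that uses only the Python standard library, not the integration or the feedparser package. `RSS_SNIPPET` is a trimmed-down copy of the item wrapped in a hypothetical `<rss><channel>` shell, and `enclosure_image_urls` is an illustrative helper name, not part of any library.

```python
import xml.etree.ElementTree as ET

# Trimmed-down copy of the item from this issue; the <rss>/<channel> wrapper
# is added here only so the snippet is well-formed XML.
RSS_SNIPPET = """<rss version="2.0"><channel><item>
<title>Novavax-vaccin kan volgens CBG sommige prikweigeraars alsnog overtuigen</title>
<enclosure length="0" type="image/jpeg" url="https://media.nu.nl/m/rnlxvbaa9mz9_sqr256.jpg/novavax-vaccin-kan-volgens-cbg-sommige-prikweigeraars-alsnog-overtuigen.jpg"/>
</item></channel></rss>"""


def enclosure_image_urls(rss_text: str) -> list[str]:
    """Return the url of every <enclosure> whose type starts with image/."""
    root = ET.fromstring(rss_text)
    return [
        enc.get("url")
        for enc in root.iter("enclosure")
        if enc.get("type", "").startswith("image/") and enc.get("url")
    ]


print(enclosure_image_urls(RSS_SNIPPET))
```

The type check matters because enclosures are also commonly used for audio and video attachments, which should not end up in an image field.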

Dynamix72 • Dec 22 '21 10:12

Bump on this. I am currently setting up my first RSS feed, and sadly the feed uses an <enclosure> tag rather than an image field.

BastienM • Mar 25 '22 16:03

Hi Dynamix72,

I adjusted the sensor.py file of the custom component, and it seems to work for me. I checked it against your RSS feed and it seems to work there as well.

"""Feedparser sensor"""
from __future__ import annotations

import asyncio
import re
from datetime import timedelta

import homeassistant.helpers.config_validation as cv
import voluptuous as vol
from dateutil import parser
from homeassistant.components.sensor import PLATFORM_SCHEMA, SensorEntity
from homeassistant.const import CONF_NAME, CONF_SCAN_INTERVAL
from homeassistant.core import HomeAssistant
from homeassistant.helpers.entity_platform import AddEntitiesCallback
from homeassistant.helpers.typing import ConfigType, DiscoveryInfoType

import feedparser

__version__ = "0.1.6"

COMPONENT_REPO = "https://github.com/custom-components/sensor.feedparser/"

REQUIREMENTS = ["feedparser"]

CONF_FEED_URL = "feed_url"
CONF_DATE_FORMAT = "date_format"
CONF_INCLUSIONS = "inclusions"
CONF_EXCLUSIONS = "exclusions"
CONF_SHOW_TOPN = "show_topn"

DEFAULT_SCAN_INTERVAL = timedelta(hours=1)

PLATFORM_SCHEMA = PLATFORM_SCHEMA.extend(
    {
        vol.Required(CONF_NAME): cv.string,
        vol.Required(CONF_FEED_URL): cv.string,
        vol.Required(CONF_DATE_FORMAT, default="%a, %b %d %I:%M %p"): cv.string,
        vol.Optional(CONF_SHOW_TOPN, default=9999): cv.positive_int,
        vol.Optional(CONF_INCLUSIONS, default=[]): vol.All(cv.ensure_list, [cv.string]),
        vol.Optional(CONF_EXCLUSIONS, default=[]): vol.All(cv.ensure_list, [cv.string]),
        vol.Optional(CONF_SCAN_INTERVAL, default=DEFAULT_SCAN_INTERVAL): cv.time_period,
    }
)


# asyncio.coroutine was removed in Python 3.11; declare a native coroutine instead.
async def async_setup_platform(
    hass: HomeAssistant,
    config: ConfigType,
    async_add_devices: AddEntitiesCallback,
    discovery_info: DiscoveryInfoType | None = None,
) -> None:
    async_add_devices(
        [
            FeedParserSensor(
                feed=config[CONF_FEED_URL],
                name=config[CONF_NAME],
                date_format=config[CONF_DATE_FORMAT],
                show_topn=config[CONF_SHOW_TOPN],
                inclusions=config[CONF_INCLUSIONS],
                exclusions=config[CONF_EXCLUSIONS],
                scan_interval=config[CONF_SCAN_INTERVAL],
            )
        ],
        True,
    )


class FeedParserSensor(SensorEntity):
    def __init__(
        self,
        feed: str,
        name: str,
        date_format: str,
        show_topn: int,
        exclusions: list[str],
        inclusions: list[str],
        scan_interval: timedelta,
    ) -> None:
        self._feed = feed
        self._attr_name = name
        self._attr_icon = "mdi:rss"
        self._date_format = date_format
        self._show_topn = show_topn
        self._inclusions = inclusions
        self._exclusions = exclusions
        self._scan_interval = scan_interval
        self._attr_state = None
        self._entries = []
        self._attr_extra_state_attributes = {"entries": self._entries}

    def update(self):
        parsed_feed = feedparser.parse(self._feed)

        if not parsed_feed:
            return False
        else:
            self._attr_state = (
                self._show_topn
                if len(parsed_feed.entries) > self._show_topn
                else len(parsed_feed.entries)
            )
            self._entries = []

            for entry in parsed_feed.entries[: self._attr_state]:
                entry_value = {}

                for key, value in entry.items():
                    if (
                        (self._inclusions and key not in self._inclusions)
                        or ("parsed" in key)
                        or (key in self._exclusions)
                    ):
                        continue
                    if key in ["published", "updated", "created", "expired"]:
                        value = parser.parse(value).strftime(self._date_format)

                    entry_value[key] = value

                if "image" in self._inclusions and "image" not in entry_value.keys():
                    images = []
                    if "summary" in entry.keys():
                        images = re.findall(
                            r"<img.+?src=\"(.+?)\".+?>", entry["summary"]
                        )
                    if images:
                        entry_value["image"] = images[0]
                    else:
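                        # Assumes the second link is the <enclosure>, as with the
                        # feed in this issue; the length check avoids an IndexError.
                        if "links" in entry.keys() and len(entry["links"]) > 1:
                            images = re.findall(
                                r"(?:(?:https?|ftp)://)?[\w/\-?=%.]+\.[\w/\-&?=%.]+",
                                str(entry["links"][1]),
                            )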
                        if images:
                            entry_value["image"] = images[0]
                        else:
                            entry_value[
                                "image"
                            ] = "https://www.home-assistant.io/images/favicon-192x192-full.png"

                self._entries.append(entry_value)

    @property
    def state(self):
        """Return the state of the sensor."""
        return self._attr_state

    @property
    def extra_state_attributes(self):
        return {"entries": self._entries}
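
The regex fallback in the code above depends on the enclosure being the second item in `links`. A less position-dependent variant could select the link by its `rel` and `type` fields. The sketch below assumes that a parsed entry behaves like a mapping whose `links` list holds dicts with `rel`, `type` and `href` keys, which is how feedparser is understood to expose enclosures. `enclosure_image` and `sample_entry` are hypothetical names, and the sample entry is hand-built, not real parser output.

```python
from __future__ import annotations

from typing import Any, Mapping

FALLBACK_IMAGE = "https://www.home-assistant.io/images/favicon-192x192-full.png"


def enclosure_image(entry: Mapping[str, Any]) -> str | None:
    """Return the href of the first image enclosure in entry["links"], if any."""
    for link in entry.get("links", []):
        is_enclosure = link.get("rel") == "enclosure"
        is_image = str(link.get("type", "")).startswith("image/")
        if is_enclosure and is_image and link.get("href"):
            return link["href"]
    return None


# Hand-built stand-in for a parsed entry (assumed shape, not real parser output).
sample_entry = {
    "links": [
        {
            "rel": "alternate",
            "type": "text/html",
            "href": "https://www.nu.nl/coronavirus/6174197/novavax-vaccin-kan-volgens-cbg-sommige-prikweigeraars-alsnog-overtuigen.html",
        },
        {
            "rel": "enclosure",
            "type": "image/jpeg",
            "length": "0",
            "href": "https://media.nu.nl/m/rnlxvbaa9mz9_sqr256.jpg/novavax-vaccin-kan-volgens-cbg-sommige-prikweigeraars-alsnog-overtuigen.jpg",
        },
    ]
}

# Same default as the sensor code: use the Home Assistant favicon if nothing is found.
image = enclosure_image(sample_entry) or FALLBACK_IMAGE
print(image)
```

Matching on `rel` and `type` keeps the lookup working when a feed lists the enclosure in a different position or has only one link per item.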

Gamepsyched • Apr 21 '22 22:04

#81 should fix this issue. Could you please try the beta release I published and tell me whether images show up for you? https://github.com/custom-components/feedparser/releases/tag/0.2.0b0

If they do not show up, could you please provide the feed URL, so I can investigate?

Note: #78, #57 and #64 should be addressed and fixed by #81.

ogajduse • Jul 28 '23 07:07

Same here: I am trying to get the image URL from a media:content element inside an RSS item:

<media:content url="https://g.delfi.lt/images/pix/518x0/BvXDZaXkhRU/rusijos-karas-pries-ukraina-94097143.jpg"/>
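
For a `media:content` element like this one, a lookup along the following lines might work. It is a sketch under the assumption that the parsed entry exposes such elements as a `media_content` list of dicts carrying the element's attributes (`url`, and optionally `type` or `medium`); check that key against the feedparser version you have installed. `media_content_image` and `sample_entry` are hypothetical names, and the entry is hand-built.

```python
from __future__ import annotations

from typing import Any, Mapping


def media_content_image(entry: Mapping[str, Any]) -> str | None:
    """Return the url of the first media_content item that looks like an image."""
    for media in entry.get("media_content", []):
        url = media.get("url")
        media_type = str(media.get("type", ""))
        medium = str(media.get("medium", ""))
        # Accept items declared as images, and untyped items like the one above.
        looks_like_image = (
            media_type.startswith("image/")
            or medium == "image"
            or (not media_type and not medium)
        )
        if url and looks_like_image:
            return url
    return None


# Hand-built stand-in for a parsed entry (assumed shape, not real parser output).
sample_entry = {
    "media_content": [
        {"url": "https://g.delfi.lt/images/pix/518x0/BvXDZaXkhRU/rusijos-karas-pries-ukraina-94097143.jpg"}
    ]
}

print(media_content_image(sample_entry))
```

Untyped items are accepted because the element quoted above carries only a `url` attribute; a stricter feed-specific check could require `type` or `medium` instead.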

Paktas • Aug 02 '23 20:08

@Paktas What version of the integration do you run?

ogajduse • Aug 03 '23 13:08