Can't get images from enclosure element
I use the feed: https://www.nu.nl/rss/Algemeen
The images are in an enclosure element. Can this be solved in a new release?
<atom:link href="https://www.nu.nl/rss/Algemeen" rel="self"/>
<language>nl-nl</language>
<copyright>Copyright (c) 2021, NU</copyright>
<lastBuildDate>Mon, 20 Dec 2021 16:39:45 +0100</lastBuildDate>
<ttl>60</ttl>
<atom:logo>https://www.nu.nl/algemeenstatic/img/atoms/images/logos/rss-logo-250x40.png</atom:logo>
<item>
  <title>Novavax-vaccin kan volgens CBG sommige prikweigeraars alsnog overtuigen</title>
  <link>https://www.nu.nl/coronavirus/6174197/novavax-vaccin-kan-volgens-cbg-sommige-prikweigeraars-alsnog-overtuigen.html</link>
  <description>Het nieuwe coronavaccin van Novavax, dat <a href="https://www.nu.nl/coronavirus/6174179/coronavaccin-van-novavax-als-vijfde-goedgekeurd-voor-gebruik-in-europa.html" target="_blank">maandag</a> een positieve beoordeling van het Europese geneesmiddelenbureau (EMA) kreeg, kan helpen om ongevaccineerden alsnog over de streep te trekken. Het vaccin is namelijk gebaseerd op een oude techniek, met eiwitten. Dat zegt Ton de Boer, voorzitter van het College ter Beoordeling van Geneesmiddelen (CBG).</description>
  <pubDate>Mon, 20 Dec 2021 15:51:55 +0100</pubDate>
  <guid isPermaLink="false">https://www.nu.nl/-/6174197/</guid>
  **<enclosure length="0" type="image/jpeg" url="https://media.nu.nl/m/rnlxvbaa9mz9_sqr256.jpg/novavax-vaccin-kan-volgens-cbg-sommige-prikweigeraars-alsnog-overtuigen.jpg"/>**
  <category>Algemeen</category>
  <category>Binnenland</category>
  <category>Coronavirus</category>
  <dc:creator>NU.nl/ANP</dc:creator>
  <dc:rights>copyright photo: EPA</dc:rights>
  <atom:link href="https://nu.nl/coronavirus/6174179/coronavaccin-van-novavax-als-vijfde-goedgekeurd-voor-gebruik-in-europa.html" rel="related" title="Coronavaccin van Novavax als vijfde goedgekeurd voor gebruik in Europa" type="text/html"/>
  <atom:link href="https://nu.nl/coronavirus/6174077/speciale-omikronvaccins-komen-eraan-maar-onzeker-of-ze-worden-ingezet.html" rel="related" title="Speciale omikronvaccins komen eraan, maar onzeker of ze worden ingezet" type="text/html"/>
  <atom:link href="https://nu.nl/coronavirus/6173729/mensen-bellen-massaal-telefoonlijn-voor-vaccintwijfelaars-limiet-bijna-bereikt.html" rel="related" title="Mensen bellen massaal telefoonlijn voor vaccintwijfelaars, limiet bijna bereikt" type="text/html"/>
</item>
Bump on this.
I'm currently setting up my first RSS feed, and sadly the feed uses an <enclosure> tag rather than an image field.
Hi Dynmix,
I adjusted the sensor.py file under the custom component, and it seems to work for me. I also checked it against your RSS feed, and it seems to work there too.
"""Feedparser sensor"""
from __future__ import annotations

import re
from datetime import timedelta

import feedparser
import homeassistant.helpers.config_validation as cv
import voluptuous as vol
from dateutil import parser
from homeassistant.components.sensor import PLATFORM_SCHEMA, SensorEntity
from homeassistant.const import CONF_NAME, CONF_SCAN_INTERVAL
from homeassistant.core import HomeAssistant
from homeassistant.helpers.entity_platform import AddEntitiesCallback
from homeassistant.helpers.typing import ConfigType, DiscoveryInfoType

__version__ = "0.1.6"

COMPONENT_REPO = "https://github.com/custom-components/sensor.feedparser/"
REQUIREMENTS = ["feedparser"]

CONF_FEED_URL = "feed_url"
CONF_DATE_FORMAT = "date_format"
CONF_INCLUSIONS = "inclusions"
CONF_EXCLUSIONS = "exclusions"
CONF_SHOW_TOPN = "show_topn"

DEFAULT_SCAN_INTERVAL = timedelta(hours=1)

PLATFORM_SCHEMA = PLATFORM_SCHEMA.extend(
    {
        vol.Required(CONF_NAME): cv.string,
        vol.Required(CONF_FEED_URL): cv.string,
        vol.Required(CONF_DATE_FORMAT, default="%a, %b %d %I:%M %p"): cv.string,
        vol.Optional(CONF_SHOW_TOPN, default=9999): cv.positive_int,
        vol.Optional(CONF_INCLUSIONS, default=[]): vol.All(cv.ensure_list, [cv.string]),
        vol.Optional(CONF_EXCLUSIONS, default=[]): vol.All(cv.ensure_list, [cv.string]),
        vol.Optional(CONF_SCAN_INTERVAL, default=DEFAULT_SCAN_INTERVAL): cv.time_period,
    }
)


# asyncio.coroutine was removed in Python 3.11, so this is a native coroutine.
async def async_setup_platform(
    hass: HomeAssistant,
    config: ConfigType,
    async_add_devices: AddEntitiesCallback,
    discovery_info: DiscoveryInfoType | None = None,
) -> None:
    async_add_devices(
        [
            FeedParserSensor(
                feed=config[CONF_FEED_URL],
                name=config[CONF_NAME],
                date_format=config[CONF_DATE_FORMAT],
                show_topn=config[CONF_SHOW_TOPN],
                inclusions=config[CONF_INCLUSIONS],
                exclusions=config[CONF_EXCLUSIONS],
                scan_interval=config[CONF_SCAN_INTERVAL],
            )
        ],
        True,
    )


class FeedParserSensor(SensorEntity):
    def __init__(
        self,
        feed: str,
        name: str,
        date_format: str,
        show_topn: int,
        exclusions: list[str],
        inclusions: list[str],
        scan_interval: timedelta,
    ) -> None:
        self._feed = feed
        self._attr_name = name
        self._attr_icon = "mdi:rss"
        self._date_format = date_format
        self._show_topn = show_topn
        self._inclusions = inclusions
        self._exclusions = exclusions
        self._scan_interval = scan_interval
        self._attr_state = None
        self._entries = []
        self._attr_extra_state_attributes = {"entries": self._entries}

    def update(self):
        parsed_feed = feedparser.parse(self._feed)

        if not parsed_feed:
            return False
        else:
            # The state is the number of entries shown, capped at show_topn.
            self._attr_state = (
                self._show_topn
                if len(parsed_feed.entries) > self._show_topn
                else len(parsed_feed.entries)
            )
            self._entries = []

            for entry in parsed_feed.entries[: self._attr_state]:
                entry_value = {}

                for key, value in entry.items():
                    if (
                        (self._inclusions and key not in self._inclusions)
                        or ("parsed" in key)
                        or (key in self._exclusions)
                    ):
                        continue
                    if key in ["published", "updated", "created", "expired"]:
                        value = parser.parse(value).strftime(self._date_format)

                    entry_value[key] = value

                if "image" in self._inclusions and "image" not in entry_value.keys():
                    images = []
                    # First try an <img> tag embedded in the summary.
                    if "summary" in entry.keys():
                        images = re.findall(
                            r"<img.+?src=\"(.+?)\".+?>", entry["summary"]
                        )
                    if images:
                        entry_value["image"] = images[0]
                    else:
                        # Otherwise take the first URL found in the second link,
                        # which is the enclosure in the nu.nl feed. The length
                        # check avoids an IndexError on single-link entries.
                        links = entry.get("links", [])
                        if len(links) > 1:
                            images = re.findall(
                                r"(?:(?:https?|ftp)://)?[\w/\-?=%.]+\.[\w/\-&?=%.]+",
                                str(links[1]),
                            )
                        if images:
                            entry_value["image"] = images[0]
                        else:
                            # Fall back to a default placeholder image.
                            entry_value[
                                "image"
                            ] = "https://www.home-assistant.io/images/favicon-192x192-full.png"

                self._entries.append(entry_value)

    @property
    def state(self):
        """Return the state of the sensor."""
        return self._attr_state

    @property
    def extra_state_attributes(self):
        return {"entries": self._entries}
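
Taking the second link of each entry, as the code above does, relies on the enclosure always sitting in that position. Below is a minimal sketch of a position-independent lookup. It assumes feedparser lists each `<enclosure>` in `entry["links"]` as a dict with `rel` set to `"enclosure"` and the URL under `href`. The helper name `enclosure_image` and the hand-written `sample_entry` are hypothetical and not part of the component:

```python
from __future__ import annotations

from typing import Any, Mapping


def enclosure_image(entry: Mapping[str, Any]) -> str | None:
    """Return the href of the first image enclosure in an entry, or None."""
    for link in entry.get("links", []):
        is_enclosure = link.get("rel") == "enclosure"
        is_image = str(link.get("type", "")).startswith("image/")
        if is_enclosure and is_image and link.get("href"):
            return link["href"]
    return None


# Hand-written stand-in for a parsed nu.nl entry (assumed shape, not real parser output).
sample_entry = {
    "links": [
        {
            "rel": "alternate",
            "type": "text/html",
            "href": "https://www.nu.nl/coronavirus/6174197/novavax-vaccin-kan-volgens-cbg-sommige-prikweigeraars-alsnog-overtuigen.html",
        },
        {
            "rel": "enclosure",
            "type": "image/jpeg",
            "length": "0",
            "href": "https://media.nu.nl/m/rnlxvbaa9mz9_sqr256.jpg/novavax-vaccin-kan-volgens-cbg-sommige-prikweigeraars-alsnog-overtuigen.jpg",
        },
    ]
}

image_url = enclosure_image(sample_entry)
```

If this returns `None`, the sensor could still fall back to the default placeholder image as before.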
#81 should fix this issue. Could you please try the beta release I published and tell me whether images show up for you? https://github.com/custom-components/feedparser/releases/tag/0.2.0b0
If they do not show up, could you please provide the feed URL, so I can investigate?
Note: #78, #57 and #64 should be addressed and fixed by #81.
Same here: I'm trying to get the image URL from a media:content element inside an RSS item:
<media:content url="https://g.delfi.lt/images/pix/518x0/BvXDZaXkhRU/rusijos-karas-pries-ukraina-94097143.jpg"/>
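
For what it's worth, here is a minimal sketch of reading that URL from a parsed entry. It assumes feedparser exposes `media:content` elements as a `media_content` list of attribute dicts (and `media:thumbnail` as `media_thumbnail`). The helper name `media_image` and the hand-written `sample_entry` are hypothetical and not part of the integration:

```python
from __future__ import annotations

from typing import Any, Mapping


def media_image(entry: Mapping[str, Any]) -> str | None:
    """Return the first usable URL from media:content or media:thumbnail, or None."""
    for key in ("media_content", "media_thumbnail"):
        for item in entry.get(key, []):
            media_type = str(item.get("type", ""))
            # Skip items that declare a non-image type (for example video/mp4).
            if media_type and not media_type.startswith("image/"):
                continue
            if item.get("url"):
                return item["url"]
    return None


# Hand-written stand-in for a parsed entry (assumed shape, not real parser output).
sample_entry = {
    "media_content": [
        {"url": "https://g.delfi.lt/images/pix/518x0/BvXDZaXkhRU/rusijos-karas-pries-ukraina-94097143.jpg"}
    ]
}

image_url = media_image(sample_entry)
```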
@Paktas What version of the integration do you run?