
items become unread after some time

Open jastgasi opened this issue 3 years ago • 3 comments

Explain the Problem

Many items from different feeds become unread again after several hours or even days. I do not think the items were actually updated in the feed (which would make the News app mark them as unread again), because 1) there are too many of them and 2) the News app shows the item's age as, for example, "yesterday". Different users, feeds, and items are affected. Deleting the feeds and re-adding them with the same URL (via the command line) does not help.

Steps to Reproduce

This started at some point I cannot pin down. Feeds that I had followed for years suddenly started behaving like this.

System Information

News app version: 18.1.1
Nextcloud version: 24.0.4
Cron type: system-cron
PHP version: 7.4.3
Database and version: MariaDB 10.3
Browser and version: multiple
Distribution and version: server: Ubuntu 20.04.5 LTS, client: multiple

What more information can I supply? Thanks!

jastgasi, Sep 17 '22 06:09

Hi there,

This is a difficult problem. If it were caused by a general bug in the app, we would have many reports about it. We did have a similar issue some time ago, but that was fixed.

You will have to do some debugging on your own, since we cannot reproduce it.

My approach would be to log in to the DB and mark all items of an affected feed as read. Then query the items of that feed and store the result in a temporary file.

The next time items of that feed show up as unread again, log in to the DB again and run the same query for that feed.

Compare the result with the temporary file: the item IDs should be the same. Check whether any of the other values changed.
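The comparison step can be scripted. The sketch below assumes both snapshots were exported as tab-separated text with a header row, for example with something like `mysql -B -e "select id, guid_hash, last_modified, unread from oc_news_items where feed_id=89" nextcloud > before.tsv` (the database name `nextcloud` and the column selection are assumptions; adjust them to your setup). The functions `load_snapshot` and `diff_snapshots` are hypothetical helpers, not part of the News app. Items are matched on `guid_hash` rather than `id`, so a re-inserted row still lines up with its earlier copy.

```python
import csv
import io

# Assumed key column: guid_hash stays the same for the same article (see the dumps in this thread).
KEY = "guid_hash"


def load_snapshot(tsv_text):
    """Parse a tab-separated export (header row first) into {guid_hash: row}."""
    # mysql -B does not quote fields, so QUOTE_NONE keeps a stray '"' in a title
    # from being treated as the start of a quoted field.
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE)
    return {row[KEY]: row for row in reader}


def diff_snapshots(before, after):
    """Compare two snapshots keyed by guid_hash.

    Returns (changed, vanished, appeared):
      changed  -- {guid_hash: {column: (old, new)}} for items present in both
      vanished -- sorted guid_hashes found only in `before`
      appeared -- sorted guid_hashes found only in `after`
    """
    changed = {}
    for key in before.keys() & after.keys():
        old, new = before[key], after[key]
        cols = {c: (old[c], new.get(c)) for c in old if old[c] != new.get(c)}
        if cols:
            changed[key] = cols
    vanished = sorted(before.keys() - after.keys())
    appeared = sorted(after.keys() - before.keys())
    return changed, vanished, appeared
```

Read each file's contents and pass them through `load_snapshot`, then call `diff_snapshots`. If an item's `id` differs while its `guid_hash` is unchanged, the row was deleted and inserted again rather than updated in place, since an in-place update keeps the auto-increment primary key.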

You can also enable debug logging in Nextcloud; that will produce many more log entries. Maybe you can find a pattern or hints regarding this behaviour.
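Debug logging is usually enabled by setting `'loglevel' => 0` in Nextcloud's `config.php`. With the default file backend, `nextcloud.log` typically holds one JSON object per line; the sketch below assumes each entry has `app` and `message` fields (check a few lines of your own log first). `news_messages` is a hypothetical helper that counts how often each News-related message appears, which can make a recurring pattern easier to spot.

```python
import json
from collections import Counter


def news_messages(log_lines, app="news"):
    """Count messages logged by one app in a JSON-lines log (assumed format)."""
    counts = Counter()
    for line in log_lines:
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip truncated or non-JSON lines
        if isinstance(entry, dict) and entry.get("app") == app:
            # str() because some entries may carry a structured message
            counts[str(entry.get("message", ""))] += 1
    return counts
```

For example, `news_messages(open("nextcloud.log", encoding="utf-8"))` returns a `Counter`, and `.most_common(10)` lists the most frequent News messages.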

I would debug this myself but none of my instances behaves like this.

Grotax, Sep 17 '22 07:09

Hi Grotax,

I managed to get an example from the feed at https://www.heise.de/rss/heise.rdf

Yesterday morning, the SQL query `select * from oc_news_items where id=284905 and feed_id=89;` returned:

284905  f541373ad20e6f90c189a7b1d72b1369        immer wieder kapern phisher fremde instagram-accounts. wir sind einem besonderen fall nachgegangen und erklären, wie sie ihren account schützen.heise+ | sicherheit: erste hilfe bei geknacktem instagram-accounthttps://www.heise.de/ratgeber/sicherheit-erste-hilfe-bei-geknacktem-instagram-account-7263417.html?wt_mc=rss.red.ho.ho.rdf.beitrag_plus.beitrag_plus  http://heise.de/-7263417        https://www.heise.de/ratgeber/Sicherheit-Erste-Hilfe-bei-geknacktem-Instagram-Account-7263417.html?wt_mc=rss.red.ho.ho.rdf.beitrag_plus.beitrag_plus      heise+ | Sicherheit: Erste Hilfe bei geknacktem Instagram-Account       NULL    1663401600      <p><a target="_blank" rel="noreferrer" href="https://www.heise.de/ratgeber/Sicherheit-Erste-Hilfe-bei-geknacktem-Instagram-Account-7263417.html?wt_mc=rss.red.ho.ho.rdf.beitrag_plus.beitrag_plus"><img src="https://www.heise.de/scale/geometry/450/q80//imgs/18/3/6/0/8/2/0/3/ct2022instagram_albert_hulm_121579_rei_online-ffa0b666976c6345.jpeg" alt="" /></a></p><p>Immer wieder kapern Phisher fremde Instagram-Accounts. Wir sind einem besonderen Fall nachgegangen und erklären, wie Sie Ihren Account schützen.</p>    NULL    NULL    89      1663402875474902        0       4840ccc769489da41405c700868846f7     4840ccc769489da41405c700868846f7        0       0       NULL    NULL    []      NULL

Now the item has reappeared with these DB contents:

285291  f541373ad20e6f90c189a7b1d72b1369        immer wieder kapern phisher fremde instagram-accounts. wir sind einem besonderen fall nachgegangen und erklären, wie sie ihren account schützen.heise+ | sicherheit: erste hilfe bei geknacktem instagram-accounthttps://www.heise.de/ratgeber/sicherheit-erste-hilfe-bei-geknacktem-instagram-account-7263417.html?wt_mc=rss.red.ho.ho.rdf.beitrag_plus.beitrag_plus  http://heise.de/-7263417        https://www.heise.de/ratgeber/Sicherheit-Erste-Hilfe-bei-geknacktem-Instagram-Account-7263417.html?wt_mc=rss.red.ho.ho.rdf.beitrag_plus.beitrag_plus      heise+ | Sicherheit: Erste Hilfe bei geknacktem Instagram-Account       NULL    1663401600      <p><a target="_blank" rel="noreferrer" href="https://www.heise.de/ratgeber/Sicherheit-Erste-Hilfe-bei-geknacktem-Instagram-Account-7263417.html?wt_mc=rss.red.ho.ho.rdf.beitrag_plus.beitrag_plus"><img src="https://www.heise.de/scale/geometry/450/q80//imgs/18/3/6/0/8/2/0/3/ct2022instagram_albert_hulm_121579_rei_online-ffa0b666976c6345.jpeg" alt="" /></a></p><p>Immer wieder kapern Phisher fremde Instagram-Accounts. Wir sind einem besonderen Fall nachgegangen und erklären, wie Sie Ihren Account schützen.</p>    NULL    NULL    89      1663499709053653        0       4840ccc769489da41405c700868846f7     4840ccc769489da41405c700868846f7        1       0       NULL    NULL    []      NULL

So the news article got a new ID and a new modification timestamp (which is indeed from today), and it is unread again. Everything else is identical. In the feed, the item shows a publication date of yesterday. Where does the modification timestamp come from?
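Converting the two timestamps makes the rows easier to compare. `pub_date` is a Unix timestamp in seconds. `last_modified` has 16 digits, so it appears to be in microseconds; that reading is an assumption based on the value's size, not on the News app documentation. `describe` is a hypothetical helper.

```python
from datetime import datetime, timezone


def describe(pub_date, last_modified):
    """Return (pub_date, last_modified) as UTC ISO strings.

    pub_date is taken as seconds; last_modified is assumed to be microseconds.
    """
    pub = datetime.fromtimestamp(pub_date, tz=timezone.utc)
    # Integer division drops the sub-second part before converting.
    mod = datetime.fromtimestamp(last_modified // 1_000_000, tz=timezone.utc)
    return pub.isoformat(), mod.isoformat()


print(describe(1663401600, 1663402875474902))
# ('2022-09-17T08:00:00+00:00', '2022-09-17T08:21:15+00:00')
print(describe(1663401600, 1663499709053653))
# ('2022-09-17T08:00:00+00:00', '2022-09-18T11:15:09+00:00')
```

The publication date is the same in both rows, while `last_modified` moved by roughly a day. This is consistent with the row having been written again on September 18 rather than the article itself having been republished with a new date.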

Do you have an idea what might have happened?

Thanks!

[done.] PS: Please tell me if I can improve the markup somehow. My post is hard to read right now, sorry for that.

jastgasi, Sep 18 '22 13:09

Hi, based on that, my first impression would be that they republished these items.

It's not uncommon for news outlets to do that, for example when they change the title or the first paragraph. Depending on how their CMS processes such a change, it becomes a new or updated entry in the feed file.

If that is the case, you would not even notice it if you read the news less frequently 😁

The pub_date is a field of each item, and the publisher sets it to whatever date they want. It is not necessarily the date the item was published or updated.

Items also have a modification timestamp, which is used for the last modified field.

Apart from that, you could check how often your feeds get updated in the settings; if you increase that value, you might not see this anymore. Also, if the number of read items to keep per feed is low, you might run into a situation where an item is deleted from the DB while it is still in the feed file. It then gets added again as a new item.
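The second scenario can be illustrated with a toy model. It is a simplified sketch, not the News app's code: it assumes that cleanup keeps at most `keep` read items per feed (the newest by id) and deletes the rest, and that a fetch inserts every guid not already in the DB as a new unread item with a fresh id. `FeedStore` and `resurfaced` are hypothetical names.

```python
from dataclasses import dataclass
from itertools import count


@dataclass
class Item:
    id: int
    guid_hash: str
    unread: bool = True


class FeedStore:
    """Toy model of one feed's rows in the items table (simplified assumption)."""

    def __init__(self):
        self._ids = count(1)  # auto-increment primary key
        self.items = {}       # guid_hash -> Item

    def fetch(self, feed_file):
        """Insert every guid in the feed file that is not already stored."""
        for guid in feed_file:
            if guid not in self.items:
                self.items[guid] = Item(next(self._ids), guid)

    def mark_all_read(self):
        for item in self.items.values():
            item.unread = False

    def purge(self, keep):
        """Delete read items beyond the `keep` newest ones; unread items stay."""
        read = sorted((i for i in self.items.values() if not i.unread),
                      key=lambda i: i.id, reverse=True)
        for item in read[keep:]:
            del self.items[item.guid_hash]

    def unread_count(self):
        return sum(i.unread for i in self.items.values())


def resurfaced(feed_file, keep):
    """Read everything, run cleanup, fetch the unchanged feed again; count unread."""
    store = FeedStore()
    store.fetch(feed_file)
    store.mark_all_read()
    store.purge(keep)
    store.fetch(feed_file)  # nothing was republished, the feed file is identical
    return store.unread_count()
```

With a feed file of 5 items, `resurfaced(feed_file, keep=2)` returns 3: the three purged items come back unread with new ids even though the feed never changed. With `keep` at or above the number of items in the feed file, it returns 0. Under this model, the keep count needs to exceed the number of items a feed file carries at once.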

Regarding the formatting:

\```
code
\```

Without the \ characters.

Grotax, Sep 18 '22 17:09

Hi Grotax,

Sorry for the delay (we are moving...). I understand why you think the problem comes from minor updates to the items themselves. Still, I doubt it, because so many items and feeds show this behavior and none of these feeds did this before. Right now it is impossible for me to follow the feeds: items are duplicated so often that I would have to go through about 200 items per feed each day.

I am adjusting the settings now and will report back whether this helps.

Thank you!

jastgasi, Sep 25 '22 06:09

Hi Grotax,

As Nextcloud admin, I increased two values in the News app settings in the web interface: the number of read items to keep (now 500 per feed) and the cleaning interval (now 60000 seconds, about 16.7 hours). After a week of testing, I can say that this improves the behaviour immensely.

Thank you!

jastgasi, Oct 03 '22 08:10