Getting fast and accurate notifications for site changes?

On the side, I work as a digital coordinator for a Danish classical music label.

I have to keep close tabs on classical music reviews in a Danish newspaper, specifically on this reporter’s page as it updates: https://politiken.dk/person/2780_Thomas_Michelsen

As far as I can see, there is no RSS feed for this specific page; otherwise I could set up an IFTTT applet that sends me an email.

What would be the best way for me to get quick notifications of any new review on this site?

You could set up a monitoring alert with VisualPing: https://visualping.io
It used to be ChangeDetect.com (or something like that). The service used to be free, but now they charge. But if it’s something you need, it may very well be worth the cost.

https://blogtrottr.com/

See if you can make a custom RSS feed with Feed43. It’s worked for me before.

As far as I can tell, this solution needs a working RSS feed, no? I tried it with this site: https://www.kristeligt-dagblad.dk/bruger/781

That service is a little complicated: it serves me a bunch of code when I enter this link https://www.kristeligt-dagblad.dk/bruger/781 in an attempt to create a new feed. Can you help?

I have used VisualPing before and it’s a good service, but unfortunately I often run out of the free checks pretty fast.

Happy to help. I’ve configured the feed—does this look good?

https://feed43.com/2606702124374453.xml

I’ll add some notes on how to set up a feed shortly, for future reference :slight_smile:

Edit: notes as promised:

Check out Feed43’s own tutorials

Step 1. Specify source page address (URL)

Self-explanatory.

https://www.kristeligt-dagblad.dk/bruger/781

Step 2. Define extraction rules

From the raw html of the entire page, how do we get neatly formatted items for the RSS feed?

Global Search Pattern (optional):

Isolate the section of the page which contains all of the items you’re interested in

<h2 class="heading">Seneste</h2>{%}<h2 class="heading">Mest læste</h2>

Explanation:

  • Scan for <h2 class="heading">Seneste</h2>
  • Extract everything until <h2 class="heading">Mest læste</h2> (the beginning of the next section, which we are not interested in)
  • {%} is the content of interest
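
For anyone who prefers to think in code, here is a rough Python sketch of what the global search pattern does. It is only an illustration of the idea, not how Feed43 is actually implemented; the function name `extract_between` and the toy page are made up for this example.

```python
def extract_between(html: str, start: str, end: str) -> str:
    """Return the text between the first `start` marker and the next `end` marker."""
    begin = html.find(start)
    if begin == -1:
        return ""  # start marker not on the page
    begin += len(start)
    finish = html.find(end, begin)
    if finish == -1:
        return ""  # end marker not found after the start marker
    return html[begin:finish]


# Toy stand-in for the real page, which is much longer.
page = (
    '<h2 class="heading">Seneste</h2>'
    "<article>first</article><article>second</article>"
    '<h2 class="heading">Mest læste</h2>'
    "<article>popular</article>"
)

section = extract_between(
    page,
    '<h2 class="heading">Seneste</h2>',
    '<h2 class="heading">Mest læste</h2>',
)
print(section)
```

Everything after the "Mest læste" heading is dropped, so the item pattern only ever sees the articles in the "Seneste" section.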

Item (repeatable) Search Pattern:

Extract the individual items (i.e. articles) for the RSS feed

Looking at the html, this is the full code for each article:

<article class="article small-wide kicker paid" data-id="2018677">
<a href="https://www.kristeligt-dagblad.dk/kultur/beethovens-langvarige-soegen-efter-det-rette-religioese-udtryk" class="link" tabindex="-1"></a>
<strong class="kicker">Boguddrag</strong>
<em class="paid">For abonnenter</em>
<time datetime="2020-04-20T00:00:00+02:00" class="publication"><span>20.04.2020</span> <span>00:00</span></time>
<div class="byline">Af Peter Dürrfeld</div>
<h2 class="heading"><a href="https://www.kristeligt-dagblad.dk/kultur/beethovens-langvarige-soegen-efter-det-rette-religioese-udtryk">Beethovens langvarige søgen efter det rette religiøse udtryk</a></h2>
<div class="image" data-image="/sites/default/files/styles/*/public/2020/04/1587308084.jpg"></div>
<div class="lead">
<p>Kristeligt Dagblads musikanmelder Peter Dürrfeld er aktuel med bogen ”Beethoven. Den symfoniske mester”, som der bringes et uddrag fra her</p>
</div>
</article>

So this is what I used:

<article{*}
<time{*}
<span>{%}</span>{*}
<div class="byline">{%}</div>{*}
<h2 class="heading"><a href="{%}">{%}</a>{*}
<div class="lead">{*}
<p>{%}</p>

Explanation:

  • Scan for <article – this finds the start of the block of code corresponding to an article
  • {*} (sort of) means “go to the next search pattern”, i.e…
  • Scan for <time, then…
  • Take whatever is between <span> and </span> as the first string of interest – this is the date of the article (again, {%} represents content of interest)
  • Then scan for <div class="byline"> and take everything until </div> as the second string of interest – this is the name of the author. I didn’t end up using this in the final output, since all the articles are by Peter Dürrfeld
  • Along the same lines, <h2 class="heading"><a href="{%}">{%}</a>{*} extracts the link and the title of the article
  • Finally, we scan for <div class="lead"> then take everything between <p> and </p> as the content of the article
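
If it helps, the item pattern can be approximated with a regular expression. The sketch below is my own rough translation, not Feed43's real matching engine: it assumes {%} captures text, {*} skips text, and line breaks in the pattern are ignored. The helper name `feed43_to_regex` is hypothetical, and the sample HTML is a trimmed copy of the article block above.

```python
import re


def feed43_to_regex(pattern: str) -> re.Pattern:
    """Approximate a Feed43 search pattern as a regular expression.

    Assumptions (mine, not documented Feed43 behaviour): {%} captures text,
    {*} skips text, and whitespace around literal chunks is ignored.
    """
    pieces = []
    for token in re.split(r"(\{%\}|\{\*\})", pattern):
        if token == "{%}":
            pieces.append("(.*?)")  # content of interest: capture lazily
        elif token == "{*}":
            pieces.append(".*?")  # skip ahead to the next literal chunk
        else:
            pieces.append(re.escape(token.strip()))  # literal HTML to scan for
    return re.compile("".join(pieces), re.DOTALL)


ITEM_PATTERN = """<article{*}
<time{*}
<span>{%}</span>{*}
<div class="byline">{%}</div>{*}
<h2 class="heading"><a href="{%}">{%}</a>{*}
<div class="lead">{*}
<p>{%}</p>"""

# Trimmed copy of the article block from the page.
SAMPLE_HTML = """<article class="article small-wide kicker paid" data-id="2018677">
<time datetime="2020-04-20T00:00:00+02:00" class="publication"><span>20.04.2020</span> <span>00:00</span></time>
<div class="byline">Af Peter Dürrfeld</div>
<h2 class="heading"><a href="https://www.kristeligt-dagblad.dk/kultur/beethovens-langvarige-soegen-efter-det-rette-religioese-udtryk">Beethovens langvarige søgen efter det rette religiøse udtryk</a></h2>
<div class="lead">
<p>Kristeligt Dagblads musikanmelder Peter Dürrfeld er aktuel med bogen</p>
</div>
</article>"""

# Each match is a tuple of the five captured strings, in pattern order.
items = feed43_to_regex(ITEM_PATTERN).findall(SAMPLE_HTML)
for date, byline, link, title, lead in items:
    print(date, "|", byline, "|", title)
```

The lazy `.*?` matters here: it stops at the first `<span>` after `<time`, which is why the date is captured rather than the `00:00` in the second span.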

All going well, we have the following to work with:

Item 1 <Sun, 26 Apr 2020 01:25:23 GMT>
{%1} = 20.04.2020
{%2} = Af Peter Dürrfeld
{%3} = https://www.kristeligt-dagblad.dk/kultur/beethovens-langvarige-soegen-efter-det-rette-religioese-udtryk
{%4} = Beethovens langvarige søgen efter det rette religiøse udtryk
{%5} = Kristeligt Dagblads musikanmelder Peter Dürrfeld er aktuel med bogen ”Beethoven. Den symfoniske mester”, som der bringes et uddrag fra her

Item 2 ... 

The rest is pretty self-explanatory. Set up the feed how you like, in this case:

  • Item Title Template: {%1} - {%4}
  • Item Link Template: {%3}
  • Item Content Template: {%5}
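
To make the templates concrete, here is a small Python sketch of how the numbered placeholders could be filled in from the extracted strings. Again, this is just my illustration of the idea, assuming {%1}, {%2}, ... refer to the captured strings in order; `render_template` is a made-up helper, not part of Feed43.

```python
import re


def render_template(template: str, values: tuple) -> str:
    """Replace {%1}, {%2}, ... with the matching captured value (1-based)."""

    def substitute(match: re.Match) -> str:
        index = int(match.group(1)) - 1
        # Leave unknown placeholders untouched instead of raising an error.
        return values[index] if 0 <= index < len(values) else match.group(0)

    return re.sub(r"\{%(\d+)\}", substitute, template)


# The five strings extracted for Item 1 above.
item = (
    "20.04.2020",
    "Af Peter Dürrfeld",
    "https://www.kristeligt-dagblad.dk/kultur/beethovens-langvarige-soegen-efter-det-rette-religioese-udtryk",
    "Beethovens langvarige søgen efter det rette religiøse udtryk",
    "Kristeligt Dagblads musikanmelder Peter Dürrfeld er aktuel med bogen",
)

title = render_template("{%1} - {%4}", item)
link = render_template("{%3}", item)
content = render_template("{%5}", item)
print(title)
```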

I hope my walk-through is intelligible! :grimacing:

Thanks! That was a tremendous help, although I have no idea how you did that.

I just took the RSS-feed and set it up with an IFTTT that checks the RSS for new entries and sends me an email.

I guess I won’t know if it all works until there actually is a new article, and that might be a while yet.

But if you could show me how you set that up, then I could set it up for the others as well.