Happy to help. I’ve configured the feed—does this look good?
https://feed43.com/2606702124374453.xml
I’ll add some notes on how to set up a feed shortly, for future reference
Edit: notes as promised:
Check out Feed43’s own tutorials
Step 1. Specify source page address (URL)
Self-explanatory.
https://www.kristeligt-dagblad.dk/bruger/781
Step 2. Define extraction rules
From the raw HTML of the entire page, how do we get neatly formatted items for the RSS feed?
Global Search Pattern (optional):
Isolate the section of the page which contains all of the items you’re interested in
<h2 class="heading">Seneste</h2>{%}<h2 class="heading">Mest læste</h2>
Explanation:
- Scan for <h2 class="heading">Seneste</h2>
- Extract everything until <h2 class="heading">Mest læste</h2> (the beginning of the next section, which we are not interested in)
- {%} is the content of interest
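For anyone who thinks in regular expressions, here is a rough Python sketch of what the global search pattern does. It is only an analogy, not how Feed43 works internally, and the `page` string is a made-up stand-in for the real page.

```python
import re

# Hypothetical, heavily shortened stand-in for the page's raw HTML.
page = (
    '<h2 class="heading">Seneste</h2>'
    '<article>first</article><article>second</article>'
    '<h2 class="heading">Mest læste</h2>'
    '<article>popular</article>'
)

start = '<h2 class="heading">Seneste</h2>'
end = '<h2 class="heading">Mest læste</h2>'

# {%} is approximated by a non-greedy capture group;
# DOTALL lets the group span line breaks in real HTML.
pattern = re.compile(re.escape(start) + r"(.*?)" + re.escape(end), re.DOTALL)

match = pattern.search(page)
section = match.group(1) if match else ""
print(section)
```

Only the two articles between the headings end up in `section`; the "Mest læste" article is left out.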
Item (repeatable) Search Pattern:
Extract the individual items (i.e. articles) for the RSS feed
Looking at the HTML, this is the full code for each article:
<article class="article small-wide kicker paid" data-id="2018677">
<a href="https://www.kristeligt-dagblad.dk/kultur/beethovens-langvarige-soegen-efter-det-rette-religioese-udtryk" class="link" tabindex="-1"></a>
<strong class="kicker">Boguddrag</strong>
<em class="paid">For abonnenter</em>
<time datetime="2020-04-20T00:00:00+02:00" class="publication"><span>20.04.2020</span> <span>00:00</span></time>
<div class="byline">Af Peter Dürrfeld</div>
<h2 class="heading"><a href="https://www.kristeligt-dagblad.dk/kultur/beethovens-langvarige-soegen-efter-det-rette-religioese-udtryk">Beethovens langvarige søgen efter det rette religiøse udtryk</a></h2>
<div class="image" data-image="/sites/default/files/styles/*/public/2020/04/1587308084.jpg"></div>
<div class="lead">
<p>Kristeligt Dagblads musikanmelder Peter Dürrfeld er aktuel med bogen ”Beethoven. Den symfoniske mester”, som der bringes et uddrag fra her</p>
</div>
</article>
So this is what I used:
<article{*}
<time{*}
<span>{%}</span>{*}
<div class="byline">{%}</div>{*}
<h2 class="heading"><a href="{%}">{%}</a>{*}
<div class="lead">{*}
<p>{%}</p>
Explanation:
- Scan for <article, which marks the start of the block of code corresponding to an article
- {*} (sort of) means “skip ahead to the next part of the search pattern”, i.e. …
- Scan for <time, then …
- Take whatever is between <span> and </span> as the first string of interest: this is the date of the article (again, {%} represents content of interest)
- Then scan for <div class="byline"> and take everything until </div> as the second string of interest: this is the name of the author. I didn’t end up using this in the final output since all the articles are by Peter Dürrfeld
- Along the same lines, <h2 class="heading"><a href="{%}">{%}</a>{*} extracts the link and the title of the article
- Finally, we scan for <div class="lead">, then take everything between <p> and </p> as the content of the article
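If you want to test a pattern offline before pasting it into Feed43, the item pattern can be approximated with a regular expression. This is a minimal sketch, not Feed43's actual matching engine: `feed43_to_regex` is a hypothetical helper, and the way it handles whitespace and line breaks in the pattern is my assumption. The sample HTML is a trimmed copy of the article block above.

```python
import re

def feed43_to_regex(pattern):
    """Approximate a Feed43-style search pattern with a regular expression."""
    parts = []
    # Split on the two placeholders, keeping them as separate tokens.
    for token in re.split(r"(\{%\}|\{\*\})", pattern):
        if token == "{%}":
            parts.append(r"(.*?)")  # capture the content of interest
        elif token == "{*}":
            parts.append(r".*?")    # skip ahead to the next literal
        else:
            # Literal text. Assumption: whitespace inside the literal matches
            # any whitespace run; leading/trailing whitespace is ignored.
            parts.append(r"\s+".join(re.escape(word) for word in token.split()))
    return re.compile("".join(parts), re.DOTALL)

# Trimmed copy of one article block from the page.
article_html = """
<article class="article small-wide kicker paid" data-id="2018677">
<time datetime="2020-04-20T00:00:00+02:00" class="publication"><span>20.04.2020</span> <span>00:00</span></time>
<div class="byline">Af Peter Dürrfeld</div>
<h2 class="heading"><a href="https://www.kristeligt-dagblad.dk/kultur/beethovens-langvarige-soegen-efter-det-rette-religioese-udtryk">Beethovens langvarige søgen efter det rette religiøse udtryk</a></h2>
<div class="lead">
<p>Kristeligt Dagblads musikanmelder Peter Dürrfeld er aktuel med bogen ”Beethoven. Den symfoniske mester”, som der bringes et uddrag fra her</p>
</div>
</article>
"""

item_pattern = """<article{*}
<time{*}
<span>{%}</span>{*}
<div class="byline">{%}</div>{*}
<h2 class="heading"><a href="{%}">{%}</a>{*}
<div class="lead">{*}
<p>{%}</p>"""

# findall returns one tuple of captured values per matching article.
items = feed43_to_regex(item_pattern).findall(article_html)

for number, value in enumerate(items[0], start=1):
    print(f"{{%{number}}} = {value}")
```

Each tuple in `items` holds the five captured strings in pattern order: date, byline, link, title, lead.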
All going well, we have the following to work with:
Item 1 <Sun, 26 Apr 2020 01:25:23 GMT>
{%1} = 20.04.2020
{%2} = Af Peter Dürrfeld
{%3} = https://www.kristeligt-dagblad.dk/kultur/beethovens-langvarige-soegen-efter-det-rette-religioese-udtryk
{%4} = Beethovens langvarige søgen efter det rette religiøse udtryk
{%5} = Kristeligt Dagblads musikanmelder Peter Dürrfeld er aktuel med bogen ”Beethoven. Den symfoniske mester”, som der bringes et uddrag fra her
Item 2 ...
The rest is pretty self-explanatory. Set up the feed how you like, in this case:
- Item Title Template: {%1} - {%4}
- Item Link Template: {%3}
- Item Content Template: {%5}
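To make the numbered placeholders concrete, here is a small Python sketch of how the templates get filled in. `fill_template` is a hypothetical helper written for illustration, not part of Feed43; the values are the ones extracted for Item 1 above.

```python
import re

def fill_template(template, values):
    """Replace each {%N} with the N-th extracted value (1-based)."""
    return re.sub(
        r"\{%(\d+)\}",
        lambda m: values[int(m.group(1)) - 1],
        template,
    )

# The five strings extracted for Item 1.
values = [
    "20.04.2020",
    "Af Peter Dürrfeld",
    "https://www.kristeligt-dagblad.dk/kultur/beethovens-langvarige-soegen-efter-det-rette-religioese-udtryk",
    "Beethovens langvarige søgen efter det rette religiøse udtryk",
    "Kristeligt Dagblads musikanmelder Peter Dürrfeld er aktuel med bogen ”Beethoven. Den symfoniske mester”, som der bringes et uddrag fra her",
]

title = fill_template("{%1} - {%4}", values)
link = fill_template("{%3}", values)
content = fill_template("{%5}", values)

print(title)
```

The title comes out as the date, a hyphen, and the headline; {%2} (the byline) is simply never referenced, which is why it doesn't appear in the feed.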
I hope my walk-through is intelligible!