mirror of https://github.com/miniflux/v2.git synced 2024-08-28 12:01:29 +02:00
miniflux-v2/reader/scraper
Jouni K. Seppänen dcf87bd642 Add scrape and rewrite rules for quantamagazine
This is a somewhat complex React site so the rules could be a little fragile.
Text content seems to be always inside .outer--content, and most h6 elements
are fluff like "read later" or pointers to other articles. However, h6.byline
and h6.post__title__kicker are relevant to the current article.

Figure captions are sometimes inside both figure and div.outer--content
elements, sometimes only inside figure, so take both and remove the
intersection.

The figure elements sometimes contain multiple copies of images or
videos, and we just take them all. Math articles seem to use MathJax,
which we don't add.
2022-01-03 10:10:13 -08:00
testdata         Return outer HTML when scraping elements          2019-12-21 21:18:31 -08:00
doc.go           Add missing package descriptions for GoDoc        2018-10-08 17:32:17 -07:00
rules.go         Add scrape and rewrite rules for quantamagazine   2022-01-03 10:10:13 -08:00
scraper.go       add proxy arg in scraper.Fetch                    2021-08-28 21:57:11 -07:00
scraper_test.go  Remove deprecated io/ioutil package               2021-02-16 21:25:21 -08:00