We're using https://github.com/victoriadrake/hydra-link-checker for daily link checking in CI on jQuery documentation sites, including https://api.jquery.com and https://qunitjs.com.
Example:
- https://github.com/qunitjs/qunit/blob/0d59037215/.github/workflows/spider-check.yaml
- https://github.com/qunitjs/qunit/blob/0d59037215/build/hydra-config.json
I noticed that over time, a number of domains have started blocking the Hydra crawler (or have questionable/broken web server configurations). I found a way to exclude these domains, but I'm not sure if this is a recommended or supported way, so I wanted to share it here for visibility, and so that others may discover it as well:
{
  "//": [
    "2023-05: twitter.com serves broken redirect-loop",
    "2024-06: npmjs.com responds HTTP 429 Too Many Requests too easily",
    "2024-07: element.io and gitter.im chat rooms render fine but oddly use HTTP 404",
    "2025-01: github.com responds HTTP 429 Too Many Requests for the bulk of trivial links. We can probably keep a few other GitHub links.",
    ""
  ],
  "exclude_scheme_prefixes": [
    "https://twitter.com/",
    "https://app.element.io/",
    "https://app.gitter.im/",
    "https://www.npmjs.com/package/",
    "https://github.com/qunitjs/qunit/issues/",
    "https://github.com/qunitjs/qunit/pull/",
    "https://github.com/qunitjs/qunit/blob/main/docs/_posts/"
  ]
}

Feel free to close this!
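
For anyone adapting the list above, here is a minimal Python sketch for checking which URLs a given set of prefixes would skip. It assumes `exclude_scheme_prefixes` is applied as a plain string-prefix match, which is what the key name suggests but is not confirmed against Hydra's source. The helper `is_excluded` and the sample URLs are hypothetical and not part of Hydra.

```python
import json

# Hypothetical helper, not part of Hydra: assumes exclusion is a simple
# "URL starts with one of the prefixes" check.
def is_excluded(url: str, prefixes: list[str]) -> bool:
    # str.startswith accepts a tuple and returns True if any prefix matches.
    return url.startswith(tuple(prefixes))

# A shortened copy of the config from this issue.
config = json.loads("""
{
  "exclude_scheme_prefixes": [
    "https://twitter.com/",
    "https://www.npmjs.com/package/",
    "https://github.com/qunitjs/qunit/issues/"
  ]
}
""")
prefixes = config["exclude_scheme_prefixes"]

# Sample URLs chosen for illustration only.
samples = [
    "https://twitter.com/qunitjs",
    "https://github.com/qunitjs/qunit/issues/1",
    "https://github.com/qunitjs/qunit",
    "https://qunitjs.com/",
]
for url in samples:
    status = "skipped" if is_excluded(url, prefixes) else "checked"
    print(f"{status}: {url}")
```

Under that assumption, the trailing slash matters: `https://github.com/qunitjs/qunit/issues/` excludes individual issue links while leaving the repository root to be checked.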