FYI: Blocked domains and how to exclude them #17

@Krinkle

We're using https://github.com/victoriadrake/hydra-link-checker for daily link checking in CI on jQuery documentation sites, including https://api.jquery.com and https://qunitjs.com.

Example:

I noticed that over time, a number of domains have started blocking the Hydra crawler (or have questionable/broken web server configurations). I found a way to exclude these domains, but I'm not sure if this is a recommended or supported way, so I wanted to share it here for visibility, and so that others may discover it as well:

{
  "//": [
    "2023-05: twitter.com serves broken redirect-loop",
    "2024-06: npmjs.com responds HTTP 429 Too Many Requests too easily",
    "2024-07: element.io and gitter.im chat rooms render fine but oddly use HTTP 404",
    "2025-01: github.com responds HTTP 429 Too Many Requests for the bulk of trivial links. We can probably keep a few other GitHub links.",
    ""
  ],
  "exclude_scheme_prefixes": [
    "https://twitter.com/",
    "https://app.element.io/",
    "https://app.gitter.im/",
    "https://www.npmjs.com/package/",
    "https://github.com/qunitjs/qunit/issues/",
    "https://github.com/qunitjs/qunit/pull/",
    "https://github.com/qunitjs/qunit/blob/main/docs/_posts/"
  ]
}
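
For anyone wondering how such a list behaves, here is a minimal sketch of prefix-based exclusion. It assumes that each entry is compared against the start of the link URL; the `is_excluded` helper is hypothetical and is not taken from Hydra's source, so treat it as an illustration of the idea rather than the tool's actual logic.

```python
import json

# Hypothetical helper, not part of Hydra: assumes exclusion is a plain
# "URL starts with prefix" comparison.
def is_excluded(url: str, prefixes: list[str]) -> bool:
    """Return True if the URL starts with any of the excluded prefixes."""
    return url.startswith(tuple(prefixes))

# A trimmed-down copy of the config above, parsed the way a JSON config
# file would be.
config = json.loads("""
{
  "exclude_scheme_prefixes": [
    "https://twitter.com/",
    "https://www.npmjs.com/package/",
    "https://github.com/qunitjs/qunit/issues/"
  ]
}
""")
prefixes = config["exclude_scheme_prefixes"]

print(is_excluded("https://github.com/qunitjs/qunit/issues/1", prefixes))  # True
print(is_excluded("https://github.com/qunitjs/qunit", prefixes))           # False
```

Under that assumption, listing path prefixes instead of the bare `https://github.com/` host skips only the bulk issue and pull request links, while other GitHub links continue to be checked.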

Feel free to close this!
