Skip to content
Oeiuwq Faith Blog OpenSource Porfolio

mgdm/htmlq

Like jq, but for HTML.

mgdm/htmlq.json
{
"createdAt": "2019-05-07T20:55:20Z",
"defaultBranch": "master",
"description": "Like jq, but for HTML.",
"fullName": "mgdm/htmlq",
"homepage": null,
"language": "Rust",
"name": "htmlq",
"pushedAt": "2024-05-29T03:40:49Z",
"stargazersCount": 7449,
"topics": [],
"updatedAt": "2025-11-24T08:17:59Z",
"url": "https://github.com/mgdm/htmlq"
}

Like jq, but for HTML. Uses CSS selectors to extract bits of content from HTML files.

Terminal window
cargo install htmlq
Terminal window
pkg install htmlq
Terminal window
brew install htmlq
Terminal window
scoop install htmlq
Terminal window
$ htmlq -h
htmlq 0.4.0
Michael Maclean <michael@mgdm.net>
Runs CSS selectors on HTML
USAGE:
htmlq [FLAGS] [OPTIONS] [--] [selector]...
FLAGS:
-B, --detect-base Try to detect the base URL from the <base> tag in the document. If not found, default to
the value of --base, if supplied
-h, --help Prints help information
-w, --ignore-whitespace When printing text nodes, ignore those that consist entirely of whitespace
-p, --pretty Pretty-print the serialised output
-t, --text Output only the contents of text nodes inside selected elements
-V, --version Prints version information
OPTIONS:
-a, --attribute <attribute> Only return this attribute (if present) from selected elements
-b, --base <base> Use this URL as the base for links
-f, --filename <FILE> The input file. Defaults to stdin
-o, --output <FILE> The output file. Defaults to stdout
-r, --remove-nodes <SELECTOR>... Remove nodes matching this expression before output. May be specified multiple
times
ARGS:
<selector>... The CSS expression to select [default: html]
$

Using with cURL to find part of a page by ID

Section titled “Using with cURL to find part of a page by ID”
Terminal window
$ curl --silent https://www.rust-lang.org/ | htmlq '#get-help'
<div class="four columns mt3 mt0-l" id="get-help">
<h4>Get help!</h4>
<ul>
<li><a href="https://doc.rust-lang.org">Documentation</a></li>
<li><a href="https://users.rust-lang.org">Ask a Question on the Users Forum</a></li>
<li><a href="http://ping.rust-lang.org">Check Website Status</a></li>
</ul>
<div class="languages">
<label class="hidden" for="language-footer">Language</label>
<select id="language-footer">
<option title="English (US)" value="en-US">English (en-US)</option>
<option title="French" value="fr">Français (fr)</option>
<option title="German" value="de">Deutsch (de)</option>
</select>
</div>
</div>
Terminal window
$ curl --silent https://www.rust-lang.org/ | htmlq --attribute href a
/
/tools/install
/learn
/tools
/governance
/community
https://blog.rust-lang.org/
/learn/get-started
https://blog.rust-lang.org/2019/04/25/Rust-1.34.1.html
https://blog.rust-lang.org/2018/12/06/Rust-1.31-and-rust-2018.html
[...]
Terminal window
$ curl --silent https://nixos.org/nixos/about.html | htmlq --text .main
About NixOS
NixOS is a GNU/Linux distribution that aims to
improve the state of the art in system configuration management. In
existing distributions, actions such as upgrades are dangerous:
upgrading a package can cause other packages to break, upgrading an
entire system is much less reliable than reinstalling from scratch,
you can’t safely test what the results of a configuration change will
be, you cannot easily undo changes to the system, and so on. We want
to change that. NixOS has many innovative features:
[...]

There’s a big SVG image in this page that I don’t need, so here’s how to remove it.

Terminal window
$ curl --silent https://nixos.org/ | ./target/debug/htmlq '.whynix' --remove-nodes svg
<ul class="whynix">
<li>
<h2>Reproducible</h2>
<p>
Nix builds packages in isolation from each other. This ensures that they
are reproducible and don't have undeclared dependencies, so <strong>if a
package works on one machine, it will also work on another</strong>.
</p>
</li>
<li>
<h2>Declarative</h2>
<p>
Nix makes it <strong>trivial to share development and build
environments</strong> for your projects, regardless of what programming
languages and tools you’re using.
</p>
</li>
<li>
<h2>Reliable</h2>
<p>
Nix ensures that installing or upgrading one package <strong>cannot
break other packages</strong>. It allows you to <strong>roll back to
previous versions</strong>, and ensures that no package is in an
inconsistent state during an upgrade.
</p>
</li>
</ul>

(This is a bit of a work in progress)

Terminal window
$ curl --silent https://mgdm.net | htmlq --pretty '#posts'
<section id="posts">
<h2>I write about...
</h2>
<ul class="post-list">
<li>
<time datetime="2019-04-29 00:%i:1556496000" pubdate="">
29/04/2019</time><a href="/weblog/nettop/">
<h3>Debugging network connections on macOS with nettop
</h3></a>
<p>Using nettop to find out what network connections a program is trying to make.
</p>
</li>
[...]
Terminal window
$ curl --silent example.com | htmlq 'body' | bat --language html
Syntax highlighted output