Re: Junit - "Credible" HTML checker?

From:

Arne Vajhøj <arne@vajhoej.dk>

Newsgroups:

comp.lang.java.programmer

Date:

Fri, 07 Aug 2009 14:04:05 -0400

Message-ID:

<4a7c6c8a$0$303$14726298@news.sunsite.dk>

Jean-Baptiste Nizet wrote:

Tom Anderson wrote:

On Thu, 6 Aug 2009, bugbear wrote:

I have some routines that generate HTML;
it would be useful if (in my unit testing)
I had a quick and dirty "is this valid HTML" test.

I don't need an html renderer - something
cruddy based on "likely" looking regexps would
suit me very well.

I'm simply trying to avoid doing full deploy + interactive
testing of stuff (html) which isn't even "likely".

Does anyone know of anything?

The Rolls-Royce here is HtmlUnit, which is a complete headless browser
- it reads HTML, parses CSS, runs JavaScript (courtesy of Rhino), etc.
It has interfaces which make it easy to ask questions like "get me all
the div elements", "get me all the paragraph elements with class
errorReport", "get me the text content of this element", etc, which is
what you need for testing.

It's built on top of NekoHTML, which is a pretty decent HTML parser.
Other popular parsers are JTidy and TagSoup, but I think those are
more lenient in their parsing (Neko can be lenient, but tends more
towards strictness), and for what you want to do, you don't want
leniency.

Apologies for the lack of URLs, but you strike me as the kind of chap
who is quite capable of using Google!

The problem with HtmlUnit (in this particular case) is precisely that it
tries to work like a real browser, which means that it'll do its best to
give you a DOM tree even if the HTML is not valid at all.

If super strict parsing is needed, then generating XHTML and checking
it with a regular XML parser is an option.
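
A rough sketch of what that could look like with the JAXP parser that
ships with the JDK. The class and method names (XhtmlCheck,
isWellFormed) are made up for illustration. It only checks XML
well-formedness, not validity against the XHTML DTD, and because it
deliberately refuses to fetch external DTDs, input that relies on named
entities such as &nbsp; will be rejected.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper for unit tests; not part of JUnit or any library.
class XhtmlCheck {

    // Returns true if the markup parses as well-formed XML.
    static boolean isWellFormed(String xhtml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            DocumentBuilder builder = factory.newDocumentBuilder();

            // Assumption: we do not want network access in a unit test, so any
            // external DTD or entity resolves to an empty stream.
            builder.setEntityResolver(
                (publicId, systemId) -> new InputSource(new StringReader("")));

            // DefaultHandler rethrows fatal errors and stays quiet otherwise,
            // which avoids the parser's default error output on stderr.
            builder.setErrorHandler(new DefaultHandler());

            builder.parse(new InputSource(new StringReader(xhtml)));
            return true;
        } catch (SAXException e) {
            // Unclosed tags, mismatched nesting, unquoted attributes, etc.
            return false;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException("XML parser could not be used", e);
        }
    }
}
```

In a JUnit test this would be used along the lines of
assertTrue(XhtmlCheck.isWellFormed(generatedMarkup)), where
generatedMarkup is whatever string the routine under test produced.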

Arne