Java and HTML parsing.

From:

Mathias Mejborn <mathiasSNABELmejborn.dk>

Newsgroups:

comp.lang.java.programmer

Date:

Mon, 07 May 2007 16:49:09 +0200

Message-ID:

<463f3c03$0$931$edfadb0f@dread12.news.tele.dk>

Hello.

I am trying to write my first HTML parser in Java, but I have some
problems that I can't figure out how to solve.

The interesting method in my program looks like this:

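public void findHTML() {
    try {
        while (s != null) {
            if (s.indexOf("title=\"DR1\"") > -1) {
                System.out.println("DR1 fundet");   // "DR1 found"
                dr1Fundet = true;

                // Look for the marker on the same line; skip the line if it is absent.
                String marker = "style=\"margin:0px;\">";
                int pos = s.indexOf(marker);
                if (pos > -1 && pos + marker.length() + 5 <= s.length()) {
                    int start = pos + marker.length();
                    System.out.println("Udskriver start: " + start);   // "Printing start"

                    // substring takes (beginIndex, endIndex), not (beginIndex, length)
                    tid = s.substring(start, start + 5);
                    System.out.println("Udskriver tid" + tid);   // "Printing time"
                }
            }
            s = ind.readLine();
        }
    } catch (Exception e) {
        e.printStackTrace();   // report errors instead of silently swallowing them
    }
}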

(I hope the code turns out right when I post this.)

What I am trying to achieve is:

On the website http://ontv.dk/tv/1 I would like to parse the following HTML:

 Senere i dag på
DR1<table cellspacing="0" style="width:100%;"><tr
style="background-color:#eeeeee;"><td style="width:40px;
text-align:right;">17.00:</td><td><a href="/programinfo/11178550000">Troldspejlet

You can see the HTML block starting on line 159 of the HTML source and
ending on line 171.

What I want to extract from the HTML is: 17.00 followed by Troldspejlet.

My problem is that I can't figure out how to do this. I hope
some of you can help me out.
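
For reference, here is a rough sketch of the kind of extraction described above. It assumes the whole page has already been read into a single String, because the block spans several lines and a line-by-line readLine() check would not see the time and the title together. The class name ProgramExtractor, the method extractFirst, and the regular expression are illustrative assumptions based only on the snippet quoted in this post, not on the real structure of the ontv.dk page. A proper HTML parser would likely be more robust than a regex.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; names and pattern are assumptions for illustration only.
class ProgramExtractor {
    // Matches a time such as "17.00" followed by ":</td>", then the text of the
    // link in the next table cell. Whitespace (including newlines) between tags
    // is tolerated.
    private static final Pattern ENTRY = Pattern.compile(
        ">\\s*(\\d{1,2}\\.\\d{2}):\\s*</td>\\s*<td>\\s*<a\\s+href=\"[^\"]*\">\\s*([^<]+)");

    /** Returns {time, title} for the first match, or null if nothing matches. */
    static String[] extractFirst(String html) {
        Matcher m = ENTRY.matcher(html);
        if (!m.find()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2).trim() };
    }

    public static void main(String[] args) {
        // Sample input copied from the snippet in this post, with a closing </a> added.
        String html = "<td style=\"width:40px;\ntext-align:right;\">17.00:</td>"
            + "<td><a href=\"/programinfo/11178550000\">Troldspejlet</a>";
        String[] entry = extractFirst(html);
        if (entry != null) {
            System.out.println(entry[0] + " " + entry[1]);   // prints "17.00 Troldspejlet"
        }
    }
}
```

To reuse the existing reader, the lines returned by ind.readLine() could be appended to a StringBuilder first and the resulting String passed to extractFirst.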