Re: Help with regular expression
G Winstanley wrote:
On 20 Aug 2006 22:31:03 -0700, the cup of "hiwa" <HGA03630@nifty.ne.jp>
overfloweth with the following:
Any clearer?
No. Is the *only* requirement for XXX and YYY that they are not HTML
comments?
Then, what should we do with XXX and YYY?
Please formalize the requirement.
Or, show typical examples of input texts and the desired results from them.
Yes, that is the only requirement, that it should not contain HTML
comments. For example, given:
<tr><td class="caption"><!--@caption--><!--<br />(@caption)--></td></tr>
the appropriate regex should produce a match for this:
<!--<br />(@caption)-->
and this should NOT match:
<!--@caption--><!--<br />(@caption)-->
which is the current problem.
Similarly, the text:
foo<!--<a href="#bar">(@caption)</a>-->wibble
should provide a match for:
<!--<a href="#bar">(@caption)</a>-->
but this should NOT give any matches:
foo<!--<a href="#bar"><!--Link-->(@caption)</a>-->wibble
Stan
If the problem is complex, it is better not
to try to devise one single complex regular
expression. A divide-and-conquer approach
is much better, both for ease of development
and for readability of the code.
--------------------------------------------------------------------------
import java.util.regex.*;
import java.util.*;

public class Grost2{
  public static void main(String[] args){
    String[] texts = {
      "<!--<br />(@caption)-->",
      "<!--@caption--><!--<br />(@caption)-->",
      "<!--<a href=\"#bar\">(@caption)</a>-->",
      "<!--<a href=\"#bar\"><!--Link-->(@caption)</a>-->"
    };
    ArrayList<String> as;
    for (String s : texts){
      as = getMatches(s);
      if (as != null){
        for (String m : as){
          System.out.println(m);
        }
      }
      else { // null List
        System.out.println("--invalid input text--");
      }
      System.out.println();
    }
  }

  static ArrayList<String> getMatches(String text){
    String comreg = "(.*?)<!--.*?-->(.*?)";
    ArrayList<String> m = new ArrayList<String>();
    Pattern pat = Pattern.compile(comreg);
    String[] subs = text.split("\\(@caption\\)");
    Matcher[] mats = new Matcher[subs.length];
    for (int i = 0; i < subs.length; ++i){ // make matcher for each substr
      mats[i] = pat.matcher(subs[i]);
    }
    for (int i = 0; i < subs.length; ++i){ // do matching
      if (mats[i].find()){ // found HTML comment
        return null; // error value
      }
    }
    for (int i = 0; i < subs.length; ++i){
      m.add("(" + (i + 1) + ") " + trim(subs[i])); // prepend (n) to substr
    }
    return m;
  }

  static String trim(String s){ // trim <!-- and --> from split substr
    if (s.endsWith("-->")){
      return s.substring(0, s.length() - 3);
    }
    else if (s.startsWith("<!--")){
      return s.substring(4);
    }
    else{
      return s;
    }
  }
}
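
For comparison, here is a rough sketch of another two-step split of the
same problem, one that returns the whole comment as in Stan's examples.
It is not the code above and makes some assumptions: the class name
CaptionComments and the method findCaptionComments are made up for
illustration, and "no nested comment" is taken to mean that the comment
body contains neither "<!--" nor "-->". Step 1 finds such innermost
comments with a fairly simple pattern; step 2 keeps those whose body
contains the literal text (@caption).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, named for illustration only.
class CaptionComments {
    // Step 1: an "innermost" comment, i.e. <!-- ... --> whose body
    // contains neither "<!--" nor "-->". The negative lookahead is
    // checked before each body character is consumed.
    private static final Pattern INNERMOST =
        Pattern.compile("<!--((?:(?!<!--|-->).)*)-->", Pattern.DOTALL);

    static List<String> findCaptionComments(String text) {
        List<String> result = new ArrayList<String>();
        Matcher m = INNERMOST.matcher(text);
        while (m.find()) {
            // Step 2: keep only comments whose body mentions the marker.
            if (m.group(1).contains("(@caption)")) {
                result.add(m.group());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String[] texts = {
            "<tr><td class=\"caption\"><!--@caption--><!--<br />(@caption)--></td></tr>",
            "foo<!--<a href=\"#bar\">(@caption)</a>-->wibble",
            "foo<!--<a href=\"#bar\"><!--Link-->(@caption)</a>-->wibble"
        };
        for (String text : texts) {
            System.out.println(text + " -> " + findCaptionComments(text));
        }
    }
}
```

With the three texts from the thread, this should give one match each
for the first two and none for the third, since the outer comment there
contains a nested <!--Link--> and the inner comment has no (@caption).
Whether that is the right reading of the requirement is for Stan to
confirm.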