Re: Regular Expression to match the domain part of an email address
emzyme20@hotmail.com writes:
Er, no. It's made optional by adding a ? at the end, like so:
([\\*\\w\\-]{0,61}[\\*\\w])?
Your original expression contained two of these already, so I
thought you knew this. Other optional expressions are E* and
E{0,61}, but they are also repeatable.
Heh, thanks for that. I inherited this particular piece of code and I'm
trying to diagnose and fix a few problems that have been reported
since it was written. When I sat down with the expression and split
it into sections following a guide I was using, the guide stated that ?
stood for "1 or more times", which is why I never noticed.
Ok, here are some suggestions. First, if the guide really says ?
stands for one or more, don't trust it: ? means zero or one, and + is
the one-or-more quantifier. Sun's documentation for
java.util.regex.Pattern is actually rather good:
<http://java.sun.com/j2se/1.5.0/docs/api/java/util/regex/Pattern.html>
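A quick way to check what a quantifier really does is to try it. The small class below (the name QuantifierDemo and its helper whole() are made up for this post) runs ?, +, * and {0,2} against the same inputs using Pattern.matches, which only succeeds when the whole input matches. If the documentation is right, a? accepts the empty string and "a" but not "aa", while a+ rejects the empty string.

```java
import java.util.regex.Pattern;

// Compares ?, +, * and {0,2} on the same atom. Pattern.matches() requires
// the entire input to match, so each call answers "is the whole string
// accepted by this pattern?".
class QuantifierDemo {
    // True if the whole of `input` matches `regex`.
    static boolean whole(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        String[] regexes = { "a?", "a+", "a*", "a{0,2}" };
        String[] inputs  = { "", "a", "aa", "aaa" };
        for (String r : regexes) {
            StringBuilder line = new StringBuilder(r + ":");
            for (String input : inputs) {
                line.append(" \"").append(input).append("\"=").append(whole(r, input));
            }
            System.out.println(line);
        }
    }
}
```

It prints one line per pattern; the a? line is the one to hold up against the guide.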
Second, and this is my main point, this pattern is a good candidate
for abstraction because it contains a repeated sub-pattern. Tame it
by giving that sub-pattern a name. I do this below, starting from
Robert Klemme's pattern and adding the `*' that you want.
Third, I'm not convinced that you need to bother with {0,61}. That 61
is so large that I would just use *.
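To see what the bound actually buys, here is a sketch comparing a single-label pattern written with {0,61} against the same pattern with *. The character class is adapted from your expression; the leading [\w*] is my assumption about what precedes that group in your full pattern, and the class name BoundDemo is invented. With one leading and one trailing character, {0,61} presumably caps a label at 63 characters, which is the DNS label limit, and long labels are the only place where the two versions disagree.

```java
// Hypothetical single-label patterns built from the thread's character
// class: one uses the counted repeat {0,61}, the other a plain *.
class BoundDemo {
    static final String BOUNDED   = "[\\w*](?:[\\w*-]{0,61}[\\w*])?";
    static final String UNBOUNDED = "[\\w*](?:[\\w*-]*[\\w*])?";

    // Builds a label of n copies of 'a' (a loop, so it also runs on JDKs
    // older than 11, which lack String.repeat).
    static String label(int n) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < n; i++) {
            sb.append('a');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        int[] lengths = { 7, 63, 64 };
        for (int n : lengths) {
            String s = label(n);
            System.out.println(n + ": bounded=" + s.matches(BOUNDED)
                    + " unbounded=" + s.matches(UNBOUNDED));
        }
    }
}
```

A 7- or 63-character label matches both; a 64-character label matches only the * version. If that limit matters to you, a separate length check on each label is easier to read than a counted repeat.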
Fourth, when a single expression becomes unwieldy, you may be able to
write separate tests: for example, one test to check that only the
allowed characters are used, and another to check that the input
starts and ends properly.
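As a sketch of that fourth point, here is a version built from several small tests instead of one big expression. The rules are my assumption, chosen to mirror the word/words pattern further down: only a-z, 0-9, `-', `*' and `.', at least two labels, and each label starting with a letter or `*' and ending with a letter, digit or `*'. SeparateChecks and its methods are made-up names.

```java
// Several simple tests instead of one large regex. The rules are
// assumptions mirroring the word/words pattern in this post.
class SeparateChecks {
    // A label may start with a-z or '*'.
    static boolean isFirst(char c) {
        return (c >= 'a' && c <= 'z') || c == '*';
    }

    // A label may end with a-z, 0-9 or '*'.
    static boolean isLast(char c) {
        return isFirst(c) || (c >= '0' && c <= '9');
    }

    static boolean looksLikeDomain(String s) {
        // Test 1: only allowed characters are used.
        if (!s.matches("[a-z0-9*.-]+")) return false;
        // Test 2: at least two labels. The -1 limit keeps trailing
        // empty strings, so "example.com." yields an empty last label.
        String[] labels = s.split("\\.", -1);
        if (labels.length < 2) return false;
        // Test 3: every label is non-empty and starts and ends properly.
        for (String label : labels) {
            if (label.isEmpty()) return false;
            if (!isFirst(label.charAt(0))) return false;
            if (!isLast(label.charAt(label.length() - 1))) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        for (String a : args) {
            System.out.println(a + " -> " + looksLikeDomain(a));
        }
    }
}
```

Note the -1 limit passed to split(): without it, Java drops trailing empty strings, so "example.com." would slip past the empty-label test.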
Consider this:
class Roska {
    public static void main(String[] args) {
        // Wrapping `word' in (?: ) is a redundant safety measure
        // here, but it matters a lot if `word' is ever followed
        // by a quantifier or something similar.
        String word = "(?:[a-z*](?:[a-z0-9\\-*]*[a-z0-9*])?)";
        // One or more words each followed by a period, then a final word.
        String words = "(?:" + word + "[.])+" + word;
        for (int k = 0; k < args.length; ++k) {
            System.out.println(args[k].matches(words));
        }
    }
}
It seems to work. One or more words ending in a period, and then one
more word, where a word starts with ...
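If you would rather not pass test strings on the command line, the same construction can be run over a fixed list. The sample inputs and the class name RoskaSamples are just illustrations. Going by the pattern, the first three should match and the rest should not: a single word, a trailing period, a leading `-' and an upper-case letter are all rejected.

```java
// Same word/words construction as Roska, run over fixed sample inputs
// (chosen for illustration) instead of command-line arguments.
class RoskaSamples {
    static final String WORD  = "(?:[a-z*](?:[a-z0-9\\-*]*[a-z0-9*])?)";
    static final String WORDS = "(?:" + WORD + "[.])+" + WORD;

    public static void main(String[] args) {
        String[] samples = {
            "example.com",      // two words
            "*.example.com",    // wildcard as the first word
            "mail2.example.co", // digit at the end of a word
            "example",          // only one word
            "example.com.",     // ends with a period
            "-example.com",     // word starts with '-'
            "Example.com"       // upper case is not in [a-z]
        };
        for (String s : samples) {
            System.out.println(s + " -> " + s.matches(WORDS));
        }
    }
}
```

The upper-case rejection comes from using [a-z] rather than the \w in your original expression; if you need mixed case, prefixing the pattern with (?i) or compiling it with Pattern.CASE_INSENSITIVE should handle that.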
I'm not sure whether the `-' needs escaping in that character class,
and Sun does not say so explicitly. It appears to work with or without.
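The Javadoc does state that a backslash may be used before any non-alphabetic character, so \- is always allowed even where it is not needed. A cheap way to settle the rest is to compare the spellings directly. This sketch (HyphenDemo is an invented name) compiles the escaped class, the unescaped one, and a variant with `-' moved to the end, then probes each with `-', `*', `5', and two characters, `+' and `,', that no variant should accept.

```java
import java.util.regex.Pattern;

// Three spellings of the "middle of a word" class from Roska:
// hyphen escaped, hyphen unescaped right after the 0-9 range,
// and hyphen moved to the end of the class.
class HyphenDemo {
    static final String[] CLASSES = {
        "[a-z0-9\\-*]", // regex [a-z0-9\-*]
        "[a-z0-9-*]",   // unescaped, right after a range
        "[a-z0-9*-]"    // unescaped, last in the class
    };

    public static void main(String[] args) {
        String[] probes = { "-", "*", "5", "+", "," };
        for (String cls : CLASSES) {
            Pattern p = Pattern.compile(cls);
            StringBuilder line = new StringBuilder(cls + ":");
            for (String s : probes) {
                line.append(' ').append(s).append('=').append(p.matcher(s).matches());
            }
            System.out.println(line);
        }
    }
}
```

As far as I can tell from how Java parses classes, a `-' that directly follows a completed range, or sits last in the class, is taken literally, so all three lines should agree. Putting the hyphen first or last is the usual way to sidestep the question entirely.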