Re: regex capability

From: Robert Klemme <shortcutter@googlemail.com>
Newsgroups: comp.lang.java.programmer
Date: Tue, 5 Apr 2011 06:33:36 -0700 (PDT)
Message-ID: <29adb57e-c515-4859-910c-6393ac812fa6@1g2000yqq.googlegroups.com>

On 5 Apr., 14:28, Patricia Shanahan <p...@acm.org> wrote:

On 4/5/2011 2:10 AM, Paul Cager wrote:

On Apr 5, 2:35 am, markspace<-@.> wrote:

On 4/4/2011 1:13 PM, Robert Klemme wrote:

if ( m.matches() ) {
  // second step: pick the individual numbers out of group 1
  for (m = number.matcher(m.group(1)); m.find();) {
    int x = Integer.parseInt(m.group());
  }
}

Why re-invent the wheel?

In this case I just wanted to demonstrate the strategy of first checking
the overall validity of the input and extracting the interesting part,
and then ripping that interesting part apart. Whether a Scanner or
another Matcher is used for the second step wasn't that important to
me. Also, the thread is called "regex capability". :-)

But, of course, your approach using the Scanner is perfectly
compatible with the two-step strategy, as Patricia also pointed
out. :-)
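For anyone who wants to run the two-step idea end to end, here is a
minimal sketch. It assumes a master pattern that accepts the whole
line and captures the slash-separated list in group 1; the class name
TwoStepMatch, the method extract and the exact MASTER pattern are
hypothetical illustrations, not the pattern from the earlier post
(which isn't quoted here).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TwoStepMatch {
    // Hypothetical master pattern: the whole line must have this shape,
    // and group 1 captures the slash-separated list of numbers.
    private static final Pattern MASTER =
            Pattern.compile("Support DDR2 (\\d+(?:/\\d+)*) DDR2 SDRAM");
    // Pattern for a single number inside the captured part.
    private static final Pattern NUMBER = Pattern.compile("\\d+");

    static List<Integer> extract(String input) {
        List<Integer> result = new ArrayList<>();
        Matcher m = MASTER.matcher(input);
        // Step 1: validate the complete input with matches().
        if (m.matches()) {
            // Step 2: scan only group 1 for the individual numbers.
            for (m = NUMBER.matcher(m.group(1)); m.find();) {
                result.add(Integer.parseInt(m.group()));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(extract("Support DDR2 100/200/300/400 DDR2 SDRAM"));
        System.out.println(extract("something else entirely"));
    }
}
```

This prints [100, 200, 300, 400] followed by []. Unlike the
delimiter-only Scanner version, the "2" from "DDR2" is not picked up,
and input of the wrong shape yields nothing at all.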

import java.io.StringReader;
import java.util.Scanner;

public class ScannerTest {
    public static void main(String[] args) {
        StringReader in = new StringReader(
                "Support DDR2 100/200/300/400 DDR2 SDRAM");

        // Every run of non-digits is a delimiter, so only the
        // digit runs come back as tokens.
        Scanner scanner = new Scanner(in);
        scanner.useDelimiter( "[^0-9]+" );
        while( scanner.hasNextInt() ) {
            System.out.println( scanner.nextInt() );
        }
    }
}

(Lightly tested.)

$ java ScannerTest
2
100
200
300
400
2

This is a nice illustration of the case for a strategy I often use in
this sort of situation: combining tools, using each to do the job it
does best.

For example, a regular expression match could pull out the
"100/200/300/400" substring, and a Scanner could extract the integers
from that. More generally, the extracted substring could be split, and
each of the split results then processed some other way.
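A minimal sketch of that combination, a regex to isolate the
substring and a Scanner to read the integers from it, might look like
the following. The class name RegexThenScanner and the LIST pattern
(which simply takes the first run of slash-separated numbers) are
hypothetical choices for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexThenScanner {
    // Hypothetical pattern: the first run of at least two slash-separated numbers.
    private static final Pattern LIST = Pattern.compile("\\d+(?:/\\d+)+");

    static List<Integer> extract(String input) {
        List<Integer> result = new ArrayList<>();
        Matcher m = LIST.matcher(input);
        if (m.find()) {
            // Hand only the matched substring (e.g. "100/200/300/400") to a Scanner.
            try (Scanner scanner = new Scanner(m.group())) {
                scanner.useDelimiter("/");
                while (scanner.hasNextInt()) {
                    result.add(scanner.nextInt());
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(extract("Support DDR2 100/200/300/400 DDR2 SDRAM"));
    }
}
```

This prints [100, 200, 300, 400]; the lone "2" in "DDR2" is skipped
because the regex requires at least one "/" after a number.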

I generally prefer scanning over splitting in those cases. The
difference might be negligible here, but suppose the original pattern
changes (e.g. because we want to allow "@" as a separator instead of,
or in addition to, "/"). Then the split approach requires changing two
patterns, while scanning for integers (pattern \d+) requires changing
only the master pattern. Also, with scanning it is clear what I want
(the matched portion is defined positively), whereas with splitting it
is less clear (only what I do not want, the separator, is defined),
which leaves a lot of room for what is returned _between_ separators.
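To make the maintenance argument concrete, here is a small sketch
assuming the master pattern has already been widened to accept "/" or
"@". The names SplitVersusScan, MASTER, SEPARATOR, viaSplit and viaScan
are hypothetical. Note that SEPARATOR had to be edited along with
MASTER, while NUMBER did not.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SplitVersusScan {
    // Hypothetical master pattern, already widened to accept "/" or "@".
    private static final Pattern MASTER =
            Pattern.compile(".*?(\\d+(?:[/@]\\d+)+).*");
    // Split approach: this separator pattern must be kept in sync with MASTER.
    private static final String SEPARATOR = "[/@]";
    // Scan approach: the token pattern does not depend on the separators.
    private static final Pattern NUMBER = Pattern.compile("\\d+");

    static List<Integer> viaSplit(String list) {
        List<Integer> result = new ArrayList<>();
        for (String part : list.split(SEPARATOR)) {
            result.add(Integer.parseInt(part));
        }
        return result;
    }

    static List<Integer> viaScan(String list) {
        List<Integer> result = new ArrayList<>();
        for (Matcher m = NUMBER.matcher(list); m.find();) {
            result.add(Integer.parseInt(m.group()));
        }
        return result;
    }

    public static void main(String[] args) {
        Matcher m = MASTER.matcher("Support DDR2 100/200@300/400 DDR2 SDRAM");
        if (m.matches()) {
            String list = m.group(1); // "100/200@300/400"
            System.out.println(viaSplit(list));
            System.out.println(viaScan(list));
        }
    }
}
```

Both calls print [100, 200, 300, 400]. Had SEPARATOR been left at "/",
viaSplit would have produced the piece "200@300" and Integer.parseInt
would throw a NumberFormatException, whereas viaScan needs no change.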

Kind regards

robert