Re: Parsing XML schema- variable attributes

From: ram@zedat.fu-berlin.de (Stefan Ram)
Newsgroups: comp.lang.java.programmer
Date: 20 Sep 2008 01:09:12 GMT
Message-ID: <multiattributes-XML-20080920030633@ram.dialup.fu-berlin.de>

Mike <mikes3959@yahoo.com> writes:

> So, there's absolutely no way to parse attributes with same name?

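  Not in XML: an attribute name must not appear more than once
  in the same start-tag or empty-element tag (the well-formedness
  constraint "Unique Att Spec"), so a conforming parser rejects it.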

  But an attribute declared with type IDREFS can hold several
  ID references in one value, separated by whitespace.

http://www.w3.org/TR/2000/REC-xml-20001006.html#idref

  (see there for "IDREFS")
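  The parser does not split such a value for you. The following
  is a minimal sketch using the JDK's DOM parser; to keep it short
  it uses no DTD, so the attribute names "id" and "owners", the
  hand-built ID index and the class name are hypothetical stand-ins
  for a declared ID/IDREFS pair (a validating parser with a DTD
  would additionally check that every token names an existing ID).

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

final class IdrefsDemo {
    // Hypothetical sample: "owners" plays the role of an IDREFS attribute.
    static final String SAMPLE =
        "<garden>"
      + "<person id=\"jack\">Jack</person>"
      + "<person id=\"jill\">Jill</person>"
      + "<rose owners=\"jack jill\"/>"
      + "</garden>";

    // Resolves each token of the rose's "owners" value to the text of
    // the element whose "id" attribute equals that token.
    static List<String> ownerNames(final String xml) {
        final Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        } catch (final ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("cannot parse input", e);
        }

        // Without a DTD the parser knows no ID type, so index "id" by hand.
        final Map<String, Element> byId = new HashMap<>();
        final NodeList all = doc.getElementsByTagName("*");
        for (int i = 0; i < all.getLength(); i++) {
            final Element e = (Element) all.item(i);
            if (e.hasAttribute("id")) {
                byId.put(e.getAttribute("id"), e);
            }
        }

        // One attribute value, several references separated by whitespace.
        final Element rose = (Element) doc.getElementsByTagName("rose").item(0);
        final List<String> names = new ArrayList<>();
        final String value = rose.getAttribute("owners").trim();
        if (value.isEmpty()) {
            return names;
        }
        for (final String ref : value.split("\\s+")) {
            final Element target = byId.get(ref);
            if (target == null) {
                throw new IllegalArgumentException("dangling reference: " + ref);
            }
            names.add(target.getTextContent());
        }
        return names;
    }

    public static void main(final String[] args) {
        System.out.println(ownerNames(SAMPLE)); // [Jack, Jill]
    }
}
```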

  I have specified and implemented a data language, "Unotal",
  that does handle multiple values with the same attribute name
  directly. For example:

import java.lang.String;
import java.lang.System;
import de.dclj.ram.notation.unotal.Room;
import static de.dclj.ram.notation.unotal.RoomFromModule.room;

public final class Main
{ public static void main( final String argv[] )
  { // one value for "a": get returns that value itself
    System.out.println( room( "< a=b >" ).get( "a" ));
    System.out.println( room( "< a=b >" ).get( "a" ).getClass() );

    // two values for "a": get returns both of them
    System.out.println( room( "< a=b a=c >" ).get( "a" ));
    System.out.println( room( "< a=b a=c >" ).get( "a" ).getClass() );

    // getValues always returns a collection, even if "a" is absent
    System.out.println( room( "< >" ).getValues( "a" ));
    System.out.println( room( "< >" ).getValues( "a" ).getClass() );

    System.out.println( room( "< a=b >" ).getValues( "a" ));
    System.out.println( room( "< a=b >" ).getValues( "a" ).getClass() );

    // the repeated pair "a=b" is reported only once
    System.out.println( room( "< a=b a=b >" ).getValues( "a" ));
    System.out.println( room( "< a=b a=b >" ).getValues( "a" ).getClass() );

    System.out.println( room( "< a=b a=c >" ).getValues( "a" ));
    System.out.println( room( "< a=b a=c >" ).getValues( "a" ).getClass() ); }}

System.out

b
class de.dclj.ram.notation.unotal.StringValue

[b, c]
class de.dclj.ram.notation.unotal.SprayValue

[]
class de.dclj.ram.notation.unotal.SprayValue

[b]
class java.util.HashSet

[b]
class java.util.HashSet

[b, c]
class java.util.HashSet

  For more about this:

http://www.purl.org/stefan_ram/pub/junotal_tutorial

  I have written a critique of XML, but it does not yet take
  into account the possibility of using an IDREFS attribute.
  The rest of it is still valid, so the remainder of this post
  is that critique.

  When defining a new document type, when should one choose
  child elements and when attributes?

  The criterion that makes sense semantically cannot be used
  in XML because of its syntactic restrictions.

  An element describes something. A description is an
  assertion, and an assertion may contain unary predicates or
  binary relations.

  When one compares this structure of assertions with the
  structure of XML, it seems natural to represent unary
  predicates with element types and binary relations with
  attributes.

  Say "x" is a rose and belongs to Jack. This assertion can
  be written more formally to show the relations used:

rose( x ) ^ owner( x, Jack )

  This is written in XML as:

<rose owner="Jack" />

  Thus, my answer would be: use element types for unary
  predicates and attributes for binary relations.
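  As a rough illustration of this rule, here is a minimal sketch
  using the JDK's DOM and javax.xml.transform APIs: the unary
  predicate becomes the element type and each binary relation
  becomes an attribute. The class and method names are
  hypothetical. Note that a Map from relation name to value can
  hold only one value per name, which mirrors the XML restriction
  described next.

```java
import java.io.StringWriter;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

final class PredicateToXml {
    // type: the unary predicate, e.g. "rose" for rose( x );
    // relations: binary relations of x, e.g. owner -> Jack for owner( x, Jack ).
    static String toXml(final String type, final Map<String, String> relations) {
        try {
            final Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().newDocument();
            final Element element = doc.createElement(type);
            for (final Map.Entry<String, String> r : relations.entrySet()) {
                element.setAttribute(r.getKey(), r.getValue());
            }
            doc.appendChild(element);

            // Serialize the element without an XML declaration.
            final Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            final StringWriter out = new StringWriter();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (final ParserConfigurationException | TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(final String[] args) {
        final Map<String, String> relations = new LinkedHashMap<>();
        relations.put("owner", "Jack");
        System.out.println(toXml("rose", relations));
    }
}
```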

  Unfortunately, this is not always possible, because in XML:

    - an element may have at most one type,

    - an attribute name may have at most one value per
      element, and

    - attribute values cannot be structured.
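  The one-value-per-name rule also shows up in the DOM API: per
  the DOM specification, Element.setAttribute changes the value
  of an attribute that is already present instead of adding a
  second one. A minimal sketch with the JDK's DOM implementation
  (the class and method names are hypothetical):

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Element;

final class SetAttributeDemo {
    // Tries to give one rose two owners through the attribute "owner".
    static Element roseWithTwoOwners() {
        try {
            final Element rose = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().newDocument().createElement("rose");
            rose.setAttribute("owner", "Jack");
            rose.setAttribute("owner", "Jill"); // replaces "Jack"
            return rose;
        } catch (final ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(final String[] args) {
        final Element rose = roseWithTwoOwners();
        System.out.println(rose.getAttributes().getLength()); // 1
        System.out.println(rose.getAttribute("owner"));       // Jill
    }
}
```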

  Therefore, the designers of XML document types are forced to
  abuse element /types/ in order to describe the /relation/
  of an element to its parent element.

  This /is/ an abuse, because the designation "element type"
  is obviously meant to give the /type of an element/, i.e.,
  a property that is intrinsic to the element alone and has
  nothing to do with its relation to other elements.

  Document type designers, however, are forced to commit this
  abuse, poorly reinventing the missing structured attribute
  values with the means XML provides. If a rose has two
  owners, the following element is not allowed in XML:

<rose owner="Jack" owner="Jill" />
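  A conforming parser treats this as a fatal well-formedness
  error. The following minimal sketch checks it with the JDK's
  DOM parser; the class and method names are hypothetical, and
  the error handler only rethrows so that nothing is printed to
  the console.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

final class DuplicateAttributeDemo {
    // Returns true if the JDK parser accepts the text as well-formed XML.
    static boolean isWellFormed(final String xml) {
        try {
            final DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // Rethrow problems instead of printing "[Fatal Error]" to stderr.
            builder.setErrorHandler(new ErrorHandler() {
                public void warning(final SAXParseException e) { }
                public void error(final SAXParseException e) throws SAXException { throw e; }
                public void fatalError(final SAXParseException e) throws SAXException { throw e; }
            });
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (final SAXException e) {
            return false; // not well-formed
        } catch (final ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e); // setup problem, not a syntax error
        }
    }

    public static void main(final String[] args) {
        System.out.println(isWellFormed("<rose owner=\"Jack\" />"));               // true
        System.out.println(isWellFormed("<rose owner=\"Jack\" owner=\"Jill\" />")); // false
    }
}
```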

  Instead, one is forced to use representations such as the
  following:

<rose>
  <owner>Jack</owner>
  <owner>Jill</owner></rose>
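  With this workaround, a consumer collects the owners by
  element name instead of by attribute name. A minimal sketch
  using the JDK's DOM parser (the class and method names are
  hypothetical):

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

final class ChildOwnersDemo {
    // Collects the text of every <owner> element, in document order.
    static List<String> owners(final String xml) {
        final Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        } catch (final ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("cannot parse input", e);
        }
        final NodeList nodes = doc.getElementsByTagName("owner");
        final List<String> result = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            result.add(nodes.item(i).getTextContent());
        }
        return result;
    }

    public static void main(final String[] args) {
        System.out.println(owners(
            "<rose>\n  <owner>Jack</owner>\n  <owner>Jill</owner></rose>")); // [Jack, Jill]
    }
}
```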

  Here the notion "element type" suggests that Jack is marked
  as "an owner", in the sense that "owner" is the type (the
  kind) of Jack: not an "owner of ..." (which would make
  sense), but just "an owner".

  The intention of the author, however, is that "owner" gives
  the /relation/ to the containing element "rose". This is the
  natural field of application for attributes, as the meaning
  of the word "attribute" outside of XML clearly indicates,
  but in XML attributes cannot always be used for this
  purpose.

  An alternative solution is the following notation:

<rose owner="Jack Jill" />

  Here a /new/ mini language (no longer XML) is used within
  an attribute value, which, of course, can no longer be
  checked by XML validators. This is actually done, for
  example, in XHTML, where classes are written this way.
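  From the parser's point of view, such a value is a single
  string; splitting it is left to the application. A minimal
  sketch with the JDK's DOM parser, assuming the tokens are
  separated by whitespace as with XHTML's class attribute (the
  class and method names are hypothetical):

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

final class TokenListDemo {
    // The parser reports one "owner" attribute whose value is the whole string.
    static String ownerValue(final String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement().getAttribute("owner");
        } catch (final ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("cannot parse input", e);
        }
    }

    // The "mini language": tokens separated by runs of whitespace.
    static List<String> tokens(final String value) {
        final String trimmed = value.trim();
        if (trimmed.isEmpty()) {
            return new ArrayList<>();
        }
        return Arrays.asList(trimmed.split("\\s+"));
    }

    public static void main(final String[] args) {
        final String value = ownerValue("<rose owner=\"Jack Jill\" />");
        System.out.println(value);         // Jack Jill
        System.out.println(tokens(value)); // [Jack, Jill]
    }
}
```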

  So in XHTML, its most prominent XML application, the W3C
  has to abandon XML even to write class attributes. This
  is not much of an accomplishment, given that the W3C was
  able to draw on its experience with SGML and HTML when
  designing XML.

  The needless restrictions of XML inhibit the meaningful
  use of syntax. They leave many document type designers
  wondering when to use attributes and when to use elements,
  which is evidence of a failure in the design of XML: XML
  has hardly any notations besides these two, attributes and
  elements, and yet the W3C failed to give even these two
  notations a clear and meaningful purpose!

  Without the restrictions described, XML alone would have
  nearly the expressive power of RDF/XML, which has to
  painfully repair some of the errors made in the design of
  XML.

  Now, some "experts" recommend /always/ using subelements,
  because one can never know whether an attribute value
  that seems unstructured today might need to become
  structured tomorrow. Other "experts" recommend using
  attributes only when one is quite confident that they
  will never need to be structured. This recommendation
  does not even try to make sense of attributes, but
  merely explains how to circumvent the obstacles the W3C
  has built into XML.

  Others recommend using attributes for something they call
  "metadata". They ignore that this restricts "metadata" to
  unstructured values.

  Others use an XML editor that happens to make entering
  attributes more convenient than entering elements, and
  therefore seriously suggest using as many attributes as
  possible.

  Still others have studied how to format XML documents with
  CSS and base their recommendations about when to use
  attributes and when to use subelements on that, so that the
  resulting document can be formatted as easily as possible
  with CSS.

  Of course, mixing all these criteria (structured vs.
  unstructured, data vs. "metadata", ease of formatting with
  CSS, ease of editing, ...) will often give conflicting
  recommendations.

  Certain notations other than XML have solved the problem
  either by omitting attributes altogether or by allowing
  structured attributes.