Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unmarshalling hexBinary with leading whitespaces yields null #312

Open
netmikey opened this issue Jun 14, 2024 · 8 comments · May be fixed by #313
Open

Unmarshalling hexBinary with leading whitespaces yields null #312

netmikey opened this issue Jun 14, 2024 · 8 comments · May be fixed by #313

Comments

@netmikey
Copy link

netmikey commented Jun 14, 2024

We have the following situation:

Our XSD defines an element as hexBinary as follows:

<xs:element name="atr" type="xs:hexBinary">
</xs:element>

We got XML documents that contain this (line break intended!):

...
  <atr>
    6f1087c8105312e302e30820408a00005001588306312e3021000000a5049f1c611a4d02156501ff</atr>
...

Note that some XML editor probably introduced the carriage return / newline and formatting spaces within the atr element in an attempt to "pretty print" the file.

When we validate this file against the XSD, the validation passes.

However, when we unmarshal it, the atr byte array that should contain the data is set to null. While debugging, we found that the HexBinaryAdapter which is generated into the Java bindings like this:

    @XmlElement(required = true, type = String.class)
    @XmlJavaTypeAdapter(HexBinaryAdapter.class)
    @XmlSchemaType(name = "hexBinary")
    protected byte[] atr;

... obtains the full value of the element, including the line breaks and the leading spaces.

public byte[] unmarshal(String s) {
if(s==null) return null;
return DatatypeConverter.parseHexBinary(s);
}

It does not trim those non-hex characters and passes the value on to DatatypeConverter#parseHexBinary, which seems to assume only hex characters:

public byte[] parseHexBinary(String s) {
final int len = s.length();
// "111" is not a valid hex encoding.
if (len % 2 != 0) {
throw new IllegalArgumentException("hexBinary needs to be even-length: " + s);
}

This IllegalArgumentException gets thrown and then somewhere down the line apparently dropped in favor of simply setting the field to null.

The inconvenient part here is that the JAXB unmarshalling doesn't seem to be compliant with the XSD validation: our validation passes but the application still fails afterwards because of this.

@antoniosanct
Copy link
Contributor

@netmikey
This snippet works in jaxb-api

@Test
public void testHexBinary() {
    final String originStr = "6f1087c8105312e302e30820408a00005001588306312e3021000000a5049f1c611a4d02156501ff";
        //\u006f\u0010\ufffd\ufffd\u0010\u0053\u0012\ufffd\u0002\ufffd\u0008\u0020\u0040\ufffd\u0000\u0000\u0050\u0001\u0058\ufffd\u0006\u0031\u002e\u0030\u0021\u0000\u0000\u0000\ufffd\u0004\ufffd\u001c\u0061\u001a\u004d\u0002\u0015\u0065\u0001\ufffd
    final byte[] decoded = DatatypeConverter.parseHexBinary(originStr);
    final String encoded = DatatypeConverter.printHexBinary(decoded);
    Assert.assertEquals(originStr, encoded.toLowerCase());
}

Maybe is a real bug in eclipse-ee4j/jaxb-ri ??

Regards,
Antonio.

@netmikey
Copy link
Author

@antoniosanct

Not sure I was able to make my point clear...

To have your test reproduce what actually happens at runtime, it should test HexBinaryAdapter (this in turn calls DataTypeConcerter) and most importantly include the newline and whitespace content of the element like this (binary content shortened for readability):

@Test
public void testHexBinary() {
    final String originStr = "\n    6f1087c8105312e302e30820408a00005001588306312e";
    // the following fails:
    final byte[] decoded = new HexBinaryAdapter().unmarshal(originStr);
}

@antoniosanct
Copy link
Contributor

@netmikey
My point is there is no problem about hex binary conversion. Although you maybe think that hex binary adapter should trim the input string before unmarshalling?

All characters included between node tags, including non-visible characters, are valid characters for hex binary conversion.

@lukasj has a better top-down vision about JAXB api. He maybe can help you about the problem.

Regards,
Antonio.

@netmikey
Copy link
Author

All characters included between node tags, including non-visible characters, are valid characters for hex binary conversion.

I'm afraid they're not:

@Test
public void testHexBinaryConversion() {
    // OK
    new jakarta.xml.bind.DatatypeConverterImpl().parseHexBinary("1234567890abcdef");
    // java.lang.IllegalArgumentException: hexBinary needs to be even-length
    new jakarta.xml.bind.DatatypeConverterImpl().parseHexBinary("\n1234567890abcdef");
    // java.lang.IllegalArgumentException: contains illegal character for hexBinary
    new jakarta.xml.bind.DatatypeConverterImpl().parseHexBinary("\n   1234567890abcdef");
}

@antoniosanct
Copy link
Contributor

@netmikey
You're right about non-visible characters, my mistake! However, I've my doubts about if jaxb-api have to clean up such characters. The solution is easy: set a trimmed string in HexBinaryAdapter.

Regards,
Antonio.

@netmikey
Copy link
Author

What do you mean by "set a trimmed string in HexBinaryAdapter"? I'm not calling HexBinaryAdapter manually. I'm just using the library and expect it to unmarshal an XML document without errors.

Something like:

JAXBContext context = JAXBContext.newInstance(Book.class);
context.createUnmarshaller().unmarshal(new FileReader("./book.xml"));

@antoniosanct
Copy link
Contributor

@netmikey
You don't! But you could maybe create a PR fixing this issue at this project!

Regards,
Antonio.

@netmikey
Copy link
Author

Oh, yes, of course. Where do you think trimming the hex binary string would make the most sense then? Since this particular behavior seems to be hexBinary-specific, my first impulse would be adding it in HexBinaryAdapter before passing it to DatatypeConverter.

Would you be open for such a PR?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants