xml parsing of entityId fails #7

Open
caseyjlaw opened this issue Dec 15, 2016 · 11 comments

@caseyjlaw
Contributor

I started seeing issues with parsing some of the XML in SDMs. E.g.,

In [5]: sdm = sdmpy.SDM('16A-459_TEST_1hr.57623.72670021991.cut')
  File "<string>", line unknown
XMLSyntaxError: Element 'Entity', attribute 'entityId': [facet 'pattern'] The value '' is not accepted by the pattern '(fake)?[uU][iI][dD]:/{1,2}(/[0-9a-zA-Z]+){1,2}(/[xX]?[0-9a-fA-F]+){1,2}(#\\w{1,}){0,}'.

This error seems to arise for all the SDMs I have around, both those that have been trimmed by the sdmpy scan cut script and those that have not. I believe that all of my tests used SDMs that have some BDFs removed, however.
I recently reinstalled, but I can't see how that can explain my issue. I also see you've been making some changes. I thought I'd submit this in case it triggered any ideas.

@demorest
Owner

Hm, I don't get that error. Are you looking at /lustre/aoc/projects/16A-459/ddcut/16A-459_TEST_1hr.57623.72670021991.cut? That SDM seems to load OK for me.

@caseyjlaw
Contributor Author

Yes, that's the one. I just reproduced the error with the full path.
My installation must be different somehow. I recently reset my entire anaconda installation, so it may be a version/dependency issue. Does this match yours?

nmpost026$ conda list | egrep 'sdmpy|lxml|numpy'
lxml                      3.7.0                     <pip>
numpy                     1.11.2          py27_blas_openblas_202  [blas_openblas]  conda-forge
sdmpy                     1.36                      <pip>

@demorest
Owner

I have lxml 3.5.0 (and numpy 1.9.2 but this part of sdmpy does not even import numpy so probably not relevant). I'll try updating lxml and see if I can get the error. What version of python are you using?
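For comparing environments like this, a small script can stand in for `conda list | egrep ...`. This is only a sketch: it assumes Python 3.8+ (for `importlib.metadata`, which the Python 2.7 installs in this thread do not have), and `installed_versions` is a hypothetical helper name, not part of sdmpy.

```python
import platform
from importlib import metadata  # standard library, Python 3.8+


def installed_versions(packages=("sdmpy", "lxml", "numpy")):
    """Return {name: version or None} for the interpreter and each package."""
    versions = {"python": platform.python_version()}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # not installed in this environment
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name:8s} {version}")
```

Running it in both environments and diffing the output should show quickly whether the interpreter or one of the packages differs.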

@demorest
Owner

I updated lxml to 3.7.0 and it still works for me. Very odd! Is it possible for me to use your python environment?

@caseyjlaw
Contributor Author

caseyjlaw commented Dec 15, 2016 via email

@caseyjlaw
Contributor Author

Minor update on my side...
The hiccup is in parsing the ASDM.xml file, but any file fails for me if done manually with lxml.objectify. If I set the parser to None, then it works.
The default is to use this file:
/users/claw/miniconda/lib/python2.7/site-packages/sdmpy-1.36-py2.7.egg/sdmpy/xsd/sdm_all.xsd.
I see the file there and that is what I just installed, so I don't see why yours would work. Could it be you have a stale version of this file that is no longer compatible?
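To illustrate the difference between parsing with a schema-validating parser and parsing with the parser set to None, here is a minimal sketch using lxml. The inline XSD is a toy stand-in, not sdm_all.xsd, and its pattern is a simplified assumption. The example uses a genuinely empty entityId, so it shows the mechanics only and does not reproduce the bug reported here, where a non-empty value was rejected as ''.

```python
import io

from lxml import etree, objectify

# Toy schema (assumption): one element with a pattern-restricted attribute.
XSD = b"""<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="Entity">
    <xs:complexType>
      <xs:attribute name="entityId" use="required">
        <xs:simpleType>
          <xs:restriction base="xs:string">
            <xs:pattern value="uid://[0-9a-zA-Z]+(/[xX]?[0-9a-fA-F]+){1,2}"/>
          </xs:restriction>
        </xs:simpleType>
      </xs:attribute>
    </xs:complexType>
  </xs:element>
</xs:schema>"""

GOOD = b'<Entity entityId="uid://A002/X1/X2"/>'
BAD = b'<Entity entityId=""/>'

# A parser that validates against the schema while parsing.
schema = etree.XMLSchema(etree.parse(io.BytesIO(XSD)))
validating_parser = objectify.makeparser(schema=schema)


def parse_xml(xml_bytes, parser=None):
    """Parse with lxml.objectify; parser=None means no schema validation."""
    return objectify.parse(io.BytesIO(xml_bytes), parser).getroot()


if __name__ == "__main__":
    print(parse_xml(GOOD, validating_parser).get("entityId"))
    try:
        parse_xml(BAD, validating_parser)
    except etree.XMLSyntaxError as exc:
        print("validation failed:", exc)
    # Without a validating parser the same document loads.
    print(repr(parse_xml(BAD).get("entityId")))
```

If the first call fails on a value that clearly matches the pattern, that points at the XML libraries rather than at the document or the schema.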

@demorest
Owner

I don't think that is the explanation. My installed version /users/pdemores/pulsar/lib/python2.7/site-packages/sdmpy-1.36-py2.7.egg/sdmpy/xsd/sdm_all.xsd is identical to yours.

I am able to get the same error if I run your ~claw/miniconda/bin/python version. This is a slightly different version than the NRAO python install (2.7.12 vs 2.7.10). I'll keep poking at it.

In the meantime, as a workaround, you should be able to pass the use_xsd=False argument when calling SDM(). This turns off the XML schema validation, which seems to be the part causing trouble, and it should not cause any loss of functionality.

@demorest
Owner

demorest commented Dec 15, 2016

I looked at the lxml bug list a bit, and this appears kind of similar:

https://bugs.launchpad.net/lxml/+bug/1639866

There is a similar sounding error message about an empty string failing some XML validation step. Is it easy for you to try installing an older lxml version to see if the problem goes away? The bug report listed 3.6.1 as the last known working version.

@caseyjlaw
Contributor Author

I can use the use_xsd=False workaround. In fact, I see that rtpipe already had that in a try/except statement, so I think I ran into this problem before. As I recall, it was related to some change in the SDM, but I thought the new SDM format worked properly with the schema in sdmpy. I am now using sdmpy 1.36 and it still doesn't work when using the schema on new SDMs.
Anyway, I'm fine for now. Thanks.
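The try/except fallback described above could look roughly like the following. This is a sketch, not the rtpipe code: `load_sdm` is a hypothetical helper, the class is passed in so the example runs without sdmpy installed, and it assumes that calling SDM() without `use_xsd` performs schema validation, as the workaround in this thread implies.

```python
import logging

logger = logging.getLogger(__name__)


def load_sdm(path, sdm_class, validation_errors=(Exception,)):
    """Load an SDM with schema validation, retrying without it on failure.

    sdm_class would normally be sdmpy.SDM. validation_errors defaults to a
    broad catch; narrowing it to lxml.etree.XMLSyntaxError is safer.
    """
    try:
        # Assumed default behaviour: validate against the bundled XSD.
        return sdm_class(path)
    except validation_errors as exc:
        logger.warning(
            "Loading %s with schema validation failed (%s); "
            "retrying with use_xsd=False", path, exc)
        return sdm_class(path, use_xsd=False)


# Intended use (requires sdmpy and lxml, so left as a comment):
#   import sdmpy
#   from lxml import etree
#   sdm = load_sdm('16A-459_TEST_1hr.57623.72670021991.cut', sdmpy.SDM,
#                  validation_errors=(etree.XMLSyntaxError,))
```

Logging the fallback rather than silently swallowing the error keeps the validation failure visible, which helps while the underlying cause is still open.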

@demorest
Owner

Ok, thanks Casey. Yes, the previous time this came up it was because sdmpy only validates against the most recent SDM XML schema, but this is not appropriate for older SDMs.

The current issue really does sound more like a bug in lxml and/or some of the underlying XML libraries. I would like to get it fixed (or at least understood better), so I'll leave this issue open.

@desilinguist

desilinguist commented Jan 9, 2017

Hi, I am not really an sdmpy user, but I ran into the same XML issue and figured out that the latest conda packages for libxml2 (a library that lxml uses under the hood) seem to have a bug in them. If I explicitly pin an older version, e.g., libxml2=2.9.0, in my conda environment, this weird parsing error goes away. Just FYI :)

Another option is to just switch to using conda-forge as your default conda channel :)
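Since the suspect here is libxml2 rather than lxml itself, it can help to check which libxml2 a given lxml build is actually using. A minimal sketch, assuming lxml is importable (it returns None otherwise); `libxml2_versions` is a hypothetical helper name.

```python
def libxml2_versions():
    """Return lxml and libxml2 version strings, or None if lxml is missing."""
    try:
        from lxml import etree
    except ImportError:
        return None

    def dotted(version_tuple):
        return ".".join(str(part) for part in version_tuple)

    return {
        "lxml": dotted(etree.LXML_VERSION),
        # libxml2 version loaded at runtime
        "libxml2_runtime": dotted(etree.LIBXML_VERSION),
        # libxml2 version lxml was compiled against
        "libxml2_compiled": dotted(etree.LIBXML_COMPILED_VERSION),
    }


if __name__ == "__main__":
    print(libxml2_versions())
```

Two environments with the same lxml version can still report different libxml2 versions, which would be consistent with the behavior seen earlier in this thread.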

demorest pushed a commit that referenced this issue Feb 13, 2019
fix zero length table issue