not processed volumes #15
Comments
Good that we are listing these; please try to add them manually to the dataset, so that we have a complete dataset as of today.
Vol-1549 is interesting; @S6savahd, would you be able to talk to Sarven about this one? I wonder why it's not working, because I thought it was technically the same as Vol-1550 and Vol-1551. The source code has changed completely, and so has the way the layout is computed, so the developers of the information extraction tool had no chance to adapt their tool to it; but these new volumes look the same as the old volumes, so extraction should work. These volumes are important because this will soon be the new standard format. Many of them (those created with ceur-make and Rohan's soon-finished web UI frontend) will have RDFa, so they won't require sophisticated information extraction, but others (those created manually) won't have RDFa. For the latter, an adaptation of one of the other information extraction tools might work better; in any case, all of these new volumes will have a very clean, uniform structure.
@clange you are right: after filtering out the information related to layout, Vol-1549, Vol-1550, and Vol-1551 will not have any information left. Sorry that the list is not complete. Below is some of the data before filtering:

```turtle
<http://ceur-ws.org/Vol-1549/> <http://fitlayout.github.io/ontology/segmentation.owl#country> <http://dbpedia.org/resource/Australia> ;
    <http://fitlayout.github.io/ontology/segmentation.owl#icoloc> "" ;
    <http://fitlayout.github.io/ontology/segmentation.owl#idateplace> "Proceedings of the 1st International Workshop on Semantic Statistics co-located with 12th International Semantic Web Conference (ISWC 2013), Sydney, Australia, October 11th, 2013" ;
    <http://fitlayout.github.io/ontology/segmentation.owl#ienddate> "2013-10-11" ;
    <http://fitlayout.github.io/ontology/segmentation.owl#iproceedings> "Proceedings of the 1st International Workshop on Semantic Statistics co-located with 12th International Semantic Web Conference (ISWC 2013), Sydney, Australia, October 11th, 2013" ;
    <http://fitlayout.github.io/ontology/segmentation.owl#istartdate> "2013-10-11" ;
    <http://fitlayout.github.io/ontology/segmentation.owl#isubmitted> "2016-03-15" ;
    <http://fitlayout.github.io/ontology/segmentation.owl#ititle> "Semantic Statistics 2013" .

<http://ceur-ws.org/Vol-1550/> <http://fitlayout.github.io/ontology/segmentation.owl#related> <http://ceur-ws.org/Vol-1549/> ;
    <http://fitlayout.github.io/ontology/segmentation.owl#country> <http://dbpedia.org/resource/Italy> ;
    <http://fitlayout.github.io/ontology/segmentation.owl#icoloc> "" ;
    <http://fitlayout.github.io/ontology/segmentation.owl#idateplace> "Proceedings of the 2nd International Workshop on Semantic Statistics co-located with 13th International Semantic Web Conference (ISWC 2014), Riva del Garda, Italy, October 19th, 2014" ;
    <http://fitlayout.github.io/ontology/segmentation.owl#ienddate> "2014-10-19" ;
    <http://fitlayout.github.io/ontology/segmentation.owl#iproceedings> "Proceedings of the 2nd International Workshop on Semantic Statistics co-located with 13th International Semantic Web Conference (ISWC 2014), Riva del Garda, Italy, October 19th, 2014" ;
    <http://fitlayout.github.io/ontology/segmentation.owl#istartdate> "2014-10-19" ;
    <http://fitlayout.github.io/ontology/segmentation.owl#isubmitted> "2016-04-23" ;
    <http://fitlayout.github.io/ontology/segmentation.owl#ititle> "Semantic Statistics 2014" .

<http://ceur-ws.org/Vol-1551/> <http://fitlayout.github.io/ontology/segmentation.owl#related> <http://ceur-ws.org/Vol-1549/> , <http://ceur-ws.org/Vol-1550/> ;
    <http://fitlayout.github.io/ontology/segmentation.owl#icoloc> "" ;
    <http://fitlayout.github.io/ontology/segmentation.owl#idateplace> "Proceedings of the 3rd International Workshop on Semantic Statistics, co-located with 14th International Semantic Web Conference (ISWC 2015), Bethlehem, U.S., October 11th, 2015" ;
    <http://fitlayout.github.io/ontology/segmentation.owl#ienddate> "2015-10-11" ;
    <http://fitlayout.github.io/ontology/segmentation.owl#iproceedings> "Proceedings of the 3rd International Workshop on Semantic Statistics, co-located with 14th International Semantic Web Conference (ISWC 2015), Bethlehem, U.S., October 11th, 2015" ;
    <http://fitlayout.github.io/ontology/segmentation.owl#istartdate> "2015-10-11" ;
    <http://fitlayout.github.io/ontology/segmentation.owl#isubmitted> "2016-03-15" ;
    <http://fitlayout.github.io/ontology/segmentation.owl#ititle> "Semantic Statistics 2015" .
```

The original tool will extract some "related volume" information for Vol-1550 and Vol-1551, as that information comes from index.html. If we remove the layout information for these three volumes, as we do for the other volumes, then no information will be left.
Some information has been added for Vol-1549, Vol-1550, and Vol-1551 from the index page in the new dataset, and also all of the information from Vol-41.
@S6savahd @clange I wrote a tool to process these three volumes, and it should also work with other volumes that have the same structure. The tool is ceurws.py, the output is 1549-1551.ttl, and it can be extended to process other volumes as well.
Great! I haven't looked into the code, but is there a way to embed it in the main code, so that in the future we don't run the tools separately but all at once?
@S6savahd It is possible to embed it into the post-processing script we already have, but I need to check how to embed it into the original tool, as they are written in different languages. We can also extend this tool to process different structures in the future: since the original tool uses a common strategy for all the volumes, it will not always be able to extract all of the volume information completely.
Some volumes that were not processed: