sputnick-dev/saxon-lint

saxon-lint

This program queries XML/(X)HTML files from the command line, like XMLStarlet or xmllint, but with support for XPath 3.0, XQuery 3.0 and XSLT 2.0. Most other command-line tools (apart from xidel and BaseX) rely on libxml2 and are limited to XPath 1.0 and XSLT 1.0.

It can be considered a simple wrapper around the Saxon-HE and TagSoup Java libraries.

As long as you have the prerequisites, this project is cross-platform (Linux, Mac OS X/*BSD, Windows...).

The default XPath output prints each result node on its own line, which suits shell scripting (for example, to split the results into an array). This feature was missing from xmllint.
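As a minimal sketch of that use case, the bash snippet below reads one-result-per-line output into an array. The `fake_results` function is a hypothetical stand-in for a real `saxon-lint.pl --xpath ...` call (the expression and file name in the comment are made up), so the snippet runs without the tool installed.

```bash
#!/usr/bin/env bash
# Stand-in for: saxon-lint.pl --xpath '//item/text()' file.xml
# (hypothetical expression and file name); it prints one result per line.
fake_results() {
    printf '%s\n' 'first value' 'second value' 'third value'
}

# mapfile (bash 4+) stores each output line as one array element,
# so values that contain spaces stay intact.
mapfile -t results < <(fake_results)

echo "count: ${#results[@]}"
for r in "${results[@]}"; do
    echo "result: $r"
done
```

To use it for real, replace `fake_results` in the `mapfile` line with the actual command.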

Main features

  • XML parsing via files (and STDIN)
  • XPath 3.0/XQuery 3.0/XSLT 2.0 using Michael Kay's Saxon-HE Java library
  • (X)HTML parsing via HTTP, HTTPS or files, even with broken RealLife©®™ HTML using John Cowan's TagSoup Java library

Limitations

According to the Saxon Home Edition (HE) documentation, it supports XQuery 3.1 Minimal Conformance. It does not include the following:

  • XQuery 3.1 Schema Aware
  • XQuery 3.1 Typed Data
  • XQuery 3.1 Static Typing
  • XQuery Update 1.0

These features are only available in Saxon editions that are not FOSS.

For a FOSS tool that supports XQuery Update, see BaseX, mentioned above.

Install prerequisites

  • java (openjdk...)
  • perl
  • libxml2
  • git (or use the .zip)

And the Perl modules:

  • XML::LibXML: libxml-libxml-perl Debian package
  • LWP::UserAgent and LWP::Protocol::https (if HTTPS is needed): libwww-perl and liblwp-protocol-https-perl Debian packages

With one command for Debian and derivatives:

 apt-get update && apt-get install openjdk-11-jre perl libxml2 libxml2-dev \
    libxml-libxml-perl libwww-perl liblwp-protocol-https-perl

Install:

$ git clone https://github.com/sputnick-dev/saxon-lint.git
$ cd saxon-lint
$ ./saxon-lint.pl --help

Usage:

Usage:
    saxon-lint.pl <opts> <file(s)>
    Parse the XML files and output the result of the parsing
    --help -h,                  this help
    --xpath,                    XPath expression
    --xquery,                   Xquery expression or file
    --html,                     use the HTML parser
    --xslt,                     use XSL transformation
    --output-separator,         set output separator to character ("\n", ","...)
    --indent,                   indent the output
    --no-pi,                    remove Processing Instruction (<?xml ...>)
    --saxon-opt,                Saxon extra argument
    --verbose -v,               verbose mode
    --version,                  current version

Examples:

saxon-lint.pl --xpath '//key[text()="String"]/following-sibling::string[1]' file.xml
saxon-lint.pl --xquery 'for $r in 1 to count(/table/tr) return /title' file.xml
saxon-lint.pl --indent --xquery file.xquery
curl -Ls 'http://domain.tld/file.xml' | saxon-lint.pl --xpath '//key[1]' -
saxon-lint.pl --xslt file.xsl file.xml
saxon-lint.pl --xquery file.xquery --saxon-opt -t --saxon-opt '!indent=yes'
saxon-lint.pl --html --xpath 'string-join(//a/@href, "\r\n")' http://x.y/z.html

Get shortened URL via tinyurl:

saxon-lint --html --xpath '//div[@class="indent"][1]/b/text()' \
    'http://tinyurl.com/create.php?url=http://google.com'

To use a literal newline as the string-join() separator (as in the string-join() example above) on Unix-like systems, press Ctrl+V and then Enter inside the quotes. On Windows, just type "\r\n".

Check the other examples.

For --saxon-opt, check the Saxon documentation.

Tricks:

To run the command without the leading dot-slash (./saxon-lint), add its directory to the PATH variable. On Windows, see http://www.computerhope.com/issues/ch000549.htm. On Unix-like systems, edit ~/.bashrc, find the PATH= line and add PATH=$PATH:/PATH/TO/saxon-lint_DIRECTORY, then run source ~/.bashrc.
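A minimal sketch of what could go into ~/.bashrc, assuming the repository was cloned into `$HOME/saxon-lint` (a hypothetical location; adjust it to your own). The guard keeps PATH from growing each time the file is sourced.

```bash
# Hypothetical install location; replace with the directory you cloned into.
SAXON_LINT_DIR="$HOME/saxon-lint"

# Append the directory to PATH only if it is not already listed,
# so sourcing ~/.bashrc repeatedly does not add duplicates.
case ":$PATH:" in
    *":$SAXON_LINT_DIR:"*) ;;
    *) PATH="$PATH:$SAXON_LINT_DIR" ;;
esac
export PATH
```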

If you want to enable bash-completion, you have to install this program and move usr_share_bash-completion_completions_saxon-lint to /usr/share/bash-completion/completions/saxon-lint (or similar).

Troubleshooting:

Tested platforms:

  • GNU/Linux (Arch Linux, Ubuntu 12.04), the most tested
  • FreeBSD 10.1
  • Windows XP (with or without Cygwin)

Please report any bugs here.

Licensing:

This program is under the same licence as Saxon-HE.
