A Java clone of Google's robots.txt parser, passing all unit tests.
Disclaimer:
- The author of this repository is not affiliated with Google in any way.
- Includes Google-specific optimizations, unlike other implementations (credit to the Google authors).
- No dependencies other than the JDK.
Add this package as a dependency, depending on your build tool:
- Maven
<dependency>
  <groupId>com.github.itechbear</groupId>
  <artifactId>robotstxt</artifactId>
  <version>0.0.1</version>
</dependency>
- Gradle (Groovy)
implementation 'com.github.itechbear:robotstxt:0.0.1'
- SBT
libraryDependencies += "com.github.itechbear" % "robotstxt" % "0.0.1"
- For any other build tool, refer to https://search.maven.org/artifact/com.github.itechbear/robotstxt/0.0.1/jar
Code sample:
import java.util.Arrays;

String robotstxt = "allow: /foo/bar/\n" +
"\n" +
"user-agent: FooBot\n" +
"disallow: /\n" +
"allow: /x/\n" +
"user-agent: BarBot\n" +
"disallow: /\n" +
"allow: /y/\n" +
"\n" +
"\n" +
"allow: /w/\n" +
"user-agent: BazBot\n" +
"\n" +
"user-agent: FooBot\n" +
"allow: /z/\n" +
"disallow: /\n";
String url = "http://test.com/x";
RobotsMatcher matcher = new RobotsMatcher();
// Check whether FooBot is allowed to crawl url.
boolean fooBotAllowed = matcher.OneAgentAllowedByRobots(robotstxt, "FooBot", url);
// Check whether any of (FooBot, BarBot) is allowed to crawl url.
boolean anyBotAllowed = matcher.AllowedByRobots(robotstxt, Arrays.asList("FooBot", "BarBot"), url);
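Note that the matcher only evaluates a robots.txt String; the caller is responsible for fetching it from the target site first. A minimal sketch of deriving the robots.txt location from a page URL with the JDK's `java.net.URI` (the `robotsTxtUrl` helper name is an illustration, not part of this library):

```java
import java.net.URI;

public class RobotsUrl {
    // Per the Robots Exclusion Protocol, robots.txt lives at the root of the
    // authority the page was fetched from: scheme://authority/robots.txt.
    static String robotsTxtUrl(String pageUrl) {
        URI u = URI.create(pageUrl);
        return u.getScheme() + "://" + u.getRawAuthority() + "/robots.txt";
    }

    public static void main(String[] args) {
        // Prints: http://test.com/robots.txt
        System.out.println(robotsTxtUrl("http://test.com/x"));
    }
}
```

The returned location can then be downloaded (e.g. with `java.net.http.HttpClient`) and its body passed as the `robotstxt` argument in the sample above.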
Changelog:
- 0.0.1 Initial release, based on google/robotstxt@750aec7