NUTCH-2481 HostDatum deltas(previous step statistics) and Metadata expressions #278
base: master
…both (string, number) and (string, string) pairs
…config. Now the format of the return key can be either Number or String
// Create or retrieve a JexlEngine
JexlEngine jexl = new JexlEngine();

// Dont't be silent and be strict
`Dont't` must be a typo, but beyond that, `setSilent(true)` seems to contradict this comment.
@YossiTamari
The funny part is that it is a copy-paste from ReadHostDb, line 83.
How do you propose to fix it?
I'm not sure what the original intent was, but maybe we should replace this whole block (in both places) with the following?
this.deltaExpression = org.apache.nutch.util.JexlUtil.parseExpression(stringDeltaExpression);
@YossiTamari
The logic of updatehostdb has changed slightly.
When hostdb.deltaExpression is specified, we don't reset the statistics in the mapper. Instead, we first send the previous step's statistics to the reducer and reset them afterwards.
In line 215 of the mapper,
if (readingCrawlDb)
is replaced by
if (readingCrawlDb && !isDeltaStatisticCalculated) {
  hostDatum.resetStatistics();
}
Please verify that this logic doesn't break the current functionality.
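
To make the intended behaviour easier to check, here is a minimal, self-contained sketch of the deferred reset described above. It is not the PR's code: `HostStats` is a hypothetical stand-in for HostDatum with only two counters, `mapHost` and `reduceHost` are illustrative names, and the reducer-side delta (new value minus previous value) is an assumption about what a delta expression might compute.

```java
public class DeltaResetSketch {

    /** Hypothetical stand-in for HostDatum; only two counters are modeled. */
    static final class HostStats {
        long fetched;
        long unfetched;

        HostStats(long fetched, long unfetched) {
            this.fetched = fetched;
            this.unfetched = unfetched;
        }

        HostStats copy() {
            return new HostStats(fetched, unfetched);
        }

        void resetStatistics() {
            fetched = 0;
            unfetched = 0;
        }
    }

    /**
     * Mapper side: mirrors the guarded reset quoted in the comment above.
     * With a delta expression configured, the previous statistics are
     * passed through untouched so the reducer can still see them.
     */
    static HostStats mapHost(HostStats stored, boolean readingCrawlDb,
                             boolean isDeltaStatisticCalculated) {
        HostStats out = stored.copy();
        if (readingCrawlDb && !isDeltaStatisticCalculated) {
            out.resetStatistics();
        }
        return out;
    }

    /**
     * Reducer side (assumed behaviour): remember the previous value,
     * perform the deferred reset, apply the fresh counts, and return
     * the change in the fetched counter.
     */
    static long reduceHost(HostStats fromMapper, long freshFetched,
                           long freshUnfetched,
                           boolean isDeltaStatisticCalculated) {
        long previousFetched = fromMapper.fetched;
        if (isDeltaStatisticCalculated) {
            fromMapper.resetStatistics(); // the reset the mapper skipped
        }
        fromMapper.fetched += freshFetched;
        fromMapper.unfetched += freshUnfetched;
        return fromMapper.fetched - previousFetched;
    }

    public static void main(String[] args) {
        HostStats stored = new HostStats(10, 4);

        // Delta expression configured: previous statistics reach the reducer.
        HostStats withDelta = mapHost(stored, true, true);
        System.out.println("delta enabled:  "
            + reduceHost(withDelta, 15, 2, true));

        // No delta expression: the mapper resets as before.
        HostStats withoutDelta = mapHost(stored, true, false);
        System.out.println("delta disabled: "
            + reduceHost(withoutDelta, 15, 2, false));
    }
}
```

In this sketch the final counters are identical on both paths, so only the availability of the previous values differs. That equality is the property worth verifying in the real mapper and reducer.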