Help wanted on filter #230

Open
spady7 opened this issue Aug 20, 2024 · 22 comments

Comments

@spady7

spady7 commented Aug 20, 2024

Hello everyone. I am receiving CDRs from an SBC over syslog. The problem is that the SBC sends only the data, not the headers. Is it possible to use the "pastash" filters to associate each value with its field name before sending the records to "qryn"?
The data I get is this:

<181>[S=3] |STOP |Mediant SW |179 |62 |81a539:179:232 |16:51:54.608 UTC Tue Aug 20 2024|183 |309 |UTC |2000 |2000 |1000 |1000 |192.168.10.1 |192.168.10.1 |GWAPP_NORMAL_CALL_CLEAR |BYE |Telecom |Telecom |0

and its respective header (which is not sent to me) would be:

|RecordType |ProductName |ShelfInfo|SeqNum |SessionId |SetupTime |TimeToConnect |CallDuration |TimeZone|IngressCallingUser |EgressCallingUser |IngressDialedUser |EgressCalledUser |IngressCallSourceIp |EgressCallDestIp |EgressTrmReason |EgressSIPTrmReason |IngressSipInterfaceName |EgressSipInterfaceName |RouteAttemptNum

Thank you in advance

@lmangani
Member

lmangani commented Aug 20, 2024

Hey @spady7
You could use the CSV filter which allows using a custom separator as well as defining the column headers as an array.

filter {
  csv {
    separator => ' |'
    headers => ['some','header']
  }
}

@spady7
Author

spady7 commented Aug 21, 2024

Hi @lmangani, just as a test, if I use what you suggested:

  csv {
    separator => ' |'
    headers => ['RecordType','ProductName']
  }

the output is the following:

{"0":"<181>[S=14]","1":"|STOP","2":"","3":"","4":"","5":"","6":"","7":"","8":"","9":"","10":"|Mediant","11":"SW","12":"","13":"","14":"","15":"","16":"","17":"","18":"","19":"","20":"","21":"","22":"","23":"","24":"","25":"","26":"","27":"","28":"","29":"","30":"|180","31":"","32":"","33":"","34":"","35":"","36":"|26","37":"","38":"","39":"","40":"","41":"|81a539:180:26","42":"","43":"","44":"|09:18:20.101","45":"","46":"UTC","47":"Wed","48":"Aug","49":"21","50":"2024|195","51":"","52":"","53":"","54":"","55":"","56":"","57":"","58":"","59":"","60":"","61":"|321","62":"","63":"","64":"","65":"","66":"","67":"","68":"","69":"","70":"","71":"","72":"|UTC","73":"","74":"","75":"","76":"","77":"|2000","78":"","79":"","80":"","81":"","82":"","83":"","84":"","85":"","86":"","87":"","88":"","89":"","90":"","91":"","92":"","93":"|2000","94":"","95":"","96":"","97":"","98":"","99":"","100":"","101":"","102":"","103":"","104":"","105":"","106":"","107":"","108":"","109":"|1000","110":"","111":"","112":"","113":"","114":"","115":"","116":"","117":"","118":"","119":"","120":"","121":"","122":"","123":"","124":"","125":"|1000","126":"","127":"","128":"","129":"","130":"","131":"","132":"","133":"","134":"","135":"","136":"","137":"","138":"","139":"","140":"","141":"|192.168.10.1","142":"","143":"","144":"","145":"","146":"","147":"","148":"","149":"|192.168.10.1","150":"","151":"","152":"","153":"","154":"","155":"","156":"","157":"|GWAPP_NORMAL_CALL_CLEAR","158":"","159":"","160":"","161":"","162":"","163":"","164":"","165":"","166":"","167":"","168":"","169":"","170":"","171":"","172":"","173":"","174":"|BYE","175":"","176":"","177":"","178":"","179":"","180":"","181":"","182":"","183":"","184":"","185":"","186":"","187":"","188":"","189":"","190":"","191":"|Telecom","192":"","193":"","194":"","195":"","196":"","197":"","198":"","199":"","200":"","201":"","202":"","203":"","204":"","205":"","206":"","207":"","208":"","209":"","210":"","211":"","212":"","213":"","214":"","215":"","216":"|Telecom","217":"","218":"","219":"","220":"","221":"","222":"","223":"","224":"","225":"","226":"","227":"","228":"","229":"","230":"","231":"","232":"","233":"","234":"","235":"","236":"","237":"","238":"","239":"","240":"","241":"|0"}

If I just use:

  csv {
    separator => ' |'
  }

the output is:
{"21":"21","<181>[S=15]":"<181>[S=16]","|STOP":"|STOP","":"","|Mediant":"|Mediant","SW":"SW","|180":"|180","|28":"|30","|81a539:180:29":"|81a539:180:30","|09:21:31.833":"|09:21:53.145","UTC":"UTC","Wed":"Wed","Aug":"Aug","2024|201":"2024|142","|370":"|414","|UTC":"|UTC","|2000":"|2000","|1000":"|1000","|192.168.10.1":"|192.168.10.1","|GWAPP_NORMAL_CALL_CLEAR":"|GWAPP_NORMAL_CALL_CLEAR","|BYE":"|BYE","|Telecom":"|Telecom","|0":"|0"}

I don't understand how this filter works. I would have expected the output in the first case to be something like:

RecordType: STOP
ProductName: Mediant SW

and so on..

Am I wrong?

@lmangani
Member

You need to define all columns to begin with
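
As an illustration of what defining every column achieves, here is a minimal Python sketch, independent of paStash, that pairs the header names from the first comment with the pipe-separated values of the sample record. Splitting on the bare `|` and trimming whitespace is an assumption about the record layout, not a description of how the paStash csv filter works internally.

```python
# Header names as listed in the first comment of this thread (20 columns).
HEADERS = [
    "RecordType", "ProductName", "ShelfInfo", "SeqNum", "SessionId",
    "SetupTime", "TimeToConnect", "CallDuration", "TimeZone",
    "IngressCallingUser", "EgressCallingUser", "IngressDialedUser",
    "EgressCalledUser", "IngressCallSourceIp", "EgressCallDestIp",
    "EgressTrmReason", "EgressSIPTrmReason", "IngressSipInterfaceName",
    "EgressSipInterfaceName", "RouteAttemptNum",
]

# Sample record from the first comment.
line = (
    "<181>[S=3] |STOP |Mediant SW |179 |62 |81a539:179:232 "
    "|16:51:54.608 UTC Tue Aug 20 2024|183 |309 |UTC |2000 |2000 |1000 |1000 "
    "|192.168.10.1 |192.168.10.1 |GWAPP_NORMAL_CALL_CLEAR |BYE |Telecom |Telecom |0"
)

# Assumption: fields are delimited by "|" and padded with spaces.
# The first chunk is the syslog prefix ("<181>[S=3]"), not a CDR column.
prefix, *values = [part.strip() for part in line.split("|")]

# Pair every header with the value at the same position.
record = dict(zip(HEADERS, values))

print(record["RecordType"], "/", record["ProductName"])
```

With only two headers defined, the remaining values have no name to attach to, which is consistent with the numeric keys seen in the output above.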

@sipcapture sipcapture deleted a comment from spady7 Aug 21, 2024
@spady7
Author

spady7 commented Aug 23, 2024

Hi, I changed approach and now use the grok filter. For anyone interested, here is the grok match pattern for parsing SDR logs coming from an AudioCodes SBC:

filter {
  grok {
    match => '<%{NUMBER:Internal_Seq}>\[S=%{NUMBER:SDR_Seq_Num}\] \|%{WORD:RecordType}\s*\|%{DATA:ProductName}\s*\|%{NUMBER:ShelfInfo}\s*\|%{NUMBER:SeqNum}\s*\|%{DATA:SipSessionId}\s*\|%{TIME:SetupTime}\s+%{WORD:TimeZone} %{WORD:Day} %{MONTH:Month} %{MONTHDAY:Monthday} %{YEAR:Year}\|%{NUMBER:TimeToConnect}\s*\|%{NUMBER:CallDuration}\s*\|%{WORD:NodeTimeZone}\s*\|%{NUMBER:IngressCallingUserName}\s*\|%{NUMBER:EgressCallingUserName}\s*\|%{NUMBER:IngressDialedUserName}\s*\|%{NUMBER:EgressCalledUserName}\s*\|%{IP:IngressCallSourceIp}\s*\|%{IP:EgressCallDestIp}\s*\|%{DATA:EgressTrmReason}\s*\|%{WORD:EgressSIPTrmReason}\s*\|%{DATA:IngressSipInterfaceName}\s*\|%{DATA:EgressSipInterfaceName}\s*\|%{NUMBER:RouteAttemptNum}'
  }
}
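
To make the structure of the pattern easier to follow, the sketch below expands grok-style `%{TYPE:field}` tokens into Python named groups and applies a shortened version of the pattern to the sample record. The `SIMPLE_PATTERNS` table and the `expand` helper are hypothetical simplifications for illustration, not the definitions shipped with paStash or any grok library.

```python
import re

# Hypothetical, simplified stand-ins for a few grok pattern types.
SIMPLE_PATTERNS = {
    "NUMBER": r"-?\d+(?:\.\d+)?",
    "WORD": r"\w+",
    "DATA": r".*?",
}

def expand(grok_pattern):
    """Replace each %{TYPE:field} token with a Python named group."""
    def to_group(m):
        return f"(?P<{m.group(2)}>{SIMPLE_PATTERNS[m.group(1)]})"
    return re.sub(r"%\{(\w+):(\w+)\}", to_group, grok_pattern)

# First five fields of the pattern posted above.
grok = (
    r"<%{NUMBER:Internal_Seq}>\[S=%{NUMBER:SDR_Seq_Num}\] "
    r"\|%{WORD:RecordType}\s*\|%{DATA:ProductName}\s*\|%{NUMBER:ShelfInfo}"
)

line = "<181>[S=3] |STOP |Mediant SW |179 |62 |81a539:179:232 |16:51:54.608 UTC Tue Aug 20 2024|183 |309 |UTC"

match = re.match(expand(grok), line)
fields = match.groupdict() if match else {}
print(fields)
```

Growing the shortened pattern one field at a time in this way is one approach to finding the point where a longer pattern stops matching as expected.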

@lmangani
Member

Perhaps we can add this to the Wiki page for the SBC alongside any other nodes?

@spady7
Author

spady7 commented Aug 23, 2024

@lmangani for sure! It could be useful for someone else.

@spady7
Author

spady7 commented Aug 23, 2024

I am trying to use the "eval" plugin to perform a simple operation, because the "TimeToConnect" and "CallDuration" values arrive in centiseconds.
I would like to convert them to seconds.
According to the eval plugin wiki, this should be as simple as the following:

filter {
  compute_field {
    field => origin
    value => "AUDIOCODES"
  }
  grok {
    match => '<%{NUMBER:Internal_Seq}>\[S=%{NUMBER:SDR_Seq_Num}\] \|%{WORD:RecordType}\s*\|%{DATA:ProductName}\s*\|%{NUMBER:ShelfInfo}\s*\|%{NUMBER:SeqNum}\s*\|%{DATA:SipSessionId}\s*\|%{TIME:SetupTime}\s+%{WORD:TimeZone} %{WORD:Day} %{MONTH:Month} %{MONTHDAY:Monthday} %{YEAR:Year}\|%{NUMBER:TimeToConnect}\s*\|%{NUMBER:CallDuration}\s*\|%{WORD:NodeTimeZone}\s*\|%{NUMBER:IngressCallingUserName}\s*\|%{NUMBER:EgressCallingUserName}\s*\|%{NUMBER:IngressDialedUserName}\s*\|%{NUMBER:EgressCalledUserName}\s*\|%{IP:IngressCallSourceIp}\s*\|%{IP:EgressCallDestIp}\s*\|%{DATA:EgressTrmReason}\s*\|%{WORD:EgressSIPTrmReason}\s*\|%{DATA:IngressSipInterfaceName}\s*\|%{DATA:EgressSipInterfaceName}\s*\|%{NUMBER:RouteAttemptNum}'
  }

  eval {
    field => CallDuration
    operation => "x / 100"
  }
}

but I get the following error:

[Fri, 23 Aug 2024 13:27:15 GMT] NOTICE Starting pastash 1.0.82
[Fri, 23 Aug 2024 13:27:15 GMT] INFO Max http socket 100
[Fri, 23 Aug 2024 13:27:15 GMT] INFO Loading config file : /opt/cdr_audiocodes.conf
[Fri, 23 Aug 2024 13:27:15 GMT] INFO File loaded, 6 urls found
[Fri, 23 Aug 2024 13:27:15 GMT] INFO Loading config : 9 urls
[Fri, 23 Aug 2024 13:27:15 GMT] DEBUG Loading urls [
  'filter://add_host://',
  'filter://add_timestamp://',
  'filter://add_version://',
  'input://udp://?host=0.0.0.0&port=30514&tags=CDR',
  'filter://compute_field://?field=origin&value=AUDIOCODES',
  'filter://grok://?match=%3C%25%7BNUMBER%3AInternal_Seq%7D%3E%5C%5BS%3D%25%7BNUMBER%3ASDR_Seq_Num%7D%5C%5D%20%5C%7C%25%7BWORD%3ARecordType%7D%5Cs*%5C%7C%25%7BDATA%3AProductName%7D%5Cs*%5C%7C%25%7BNUMBER%3AShelfInfo%7D%5Cs*%5C%7C%25%7BNUMBER%3ASeqNum%7D%5Cs*%5C%7C%25%7BDATA%3ASipSessionId%7D%5Cs*%5C%7C%25%7BTIME%3ASetupTime%7D%5Cs%2B%25%7BWORD%3ATimeZone%7D%20%25%7BWORD%3ADay%7D%20%25%7BMONTH%3AMonth%7D%20%25%7BMONTHDAY%3AMonthday%7D%20%25%7BYEAR%3AYear%7D%5C%7C%25%7BNUMBER%3ATimeToConnect%7D%5Cs*%5C%7C%25%7BNUMBER%3ACallDuration%7D%5Cs*%5C%7C%25%7BWORD%3ANodeTimeZone%7D%5Cs*%5C%7C%25%7BNUMBER%3AIngressCallingUserName%7D%5Cs*%5C%7C%25%7BNUMBER%3AEgressCallingUserName%7D%5Cs*%5C%7C%25%7BNUMBER%3AIngressDialedUserName%7D%5Cs*%5C%7C%25%7BNUMBER%3AEgressCalledUserName%7D%5Cs*%5C%7C%25%7BIP%3AIngressCallSourceIp%7D%5Cs*%5C%7C%25%7BIP%3AEgressCallDestIp%7D%5Cs*%5C%7C%25%7BDATA%3AEgressTrmReason%7D%5Cs*%5C%7C%25%7BWORD%3AEgressSIPTrmReason%7D%5Cs*%5C%7C%25%7BDATA%3AIngressSipInterfaceName%7D%5Cs*%5C%7C%25%7BDATA%3AEgressSipInterfaceName%7D%5Cs*%5C%7C%25%7BNUMBER%3ARouteAttemptNum%7D',
  'filter://eval://?field=CallDuration&operation=x%20%2F%20100',
  'output://stdout://',
  'output://file://?path=%2Fusr%2Fsrc%2Fapp%2Foutput_testing.json&serializer=json_logstash'
]
[Fri, 23 Aug 2024 13:27:15 GMT] DEBUG Initializing module output
[Fri, 23 Aug 2024 13:27:15 GMT] INFO Initializing output Stdout
[Fri, 23 Aug 2024 13:27:15 GMT] DEBUG Initializing module output
[Fri, 23 Aug 2024 13:27:15 GMT] INFO Initializing output file
[Fri, 23 Aug 2024 13:27:15 GMT] INFO Start output to file /usr/src/app/output_testing.json
[Fri, 23 Aug 2024 13:27:15 GMT] DEBUG Initializing module filter
[Fri, 23 Aug 2024 13:27:15 GMT] INFO Initializing filter AddHost
[Fri, 23 Aug 2024 13:27:15 GMT] DEBUG Initializing module filter
[Fri, 23 Aug 2024 13:27:15 GMT] INFO Initializing filter AddTimestamp
[Fri, 23 Aug 2024 13:27:15 GMT] DEBUG Initializing module filter
[Fri, 23 Aug 2024 13:27:15 GMT] INFO Initializing filter AddVersion
[Fri, 23 Aug 2024 13:27:15 GMT] DEBUG Initializing module filter
[Fri, 23 Aug 2024 13:27:15 GMT] INFO Initializing filter ComputeField
[Fri, 23 Aug 2024 13:27:15 GMT] INFO Initialized compute field filter on field: origin, value: AUDIOCODES
[Fri, 23 Aug 2024 13:27:15 GMT] DEBUG Initializing module filter
[Fri, 23 Aug 2024 13:27:15 GMT] INFO Initializing filter Grok
[Fri, 23 Aug 2024 13:27:15 GMT] INFO Initializing grok filter, pattern: <%{NUMBER:Internal_Seq}>\[S=%{NUMBER:SDR_Seq_Num}\] \|%{WORD:RecordType}\s*\|%{DATA:ProductName}\s*\|%{NUMBER:ShelfInfo}\s*\|%{NUMBER:SeqNum}\s*\|%{DATA:SipSessionId}\s*\|%{TIME:SetupTime}\s+%{WORD:TimeZone} %{WORD:Day} %{MONTH:Month} %{MONTHDAY:Monthday} %{YEAR:Year}\|%{NUMBER:TimeToConnect}\s*\|%{NUMBER:CallDuration}\s*\|%{WORD:NodeTimeZone}\s*\|%{NUMBER:IngressCallingUserName}\s*\|%{NUMBER:EgressCallingUserName}\s*\|%{NUMBER:IngressDialedUserName}\s*\|%{NUMBER:EgressCalledUserName}\s*\|%{IP:IngressCallSourceIp}\s*\|%{IP:EgressCallDestIp}\s*\|%{DATA:EgressTrmReason}\s*\|%{WORD:EgressSIPTrmReason}\s*\|%{DATA:IngressSipInterfaceName}\s*\|%{DATA:EgressSipInterfaceName}\s*\|%{NUMBER:RouteAttemptNum}
[Fri, 23 Aug 2024 13:27:15 GMT] INFO Loading grok patterns
[Fri, 23 Aug 2024 13:27:15 GMT] INFO Grok patterns loaded from patterns directories 188
[Fri, 23 Aug 2024 13:27:15 GMT] DEBUG Initializing module filter
[Fri, 23 Aug 2024 13:27:15 GMT] INFO Initializing filter Eval
[Fri, 23 Aug 2024 13:27:15 GMT] ERROR Unable to load urls from command line
[Fri, 23 Aug 2024 13:27:15 GMT] ERROR Error: No host found in url ?field=CallDuration&operation=x%20%2F%20100
    at BaseComponent.loadConfig (/usr/local/lib/node_modules/@pastash/pastash/lib/lib/base_component.js:98:23)
    at BaseFilter.init (/usr/local/lib/node_modules/@pastash/pastash/lib/lib/base_filter.js:14:8)
    at LogstashAgent.configure (/usr/local/lib/node_modules/@pastash/pastash/lib/agent.js:152:12)
    at LogstashAgent.<anonymous> (/usr/local/lib/node_modules/@pastash/pastash/lib/agent.js:215:10)
    at /usr/local/lib/node_modules/@pastash/pastash/node_modules/async/dist/async.js:3113:16
    at replenish (/usr/local/lib/node_modules/@pastash/pastash/node_modules/async/dist/async.js:1014:17)
    at iterateeCallback (/usr/local/lib/node_modules/@pastash/pastash/node_modules/async/dist/async.js:998:17)
    at /usr/local/lib/node_modules/@pastash/pastash/node_modules/async/dist/async.js:972:16
    at LogstashAgent.<anonymous> (/usr/local/lib/node_modules/@pastash/pastash/lib/agent.js:265:5)
    at LogstashAgent.<anonymous> (/usr/local/lib/node_modules/@pastash/pastash/lib/agent.js:220:7)

What does that mean? What am I doing wrong? Could it be an issue with the module?
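
The DEBUG output above shows each config block rendered as a URL with percent-encoded parameters. A small Python sketch for decoding such a line back into readable parameters follows. It uses only the standard library and does not reproduce paStash's own parsing, so it helps with reading the log rather than explaining the error.

```python
from urllib.parse import parse_qs

# URL copied from the "Loading urls" DEBUG output above.
url = "filter://eval://?field=CallDuration&operation=x%20%2F%20100"

# Everything after the first "?" is the query string.
_, _, query = url.partition("?")

# parse_qs percent-decodes the values and returns a list per key;
# each key appears once here, so keep the first element.
params = {key: values[0] for key, values in parse_qs(query).items()}

print(params)
```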

@lmangani
Member

lmangani commented Aug 23, 2024

Docs updated to clarify usage:

filter {
  eval {
    source_field => CallDuration
    target_field => CallDuration
    operation => "x / 100"
  }
}
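
For reference, the arithmetic requested from the eval filter here is a plain division by 100. The Python sketch below applies it to the raw values of the sample record posted later in this thread (155 and 750 centiseconds). It mirrors the intent of `operation => "x / 100"` and is not paStash code; the `centiseconds_to_seconds` helper name is made up for this example.

```python
def centiseconds_to_seconds(value):
    """Convert a duration in centiseconds (1/100 s) to seconds."""
    return value / 100

# Raw values as they appear in the Sep 12 sample record.
raw = {"TimeToConnect": 155, "CallDuration": 750}

# Overwrite each field with its converted value, as the config does
# by using the same name for source_field and target_field.
converted = {name: centiseconds_to_seconds(v) for name, v in raw.items()}

print(converted)
```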

@spady7
Author

spady7 commented Aug 23, 2024

@lmangani it now works!! ;-)

@spady7
Author

spady7 commented Aug 23, 2024

UPDATED Filter to add to Wiki:

filter {
  compute_field {
    field => origin
    value => "AUDIOCODES"
  }
  grok {
    match => '<%{NUMBER:Internal_Seq}>\[S=%{NUMBER:SDR_Seq_Num}\] \|%{WORD:RecordType}\s*\|%{DATA:ProductName}\s*\|%{NUMBER:ShelfInfo}\s*\|%{NUMBER:SeqNum}\s*\|%{DATA:SipSessionId}\s*\|%{TIME:SetupTime}\s+%{WORD:TimeZone} %{WORD:Day} %{MONTH:Month} %{MONTHDAY:Monthday} %{YEAR:Year}\|%{NUMBER:TimeToConnect}\s*\|%{NUMBER:CallDuration}\s*\|%{WORD:NodeTimeZone}\s*\|%{NUMBER:IngressCallingUserName}\s*\|%{NUMBER:EgressCallingUserName}\s*\|%{NUMBER:IngressDialedUserName}\s*\|%{NUMBER:EgressCalledUserName}\s*\|%{IP:IngressCallSourceIp}\s*\|%{IP:EgressCallDestIp}\s*\|%{DATA:EgressTrmReason}\s*\|%{WORD:EgressSIPTrmReason}\s*\|%{DATA:IngressSipInterfaceName}\s*\|%{DATA:EgressSipInterfaceName}\s*\|%{NUMBER:RouteAttemptNum}'
  }

  eval {
    source_field => CallDuration
    target_field => CallDuration
    operation => "x / 100"
  }

  eval {
    source_field => TimeToConnect
    target_field => TimeToConnect
    operation => "x / 100"
  }
}

@lmangani
Member

Could you perhaps provide the full recipe including the "other side" of the config for the SBC sending?

@spady7
Author

spady7 commented Aug 26, 2024

Hi, sure.
I'm attaching some screenshots of the SBC side. The IP 192.168.10.132 is the host where the paStash container is running.
Tell me if something is missing.

[Screenshots of the SBC side: 2024-08-26 09 35 50, 2024-08-26 09 36 19, 2024-08-26 09 36 43]

@spady7
Author

spady7 commented Sep 12, 2024

UPDATED Filter to add to Wiki:

filter {
  compute_field {
    field => origin
    value => "AUDIOCODES"
  }
  grok {
    match => '<%{NUMBER:Internal_Seq}>\[S=%{NUMBER:SDR_Seq_Num}\] \|%{WORD:RecordType}\s*\|%{DATA:ProductName}\s*\|%{NUMBER:ShelfInfo}\s*\|%{NUMBER:SeqNum}\s*\|%{DATA:SipSessionId}\s*\|%{TIME:SetupTime}\s+%{WORD:TimeZone} %{WORD:Day} %{MONTH:Month} %{MONTHDAY:Monthday} %{YEAR:Year}\|%{NUMBER:TimeToConnect}\s*\|%{NUMBER:CallDuration}\s*\|%{WORD:NodeTimeZone}\s*\|%{NUMBER:IngressCallingUserName}\s*\|%{NUMBER:EgressCallingUserName}\s*\|%{NUMBER:IngressDialedUserName}\s*\|%{NUMBER:EgressCalledUserName}\s*\|%{IP:IngressCallSourceIp}\s*\|%{IP:EgressCallDestIp}\s*\|%{DATA:EgressTrmReason}\s*\|%{WORD:EgressSIPTrmReason}\s*\|%{DATA:IngressSipInterfaceName}\s*\|%{DATA:EgressSipInterfaceName}\s*\|%{NUMBER:RouteAttemptNum}'
  }

  eval {
    source_field => CallDuration
    target_field => CallDuration
    operation => "x / 100"
  }

  eval {
    source_field => TimeToConnect
    target_field => TimeToConnect
    operation => "x / 100"
  }
}

I'm reviving this thread because I'm seeing strange behavior from the grok filter.
When I test my input on https://grokdebugger.com/ the output is correct, so I'm confident that the grok pattern itself is correct, yet the output from paStash is anomalous.
Let me explain.
The correct output should look like this:
{ "message": "<181>[S=19] |STOP |Mediant SW |192 |18 |81a539:192:48 |15:04:45.185 UTC Thu Sep 12 2024|155 |750 |UTC |2000 |2000 |1000 |1000 |192.168.10.1 |192.168.10.1 |GWAPP_NORMAL_CALL_CLEAR |BYE |Telecom |Telecom |0 |yes", "host": "192.168.10.11", "udp_port": "514", "tags": [ "CDR" ], "@timestamp": "2024-09-12T13:04:55.263Z", "@version": "1", "origin": "AUDIOCODES", "Internal_Seq": 181, "SDR_Seq_Num": 19, "RecordType": "STOP", "ProductName": "Mediant SW", "ShelfInfo": 192, "SeqNum": 18, "SipSessionId": "81a539:192:48", "SetupTime": "15:04:45.185", "TimeZone": "UTC", "Day": "Thu", "Month": "Sep", "Monthday": 12, "Year": 2024, "TimeToConnect": 1.55, "CallDuration": 7.5, "NodeTimeZone": "UTC", "IngressCallingUserName": 2000, "EgressCallingUserName": 2000, "IngressDialedUserName": 1000, "EgressCalledUserName": 1000, "IngressCallSourceIp": "192.168.10.1", "EgressCallDestIp": "192.168.10.1", "EgressTrmReason": "GWAPP_NORMAL_CALL_CLEAR", "EgressSIPTrmReason": "BYE", "IngressSipInterfaceName": "Telecom", "EgressSipInterfaceName": "Telecom", "RouteAttemptNum": "0", "isSuccess": "yes" }

Instead, the paStash output looks like this:

{ "message": "<181>[S=19] |STOP |Mediant SW |192 |18 |81a539:192:48 |15:04:45.185 UTC Thu Sep 12 2024|155 |750 |UTC |2000 |2000 |1000 |1000 |192.168.10.1 |192.168.10.1 |GWAPP_NORMAL_CALL_CLEAR |BYE |Telecom |Telecom |0 |yes", "host": "192.168.10.11", "udp_port": "514", "tags": [ "CDR" ], "@timestamp": "2024-09-12T13:04:55.263Z", "@version": "1", "origin": "AUDIOCODES", "Internal_Seq": 181, "SDR_Seq_Num": 19, "RecordType": "STOP", "ProductName": "Mediant SW", "ShelfInfo": 192, "SeqNum": 18, "SipSessionId": "81a539:192:48", "SetupTime": "15:04:45.185", "TimeZone": "UTC", "Day": "Thu", "Month": "Sep", "Monthday": 12, "Year": 2024, "TimeToConnect": 1.55, "CallDuration": 7.5, "NodeTimeZone": "UTC", "IngressCallingUserName": 2000, "EgressCallingUserName": 2000, "IngressDialedUserName": 1000, "EgressCalledUserName": 1000, "IngressCallSourceIp": "192.168.10.1", "EgressCallDestIp": "192.168.10.1", "EgressTrmReason": "GWAPP_NORMAL_CALL_CLEAR", "EgressSIPTrmReason": "GWAPP_NORMAL_CALL_CLEAR", "IngressSipInterfaceName": "BYE", "EgressSipInterfaceName": "BYE", "RouteAttemptNum": "Telecom", "isSuccess": "Telecom" }

Filter grok pattern:

<%{NUMBER:Internal_Seq}>\[S=%{NUMBER:SDR_Seq_Num}\] \|%{WORD:RecordType}\s*\|%{DATA:ProductName}\s*\|%{NUMBER:ShelfInfo}\s*\|%{NUMBER:SeqNum}\s*\|%{DATA:SipSessionId}\s*\|%{TIME:SetupTime}\s+%{WORD:TimeZone} %{WORD:Day} %{MONTH:Month} %{MONTHDAY:Monthday} %{YEAR:Year}\|%{NUMBER:TimeToConnect}\s*\|%{NUMBER:CallDuration}\s*\|%{WORD:NodeTimeZone}\s*\|%{NUMBER:IngressCallingUserName}\s*\|%{NUMBER:EgressCallingUserName}\s*\|%{NUMBER:IngressDialedUserName}\s*\|%{NUMBER:EgressCalledUserName}\s*\|%{IP:IngressCallSourceIp}\s*\|%{IP:EgressCallDestIp}\s*\|(%{DATA:EgressTrmReason}|)\s*\|(%{WORD:EgressSIPTrmReason}|)\s*\|%{DATA:IngressSipInterfaceName}\s*\|%{DATA:EgressSipInterfaceName}\s*\|%{NUMBER:RouteAttemptNum}\s*\|(%{WORD:isSuccess}|)\s

In practice, and this is what I don't understand, it duplicates the values "GWAPP_NORMAL_CALL_CLEAR" and "BYE".
What am I missing? How can I investigate this problem? Is there a way to debug the filter thoroughly?
Thanks

@spady7
Author

spady7 commented Sep 12, 2024

@lmangani can you please point me to a way to troubleshoot this strange behavior? Is there a way to see what the grok filter does when it parses the input?
Regards

@lmangani
Member

Is it always the same field/column showing this odd behaviour?

@spady7
Author

spady7 commented Sep 12, 2024

It seems so.
EgressTrmReason and EgressSIPTrmReason.
They are duplicated, and the values that should follow are missing.

@lmangani
Member

Are you positive those fields do not contain special characters or some unusual term? If you pipe the same object manually, does the same error reproduce, or does it only happen for streamed CDRs?

@spady7
Author

spady7 commented Sep 12, 2024

I haven't tried manual injection.
Anyway, isn't it odd that the same input works as expected in https://grokdebugger.com/ and other online grok tools? If odd terms or special characters were present, those tools should show the same issue, shouldn't they?
What is not clear is why it duplicates those elements.
When grok does not recognize something, the result is a grok failure.

@spady7
Author

spady7 commented Sep 13, 2024

I tried to pipe the record manually (by filling a local file and changing the input), but I get the same odd behaviour.

I used:
input { file { path => "/tmp/cdr_manual.log" } }

and injected the following:

<181>[S=3] |STOP|Mediant SW|193|2|81a539:193:4|08:53:18.585 UTC Fri Sep 13 2024|259|324|UTC|2000|2000|1000|1000|192.168.10.1|192.168.10.1|GWAPP_NORMAL_CALL_CLEAR|BYE|Telecom|Telecom|0|yes

@lmangani
Member

@spady7 that's not what I meant. Using the same file will reproduce the same issue. Please use stdin as the input and paste the object, making sure it contains no special characters (i.e. copy it to a blank document, then paste from that document rather than from the original).

@spady7
Author

spady7 commented Sep 13, 2024

@lmangani I found the issue.
It was caused by the following construct, which every grok debugger I tested accepts but paStash does not.

Pattern causing the issue:
|(%{WORD:EgressTrmReason}|)\s*\|(%{WORD:EgressSIPTrmReason}|)\s*\|

Pattern that works fine in paStash:
|%{WORD:EgressTrmReason}\s*\|%{WORD:EgressSIPTrmReason}\s*\|

paStash does not seem to accept the "OR" condition.
I read in the documentation that a field with no value is not taken into account.
That's fine anyway.
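
One possible explanation for the shifted values, offered as a hypothesis and not verified against the paStash source: if captured values are assigned to field names by position, the extra parentheses in `(%{WORD:...}|)` add an unnamed capturing group in front of each named field and push every later value out of its slot. The Python sketch below reproduces that effect on the tail of the sample record with plain regular expressions. The positional `zip` mapping and the simplified `\w+`, `.*?` and `\d+` stand-ins for the grok types are assumptions made for illustration.

```python
import re

# Tail of the Sep 12 sample record.
tail = "|GWAPP_NORMAL_CALL_CLEAR |BYE |Telecom |Telecom |0 |yes"

# Field names in the order they appear in the grok pattern.
NAMES = [
    "EgressTrmReason", "EgressSIPTrmReason", "IngressSipInterfaceName",
    "EgressSipInterfaceName", "RouteAttemptNum", "isSuccess",
]

# "(x|)" wrappers written as capturing groups: each adds an extra group.
capturing = r"\|((\w+)|)\s*\|((\w+)|)\s*\|(.*?)\s*\|(.*?)\s*\|(\d+)\s*\|((\w+)|)"

# Same alternation, but the wrappers are non-capturing "(?:x|)".
non_capturing = r"\|(?:(\w+)|)\s*\|(?:(\w+)|)\s*\|(.*?)\s*\|(.*?)\s*\|(\d+)\s*\|(?:(\w+)|)"

def map_by_position(pattern, text):
    """Assign captured groups to NAMES strictly by position (assumed behaviour)."""
    match = re.search(pattern, text)
    return dict(zip(NAMES, match.groups()))

shifted = map_by_position(capturing, tail)
aligned = map_by_position(non_capturing, tail)

print(shifted)
print(aligned)
```

With the capturing wrappers, the mapping matches the anomalous output reported above: `EgressSIPTrmReason` receives `GWAPP_NORMAL_CALL_CLEAR` and `isSuccess` receives `Telecom`. If this hypothesis holds, a non-capturing wrapper such as `(?:%{WORD:EgressTrmReason}|)` might keep the optional match without shifting fields, but that has not been tested in paStash.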

@lmangani
Copy link
Member

lmangani commented Sep 13, 2024

That's interesting! Thanks for sharing this important bit - I'll take a look at the library and see if we can provide this as well.
