"buffer exceeded max size" when reading JSON array via auto-detect #3865

Open
philrz opened this issue Apr 30, 2022 · 3 comments

Comments

philrz commented Apr 30, 2022

Repro is with Zed commit 6288fa9 and the attached test data nfcapd.json.gz, which consists of a JSON array of NetFlow records.

Prior to #3555 (cc: @nwt), this input data was auto-detected as JSON and the elements were treated as individual records, such that the following worked ok.

$ zq -version
Version: v0.33.0-167-g06c8ed11

$ zq -z 'head 1' nfcapd.json.gz 
{app_latency:0,cli_latency:0,dst4_addr:"10.47.2.154",dst_port:58331,export_sysid:0,fwd_status:0,in_bytes:313,in_packets:1,label:"<none>",proto:17,sampled:0,src4_addr:"10.0.0.100",src_port:53,src_tos:0,srv_latency:0,t_first:"2018-03-23T12:58:22.641",t_last:"2018-03-23T12:58:22.641",tcp_flags:"........",type:"FLOW"}

However, since #3555 the input's original array-ness is preserved, so now I'd need to apply over this to remove the array layer. This makes sense, but auto-detect now fails to get past the input phase.

$ zq -version
Version: v1.0.0-72-g6288fa9e

$ zq -z 'over this | head 1' nfcapd.json.gz 
nfcapd.json.gz: format detection error
	zeek: line 1: bad types/fields definition in zeek header
	zjson: line 1: unexpected end of JSON input
	zson: parse error: string literal: buffer exceeded max size trying to infer input format
	zng: zngio: unknown compression format 0x7b
	zng21: zng type ID out of range
	csv: line 1: no comma found
	json: buffer exceeded max size trying to infer input format
	parquet: auto-detection not supported
	zst: auto-detection not supported

I can make it work again if I add an explicit -i json, but I'm curious whether this auto-detect limitation could be seen as a bug or an undesirable limitation worth addressing. Imagining a future where users bring arbitrary inputs to the tooling, being forgiving with auto-detect does seem like a desirable goal.
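
For reference, the workaround command would look something like the following (output elided). This is just the earlier query combined with the -i json flag mentioned above, so treat it as a sketch rather than a verified transcript.

$ zq -i json -z 'over this | head 1' nfcapd.json.gz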

The wider context is that I've been drafting updates to the Custom Brimcap Config wiki article, which currently shows example command lines that depend on the previous ability to auto-detect this JSON input. Due to known Brimcap limitation brimdata/brimcap#80, there's currently no way to specify the equivalent of -i json on a brimcap analyze command line, so I can't employ the same workaround there as I can with zq. However, if the limitation identified here is deemed too difficult to address in the short term, I could revise the article to work around it in another way for now, such as using CSV input as it did in the past.

philrz commented Sep 21, 2022

One of the approaches proposed to improve on this is for the tooling to recognize the .json file extension and use the JSON reader directly in that case rather than relying on auto-detect.
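
If that heuristic were implemented, I'd expect a hypothetical invocation like the following to work (assuming the test data were first decompressed to a plain .json file so the extension is visible; this is proposed behavior, not something the tooling does today):

$ gunzip nfcapd.json.gz
$ zq -z 'over this | head 1' nfcapd.json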

philrz commented Jul 19, 2023

A user in a public Slack thread just bumped into this problem and thought of the same heuristic we did about recognizing the .json extension.

philrz commented Mar 13, 2024

Just to chalk up another incident, the file stored at s3://zui-issues/3025/mun.json, referenced in brimdata/zui#3025, hits this problem when read with zed load.

$ zed -version
Version: v1.14.0-16-g38763f82

$ curl http://localhost:9867/version
{"version":"v1.14.0-16-g38763f82"}

$ zed create -use foo
pool created: foo 2deMxbCropxzZZ3lKVQkjxGVYoG
Switched to branch "main" on pool "foo"

$ zed load mun.json 
mun.json: format detection error
	arrows: schema message length exceeds 1 MiB
	csv: line 1: delimiter ',' not found
	json: buffer exceeded max size trying to infer input format
	line: auto-detection not supported
	parquet: auto-detection requires seekable input
	tsv: line 1: delimiter '\t' not found
	vng: auto-detection requires seekable input
	zeek: line 1: bad types/fields definition in zeek header
	zjson: line 1: malformed ZJSON: bad type object: "{": unpacker error parsing JSON: unexpected end of JSON input
	zng: unknown ZNG message frame type: 3
	zson: buffer exceeded max size trying to infer input format
(0/1) 10.90MB/85.11MB 10.90MB/s 12.81%
status code 400: no records in request

Curiously, zq has no problem reading the same file via auto-detect.

$ zq -version
Version: v1.14.0-16-g38763f82

$ zq mun.json
{type:"FeatureCollection",name:"mun",...

If I attempt to leverage this partial success by loading in two steps, I get a different auto-detect error, this time on the ZNG output.

$ zq mun.json > mun.zng

$ zed load mun.zng
mun.zng: format detection error
	arrows: schema message length exceeds 1 MiB
	csv: line 1: delimiter ',' not found
	json: invalid character 'D' looking for beginning of value
	line: auto-detection not supported
	parquet: auto-detection requires seekable input
	tsv: line 1: delimiter '\t' not found
	vng: auto-detection requires seekable input
	zeek: line 1: bad types/fields definition in zeek header
	zjson: line 1: malformed ZJSON: bad type object: "D\x1c\x00\xb6\x18\xf4+\x00\x01\x04name\x19\x00\x02\x04type\x19": unpacker error parsing JSON: invalid character 'D' looking for beginning of value
	zng: malformed zng record
	zson: ZSON syntax error
(0/1) 10.90MB/32.25MB 10.90MB/s 33.80%
status code 400: no records in request

But specifying explicit ZNG input works at that point, just as explicit JSON input would have if I'd used it initially.

$ zed load -i zng mun.zng
(1/1) 32.25MB/32.25MB 32.25MB/s 100.00%
2dePWMTjTxhi40yhG7wFtaE8MHt committed

$ zed load -i json mun.json
(1/1) 85.11MB/85.11MB 24.55MB/s 100.00%
2dePbccqpEql2X8mmpGM6qbRNFF committed
