Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Increase performance #408

Closed
pnsafonov opened this issue Aug 31, 2022 · 7 comments
Closed

Increase performance #408

pnsafonov opened this issue Aug 31, 2022 · 7 comments

Comments

@pnsafonov
Copy link

pnsafonov commented Aug 31, 2022

What version are you using (fq -v)?

$ fq -v
0.0.9 (linux amd64)

How was fq installed?

fq is build from source.

My branch:
https://github.com/pnsafonov/fq/tree/postgres

Can you reproduce the problem using the latest release or master branch?

Yes

What did you do?

I am implementing PostgreSQL format parsers. I can offer perfomance fix.

time fq -d pgheap -o flavour=postgres14 ".Pages[0].PageHeaderData.pd_linp[0,-1] | tovalue" 16397_8
{
  "lp_flags": "LP_NORMAL",
  "lp_len": 121,
  "lp_off": 8064
}
{
  "lp_flags": "LP_NORMAL",
  "lp_len": 121,
  "lp_off": 384
}

real    1m20.079s
user    1m21.643s
sys     0m0.389s

Then i made fix, and run again:

time fq -d pgheap -o flavour=postgres14 ".Pages[0].PageHeaderData.pd_linp[0,-1] | tovalue" 16397_8
{
  "lp_flags": "LP_NORMAL",
  "lp_len": 121,
  "lp_off": 8064
}
{
  "lp_flags": "LP_NORMAL",
  "lp_len": 121,
  "lp_off": 384
}

real    0m2.076s
user    0m3.452s
sys     0m0.242s

This fix decrease execution time 1m20.079s -> 0m2.076s, file decode.go:

func (d *D) AddChild(v *Value) {
	v.Parent = d.Value

	switch fv := d.Value.V.(type) {
	case *Compound:
		//if !fv.IsArray {
		//	for _, ff := range fv.Children {
		//		if ff.Name == v.Name {
		//			d.Fatalf("%q already exist in struct %s", v.Name, d.Value.Name)
		//		}
		//	}
		//}
		fv.Children = append(fv.Children, v)
	}
}

Also i made some perf with go tools. This was how i found fix:

   283.22s 84.00% 84.00%    321.80s 95.44%  github.com/wader/fq/pkg/decode.(*D).AddChild
    36.65s 10.87% 94.87%     36.65s 10.87%  memeqbody
     2.42s  0.72% 95.58%      4.94s  1.47%  runtime.scanobject
     0.94s  0.28% 95.86%      3.75s  1.11%  runtime.mallocgc
     0.44s  0.13% 95.99%      2.98s  0.88%  github.com/wader/fq/pkg/decode.(*D).TryFieldScalarFn.func1
     0.25s 0.074% 96.07%         5s  1.48%  runtime.gcDrain
     0.13s 0.039% 96.11%      2.58s  0.77%  runtime.newobject
     0.10s  0.03% 96.14%    323.53s 95.95%  github.com/wader/fq/pkg/decode.(*D).FillGaps
     0.08s 0.024% 96.16%      3.68s  1.09%  github.com/wader/fq/pkg/decode.(*D).TryFieldScalarFn
     0.03s 0.0089% 96.17%      2.21s  0.66%  github.com/wader/fq/pkg/decode.(*D).TryFieldScalarU16
     0.03s 0.0089% 96.18%      3.60s  1.07%  github.com/wader/fq/pkg/decode.(*D).TryFieldValue
     0.02s 0.0059% 96.18%      4.47s  1.33%  github.com/wader/fq/pkg/decode.(*D).FieldStruct
     0.02s 0.0059% 96.19%      2.25s  0.67%  github.com/wader/fq/pkg/decode.(*D).FieldU16
     0.02s 0.0059% 96.19%      2.14s  0.63%  github.com/wader/gojq.(*compiler).compileModule
     0.02s 0.0059% 96.20%    330.10s 97.90%  github.com/wader/gojq.(*env).Next
     0.01s 0.003% 96.20%      5.86s  1.74%  runtime.systemstack
         0     0% 96.20%      4.47s  1.33%  github.com/wader/fq/format/postgres.decodePgheap
         0     0% 96.20%      4.18s  1.24%  github.com/wader/fq/format/postgres/flavours/pgproee14.DecodeHeap (inline)
@wader
Copy link
Owner

wader commented Aug 31, 2022

Hey, interesting and nice to see your working on postgres decoders!

I wonder if the pgheap decoder produces structs with lots of fields? can i get the test file "16397_8" somewhere?

Also could you try with the patch below? it will use a bit more memory but should speed up that check. I've thought adding something like this to speed up key indexing from jq anyway but haven't bothered as none of the current decoders produce structs with lots of fields.

diff --git a/pkg/decode/decode.go b/pkg/decode/decode.go
index e899eac3..5ddc834f 100644
--- a/pkg/decode/decode.go
+++ b/pkg/decode/decode.go
@@ -721,11 +721,10 @@ func (d *D) AddChild(v *Value) {
        switch fv := d.Value.V.(type) {
        case *Compound:
                if !fv.IsArray {
-                       for _, ff := range fv.Children {
-                               if ff.Name == v.Name {
-                                       d.Fatalf("%q already exist in struct %s", v.Name, d.Value.Name)
-                               }
+                       if _, ok := fv.Keys[v.Name]; ok {
+                               d.Fatalf("%q already exist in struct %s", v.Name, d.Value.Name)
                        }
+                       fv.Keys[v.Name] = struct{}{}
                }
                fv.Children = append(fv.Children, v)
        }
diff --git a/pkg/decode/value.go b/pkg/decode/value.go
index 63927f6a..bf82ba04 100644
--- a/pkg/decode/value.go
+++ b/pkg/decode/value.go
@@ -13,6 +13,7 @@ type Compound struct {
        IsArray     bool
        RangeSorted bool
        Children    []*Value
+       Keys        map[string]struct{}
        Description string
 }

Alternatively we could make that check optional somehow in compile time or runtime as it is quite nice to have it during decoder development.

@pnsafonov
Copy link
Author

Link to 16397_8:
https://github.com/pnsafonov/fq_testdata_postgres14/raw/master/16397_8

16397_8 - first 8 mb of 1 GB 16397

@pnsafonov
Copy link
Author

Hey, interesting and nice to see your working on postgres decoders!

I wonder if the pgheap decoder produces structs with lots of fields? can i get the test file "16397_8" somewhere?

Also could you try with the patch below? it will use a bit more memory but should speed up that check. I've thought adding something like this to speed up key indexing from jq anyway but haven't bothered as none of the current decoders produce structs with lots of fields.

diff --git a/pkg/decode/decode.go b/pkg/decode/decode.go
index e899eac3..5ddc834f 100644
--- a/pkg/decode/decode.go
+++ b/pkg/decode/decode.go
@@ -721,11 +721,10 @@ func (d *D) AddChild(v *Value) {
        switch fv := d.Value.V.(type) {
        case *Compound:
                if !fv.IsArray {
-                       for _, ff := range fv.Children {
-                               if ff.Name == v.Name {
-                                       d.Fatalf("%q already exist in struct %s", v.Name, d.Value.Name)
-                               }
+                       if _, ok := fv.Keys[v.Name]; ok {
+                               d.Fatalf("%q already exist in struct %s", v.Name, d.Value.Name)
                        }
+                       fv.Keys[v.Name] = struct{}{}
                }
                fv.Children = append(fv.Children, v)
        }
diff --git a/pkg/decode/value.go b/pkg/decode/value.go
index 63927f6a..bf82ba04 100644
--- a/pkg/decode/value.go
+++ b/pkg/decode/value.go
@@ -13,6 +13,7 @@ type Compound struct {
        IsArray     bool
        RangeSorted bool
        Children    []*Value
+       Keys        map[string]struct{}
        Description string
 }

Alternatively we could make that check optional somehow in compile time or runtime as it is quite nice to have it during decoder development.

Thank you for patch.
I got memory issues with fq:
#409
I will think how to implement compile time check or cmd option.

@pnsafonov
Copy link
Author

Hey, interesting and nice to see your working on postgres decoders!

I wonder if the pgheap decoder produces structs with lots of fields? can i get the test file "16397_8" somewhere?

Also could you try with the patch below? it will use a bit more memory but should speed up that check. I've thought adding something like this to speed up key indexing from jq anyway but haven't bothered as none of the current decoders produce structs with lots of fields.

diff --git a/pkg/decode/decode.go b/pkg/decode/decode.go
index e899eac3..5ddc834f 100644
--- a/pkg/decode/decode.go
+++ b/pkg/decode/decode.go
@@ -721,11 +721,10 @@ func (d *D) AddChild(v *Value) {
        switch fv := d.Value.V.(type) {
        case *Compound:
                if !fv.IsArray {
-                       for _, ff := range fv.Children {
-                               if ff.Name == v.Name {
-                                       d.Fatalf("%q already exist in struct %s", v.Name, d.Value.Name)
-                               }
+                       if _, ok := fv.Keys[v.Name]; ok {
+                               d.Fatalf("%q already exist in struct %s", v.Name, d.Value.Name)
                        }
+                       fv.Keys[v.Name] = struct{}{}
                }
                fv.Children = append(fv.Children, v)
        }
diff --git a/pkg/decode/value.go b/pkg/decode/value.go
index 63927f6a..bf82ba04 100644
--- a/pkg/decode/value.go
+++ b/pkg/decode/value.go
@@ -13,6 +13,7 @@ type Compound struct {
        IsArray     bool
        RangeSorted bool
        Children    []*Value
+       Keys        map[string]struct{}
        Description string
 }

Alternatively we could make that check optional somehow in compile time or runtime as it is quite nice to have it during decoder development.

Patch works well:

time fq -d pgheap -o flavour=postgres14 ".Pages[0].PageHeaderData.pd_linp[0,-1] | tovalue" 16397_8
{
  "lp_flags": "LP_NORMAL",
  "lp_len": 121,
  "lp_off": 8064
}
{
  "lp_flags": "LP_NORMAL",
  "lp_len": 121,
  "lp_off": 384
}

real    0m2.227s
user    0m3.679s
sys     0m0.264s

@pnsafonov
Copy link
Author

Pull request with map:
#411

@wader
Copy link
Owner

wader commented Sep 2, 2022

Fixed by #411

@wader wader closed this as completed Sep 2, 2022
@wader
Copy link
Owner

wader commented Sep 2, 2022

BTW feel free to open a PR for the postgres decoder. Had a quick look at your branch and it looks impressive 👍 I think it is usually good to do to a PR earlier to get feedback and start iterating.

Also i usually prefer rebase over merging upstream would that be ok with you? personally think it keeps the history clearer and easier to follow

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants