Multipart body should not be map[string][]byte #822

Open
abiriadev opened this issue Jul 25, 2024 · 0 comments

First of all, thanks for the wonderful project! colly has saved our team a lot of time!!

Context

ref: #8, #33

According to RFC 7578, section 4.3:

4.3. Multiple Files for One Form Field

The form data for a form field might include multiple files.
[RFC2388] suggested that multiple files for a single form field be transmitted using a nested "multipart/mixed" part. This usage is deprecated.

To match widely deployed implementations, multiple files MUST be sent by supplying each file in a separate part but all with the same "name" parameter.

Receiving applications intended for wide applicability (e.g., multipart/form-data parsing libraries) SHOULD also support the older method of supplying multiple files.

This practice is unsurprisingly common, and I am facing exactly this case.
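
For illustration, the behavior the RFC describes can be sketched with Go's standard mime/multipart package (not colly). The helper names buildRepeated and fieldNames, the field name "files", and the file names are assumptions made up for this example. It writes two files under the same name and parses the body back to confirm that both parts survive:

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"mime/multipart"
)

// buildRepeated writes one part per file, all under the same field name,
// which is the layout RFC 7578 section 4.3 asks for.
func buildRepeated(field string, names []string, contents [][]byte) ([]byte, string, error) {
	var buf bytes.Buffer
	w := multipart.NewWriter(&buf)
	for i, name := range names {
		part, err := w.CreateFormFile(field, name)
		if err != nil {
			return nil, "", err
		}
		if _, err := part.Write(contents[i]); err != nil {
			return nil, "", err
		}
	}
	// Close writes the terminating boundary.
	if err := w.Close(); err != nil {
		return nil, "", err
	}
	return buf.Bytes(), w.Boundary(), nil
}

// fieldNames parses the body and returns the "name" of every part, in order.
func fieldNames(body []byte, boundary string) ([]string, error) {
	r := multipart.NewReader(bytes.NewReader(body), boundary)
	var names []string
	for {
		p, err := r.NextPart()
		if err == io.EOF {
			return names, nil
		}
		if err != nil {
			return nil, err
		}
		names = append(names, p.FormName())
	}
}

func main() {
	body, boundary, err := buildRepeated("files",
		[]string{"a.txt", "b.txt"},
		[][]byte{[]byte("A"), []byte("B")})
	if err != nil {
		panic(err)
	}
	names, err := fieldNames(body, boundary)
	if err != nil {
		panic(err)
	}
	fmt.Println(names) // prints [files files]
}
```

Both parts carry the same name, which is exactly what a map keyed by name cannot express.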

The issue

The name field does not have to be unique. There are a few common cases where a duplicate name is required (e.g., when uploading an array of files), and these cases should be properly covered.

colly/colly.go

Lines 551 to 559 in 99b7fb1

// PostMultipart starts a collector job by creating a Multipart POST request
// with raw binary data. PostMultipart also calls the previously provided callbacks
func (c *Collector) PostMultipart(URL string, requestData map[string][]byte) error {
	boundary := randomBoundary()
	hdr := http.Header{}
	hdr.Set("Content-Type", "multipart/form-data; boundary="+boundary)
	hdr.Set("User-Agent", c.UserAgent)
	return c.scrape(URL, "POST", 1, createMultipartReader(boundary, requestData), nil, hdr, true)
}

colly/colly.go

Lines 1461 to 1469 in 99b7fb1

buffer.WriteString("Content-type: multipart/form-data; boundary=" + boundary + "\n\n")
for contentType, content := range data {
	buffer.WriteString(dashBoundary + "\n")
	buffer.WriteString("Content-Disposition: form-data; name=" + contentType + "\n")
	buffer.WriteString(fmt.Sprintf("Content-Length: %d \n\n", len(content)))
	buffer.Write(content)
	buffer.WriteString("\n")
}
buffer.WriteString(dashBoundary + "--\n\n")

Unfortunately, the current implementation accepts map[string][]byte, which forces every name to be unique.
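
A minimal sketch of why the map type is the limiting factor; the helper collect is made up for this example and is not colly code. Assigning two values to the same key keeps only the last one, so a second file under the same name can never reach the multipart writer:

```go
package main

import "fmt"

// collect mimics building a map[string][]byte payload from (name, content)
// pairs. A later pair with the same name replaces the earlier one.
func collect(names []string, contents [][]byte) map[string][]byte {
	data := make(map[string][]byte)
	for i, name := range names {
		data[name] = contents[i]
	}
	return data
}

func main() {
	data := collect(
		[]string{"files", "files"},
		[][]byte{[]byte("first"), []byte("second")},
	)
	fmt.Println(len(data), string(data["files"])) // prints 1 second
}
```

In addition, Go does not specify the iteration order of a map, so the order of the parts written by a range loop over the map is not stable between runs.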

Suggestion

Maybe we can accept []Subpart so that:

  1. The order of subparts is guaranteed
  2. filename and other metadata can be optionally included
  3. Duplicate name fields are allowed

and so on.
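
To make the suggestion concrete, here is a rough sketch of what a slice-based input could look like. Subpart, its fields, encodeSubparts, and partNames are all hypothetical names for this example; none of them exist in colly today, and the encoding is delegated to the standard mime/multipart package rather than colly's own writer:

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"mime"
	"mime/multipart"
)

// Subpart is a hypothetical type for this sketch; it is not part of colly.
type Subpart struct {
	Name     string
	Filename string // optional; empty means a plain form field
	Content  []byte
}

// encodeSubparts writes the parts in slice order and returns the body
// together with the matching Content-Type header value.
func encodeSubparts(parts []Subpart) ([]byte, string, error) {
	var buf bytes.Buffer
	w := multipart.NewWriter(&buf)
	for _, p := range parts {
		var pw io.Writer
		var err error
		if p.Filename == "" {
			pw, err = w.CreateFormField(p.Name)
		} else {
			pw, err = w.CreateFormFile(p.Name, p.Filename)
		}
		if err != nil {
			return nil, "", err
		}
		if _, err := pw.Write(p.Content); err != nil {
			return nil, "", err
		}
	}
	if err := w.Close(); err != nil {
		return nil, "", err
	}
	return buf.Bytes(), w.FormDataContentType(), nil
}

// partNames parses the body back and returns "name:filename" per part.
func partNames(body []byte, contentType string) ([]string, error) {
	_, params, err := mime.ParseMediaType(contentType)
	if err != nil {
		return nil, err
	}
	r := multipart.NewReader(bytes.NewReader(body), params["boundary"])
	var names []string
	for {
		p, err := r.NextPart()
		if err == io.EOF {
			return names, nil
		}
		if err != nil {
			return nil, err
		}
		names = append(names, p.FormName()+":"+p.FileName())
	}
}

func main() {
	body, contentType, err := encodeSubparts([]Subpart{
		{Name: "title", Content: []byte("report")},
		{Name: "files", Filename: "a.txt", Content: []byte("A")},
		{Name: "files", Filename: "b.txt", Content: []byte("B")},
	})
	if err != nil {
		panic(err)
	}
	names, err := partNames(body, contentType)
	if err != nil {
		panic(err)
	}
	fmt.Println(names) // prints [title: files:a.txt files:b.txt]
}
```

A slice keeps the caller's order, carries an optional filename per part, and allows the same name to appear more than once, which covers the three points listed above.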

I would love to hear your opinion! If you think this is feasible, I will start working on it.
