DocumentIntelligenceClient raising a TypeError - begin_analyze_document() missing 'body' #2244

O-EAI · 2024-12-19T23:22:25Z

Please provide us with the following information:

This issue is for a: (mark with an `x`)

bug report -> please search issues before submitting
feature request
documentation issue or request
regression (a behavior that used to work and stopped in a new release)

Minimal steps to reproduce

Run the code provided in the pdfparser.py file that uses the DocumentIntelligenceClient with begin_analyze_document.
Use DocumentIntelligenceClient with use_content_understanding=False.
Ensure the input is valid and the required dependencies (Azure SDK, pymupdf, etc.) are installed.
Trigger the function that calls begin_analyze_document.

Log message

TypeError: DocumentIntelligenceClientOperationsMixin.begin_analyze_document() missing 1 required positional argument: 'body'

Expected/desired behavior

The begin_analyze_document method should successfully process the document without raising a TypeError.

Versions

Python: 3.11
azure-ai-documentintelligence==1.0.0

Useful details

The problem arises when calling the begin_analyze_document method on the DocumentIntelligenceClient. The error suggests that the required body argument is not being provided. Below is the relevant code snippet:

async with DocumentIntelligenceClient(
        endpoint=self.endpoint, credential=self.credential
    ) as document_intelligence_client:
        if self.use_content_understanding:
            if self.content_understanding_endpoint is None:
                raise ValueError("Content Understanding is enabled but no endpoint was provided")
            if isinstance(self.credential, AzureKeyCredential):
                raise ValueError(
                    "AzureKeyCredential is not supported for Content Understanding, use keyless auth instead"
                )
            cu_describer = ContentUnderstandingDescriber(self.content_understanding_endpoint, self.credential)
            content_bytes = content.read()
            poller = await document_intelligence_client.begin_analyze_document(
                model_id="prebuilt-layout",
                analyze_request=AnalyzeDocumentRequest(bytes_source=content_bytes),
                output=["figures"],
                features=["ocrHighResolution"],
                output_content_format="markdown",
            )
            doc_for_pymupdf = pymupdf.open(stream=io.BytesIO(content_bytes))
        else:
            poller = await document_intelligence_client.begin_analyze_document(
                model_id=self.model_id, analyze_request=content, content_type="application/octet-stream"
            )
        analyze_result: AnalyzeResult = await poller.result()


hosimesi · 2024-12-20T07:25:12Z

I'm in the same situation. You could use the version below, which renames the `analyze_request` keyword to `body`.
I think it's because the documentation hasn't kept up with the latest updates.

async with DocumentIntelligenceClient(
        endpoint=self.endpoint, credential=self.credential
    ) as document_intelligence_client:
        if self.use_content_understanding:
            if self.content_understanding_endpoint is None:
                raise ValueError("Content Understanding is enabled but no endpoint was provided")
            if isinstance(self.credential, AzureKeyCredential):
                raise ValueError(
                    "AzureKeyCredential is not supported for Content Understanding, use keyless auth instead"
                )
            cu_describer = ContentUnderstandingDescriber(self.content_understanding_endpoint, self.credential)
            content_bytes = content.read()
            poller = await document_intelligence_client.begin_analyze_document(
                model_id="prebuilt-layout",
                body=AnalyzeDocumentRequest(bytes_source=content_bytes),  #  analyze_request -> body
                output=["figures"],
                features=["ocrHighResolution"],
                output_content_format="markdown",
            )
            doc_for_pymupdf = pymupdf.open(stream=io.BytesIO(content_bytes))
        else:
            poller = await document_intelligence_client.begin_analyze_document(
                model_id=self.model_id, body=content, content_type="application/octet-stream"  #  analyze_request -> body here too
            )
        analyze_result: AnalyzeResult = await poller.result()
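
For anyone wondering why the message says "missing 1 required positional argument" rather than "unexpected keyword argument": a minimal sketch, using a stand-in function rather than the real SDK method. It assumes the method takes `body` as a required parameter and also accepts `**kwargs`, so the old `analyze_request=` keyword is silently absorbed and `body` is left unfilled. The stub's signature and the `try_call` helper are illustrative only.

```python
# Stand-in for the SDK method; NOT the real implementation.
# Assumption: `body` is required and extra keywords land in **kwargs.
def begin_analyze_document(model_id, body, **kwargs):
    return {"model_id": model_id, "body": body, "extra": kwargs}


def try_call(**call_kwargs):
    """Return the TypeError message, or None if the call succeeds."""
    try:
        begin_analyze_document(**call_kwargs)
        return None
    except TypeError as exc:
        return str(exc)


# Old keyword: swallowed by **kwargs, so `body` is never supplied.
old_style_error = try_call(model_id="prebuilt-layout", analyze_request=b"%PDF-")

# New keyword: the required parameter is filled and the call goes through.
new_style_error = try_call(model_id="prebuilt-layout", body=b"%PDF-")

print(old_style_error)
print(new_style_error)
```

The first call reproduces the same kind of message as in the log above; the second returns normally.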

o1100 · 2024-12-20T09:50:55Z

Yeah, I have this too. Will do a pull request this weekend, just waiting to run the required tests.

ElisaPiccin · 2024-12-22T15:08:14Z

I tried this solution and it worked for me: Azure/azure-sdk-for-python#38622
I also applied it to document_intelligence_client.begin_classify_document() for inference from a custom classification model.
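
If the same code has to run against both the older and the newer package, one option is to pick the keyword at call time. The following is a rough sketch, not something taken from the SDK: `call_with_request` is a hypothetical helper, the two stub functions stand in for the old and new method signatures, and it assumes `inspect.signature` can see the real method's parameters through whatever decorators the SDK applies. The legacy keyword is passed in explicitly because this thread only confirms `analyze_request` for `begin_analyze_document`.

```python
import inspect


def call_with_request(method, request, *, legacy_name, **kwargs):
    """Call `method`, passing `request` as `body` if the method declares
    that parameter, otherwise under `legacy_name` (hypothetical helper)."""
    params = inspect.signature(method).parameters
    keyword = "body" if "body" in params else legacy_name
    return method(**{keyword: request}, **kwargs)


# Stubs standing in for the two generations of the method (assumed shapes).
def new_style(model_id, body, **kwargs):
    return ("new", model_id, body)


def old_style(model_id, analyze_request=None, **kwargs):
    return ("old", model_id, analyze_request)


new_result = call_with_request(
    new_style, b"data", legacy_name="analyze_request", model_id="prebuilt-layout"
)
old_result = call_with_request(
    old_style, b"data", legacy_name="analyze_request", model_id="prebuilt-layout"
)
print(new_result)
print(old_result)
```

With an async client the helper returns the coroutine unchanged, so the caller would still `await` the result. Pinning the package version is the simpler choice when supporting only one release is acceptable.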
