feat(ocr): prioritize direct DOCX parse to resolve Google Drive numbering corruption, add text export + regex fallback for PDFs
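The commit's own code is not shown on this page, so the following is only a minimal sketch of the "direct DOCX parse first, fallback otherwise" idea, under stated assumptions. It reads `word/document.xml` straight from the DOCX archive with the standard library (the commit itself adds `python-docx` and `lxml`, which would normally do this job), and it keeps each paragraph's list numbering id and level, which is the information a converted export can lose. The names `read_docx_paragraphs`, `extract`, and `fallback` are hypothetical and do not come from the commit.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by word/document.xml
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}


def read_docx_paragraphs(data: bytes):
    """Return (text, num_id, level) for each paragraph in a DOCX file.

    num_id and level are None for paragraphs that are not list items.
    Hypothetical helper, not taken from the commit.
    """
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))

    paragraphs = []
    for p in root.iter(f"{{{W}}}p"):
        # Concatenate all text runs in the paragraph.
        text = "".join(t.text or "" for t in p.iter(f"{{{W}}}t"))
        num_id = level = None
        num_pr = p.find("w:pPr/w:numPr", NS)
        if num_pr is not None:
            num_id_el = num_pr.find("w:numId", NS)
            ilvl_el = num_pr.find("w:ilvl", NS)
            if num_id_el is not None:
                num_id = num_id_el.get(f"{{{W}}}val")
            # A missing w:ilvl is treated as the top level here (assumption).
            level = int(ilvl_el.get(f"{{{W}}}val")) if ilvl_el is not None else 0
        paragraphs.append((text, num_id, level))
    return paragraphs


def extract(filename: str, data: bytes, fallback):
    """Prefer the direct DOCX parse; hand anything else to `fallback`.

    `fallback` stands in for whatever text export / regex path handles PDFs.
    """
    if filename.lower().endswith(".docx"):
        return read_docx_paragraphs(data)
    return fallback(data)
```

Resolving `num_id` and `level` into rendered labels such as "2.1" would additionally require reading `word/numbering.xml`, which this sketch leaves out.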
final_test.txt
0 → 100644
format_contract.py
0 → 100644
out.txt
0 → 100644
@@ -6,3 +6,5 @@ pymupdf4llm==0.0.17
 google-api-python-client
 google-auth-httplib2
 google-auth-oauthlib
+python-docx
+lxml
static/index.html
0 → 100644
test/67_formatted.docx
0 → 100644
File added
test_drive_ocr.txt
0 → 100644
test_e2e.py
0 → 100644
test_ocr.txt
0 → 100644
test_out.txt
0 → 100644
text_to_markdown.py
0 → 100644