How to Convert PDF to CSV: 4 Methods That Actually Work

PDF files lock data inside a fixed layout — great for reading, terrible for analysis. Whether you need to import bank statements into a spreadsheet, process invoices in bulk, or feed tabular data into a script, you need CSV. This guide covers four proven methods: an online converter, Python with pdfplumber, Microsoft Excel, and Google Sheets.

Tables vs Plain Text: Why It Matters

Before choosing a method, check what kind of data your PDF contains. The approach depends entirely on the PDF structure:

PDF Type | What It Contains | Best Method
Native tables | Text-based PDF with visible table borders and grid lines | Any method; Convertio is fastest
Borderless tables | Columns aligned by spacing, no visible grid | Python (pdfplumber) for precision
Scanned PDF | Image of a printed page (no selectable text) | Convertio with OCR enabled
Mixed content | Tables + paragraphs + headers on the same page | Python for selective extraction

Quick test: open your PDF and try selecting text with your mouse. If you can highlight individual words, it's a native (text-based) PDF. If the entire page selects as one block, it's a scanned image — you'll need OCR.
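
The same check can be scripted. This is a minimal sketch, assuming pdfplumber is installed (see Method 2). The helper names `looks_scanned` and `page_texts` and the 20-character threshold are illustrative choices, not part of any library:

```python
def looks_scanned(page_texts, min_chars=20):
    """Return True if no page yields at least min_chars of extractable text."""
    for text in page_texts:
        # extract_text() can give None or "" for image-only pages
        if text and len(text.strip()) >= min_chars:
            return False
    return True


def page_texts(pdf_path):
    """Return the raw text of each page (pdfplumber is imported lazily)."""
    import pdfplumber

    with pdfplumber.open(pdf_path) as pdf:
        return [page.extract_text() for page in pdf.pages]


# Usage (hypothetical file name):
# if looks_scanned(page_texts("statement.pdf")):
#     print("Scanned PDF: run OCR first")
```

A PDF that mixes scanned and native pages would count as native here, so checking pages one by one may be more useful for such files.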

Method 1: Convert Online with Convertio

Easy • No software • Works on any device • OCR support

The fastest option for most users. Convertio handles native PDFs, borderless tables, and even scanned documents with OCR. No installation, no account required.

  1. Go to convertio.com/pdf-to-csv
  2. Upload your PDF — drag and drop, or click "Choose PDF File". Max 100 MB.
  3. For scanned PDFs: select your OCR language from the dropdown before converting.
  4. Click "Convert to CSV" — conversion takes a few seconds for most files.
  5. Download the CSV — open it in Excel, Google Sheets, or import into your database.

Convertio processes all pages of your PDF and combines extracted data into a single CSV file. Files are encrypted during transfer and auto-deleted within 2 hours.

Method 2: Python with pdfplumber

Advanced • Full control • Batch processing • Handles borderless tables

pdfplumber is one of the most capable Python libraries for extracting tables from PDFs. It handles both bordered and borderless tables, exposes coordinates for every character, and lets you fine-tune extraction parameters.

Install pdfplumber

Terminal
pip install pdfplumber

Basic table extraction

This script extracts every table detected on every page of a PDF and writes the rows to a single CSV file:

Python
import csv

import pdfplumber

all_rows = []
with pdfplumber.open("invoice.pdf") as pdf:
    for page in pdf.pages:
        # extract_tables() returns every table on the page;
        # extract_table() would return only the largest one
        for table in page.extract_tables():
            all_rows.extend(table)

# utf-8-sig adds a BOM that helps Excel detect UTF-8
with open("output.csv", "w", newline="", encoding="utf-8-sig") as f:
    csv.writer(f).writerows(all_rows)

print(f"Extracted {len(all_rows)} rows to output.csv")

Handling borderless tables

When tables don't have visible borders, pdfplumber can still detect columns using character positions. Use extract_table() with custom settings:

Python
import pdfplumber

# For PDFs with no visible table borders
table_settings = {
    "vertical_strategy": "text",
    "horizontal_strategy": "text",
    "snap_y_tolerance": 5,
    "intersection_x_tolerance": 15,
}

with pdfplumber.open("report.pdf") as pdf:
    page = pdf.pages[0]
    table = page.extract_table(table_settings)
    # extract_table() returns None when no table is detected
    for row in table or []:
        print(row)
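
On mixed-content pages, where paragraphs sit above or below the table, it can help to crop the page to the table's region before extracting. The sketch below assumes the table occupies a known horizontal band of the page and that the page origin is at (0, 0). `band_bbox`, `extract_from_band`, and the default fractions are hypothetical, so the right values depend on your layout:

```python
def band_bbox(width, height, top_frac, bottom_frac):
    """Bounding box (x0, top, x1, bottom) for a full-width horizontal band."""
    if not 0 <= top_frac < bottom_frac <= 1:
        raise ValueError("need 0 <= top_frac < bottom_frac <= 1")
    return (0, height * top_frac, width, height * bottom_frac)


def extract_from_band(pdf_path, page_index=0, top_frac=0.3, bottom_frac=0.8):
    """Extract the largest table found inside the band; [] if none is found."""
    import pdfplumber  # imported here so band_bbox works without it

    with pdfplumber.open(pdf_path) as pdf:
        page = pdf.pages[page_index]
        bbox = band_bbox(page.width, page.height, top_frac, bottom_frac)
        # crop() returns a page-like object limited to the bounding box
        table = page.crop(bbox).extract_table()
        return table or []
```

Cropping first keeps stray paragraph text from being read as extra rows, which matters most with the "text" strategies shown above.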

Batch convert multiple PDFs

Python
import csv
from pathlib import Path

import pdfplumber

# Convert every PDF in ./invoices to a CSV with the same base name
for pdf_file in Path("./invoices").glob("*.pdf"):
    csv_path = pdf_file.with_suffix(".csv")
    rows = []
    with pdfplumber.open(pdf_file) as pdf:
        for page in pdf.pages:
            # Largest table on the page, or None if nothing was detected
            table = page.extract_table()
            if table:
                rows.extend(table)
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        csv.writer(f).writerows(rows)
    print(f"{pdf_file.name} -> {csv_path.name} ({len(rows)} rows)")

Method 3: Microsoft Excel (Get Data)

Medium • Desktop only • Microsoft 365 (Excel for 365) • Manual steps

Microsoft 365 (Excel for 365) can import PDF files directly using the Power Query / Get Data feature. This option is not available in standalone Excel 2016 or 2019 — it requires an active Microsoft 365 subscription. It works well for simple, well-structured tables.

  1. Open Excel and create a new blank workbook.
  2. Go to Data → Get Data → From File → From PDF.
  3. Select your PDF from the file browser.
  4. Choose the table(s) you want to import from the Navigator panel. Excel will show a preview of each detected table.
  5. Click "Load" to import the data into your worksheet.
  6. Save as CSV: File → Save As → choose "CSV (Comma delimited) (*.csv)" as the format.

Limitation: Excel's PDF import works best with simple, bordered tables. It struggles with multi-column layouts, merged cells, and borderless tables. For complex PDFs, use Convertio or Python instead.

Method 4: Google Sheets

Easy • Free • Browser-based • Requires Google account

Google Sheets doesn't import PDFs directly, but you can use Google Drive's built-in OCR to extract the text first, then copy it into Sheets.

  1. Upload the PDF to Google Drive.
  2. Right-click the PDF → Open with → Google Docs. Google will OCR the file and convert it to an editable document.
  3. Select the table data in the Google Doc and copy it (Ctrl+C / Cmd+C).
  4. Open a new Google Sheet and paste (Ctrl+V / Cmd+V). The data will fill into cells.
  5. Clean up the data — adjust column widths, remove extra rows, fix any OCR errors.
  6. Download as CSV: File → Download → Comma Separated Values (.csv).

Tip: Google's OCR works surprisingly well for scanned PDFs. But the table structure may not survive the copy-paste step intact. For better results with tabular data, use Convertio's direct PDF to CSV converter.

Method Comparison

Feature | Convertio | Python | Excel | Google Sheets
Difficulty | Easy | Advanced | Medium | Easy
Installation | None (browser) | Python + pip | Microsoft 365 | None (browser)
Bordered tables | Excellent | Excellent | Good | Fair
Borderless tables | Good | Excellent | Poor | Poor
Scanned PDFs (OCR) | Built-in | With pytesseract | Not supported | Via Google Drive
Batch processing | One file at a time | Unlimited | One file at a time | One file at a time
Best for | Quick one-off conversions | Automation & complex PDFs | Excel users with simple tables | Quick extraction with OCR

Tips for Clean CSV Output

  • Check the header row. Some PDFs have multi-line headers that get split into separate CSV rows. After conversion, verify that your column headers are on a single row.
  • Watch for merged cells. PDF tables often merge cells for group headings. These usually become empty cells in CSV. Fill them manually or with a script after extraction.
  • Handle special characters. Commas, quotes, and line breaks inside cell values can break CSV parsing. Good converters (Convertio, pdfplumber) handle escaping automatically. If yours doesn't, wrap values in double quotes.
  • Encoding matters. Use UTF-8 encoding when saving CSV to preserve accented characters, currency symbols, and non-Latin text. In Python: open("out.csv", "w", encoding="utf-8-sig") (the -sig adds a BOM that helps Excel detect UTF-8).
  • Multi-page tables. When a table spans multiple PDF pages, some tools extract each page as a separate table. In Python, skip the header row on subsequent pages to avoid duplicates.
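
Some of this cleanup can be scripted. The sketch below is one possible approach, assuming rows are lists of cell values as returned by pdfplumber, where merged cells typically arrive as None or empty strings. `fill_merged_cells`, `to_csv_text`, and `write_csv` are illustrative names:

```python
import csv
import io


def fill_merged_cells(rows, columns=(0,)):
    """Forward-fill empty cells in the given columns from the row above."""
    filled, last = [], {}
    for row in rows:
        row = list(row)  # copy so the input rows stay untouched
        for col in columns:
            if col >= len(row):
                continue
            if row[col] in (None, ""):
                row[col] = last.get(col, "")
            else:
                last[col] = row[col]
        filled.append(row)
    return filled


def to_csv_text(rows):
    """Serialize rows; csv.writer quotes commas, quotes, and line breaks."""
    buffer = io.StringIO()
    csv.writer(buffer).writerows(rows)
    return buffer.getvalue()


def write_csv(rows, path):
    """Save as UTF-8 with a BOM so Excel detects the encoding."""
    with open(path, "w", newline="", encoding="utf-8-sig") as f:
        f.write(to_csv_text(rows))
```

Forward-filling only makes sense for grouping columns such as a region or category. Applying it to value columns would invent data, which is why the columns are passed explicitly.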

Common Issues and Fixes

Problem | Cause | Solution
Empty CSV output | Scanned PDF (image-based) | Enable OCR in Convertio or use pytesseract
All data in one column | Excel opened CSV with wrong delimiter | Use Data → Text to Columns → Delimited → Comma
Misaligned columns | Borderless table with uneven spacing | Use pdfplumber with vertical_strategy: "text"
Garbled characters | Wrong encoding (usually Latin-1 vs UTF-8) | Open in text editor, save as UTF-8
Duplicate headers | Multi-page table with repeated headers | In Python, skip row 0 on pages after the first
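
For the duplicate-header case, a small helper can merge per-page tables while keeping the header only once. This sketch uses the hypothetical name `merge_page_tables`. Rather than always skipping row 0, it drops the first row of a later page only when that row matches the first page's header, which is a cautious variation on the fix above:

```python
def merge_page_tables(page_tables):
    """Concatenate per-page tables, keeping the header row only once."""
    merged, header = [], None
    for table in page_tables:
        if not table:
            continue  # page had no table (None or empty list)
        if header is None:
            header = table[0]
            merged.extend(table)
        elif table[0] == header:
            merged.extend(table[1:])  # repeated header: skip it
        else:
            merged.extend(table)  # continuation page without a header
    return merged
```

With pdfplumber, `page_tables` could be built as `[page.extract_table() for page in pdf.pages]`.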

Frequently Asked Questions

Can I convert a PDF with multiple tables or pages to CSV?

Yes. Online converters like Convertio process all pages and extract every table into a single CSV. In Python, pdfplumber lets you iterate over each page and extract tables individually, giving you full control over which tables to include and how to merge them.
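
As a rough illustration of that control, the sketch below walks every detected table and keeps only those with an expected column count. `iter_tables` and `keep_tables` are hypothetical helper names, and filtering by width is just one possible selection rule:

```python
def iter_tables(pdf_path):
    """Yield (page_number, table_index, rows) for every detected table."""
    import pdfplumber  # imported lazily so keep_tables works without it

    with pdfplumber.open(pdf_path) as pdf:
        for page in pdf.pages:
            for index, rows in enumerate(page.extract_tables()):
                yield page.page_number, index, rows


def keep_tables(tables, n_cols):
    """Keep only tables in which every row has exactly n_cols cells."""
    return [t for t in tables if t and all(len(r) == n_cols for r in t)]


# Usage (hypothetical file name and column count):
# tables = [rows for _, _, rows in iter_tables("report.pdf")]
# line_items = keep_tables(tables, n_cols=4)
```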

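How do I convert a scanned PDF to CSV?

Scanned PDFs contain images, not text, so you need OCR (Optical Character Recognition) first. Convertio has built-in OCR: select your language before converting. In Python, render the pages with pdf2image and run pytesseract (a wrapper around Tesseract) on them, then rebuild the table structure from the recognized text. Note that tabula-py reads text-based PDFs only, so it is an option only once the file has a text layer.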
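
A minimal sketch of the Python OCR route, assuming pdf2image (with Poppler) and pytesseract (with Tesseract) are installed. The column-splitting rule, the helper names, and the Tesseract options are assumptions for illustration, and real scans usually need more tuning:

```python
import re


def split_columns(line):
    """Split one line of OCR text into cells on runs of two or more spaces."""
    return re.split(r"\s{2,}", line.strip())


def ocr_rows(pdf_path, lang="eng", dpi=300):
    """OCR every page and return a list of rows (lists of cell strings)."""
    # Imported lazily; both rely on external binaries (Poppler, Tesseract).
    from pdf2image import convert_from_path
    import pytesseract

    # Assumption: these options keep column gaps as multiple spaces.
    config = "--psm 6 -c preserve_interword_spaces=1"
    rows = []
    for image in convert_from_path(pdf_path, dpi=dpi):
        text = pytesseract.image_to_string(image, lang=lang, config=config)
        for line in text.splitlines():
            if line.strip():
                rows.append(split_columns(line))
    return rows
```

Splitting on whitespace breaks down when a cell itself contains wide gaps or when columns sit close together, so check the row lengths before writing the CSV.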

Why are my columns misaligned after conversion?

Column misalignment usually happens when the PDF uses spaces instead of actual table borders to separate data. Try a different extraction tool; pdfplumber handles borderless tables better than most. You can also define explicit column boundaries in pdfplumber by setting vertical_strategy to "explicit" and listing the x positions in the explicit_vertical_lines setting.
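
A short sketch of explicit column boundaries. `explicit_column_settings` is a hypothetical helper, and the x positions in the usage note are made-up values in PDF points that you would measure from your own document:

```python
def explicit_column_settings(x_positions):
    """Build table settings that place column edges at the given x positions."""
    xs = sorted(x_positions)
    if len(xs) < 2:
        raise ValueError("need at least two x positions (left and right edge)")
    return {
        "vertical_strategy": "explicit",
        "explicit_vertical_lines": xs,
        # Rows are still inferred from text alignment
        "horizontal_strategy": "text",
    }


# Usage (hypothetical coordinates):
# settings = explicit_column_settings([40, 180, 320, 560])
# table = page.extract_table(settings)
```

Each pair of neighboring x positions bounds one column, so four positions produce three columns.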

Is it free to convert PDF to CSV?

Yes. Convertio offers free PDF to CSV conversion with no registration, no watermarks, and no email required. Files are encrypted via 256-bit SSL and auto-deleted within 2 hours. The maximum file size is 100 MB.
