Skip to content

extractTextFromExcel()

Overview

Reads an Excel file (.xlsx or .xls) from disk and returns all cell values across all sheets as a plain text string. Each sheet is introduced with a header line, and each non-empty row is output as a delimited line of cell values.

Internally uses phpoffice/phpspreadsheet. Memory is freed automatically via disconnectWorksheets() after extraction, making it safe for large files.

!!! note "Requirement" Run composer require phpoffice/phpspreadsheet before using this function.


Function Signature

function extractTextFromExcel(string $filePath, string $delimiter = "\t"): string

Parameters

Parameter Type Required Default Description
$filePath string Yes Absolute path to the .xlsx or .xls file on disk.
$delimiter string No "\t" (tab) The character used to separate cell values within each row. Use "," for CSV-style output or " \| " for a readable pipe format.

Return Value

Type Description
string All sheet data as plain text. Sheet names are prefixed as === Sheet: SheetName ===. Trailing empty cells in each row are stripped. Completely empty rows are skipped.

Output Format Example

For a workbook with two sheets:

=== Sheet: Vendors ===
Vendor Name PAN GSTIN   City
Ramesh Traders  ABCPK1234F  24ABCPK1234F1Z5 Surat
Kanika Exports  ZZZGT9876Q  29ZZZGT9876Q1ZA Bangalore

=== Sheet: Summary ===
Total Vendors   2

Exceptions

Exception When thrown
\RuntimeException File does not exist at the given $filePath.
\PhpOffice\PhpSpreadsheet\Reader\Exception File format is unrecognised, corrupted, or unsupported.

Example — Default Tab Delimiter

<?php
require_once '/var/www/html/vendor/autoload.php';

$text = extractTextFromExcel('/var/www/uploads/vendor_list.xlsx');
echo $text;

Example — Pipe Delimiter (Readable)

$text = extractTextFromExcel('/var/www/uploads/vendor_list.xlsx', ' | ');
echo $text;

Example — CSV-Style Output

$text = extractTextFromExcel('/var/www/uploads/vendor_list.xlsx', ',');
file_put_contents('/tmp/output.csv', $text);

Usage with PAN Scanner

$text = extractTextFromExcel('/var/www/uploads/party_master.xlsx');
$pans = extractPANsWithDetails($text);

foreach ($pans as $item) {
    echo $item['pan'] . ' — ' . $item['entity_type'] . PHP_EOL;
}

Notes & Caveats

!!! warning "Formula Cells" getFormattedValue() returns the last cached display value of a formula cell — not a live calculation. If the file was saved without recalculating, the value may be stale. Use getCalculatedValue() instead if you need live formula results (slower, loads more of the workbook).

!!! warning "Legacy .xls Files" phpspreadsheet's auto-detect reader handles .xls files via the Xls reader. If you encounter issues, explicitly specify the reader: php $reader = \PhpOffice\PhpSpreadsheet\IOFactory::createReader('Xls'); $spreadsheet = $reader->load($filePath);

!!! tip "Memory on Large Files" extractTextFromExcel() calls disconnectWorksheets() and unset() after reading. For very large files (10,000+ rows), also consider raising memory_limit in php.ini or using a chunk-based read with ReadFilter.

!!! note ".xlsm and .ods" .xlsm (macro-enabled) is structurally identical to .xlsx and loads correctly. .ods (OpenDocument) requires the odf engine — use pandas or a dedicated ODS library instead of this function.


See Also


# Add to mkdocs.yml nav:
- Developer Zone:
  - File Parsing Reference:
    - extractTextFromExcel: dev/extract_text_from_excel.md