extractTextFromExcel()
Overview
Reads an Excel file (.xlsx or .xls) from disk and returns all cell values across all sheets as a plain text string. Each sheet is introduced with a header line, and each non-empty row is output as a delimited line of cell values.
Internally uses phpoffice/phpspreadsheet. Memory is freed automatically via disconnectWorksheets() after extraction, making it safe for large files.
!!! note "Requirement"
Run composer require phpoffice/phpspreadsheet before using this function.
Function Signature
function extractTextFromExcel(string $filePath, string $delimiter = "\t"): string
Parameters
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
$filePath |
string |
Yes | — | Absolute path to the .xlsx or .xls file on disk. |
$delimiter |
string |
No | "\t" (tab) |
The character used to separate cell values within each row. Use "," for CSV-style output or " \| " for a readable pipe format. |
Return Value
| Type | Description |
|---|---|
string |
All sheet data as plain text. Sheet names are prefixed as === Sheet: SheetName ===. Trailing empty cells in each row are stripped. Completely empty rows are skipped. |
Output Format Example
For a workbook with two sheets:
=== Sheet: Vendors ===
Vendor Name PAN GSTIN City
Ramesh Traders ABCPK1234F 24ABCPK1234F1Z5 Surat
Kanika Exports ZZZGT9876Q 29ZZZGT9876Q1ZA Bangalore
=== Sheet: Summary ===
Total Vendors 2
Exceptions
| Exception | When thrown |
|---|---|
\RuntimeException |
File does not exist at the given $filePath. |
\PhpOffice\PhpSpreadsheet\Reader\Exception |
File format is unrecognised, corrupted, or unsupported. |
Example — Default Tab Delimiter
<?php
require_once '/var/www/html/vendor/autoload.php';
$text = extractTextFromExcel('/var/www/uploads/vendor_list.xlsx');
echo $text;
Example — Pipe Delimiter (Readable)
$text = extractTextFromExcel('/var/www/uploads/vendor_list.xlsx', ' | ');
echo $text;
Example — CSV-Style Output
$text = extractTextFromExcel('/var/www/uploads/vendor_list.xlsx', ',');
file_put_contents('/tmp/output.csv', $text);
Usage with PAN Scanner
$text = extractTextFromExcel('/var/www/uploads/party_master.xlsx');
$pans = extractPANsWithDetails($text);
foreach ($pans as $item) {
echo $item['pan'] . ' — ' . $item['entity_type'] . PHP_EOL;
}
Notes & Caveats
!!! warning "Formula Cells"
getFormattedValue() returns the last cached display value of a formula cell — not a live calculation. If the file was saved without recalculating, the value may be stale. Use getCalculatedValue() instead if you need live formula results (slower, loads more of the workbook).
!!! warning "Legacy .xls Files"
phpspreadsheet's auto-detect reader handles .xls files via the Xls reader. If you encounter issues, explicitly specify the reader:
php
$reader = \PhpOffice\PhpSpreadsheet\IOFactory::createReader('Xls');
$spreadsheet = $reader->load($filePath);
!!! tip "Memory on Large Files"
extractTextFromExcel() calls disconnectWorksheets() and unset() after reading. For very large files (10,000+ rows), also consider raising memory_limit in php.ini or using a chunk-based read with ReadFilter.
!!! note ".xlsm and .ods"
.xlsm (macro-enabled) is structurally identical to .xlsx and loads correctly. .ods (OpenDocument) requires the odf engine — use pandas or a dedicated ODS library instead of this function.
See Also
# Add to mkdocs.yml nav:
- Developer Zone:
- File Parsing Reference:
- extractTextFromExcel: dev/extract_text_from_excel.md