# extractPANsWithDetails()

## Overview
Scans a plain text string for valid Indian PAN numbers and returns each match with its decoded entity type and holder initial. This is the primary function for PAN detection. Two lower-level helpers — extractPANNumbers() and isValidPAN() — are also documented on this page.
PAN format: `AAAAA9999A` (5 uppercase letters, 4 digits, 1 uppercase letter). The 4th character encodes the entity type (Individual, Company, HUF, etc.) and is strictly validated against a fixed whitelist of the entity codes used by the Income Tax Department (see Entity Type Reference below).
## Function Signature

```php
function extractPANsWithDetails(string $text): array
```
## Parameters

| Parameter | Type | Required | Description |
|---|---|---|---|
| `$text` | `string` | Yes | Plain text to scan; can be any length. Typically the output of `extractTextFromFile()` or one of the individual parser functions. Case-insensitive: the function uppercases the input internally before matching. |
## Return Value
Returns an array of associative arrays, one per unique PAN found:
```php
[
    [
        'pan'            => 'ABCPK1234F',          // The matched PAN number (uppercase)
        'entity_type'    => 'Individual (Person)', // Decoded from the 4th character
        'holder_initial' => 'K',                   // 5th character: first letter of surname or entity name
    ],
    // ...
]
```
Returns an empty array [] if no valid PAN numbers are found.
## Entity Type Reference
The 4th character of a PAN encodes the holder's entity type:
| Character | Entity Type |
|---|---|
| `P` | Individual (Person) |
| `C` | Company |
| `H` | Hindu Undivided Family (HUF) |
| `F` | Firm / LLP |
| `A` | Association of Persons (AOP) |
| `B` | Body of Individuals (BOI) |
| `G` | Government |
| `J` | Artificial Juridical Person |
| `L` | Local Authority |
| `T` | Trust |
| `E` | Limited Liability Partnership |
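The table above maps directly onto a PHP lookup array. The following is a minimal sketch, not the library's source: `PAN_ENTITY_TYPES` and `describePAN()` are hypothetical names, and the helper assumes its input is already a valid, uppercase PAN. It returns one entry in the same shape that `extractPANsWithDetails()` documents.

```php
<?php
// Hypothetical lookup table mirroring the Entity Type Reference table.
const PAN_ENTITY_TYPES = [
    'P' => 'Individual (Person)',
    'C' => 'Company',
    'H' => 'Hindu Undivided Family (HUF)',
    'F' => 'Firm / LLP',
    'A' => 'Association of Persons (AOP)',
    'B' => 'Body of Individuals (BOI)',
    'G' => 'Government',
    'J' => 'Artificial Juridical Person',
    'L' => 'Local Authority',
    'T' => 'Trust',
    'E' => 'Limited Liability Partnership',
];

// Hypothetical helper: decode one already-validated, uppercase PAN into
// the associative-array shape documented for extractPANsWithDetails().
function describePAN(string $pan): array
{
    $entityCode = $pan[3]; // 4th character (zero-based index 3)

    return [
        'pan'            => $pan,
        'entity_type'    => PAN_ENTITY_TYPES[$entityCode] ?? 'Unknown',
        'holder_initial' => $pan[4], // 5th character
    ];
}

echo describePAN('ABCPK1234F')['entity_type'], "\n"; // Individual (Person)
```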
## Example

```php
<?php
require_once '/var/www/html/vendor/autoload.php';

$text = "
Customer: Ramesh Kumar, PAN: ABCPK1234F
Vendor: Kanika Exports Pvt Ltd, PAN: AAACR5055K
Invalid: XYZ123 or ABCD123E (not a PAN)
";

$results = extractPANsWithDetails($text);

foreach ($results as $item) {
    echo "PAN : {$item['pan']}\n";
    echo "Entity Type : {$item['entity_type']}\n";
    echo "Holder Initial: {$item['holder_initial']}\n";
    echo "---\n";
}
```
Output:
```text
PAN : ABCPK1234F
Entity Type : Individual (Person)
Holder Initial: K
---
PAN : AAACR5055K
Entity Type : Company
Holder Initial: R
---
```
## Full Pipeline Example

```php
// Scan any uploaded file for PAN numbers in one call
$text    = extractTextFromFile('/var/www/uploads/party_master.xlsx');
$results = extractPANsWithDetails($text);

if (empty($results)) {
    echo "No PAN numbers found in the document.\n";
} else {
    echo count($results) . " PAN(s) found:\n";
    foreach ($results as $item) {
        echo "  {$item['pan']} - {$item['entity_type']}\n";
    }
}
```
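Because each result carries an `entity_type` key, summarising a document by category needs no further parsing. A minimal sketch, using a hypothetical `countPANsByEntityType()` helper that relies only on the documented return shape (the sample array stands in for real `extractPANsWithDetails()` output):

```php
<?php
// Hypothetical helper: count PANs per entity type from an array in the
// shape returned by extractPANsWithDetails().
function countPANsByEntityType(array $results): array
{
    $counts = [];
    foreach ($results as $item) {
        $type = $item['entity_type'];
        $counts[$type] = ($counts[$type] ?? 0) + 1;
    }
    arsort($counts); // most frequent entity type first

    return $counts;
}

// Sample input in the documented return shape.
$results = [
    ['pan' => 'ABCPK1234F', 'entity_type' => 'Individual (Person)', 'holder_initial' => 'K'],
    ['pan' => 'AAACR5055K', 'entity_type' => 'Company', 'holder_initial' => 'R'],
];

foreach (countPANsByEntityType($results) as $type => $count) {
    echo "{$type}: {$count}\n";
}
```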
## Helper: extractPANNumbers()
Lower-level function that returns only the matched PAN strings, without decoding entity details. Use when you only need the raw PAN list.
### Signature

```php
function extractPANNumbers(string $text): array
```
### Parameters

| Parameter | Type | Required | Description |
|---|---|---|---|
| `$text` | `string` | Yes | Plain text to scan. Uppercased internally before matching. |
### Return Value

An `array` of unique PAN strings (e.g. `['ABCPK1234F', 'AAACR5055K']`), or an empty array if none are found.
### Example

```php
$pans = extractPANNumbers("PAN: ABCPK1234F and AAACR5055K");
// ['ABCPK1234F', 'AAACR5055K']
```
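A function with this documented behaviour can be approximated in a few lines with `preg_match_all()` and the pattern from Regex Pattern Reference. This is an illustrative sketch under a hypothetical name, not the library's actual implementation. The `array_values()` call is an assumption: `array_unique()` preserves original keys, so re-indexing is needed to get a zero-based list like the one shown above.

```php
<?php
// Illustrative reimplementation of extractPANNumbers() (hypothetical name).
function extractPANNumbersSketch(string $text): array
{
    // Uppercase first so lowercase PANs in the source still match.
    $upper = strtoupper($text);

    preg_match_all('/\b([A-Z]{3}[ABCFGHLJPTE][A-Z]\d{4}[A-Z])\b/', $upper, $matches);

    // $matches[1] holds capture group 1. array_unique() drops repeats but
    // keeps the original keys, so array_values() re-indexes from 0.
    return array_values(array_unique($matches[1]));
}

print_r(extractPANNumbersSketch('PAN: ABCPK1234F, abcpk1234f and AAACR5055K'));
```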
## Helper: isValidPAN()
Validates a single PAN string against the full format rule. Use for validating user-entered PAN fields in forms or API inputs.
### Signature

```php
function isValidPAN(string $pan): bool
```
### Parameters

| Parameter | Type | Required | Description |
|---|---|---|---|
| `$pan` | `string` | Yes | A single PAN string to validate. Leading and trailing whitespace is trimmed; matching is case-insensitive. |
### Return Value

`true` if the string is a valid PAN format, `false` otherwise.
### Example

```php
isValidPAN('ABCPK1234F'); // true
isValidPAN('ABCD1234F');  // false: only 9 characters
isValidPAN('abcpk1234f'); // true: matching is case-insensitive
isValidPAN('1BCPK1234F'); // false: first character must be a letter
isValidPAN('ABCZK1234F'); // false: 4th character Z is not a valid entity code
```
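The rules exercised above (trim, case-insensitive, whole-string format check) can be sketched as follows. `isValidPANSketch()` is a hypothetical stand-in, not the library's code; it anchors the pattern with `^` and `$` instead of the `\b` boundaries used for extraction, so any extra non-whitespace characters cause a rejection.

```php
<?php
// Illustrative reimplementation of isValidPAN() (hypothetical name).
function isValidPANSketch(string $pan): bool
{
    // Trim surrounding whitespace and normalise case, per the documented behaviour.
    $pan = strtoupper(trim($pan));

    // Anchored with ^...$ so the whole string must be exactly one PAN.
    return preg_match('/^[A-Z]{3}[ABCFGHLJPTE][A-Z]\d{4}[A-Z]$/', $pan) === 1;
}

var_dump(isValidPANSketch('  abcpk1234f ')); // bool(true)
var_dump(isValidPANSketch('ABCZK1234F'));    // bool(false)
```

In a form or API handler, a check like this would typically run on the raw input before the value is stored.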
## Regex Pattern Reference
The underlying regex used by extractPANNumbers():
```text
/\b([A-Z]{3}[ABCFGHLJPTE][A-Z]\d{4}[A-Z])\b/
```
| Segment | Pattern | Meaning |
|---|---|---|
| Characters 1–3 | `[A-Z]{3}` | Alphabetic series (runs from `AAA` to `ZZZ`) |
| Character 4 | `[ABCFGHLJPTE]` | Entity type (strict whitelist; rejects invalid codes) |
| Character 5 | `[A-Z]` | First letter of surname or entity name |
| Characters 6–9 | `\d{4}` | Sequential number (`0001` to `9999`) |
| Character 10 | `[A-Z]` | Alphabetic check character |
| Boundaries | `\b...\b` | Word boundaries; prevent matching inside longer strings |
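The segments in the table can be inspected directly by rewriting the pattern with named capture groups. The group names below (`series`, `entity`, `initial`, `sequence`, `check`) are illustrative additions, not part of the library's pattern; the character classes are unchanged. The last two calls show the effect of the `\b` boundaries.

```php
<?php
// Same character classes as the documented pattern, with hypothetical named groups.
$pattern = '/\b(?<series>[A-Z]{3})(?<entity>[ABCFGHLJPTE])(?<initial>[A-Z])'
         . '(?<sequence>\d{4})(?<check>[A-Z])\b/';

// Split a PAN into its documented segments.
if (preg_match($pattern, 'AAACR5055K', $m)) {
    echo "Series: {$m['series']}, Entity: {$m['entity']}, Initial: {$m['initial']}, "
       . "Sequence: {$m['sequence']}, Check: {$m['check']}\n";
}

// Word boundaries: a PAN glued to other letters or digits is not matched,
// but one delimited by punctuation or spaces is.
var_dump(preg_match($pattern, 'XABCPK1234F'));     // int(0)
var_dump(preg_match($pattern, 'PAN:ABCPK1234F.')); // int(1)
```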
## Notes & Caveats

!!! tip "Duplicate Handling"
    `extractPANNumbers()` uses `array_unique()` internally, so each PAN appears only once in the result, even if it occurs multiple times in the source text.

!!! warning "OCR Quality"
    If text was extracted from a scanned PDF via OCR, characters such as O/0 or I/1 may be misread, causing a real PAN to fail the regex. Where possible, validate results against a known master list.

!!! note "Not a Legal Validator"
    These functions validate PAN format only. They do not verify the PAN against the Income Tax Department database. For authoritative verification, use the official ITD API or an NSDL-authorised verification service.
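Where OCR errors are expected, one possible mitigation for the O/0 and I/1 confusions described under OCR Quality is to correct each character according to the class the format expects at that position, then re-check the result. The sketch below is a hypothetical heuristic (`normalizeOcrPAN()` is not part of the library) that only swaps those two pairs; a corrected value can still be wrong and should be confirmed against a master list.

```php
<?php
// Hypothetical heuristic: repair O/0 and I/1 OCR confusions in a
// 10-character PAN candidate by position, then re-check the format.
function normalizeOcrPAN(string $candidate): ?string
{
    $chars = str_split(strtoupper(trim($candidate)));
    if (count($chars) !== 10) {
        return null; // wrong length: not a PAN candidate
    }

    $toLetter = ['0' => 'O', '1' => 'I']; // digit read where a letter belongs
    $toDigit  = ['O' => '0', 'I' => '1']; // letter read where a digit belongs

    foreach ($chars as $i => $c) {
        $expectsDigit = ($i >= 5 && $i <= 8); // characters 6-9 are digits
        $map = $expectsDigit ? $toDigit : $toLetter;
        $chars[$i] = $map[$c] ?? $c;
    }

    $fixed = implode('', $chars);

    return preg_match('/^[A-Z]{3}[ABCFGHLJPTE][A-Z]\d{4}[A-Z]$/', $fixed) === 1 ? $fixed : null;
}

var_dump(normalizeOcrPAN('ABCPK12O4F')); // string(10) "ABCPK1204F"
```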
## See Also

```yaml
# Add to mkdocs.yml nav:
- Developer Zone:
    - File Parsing Reference:
        - extractPANsWithDetails: dev/extract_pans_with_details.md
```