extractPANsWithDetails()

Overview

Scans a plain text string for valid Indian PAN numbers and returns each match with its decoded entity type and holder initial. This is the primary function for PAN detection. Two lower-level helpers — extractPANNumbers() and isValidPAN() — are also documented on this page.

PAN format: AAAAA9999A (5 uppercase letters, 4 digits, 1 uppercase letter). The 4th character encodes the entity type (Individual, Company, HUF, etc.) and is strictly validated against a whitelist of entity codes. These codes are defined by the Income Tax Department, which issues PANs, not by the RBI.


Function Signature

function extractPANsWithDetails(string $text): array

Parameters

| Parameter | Type | Required | Description |
|---|---|---|---|
| $text | string | Yes | Plain text to scan. Can be any length. Typically the output of extractTextFromFile() or any of the individual parser functions. Case-insensitive: the function internally uppercases the input before matching. |

Return Value

Returns an array of associative arrays, one per unique PAN found:

[
    [
        'pan'            => 'ABCPK1234F',   // The matched PAN number (uppercase)
        'entity_type'    => 'Individual (Person)',  // Decoded from 4th character
        'holder_initial' => 'K',            // 5th character — first letter of name/surname
    ],
    // ...
]

Returns an empty array [] if no valid PAN numbers are found.
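
Because every row carries the same three keys, the result works directly with PHP's built-in array functions. The sketch below uses a hard-coded $results array in the documented shape as a stand-in for a real call, then extracts the PAN column and groups PANs by entity type. The variable names are illustrative only.

```php
<?php
// Sample data in the documented return shape; stands in for
// the result of extractPANsWithDetails($text).
$results = [
    ['pan' => 'ABCPK1234F', 'entity_type' => 'Individual (Person)', 'holder_initial' => 'K'],
    ['pan' => 'AAACR5055K', 'entity_type' => 'Company',             'holder_initial' => 'R'],
];

// Flat list of PAN strings.
$pans = array_column($results, 'pan');

// Group PANs by their decoded entity type.
$byType = [];
foreach ($results as $item) {
    $byType[$item['entity_type']][] = $item['pan'];
}

print_r($pans);
print_r($byType);
```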


Entity Type Reference

The 4th character of a PAN encodes the holder's entity type:

| Character | Entity Type |
|---|---|
| P | Individual (Person) |
| C | Company |
| H | Hindu Undivided Family (HUF) |
| F | Firm / LLP |
| A | Association of Persons (AOP) |
| B | Body of Individuals (BOI) |
| G | Government |
| J | Artificial Juridical Person |
| L | Local Authority |
| T | Trust |
| E | Limited Liability Partnership |
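
One way to picture the decoding step is a lookup keyed on the 4th character. The following is a sketch built from the table above, not the library's own code; the constant PAN_ENTITY_TYPES and the function decodePanEntityType() are hypothetical names introduced for illustration.

```php
<?php
// Hypothetical lookup built from the reference table above.
const PAN_ENTITY_TYPES = [
    'P' => 'Individual (Person)',
    'C' => 'Company',
    'H' => 'Hindu Undivided Family (HUF)',
    'F' => 'Firm / LLP',
    'A' => 'Association of Persons (AOP)',
    'B' => 'Body of Individuals (BOI)',
    'G' => 'Government',
    'J' => 'Artificial Juridical Person',
    'L' => 'Local Authority',
    'T' => 'Trust',
    'E' => 'Limited Liability Partnership',
];

// Hypothetical helper: returns the entity type for a PAN string,
// or null when the 4th character is not in the table.
function decodePanEntityType(string $pan): ?string
{
    $code = strtoupper(substr($pan, 3, 1)); // 4th character, zero-based index 3
    return PAN_ENTITY_TYPES[$code] ?? null;
}

var_dump(decodePanEntityType('ABCPK1234F'));
var_dump(decodePanEntityType('ABCZK1234F'));
```

This helper only reads one character; it does not check the rest of the PAN format, so it is best paired with a full format check.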

Example

<?php
require_once '/var/www/html/vendor/autoload.php';

$text = "
    Customer: Ramesh Kumar, PAN: ABCPK1234F
    Vendor: Kanika Exports Pvt Ltd, PAN: AAACR5055K
    Invalid: XYZ123 or ABCD123E (not a PAN)
";

$results = extractPANsWithDetails($text);

foreach ($results as $item) {
    echo "PAN           : {$item['pan']}\n";
    echo "Entity Type   : {$item['entity_type']}\n";
    echo "Holder Initial: {$item['holder_initial']}\n";
    echo "---\n";
}

Output:

PAN           : ABCPK1234F
Entity Type   : Individual (Person)
Holder Initial: K
---
PAN           : AAACR5055K
Entity Type   : Company
Holder Initial: R
---

Full Pipeline Example

// Scan any uploaded file for PAN numbers in one call
$text    = extractTextFromFile('/var/www/uploads/party_master.xlsx');
$results = extractPANsWithDetails($text);

if (empty($results)) {
    echo "No PAN numbers found in the document.\n";
} else {
    echo count($results) . " PAN(s) found:\n";
    foreach ($results as $item) {
        echo "  {$item['pan']} — {$item['entity_type']}\n";
    }
}

Helper: extractPANNumbers()

Lower-level function that returns only the matched PAN strings, without decoding entity details. Use it when you only need the raw PAN list.

Signature

function extractPANNumbers(string $text): array

Parameters

| Parameter | Type | Required | Description |
|---|---|---|---|
| $text | string | Yes | Plain text to scan. Internally uppercased before matching. |

Return Value

An array of unique PAN strings (e.g. ['ABCPK1234F', 'AAACR5055K']). Returns an empty array if none are found.

Example

$pans = extractPANNumbers("PAN: ABCPK1234F and AAACR5055K");
// ['ABCPK1234F', 'AAACR5055K']

Helper: isValidPAN()

Validates a single PAN string against the full format rule. Use for validating user-entered PAN fields in forms or API inputs.

Signature

function isValidPAN(string $pan): bool

Parameters

| Parameter | Type | Required | Description |
|---|---|---|---|
| $pan | string | Yes | A single PAN string to validate. Leading/trailing whitespace is trimmed. Case-insensitive. |

Return Value

true if the string is a valid PAN format, false otherwise.

Example

isValidPAN('ABCPK1234F');  // true
isValidPAN('ABCD1234F');   // false — only 9 chars
isValidPAN('abcpk1234f');  // true — lowercased internally
isValidPAN('1BCPK1234F');  // false — first char must be a letter
isValidPAN('ABCZK1234F');  // false — 4th char Z is not a valid entity code
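
For form or API input, the check typically sits in front of any storage or lookup. The sketch below is self-contained, so it defines a hypothetical stand-in, sketchIsValidPan(), that mirrors the behaviour documented above (trim, uppercase, whole-string match). It is an assumption about how such a check could be written, not the library's implementation.

```php
<?php
// Hypothetical stand-in mirroring the documented behaviour of isValidPAN():
// trim, uppercase, then require the whole string to match the PAN pattern.
function sketchIsValidPan(string $pan): bool
{
    $candidate = strtoupper(trim($pan));
    // ^...$ with the D modifier anchors the match to the entire string.
    return preg_match('/^[A-Z]{3}[ABCFGHLJPTE][A-Z]\d{4}[A-Z]$/D', $candidate) === 1;
}

// Typical use: reject a submitted value before it reaches storage.
$input = '  abcpk1234f ';
if (sketchIsValidPan($input)) {
    echo "Accepted: " . strtoupper(trim($input)) . "\n";
} else {
    echo "Please enter a valid 10-character PAN.\n";
}
```

Normalising the value (trim and uppercase) before storing it keeps later comparisons simple.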

Regex Pattern Reference

The underlying regex used by extractPANNumbers():

/\b([A-Z]{3}[ABCFGHLJPTE][A-Z]\d{4}[A-Z])\b/

| Segment | Pattern | Meaning |
|---|---|---|
| Characters 1–3 | [A-Z]{3} | Alphabetic series (AAA to ZZZ) |
| Character 4 | [ABCFGHLJPTE] | Entity type (strict whitelist; rejects invalid codes) |
| Character 5 | [A-Z] | First letter of surname (individuals) or entity name |
| Characters 6–9 | \d{4} | Sequential number |
| Character 10 | [A-Z] | Alphabetic check character |
| Boundaries | \b...\b | Word boundaries; prevent matching inside longer strings |
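
To see the whitelist and the word boundaries at work, the pattern can be run directly with preg_match_all(). This is a minimal sketch using made-up sample text; it shows the regex in isolation and makes no claim about what else extractPANNumbers() does around it.

```php
<?php
// The pattern from this section, applied to uppercased input.
$pattern = '/\b([A-Z]{3}[ABCFGHLJPTE][A-Z]\d{4}[A-Z])\b/';

// Sample text: one valid PAN, one PAN-like run embedded in a longer token,
// and one candidate whose 4th character (Z) is not a valid entity code.
$text = strtoupper('PAN: abcpk1234f, ref XABCPK1234F9, invalid ABCZK1234F');

preg_match_all($pattern, $text, $matches);

// $matches[1] holds the first capture group for every match.
print_r($matches[1]);
```

Only ABCPK1234F is captured: XABCPK1234F9 fails because there is no word boundary between X and A (or between F and 9), and ABCZK1234F fails the 4th-character whitelist.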

Notes & Caveats

!!! tip "Duplicate Handling"
    extractPANNumbers() uses array_unique() internally, so each PAN appears only once in the result, even if it appears multiple times in the source text.
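
One detail of array_unique() is worth knowing when handling such lists yourself: it keeps the keys of the first occurrences, so the result can have gaps in its numeric keys. Whether the library reindexes its return value is not stated on this page; the sketch below, using sample data, only demonstrates the PHP behaviour and the usual array_values() follow-up.

```php
<?php
// Sample list with one duplicate at index 2.
$found = ['ABCPK1234F', 'AAACR5055K', 'ABCPK1234F', 'AAATH1234L'];

// array_unique() drops later duplicates but preserves original keys,
// leaving keys 0, 1 and 3.
$unique = array_unique($found);

// array_values() restores sequential keys 0, 1, 2.
$reindexed = array_values($unique);

print_r($unique);
print_r($reindexed);
```

Sequential keys matter, for example, when the list is passed to json_encode(), which emits a JSON object rather than an array when keys are not sequential from 0.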

!!! warning "OCR Quality"
    If text was extracted from a scanned PDF via OCR, characters like O/0 or I/1 may be misread, causing a real PAN to fail the regex. Validate results against a known master list where possible.
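
A master-list check can be as simple as a pair of set operations. In this sketch both arrays are sample data: $extracted stands in for PANs returned from OCR'd text, and $master is a hypothetical list loaded from your own records.

```php
<?php
// PANs as they might come back from OCR'd text (sample data).
$extracted = ['ABCPK1234F', 'AAACR5055K'];

// Hypothetical master list, e.g. loaded from your own party records.
$master = ['ABCPK1234F', 'AAATH1234L'];

// Present in both lists.
$confirmed = array_values(array_intersect($extracted, $master));

// Extracted from the document but unknown to the master list.
$unverified = array_values(array_diff($extracted, $master));

// Expected from the master list but not found in the document.
$missing = array_values(array_diff($master, $extracted));

print_r(compact('confirmed', 'unverified', 'missing'));
```

Entries in $missing are good candidates for a manual look at the scan, since an OCR misread would make a real PAN fail the pattern and drop out of the extracted list.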

!!! note "Not a Legal Validator"
    This function validates PAN format only. It does not verify the PAN against the Income Tax Department database. For authoritative verification, use the official ITD API or an NSDL-authorised verification service.


See Also


# Add to mkdocs.yml nav:
- Developer Zone:
  - File Parsing Reference:
    - extractPANsWithDetails: dev/extract_pans_with_details.md