


A Practical Guide to Parsing Fixed-Width Data with PHP String Slicing
Jul 26, 2025 09:50 AM

Parsing fixed-width data comes down to slicing by position with substr(), stripping padding with trim(), and driving both from a field map. 1. Define each field's start position and length, or define only the widths and let the program compute the start positions; 2. Use substr($line, $start, $length) to extract each field, omitting the length to get the rest of the line; 3. Apply trim() to each result to remove the padding spaces; 4. Build a reusable parsing function from a loop and a schema array; 5. Handle edge cases: pad lines that are too short, skip empty lines, set defaults for missing values, and validate types; 6. Read small files with file(), and stream large files line by line with fopen() to avoid exhausting memory. The method needs no external libraries, is efficient, and is easy to maintain.

Parsing fixed-width data is a common task when dealing with legacy systems, government reports, or flat-file data exports where fields aren't separated by delimiters like commas or tabs. Instead, each field occupies a predefined number of characters in a line. PHP's string slicing capabilities, particularly substr(), make this task straightforward and reliable.

Here's how to do it right.
Understanding Fixed-Width Format
In a fixed-width file, each record (line) is structured so that specific columns represent specific fields. For example:

    John    Doe     35Engineer
    Jane    Smith   28Designer

Here, the format might be:
- First name: 8 characters
- Last name: 8 characters
- Age: 2 characters
- Job: the rest
Even if a value is shorter (like "John"), it's padded with spaces to fit the column width. This consistency allows us to extract data by position, not content.
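
To make the padding concrete, here is a minimal sketch that builds one of the sample records with str_pad(). The helper name formatRecord() is hypothetical and only mirrors the layout described above (8, 8, 2, then the rest); note that str_pad() does not truncate values longer than the column width.

```php
<?php
// Hypothetical helper: builds one fixed-width record using the article's
// layout (first name 8, last name 8, age 2, job = rest of line).
function formatRecord(string $first, string $last, int $age, string $job): string
{
    return str_pad($first, 8)                            // pad right with spaces to 8 chars
         . str_pad($last, 8)                             // pad right with spaces to 8 chars
         . str_pad((string) $age, 2, ' ', STR_PAD_LEFT)  // right-align age in 2 chars
         . $job;                                         // last field is not padded
}

$line = formatRecord('John', 'Doe', 35, 'Engineer');
echo $line, PHP_EOL;          // John    Doe     35Engineer
echo strlen($line), PHP_EOL;  // 26
```

Counting the characters this way is a quick check that a layout description matches the actual data before writing the parser.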

Using substr() to Slice Fields
PHP's substr($string, $start, $length) function is perfect for extracting parts of a string by position.

    $line = "John    Doe     35Engineer";

    $firstName = trim(substr($line, 0, 8));   // "John"
    $lastName  = trim(substr($line, 8, 8));   // "Doe"
    $age       = trim(substr($line, 16, 2));  // "35"
    $job       = trim(substr($line, 18));     // "Engineer" (omit length to get the rest)

Key points:
- substr() uses zero-based indexing.
- Always trim() the result to remove padding spaces.
- If you omit the third argument, substr() returns everything from the start position to the end.
This approach is fast, readable, and doesn't require external libraries.
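
The following sketch illustrates the key points on the sample record, including what a slice looks like before trim() and what happens when the start offset lies past the end of the line. The out-of-range behavior shown assumes PHP 8.0 or later; older versions returned false in that case, so the check accepts both.

```php
<?php
$line = 'John    Doe     35Engineer';

// Without trim() the padding is still part of the slice.
var_dump(substr($line, 0, 8));        // string(8) "John    "
var_dump(trim(substr($line, 0, 8)));  // string(4) "John"

// Omitting the length returns everything from offset 18 onward.
var_dump(substr($line, 18));          // string(8) "Engineer"

// A start offset past the end yields an empty string on PHP 8.0+
// (false on older versions), so test for both when portability matters.
$missing = substr($line, 40, 5);
var_dump($missing === '' || $missing === false);  // bool(true)
```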
Defining a Field Map for Reusability
Hardcoding positions gets messy with many fields. Define a schema:

    $schema = [
        ['name' => 'first_name', 'start' => 0,  'length' => 8],
        ['name' => 'last_name',  'start' => 8,  'length' => 8],
        ['name' => 'age',        'start' => 16, 'length' => 2],
        ['name' => 'job',        'start' => 18, 'length' => 0], // 0 = rest of line
    ];

Now parse any line using a loop:

    function parseFixedWidth($line, $schema) {
        $record = [];
        foreach ($schema as $field) {
            // A length of 0 becomes null, meaning "rest of line"
            $value = substr($line, $field['start'], $field['length'] ?: null);
            $record[$field['name']] = trim($value);
        }
        return $record;
    }

    $line = "John    Doe     35Engineer";
    $data = parseFixedWidth($line, $schema);
    // Result: ['first_name' => 'John', 'last_name' => 'Doe', ...]

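Using ?: null turns a length of 0 into null, which lets the last field capture the remainder of the line. This relies on PHP 8.0 or later, where a null length means "to the end of the string"; older versions treat a null length as 0 and return an empty string.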
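
As an alternative sketch, the variant below avoids passing null to substr() altogether by branching on the length, which sidesteps version differences in how a null length is treated. The name parseFixedWidthPortable() is hypothetical; the schema is the one defined above, repeated so the example runs on its own.

```php
<?php
// Hypothetical variant of parseFixedWidth(): a length of 0 means "rest of line".
function parseFixedWidthPortable(string $line, array $schema): array
{
    $record = [];
    foreach ($schema as $field) {
        $value = $field['length'] > 0
            ? substr($line, $field['start'], $field['length'])
            : substr($line, $field['start']);   // no length argument at all
        // substr() can return false on older PHP versions; cast before trimming.
        $record[$field['name']] = trim((string) $value);
    }
    return $record;
}

$schema = [
    ['name' => 'first_name', 'start' => 0,  'length' => 8],
    ['name' => 'last_name',  'start' => 8,  'length' => 8],
    ['name' => 'age',        'start' => 16, 'length' => 2],
    ['name' => 'job',        'start' => 18, 'length' => 0], // 0 = rest of line
];

$rows = [
    'John    Doe     35Engineer',
    'Jane    Smith   28Designer',
];

foreach ($rows as $row) {
    print_r(parseFixedWidthPortable($row, $schema));
}
```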
Handling Edge Cases
Real-world data isn't always perfect. Here's how to stay safe:
Check line length before slicing:

    if (strlen($line) < 18) {
        // Handle short lines: log, skip, or pad
        $line = str_pad($line, 18, ' ');
    }

Use default values for missing or empty fields:

    $record['age'] = trim($age) ?: null; // note: ?: also turns "0" into null

Validate numeric fields:

    $age = trim(substr($line, 16, 2));
    $record['age'] = is_numeric($age) ? (int)$age : null;

Skip empty lines:

    if (trim($line) === '') continue;

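
The checks above can be combined into a single function. This is a minimal sketch for the four-field sample layout; the name parsePersonLine() is hypothetical, and it uses ctype_digit() rather than is_numeric() on the assumption that an age column should contain only digits.

```php
<?php
// Hypothetical helper combining the edge-case checks for the sample layout.
// Returns null for blank lines so the caller can skip them.
function parsePersonLine(string $line): ?array
{
    $line = rtrim($line, "\r\n");      // drop the line ending only
    if (trim($line) === '') {
        return null;                   // blank line
    }
    $line = str_pad($line, 18);        // make sure the fixed-width part can be sliced

    $age = trim(substr($line, 16, 2));
    $job = trim((string) substr($line, 18));

    return [
        'first_name' => trim(substr($line, 0, 8)),
        'last_name'  => trim(substr($line, 8, 8)),
        'age'        => ctype_digit($age) ? (int) $age : null,  // digits only, else null
        'job'        => $job !== '' ? $job : null,              // default for a missing value
    ];
}

var_dump(parsePersonLine("   \n"));          // NULL
print_r(parsePersonLine('Jane    Smith'));   // short line: age and job are null
```

Comparing against '' instead of relying on ?: keeps a legitimate "0" value from being replaced by the default.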
Reading from a File
Most fixed-width data comes from .txt or .dat files. Use file() or fopen():

    $data = [];
    $lines = file('data.txt', FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
    foreach ($lines as $line) {
        $data[] = parseFixedWidth($line, $schema);
    }

For large files, use streaming to avoid memory issues:

    $data = [];
    $handle = fopen('data.txt', 'r');
    while (($line = fgets($handle)) !== false) {
        // Strip only the line ending; leading spaces are column padding
        $line = rtrim($line, "\r\n");
        if (trim($line) !== '') {
            $data[] = parseFixedWidth($line, $schema);
        }
    }
    fclose($handle);

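
One way to package the streaming loop is a generator, so callers can iterate over records without holding the whole file in memory. This is a sketch under assumptions: the name readRecords() is hypothetical, and the demo writes a temporary file so it can run on its own.

```php
<?php
// Hypothetical generator: yields one non-empty record line at a time.
function readRecords(string $path): Generator
{
    $handle = fopen($path, 'r');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $path");
    }
    try {
        while (($line = fgets($handle)) !== false) {
            $line = rtrim($line, "\r\n");   // keep leading spaces: they are column padding
            if (trim($line) !== '') {
                yield $line;
            }
        }
    } finally {
        fclose($handle);                    // runs even if the caller stops early
    }
}

// Demo with a temporary file so the sketch is self-contained.
$path = tempnam(sys_get_temp_dir(), 'fw');
file_put_contents($path, "John    Doe     35Engineer\n\nJane    Smith   28Designer\n");

$lines = iterator_to_array(readRecords($path), false);
unlink($path);

echo count($lines), PHP_EOL;   // 2
```

In real use each yielded line would be passed straight to the parsing function inside a foreach loop instead of being collected into an array.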
Bonus: Dynamic Schema with Offsets
Instead of manually counting positions, define widths and let PHP calculate starts:

    $fieldWidths = [
        ['name' => 'first_name', 'width' => 8],
        ['name' => 'last_name',  'width' => 8],
        ['name' => 'age',        'width' => 2],
        ['name' => 'job',        'width' => 0], // rest
    ];

    // Build schema with start positions
    $schema = [];
    $pos = 0;
    foreach ($fieldWidths as $field) {
        $schema[] = [
            'name'   => $field['name'],
            'start'  => $pos,
            'length' => $field['width'],
        ];
        if ($field['width'] > 0) {
            $pos += $field['width']; // advance by this field's width
        }
    }

Now you only specify widths — no counting columns on paper.
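
Wrapped in a function, the same idea can be checked quickly: each start offset is the running total of the widths before it. The name buildSchema() is hypothetical.

```php
<?php
// Hypothetical helper: converts a list of field widths into a schema with
// start offsets. A width of 0 marks the final "rest of line" field.
function buildSchema(array $fieldWidths): array
{
    $schema = [];
    $pos = 0;
    foreach ($fieldWidths as $field) {
        $schema[] = [
            'name'   => $field['name'],
            'start'  => $pos,
            'length' => $field['width'],
        ];
        $pos += $field['width'];   // accumulate: the next field starts where this one ends
    }
    return $schema;
}

$schema = buildSchema([
    ['name' => 'first_name', 'width' => 8],
    ['name' => 'last_name',  'width' => 8],
    ['name' => 'age',        'width' => 2],
    ['name' => 'job',        'width' => 0], // rest
]);

echo implode(',', array_column($schema, 'start')), PHP_EOL;  // 0,8,16,18
```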
Parsing fixed-width data in PHP doesn't need to be painful. With substr(), a clear schema, and a few defensive checks, you can turn rigid, space-padded lines into clean, usable arrays. Whether you're importing payroll data or processing old mainframe exports, this method is fast, reliable, and easy to maintain.
Basically: slice by position, trim the spaces, and map it cleanly. That's the core.