
Node.js Streams: Processing Gigabytes Without Crashing

How to use Node.js readable and transform streams to process multi-gigabyte files, large API responses, and real-time data pipelines with constant memory usage.

Dec 5, 2024

The classic mistake when processing a 2GB CSV in Node.js is loading it all into memory with fs.readFileSync. Your process hits 2GB+ RAM, slows to a crawl, and crashes. Streams let you process the same file with a flat ~30MB memory footprint, regardless of file size.

The Stream Mental Model

Think of a stream as a pipe. Data flows through in chunks: you transform each chunk as it passes, then hand it on. You never hold the whole dataset in memory at once. Node.js has four stream types: Readable, Writable, Duplex (both readable and writable), and Transform (a Duplex that modifies data as it passes through).

javascript
import { createReadStream, createWriteStream } from 'fs';
import { Transform } from 'stream';
import { pipeline } from 'stream/promises';
import { createGzip } from 'zlib';

const upperCase = new Transform({
  transform(chunk, encoding, callback) {
    // NOTE: assumes a chunk never splits a multi-byte UTF-8 character;
    // for arbitrary text, decode with string_decoder instead of toString()
    callback(null, chunk.toString().toUpperCase());
  }
});

// Process a 2GB file with ~30MB RAM
await pipeline(
  createReadStream('input.csv'),
  upperCase,
  createGzip(),
  createWriteStream('output.csv.gz')
);
💡 Tip

Always use pipeline() from stream/promises instead of .pipe(). It propagates errors from every stream in the chain and destroys them all on failure. .pipe() forwards data but not errors, so a failure mid-chain can leave the other streams open and leak file descriptors.

Backpressure: The Hidden Footgun

Backpressure happens when the readable side produces data faster than the writable side can flush it: the writable's internal buffer grows without bound and you're back to the OOM problem you were trying to avoid. Both pipeline() and pipe() handle backpressure automatically. It's only when you wire streams together with manual 'data' listeners that you must check the return value of .write(), pause the readable when it returns false, and resume once the writable emits 'drain'.

Real-World Pattern: Streaming JSON from a Database

Most ORMs return the full result set at once. For large exports, use cursor-based streaming directly from your database driver. PostgreSQL's pg library supports row-by-row streaming via the separate pg-query-stream package, letting you pipe millions of rows to an HTTP response or a file without buffering the result set.

javascript
import QueryStream from 'pg-query-stream';
import JSONStream from 'JSONStream';
import { pipeline } from 'stream/promises';

app.get('/export', async (req, res) => {
  const client = await pool.connect();
  try {
    const query = new QueryStream('SELECT * FROM orders WHERE year = $1', [2024]);

    res.setHeader('Content-Type', 'application/json');
    // pipeline() propagates errors from any stream and destroys the rest
    await pipeline(client.query(query), JSONStream.stringify(), res);
  } catch (err) {
    // If streaming already started, the connection is simply torn down
    if (!res.headersSent) res.status(500).end();
  } finally {
    client.release(); // return the client to the pool, success or failure
  }
});
