A million rows used to be a database problem. Now it is a Tuesday afternoon. Transaction logs, ad impressions, sensor readings, user event streams — the files that land on analysts' desks routinely contain millions of rows. And the tools most people reach for first — Excel, Google Sheets — were never designed for this scale.
If you have tried opening a million-row CSV in Excel, you know the outcome: it either truncates your data at row 1,048,576 (dropping the rest with little more than a warning), freezes for several minutes, or crashes entirely. Google Sheets gives up even earlier: its 10-million-cell cap translates to around 500,000 rows for a typical 20-column dataset.
This guide covers practical strategies for actually working with million-row data, from quick command-line techniques to tools that handle the scale natively.
Why traditional spreadsheets fail at this scale
The spreadsheet was invented in 1979 for financial modeling. VisiCalc had 254 rows. Even today, the fundamental architecture has not changed much: load every cell into memory, maintain formatting for each one, keep a formula dependency graph, and render the entire grid so users can scroll through it.
This approach works brilliantly for the 10,000-row budget or the 50,000-row sales report. It breaks when you add another zero or two.
| Row Count | Excel Behavior | Recommended Approach |
|---|---|---|
| Under 100K | Works fine | Any spreadsheet |
| 100K – 500K | Sluggish with formulas | Excel with manual calc, or Viztab |
| 500K – 1M | Frequent freezes | Viztab, Python, or DuckDB |
| 1M – 10M | Exceeds row limit | Viztab, pandas, DuckDB, database |
| Over 10M | Cannot open | Database, Polars, Spark |
The problem is not just the row limit. Even if your data fits under 1,048,576 rows, Excel loads every cell's formatting, metadata, and formula dependencies into RAM. A million rows with 20 columns means 20 million cell objects. With Excel's per-cell overhead, that is 2–4 GB of memory before you do anything.
Strategy 1: Filter before you load
The most effective strategy is also the simplest: do not load all the data. Most analysis questions do not require every row. If you need last quarter's Northeast sales, extract only those rows before opening anything visual.
Command-line pre-filtering
These commands stream through the file line by line. They use almost no memory and finish in seconds, even on multi-gigabyte files. They are pre-installed on macOS and Linux, and available on Windows through WSL.
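As a minimal sketch (the file name and the `date,region,amount` column layout are hypothetical), here is how `head`, `grep`, and `awk` extract a subset before anything heavier touches the file:

```shell
# Small stand-in for a multi-gigabyte export (hypothetical schema: date,region,amount)
printf 'date,region,amount\n2024-01-05,Northeast,100\n2024-02-10,West,200\n2024-03-15,Northeast,300\n' > sales.csv

# Peek at the first rows without loading anything
head -n 5 sales.csv

# Keep the header, then append only matching rows
head -n 1 sales.csv > northeast.csv
grep ',Northeast,' sales.csv >> northeast.csv

# awk filters on an exact column value instead of a raw text match
awk -F',' 'NR==1 || $2=="Northeast"' sales.csv > northeast_awk.csv

# How many rows matched?
grep -c ',Northeast,' sales.csv
```

The `grep` version is faster but can match the text anywhere on the line; the `awk` version checks the second field exactly, which avoids false positives when the same word appears in another column.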
Python pre-filtering with chunks
If your filtering logic is more complex than a simple text match, pandas can read the file in manageable chunks:
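A minimal sketch of chunked filtering (the file name and `region`/`amount` columns are made up for the demo; in practice you would point `read_csv` at your real file):

```python
import pandas as pd

# Build a small demo file standing in for a multi-gigabyte CSV
pd.DataFrame({
    "region": ["Northeast", "West", "Northeast", "South"],
    "amount": [100, 200, 300, 400],
}).to_csv("sales.csv", index=False)

# Stream the file in fixed-size chunks so only one chunk is in memory at a time
# (use a chunksize like 100_000 for real files; 2 keeps the demo visible)
filtered_chunks = []
for chunk in pd.read_csv("sales.csv", chunksize=2):
    # Any pandas logic works here, however complex
    filtered_chunks.append(chunk[chunk["region"] == "Northeast"])

northeast = pd.concat(filtered_chunks, ignore_index=True)
northeast.to_csv("northeast.csv", index=False)
print(len(northeast))  # 2
```

Peak memory is bounded by the chunk size rather than the file size, so this works on files far larger than RAM.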
Power Query pre-filtering (Excel)
If you must stay in Excel, use Power Query instead of opening the file directly. Go to Data → Get Data → From Text/CSV, then use the query editor to filter rows and remove columns before loading. Power Query streams the file so Excel never has to hold the full dataset in memory.
Strategy 2: Sample first, then scale
When you are exploring an unfamiliar dataset, you do not need a million rows to understand its structure. A random 10,000-row sample tells you about columns, data types, value distributions, and quality issues. Build your analysis logic on the sample, then apply it to the full dataset.
This technique is especially valuable for data validation. If your sample reveals encoding issues, malformed rows, or unexpected values, you have saved yourself the time of loading the full file only to discover problems.
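One way to sample without knowing the row count in advance is reservoir sampling: a single pass, constant memory, and every row has an equal chance of selection. This is a sketch with a made-up `events.csv` file; only the standard library is used:

```python
import random

def sample_rows(path, k, seed=42):
    """Return the header and k uniformly sampled data lines in one pass."""
    rng = random.Random(seed)
    reservoir = []
    with open(path) as f:
        header = next(f)
        for i, line in enumerate(f):
            if i < k:
                reservoir.append(line)          # fill the reservoir first
            else:
                j = rng.randint(0, i)           # replace with decreasing probability
                if j < k:
                    reservoir[j] = line
    return header, reservoir

# Demo: sample 100 rows from a 10,000-row file
with open("events.csv", "w") as f:
    f.write("id,value\n")
    f.writelines(f"{i},{i % 7}\n" for i in range(10_000))

header, sample = sample_rows("events.csv", 100)
print(header.strip(), len(sample))
```

Because the file is read line by line, memory use stays at roughly `k` lines no matter how large the input grows.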
Strategy 3: Aggregate instead of browsing
Most questions about million-row datasets are aggregate questions: total revenue by region, average response time by endpoint, count of events by category. You need a summary, not every individual row.
DuckDB: SQL directly on CSV files
DuckDB is an embedded analytics database that queries CSV and Parquet files in place — no import step, no server, no setup beyond installing it:
This query runs in seconds on a million-row file. DuckDB uses columnar processing and does not load the full file into memory, so it works on files much larger than your available RAM.
Python pivot tables
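With pandas, `pivot_table` collapses millions of rows into a summary grid in one call. A sketch with invented `region`/`quarter`/`revenue` columns:

```python
import pandas as pd

# Demo data standing in for a million-row file (hypothetical columns)
df = pd.DataFrame({
    "region":  ["Northeast", "West", "Northeast", "West"],
    "quarter": ["Q1", "Q1", "Q2", "Q2"],
    "revenue": [100, 200, 300, 400],
})

# Aggregate into a compact summary instead of browsing raw rows
summary = df.pivot_table(index="region", columns="quarter",
                         values="revenue", aggfunc="sum")
print(summary)
```

The result has one row per region and one column per quarter, regardless of how many input rows fed it.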
Polars is worth noting here. For files over 1 GB, Polars is significantly faster than pandas because it uses lazy evaluation and processes data in parallel. The API is slightly different but the concepts are the same.
Strategy 4: Use a spreadsheet built for the scale
Sometimes you need the spreadsheet experience. You want to scroll through rows, eyeball the data, spot anomalies visually, and click to sort and filter. For this, you need a tool that was designed from the ground up for large datasets.
Viztab uses a fundamentally different architecture from Excel's. Instead of loading every cell into memory, it indexes the file on import and renders only the rows currently visible on your screen. The result is a spreadsheet that stays responsive with millions of rows.
Import your data
Open viztab.com/app and drag your CSV, TSV, or XLSX file. Multi-million-row files load in seconds.
Explore interactively
Sort by any column, filter by value or range, search across all data. Results are instant because operations run against the index rather than scanning every row.
Analyze and export
Use any of 370+ formulas on the full dataset. Export your results or filtered subsets as CSV or XLSX.
Key differences from Excel at the million-row scale:
- No row limit. Viztab handles millions of rows without truncation.
- Instant filtering. Filters apply against an index. No waiting for a million-row scan.
- Low memory usage. Only visible rows are rendered. The rest are indexed on disk, not held in RAM.
- Local processing. Your data never leaves your browser. No upload, no server, no account needed.
Strategy 5: Know when to use a database
If you query the same large dataset repeatedly, join it with other tables, or need multiple people to access it, set up a proper database. PostgreSQL, MySQL, and even SQLite handle millions of rows efficiently and give you the full power of SQL with proper indexing.
The trade-off is setup time. Creating a schema, importing data, and writing queries takes longer than dragging a file into a spreadsheet. For one-off analysis, it is usually not worth it. For anything recurring, it pays for itself on the second run.
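For a single analyst, SQLite is the lowest-friction starting point: it ships with Python, needs no server, and a one-time index makes repeated filters cheap. A sketch (the `sales.csv` file and its columns are invented for the demo):

```python
import csv
import sqlite3

# Demo CSV standing in for a real export
with open("sales.csv", "w", newline="") as f:
    w = csv.writer(f)
    w.writerow(["region", "amount"])
    w.writerows([["Northeast", 100], ["West", 200], ["Northeast", 300]])

con = sqlite3.connect("sales.db")
con.execute("CREATE TABLE IF NOT EXISTS sales (region TEXT, amount INTEGER)")

# One-time import; every later query reuses the table
with open("sales.csv") as f:
    rows = list(csv.reader(f))[1:]          # skip header
    con.executemany("INSERT INTO sales VALUES (?, ?)", rows)

# Index the column you filter on most
con.execute("CREATE INDEX IF NOT EXISTS idx_region ON sales (region)")
con.commit()

total = con.execute(
    "SELECT SUM(amount) FROM sales WHERE region = ?", ("Northeast",)
).fetchone()[0]
print(total)  # 400
```

The import cost is paid once; from the second query onward, the indexed table is far faster than re-scanning the CSV.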
Performance tips that apply everywhere
- Use Parquet instead of CSV when possible. Parquet is columnar and compressed. A 1 GB CSV becomes ~150 MB as Parquet, and queries that only need a few columns skip reading the rest entirely. Most modern tools (pandas, Polars, DuckDB, Viztab) read Parquet natively.
- Drop columns you do not need. If your file has 50 columns and you only care about 5, select only those during import. Less data means faster everything.
- Sort on your most-filtered column. If you always filter by date, pre-sort by date. Some tools can skip scanning irrelevant sections of sorted data.
- Compress for storage, decompress for work. CSV files compress at roughly 10:1 with gzip. Keep the compressed copy as your archive.
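The compress-for-storage tip needs nothing beyond the standard library. A sketch with a generated demo file (file names are made up):

```python
import gzip
import os
import shutil

# Write a repetitive demo CSV -- exactly the kind of text gzip shrinks well
with open("data.csv", "w") as f:
    f.write("id,value\n")
    f.writelines(f"{i},{i * i}\n" for i in range(10_000))

# Archive: stream the file through gzip without holding it all in memory
with open("data.csv", "rb") as src, gzip.open("data.csv.gz", "wb") as dst:
    shutil.copyfileobj(src, dst)

# Work: decompress back to a plain file when you need it
with gzip.open("data.csv.gz", "rb") as src, open("restored.csv", "wb") as dst:
    shutil.copyfileobj(src, dst)

print(os.path.getsize("data.csv"), os.path.getsize("data.csv.gz"))
```

The exact ratio depends on the data; highly repetitive CSV text is where ratios around 10:1 come from.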
Frequently asked questions
How many rows can Excel handle?
Technically, Excel supports up to 1,048,576 rows per worksheet. However, performance degrades significantly beyond 500,000 rows, especially with formulas. Sorting, filtering, and scrolling become sluggish, and any recalculation can freeze the application for extended periods. For data at the million-row mark, a purpose-built tool will be more reliable.
What is the best tool for analyzing a million rows?
For interactive exploration with a visual interface, use Viztab or query the file directly with DuckDB. For statistical analysis, Python with pandas or Polars is the standard. For recurring reports, load the data into a database like PostgreSQL and use SQL. The best choice depends on whether you need a visual interface, programmatic access, or both.
How do I filter a million rows quickly?
In Excel, filtering a million rows is slow because it evaluates every row against your criteria in the UI thread. Tools designed for large data like Viztab or DuckDB index the data or use columnar storage, so filters return results in milliseconds even on millions of rows. If you must use Excel, use Power Query to pre-filter during import.
When should I use a database instead of a spreadsheet?
If you query the same large dataset repeatedly, join it with other tables, or share it across a team, a database is the right choice. But for ad hoc exploration of a file you just received, a spreadsheet interface is faster to start with. Tools like Viztab give you the visual spreadsheet experience with database-level performance for large files.
Your data is big. Your tools should keep up.
Viztab handles million-row spreadsheets the way Excel handles a thousand. No limits, no lag, no upload.
Open Viztab