How to filter and sort a 1 GB CSV file

April 12, 2026 · 8 min read

You need to find all the rows where the status is "failed" in a 1 GB CSV file. Or you need to sort 10 million transaction records by date. You open Excel, it stalls. You try Google Sheets, it rejects the upload. You are stuck with a perfectly good file that none of your usual tools can handle.

This is a common problem, and it has good solutions. The approach depends on what you need: a quick one-off filter, a repeatable analysis pipeline, or an interactive exploration of the data. This guide covers all three.

Why normal spreadsheet tools choke on large files

Before diving into solutions, it helps to understand the bottleneck. When you filter or sort data in Excel or Google Sheets, the application needs to:

  1. Load the entire file into memory. A 1 GB CSV becomes 3-8 GB in memory once parsed into the application's internal data structures.
  2. Build an index for the column you are filtering or sorting on.
  3. Re-render the grid with only the matching rows (filter) or in the new order (sort).

Excel caps out at 1,048,576 rows, so if your 1 GB file has more rows than that, it silently truncates. Google Sheets limits you to 10 million cells and rejects file uploads over about 50 MB. Even if the file fits within these limits, the memory overhead makes the application unresponsive.
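Before picking a tool, it is worth checking whether your file actually crosses these limits. A minimal Python sketch that counts rows by streaming the file, never loading it into memory (the transactions.csv path is a placeholder):

```python
def count_rows(path):
    """Count data rows in a CSV by streaming it line by line."""
    with open(path, newline='') as f:
        total = sum(1 for _ in f)
    return total - 1  # subtract the header line

# Example: print(count_rows('transactions.csv'))
```

On macOS or Linux, `wc -l transactions.csv` gives the same answer even faster. Note that a plain line count over-counts if quoted fields contain embedded newlines.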

The tools below take different approaches to avoid this bottleneck.

Approach 1: Command-line filtering with grep and awk

For quick, one-off filtering tasks, command-line tools are unbeatable. They process files line by line, streaming through the data without loading it into memory. A 1 GB file takes seconds, not minutes.

Simple text filtering with grep

If you need all rows containing a specific value, grep is the fastest option:

Terminal
# Find all rows containing "failed"
grep "failed" transactions.csv > failed_transactions.csv

# Case-insensitive search
grep -i "error" server_log.csv > errors.csv

# Include the header row in the output
head -1 transactions.csv > failed_transactions.csv
grep "failed" transactions.csv >> failed_transactions.csv

On modern hardware, grep processes roughly 1 GB every 5-10 seconds. It is pre-installed on macOS and Linux, and available on Windows through WSL or Git Bash.

Column-specific filtering with awk

When you need to filter based on a specific column (not just any text match), awk is the right tool. It splits each line by a delimiter and lets you reference columns by number:

Terminal
# Filter where column 5 equals "failed" (comma-separated)
awk -F',' '$5 == "failed"' transactions.csv > failed.csv

# Filter where column 3 (amount) is greater than 1000
awk -F',' '$3 > 1000' transactions.csv > large_transactions.csv

# Multiple conditions: column 5 is "failed" AND column 3 > 1000
awk -F',' '$5 == "failed" && $3 > 1000' transactions.csv > big_failures.csv

Caveat: awk splits on a simple delimiter. If your CSV has quoted fields containing commas (e.g., "New York, NY"), awk will mis-split those fields. For properly quoted CSVs, use csvkit or Python instead.
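If you hit that limitation, Python's built-in csv module parses quoting correctly while still streaming row by row. A minimal sketch (the file names and the status column are placeholders for your own data):

```python
import csv

def filter_csv(in_path, out_path, column, value):
    """Stream a CSV and keep rows where `column` equals `value`,
    parsing quoted fields (e.g. "New York, NY") correctly."""
    with open(in_path, newline='') as src, \
         open(out_path, 'w', newline='') as dst:
        reader = csv.DictReader(src)
        writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
        writer.writeheader()
        for row in reader:
            if row[column] == value:
                writer.writerow(row)

# Example: filter_csv('transactions.csv', 'failed.csv', 'status', 'failed')
```

Like awk, this streams the file and uses almost no memory; unlike awk, it respects CSV quoting.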

Sorting with the sort command

The Unix sort command can sort files much larger than available RAM by using temporary files on disk. It is surprisingly powerful for large-file sorting:

Terminal
# Sort by column 3 numerically (comma-separated)
sort -t',' -k3 -n transactions.csv > sorted.csv

# Sort by column 1 (date), reverse order
sort -t',' -k1 -r transactions.csv > sorted_desc.csv

# Sort a file larger than RAM (sort uses temp files automatically)
sort -t',' -k3 -n --buffer-size=512M transactions.csv > sorted.csv

A 1 GB file typically sorts in 30-90 seconds. The --buffer-size flag controls how much RAM sort uses before spilling to disk; about half your available RAM is a good starting point. One caveat: sort treats the header line as data, so strip it first (tail -n +2 file.csv | sort ...) and prepend it to the output afterwards.

Approach 2: Python with pandas

If you need more complex filtering logic, multi-column sorts, or want to combine filtering and sorting in a reproducible script, Python with pandas is the standard tool.

Filtering a large CSV

Python
import pandas as pd

# Read the CSV (needs ~3-5x file size in RAM)
df = pd.read_csv('transactions.csv')

# Filter by column value
failed = df[df['status'] == 'failed']
print(f"Found {len(failed):,} failed transactions")

# Multiple conditions
big_failures = df[(df['status'] == 'failed') & (df['amount'] > 1000)]

# Save the result
big_failures.to_csv('big_failures.csv', index=False)

Sorting a large CSV

Python
import pandas as pd

df = pd.read_csv('transactions.csv')

# Sort by one column
df_sorted = df.sort_values('date')

# Sort by multiple columns
df_sorted = df.sort_values(['status', 'amount'], ascending=[True, False])

# Save
df_sorted.to_csv('sorted_transactions.csv', index=False)

When the file does not fit in memory

If your file is larger than available RAM, read it in chunks:

Python
import pandas as pd

# Filter in chunks — works for any file size
filtered_chunks = []
for chunk in pd.read_csv('huge_file.csv', chunksize=500_000):
    filtered = chunk[chunk['status'] == 'failed']
    filtered_chunks.append(filtered)

result = pd.concat(filtered_chunks)
result.to_csv('filtered.csv', index=False)
print(f"Filtered {len(result):,} rows")

Note: Chunked reading works well for filtering but is tricky for sorting, because you need to see all the data to determine the correct order. For sorting files that do not fit in memory, the Unix sort command or a tool like DuckDB is more practical.
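For the curious, the technique sort uses internally is an external merge sort: sort manageable chunks in memory, write each sorted run to a temp file, then merge the runs as streams. A minimal Python sketch under assumed simplifications (plain string comparison on the key column, hypothetical file names):

```python
import csv
import heapq
import os
import tempfile

def external_sort(in_path, out_path, key_col, chunk_rows=100_000):
    """Sort a CSV by one column using bounded memory: sort chunks
    in RAM, spill sorted runs to disk, then stream-merge the runs."""
    runs = []
    with open(in_path, newline='') as src:
        reader = csv.reader(src)
        header = next(reader)
        key_idx = header.index(key_col)
        while True:
            # Pull at most chunk_rows rows, sort them in memory
            chunk = [row for _, row in zip(range(chunk_rows), reader)]
            if not chunk:
                break
            chunk.sort(key=lambda r: r[key_idx])
            tmp = tempfile.NamedTemporaryFile('w', delete=False, newline='')
            csv.writer(tmp).writerows(chunk)
            tmp.close()
            runs.append(tmp.name)
    # Merge the sorted runs, streaming one row at a time from each
    files = [open(p, newline='') for p in runs]
    with open(out_path, 'w', newline='') as dst:
        writer = csv.writer(dst)
        writer.writerow(header)
        for row in heapq.merge(*(csv.reader(f) for f in files),
                               key=lambda r: r[key_idx]):
            writer.writerow(row)
    for f in files:
        f.close()
    for p in runs:
        os.remove(p)
```

This is essentially what sort --buffer-size does, minus decades of optimization, so in practice reach for sort or DuckDB rather than rolling your own.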

DuckDB: SQL on CSV files

For the best of both worlds — SQL syntax with excellent large-file performance — DuckDB is worth knowing about:

Python
import duckdb

# Filter and sort a 1 GB CSV with SQL — faster than pandas
result = duckdb.sql("""
    SELECT *
    FROM 'transactions.csv'
    WHERE status = 'failed' AND amount > 1000
    ORDER BY date DESC
""")
result.write_csv('result.csv')

DuckDB is designed for analytical queries on large files. It uses columnar processing and can handle files much larger than RAM without chunking.

Approach 3: Interactive filtering and sorting in Viztab

Command-line tools are fast. Python is flexible. But sometimes you need to explore data interactively — try different filters, sort by various columns, scan through results visually. That is where a graphical tool makes the difference.

Viztab handles GB-sized CSV files with the same point-and-click interface you expect from a spreadsheet, but without the crashes or row limits.

1. Import your file. Open viztab.com/app and drag in your CSV. A 1 GB file typically loads in under 30 seconds.

2. Filter interactively. Click any column header to filter. Type a value, select conditions (equals, contains, greater than), and combine multiple filters.

3. Sort and export. Click column headers to sort ascending or descending. When you have the data you need, export the filtered/sorted result as CSV or XLSX.

Because Viztab processes data locally in your browser, there is no file upload to a server. Your data stays on your machine. The streaming engine indexes the file as it loads, so sorting and filtering operate on pre-built indexes rather than scanning the entire file each time.

Filter and sort your CSV in Viztab →

Practical tips for large CSV operations

Comparison: which tool for which task

Task                       Best tool          Speed (1 GB file)
Quick text filter          grep               5-10 seconds
Column-specific filter     awk / Viztab       10-20 seconds
Complex multi-filter       Python / Viztab    20-60 seconds
Sort entire file           sort / Viztab      30-90 seconds
Interactive exploration    Viztab             Instant after load
Repeatable pipeline        Python / DuckDB    20-60 seconds

Frequently asked questions

How do I sort a CSV file that is too large for Excel?

You have three main options: use the Unix sort command (sort -t',' -k3 -n file.csv), use Python with pandas (df.sort_values('column')), or use a dedicated large-file spreadsheet like Viztab that can sort millions of rows with a click. The command-line approach uses minimal memory; Python needs enough RAM to hold the data; Viztab streams data efficiently in the browser.

Can I filter a 1 GB CSV without loading it all into memory?

Yes. Command-line tools like grep and awk process files line by line using almost no memory. For example, grep 'pattern' file.csv > filtered.csv will stream through a 1 GB file in seconds. Python's pandas can also read in chunks with pd.read_csv('file.csv', chunksize=100000) to filter without loading everything at once.

What is the fastest way to filter a large CSV file?

For simple text matching, grep is the fastest option — it can process a 1 GB file in under 10 seconds on modern hardware. For column-specific filtering, awk is nearly as fast. For complex multi-condition filters with a visual interface, Viztab provides instant filtering on large files with clickable column headers.

How long does it take to sort a 1 GB CSV file?

With the Unix sort command, sorting a 1 GB CSV typically takes 30-90 seconds depending on your hardware and the sort key. Python pandas takes roughly 20-60 seconds but requires enough RAM to hold the entire dataset (usually 3-5x the file size). Viztab sorts interactively, typically completing within a few seconds for files up to several gigabytes.

Filter and sort without the command line

Viztab gives you spreadsheet filtering and sorting on files that crash Excel. No code, no upload, no limits.

Open Viztab