How to filter and sort a 1 GB CSV file

April 12, 2026 · 8 min read

You need to find all the rows where the status is "failed" in a 1 GB CSV file. Or you need to sort 10 million transaction records by date. You open Excel, it stalls. You try Google Sheets, it rejects the upload. You are stuck with a perfectly good file that none of your usual tools can handle.

This is a common problem, and it has good solutions. The approach depends on what you need: a quick one-off filter, a repeatable analysis pipeline, or an interactive exploration of the data. This guide covers all three.

Why normal spreadsheet tools choke on large files

Before diving into solutions, it helps to understand the bottleneck. When you filter or sort data in Excel or Google Sheets, the application needs to:

  1. Load the entire file into memory. A 1 GB CSV becomes 3-8 GB in memory once parsed into the application's internal data structures.
  2. Build an index for the column you are filtering or sorting on.
  3. Re-render the grid with only the matching rows (filter) or in the new order (sort).

Excel caps out at 1,048,576 rows, so if your 1 GB file has more rows than that, it silently truncates. Google Sheets limits you to 10 million cells and rejects file uploads over about 50 MB. Even if the file fits within these limits, the memory overhead makes the application unresponsive.
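Before picking a tool, it is worth checking whether your file actually crosses these limits. A minimal Python sketch that counts rows by streaming the file, never loading it into memory (the transactions.csv path is a placeholder):

```python
def count_rows(path):
    """Count data rows in a CSV by streaming it line by line."""
    with open(path, newline='') as f:
        total = sum(1 for _ in f)
    return total - 1  # subtract the header line

# Example: print(count_rows('transactions.csv'))
```

On macOS or Linux, `wc -l transactions.csv` gives the same answer even faster. Note that a plain line count over-counts if quoted fields contain embedded newlines.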

The tools below take different approaches to avoid this bottleneck.

Approach 1: Command-line filtering with grep and awk

For quick, one-off filtering tasks, command-line tools are unbeatable. They process files line by line, streaming through the data without loading it into memory. A 1 GB file takes seconds, not minutes.

Simple text filtering with grep

If you need all rows containing a specific value, grep is the fastest option:

Terminal
# Find all rows containing "failed"
grep "failed" transactions.csv > failed_transactions.csv

# Case-insensitive search
grep -i "error" server_log.csv > errors.csv

# Include the header row in the output
head -1 transactions.csv > failed_transactions.csv
grep "failed" transactions.csv >> failed_transactions.csv

On modern hardware, grep processes roughly 1 GB every 5-10 seconds. It is pre-installed on macOS and Linux, and available on Windows through WSL or Git Bash.

Column-specific filtering with awk

When you need to filter based on a specific column (not just any text match), awk is the right tool. It splits each line by a delimiter and lets you reference columns by number:

Terminal
# Filter where column 5 equals "failed" (comma-separated)
awk -F',' '$5 == "failed"' transactions.csv > failed.csv

# Filter where column 3 (amount) is greater than 1000
awk -F',' '$3 > 1000' transactions.csv > large_transactions.csv

# Multiple conditions: column 5 is "failed" AND column 3 > 1000
awk -F',' '$5 == "failed" && $3 > 1000' transactions.csv > big_failures.csv

Caveat: awk splits on a simple delimiter. If your CSV has quoted fields containing commas (e.g., "New York, NY"), awk will mis-split those fields. For properly quoted CSVs, use csvkit or Python instead.
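If you hit that limitation, Python's built-in csv module parses quoting correctly while still streaming row by row. A minimal sketch (the file names and the status column are placeholders for your own data):

```python
import csv

def filter_csv(in_path, out_path, column, value):
    """Stream a CSV and keep rows where `column` equals `value`,
    parsing quoted fields (e.g. "New York, NY") correctly."""
    with open(in_path, newline='') as src, \
         open(out_path, 'w', newline='') as dst:
        reader = csv.DictReader(src)
        writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
        writer.writeheader()
        for row in reader:
            if row[column] == value:
                writer.writerow(row)

# Example: filter_csv('transactions.csv', 'failed.csv', 'status', 'failed')
```

Like awk, this streams the file and uses almost no memory; unlike awk, it respects CSV quoting.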

Sorting with the sort command

The Unix sort command can sort files much larger than available RAM by using temporary files on disk. It is surprisingly powerful for large-file sorting:

Terminal
# Sort by column 3 numerically (comma-separated)
sort -t',' -k3 -n transactions.csv > sorted.csv

# Sort by column 1 (date), reverse order
sort -t',' -k1 -r transactions.csv > sorted_desc.csv

# Sort a file larger than RAM (sort uses temp files automatically)
sort -t',' -k3 -n --buffer-size=512M transactions.csv > sorted.csv

A 1 GB file typically sorts in 30-90 seconds. The --buffer-size flag controls how much RAM sort uses before spilling to disk; about half your available RAM is a good starting point. One caveat: sort treats the header line as data, so strip it first (tail -n +2 file.csv | sort ...) and prepend it to the output afterwards.

Approach 2: Python with pandas

If you need more complex filtering logic, multi-column sorts, or want to combine filtering and sorting in a reproducible script, Python with pandas is the standard tool.

Filtering a large CSV

Python
import pandas as pd

# Read the CSV (needs ~3-5x file size in RAM)
df = pd.read_csv('transactions.csv')

# Filter by column value
failed = df[df['status'] == 'failed']
print(f"Found {len(failed):,} failed transactions")

# Multiple conditions
big_failures = df[(df['status'] == 'failed') & (df['amount'] > 1000)]

# Save the result
big_failures.to_csv('big_failures.csv', index=False)

Sorting a large CSV

Python
import pandas as pd

df = pd.read_csv('transactions.csv')

# Sort by one column
df_sorted = df.sort_values('date')

# Sort by multiple columns
df_sorted = df.sort_values(['status', 'amount'], ascending=[True, False])

# Save
df_sorted.to_csv('sorted_transactions.csv', index=False)

When the file does not fit in memory

If your file is larger than available RAM, read it in chunks:

Python
import pandas as pd

# Filter in chunks — works for any file size
filtered_chunks = []
for chunk in pd.read_csv('huge_file.csv', chunksize=500_000):
    filtered = chunk[chunk['status'] == 'failed']
    filtered_chunks.append(filtered)

result = pd.concat(filtered_chunks)
result.to_csv('filtered.csv', index=False)
print(f"Filtered {len(result):,} rows")

Note: Chunked reading works well for filtering but is tricky for sorting, because you need to see all the data to determine the correct order. For sorting files that do not fit in memory, the Unix sort command or a tool like DuckDB is more practical.
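For the curious, the technique sort uses internally is an external merge sort: sort manageable chunks in memory, write each sorted run to a temp file, then merge the runs as streams. A minimal Python sketch under assumed simplifications (plain string comparison on the key column, hypothetical file names):

```python
import csv
import heapq
import os
import tempfile

def external_sort(in_path, out_path, key_col, chunk_rows=100_000):
    """Sort a CSV by one column using bounded memory: sort chunks
    in RAM, spill sorted runs to disk, then stream-merge the runs."""
    runs = []
    with open(in_path, newline='') as src:
        reader = csv.reader(src)
        header = next(reader)
        key_idx = header.index(key_col)
        while True:
            # Pull at most chunk_rows rows, sort them in memory
            chunk = [row for _, row in zip(range(chunk_rows), reader)]
            if not chunk:
                break
            chunk.sort(key=lambda r: r[key_idx])
            tmp = tempfile.NamedTemporaryFile('w', delete=False, newline='')
            csv.writer(tmp).writerows(chunk)
            tmp.close()
            runs.append(tmp.name)
    # Merge the sorted runs, streaming one row at a time from each
    files = [open(p, newline='') for p in runs]
    with open(out_path, 'w', newline='') as dst:
        writer = csv.writer(dst)
        writer.writerow(header)
        for row in heapq.merge(*(csv.reader(f) for f in files),
                               key=lambda r: r[key_idx]):
            writer.writerow(row)
    for f in files:
        f.close()
    for p in runs:
        os.remove(p)
```

This is essentially what sort --buffer-size does, minus decades of optimization, so in practice reach for sort or DuckDB rather than rolling your own.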

DuckDB: SQL on CSV files

For the best of both worlds — SQL syntax with excellent large-file performance — DuckDB is worth knowing about:

Python
import duckdb

# Filter and sort a 1 GB CSV with SQL — faster than pandas
result = duckdb.sql("""
    SELECT *
    FROM 'transactions.csv'
    WHERE status = 'failed' AND amount > 1000
    ORDER BY date DESC
""")
result.write_csv('result.csv')

DuckDB is designed for analytical queries on large files. It uses columnar processing and can handle files much larger than RAM without chunking.

Approach 3: Interactive filtering and sorting in Viztab

Command-line tools are fast. Python is flexible. But sometimes you need to explore data interactively — try different filters, sort by various columns, scan through results visually. That is where a graphical tool makes the difference.

Viztab handles GB-sized CSV files with the same point-and-click interface you expect from a spreadsheet, but without the crashes or row limits.

1. Import your file. Open viztab.com/app and drag in your CSV. A 1 GB file typically loads in under 30 seconds.

2. Filter interactively. Click any column header to filter. Type a value, select conditions (equals, contains, greater than), and combine multiple filters.

3. Sort and export. Click column headers to sort ascending or descending. When you have the data you need, export the filtered/sorted result as CSV or XLSX.

Because Viztab processes data locally in your browser, there is no file upload to a server. Your data stays on your machine. The streaming engine indexes the file as it loads, so sorting and filtering operate on pre-built indexes rather than scanning the entire file each time.

Filter and sort your CSV in Viztab →

Practical tips for large CSV operations

Comparison: which tool for which task

Task                       Best tool          Speed (1 GB file)
Quick text filter          grep               5-10 seconds
Column-specific filter     awk / Viztab       10-20 seconds
Complex multi-filter       Python / Viztab    20-60 seconds
Sort entire file           sort / Viztab      30-90 seconds
Interactive exploration    Viztab             Instant after load
Repeatable pipeline        Python / DuckDB    20-60 seconds

Frequently asked questions

How do I sort a CSV file that is too large for Excel?

You have three main options: use the Unix sort command (sort -t',' -k3 -n file.csv), use Python with pandas (df.sort_values('column')), or use a dedicated large-file spreadsheet like Viztab that can sort millions of rows with a click. The command-line approach uses minimal memory; Python needs enough RAM to hold the data; Viztab streams data efficiently in the browser.

Can I filter a 1 GB CSV without loading it all into memory?

Yes. Command-line tools like grep and awk process files line by line using almost no memory. For example, grep 'pattern' file.csv > filtered.csv will stream through a 1 GB file in seconds. Python's pandas can also read in chunks with pd.read_csv('file.csv', chunksize=100000) to filter without loading everything at once.

What is the fastest way to filter a large CSV file?

For simple text matching, grep is the fastest option — it can process a 1 GB file in under 10 seconds on modern hardware. For column-specific filtering, awk is nearly as fast. For complex multi-condition filters with a visual interface, Viztab provides instant filtering on large files with clickable column headers.

How long does it take to sort a 1 GB CSV file?

With the Unix sort command, sorting a 1 GB CSV typically takes 30-90 seconds depending on your hardware and the sort key. Python pandas takes roughly 20-60 seconds but requires enough RAM to hold the entire dataset (usually 3-5x the file size). Viztab sorts interactively, typically completing within a few seconds for files up to several gigabytes.

Filter and sort without the command line

Viztab gives you spreadsheet filtering and sorting on files that crash Excel. No code, no upload, no limits.

Open Viztab