I picked an arbitrary database table and exported a million rows to a CSV file and also to a parquet file. The CSV is 234 MB; the parquet file is 7 MB.
Parquet files are nice!
@elric Packet capture data? (From WireShark?)
@DaveMasonDotMe Yes. I have to bury some malicious traffic in a bunch of grey traffic for obfuscation purposes (creating some training data).
@elric Ah, gotcha. Creating sanitized data for training/demos can be a lot of work.
@DaveMasonDotMe whoa.
@DaveMasonDotMe is parquet a binary file? Sorta like how SQLite makes a db file? Seems like it from my quick search!
@baguette Binary file? Yeah, I think so. I keep seeing that parquet file data is "columnar".
I suspect it is similar in concept to the way Power BI data is stored/compressed via VertiPaq...or the way SQL Server stores/compresses data for columnstore.
@DaveMasonDotMe @baguette Parquet is a binary file that stores the data in columnar mode. So yes, very similar to Columnstore or VertiPaq, but optimized for Spark processing.
@DaveMasonDotMe Funny, I'm exporting pcap to CSV to modify it and then trying to create a new pcap from it.