Wikipedia:Reference desk/Archives/Computing/2022 September 16

Computing desk
< September 15 << Aug | September | Oct >> Current desk >
Welcome to the Wikipedia Computing Reference Desk Archives
The page you are currently viewing is a transcluded archive page. While you can leave answers for any questions shown below, please ask new questions on one of the current reference desk pages.


September 16

edit

Querying from multiple CSV files

edit

I have a load of CSV files with filenames that follow a distinct pattern like AAAA_BB_CC_2022_09_WM_DTL. If it helps I could put all such files into a single folder. Altogether, there are more than a hundred and more will be created each month. I can, one by one, import those files into Excel and read them and convert them from tab-delimited text into usable Excel data. Or, I could access them, again one-by-one, using Excel 2016's PowerQuery program. I'm not super familiar with PQ, but I assume I can figure out a way to muddle my way into converting it into a useful data dump. However, what I'd really like to do is not slog through stuff one by one at all. :) Is there a kind of short cut to telling Excel, "See here now, Excel, these are all data tables. I'd like you to read each one, summarize it by grouping columns A, B, and G, providing the totals you found in columns J and K. Good lad!" If they were all already in Excel format instead of CSV, I think I'd be able to construct SUMIFS formulas to read each file, but they're not so I can't. I think? Matt Deres (talk) 12:38, 16 September 2022 (UTC)[reply]

The format is not super-clearly specified, but that looks like a trivial task for AWK or a small Python script. --Stephan Schulz (talk) 22:48, 16 September 2022 (UTC)[reply]
I should have been more explicit: the files have a CSV extension, but the data itself is tab delimited. I assume that causes the somewhat unusual behaviour of Excel where is opens the file but makes no attempt to identify the columns and just gives me a pile of crap. Forcing the data to columns action works fine though. I'm on a Windows system and don't really have any programming experience. Matt Deres (talk) 02:06, 17 September 2022 (UTC)[reply]
It's not clear to me what you're asking for. Do you just want Excel to treat all your csv files as if they were concatenated together into one big file? You can do that by opening a cmd prompt and entering
type *.csv > bigfile.csv
and then import bigfile.csv into Excel. If that's not what you're asking for, could you give a small example to explain the behavior you are looking for? CodeTalker (talk) 04:41, 17 September 2022 (UTC)[reply]
Yes, that's why AWK comes to mind. It used tab as a field separator by default, so it's trivial to access the different columns and do stuff with them. I'm probably as good with Excel as you are with programming, so I have no idea about using that. But if the files have consistent data fields, doing simple summation and grouping would be easy. I think there are AWK implementations for Windows, but I think CodeTalker's suggestion might be a better match for an Excel-centric Windows workflow. --Stephan Schulz (talk) 21:48, 17 September 2022 (UTC)[reply]
Thank you all for the suggestions. My post was not very well defined because I didn't know what kind of options were out there. For now, I'll take CodeTalker's suggestion to combine the files and see where that takes me. Thanks again. Matt Deres (talk) 19:14, 22 September 2022 (UTC)[reply]