
Unusually high amount of data processed when querying a 27 GiB BigQuery table with a script

Hi all,

I want to break up a main table by 'advertiser' into separate tables in a different BigQuery project.

I recently attempted to migrate this 27 GiB table from one project to another by filtering it into separate tables with a Python script using pandas. I ran the script overnight, and it ended up billing 97 TiB worth of query processing.

Basically, the script runs a double for loop: for each "day" and each "advertiser" it issues a separate SELECT statement against that 27 GiB of data (about 28 million rows), roughly as in the sketch below.
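For reference, the loop looked roughly like this. This is a simplified sketch; the table, column, project, and list values below are placeholders, not my real ones. The point is that each read_gbq call is billed as its own query, so if the source table isn't partitioned by date or clustered by advertiser, every iteration scans the full table.

```python
# Sketch of the per-day / per-advertiser loop (all names are placeholders).
import pandas_gbq

SOURCE_TABLE = "source-project.dataset.main_table"   # hypothetical
DEST_PROJECT = "destination-project"                 # hypothetical

days = ["2024-01-01", "2024-01-02"]                   # placeholder dates
advertisers = ["acme", "globex"]                      # placeholder advertisers

for day in days:
    for advertiser in advertisers:
        sql = f"""
            SELECT *
            FROM `{SOURCE_TABLE}`
            WHERE date = '{day}' AND advertiser = '{advertiser}'
        """
        # Each call here is a separate BigQuery job, billed on bytes scanned.
        df = pandas_gbq.read_gbq(sql, project_id=DEST_PROJECT)
        # Write each slice out as its own table in the destination project.
        pandas_gbq.to_gbq(
            df,
            f"my_dataset.{advertiser}_{day.replace('-', '_')}",
            project_id=DEST_PROJECT,
            if_exists="replace",
        )
```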


I'm wondering why the cost is so high and so different from what I expected for 27 GiB. (My for loop basically just partitions the data, so I assumed the total processed would still be roughly 27 GiB...) Based on the pricing documentation I expected it to cost less than 5 USD.

As an alternative, suppose I ran the query in the console instead of my script: just a

"select * from table where ..." with the query settings configured to write the results to a new table with a custom name (roughly what the sketch below does programmatically). Would that cost less than 5 USD, or would I face the same 97 TiB of processing?


Best wishes,
jnschan

1 REPLY

Hi Jnschan, 

It seems that:

- 27 GiB is the size of the table.

- 97 TiB is the amount of data that was processed.

Therefore, running the exact same queries in the console would process the same amount of data, so there should not be any difference in cost.
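As a rough sanity check (an assumption on our side, not something confirmed from your job history): 97 TiB corresponds to roughly 3,700 full scans of a 27 GiB table, which is about what you would see if each per-day, per-advertiser SELECT scanned the whole table rather than a pruned partition:

```python
# Back-of-the-envelope check: how many full 27 GiB scans add up to 97 TiB?
table_gib = 27
processed_tib = 97
full_scans = processed_tib * 1024 / table_gib
print(round(full_scans))  # ~3679 full-table scans
```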

To investigate this further, you would need to file a public issue, and we'd be glad to help there, since it would require further details from you that would be too sensitive to share here.