How to create BigQuery tables on top of external data?
In this article, let’s see how to create an external BigQuery table on top of files in a Google Cloud Storage bucket using the Python client.
Script location: https://github.com/abhr1994/Personal_Scripts/tree/main/BigqueryTables
The google-cloud-bigquery library must be installed before running the script. It can be installed with the command below:
pip install google-cloud-bigquery
There are two ways to create BigQuery tables from existing data files:
First Method
Create an external table with a CREATE OR REPLACE EXTERNAL TABLE query. In this case the table is defined on top of data that stays in the Google Cloud Storage bucket. The data remains in its original format and is not converted to BigQuery's proprietary storage format.
The source file format can be Parquet, CSV, JSON, or ORC, and the schema is inferred automatically.
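As a rough illustration of the first method, the sketch below builds the DDL statement and runs it through the Python client. It is not the script from the repository linked above. The function names, the table ID, and the bucket path are hypothetical placeholders, and the client is assumed to pick up application default credentials.

```python
# Formats named in this article; the DDL accepts these strings as-is.
SUPPORTED_FORMATS = {"PARQUET", "CSV", "JSON", "ORC"}


def build_external_table_ddl(table_id, source_format, uris):
    """Return a CREATE OR REPLACE EXTERNAL TABLE statement for the given URIs."""
    fmt = source_format.upper()
    if fmt not in SUPPORTED_FORMATS:
        raise ValueError(f"Unsupported source format: {source_format}")
    # Each URI becomes a quoted string inside the uris array.
    uri_list = ", ".join(f"'{uri}'" for uri in uris)
    return (
        f"CREATE OR REPLACE EXTERNAL TABLE `{table_id}`\n"
        f"OPTIONS (format = '{fmt}', uris = [{uri_list}])"
    )


def create_external_table(table_id, source_format, uris):
    """Run the DDL in BigQuery; the data itself stays in the bucket."""
    # Imported here so the DDL helper works without the client library.
    from google.cloud import bigquery

    client = bigquery.Client()  # assumes application default credentials
    ddl = build_external_table_ddl(table_id, source_format, uris)
    client.query(ddl).result()  # wait for the DDL job to finish


if __name__ == "__main__":
    # Hypothetical table ID and bucket path, for illustration only.
    create_external_table(
        "my_project.my_dataset.sales_ext",
        "parquet",
        ["gs://my-bucket/sales/*.parquet"],
    )
```

Keeping the statement builder separate from the client call makes it easy to print and review the DDL before running it.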
Second Method
Load the contents of the external data files into a new BigQuery table. This method copies the data from the source path into BigQuery storage, so the resulting table is a native BigQuery table rather than an external one.
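The load-based method could look like the following sketch, which uses a load job from the Python client. It is an assumption about how such a script might be written, not the code from the repository. The helper names, table ID, and bucket path are hypothetical, and schema autodetection is only switched on for CSV and JSON, since Parquet and ORC files carry their own schema.

```python
# Map the format names used in this article to BigQuery load-job format strings.
LOAD_FORMATS = {
    "PARQUET": "PARQUET",
    "ORC": "ORC",
    "CSV": "CSV",
    "JSON": "NEWLINE_DELIMITED_JSON",
}


def load_job_options(source_format):
    """Return keyword arguments for bigquery.LoadJobConfig."""
    fmt = source_format.upper()
    if fmt not in LOAD_FORMATS:
        raise ValueError(f"Unsupported source format: {source_format}")
    return {
        "source_format": LOAD_FORMATS[fmt],
        # Parquet and ORC are self-describing; CSV and JSON need autodetection.
        "autodetect": fmt in ("CSV", "JSON"),
    }


def load_table_from_gcs(table_id, source_format, uri):
    """Copy the files at `uri` into a BigQuery table and return its row count."""
    # Imported here so the options helper works without the client library.
    from google.cloud import bigquery

    client = bigquery.Client()  # assumes application default credentials
    job_config = bigquery.LoadJobConfig(**load_job_options(source_format))
    load_job = client.load_table_from_uri(uri, table_id, job_config=job_config)
    load_job.result()  # wait for the load job to finish
    return client.get_table(table_id).num_rows


if __name__ == "__main__":
    # Hypothetical table ID and bucket path, for illustration only.
    rows = load_table_from_gcs(
        "my_project.my_dataset.sales", "csv", "gs://my-bucket/sales/*.csv"
    )
    print(f"Loaded {rows} rows")
```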