
Is GCS the storage for HBase in Dataproc?

Where do the writes happen for HBase in a Dataproc cluster?

Is it GCS or PD? 


glen_yu
Google Developer Expert

GCS is an HDFS alternative for Dataproc via the Cloud Storage connector. 


Bigtable is your alternative for HBase (it is a managed service that supports the HBase API).

If I set up the GCS connector, where will the initial writes to the DB happen: in GCS or on the PDs?

OK, just to be clear: HDFS is not a DB; it's a distributed filesystem. If you do use the GCS connector to use GCS in place of HDFS, I believe that is only for the final output. Your Dataproc nodes still have HDD/SSD disks provisioned on them, and those are used for the intermediate reads/writes of the job you're running.


Using GCS as an HDFS alternative allows you to shut down your Dataproc clusters when you don't need them and still retain the data. Traditional HDFS is spread out over multiple machines (hence the "distributed" in its name), so you would have to leave those machines running to maintain that HDFS.
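To make the difference concrete, here is a minimal Python sketch under a simplified model in which the URI scheme of a path tells you where the data physically lands on a Dataproc cluster and whether it outlives the cluster. The STORAGE_BY_SCHEME table and the describe_location helper are hypothetical illustrations, not part of Dataproc or the Cloud Storage connector; the bucket and paths in the example are made up. Scheme-less paths are rejected on purpose, because Hadoop resolves them against fs.defaultFS, so where they end up depends on cluster configuration.

```python
from urllib.parse import urlparse

# Hypothetical, simplified model: URI scheme -> (where the bytes live,
# whether they survive deleting the Dataproc cluster).
STORAGE_BY_SCHEME = {
    "gs": ("Cloud Storage (via the Cloud Storage connector)", True),
    "hdfs": ("HDFS on the cluster VMs' disks (PD or local SSD)", False),
    "file": ("local disk of a single node (e.g. job scratch/spill space)", False),
}


def describe_location(uri: str) -> tuple[str, bool]:
    """Return (where the data lives, survives cluster deletion?) for a URI."""
    scheme = urlparse(uri).scheme
    if not scheme:
        # Hadoop resolves scheme-less paths against fs.defaultFS, so the
        # answer depends on cluster config; require an explicit scheme here.
        raise ValueError(f"path has no scheme: {uri!r}")
    if scheme not in STORAGE_BY_SCHEME:
        raise ValueError(f"unsupported scheme: {scheme!r}")
    return STORAGE_BY_SCHEME[scheme]


if __name__ == "__main__":
    for uri in ("gs://my-bucket/output/", "hdfs:///user/me/output/", "file:///tmp/spill"):
        where, persists = describe_location(uri)
        print(f"{uri:26} -> {where}; survives cluster deletion: {persists}")
```

In this model only the gs:// output remains after the cluster is deleted, which is the point above: intermediate data on the nodes' disks goes away with the VMs, while output written to GCS does not.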