*This is a study guide created from lecture videos to help you gain an understanding of how Google Cloud works.
GCP- Google Cloud Platform: compute for virtualization, storage and other services
Compute Services- virtual machines, containers and container management (Docker), and functions/lambdas (serverless computing)
Storage Services- files, archival storage, persistent disks for virtual machines
Data Services- NoSQL, RDBMS, Hadoop and Spark
Big Data Services- data pipelines, data warehouses, data science notebooks, machine learning
Other services- security and identity, management and monitoring from Stackdriver, developer tools
Why Use Google Cloud?- Cloud services allow you to get to market faster. Scalable services save time for developers and DevOps. Automatic usage discounts mean less management of pricing.
Automatic scaling- build an app that can serve as many customers as needed without running out of resources. The default is to autoscale until you set a limit.
GCP is fast, improves frequently, autoscales massively, and is a good value. GCP has an engineering focus, but comes with documentation challenges and pricing surprises.
gcutil is a legacy command-line tool that was part of the SDK; its functionality has since moved into gcloud, which is used from the command line.
Use gmail account -> use free trial -> have credit card available
Working with the console- Google Cloud Console is a browser-based interface that lets you work with the cloud services available on GCP. It works best in Chrome.
Setting the location determines where the virtual machines, databases, etc. physically run. This is called the data center region.
Within GCP, the top-level container is called a project. Inside a project you enable services and set which services should be running. Projects have three associated values: ID, name and number. Find this information at the bottom right of your GCP project page.
Set a budget in GCP in the budgets & alerts tab to ensure you do not go over your spend. This is a crucial step when learning how the google cloud works.
In GCP, think of the project as the container for the services. You create your project, enable APIs, and then set up the tools. Most APIs are turned on by default, but some you have to enable manually.
Find API Services- GCP Console -> Home -> left side menu -> compute, storage, stackdriver, tools, big data
Pin the menu tab for easy access to services you use.
Enabling APIs- GCP Console -> API Manager -> see which APIs are enabled -> Enable API -> click the API you want to turn on -> Enable. You adjust APIs at the project level.
Google Cloud Security- IAM is Identity and Access Management, where you manage user roles and permissions. You can set quotas to limit autoscaling.
Adjust IAM- GCP Console -> menu -> IAM & Admin -> ‘compute engine’ is enabled by default -> Add or Edit Permissions of Users in this window -> Set Quotas in the left menu -> Create Service Account in left side menu. Service accounts are for services and IAM is for people or user accounts.
Download GCP SDK -> start gcloud from your terminal and project folder -> authenticate to gcloud -> list or set your project.
$ gcloud version- tells you the version of the client you have. Verify which version you have before you start working.
$ gcloud auth login- authenticates the user
$ gcloud info --show-log- shows the log
$ gcloud config set project <my-project-id>- sets your current project
$ gcloud init- initializes the SDK configuration, walking you through authentication and project selection
Make sure you are at the directory where your project is stored locally
gcloud uses two dashes (--) for long arguments; copying and pasting from documents can turn them into en dashes and break the command.
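The commands above fit together into a typical first session with the SDK. A minimal sketch (the project ID is a placeholder; these commands require an authenticated Google account):

```shell
# Check the installed client version before starting
gcloud version

# Authenticate as your Google account (opens a browser window)
gcloud auth login

# Point the client at a project (replace with your own project ID)
gcloud config set project my-project-id

# Confirm the active account and project
gcloud config list
```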
Set up command-line access with gcloud- download the Mac file -> extract the file -> change directory to the extracted folder -> run the install script
-> yes to set the path -> set defaults -> close the shell -> reopen the shell and run the $ gcloud init command ->
-> log in to Gmail to ensure you are authenticated
-> return to the terminal and run $ gcloud version to see the working version
Browse GCP's GitHub repositories for tutorials and code examples-
Google Cloud github
GCP projects -> three dot menu on right side-> try an interactive tutorial -> select tutorial -> create a bucket on the google cloud for the tutorial
GCP to AWS Reference-
GCP vs AWS
Virtual Machines, Containers and Functions
Compute Services: App Engine, Compute Engine, Container Engine, Networking
Google Compute Engine is a cloud-hosted virtual machine offering: infrastructure as a service. It is enabled by default. You can create a Compute Engine instance using the Compute Engine console, or use gcloud or scripting to regenerate instances in production environments. You can then connect to a Linux machine over SSH and use your virtual machine. For production, you would set other networking options such as security and firewalls.
GCP Compute Engine -> create instance -> create -> connect via SSH
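The console steps above can also be done from gcloud, which is how you would script instance creation for production. A sketch, assuming an era-appropriate Debian image; the instance name, zone and machine type are examples:

```shell
# Create a small Debian VM
gcloud compute instances create my-vm \
    --zone us-central1-a \
    --machine-type n1-standard-1 \
    --image-family debian-9 \
    --image-project debian-cloud

# Connect over SSH once the instance is running
gcloud compute ssh my-vm --zone us-central1-a
```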
Customize the machine type to save budget; you can change the OS to something other than Linux. You can set firewall rules to allow HTTP or HTTPS traffic. Scripting, metadata, and preemptibility (used for batch processing) are all features of a GCP Compute Engine instance.
Deployment Manager provides automation for launching customized virtual machines: VMs with applications on Compute Engine, or databases and other components on preconfigured virtual machines. A deployment is a container for one or more services, used for reproducibility. Deployment Manager prints output as the deployment occurs; it takes a configuration and lets you script it. Deployment Manager wraps around Google Compute Engine.
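The configuration Deployment Manager takes is a YAML file describing resources. A minimal one-VM sketch (all names, the zone and the image family are examples):

```yaml
# vm.yaml - a single-VM deployment configuration
resources:
- name: my-vm
  type: compute.v1.instance
  properties:
    zone: us-central1-a
    machineType: zones/us-central1-a/machineTypes/n1-standard-1
    disks:
    - deviceName: boot
      type: PERSISTENT
      boot: true
      autoDelete: true
      initializeParams:
        sourceImage: projects/debian-cloud/global/images/family/debian-9
    networkInterfaces:
    - network: global/networks/default
```

You would then launch it with something like `gcloud deployment-manager deployments create my-deployment --config vm.yaml`.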
Use Cloud Launcher for setting up an IDE- GCP Home -> Cloud Launcher -> developer tools -> Eclipse Che (an open-source developer workspace server and cloud IDE) -> launch on Compute Engine -> deploy -> wait for the admin password to populate and copy it -> log in to the admin panel with the admin user and password -> the software runs on a VM -> you then have a browser-based IDE for coding in Java, running on Google Cloud -> go to Compute Engine to see the VM instance you are running.
Instance Groups let you set up virtual machines for load balancing: you specify how many instances should run, and load balancing monitors them with health checks so new instances can be created with ease.
Instance Templates can be used within Instance Groups.
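A gcloud sketch of how templates and managed groups fit together (template name, group name, sizes and zone are examples):

```shell
# Create a reusable instance template
gcloud compute instance-templates create my-template \
    --machine-type n1-standard-1

# Create a managed instance group of three VMs from the template
gcloud compute instance-groups managed create my-group \
    --template my-template --size 3 --zone us-central1-a

# Turn on autoscaling for the group, capped at five replicas
gcloud compute instance-groups managed set-autoscaling my-group \
    --max-num-replicas 5 --zone us-central1-a
```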
Compute and Storage are separate. You save money when Compute is turned off for instances you are not using. You can create a snapshot as the basis for a new VM.
Metadata is used for tags. Health Checks let you run health checks against your instances. Zones shows how many instances you have in each zone, useful when you are running many VMs. Operations is the activity history: everything that has happened. Quota shows how much of your resource allowance you are using.
Deployment Manager -> click on deployment -> you can delete or adjust deployment of VM’s within deployment manager.
Application virtualization is a more efficient method to store, compute and maintain certain applications. The hypervisor typically used for virtualization is replaced with the Docker Engine, removing the need for a guest OS in each container.
Container Engine is Google's product and Google's implementation of Docker hosting. Kubernetes is used for container management: a way to manage application virtualization after it is set up.
Difference between containers and virtual machines- containers share one operating system across as many containers as you run. Containers still sit on top of virtual machines, but they do not use a hypervisor to control the instances; instead you use a container manager, which on Google Cloud is Container Engine.
Containers use caching, which allows applications to scale up more quickly. Containers are easier to replicate because the setup is in code you can access. It is easier to replicate an application with a container than with a virtual machine.
GCP Console Menu -> container engine -> create a container cluster -> fill out and create container cluster -> to see source code of container: GCP Console menu -> development
A Dockerfile is a simple text file where you put the configuration of the container.
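A minimal sketch of such a Dockerfile, assuming a Node.js app (base image, file names and port are examples):

```dockerfile
# Dockerfile - build a small Node.js application container
FROM node:8
WORKDIR /app
# Install dependencies first so this layer is cached between builds
COPY package.json ./
RUN npm install
# Copy the rest of the application code
COPY . .
EXPOSE 8080
CMD ["node", "server.js"]
```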
GCP Console: Container Engine, wait for the cluster to complete with a green check mark -> access Google Cloud Shell with the button at the top right (the shell can be faster than authenticating an installed client because you are already logged in through your Google Console account) -> clone the sample code and paste it into the shell -> get gcloud credentials for the cluster -> use the docker command to build the application and bring down the image -> push the image into your cluster with a 'gcloud docker push' command (by pushing the image up, application virtualization is happening) -> now that the image is in Kubernetes, use the kubectl command to run the app from the image -> when you run the kubectl command to expose the deployed image, GCP assigns an external IP address to the container node, so you can look at the service -> you can now scale your containerized application running on the web -> use kubectl commands to scale up and verify the scaling works.
Kubernetes is a cluster management service.
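The run/expose/scale cycle described above can be sketched with gcloud and kubectl (cluster, project and image names are examples; the `kubectl run` form shown matches older kubectl releases that created a deployment):

```shell
# Fetch cluster credentials so kubectl can talk to the cluster
gcloud container clusters get-credentials my-cluster --zone us-central1-a

# Run the pushed image, expose it behind a load balancer, then scale it
kubectl run my-app --image=gcr.io/my-project/my-app:v1 --port=8080
kubectl expose deployment my-app --type=LoadBalancer --port=80 --target-port=8080
kubectl scale deployment my-app --replicas=4

# The EXTERNAL-IP column shows the address assigned to the service
kubectl get services
```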
Create Container Cluster -> set the zone to US and fill out the information -> 'Size' is the number of container nodes you are creating -> Click Create -> Continue on the right panel -> left panel Tools: Development -> Continue -> you will see the control files -> click on the redis-master-controller file -> go back to the previous page (the guestbook directory) -> click the redis-master-service file -> Continue -> *wait for the cluster creation to finish -> left panel: Compute: Container Engine -> once your cluster is available, click Activate Google Cloud Shell in the top right -> the shell opens; copy the code in the top right -> paste it into the shell; this is a gcloud command that clones the guestbook source code from the git repo so you can view it -> copy the 'get-credentials' code to get gcloud and Kubernetes set up in the cloud -> paste into the shell -> deploy the redis-master yaml file by copying the redis-master-service code -> paste into the shell -> create a replication controller for the Redis nodes in Kubernetes -> list all the pods/application nodes by copying the 'get pods' code -> copy and paste 'get rc' to see the replication controllers -> copy 'get services' and paste; you will see multiple services: kubernetes and redis-master -> deploy the Redis read replicas by pasting the code for the Redis workers/slaves -> inspect your cluster with the Kubernetes controller: paste 'get pods' -> paste 'get rc' -> paste 'get services' -> deploy the Service and Replication Controller for the guestbook front end, a PHP server that talks to the Redis master or its read replicas (it is basically a website); paste 'create -f all-in-one' into the shell (this uses a yaml configuration file) -> find the external IP by pasting the code -> copy the External IP Address shown in the shell -> paste the copied IP into a browser -> in the left panel, Compute: Container Engine will show
your new deployment, with multiple containers talking to each other -> click on the cluster you newly created to find node pools, machine types and instance groups -> the left panel Container Registry is where you would create your own containers, for advanced engineers -> go back to Container Engine to delete the cluster: click the checkbox of the container cluster you made -> click the delete control in the top right corner, then click the Delete button to fully remove the cluster and its resources.
This was application virtualization- clusters that are subsets of resource usage on the VMs, run by the container orchestration system (here, Kubernetes).
App Engine was the first PaaS offering application virtualization; it is a Google proprietary way to implement containers, competing with the open-source approach, Docker -> App Engine has a beta option that lets you supply your own Docker container to App Engine (rather than using one from Google, as Docker seems to be the leading choice) -> go with Container Engine for larger projects, since running Docker containers within App Engine is still in beta at this time. Google App Engine is a crucial component in understanding how Google Cloud works.
Use App Engine
Using App Engine- Compute: App Engine -> select your first app -> select a programming language -> click the location where you want the server to be -> click Continue on the right tab -> First Project: Continue -> left panel, Tools: Development, Continue -> click the YAML file (app.yaml), which is your configuration file -> click Continue in the right tab to start your programming language -> go back to the root by clicking '/' -> open Google Cloud Shell -> to begin creating the application, clone the code into the shell (similar to the container process) -> test your app using the specific packages for your programming language -> your app will then be running -> access the app by clicking the Web preview button and selecting port 8080 -> deploy the app with the Google Cloud Shell command 'gcloud app deploy' -> wait for it to finish updating -> go back to Compute: App Engine; once your app is ready you can access it through the provided URL. App Engine is great for simple websites.
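The app.yaml mentioned in the steps above is a short configuration file. A minimal sketch for the Python 2.7 standard environment of that era (the handler module name is an example):

```yaml
# app.yaml - minimal App Engine standard environment configuration
runtime: python27
api_version: 1
threadsafe: true

handlers:
# Route every URL to the WSGI app object in main.py
- url: /.*
  script: main.app
```

From the app directory you would then run `gcloud app deploy` and open the site with `gcloud app browse`.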
Cloud Functions are a type of virtualized computing that slices application virtualization narrower to make it cheaper. Functions (comparable to AWS Lambda) simply let you provide code to run a method or function. The cloud vendor itself, in this case Google, manages the spin-up, spin-down and scaling of the containers around those function calls for you. Functions abstract away the virtual machine, the OS it runs on, and the containers.
This is similar to functional programming on the cloud. Functions are cheaper than virtualizing containers or virtual machines. Functional programming is a different approach compared to object oriented programming.
GCS- Google Cloud Storage, where all files are stored; the basis for working with file-based information, accessed and used by compute resources
GCE- Google Compute Engine, virtual machines
GKE- Google Container Engine, where the containers are held
GAE- Google App Engine, Google's own proprietary container version for application virtualization
The App Engine flexible environment works with Docker and Kubernetes containers
Google Cloud Functions is their version of serverless, comparable to AWS Lambda
Google Cloud Storage/GCS- uses buckets, which hold folders and files associated with a specific project; information can be stored in several storage classes
Create a bucket- Storage: storage -> Create Bucket -> name bucket and then select storage class and location -> upload files
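The same bucket workflow can be done with the gsutil tool that ships with the SDK. A sketch (bucket names are globally unique, so this name is an example):

```shell
# Make a bucket in a chosen location
gsutil mb -l us-central1 gs://my-example-bucket-12345/

# Upload a file, list the bucket contents, then copy the file back down
gsutil cp report.csv gs://my-example-bucket-12345/
gsutil ls gs://my-example-bucket-12345/
gsutil cp gs://my-example-bucket-12345/report.csv ./report-copy.csv
```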
Use Google Cloud Storage JSON API
JSON API Explorer is used to program against the cloud storage API
Go to API Manager in left panel -> click on Google Cloud Storage API -> click try this API in the API Explorer, which is a client calling against the Google Cloud storage for a particular project -> authorize all the scopes(OAuth 2.0 scopes) for the API Explorer for practice, in production, enable the smallest possible scope needed -> API Explorer is used to quickly look at the API for a Google Cloud Service -> *you can use API Explorer when you are preparing to program against the Google Cloud, writing a website, writing to a client, writing to a big data pipeline -> to go back into a project, select Home using left panel and find the Project widget
Google Cloud SQL is a managed MySQL offering; operations such as backup, updating and patching are included in the price. Google Cloud SQL is lighter weight and focuses on storing and accessing the data.
Using Google Cloud SQL- left panel, Storage: SQL -> Create Instance -> Choose Cloud SQL Second Generation unless using legacy system -> fill out information as needed for the instance -> *you will not have root access to managed services offered by Google Cloud compared to installing a MySQL database on a Google managed VM -> select the created instance to edit, import, export, restart, stop or delete it, you can also connect with the Google Cloud Shell within the instance
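Instance creation and connection can also be scripted with gcloud. A sketch (instance name, tier and region are examples; `gcloud sql connect` opens a MySQL session from your shell):

```shell
# Create a Second Generation MySQL instance
gcloud sql instances create my-sql-instance \
    --tier db-n1-standard-1 --region us-central1

# Connect as root; gcloud temporarily whitelists your IP for the session
gcloud sql connect my-sql-instance --user root
```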
Google Cloud Datastore is a NoSQL document database that provides autoscaling, uses typed entities with associated properties, and lets you query and index with Google Query Language (GQL), which is somewhat similar to SQL. Transactions have ACID properties, and you can adjust the transactional consistency.
For Cloud Datastore, data objects are called entities.
Go to Storage: Datastore -> go to entities page -> create entity -> fill out Namespace, Kind, Key Identifier and Properties -> click Create -> click on created entity to see details -> select Query by GQL to query using GQL (which is similar to SQL)
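A couple of GQL queries you might type into the Query by GQL tab; the `Task` kind and its properties are example names:

```sql
-- Recent unfinished tasks, newest first
SELECT * FROM Task WHERE done = false ORDER BY created DESC LIMIT 10

-- Look up a single entity by its key
SELECT * FROM Task WHERE __key__ = KEY(Task, 'sampletask')
```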
Bigtable is NoSQL storage designed to support wide-column databases, as opposed to Google Cloud Datastore for document-style databases. Wide-column databases have a key and any number of values. Bigtable is designed for bigger volumes of information, such as IoT scenarios or logs.
Bigtable supports Apache HBase and does not have any relationship to SQL. It is instead a storage mechanism for big, wide logging tables.
Storage:Bigtable -> create instance -> fill out instance information -> *you can have up to 30 nodes per instance -> click Create -> click Activate Google Cloud Shell -> unzip files with code in shell -> you can create tables within Bigtable using Google Cloud Shell -> *use Bash scripting within Cloud Shell as needed
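Table creation from Cloud Shell typically uses the cbt tool. A sketch (project, instance, table, column family and row names are all examples):

```shell
# Point cbt at your project and Bigtable instance
echo "project = my-project" >> ~/.cbtrc
echo "instance = my-bigtable" >> ~/.cbtrc

# Create a table with one column family, write a cell, then read it back
cbt createtable my-table
cbt createfamily my-table stats
cbt set my-table device001 stats:temp=72
cbt read my-table
```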
BigQuery- fully managed query-as-a-service using SQL. You upload your data, use SQL commands and get results. Using Google BigQuery is an important skill when learning how Google Cloud works.
BigQuery can run ad hoc queries, where you pay per query, and it can also function as a data warehouse. The AWS counterparts are Amazon Redshift, a relational data warehouse, and Amazon Athena, which runs SQL queries against S3-based sets of data called data lakes.
Big Data: BigQuery -> log in to your Google BigQuery account -> *queries are SQL queries and jobs are data loads -> you can use public datasets with tutorials for practice -> click Compose Query -> use SQL commands in the New Query console -> Run Query -> *you can save the query results as CSV, JSON, a table, or to Google Sheets -> click Show Options to see information about running queries -> you can load data into a table by using a job -> save and unzip the file you will be loading into BigQuery -> select Create new dataset
-> fill out the dataset information -> click the dropdown on the newly created dataset -> create a table by filling out the information and choosing the file to load -> Create Table -> once the table is loaded you can preview it using the dataset dropdown
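The same dataset/load/query flow is available from the bq command-line tool in the SDK. A sketch (dataset, table, file and schema are examples):

```shell
# Create a dataset, then load a CSV into a new table with an inline schema
bq mk my_dataset
bq load --source_format=CSV my_dataset.visits ./visits.csv name:string,count:integer

# Run an ad hoc query against the loaded table
bq query "SELECT name, SUM(count) AS total FROM my_dataset.visits GROUP BY name"
```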
Compute: virtual machines and containers
Storage: files, databases, pipelines
Data Pipeline Services
Cloud Pub/Sub Messaging- reliable, asynchronous topic based message service, many to many
Big Data: Pub/Sub -> create a new project -> activate Google Cloud Shell -> run gcloud alpha pubsub to create a topic -> create a subscription -> run the gcloud commands 'topics list' and 'subscriptions list' to verify what you created
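In later SDK releases the Pub/Sub commands graduated out of alpha; a minimal end-to-end sketch with current syntax (topic and subscription names are examples):

```shell
# Create a topic and a subscription attached to it
gcloud pubsub topics create my-topic
gcloud pubsub subscriptions create my-sub --topic my-topic

# Publish a message, then pull it from the subscription
gcloud pubsub topics publish my-topic --message "hello"
gcloud pubsub subscriptions pull my-sub --auto-ack

# List what exists
gcloud pubsub topics list
gcloud pubsub subscriptions list
```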
Cloud Dataproc- managed, scalable Hadoop clusters; think of it as the dump truck of data processing: low level, at the beginning of the pipeline. You can run Hadoop, Spark, Hive or Pig jobs on your Dataproc cluster.
Big Data: Dataproc -> create a cluster by filling out the information -> the Hadoop libraries are set up on the virtual machines that were created; once the services are going you have a cluster with two nodes -> run a job from the left panel Jobs -> select the job -> add the path to the JAR file -> Submit -> click line wrapping -> you should see the result of the selected job
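The cluster-plus-job workflow can be scripted too. A sketch using the SparkPi example that ships with the cluster image (cluster name and zone are examples):

```shell
# Create a two-worker cluster
gcloud dataproc clusters create my-cluster \
    --zone us-central1-a --num-workers 2

# Submit the SparkPi example job bundled with the Spark install
gcloud dataproc jobs submit spark --cluster my-cluster \
    --class org.apache.spark.examples.SparkPi \
    --jars file:///usr/lib/spark/examples/jars/spark-examples.jar -- 1000

# Tear the cluster down when done to stop paying for it
gcloud dataproc clusters delete my-cluster
```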
Cloud Dataflow- implements Apache Beam, which is big ETL. It integrates with Cloud Storage, Cloud Pub/Sub, Cloud Datastore, Cloud Bigtable and BigQuery; it creates and runs data processing pipelines with managed scalability and monitoring. Access it via Big Data: Dataflow. You are putting pieces together to build a data pipeline.
Google Genomics- genomic variant processing at scale, used for bioinformatics, genomic sequencing, personalized medicine and can be used in pipelines with BigQuery. It is a data processing service for the Healthcare vertical
Big Data: genomics -> Activate Google Cloud Shell -> use source data or provide your own data -> *used for bioinformaticians
Machine learning- the productization of predictive analytics, data mining and statistical algorithms. Cloud ML integrates with TensorFlow.
Cloud Datalab- visualization of machine learning
You can make your own machine learning models with Cloud Machine Learning or use a specific service such as the Vision API. Amazon Machine Learning is the competitor to Google's machine learning offerings.
Using Cloud Vision API- provides classification and image labeling
Try an interactive tutorial: Introduction to Cloud Vision API
Using Cloud Datalab- Jupyter/iPython style notebook used for data science, it integrates with services including BigQuery, Google Cloud Storage and Google Compute Engine.
Google Cloud networking services- basic networking when working with Google Cloud Engine virtual machine instances; advanced networking is used with groups of services in production
Compute: Compute Engine -> Create Instance -> click Management, disk, networking, SSH keys -> allow HTTP or HTTPS -> change the subnet if needed -> click Compute: Networking to adjust all networking settings; for production, use the default network only if the default settings are acceptable; create a new network if you are going to use non-default settings -> you can edit firewalls and routes, which control how traffic is routed between instances and servers.
The default internet gateway lets external clients interact with the virtual machines you created. -> To create a network: Create Network -> fill out the information needed
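Creating a non-default network, a subnet and a firewall rule can be sketched with gcloud (all names and the IP range are examples):

```shell
# Create a custom-mode network with one subnet
gcloud compute networks create my-network --subnet-mode custom
gcloud compute networks subnets create my-subnet \
    --network my-network --region us-central1 --range 10.0.0.0/24

# Allow inbound HTTP traffic to instances on the new network
gcloud compute firewall-rules create allow-http \
    --network my-network --allow tcp:80
```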
Dashboard will show usage of resources for management and monitoring into the Google Cloud Platform
Stackdriver provides monitoring, debugging, tracing, logging and error reporting. You can monitor both Google Cloud and AWS projects with Stackdriver; you get reports for the cloud project, and there are many features you can report on.
Understand source code repositories
Enables private Git hosting on Google Cloud. Tools: Development -> Development: Repositories -> you can also add plugins to your development by going to Development: tools and plugins
Use reference architectures
Using Google Cloud services for buildable architectures for projects.
Google Cloud Icons
Disaster Recovery and Backup
Minimum viable product for implementing disaster recovery and backup
Website and API Hosting
GCE and load balancing, GKE/GAE and container management, Cloud DNS, Cloud SQL or other database
*used for IoT applications as example or to have an API service
Used for providing content closer to certain locations in the world, if needed.
Web Application on Google App Engine
*You would choose Google App Engine over Google Kubernetes Engine if you want autoscaling.
*Cloud SQL was meant to replace traditional on-site relational databases and OLTP databases.
Cloud Storage, BigQuery, Cloud Dataflow or a third-party ETL tool such as Talend; Data Studio or a third-party business intelligence/data visualization tool such as Tableau
You can use third party vendors such as Tableau or BIME for data visualization with BigQuery.
Used for IoT devices or other increasing volumes of source data.
Streaming is N at a time, whether one record, five, or a window at a time, while batch is a larger group sent over a period of time, such as thousands of records sent every one, four or 24 hours. Streaming can be used for real time.
Internet of Things
Cloud Storage, Cloud Pub/Sub, Cloud Bigtable, Cloud Dataflow, BigQuery
MQTT is one type of protocol for IoT devices.
Launch Checklist for Google Cloud Platform
GCP Launch Checklist