What to remember if you decide to ingest logs using a logging agent in Google Cloud

minherz
Jun 7, 2022

Google Cloud provides a set of useful logging agents that run on almost all Google Cloud resources (a.k.a. services). On virtual machines, you can install one of two logging agents. The first is the recently released Ops Agent, which includes both logging and monitoring capabilities but can be deployed only on resources hosted on Google Cloud. The second is the legacy Logging agent, which provides log-capturing capabilities only but can be configured to run virtually anywhere, including bare-metal workstations and servers as well as virtual machines hosted on other clouds and services. Google Cloud serverless resources (such as Cloud Build or Cloud Functions) also implicitly run logging agents.

What is great about the logging agents is that you do not need to think (too much) about log ingestion strategies. Just follow the eleventh principle of the Twelve-Factor App methodology (a.k.a. microservice architecture) and print your log payload to stdout or stderr. The agent will take care of the rest: it will capture the payload, add metadata about the resource where the log was captured, and ingest it into Cloud Logging. Since the agent runs externally, it does not consume the application’s resources and ingests logs asynchronously.
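In Python, for example, this can be as simple as the following sketch (the messages and fields are made up for illustration):

import json
import sys

# Each line printed to stdout is captured by the agent as one log entry.
print("application started")

# Errors can go to stderr the same way.
print("something went wrong", file=sys.stderr)

# A structured payload should be serialized as a single JSON line
# (more on structured payloads below).
print(json.dumps({"message": "request handled", "status": 200}))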

However, there are a few things that are important to know about the logging agents in Google Cloud.

Log location

Cloud Logging uses the term logName, which is composed of two components:

  • The location (e.g. a project) where all log entries ingested into this log will be stored
  • The identifier of the log

When you ingest logs directly into Cloud Logging using, for example, the Google client libraries, you have control over the logName (for example, projects/my-project/logs/my-log). However, when you use the logging agent, you don’t. In practice, this means that all logs ingested by the logging agent go into the project that hosts the resource on which the agent is running. To decompose this long sentence: if your application and the logging agent run on a VM, the logs will be ingested into the project where the VM was created. The same goes for a GKE cluster or any serverless resource. You can still forward the ingested logs to a desired destination, but you will need to use the Log Router to do that.
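For comparison, here is a minimal sketch of direct ingestion with the Python client library, where you control the logName yourself (the project and log identifiers are hypothetical):

from google.cloud import logging

# The resulting logName is projects/my-project/logs/my-custom-log.
client = logging.Client(project="my-project")
logger = client.logger("my-custom-log")
logger.log_text("this payload bypasses the agent")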

Resource metadata

The logging agents do a wonderful job of capturing metadata about the resource that runs your application and instrumenting each log entry they ingest with this metadata. The agents do it implicitly: you cannot add to or update the metadata, nor can you prevent the agents from attaching it. The logging agents do not generate any metadata beyond that. You will need to leverage structured logging to ingest additional information with your log’s payload.
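For illustration, an entry ingested from a Compute Engine VM carries a resource block along these lines (an excerpt with made-up label values):

{
. . .
resource: {
type: "gce_instance",
labels: {
instance_id: "1234567890",
project_id: "my-project",
zone: "us-central1-a"
}
},
. . .}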

Structured payload and additional metadata

The logging agents capture the payload as a one-line string. This means that if you print a multi-line log to stdout or stderr, it will be ingested as multiple log entries. You can either format the payload as a one-line string with special characters or store it as a JSON structure. To support the latter, the logging agents can parse a JSON-formatted one-line string. The format of the JSON can be found in the public documentation.
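A minimal sketch of the difference in Python (the payload text is made up):

import json

# Printed as two lines, this is ingested as TWO separate log entries:
print("first line\nsecond line")

# Serialized as a single JSON line, it stays ONE log entry,
# with the line break preserved inside the payload:
print(json.dumps({"message": "first line\nsecond line"}))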

There is one caveat that is important to remember when using this format. The logging agent interprets any unknown field as part of the JSON payload of the log entry. This means that you do not have to place all of your payload under the message field. For example, printing the following to stdout:

{"message":"text of payload","latency":"12s","workflow":"main-1"}

will result in a log entry with the following jsonPayload field:

{
. . .
jsonPayload: {
latency: "12s",
message: "text of payload",
workflow: "main-1"
},
. . .}

There is a list of reserved fields that the agents parse to populate the metadata fields of the log entry. Some metadata can be posted using different field names. For example, the timestamp can be posted using three different forms: time, timestamp, or a pair of timestampSeconds and timestampNanos. Mind that if you use several of these fields simultaneously, the parsing behavior is undefined.
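For example, printing the following one-liner (the values are made up) populates the severity and timestamp metadata fields of the log entry, and only message remains in jsonPayload:

{"message":"text of payload","severity":"ERROR","time":"2022-06-07T10:00:00.000Z"}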

Performance gains

Using the logging agents for log ingestion differs from the sidecar strategy; the agents also do not share process space with your application. In serverless environments such as Cloud Run or Cloud Functions, the agent does not consume the CPU or memory of the resource and can keep operating after the resource is disposed. However, in IaaS environments such as GKE, GCE VMs, or App Engine, the agent is installed on the resource and consumes its CPU and memory.

There is one more thing to consider about using one of the logging agents with your application — the agent’s throughput. Behind the scenes, the agent captures output printed to stdout or stderr by tailing the relevant *nix system file, container log file, or Windows pipe and processing each line it reads. In environments that run multiple processes in parallel, such as VMs or GKE worker nodes, the volume of logs can be large enough for the agent to miss some of them. For example, container runtimes are configured to roll the log files of containers once they reach a certain size. The agents do not know how to track the rolled file and continue tailing the “primary” log file of the container. If the log file is rolled before the agent has finished reading all lines, the unread log lines will not be ingested. I would recommend stress testing your application, at least in non-serverless environments, if it is log intensive.
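A minimal stress-test sketch in Python, assuming a hypothetical burst size (tune it to your application’s peak log rate): print a burst of numbered one-line entries, then compare how many of them actually arrived in Cloud Logging.

import json
import time

BURST_SIZE = 100_000  # hypothetical value; adjust to your peak log rate

start = time.time()
for seq in range(BURST_SIZE):
    # A sequence number makes gaps easy to detect in Cloud Logging later.
    print(json.dumps({"message": "stress test", "seq": seq}))
print(json.dumps({"message": "burst done", "count": BURST_SIZE,
                  "seconds": time.time() - start}))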


minherz

DevRel Engineer at Google Cloud. The opinions posted here are my own, and not those of my company.