
This solution shows how to import logs that were previously exported to Cloud Storage back to Cloud Logging. It is useful in scenarios where additional analysis of the exported logs is required. Organizations often need to analyze previously exported logs as part of incident investigations or other audits of past events. For example, you might want to analyze connections to your databases for the first quarter of last year as part of a database access audit. You can continue reading this post or look at the reference guide in the Google Cloud Architecture Center.

The step-by-step instructions explain how to configure and run the log importing job. The following sections examine a few limitations that might be important in your scenario. This solution assumes that the logs you intend to import from a Cloud Storage bucket are organized in the export format.

Architecture

The following diagram shows the flow of the Google Cloud services that are used in the solution.

  • The solution uses a Cloud Run job to import logs.
  • The job reads the objects that store log entries from Cloud Storage.
  • It converts the objects into Cloud Logging API LogEntry structures.
  • It writes the converted log entries to Cloud Logging.
  • Afterward, users can use BigQuery SQL to run analytical queries on the imported logs.

The provided implementation finds the exported logs for the specified log ID and time range, based on how the exported logs are organized in the Cloud Storage bucket. It aggregates multiple LogEntry structures into batches to reduce Cloud Logging API quota consumption, and it handles quota errors when necessary. The implementation is designed to run as a Cloud Run job. The following sections describe how to create and run an import job step by step.
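As a rough illustration of this flow, the following Python sketch reads newline-delimited JSON objects of exported log entries from a Cloud Storage bucket and writes them back to Cloud Logging in batches. This is a minimal sketch, not the reference implementation; the bucket name, object prefix, and log ID are placeholders.

```python
import json

from google.cloud import logging as cloud_logging
from google.cloud import storage

# Placeholders -- replace with your own values.
BUCKET_NAME = "my-exported-logs-bucket"
OBJECT_PREFIX = "cloudaudit.googleapis.com/activity/2023/09/"
DESTINATION_LOG_ID = "imported_logs"

storage_client = storage.Client()
logging_client = cloud_logging.Client()
logger = logging_client.logger(DESTINATION_LOG_ID)

# Exported log objects are newline-delimited JSON, one log entry per line.
for blob in storage_client.list_blobs(BUCKET_NAME, prefix=OBJECT_PREFIX):
    batch = logger.batch()
    for line in blob.download_as_text().splitlines():
        exported_entry = json.loads(line)
        # For brevity the whole exported record is written as the payload;
        # a real implementation maps the exported fields (timestamp,
        # severity, payload, and so on) onto LogEntry fields.
        batch.log_struct(exported_entry, labels={"source_object": blob.name})
    # One write request per object keeps API quota consumption low.
    batch.commit()
```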

Set up the logging import

Select destination project

The import process writes logs to Cloud Logging as described in the Logging routing and storage overview. You must have a destination project where the logs will be stored. We recommend that you create a designated project for the imported logs. If you decide to use an existing project, review the filters on all the sinks in the project, including the default _Required and _Default sinks, to ensure that the imported logs won't be routed to unwanted destinations.

Best practice: Create a new project for importing logs. Make sure that the project does not have log sinks inherited from organizations or folders.

Upgrade log bucket to use Log Analytics

The imported logs are stored, by default, in the _Default log bucket of the selected destination project. While you can run queries on the imported logs using Logging Query Language, Log Analytics provides even more flexibility by allowing you to use SQL queries. To use Log Analytics, you must first enable the use of Log Analytics on the log bucket.

Create import job

Create a new Cloud Run job. Use the ready-to-use container image of the import logs implementation, located at us-docker.pkg.dev/cloud-devrel-public-resources/samples/import-logs-solution. See the instructions on GitHub for how to build your own image from the reference implementation of the solution. Use the following values for the other configuration parameters of the new job:

  • Maximum number of retries per failed task: 0 (no retries)
  • Number of CPUs: 2
  • Memory: 2Gi
  • Task timeout: 60m
  • Number of tasks: 1

Configure the following environment variables for the new job:

  • START_DATE: a string in the format MM/DD/YYYY that defines the beginning of the date range. Logs with timestamps older than this date are not imported.
  • END_DATE: a string in the format MM/DD/YYYY that defines the end of the date range. Logs with timestamps newer than this date are not imported.
  • LOG_ID: the alphanumeric logging identifier, also known as the log ID. The log ID is part of the logName field of the LogEntry structure that Cloud Logging uses to store logs. For example: cloudaudit.googleapis.com
  • STORAGE_BUCKET_NAME: the name of the Cloud Storage bucket where the logs are currently stored (without the gs:// prefix).
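If you prefer to script this step instead of using the console, a sketch with the google-cloud-run Python client could look like the following. The project, region, job name, and environment variable values are placeholders to adapt.

```python
from google.cloud import run_v2
from google.protobuf import duration_pb2

PROJECT = "my-destination-project"  # placeholder: destination project ID
REGION = "us-central1"              # placeholder: any Cloud Run region

client = run_v2.JobsClient()

job = run_v2.Job(
    template=run_v2.ExecutionTemplate(
        task_count=1,
        template=run_v2.TaskTemplate(
            max_retries=0,                                    # no retries
            timeout=duration_pb2.Duration(seconds=3600),      # 60m task timeout
            containers=[
                run_v2.Container(
                    image=(
                        "us-docker.pkg.dev/cloud-devrel-public-resources/"
                        "samples/import-logs-solution"
                    ),
                    resources=run_v2.ResourceRequirements(
                        limits={"cpu": "2", "memory": "2Gi"}
                    ),
                    env=[
                        # Placeholder dates; keep the range within the last
                        # 29 days unless you modify the implementation
                        # (see Limitations).
                        run_v2.EnvVar(name="START_DATE", value="09/01/2023"),
                        run_v2.EnvVar(name="END_DATE", value="09/26/2023"),
                        run_v2.EnvVar(name="LOG_ID", value="cloudaudit.googleapis.com"),
                        run_v2.EnvVar(name="STORAGE_BUCKET_NAME", value="my-exported-logs-bucket"),
                    ],
                )
            ],
        ),
    )
)

operation = client.create_job(
    parent=f"projects/{PROJECT}/locations/{REGION}",
    job=job,
    job_id="import-logs",
)
operation.result()  # Wait until the job is created.
```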

For information about the implications of using different values for CPU, memory, and number of tasks, see Limitations and Costs. If you are importing a large volume of logs from Cloud Storage, then consider increasing the number of tasks and the task timeout. Before using a timeout value of more than one hour, see Using task timeouts greater than one hour.

Create a service account to run your Cloud Run job

Configure a dedicated service account for your Cloud Run job. Grant the service account the following roles:

  • On the Cloud Storage bucket that contains the logs, grant at least the Storage Object Viewer role.
  • On the _Default log bucket or on the destination project, grant at least the Logs Writer role.
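As a sketch of the bucket-level grant with the Python storage client (the bucket name and service account email are placeholders), the Logs Writer grant on the project is typically done with gcloud or the Resource Manager API:

```python
from google.cloud import storage

BUCKET_NAME = "my-exported-logs-bucket"  # placeholder
MEMBER = "serviceAccount:log-importer@my-project.iam.gserviceaccount.com"  # placeholder

client = storage.Client()
bucket = client.bucket(BUCKET_NAME)

# Grant Storage Object Viewer on the source bucket to the job's service account.
policy = bucket.get_iam_policy(requested_policy_version=3)
policy.version = 3
policy.bindings.append(
    {"role": "roles/storage.objectViewer", "members": {MEMBER}}
)
bucket.set_iam_policy(policy)
```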

Run the import job

After you have created the Cloud Run job, you can execute it. While the job runs, you can track its execution details. You can also stop the job at any time; note that stopping a job while it is executing deletes the job. Be aware of possible log duplication if you decide to re-run the job (see Log duplications below).
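You can also trigger an execution programmatically. A minimal sketch with the same Python client, where the project, region, and job name are placeholders:

```python
from google.cloud import run_v2

client = run_v2.JobsClient()

# Fully qualified job name; placeholders for project and region.
job_name = "projects/my-destination-project/locations/us-central1/jobs/import-logs"

operation = client.run_job(name=job_name)
execution = operation.result()  # Blocks until the execution finishes.
print(f"Execution finished: {execution.name}")
```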

Limitations

Logs you import into Cloud Logging are subject to the Logging limits and quotas. The following sections describe limits and quotas you might encounter when importing logs.

30-day retention period and imported logs

Cloud Logging requires incoming log entries to have timestamps that don't exceed the 30-day retention period. Imported log entries with timestamps older than 30 days before the import time are not stored. The implementation validates the date range set in the Cloud Run job configuration to avoid importing logs that are older than 29 days, leaving a one-day safety margin.

To import logs older than 29 days, you can modify the implementation as follows:

  • Remove the 30-day validation of the date range.
  • Add the original timestamp as a user label to the log entry.
  • Reset the timestamp of the log entry so that it is ingested with the current time.

You will have to use the label instead of the timestamp in your Log Analytics queries.
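A minimal sketch of what the modified write could look like, assuming the original timestamp is stored in a user label named original_timestamp (the label key is an assumption, not part of the reference implementation):

```python
from google.cloud import logging as cloud_logging

client = cloud_logging.Client()
logger = client.logger("imported_logs")

def write_old_entry(payload: dict, original_timestamp: str) -> None:
    """Write a log entry that is older than the 30-day retention window.

    The original timestamp is preserved in a user label; the entry itself is
    ingested with the current time because no timestamp is supplied.
    """
    logger.log_struct(
        payload,
        labels={"original_timestamp": original_timestamp},
    )

write_old_entry({"message": "db connection opened"}, "2023-01-15T10:32:00Z")
```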

Log-based metrics calculation

The imported logs don’t affect any log-based metrics because their timestamps are in the past.

Note: If you use a label to import old logs, the timestamp on the imported log entries records the import time. Such entries are counted by the associated log-based metrics, even though the data in them is old.

Logging API quotas and limits

The Cloud Logging API has a default quota of 120,000 write requests per minute and a 10-MB limit on the size of each write request. The implementation is designed to work within the default write quota. If your write quota is lower than the default, the implementation might fail. The implementation has a very basic retry mechanism that might not suit your needs.
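If you need something more robust than the basic retry, a sketch of exponential backoff around the batch commit could look like the following, using the standard exception that the client raises when a quota is exhausted. The batch size, attempt count, and delays are assumptions.

```python
import time

from google.api_core.exceptions import ResourceExhausted
from google.cloud import logging as cloud_logging

client = cloud_logging.Client()
logger = client.logger("imported_logs")

def commit_with_backoff(entries: list, max_attempts: int = 5) -> None:
    """Write a batch of structured entries, backing off on quota errors."""
    batch = logger.batch()
    for entry in entries:
        batch.log_struct(entry)
    delay = 1.0
    for attempt in range(max_attempts):
        try:
            batch.commit()
            return
        except ResourceExhausted:
            if attempt == max_attempts - 1:
                raise
            time.sleep(delay)  # Exponential backoff before retrying the commit.
            delay *= 2
```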

Performance

If you are importing a large volume of logs, then you might need to increase a number of tasks and CPU/memory in your job configuration. Increasing these configuration values might increase your incurred Cloud Run costs.

Log duplications

If you re-run the same job multiple times, then the already imported logs will be stored again, creating duplicate entries. You do not see duplicates when querying the imported logs, because Cloud Logging considers log entries from the same project, with the same insertion ID and timestamp, to be duplicates, and removes them from the query results. For more information, see the insert_id field in the Cloud Logging API reference. However, duplicated log entries might result in increased storage cost.
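One way to mitigate the effect of duplicates on query results is to derive a deterministic insert_id from the source object and line number, so that re-importing the same entry carries the same insertion ID. The duplicates are still stored (and billed), but they are removed from query results. This is a sketch, not part of the reference implementation; the object name and line number are placeholders.

```python
import hashlib

from google.cloud import logging as cloud_logging

client = cloud_logging.Client()
logger = client.logger("imported_logs")

def deterministic_insert_id(object_name: str, line_number: int) -> str:
    """Derive a stable insertion ID from the entry's source location."""
    return hashlib.sha256(f"{object_name}:{line_number}".encode()).hexdigest()

# Re-running the import produces the same insert_id for the same source line.
logger.log_struct(
    {"message": "example entry"},
    insert_id=deterministic_insert_id("exported/2023/09/15/object.json", 42),
)
```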

Imported log names

The solution ingests logs to the destination project with the log ID “imported_logs”. So, the log name of each imported entry will be “projects/PROJECT_ID/logs/imported_logs”. The original log name is stored as the user label with the key “original_logName”. You must account for this renaming when you query the imported logs.
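For example, to read back only the imported entries that originally had a given log name, you can filter on that label. The following sketch uses the Python client; the project ID and the original log ID value are placeholders.

```python
from google.cloud import logging as cloud_logging

client = cloud_logging.Client(project="my-destination-project")  # placeholder

# Imported entries all use the "imported_logs" log ID; the original log name
# is preserved in the "original_logName" user label.
log_filter = (
    'logName="projects/my-destination-project/logs/imported_logs" '
    'AND labels.original_logName:"cloudaudit.googleapis.com"'
)

for entry in client.list_entries(filter_=log_filter):
    print(entry.timestamp, entry.payload)
```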

Costs

The cost of importing logs by using the solution described in this document has several components. The following sections describe how these costs are incurred.

Log storage

For information about Cloud Logging storage and retention costs, see Cloud Logging pricing. Additional storage costs can be incurred by log duplication and by unexpected routing that stores log entries in additional destinations.

Running the Cloud Run job

You can estimate the cost of running Cloud Run jobs by using Google Cloud Pricing Calculator. When you use the recommended configuration and import a small volume of logs, the import job might qualify for the free tier.

Use of the Cloud Storage API

The calls to the Cloud Storage API used in the implementation are divided into pricing classes based on what they do:

  • Calls to determine the objects that store log entries use “class A” pricing.
  • Calls to read the objects use “class B” pricing.

For more information, see Cloud Storage pricing.

The total cost depends on the type of Cloud Storage buckets used and the total number of stored logs.

Additional Costs

Implicit costs can be incurred if the destination Google Cloud project has log sinks that route imported logs to a user-defined log bucket or other destinations. The implementation reads logs from the Cloud Storage bucket using the Cloud Storage API. If the bucket is located in a different region from where the import job is created, you can incur egress costs for reading data across regions.

Importing logs to a user-defined log bucket

All incoming logs pass through routing and storage. Imported logs can be routed to multiple supported destinations by the log sinks that are defined in the destination project. You can route imported logs to a custom log bucket by defining a new route. After you have selected the destination project, do the following:

  • Create a log bucket
  • Create a log sink to route logs to the new bucket

The subsequent sections describe these steps in more detail.

Create a log bucket

Follow the instructions for creating a log bucket to create one in the destination project. If you want to use SQL to analyze your logs, upgrade the bucket to use Log Analytics.
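If you prefer to script this step, a sketch with the Logging configuration client could look like the following; the project ID, bucket ID, and retention period are assumptions. Upgrading the bucket to use Log Analytics can then be done in the console as described above.

```python
from google.cloud.logging_v2.services.config_service_v2 import ConfigServiceV2Client
from google.cloud.logging_v2.types import LogBucket

PROJECT = "my-destination-project"  # placeholder
BUCKET_ID = "imported-logs-bucket"  # placeholder

config_client = ConfigServiceV2Client()

# Create a user-defined log bucket in the global location.
bucket = config_client.create_bucket(
    request={
        "parent": f"projects/{PROJECT}/locations/global",
        "bucket_id": BUCKET_ID,
        "bucket": LogBucket(
            description="Bucket for logs imported from Cloud Storage",
            retention_days=30,
        ),
    }
)
print(f"Created log bucket: {bucket.name}")
```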

Create a sink to move logs to the bucket

Follow the instructions to create a sink with the destination set to the log bucket you created. Use the following expression, which uses the log_id function, as the sink's filter:

log_id("LOG_ID") AND timestamp < "YYYY-MM-DD"

where LOG_ID is the same log identifier that you set in the LOG_ID environment variable, and YYYY-MM-DD is today's date.
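A sketch of creating such a sink with the Python client follows; the project ID, sink name, bucket ID, and cutoff date are placeholders, and the filter mirrors the expression above.

```python
from google.cloud import logging as cloud_logging

PROJECT = "my-destination-project"  # placeholder
BUCKET_ID = "imported-logs-bucket"  # placeholder: the log bucket created above

client = cloud_logging.Client(project=PROJECT)

sink = client.sink(
    "import-logs-sink",
    # Same value as the LOG_ID environment variable; the date is today's date.
    filter_='log_id("cloudaudit.googleapis.com") AND timestamp < "2023-09-27"',
    destination=(
        "logging.googleapis.com/projects/"
        f"{PROJECT}/locations/global/buckets/{BUCKET_ID}"
    ),
)
sink.create()
print(f"Created sink: {sink.name}")
```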

Best practice: Exclude the imported logs from being stored in the _Default log bucket or disable the _Default log sink.


Contributors

I want to express my appreciation to Mary Tabasko, a Technical Writer at Google Cloud, and to Summit Tuladhar, a Sr. Staff Software Engineer in Cloud Operations at Google Cloud, for their help in writing this post.
