November 3, 2022
Application logging improvements for Google App Engine Java 11 runtime
Author: Jinlin Yang, TerraTrue engineer
At TerraTrue, we run a number of backend services on the Google App Engine standard environment. Earlier this year, we migrated from the Java 8 runtime to the Java 11 runtime. As part of that migration, we had to overcome a challenge with the default log-based request traceability in the Java 11 runtime. This blog post first describes the problem, which didn’t exist with the Java 8 runtime, and then presents solutions. We have also open sourced one possible solution in case it helps others facing the same limitation with the Java 11 runtime on Google App Engine.
Problem
App Engine collects two types of logs:
- Request logs: App Engine automatically produces one request log for every request that hits an App Engine application. The request log contains various metadata about the request.
- App logs: an application can emit logs using one of the Java logging libraries, such as JUL or SLF4J. These are usually the only records of the application’s execution and are therefore crucial for developers when debugging.
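For instance, an application might emit app logs through either API; the class, method, and log messages below are just illustrative placeholders:

import java.util.logging.Logger;

import org.slf4j.LoggerFactory;

public class LoggingExamples {
  // JUL (java.util.logging) logger
  private static final Logger JUL_LOGGER = Logger.getLogger(LoggingExamples.class.getName());

  // SLF4J logger
  private static final org.slf4j.Logger SLF4J_LOGGER = LoggerFactory.getLogger(LoggingExamples.class);

  public void handleOrder(String orderId) {
    JUL_LOGGER.info("Processing order " + orderId);     // app log via JUL
    SLF4J_LOGGER.info("Processing order {}", orderId);  // app log via SLF4J
  }
}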
The Java 8 runtime automatically bundles app logs with the corresponding request log. In the Logs Explorer, you can easily perform the following operations, which are important for diagnosing problems and configuring log-based alert policies:
- Expand a request log and see the app logs nested under it.
- Perform full-text search that queries both the request log and the app logs.
- Filter the logs by the metadata fields and severity.
The Java 11 runtime, however, no longer bundles app logs with the corresponding request log. The app logs are emitted as separate records, and as a result, the ones from different requests intermingle with each other. This makes it very difficult to debug issues because we cannot isolate the app logs of a request.
The Trace Metadata Approach
This section describes an approach we tried out first and explains why it doesn’t satisfy our needs.
For every request, App Engine attaches the x-cloud-trace-context header containing a unique trace id, which is included in the request log. After some research online, we found that if we set the trace metadata field in the app logs to the request’s trace id, then the app logs are grouped together under the request log. For a Spring Boot application, the spring-cloud-gcp-starter-logging library can handle this automatically.
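In our case, spring-cloud-gcp-starter-logging did this wiring for us, but the underlying idea can be sketched directly with the Cloud Logging client library, as below. The header parsing, project id, log name, and monitored resource in this snippet are assumptions for illustration, not our production configuration.

import com.google.cloud.MonitoredResource;
import com.google.cloud.logging.LogEntry;
import com.google.cloud.logging.Logging;
import com.google.cloud.logging.LoggingOptions;
import com.google.cloud.logging.Payload.StringPayload;
import com.google.cloud.logging.Severity;

import java.util.Collections;

public class TraceCorrelatedLogger {
  private final Logging logging = LoggingOptions.getDefaultInstance().getService();

  // traceHeader is the raw x-cloud-trace-context value, e.g. "TRACE_ID/SPAN_ID;o=1".
  public void log(String projectId, String traceHeader, String message) {
    // The trace id is the portion of the header before the first "/".
    String traceId = traceHeader.split("/")[0];

    LogEntry entry =
        LogEntry.newBuilder(StringPayload.of(message))
            .setSeverity(Severity.INFO)
            .setLogName("my-app-log") // placeholder log name
            .setResource(MonitoredResource.newBuilder("global").build())
            // Grouping in the Logs Explorer keys off this fully qualified trace name.
            .setTrace(String.format("projects/%s/traces/%s", projectId, traceId))
            .build();

    logging.write(Collections.singleton(entry));
  }
}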
Although this approach solves the grouping problem, it misses the following important functionality:
- Ability to perform full-text search of the app logs, which is supported by the Java 8 runtime.
- Ability to filter by the severity of the app logs.
After further research online and contacting Google support, we didn’t find any readily available solution to these problems.
The ProtoPayload Approach
Upon examining the request logs in the Java 8 runtime, we discovered that the protoPayload.line field contained the list of app logs. A request log in the Java 11 runtime, however, didn’t have the protoPayload.line field and therefore didn’t include any app logs. The Java 11 request log produced by the Trace Metadata approach didn’t have the protoPayload.line field either.
Based on these observations, we hypothesized that we needed to produce a request log whose protoPayload.line field contained the list of app logs, and that this was the key to getting all the functionality we needed. Since we cannot control the request log automatically generated by App Engine, we decided to produce a separate request log of our own.
We found the documentation of the RequestLog class, which is the Java representation of a request log. The RequestLog.line field contains a list of LogLine objects that are the Java representation of the protoPayload.line field in a request log. To produce our own request log, we need to create a RequestLog object, populate its line field, and then use the Cloud Logging client library to write it. However, we couldn’t find any publicly available Google library that includes this class.
Fortunately, we found the RequestLog’s protobuf specification in Google’s googleapis repo and followed the build instructions to generate the corresponding Java files.
To test our hypothesis, we created a test App Engine application with one endpoint that created a RequestLog object with multiple nested LogLine objects (app logs) and wrote the RequestLog using the Cloud Logging client library. Our test application successfully wrote the RequestLog, and we saw the new request log in the Logs Explorer. We were also able to search and filter on the request log and the app logs. This proved the feasibility of the approach.
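As a rough sketch of what that test did, the snippet below builds a RequestLog with nested LogLine entries and writes it as a protoPayload through the Cloud Logging client library. It assumes the Java classes generated from the googleapis protobuf spec are on the classpath; the log name, monitored resource, and request metadata values are placeholders rather than our exact test code.

import com.google.appengine.logging.v1.LogLine;
import com.google.appengine.logging.v1.RequestLog;
import com.google.cloud.MonitoredResource;
import com.google.cloud.logging.LogEntry;
import com.google.cloud.logging.Logging;
import com.google.cloud.logging.LoggingOptions;
import com.google.cloud.logging.Payload.ProtoPayload;
import com.google.cloud.logging.Severity;
import com.google.logging.type.LogSeverity;
import com.google.protobuf.Any;

import java.util.Collections;
import java.util.List;

public class RequestLogWriter {
  private final Logging logging = LoggingOptions.getDefaultInstance().getService();

  // Build one nested app log line.
  public static LogLine logLine(String message) {
    return LogLine.newBuilder()
        .setSeverity(LogSeverity.INFO)
        .setLogMessage(message)
        .build();
  }

  public void write(String traceId, List<LogLine> appLogLines) {
    // Assemble a RequestLog whose "line" field carries the app logs.
    RequestLog requestLog =
        RequestLog.newBuilder()
            .setTraceId(traceId)
            .setMethod("GET")                 // placeholder request metadata
            .setResource("/test/endpoint")
            .setStatus(200)
            .addAllLine(appLogLines)
            .build();

    // Write it as the protoPayload of a log entry.
    LogEntry entry =
        LogEntry.newBuilder(ProtoPayload.of(Any.pack(requestLog)))
            .setSeverity(Severity.INFO)
            .setLogName("custom-request-log") // placeholder log name
            .setResource(MonitoredResource.newBuilder("gae_app").build())
            .build();

    logging.write(Collections.singleton(entry));
  }
}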
Productionization
To productionize the ProtoPayload approach, we have to address the following challenges:
- How do we introduce this new way of logging without having to change our existing application code and with little performance impact?
- How do we ensure that app logs are grouped by their corresponding requests when an App Engine instance of our application handles multiple requests in parallel?
- How do we pass request-specific information to the logging implementation and signal it to flush at the end of the request?
- How do we make sure that our application still works in the App Engine dev server for local development?
How do we introduce this new way of logging without having to change our existing application code and with little performance impact?
Our application is built on Spring Boot, which uses SLF4J with Logback for logging by default, and our code emits logs through a mix of JUL and SLF4J. Internally, Spring Boot uses the jul-to-slf4j library to route all JUL logs to SLF4J. As a result, all of our logs are emitted through Logback.
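Spring Boot sets up this routing for us. For readers not on Spring Boot, the same bridge can be installed manually with the SLF4JBridgeHandler from the jul-to-slf4j library; this is just a sketch of the mechanism, not something our application has to do:

import org.slf4j.bridge.SLF4JBridgeHandler;

public class JulBridgeSetup {
  // Call once during application startup, before any JUL loggers are used.
  public static void install() {
    // Remove the default JUL handlers so records aren't printed twice.
    SLF4JBridgeHandler.removeHandlersForRootLogger();
    // Route all JUL log records to SLF4J (and therefore to Logback).
    SLF4JBridgeHandler.install();
  }
}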
Google recommends using a Logback appender to integrate with Cloud Logging and provides a reference implementation of such an appender.
After configuring our application to use Google’s Logback appender, we could see our app logs emitted through both JUL and SLF4J in the Logs Explorer without any modification to how our application does logging. This indicated that we could rely on the Logback appender interface to decouple the logging code in our application from the code that writes RequestLog objects to Cloud Logging. Specifically, whenever our application emits a log, the append(ILoggingEvent e) method is called. We needed to implement our own custom Logback appender that converts a logging event to a LogLine object, includes it in a RequestLog object, and then writes the RequestLog object to Cloud Logging.
Our Logback appender extends the UnsynchronizedAppenderBase&lt;ILoggingEvent&gt; abstract class. It buffers LogLine objects and only creates and writes a new RequestLog object to Cloud Logging when either the buffer fills with 50 app log entries or we reach the end of a request. Because of this buffering, we have seen little impact on our application’s performance.
How do we ensure that app logs are grouped by their corresponding requests when an App Engine instance of our application handles multiple requests in parallel?
After addressing the first challenge, our app logs were correctly grouped in the request log we generate, as long as our App Engine instance processed at most one request at a time.
However, when our App Engine instance handled multiple requests in parallel, app logs from different requests were grouped under the wrong request logs. This is because all logging events go to the same Logback appender, which cannot tell which request a logging event belongs to. We needed to address this because we configure our App Engine instances to handle multiple requests in parallel to achieve high throughput.
Recall from our first approach that every App Engine request carries a unique trace id in its x-cloud-trace-context header. Our solution creates a Map inside our custom Logback appender with the trace id as the key and the list of LogLine objects for that request (i.e., the same trace id) as the value. Because this Map is accessed by multiple threads, we use the synchronized keyword to guard access to it. We describe how we pass a request’s trace id into the Logback appender in the next section.
When our application emits a log, our implementation of append(ILoggingEvent e) is called. It converts the logging event to a LogLine object, finds the LogLine buffer in the Map using the current trace id, and appends the LogLine to the buffer. If the buffer exceeds the 50-LogLine limit, the appender creates a new RequestLog object and writes it to Cloud Logging.
The appender also flushes the log buffer of a trace id at the end of the corresponding request. We describe how this works in the next section.
Below is a snippet of our implementation.
public class CustomLoggingAppender extends UnsynchronizedAppenderBase&lt;ILoggingEvent&gt; {
  private static final int LOG_LINES_LIMIT = 50;

  // Accumulate log entries per trace id, which is unique per http request.
  // Any access to this HashMap must be protected by synchronized keyword
  private HashMap&lt;String, List&lt;LogLine&gt;&gt; traceId2LogLines = new HashMap&lt;&gt;();

  @Override
  protected void append(ILoggingEvent e) {
    // If there is no traceId, we drop the log
    final String traceId = MDC.get(Constants.MDC_KEY_TRACE_ID);
    if (Strings.isNullOrEmpty(traceId)) {
      return;
    }
    final LogLine logLine = toLogLine(e);
    final int currentSize = addLogLine(traceId, logLine);
    if (currentSize >= LOG_LINES_LIMIT) {
      List&lt;LogLine&gt; copy = swapOut(traceId, LOG_LINES_LIMIT, false);
      if (copy != null) {
        writeOut(copy);
      }
    }
  }

  // Add a LogLine in a thread-safe way
  private synchronized int addLogLine(final String traceId, final LogLine logLine) {
    if (!traceId2LogLines.containsKey(traceId)) {
      traceId2LogLines.put(traceId, Lists.newArrayList());
    }
    List&lt;LogLine&gt; logLines = traceId2LogLines.get(traceId);
    logLines.add(logLine);
    return logLines.size();
  }

  // Return a copy of the log lines for a specific traceId if it has at least min
  // lines. This method is thread-safe
  private synchronized List&lt;LogLine&gt; swapOut(
      final String traceId, final int min, final boolean removeTraceIdEntry) {
    final List&lt;LogLine&gt; logLines = traceId2LogLines.get(traceId);
    if (logLines == null) {
      return null;
    }
    if (logLines.size() &lt; min) {
      if (removeTraceIdEntry) {
        traceId2LogLines.remove(traceId);
      }
      return null;
    }
    List&lt;LogLine&gt; copy = Lists.newArrayList(logLines);
    logLines.clear();
    if (removeTraceIdEntry) {
      traceId2LogLines.remove(traceId);
    }
    return copy;
  }

  public void finalWrite() {
    // Return if there is no traceId
    final String traceId = MDC.get(Constants.MDC_KEY_TRACE_ID);
    if (Strings.isNullOrEmpty(traceId)) {
      return;
    }
    List&lt;LogLine&gt; copy = swapOut(traceId, 1, true);
    if (copy != null) {
      writeOut(copy);
    }
  }
}
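The snippet above references two helpers that are not shown, toLogLine and writeOut. toLogLine roughly looks like the sketch below (the severity mapping is an illustrative assumption), and writeOut wraps the buffered lines in a RequestLog and writes it through the Cloud Logging client, much like the earlier proof-of-concept snippet.

// Inside CustomLoggingAppender (imports: ch.qos.logback.classic.Level,
// com.google.appengine.logging.v1.LogLine, com.google.logging.type.LogSeverity,
// com.google.protobuf.util.Timestamps).

// Convert a Logback event into the protobuf LogLine nested under RequestLog.line.
private LogLine toLogLine(ILoggingEvent e) {
  return LogLine.newBuilder()
      .setTime(Timestamps.fromMillis(e.getTimeStamp()))
      .setSeverity(toLogSeverity(e.getLevel()))
      .setLogMessage(e.getFormattedMessage())
      .build();
}

// Map Logback levels onto Cloud Logging severities.
private LogSeverity toLogSeverity(Level level) {
  if (level.isGreaterOrEqual(Level.ERROR)) {
    return LogSeverity.ERROR;
  }
  if (level.isGreaterOrEqual(Level.WARN)) {
    return LogSeverity.WARNING;
  }
  if (level.isGreaterOrEqual(Level.INFO)) {
    return LogSeverity.INFO;
  }
  return LogSeverity.DEBUG;
}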
How do we pass request-specific information to the logging implementation and signal it to flush at the end of the request?
We need to pass request-specific information to our custom Logback appender for the following reasons:
- We need to pass the trace id of the current request so that our Logback appender can associate a logging event with the corresponding trace id.
- We want to populate some metadata fields in the request log so that we can filter for them in the Logs Explorer. For example, the request URL, HTTP method, user agent, IP address, referrer, and HTTP status are useful metadata fields for debugging.
- We need to signal the appender to flush the log buffer at the end of a request.
We solve these problems as follows:
- We implement a servlet filter, TraceFilter, that extracts the request-specific metadata and sets it in the Mapped Diagnostic Context (MDC). The MDC is a thread-local key-value store that provides a Map-like interface. All of its methods are static, and the MDC transparently manages the contextual data on a per-thread basis.
- We configure the TraceFilter to execute at the highest priority so that it always runs first when serving a request. This allows us to capture most of the logs emitted while handling the request.
- If the application creates any new threads, we modify the code to pass the trace id when creating the new thread and save the trace id in the MDC at the beginning of the thread.
- During the initialization of TraceFilter, we call LoggerFactory.getILoggerFactory().getLoggerList() to retrieve the list of Logger instances. Then we iterate through the Logger instances to find the custom appender we created and save it in a member field of the TraceFilter, as shown below.
private CustomLoggingAppender customLoggingAppender;
@Override
public void init(FilterConfig filterConfig) {
  logger.info(String.format("%s has started.", this.getClass().getName()));
  LoggerContext loggerContext = (LoggerContext) LoggerFactory.getILoggerFactory();
  for (Logger logBackLogger : loggerContext.getLoggerList()) {
    Iterator&lt;Appender&lt;ILoggingEvent&gt;&gt; appenderIterator = logBackLogger.iteratorForAppenders();
    while (appenderIterator.hasNext()) {
      Appender&lt;ILoggingEvent&gt; appender = appenderIterator.next();
      if ("CUSTOM_LOGGING_APPENDER".equalsIgnoreCase(appender.getName())) {
        logger.info("Found the CUSTOM_LOGGING_APPENDER.");
        customLoggingAppender = (CustomLoggingAppender) appender;
      }
    }
  }
}
- We implement a finalWrite method in our Logback appender. In the TraceFilter.doFilter method, we put the call to chain.doFilter(request, response) in a try-catch-finally block. In the finally block, we call the finalWrite method of the appender to flush the log buffer, as sketched below.
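Putting these pieces together, a doFilter implementation along the following lines covers the last three points. This is a minimal sketch: the MDC keys other than the trace id, the header parsing, and the omission of the catch block are illustrative simplifications rather than a copy of our filter.

// Inside TraceFilter (fields, imports, and the catch block omitted for brevity).
@Override
public void doFilter(ServletRequest request, ServletResponse response, FilterChain chain)
    throws IOException, ServletException {
  HttpServletRequest httpRequest = (HttpServletRequest) request;

  // x-cloud-trace-context looks like "TRACE_ID/SPAN_ID;o=1"; keep only the trace id.
  String traceHeader = httpRequest.getHeader("x-cloud-trace-context");
  String traceId = traceHeader == null ? "" : traceHeader.split("/")[0];

  // Request-specific values that the appender reads back out of the MDC.
  MDC.put(Constants.MDC_KEY_TRACE_ID, traceId);
  MDC.put("requestUrl", httpRequest.getRequestURI());
  MDC.put("httpMethod", httpRequest.getMethod());
  MDC.put("userAgent", httpRequest.getHeader("User-Agent"));

  try {
    chain.doFilter(request, response);
  } finally {
    // Flush whatever is buffered for this trace id, then clear the thread-local state.
    if (customLoggingAppender != null) {
      customLoggingAppender.finalWrite();
    }
    MDC.clear();
  }
}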
How do we make sure that our application still works in the App Engine dev server for local development?
App Engine provides a dev server so that developers can test their application locally before deploying to the cloud. Being able to see the app logs from the dev server is important to debug an application locally. There is, however, no local dev version of Cloud Logging. So we cannot enable and test the functionality of our custom Logback appender locally.
To solve this problem, we leverage the following features of Logback and Spring:
- Logback supports configuration of multiple appenders and flexible binding between appenders and various loggers.
- Spring’s Logback configuration (logback-spring.xml) supports conditional rendering of different configurations based on the Spring profile.
We define two different Logback configurations. One is enabled for the cloud Spring profiles and uses our custom Logback appender. The other is enabled for the dev Spring profile and uses the console appender.
Further Improvement Ideas
Because we use a servlet filter, TraceFilter, to set the trace id in the MDC, our approach cannot group app logs produced before TraceFilter starts. For example, it cannot group the app logs emitted during a warmup request while the Spring Boot application boots up. We emit such logs to the console appender, and they can be viewed under the stdout log name in the Logs Explorer.
In addition, our approach produces a second request log alongside the one produced by App Engine. This is because there is no way for us to control the request log generated by App Engine, and therefore we had to generate a new request log to nest the app logs. The two request logs can be correlated by querying the trace field in the Logs Explorer.
Our request log doesn’t have some of the metadata fields (e.g., megaCycles and cost) included in the one generated by App Engine, because our application doesn’t have access to that information. In addition, our request log doesn’t report accurate request latency, because we can only start measuring when TraceFilter.doFilter is called. If you filter the logs based on these metadata fields, you will only see the request logs produced by App Engine.
Get the Code
For reference, you can find the source code of our custom Logback appender, TraceFilter, and the Logback configuration file, along with a simple Spring Boot application, here.
Acknowledgements
We want to thank Tom Harris and Rhys Evans for their collaboration on this project. Special thanks to Uday Sreekanth and Jad Boutros for their continual support and guidance.