This series of articles is devoted to the log files (logs) of the Apache web server. It covers their settings, format, and directives, as well as dedicated programs for analyzing web server logs.

The Apache HTTP server provides many different mechanisms for logging everything that happens on your server, from the initial request, through the URL mapping process, to the final resolution of the connection, including any errors that may have occurred along the way. In addition, third-party modules may provide logging capabilities or insert entries into existing log files, and applications such as CGI programs, PHP scripts, or other handlers may also send messages to the server’s error log.

Web server logs contain a lot of interesting information. The access logs can be used to build a collective portrait of the audience: which countries and cities visitors live in, which operating systems and browsers they use, at what times they are most active, which sites they came from, which search engines they prefer, and how many pages they view per visit. No less important is the use of logs for monitoring the state of the web server and its sites: pages that were not found, web server errors, load levels, bot activity, malicious activity, traces of a break-in, and the path an intruder took.

In general, server logs should be understood, configured, and used primarily by the webmasters and system administrators who maintain the server. At the same time, an attacker, or a person investigating the consequences of an attack, also needs to understand exactly what is stored in the web server logs and what can be learned from them: how tracks might be covered, and how access log files can be analyzed to find problems, attacks, and traces of a break-in.
Different types of Apache logs are managed by different web server modules and have different directives and options for specifying the log entry format. Apache web server logs include the following types:
CGI script logs (ScriptLog) – If ScriptLog is not set, no CGI error log is created. If ScriptLog is set, any CGI errors are logged to the file given as its argument.
Per-module logging (module event logging) – The LogLevel directive allows you to specify the logging importance level for each module. That way, if you’re only troubleshooting one particular module, you can increase its log volume without getting too much information about other modules you don’t care about. This is especially useful for modules like mod_proxy or mod_rewrite where you want to know the details of what it’s trying to do.
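As a minimal sketch of per-module logging, the fragment below keeps the server-wide level at info and raises the level only for mod_rewrite. The chosen levels are illustrative assumptions rather than recommended values, and per-module levels assume httpd 2.3.6 or later.

```apache
# Server-wide level stays at "info"; only mod_rewrite is raised to trace5.
LogLevel info rewrite:trace5
```

Once troubleshooting is finished, the module-specific part can simply be removed to return to the normal log volume.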
The mod_log_config module provides flexible logging of client requests. Logs are written in a customizable format and can be written directly to a file or to an external program. Conditional logging is provided, meaning that individual requests can be included or excluded from the log based on the characteristics of the request. This module is key to ensuring the Access Log works. This module supports the following directives:
The TransferLog and CustomLog directives can be used multiple times in each server so that each request is logged to multiple files.
The server access log records all requests processed by the server. The location and content of the access log are controlled by the CustomLog directive, and the LogFormat directive can be used to simplify the selection of the log content. This section describes how to configure the server to write information to the access log.

Of course, storing information in the access log is only the beginning of log management. The next step is to analyze this information to produce useful statistics. Log analysis is generally not part of the web server itself, but it will be covered in one of the next articles in this series.

Older versions of Apache httpd used other modules and directives to manage the access log, including mod_log_referer, mod_log_agent, and the TransferLog directive. The CustomLog directive now covers the functionality of all the older directives.

The access log format is highly configurable. It is specified using a format string that closely resembles a C-style printf(1) format string.
From a practical point of view, the Access Log is the same thing as mod_log_config, since it is this module that provides the access log functionality; the directives are therefore the same. In addition, the access log can use the mod_logio and mod_setenvif modules to extend its functionality. For example, mod_logio makes it possible to log the exact number of bytes received and sent for each request and response.
The format argument to the LogFormat and CustomLog directives is a string. This string is used to generate an entry in the log file for each request. It can contain literal characters, which are copied to the log file as-is, and the C-style control characters “\n” and “\t” to represent new-line and tab characters.
Literal quotes and backslashes should be escaped with a backslash (\). The characteristics of the request itself are logged by placing “%” directives in the format string. In the log file these directives are replaced by the values shown in Screenshot 1, Screenshot 2, and Screenshot 3.
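A small sketch of these rules, assuming a hypothetical alias named tabbed: the fields are separated by “\t”, and the literal quotes around the request line are escaped with backslashes.

```apache
# "tabbed" is an illustrative alias, not a predefined format.
# \t inserts a tab between fields; \" writes a literal double quote.
LogFormat "%h\t%t\t\"%r\"\t%>s\t%b" tabbed
```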
Individual items can be restricted to print only for responses with certain HTTP status codes by placing a comma-separated list of status codes immediately after the “%”. The list of status codes may be preceded by a “!” to indicate negation. The “<” and “>” modifiers can be used for requests that have been internally redirected, to choose whether the original or the final request should be consulted (Screenshot 4). By default, the % directives %s, %U, %T, %D, and %r look at the original request, while all others look at the final request.
So, for example, %>s can be used to record the final status of a request, and %<u can be used to record the original authenticated user on a request that is internally redirected to an unauthenticated resource.
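The modifiers can be sketched in a single format string. The alias redirect_audit is a hypothetical name chosen for this illustration:

```apache
# "redirect_audit" is an illustrative alias.
# %<u = user authenticated on the original request
# %<s = status of the original request, %>s = status of the final request
LogFormat "%h %l %<u %t \"%r\" %<s %>s %b" redirect_audit
```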
For security reasons, starting with version 2.0.46, non-printable and other special characters in %r, %i, and %o are escaped using \xhh sequences, where hh is the hexadecimal representation of the raw byte. The exceptions to this rule are " and \, which are escaped by prepending a backslash, and all whitespace characters, which are written in their C-style notation (\n, \t, etc.). In versions prior to 2.0.46, no escaping was performed on these strings, so you have to be quite careful when working with raw log files from those versions.

Since httpd 2.0, unlike 1.3, the %b and %B format strings do not represent the number of bytes sent to the client, but simply the size in bytes of the HTTP response (which will differ, for instance, if the connection is aborted or if SSL is used). The %O format provided by mod_logio logs the actual number of bytes sent over the network.

Note: mod_cache is implemented as a quick handler and not as a standard handler. Therefore, the %R format string will not return any handler information when content caching is involved.

The “^” character at the start of three-character formats has no significance, but it must be the first character of any newly added three-character format to avoid potential conflicts with log formats that use a literal adjacent to a format specifier, such as “%Dus”.
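As a sketch of byte counting with mod_logio, the fragment below appends %I (bytes received) and %O (bytes sent) to a combined-style format. It assumes mod_logio is loaded; the alias combinedio follows a common naming convention, and the log path is illustrative.

```apache
# Requires mod_logio; %I = bytes received, %O = bytes sent (headers included).
LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-agent}i\" %I %O" combinedio
CustomLog "logs/access_log" combinedio
```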
Context: server config
The BufferedLogs directive causes mod_log_config to hold several log entries in memory and write them to disk together, rather than writing each entry after its request. On some systems, this can result in more efficient disk access and therefore better performance. It can be set only once for the entire server; it cannot be configured per virtual host. This directive should be used with caution, because a crash may cause loss of logging data.
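A minimal sketch, assuming the directive is placed in the main server configuration and that losing the most recent entries after a crash is acceptable:

```apache
# Main server context only; buffers access log entries before writing them.
BufferedLogs On
```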
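Syntax: CustomLog file|pipe|provider format|nickname [env=[!]environment-variable| expr=expression]
Context: server config, virtual host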
The CustomLog directive is used to log requests to the server. It specifies the log format and the logging target, and it can optionally make logging conditional on request characteristics by means of environment variables. The first argument, which specifies the location to which the logs will be written, can take one of the following three types of values:
The second argument shows what will be written to the log file. It can specify either an alias defined by the preceding LogFormat directive, or it can be an explicit format string. For example, the following two sets of directives have exactly the same effect:
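The equivalence can be sketched as follows, assuming the alias common and an illustrative log path relative to ServerRoot. In a real configuration only one of the two variants would be used; otherwise each request would be written twice.

```apache
# Variant 1: CustomLog refers to an alias defined by LogFormat
LogFormat "%h %l %u %t \"%r\" %>s %b" common
CustomLog "logs/access_log" common

# Variant 2: CustomLog carries the explicit format string
CustomLog "logs/access_log" "%h %l %u %t \"%r\" %>s %b"
```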
The third argument is optional and controls whether or not to log a particular request. The condition can be the presence or absence (in the case of an ‘env=!name’ clause) of a particular variable in the server environment. Alternatively, the condition can be expressed as an arbitrary boolean expression. If the condition is not satisfied, the request will not be logged. References to HTTP headers in the expression will not cause the header names to be added to the Vary header. Environment variables can be set on a per-request basis using the mod_setenvif and/or mod_rewrite modules. For example, if you want to record requests for all GIF images on your server in a separate log file, but not in the main log, you can use:
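A minimal sketch of this setup, assuming mod_setenvif is loaded and that the alias common has already been defined. The variable name gif-image and the log file names are illustrative.

```apache
# Set gif-image for any request whose URI ends in .gif
SetEnvIf Request_URI \.gif$ gif-image
# GIF requests go to their own log ...
CustomLog "gif-requests.log" common env=gif-image
# ... and everything else goes to a second log
CustomLog "nongif-requests.log" common env=!gif-image
```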
Or, to reproduce the behavior of the old RefererIgnore directive, you can use the following:
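A sketch of this approach, assuming mod_setenvif is loaded. The domain example.com, the variable name localreferer, and the file name are placeholders:

```apache
# Alias for a referer-style log entry
LogFormat "%{Referer}i -> %U" referer
# Mark requests whose Referer header points at the site itself
SetEnvIf Referer example\.com localreferer
# Log the referer only for requests arriving from elsewhere
CustomLog "referer.log" referer env=!localreferer
```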
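Syntax: GlobalLog file|pipe|provider format|nickname [env=[!]environment-variable| expr=expression]
Context: server config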
Compatibility: Available in Apache HTTP Server 2.4.19 and later. The GlobalLog directive defines a log common to the primary server configuration and all configured virtual hosts. The GlobalLog directive is identical to the CustomLog directive except for the following differences:
GlobalLog is not allowed in virtual host context.
GlobalLog is used by virtual hosts that define their own CustomLog, unlike a globally specified CustomLog.
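The second difference can be sketched as follows, assuming httpd 2.4.19 or later. The host name and log file names are hypothetical:

```apache
# Main server context: receives requests for every virtual host.
GlobalLog "logs/global_access_log" "%v %h %l %u %t \"%r\" %>s %b"

<VirtualHost *:80>
    ServerName www.example.com
    # This host-specific log does not stop GlobalLog from also logging the request.
    CustomLog "logs/example_access_log" "%h %l %u %t \"%r\" %>s %b"
</VirtualHost>
```

With a CustomLog in the main server context instead of GlobalLog, requests for this virtual host would appear only in the host's own log.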
Context: server config, virtual host
This directive specifies the format of the access log file. The LogFormat directive can take one of two forms.

In the first form, where only one argument is given, the directive sets the log format that will be used by logs specified in subsequent TransferLog directives. The single argument can specify an explicit format, as discussed in the section on custom log formats above. Alternatively, it can use an alias to refer to a log format defined in a previous LogFormat directive, as described below.

The second form of the LogFormat directive associates an explicit format with an alias. This alias can then be used in subsequent LogFormat or CustomLog directives rather than repeating the entire format string. A LogFormat directive that defines an alias does nothing else: it only defines the alias, and it does not actually apply the format or make it the default. It therefore does not affect subsequent TransferLog directives. In addition, LogFormat cannot use one alias to define another alias. Note that the alias must not contain percent signs (%).
Example:
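A minimal sketch of the second (alias) form. The alias name vhost_common is a conventional choice, not a requirement:

```apache
# Defines the alias only; no log file is opened by this line.
LogFormat "%v %h %l %u %t \"%r\" %>s %b" vhost_common
```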
This directive has exactly the same arguments and effect as the CustomLog directive, except that it does not allow the log format to be specified explicitly or requests to be logged conditionally. Instead, the log format is determined by the most recently specified LogFormat directive that does not define an alias. The Common Log Format is used if no other format has been specified.
Example:
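A minimal sketch of TransferLog used together with the one-argument form of LogFormat. The log path is illustrative:

```apache
# No alias: this LogFormat sets the default format for later TransferLog lines
LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-agent}i\""
TransferLog "logs/access_log"
```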
A typical configuration for an access log might look like this:
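A sketch of such a configuration, using the alias common and a log path relative to ServerRoot (both conventional choices rather than requirements):

```apache
LogFormat "%h %l %u %t \"%r\" %>s %b" common
CustomLog "logs/access_log" common
```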
It defines the alias common and associates it with a particular log format string. The format string consists of percent directives, each of which tells the server to log a particular piece of information. Literal characters may also be placed in the format string and will be copied directly into the log output. The quote character (") must be escaped by placing a backslash before it so that it is not interpreted as the end of the format string. The format string may also contain the special control characters “\n” for a new line and “\t” for a tab.

The CustomLog directive sets up a new log file using the defined alias. The filename for the access log is relative to the ServerRoot unless it begins with a slash.

The above configuration writes log entries in a format known as the Common Log Format (CLF). This standard format can be produced by many different web servers and read by many log analysis programs. Log file entries produced in CLF will look something like this:
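For illustration, an entry in this format might look like the line below. The host, user name, timestamp, and sizes are made-up sample values, not taken from a real log:

```text
127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326
```

The fields correspond, in order, to %h, %l, %u, %t, "%r", %>s, and %b.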
The time at which the request was received (%t) is written in the following format:
[day/month/year:hour:minute:second zone]
day = 2*digit
month = 3*letter
year = 4*digit
hour = 2*digit
minute = 2*digit
second = 2*digit
zone = (`+' | `-') 4*digit
You can display the time in a different format by specifying %{format}t on the log format line, where the format is the same as strftime(3) from the C standard library, or one of the supported special markers.
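As a sketch, the format below combines a strftime(3) pattern with the msec_frac token to add a millisecond fraction to the timestamp. It assumes an httpd version that supports the extended tokens, and the alias common_msec is hypothetical. Because such tokens cannot be mixed with strftime patterns inside one %{format}t, several %{format}t items are used side by side.

```apache
# Timestamp with millisecond fraction and numeric time zone
LogFormat "%h %l %u [%{%d/%b/%Y %T}t.%{msec_frac}t %{%z}t] \"%r\" %>s %b" common_msec
```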
Another commonly used format string is the Combined Log Format. It can be used as follows:
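A sketch of such a configuration, with the conventional alias combined and an illustrative log path:

```apache
# Common Log Format plus the Referer and User-agent request headers
LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-agent}i\"" combined
CustomLog "logs/access_log" combined
```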
This format is exactly the same as the Common Log Format, with the addition of two more fields. Each of the additional fields uses the percent directive %{header}i, where header can be any HTTP request header. The access log in this format will look like this:
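For illustration, an entry in this format might look like the line below. All values, including the referring URL and the browser string, are made-up samples:

```text
127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326 "http://www.example.com/start.html" "Mozilla/4.08 [en] (Win98; I ;Nav)"
```

The last two quoted fields are the values of the Referer and User-agent request headers.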
Multiple access logs can be created by specifying multiple CustomLog directives in the configuration file. For example, the following directives will create three access logs. The first contains basic CLF information, while the second and third contain referrer and browser information. The last two lines of CustomLog show how to simulate the effects of the ReferLog and AgentLog directives.
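A sketch of such a three-log setup. The file names are illustrative and relative to ServerRoot:

```apache
LogFormat "%h %l %u %t \"%r\" %>s %b" common
# Basic CLF information
CustomLog "logs/access_log" common
# Referring page and the URL that was requested
CustomLog "logs/referer_log" "%{Referer}i -> %U"
# Browser identification
CustomLog "logs/agent_log" "%{User-agent}i"
```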
This example also shows that there is no need to specify an alias using the LogFormat directive. Instead, the log format can be specified directly in the CustomLog directive.
There are times when it is convenient to exclude certain access log entries based on the characteristics of the client request. This is easy to do with environment variables. First, an environment variable must be set to indicate that the request satisfies certain conditions. This is usually achieved using SetEnvIf. The env= clause of the CustomLog directive is then used to include or exclude requests that have the environment variable set. Some examples:
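A minimal sketch of this technique, assuming mod_setenvif is loaded and the alias common is defined. The variable name dontlog is an arbitrary choice:

```apache
# Mark requests from the loop-back interface
SetEnvIf Remote_Addr "127\.0\.0\.1" dontlog
# Mark requests for the robots.txt file
SetEnvIf Request_URI "^/robots\.txt$" dontlog
# Log what remains
CustomLog "logs/access_log" common env=!dontlog
```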
As another example, consider recording requests from English-speaking users to one log file, and if they do not speak English, to another log file.
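A sketch of this split, assuming the client's language is inferred from the Accept-Language request header. The variable and file names are illustrative:

```apache
# Set "english" when the Accept-Language header mentions "en"
SetEnvIf Accept-Language "en" english
CustomLog "logs/english_log" common env=english
CustomLog "logs/non_english_log" common env=!english
```

This is only a heuristic: the pattern matches any Accept-Language value that contains "en", regardless of its position or priority.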
In a caching scenario, one would want to know about the efficiency of the cache. A very simple way to find out would be:
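A sketch of this approach, assuming mod_env and mod_cache are loaded. The variable name CACHE_MISS and the alias common-cache are illustrative:

```apache
# mod_env sets this variable only when the request is not served from the cache
SetEnv CACHE_MISS 1
# %{CACHE_MISS}e appends the variable's value to each entry
LogFormat "%h %l %u %t \"%r\" %>s %b %{CACHE_MISS}e" common-cache
CustomLog "logs/access_log" common-cache
```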
mod_cache runs before mod_env and, when it succeeds, delivers the content without mod_env being involved. In that case a cache hit logs “-”, while a cache miss logs 1. In addition to the env= syntax, LogFormat supports logging values conditionally, based on the HTTP response code:
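Two sketches of status-conditional items, with the illustrative aliases browserlog and refererlog:

```apache
# Log User-agent only for 400 and 501 responses; otherwise "-"
LogFormat "%400,501{User-agent}i" browserlog
# Log Referer only when the status is NOT 200, 204, or 302
LogFormat "%!200,204,302{Referer}i" refererlog
```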
In the first example, the User-agent will be logged if the HTTP status code is 400 or 501. In other cases, a literal “-” will be logged instead. Likewise, in the second example, the Referer will be logged if the HTTP status code is not 200, 204, or 302 (note the “!” before the status codes).

Although we have just shown that conditional logging is very powerful and flexible, it is not the only way to control the contents of the logs. Log files are more useful when they contain a complete record of server activity. It is often easier to post-process the full log files to extract only the data you need or to remove the requests you do not want to consider.
Even on a moderately busy server, the quantity of information stored in the log files is very large. The access log file typically grows by 1 MB or more per 10,000 requests. It is therefore necessary to rotate the log files periodically by moving or deleting the existing logs. This cannot be done while the server is running, because Apache httpd continues writing to the old log file for as long as it holds the file open. Instead, the server must be restarted after the log files are moved or deleted so that it opens new log files.

By using a graceful restart, the server can be instructed to open new log files without losing any existing or pending client connections. However, to accomplish this, the server must continue writing to the old log files while it finishes serving old requests. It is therefore necessary to wait for some time after the restart before doing any processing on the log files. A typical rotation script renames the current logs, triggers a graceful restart, waits for a while, and then compresses the old logs to save space.
Apache httpd is capable of writing access and error log files through a pipe to another process rather than directly to a file. This capability greatly increases the flexibility of logging without adding code to the main server. To write logs to a pipe, simply replace the filename with the pipe character “|”, followed by the name of the executable that should accept the log entries on its standard input. The server starts the piped-log process when the server starts, and restarts it if it crashes while the server is running (this last feature is why the technique is referred to as “reliable piped logging”).

Piped log processes are spawned by the parent Apache httpd process and inherit the userid of that process. This means that piped log programs usually run as root. It is therefore very important to keep these programs simple and secure.

One important use of piped logs is to allow log rotation without having to restart the server. The Apache HTTP server includes a simple program called rotatelogs for this purpose. For example, to rotate the logs every 24 hours, you can use:
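A sketch of such a directive, assuming httpd is installed under /usr/local/apache and that the alias common is defined; both paths are illustrative. The value 86400 is the rotation interval in seconds (24 hours).

```apache
# Pipe access log entries to rotatelogs, which starts a new file every 86400 s
CustomLog "|/usr/local/apache/bin/rotatelogs /var/log/access_log 86400" common
```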
Note that quotes are used to enclose the entire command that will be called for the pipe. Although these examples are for the access log, the same technique can be used for the error log. As with conditional logging, piped logs are a very powerful tool, but they should not be used where a simpler solution, such as offline post-processing, is available. By default, the piped log process is spawned without invoking a shell. Use “|$” instead of “|” to spawn it using a shell (usually with /bin/sh -c):
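A sketch of the shell-invoking form, with the same illustrative paths and the alias common assumed to be defined:

```apache
# "|$" runs the command through a shell instead of spawning it directly
CustomLog "|$/usr/local/apache/bin/rotatelogs /var/log/access_log 86400" common
```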
This was the default behavior for Apache 2.2. Depending on the shell specifics, this might lead to an additional shell process for the lifetime of the logging pipe program and to signal handling problems during restart. For compatibility with Apache 2.2, the notation “||” is also supported and is equivalent to using “|”.

Note for Windows: when running many piped logger processes on Windows, especially when httpd runs as a service, you may encounter problems. This is caused by running out of desktop heap space. The desktop heap space given to each service is specified by the third argument to the SharedSection parameter in the HKEY_LOCAL_MACHINE\System\CurrentControlSet\Control\SessionManager\SubSystems\Windows registry value. Change this value with care; the normal caveats for changing the Windows registry apply, but you might also exhaust the desktop heap pool if the number is set too high.
When running a server with many virtual hosts, there are several options for dealing with log files. First, it is possible to use logs exactly as on a single-host server. By placing the logging directives outside the <VirtualHost> sections, in the main server context, all requests can be logged to the same access log and error log. This technique does not allow statistics for individual virtual hosts to be collected easily.

If CustomLog or ErrorLog directives are placed inside a <VirtualHost> section, all requests or errors for that virtual host will be logged only to the specified file. Any virtual host that does not have logging directives will still have its requests sent to the main server logs. This technique is very useful for a small number of virtual hosts, but if the number of hosts is very large, it can be complicated to manage. In addition, it can often create problems with insufficient file descriptors.

For the access log, there is a very good compromise. By adding information about the virtual host to the log format string, it is possible to log all hosts to the same log and later split the log into individual files. For example, consider the following directives.
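A sketch of such directives, placed in the main server context. The alias vhost_common and the log path are illustrative:

```apache
# %v puts the serving virtual host's name at the start of every entry
LogFormat "%v %h %l %u %t \"%r\" %>s %b" vhost_common
CustomLog "logs/access_log" vhost_common
```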
%v is used to log the name of the virtual host that is serving the request. A program such as split-logfile can then be used to post-process the access log and split it into one file per virtual host.
Anyone who can write to the directory where Apache httpd writes a log file can almost certainly gain access to the uid that the server is started as, which is normally root. Do NOT give people write access to the directory where the logs are stored without being aware of the consequences. In addition, log files may contain information supplied directly by the client, without escaping. It is therefore possible for malicious clients to insert control characters into the log files, so care must be taken when dealing with raw logs.
In typical operation, Apache is started by the root user and switches to the user defined by the User directive to serve requests. As is the case with any command that root executes, you must take care that it is protected from modification by non-root users. Not only must the files themselves be writable only by root, but so must the directories and the parents of all directories.
For example, if you choose to place ServerRoot in /usr/local/apache, it is recommended that you create this directory and the subdirectories beneath it as root, so that they are owned by root and writable only by root.
It is assumed that /, /usr, and /usr/local are modifiable only by root. When you install the httpd executable, you should ensure that it is protected in the same way, owned by root and not writable by other users.
You can create an htdocs subdirectory that is modifiable by other users, since root never executes any files from there and should not be creating files there. If you allow non-root users to modify any files that root either executes or writes to, you open your system to root compromise. For example, someone could replace the httpd binary so that it executes arbitrary code the next time it is started. If the logs directory is writable by a non-root user, someone could replace a log file with a symbolic link to some other system file, and root might then overwrite that file with arbitrary data. If the log files themselves are writable by a non-root user, someone may be able to overwrite the log itself with bogus data.