GoAccess

Why GoAccess?

GoAccess was designed to be a fast, terminal-based log analyzer. Its core idea is to quickly analyze and view web server statistics in real time without needing to use your browser (great if you want to do a quick analysis of your access log via SSH, or if you simply love working in the terminal).

While terminal output is the default, it can also generate a complete real-time HTML report, as well as JSON and CSV reports.

You can think of it more as a monitoring command than anything else.

Installation

GoAccess can be compiled and used on *nix systems.

Download, extract and compile GoAccess with:
$ wget http://tar.goaccess.io/goaccess-1.2.tar.gz
$ tar -xzvf goaccess-1.2.tar.gz
$ cd goaccess-1.2/
$ ./configure --enable-utf8 --enable-geoip=legacy
$ make
# make install

Build from GitHub (Development)
$ git clone https://github.com/allinurl/goaccess.git
$ cd goaccess
$ autoreconf -fiv
$ ./configure --enable-utf8 --enable-geoip=legacy
$ make
# make install

Docker

Before running the GoAccess Docker container, place your GoAccess configuration file goaccess.conf inside your $HOME/data directory, which Docker will use to configure GoAccess.

A minimal GoAccess configuration file for a Docker container with a real-time HTML report needs at least the following options set: log-format, log-file, output, real-time-html and ws-url.
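
As a rough sketch, those five options could be written to $HOME/data/goaccess.conf as shown here. The log path, report path and hostname (example.com) are assumptions picked to line up with the volume mounts used in the docker run command; adjust them to your setup. Options in the configuration file are written without leading dashes.

```shell
# Assumed location; set DATA_DIR beforehand to write somewhere else.
DATA_DIR="${DATA_DIR:-$HOME/data}"
mkdir -p "$DATA_DIR"

# Paths and hostname below are placeholders, not GoAccess defaults.
cat > "$DATA_DIR/goaccess.conf" <<'EOF'
log-format COMBINED
log-file /srv/logs/access.log
output /srv/report/index.html
real-time-html true
ws-url ws://example.com:7890
EOF
```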

Note: Docker will bind to 0.0.0.0:7890, which means that the GoAccess WebSocket server uses port 7890 and is reachable from 127.0.0.1 in addition to your host IP. Ensure ws-url points to the Docker host’s public IP address; otherwise it will attempt to establish a connection to localhost.

Once you have your configuration file all set, then you may run:
docker run --restart=always -d -p 7890:7890 \
-v "$HOME/data:/srv/data" \
-v "/path/to/logs:/srv/logs" \
-v "/path/to/report:/srv/report" \
allinurl/goaccess

If everything goes fine, the generated report should live under /path/to/report.

Also note that if you ever need to run it on a different port, e.g.,
docker run --restart=always -d -p 8080:7890 …

then you can simply set the external port in ws-url, e.g., ws-url ws://localhost:8080, and keep GoAccess’ internal port in your config file set to 7890.

Distributions

It is easiest to install GoAccess on Linux using the preferred package manager of your Linux distribution. Please note that not all distributions will have the latest version of GoAccess available.

Debian/Ubuntu
# apt-get install goaccess

NOTE: It is likely this will install an outdated version of GoAccess. To make sure that you’re running the latest stable version of GoAccess, see the alternative option below.

Official GoAccess Debian & Ubuntu repository
$ echo "deb http://deb.goaccess.io/ $(lsb_release -cs) main" | sudo tee -a /etc/apt/sources.list.d/goaccess.list
$ wget -O - http://deb.goaccess.io/gnugpg.key | sudo apt-key add -
$ sudo apt-get update
$ sudo apt-get install goaccess

Note:
• For on-disk support (Trusty+ or Wheezy+), run: sudo apt-get install goaccess-tcb
• .deb packages in the official repo are available through HTTPS as well. You may need to install apt-transport-https.

Fedora
# yum install goaccess

Arch Linux
# pacman -S goaccess

Gentoo
# emerge net-analyzer/goaccess

OS X / Homebrew
# brew install goaccess

FreeBSD
# cd /usr/ports/sysutils/goaccess/ && make install clean
# pkg install sysutils/goaccess

OpenBSD
# cd /usr/ports/www/goaccess && make install clean
# pkg_add goaccess

OpenIndiana
# pkg install goaccess

pkgsrc (NetBSD, Solaris, SmartOS, …)
# pkgin install goaccess

Windows

GoAccess can be used in Windows through Cygwin. See Cygwin’s packages.

Storage

There are three storage options that can be used with GoAccess. Choosing one will depend on your environment and needs.

Default Hash Tables

In-memory storage provides better performance at the cost of limiting the dataset size to the amount of available physical memory. By default GoAccess uses in-memory hash tables. If your dataset can fit in memory, then this will perform fine. It has very good memory usage and pretty good performance.

Tokyo Cabinet On-Disk B+ Tree

Use this storage method for large datasets where it is not possible to fit everything in memory. The B+ tree database is slower than any of the hash databases since data has to be committed to disk. However, using an SSD greatly increases the performance. You may also use this storage method if you need data persistence to quickly load statistics at a later date.

Tokyo Cabinet On-Memory Hash Database

An alternative to the default hash tables. It uses generic typing, and thus its performance in terms of memory and speed is average.

Command Line / Config Options

See the options that can be supplied to the command or specified in the configuration file. If specified in the configuration file, long options must be used without the leading --.

Examples

DIFFERENT OUTPUTS

To output to a terminal and generate an interactive report:
# goaccess access.log

To generate an HTML report:
# goaccess access.log -a > report.html

To generate a JSON report:
# goaccess access.log -a -d -o json > report.json

To generate a CSV file:
# goaccess access.log --no-csv-summary -o csv > report.csv

GoAccess also allows great flexibility for real-time filtering and parsing. For instance, to quickly diagnose issues by monitoring logs since goaccess was started:
# tail -f access.log | goaccess -

Even better, to filter while keeping the pipe open to preserve real-time analysis, we can use tail -f together with a pattern-matching tool such as grep, awk, sed, etc.:
# tail -f access.log | grep -i --line-buffered 'firefox' | goaccess --log-format=COMBINED -

Or, to parse from the beginning of the file while keeping the pipe open and applying a filter:
# tail -f -n +0 access.log | grep -i --line-buffered 'firefox' | goaccess -o report.html --real-time-html -

MULTIPLE LOG FILES

There are several ways to parse multiple logs with GoAccess. The simplest is to pass multiple log files to the command line:
# goaccess access.log access.log.1

It’s even possible to parse files from a pipe while reading regular files:
# cat access.log.2 | goaccess access.log access.log.1 -

Note that the single dash is appended to the command line to let GoAccess know that it should read from the pipe.

Now if we want to add more flexibility to GoAccess, we can do a series of pipes. For instance, if we would like to process all compressed log files access.log.*.gz in addition to the current log file, we can do:
# zcat access.log.*.gz | goaccess access.log -

Note: On Mac OS X, use gunzip -c instead of zcat.
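
The pattern of combining a regular file with piped data can be tried without GoAccess installed. The sketch below is only an illustration: it builds two throwaway files, decompresses with gunzip -c, and uses cat as a stand-in for goaccess, since cat also reads standard input when given a single dash. The file contents are made up.

```shell
# Build two tiny sample files so the sketch runs on its own.
workdir=$(mktemp -d)
printf 'line from rotated log\n' | gzip -c > "$workdir/access.log.1.gz"
printf 'line from current log\n' > "$workdir/access.log"

# gunzip -c writes the decompressed data to stdout.
# cat reads the named file first, then stdin (the single dash).
combined=$(gunzip -c "$workdir"/access.log.*.gz | cat "$workdir/access.log" -)
printf '%s\n' "$combined"

# With GoAccess installed, the last stage would instead be:
#   gunzip -c access.log.*.gz | goaccess access.log -
```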

REAL TIME HTML OUTPUT

GoAccess can output real-time data in the HTML report. You can even email the HTML file, since it is a single file with no external file dependencies. How neat is that!

The process of generating a real-time HTML report is very similar to the process of creating a static report. Only --real-time-html is needed to make it real-time.
# goaccess access.log -o /usr/share/nginx/html/your_site/report.html --real-time-html

By default, GoAccess will use the host name of the generated report. Optionally, you can specify the URL to which the client’s browser will connect. See http://goaccess.io/faq for a more detailed example.
# goaccess access.log -o report.html --real-time-html --ws-url=goaccess.io

By default, GoAccess listens on port 7890. To use a different port, specify it as follows (make sure the port is open):
# goaccess access.log -o report.html --real-time-html --port=9870

And to bind the WebSocket server to an address other than 0.0.0.0, you can specify it as:
# goaccess access.log -o report.html --real-time-html --addr=127.0.0.1

Note: To output real-time data over a TLS/SSL connection, you need to use --ssl-cert= and --ssl-key=.

WORKING WITH DATES

Another useful pipe is filtering the web log by date.

The following will get all HTTP requests starting on 05/Dec/2010 until the end of the file.
# sed -n '/05\/Dec\/2010/,$ p' access.log | goaccess -a -

Or, using relative dates such as yesterday or one week ago:
# sed -n '/'$(date '+%d\/%b\/%Y' -d '1 week ago')'/,$ p' access.log | goaccess -a -

If we want to parse only a certain time-frame from DATE a to DATE b, we can do:
# sed -n '/5\/Nov\/2010/,/5\/Dec\/2010/ p' access.log | goaccess -a -
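
To see what a sed address range actually selects, here is a small sketch run against fabricated log lines (the addresses and requests are invented for illustration). It uses the zero-padded 05 in both patterns, since an unpadded 5 would also match days such as 15 and 25.

```shell
# Sample log lines, fabricated for illustration only.
log=$(mktemp)
cat > "$log" <<'EOF'
192.0.2.1 - - [04/Nov/2010:10:00:00 +0000] "GET /a HTTP/1.1" 200 10
192.0.2.1 - - [05/Nov/2010:10:00:00 +0000] "GET /b HTTP/1.1" 200 10
192.0.2.1 - - [20/Nov/2010:10:00:00 +0000] "GET /c HTTP/1.1" 200 10
192.0.2.1 - - [05/Dec/2010:10:00:00 +0000] "GET /d HTTP/1.1" 200 10
192.0.2.1 - - [05/Dec/2010:11:00:00 +0000] "GET /e HTTP/1.1" 200 10
192.0.2.1 - - [06/Dec/2010:10:00:00 +0000] "GET /f HTTP/1.1" 200 10
EOF

# Print from the first 05/Nov line through the first 05/Dec line.
selected=$(sed -n '/05\/Nov\/2010/,/05\/Dec\/2010/ p' "$log")
printf '%s\n' "$selected"
```

A sed range closes at the first line that matches the end pattern, so in this sample only the first request from 05/Dec (/d) is selected and the later one (/e) is left out.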

VIRTUAL HOSTS

Assume your log contains the virtual host field, for instance:
vhost.io:80 8.8.4.4 - - [02/Mar/2016:08:14:04 -0600] "GET /shop HTTP/1.1" 200 615 "-" "Googlebot-Image/1.0"

And you would like to append the virtual host to the request in order to see which virtual host the top URLs belong to:
# awk '$8=$1$8' access.log | goaccess -a -
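
The effect of the awk expression can be checked on the sample line alone, without GoAccess. This is a minimal sketch; the variable names are arbitrary.

```shell
# The sample log line, with plain ASCII quotes and dashes.
line='vhost.io:80 8.8.4.4 - - [02/Mar/2016:08:14:04 -0600] "GET /shop HTTP/1.1" 200 615 "-" "Googlebot-Image/1.0"'

# Prefix field 8 (the request path) with field 1 (the virtual host).
# The assigned value is a non-empty string, so awk treats the
# expression as true and prints the rebuilt line.
rewritten=$(printf '%s\n' "$line" | awk '$8=$1$8')
printf '%s\n' "$rewritten"
```

Field 8 of the rewritten line becomes vhost.io:80/shop, which is what GoAccess would then count as the request.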

To exclude a list of virtual hosts you can do the following:
# grep -v "`cat exclude_vhost_list_file`" vhost_access.log | goaccess -

FILES & STATUS CODES

To parse specific pages, e.g., page views, html, htm, php, etc. within a request:
# awk '$7~/\.html|\.htm|\.php/' access.log | goaccess -

Note that $7 is the request field for the common and combined log formats (without a virtual host). If your log includes the virtual host, then you probably want to use $8 instead. It’s best to check which field you are targeting, e.g.:
# tail -10 access.log | awk '{print $8}'

Or to parse a specific status code, e.g., 500 (Internal Server Error):
# awk '$9~/500/' access.log | goaccess -
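
The ~ operator performs a regular-expression match, so /500/ selects any value of field 9 that contains 500. If an exact comparison is preferred, one possible variant is sketched below on fabricated log lines; whether the stricter form is needed depends on your log layout.

```shell
# Sample log lines, fabricated for illustration only.
log=$(mktemp)
cat > "$log" <<'EOF'
192.0.2.1 - - [02/Mar/2016:08:14:04 -0600] "GET /a HTTP/1.1" 200 615
192.0.2.1 - - [02/Mar/2016:08:14:05 -0600] "GET /b HTTP/1.1" 500 0
192.0.2.1 - - [02/Mar/2016:08:14:06 -0600] "GET /c HTTP/1.1" 404 12
EOF

# Keep only lines whose status field (field 9) is exactly 500.
errors=$(awk '$9 == 500' "$log")
printf '%s\n' "$errors"

# With GoAccess installed, the filtered lines would be piped on:
#   awk '$9 == 500' access.log | goaccess -
```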

SERVER

Also, it is worth pointing out that if we want to run GoAccess at lower priority, we can run it as:
# nice -n 19 goaccess -f access.log -a

and if you don’t want to install it on your server, you can still run it from your local machine:
# ssh root@server 'cat /var/log/apache2/access.log' | goaccess -a -

INCREMENTAL LOG PROCESSING

GoAccess has the ability to process logs incrementally through the on-disk B+Tree database. It works in the following way:
1. A data set must be persisted first with --keep-db-files, then the same data set can be loaded with --load-from-disk.
2. If new data is passed (piped or through a log file), it will append it to the original data set.
3. To preserve the data at all times, --keep-db-files must be used.
4. If --load-from-disk is used without --keep-db-files, database files will be deleted upon closing the program.

Examples
// last month access log
# goaccess access.log.1 --keep-db-files

then, load it with
// append this month access log, and preserve new data
# goaccess access.log --load-from-disk --keep-db-files

To read persisted data only (without parsing new data)
# goaccess --load-from-disk --keep-db-files

Contributing

Any help on GoAccess is welcome. The most helpful way is to try it out and give feedback. Feel free to use the GitHub issue tracker and pull requests to discuss and submit code changes.

Enjoy!
https://github.com/allinurl/goaccess
