What's a current method of logging http services?
  • Is the best practice for logging Apache still the tried-and-true log file, followed by running something like awstats on the logs? It's been years since I've thought about using anything more than Google Analytics. The log file + awstats method seemed really crude, even back then. I believe WebTrends is supposed to be pretty nice, but costs $$$.

    What are my options and what is considered a best practice?

    Thanks
  • Logging usually becomes a bottleneck when traffic increases. It depends on what you're trying to achieve with logging as well. Google analytics is pretty solid, and free.

  • For comparison, haveamint.com looks great. It's designed by someone well respected and has great features.

    I guess the question is, do these options completely make log parsing obsolete?

    When using google analytics, what I find lacking is real-time data (zomg, how much traffic is hitting right now).

    Also, I am disappointed in the lack of cross-sectional data analysis of logs.
    - "Of new, unique users today, what are the page counts by content?"
    - "Of visitors from this referrer, how many are returning vs. new?"

    And the lack of performance metrics bugs me. Should I be looking for something for cacti for that?
    - "What pages took longest to load and deliver?"
    - "Map my traffic during this time period to cpu load, network traffic, etc etc"
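
The "what pages took longest" question can be answered from the access log itself, provided the log records timing. A minimal Python sketch, assuming the combined LogFormat has been extended with `%D` (time taken to serve the request, in microseconds) as a final field. The sample lines, the regex, and the `slowest_pages` helper are all hypothetical and not part of any existing tool:

```python
import re
from collections import defaultdict

# Assumed format: Apache "combined" plus a trailing %D field (microseconds).
LINE_RE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)" (?P<micros>\d+)$'
)

# Made-up sample data for illustration only.
SAMPLE = [
    '10.0.0.1 - - [10/Oct/2007:13:55:36 -0700] "GET /index.html HTTP/1.1" 200 2326 "-" "Mozilla/5.0" 12000',
    '10.0.0.2 - - [10/Oct/2007:13:55:40 -0700] "GET /report HTTP/1.1" 200 51200 "http://example.com/" "Mozilla/5.0" 900000',
    '10.0.0.1 - - [10/Oct/2007:13:56:02 -0700] "GET /index.html HTTP/1.1" 200 2326 "-" "Mozilla/5.0" 8000',
]


def slowest_pages(lines, top=3):
    """Return (path, average milliseconds) pairs, slowest first."""
    totals = defaultdict(lambda: [0, 0])  # path -> [total microseconds, hits]
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match is None:
            continue  # skip lines that do not fit the assumed format
        entry = totals[match.group("path")]
        entry[0] += int(match.group("micros"))
        entry[1] += 1
    averages = [
        (path, total / count / 1000.0) for path, (total, count) in totals.items()
    ]
    averages.sort(key=lambda item: item[1], reverse=True)
    return averages[:top]


if __name__ == "__main__":
    for path, millis in slowest_pages(SAMPLE):
        print(f"{path}: {millis:.1f} ms on average")
```

Note that `%D` only covers the time the server spent on the request, not how long the page took to render in the visitor's browser, and mapping traffic to CPU or network load would still need a separate monitoring tool.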
  • I've been using awstats here at work. It definitely seems rather dated compared to what I've seen of mint...but maybe that's just the "look" of it...
  • I'm curious about the performance issue. If your server gets hit with a ton of requests, won't it be a performance drag to make a call out to Google on every page? Depending on the responsiveness of Google's servers, couldn't that be even more of a performance hit than logging to the local disk?
  • Sam, google analytics works via javascript and therefore introduces zero load to your server.

  • Yes, but if that JavaScript takes a while to do its job (due to an unresponsive Google server), won't the client's browser block? I've seen page load times slow down when google analytics bogs down. Or am I imagining that?

    And if apache processes start getting tied up serving pages that are waiting on Google's servers, then the web server could easily hit a MaxClients or other memory-related limit and slow to a crawl. Basically, anything that prevents that apache process from freeing up to handle another request, be that time spent logging or time spent making a JavaScript request to an overloaded Google server, could have a negative effect on web server performance.
  • Would Webalizer (a logfile analyzer) work for you? I find the reporting to be reasonable, and pretty enough to be used for reports to my clients.. It's written in C rather than Perl, so it's fast.. It also has the magic sound of the word Free... 8-)
  • @gadget - i love the word free :)



    Posted By: artagesw
    Yes, but if that JavaScript takes a while to do its job (due to an unresponsive Google server), won't the client's browser block? I've seen page load times slow down when google analytics bogs down. Or am I imagining that?

    And if apache processes start getting tied up serving pages that are waiting on Google's servers, then the web server could easily hit a MaxClients or other memory-related limit and slow to a crawl.


    Finally something I know I can contribute back to these forums! Google Analytics works by instructing you to include a script block which requests a javascript file from google. The request is made by the client (browser) and not your webserver. Your webserver is only concerned with handing the client the text file output (along with other media which is sourced to you).

    For example, check out the source at slicehost.com. At the end of the page you see this:
    <script src="http://www.google-analytics.com/urchin.js" type="text/javascript">
    </script>
    <script type="text/javascript">
    _uacct = "UA-512554-1";
    urchinTracker();
    </script>
    </body>
    </html>


    That src="http://www.google-analytics.com/urchin.js" bit tells the browser to request urchin.js from google-analytics.com. Other resources on that page are requested from slicehost (like images).

    So that's how it doesn't affect your webserver's performance at all.

    Now, on the other hand, you are experiencing a bit of a hang at the end of page loads as they sometimes wait for Google. But no worries, that load is on Google, not the originating webserver. That's actually why they tell you to put it at the very end, so that all your other elements load first.


    As for the original discussion, I doubt that any flat-file log analysis system is capable of the features I'm looking for (e.g. cross-sectional analysis), since the data isn't stored in a relational db. Unfortunately I haven't seen any free packages which work off a database. And it seems like a fairly uncommon practice to get apache to use mod_log_mysql to log to a db.
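
One way to get relational queries without changing how Apache logs is to load the existing flat file into a database after the fact. A rough Python sketch using SQLite, assuming the standard combined log format, treating each client IP as one "visitor", and counting a visitor as "returning" if the same IP appears on an earlier day in the loaded data. The `hits` table, the helper names, and the sample lines are hypothetical:

```python
import re
import sqlite3

MONTHS = {name: i for i, name in enumerate(
    "Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec".split(), start=1)}

# Assumed format: Apache "combined" (host, time, request, status, size, referer, ...).
LINE_RE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"\S+ (?P<path>\S+)[^"]*" \d{3} \S+ "(?P<referer>[^"]*)"'
)

# Made-up sample data for illustration only.
SAMPLE = [
    '10.0.0.1 - - [09/Oct/2007:09:00:00 -0700] "GET / HTTP/1.1" 200 2326 "-" "Mozilla/5.0"',
    '10.0.0.1 - - [10/Oct/2007:13:55:36 -0700] "GET /pricing HTTP/1.1" 200 4000 "http://example.com/" "Mozilla/5.0"',
    '10.0.0.2 - - [10/Oct/2007:13:55:40 -0700] "GET /pricing HTTP/1.1" 200 4000 "http://example.com/" "Mozilla/5.0"',
    '10.0.0.2 - - [10/Oct/2007:13:56:02 -0700] "GET /signup HTTP/1.1" 200 1200 "http://example.com/" "Mozilla/5.0"',
    '10.0.0.3 - - [10/Oct/2007:14:00:00 -0700] "GET / HTTP/1.1" 200 2326 "-" "Mozilla/5.0"',
]


def iso_day(apache_time):
    """Turn '10/Oct/2007:13:55:36 -0700' into '2007-10-10' so days sort as text."""
    day, month, year = apache_time.split(":", 1)[0].split("/")
    return f"{year}-{MONTHS[month]:02d}-{day}"


def load_hits(conn, lines):
    """Parse log lines and insert one row per request into the hits table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS hits "
        "(host TEXT, day TEXT, path TEXT, referer TEXT)"
    )
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match is None:
            continue  # skip lines that do not fit the assumed format
        conn.execute(
            "INSERT INTO hits VALUES (?, ?, ?, ?)",
            (match.group("host"), iso_day(match.group("time")),
             match.group("path"), match.group("referer")),
        )
    conn.commit()


def new_vs_returning(conn, referer, day):
    """Of visitors arriving from `referer` on `day`, count new vs. returning."""
    query = """
        SELECT
            SUM(CASE WHEN first_day < ? THEN 1 ELSE 0 END),
            SUM(CASE WHEN first_day = ? THEN 1 ELSE 0 END)
        FROM (
            SELECT DISTINCT h.host AS host, f.first_day AS first_day
            FROM hits AS h
            JOIN (SELECT host, MIN(day) AS first_day
                  FROM hits GROUP BY host) AS f ON f.host = h.host
            WHERE h.referer = ? AND h.day = ?
        ) AS visitors
    """
    row = conn.execute(query, (day, day, referer, day)).fetchone()
    return {"returning": row[0] or 0, "new": row[1] or 0}


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    load_hits(conn, SAMPLE)
    print(new_vs_returning(conn, "http://example.com/", "2007-10-10"))
```

Identifying visitors by IP is crude (proxies and shared addresses blur it), so this is closer to a starting point than a replacement for a cookie-based tracker.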
  • @combhua

    If you haven't tried the "new" js version of google analytics, I highly recommend it. It's an update to the old urchin product which merges in some cool features from measuremap.
    The ecommerce tracking is pretty darn cool too.

    At my day job, we pay for stats from Omniture at about $54k per year (crazy) and google analytics is the closest thing I've seen in a free product to what you get for mass dollars. The only real problem is the 24hr lag time that you have to wait to see the results.

    Personally I still keep server side logs to keep track of what's going on error/access-wise.
  • Posted By: combhua
    Finally something I know I can contribute back to these forums! Google Analytics works by instructing you to include a script block which requests a javascript file from google. The request is made by the client (browser) and not your webserver. Your webserver is only concerned with handing the client the text file output (along with other media which is sourced to you).

    Doh! I knew that. I guess it's been a long day. Thanks for setting me (and other readers) straight, combhua.
  • While we're on the topic, does anybody know of a tool to simply analyze the amount of bandwidth that a particular website is using each day/month/year etc? Google Analytics has been perfect for all my other needs and generating all these other stats from the log seems like it's a little overkill for my situation.
  • Webalizer fits that bill.. :-D
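
If bandwidth is the only number needed, a few lines of script over the access log may also do. A minimal Python sketch, assuming the common or combined log format, where the field after the status code is the response size in bytes (excluding headers, with `-` when nothing was sent), so the totals slightly undercount real traffic. The `bytes_per_day` helper and the sample lines are hypothetical:

```python
import re
from collections import defaultdict

# Matches the shared prefix of Apache's common and combined formats.
LINE_RE = re.compile(
    r'^\S+ \S+ \S+ \[(?P<time>[^\]]+)\] "[^"]*" \d{3} (?P<size>\S+)'
)

# Made-up sample data for illustration only.
SAMPLE = [
    '10.0.0.1 - - [10/Oct/2007:13:55:36 -0700] "GET /index.html HTTP/1.1" 200 2326 "-" "Mozilla/5.0"',
    '10.0.0.2 - - [10/Oct/2007:13:55:40 -0700] "GET /report HTTP/1.1" 200 51200 "-" "Mozilla/5.0"',
    '10.0.0.1 - - [11/Oct/2007:08:00:00 -0700] "GET /index.html HTTP/1.1" 304 - "-" "Mozilla/5.0"',
    '10.0.0.3 - - [11/Oct/2007:08:05:00 -0700] "GET /logo.png HTTP/1.1" 200 1000 "-" "Mozilla/5.0"',
]


def bytes_per_day(lines):
    """Sum response sizes per calendar day, keyed like '10/Oct/2007'."""
    totals = defaultdict(int)
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match is None:
            continue  # skip lines that do not fit the assumed format
        day = match.group("time").split(":", 1)[0]
        size = match.group("size")
        totals[day] += 0 if size == "-" else int(size)
    return dict(totals)


if __name__ == "__main__":
    for day, total in bytes_per_day(SAMPLE).items():
        print(f"{day}: {total / 1024:.1f} KiB")
```

Grouping by month or year would only need a different key, for example the part of the date after the first slash.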
  • @Gadget:

    I agree with you; Webalizer is a decent way of viewing your logs, and has a price that is hard to beat. I've never seen any (free-as-in-beer) analyzer come even close. I believe that if you want something better you'd have to pay for it or roll your own.