Who am I?

  • Linux Kernel
  • Rackspace:
    • I work on Cloud Monitoring
    • API Driven, Monitoring as a Service.
    • Distributed across all Rackspace data centers.
    • Async events trigger support issues.
    • When things break, we can be called in.

What is monitoring?

Awareness of the system

  • External monitoring
  • Internal monitoring

The goal:

  • Move past top
  • Move past ping

What is external monitoring?

  • ad-hoc monitoring
    • ping, nmap, curl
  • pollers
    • Nagios, Ganglia, Cacti, Zenoss, noitd

What is internal monitoring?

  • run scripts
  • gather statistics
  • push to monitoring server

  • examples of internal monitors

    • NRPE, NSClient++, lots of proprietary ones
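
A minimal sketch of the run / gather / push cycle in plain Lua. The check functions, their values, and the `push` callback are all hypothetical stand-ins; a real agent would send the report over the network.

```lua
-- Hypothetical checks: each returns a table of metric name -> value.
local checks = {
  load = function() return { load1 = 0.42 } end,      -- stand-in value
  disk = function() error("disk check failed") end,   -- simulates a broken script
}

-- Run every check, trapping errors so one bad script cannot kill the agent.
local function gather(checks)
  local report = {}
  for name, fn in pairs(checks) do
    local ok, result = pcall(fn)
    if ok then
      report[name] = { status = "ok", metrics = result }
    else
      report[name] = { status = "error", message = tostring(result) }
    end
  end
  return report
end

-- `push` is a placeholder for sending the report to the monitoring server.
local pushed = {}
local function push(report) pushed[#pushed + 1] = report end

push(gather(checks))
```

Wrapping each check in `pcall` is one way to keep a failing user script from taking the whole monitor down.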

Alerting and Historical Data

Using monitored data: historical

  • Trending data
    • Immediate, hourly, daily, etc.
  • Analyze for long running problems
  • Generate pretty graphs

Using monitored data: alerting

  • Pipeline looking at data
  • Language to define alert states
  • Notifications sent on alert
    • SMS
    • Webhooks
    • Email
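
The pipeline above could be sketched like this. The rule table, thresholds, and `notify` function are assumptions for illustration, not the alert language used by the real service.

```lua
-- Hypothetical alert rules: each maps a metric value to a state.
local rules = {
  { metric = "cpu_percent", state = "CRITICAL", when = function(v) return v >= 95 end },
  { metric = "cpu_percent", state = "WARNING",  when = function(v) return v >= 80 end },
}

-- Return the first matching state for a metric, or "OK" if no rule fires.
local function evaluate(metric, value)
  for _, rule in ipairs(rules) do
    if rule.metric == metric and rule.when(value) then
      return rule.state
    end
  end
  return "OK"
end

-- Notify only on state changes, so a sustained problem sends one alert.
local last_state, sent = {}, {}
local function process(metric, value, notify)
  local state = evaluate(metric, value)
  if state ~= (last_state[metric] or "OK") then
    notify(metric, state, value)
  end
  last_state[metric] = state
end

-- Stand-in for SMS, webhook, or email delivery.
local function notify(metric, state, value)
  sent[#sent + 1] = metric .. " is " .. state
end

for _, v in ipairs({ 10, 85, 90, 99, 20 }) do
  process("cpu_percent", v, notify)
end
```

Only the transitions (OK to WARNING, WARNING to CRITICAL, CRITICAL to OK) produce a notification in this sketch.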

What does a monitoring agent do?

  • Enables white box monitoring
  • Report CPU/Memory/Disks/Database
  • User provided plugins or scripts
  • Daemonized or run from cron

Virgo

Design constraints

  • Low memory usage (< 5 MB)
  • Simple secure “proxyable” protocol
  • High level scripting language
  • Statically linked requirements
  • Windows, Linux, and OS X

Design decisions

  • Avoid C++; use Lua
  • Only depend on libc
  • SSL + newline-delimited JSON + JSON-RPC
  • libuv
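
Newline framing means each message is one line of JSON, so the receiver only has to split the byte stream on "\n". A minimal sketch, assuming a hypothetical `new_framer` helper (not taken from the Virgo source) and leaving JSON decoding out:

```lua
-- Accumulates bytes from a stream and yields one complete line per message.
local function new_framer(on_message)
  local buffer = ""
  return function(chunk)
    buffer = buffer .. chunk
    while true do
      local nl = buffer:find("\n", 1, true)   -- plain search, no patterns
      if not nl then break end
      local line = buffer:sub(1, nl - 1)
      buffer = buffer:sub(nl + 1)
      if #line > 0 then on_message(line) end
    end
  end
end

local messages = {}
local feed = new_framer(function(line) messages[#messages + 1] = line end)

-- Chunks can split a message anywhere, as socket reads do.
feed('{"id":0,"method":"handsh')
feed('ake.hello"}\n{"id":1}\n{"id"')
feed(':2}\n')
```

After the three `feed` calls, `messages` holds three complete JSON strings, regardless of where the chunk boundaries fell.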

Memory Usage

Protocol Overview

Protocol Overview: Hello Request

{
  target = "endpoint",
  source = "agentA",
  id = 0,
  params = {
    agent_id = "agentA",
    process_version = "f451d7097edb197a9e08fa05cf5b0556ed15d7c7",
    token = "0000000000000000000000000000000000000000000000000000000000000000.7777",
    bundle_version = "0.1-75-gf451d70"
  },
  v = "1",
  method = "handshake.hello"
}

Protocol Overview: Hello Response

{
  target = "agentA",
  source = "endpoint",
  id = 0,
  result = { 
    heartbeat_interval = 1000
  },
  v = "1"
}
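
The response carries the same `id` as the request, which is what lets either side match replies to outstanding calls. A sketch of that bookkeeping, assuming a hypothetical `Connection` object (its name, fields, and `outbox` are illustrative, not Virgo's API):

```lua
-- Tracks outstanding requests so a response can be matched by its `id`.
local Connection = {}
Connection.__index = Connection

function Connection.new(source, target)
  return setmetatable({ source = source, target = target,
                        next_id = 0, pending = {}, outbox = {} }, Connection)
end

-- Build a request in the shape shown above and remember its callback.
function Connection:request(method, params, callback)
  local msg = { v = "1", id = self.next_id, source = self.source,
                target = self.target, method = method, params = params }
  self.pending[msg.id] = callback
  self.next_id = self.next_id + 1
  self.outbox[#self.outbox + 1] = msg   -- a real agent would serialize and send
  return msg
end

-- Dispatch a decoded response to the callback registered under the same id.
function Connection:on_response(msg)
  local callback = self.pending[msg.id]
  if not callback then return false end   -- unknown or duplicate id
  self.pending[msg.id] = nil
  callback(msg.error, msg.result)
  return true
end

local conn = Connection.new("agentA", "endpoint")
local heartbeat_interval
conn:request("handshake.hello", { agent_id = "agentA" }, function(err, result)
  heartbeat_interval = result.heartbeat_interval
end)
local handled = conn:on_response({ v = "1", id = 0, source = "endpoint",
                                   target = "agentA",
                                   result = { heartbeat_interval = 1000 } })
```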

Protocol Overview: Check Schedule

{
  "v": "1",
  "id": 2,
  "source": "endpoint",
  "target": "agentA",
  "result": {
    "checks": {
      "id": "ch1234",
      "type": "agent.cpu",
      "details": { "foo": "foo" },
      "period": 30,
      "timeout": 30,
      "disabled": false
    }
  },
  "error": null
}
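
Once decoded, `period` and `disabled` are enough to drive a simple scheduler. A sketch under stated assumptions: the checks arrive as a list of tables, time is a plain number of seconds, and the second check is invented for illustration.

```lua
-- Assumes `checks` has been decoded into a list of tables like the one above.
local function new_scheduler(checks, now)
  local entries = {}
  for _, check in ipairs(checks) do
    if not check.disabled then
      entries[#entries + 1] = { check = check, next_run = now + check.period }
    end
  end
  return entries
end

-- Return the ids of checks due at `now` and push each one forward a period.
local function due(entries, now)
  local ids = {}
  for _, entry in ipairs(entries) do
    if now >= entry.next_run then
      ids[#ids + 1] = entry.check.id
      entry.next_run = now + entry.check.period
    end
  end
  return ids
end

local entries = new_scheduler({
  { id = "ch1234", type = "agent.cpu", period = 30, timeout = 30, disabled = false },
  { id = "ch5678", type = "agent.cpu", period = 60, timeout = 30, disabled = true },  -- hypothetical
}, 0)
local at_10 = due(entries, 10)
local at_30 = due(entries, 30)
```

Disabled checks never enter the schedule, and `ch1234` first comes due at second 30.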

Protocol Overview: Proxyable

How is it built

Luvit

Untechnical Overview

Luvit is a platform for building your app

  • Scrawny
  • Awkward
  • Space Themed (lua)
  • <3 community
  • Familiar node APIs

Technical Overview

  • Lua using LuaJIT
  • low memory footprint
  • I/O driven event loop
  • Small simple C API
  • crypto, ssl, zlib, json bindings
  • tcp, http, dns protocol support
  • Windows, Linux, FreeBSD, and OS X

HTTP Server Example

local http = require("http")

http.createServer(function (req, res)
  local body = "Hello world\n"
  res:writeHead(200, {
    ["Content-Type"] = "text/plain",
    ["Content-Length"] = #body
  })
  res:finish(body)
end):listen(8080)

print("Server listening at http://localhost:8080/")

Lua

Lua - Javascript’s Long Lost Brazilian Cousin

  • Dynamic language
  • Floating point numbers only
  • First class functions
  • Lexical closures
  • Metatables
  • Embeddable
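
A few of these features in a short, self-contained sketch. The names (`make_counter`, `defaults`, `check`) are illustrative only.

```lua
-- First-class function plus lexical closure: each counter keeps its own `count`.
local function make_counter()
  local count = 0
  return function()
    count = count + 1
    return count
  end
end

local a, b = make_counter(), make_counter()
a(); a()
local a_value, b_value = a(), b()   -- a has been called three times, b once

-- Metatable: __index supplies defaults for keys missing from the table.
local defaults = { period = 30, timeout = 30 }
local check = setmetatable({ period = 60 }, { __index = defaults })

-- Numbers are doubles in Lua 5.1 and LuaJIT, so division does not truncate.
local half = 7 / 2
```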

Example code

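GroundControl = {}

-- Constructor: called with a dot, so there is no implicit self.
function GroundControl.new()
  local obj = {}  -- local, so it does not leak into the global table
  obj.heard_major_tom = false
  -- Lookups that miss on obj fall through to GroundControl.
  setmetatable(obj, { __index = GroundControl })
  return obj
end

-- Colon syntax declares an implicit first parameter named self.
function GroundControl:heard()
  print(self.heard_major_tom)
end

a = GroundControl.new()

a:heard()  -- same as a.heard(a); prints false
a.heard()  -- this will error: self is nil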

libuv

Basic idea

  • Two types of events in the loop:
    • I/O on file descriptors
    • Timers for future events
  • Callbacks are attached to these events
  • epoll()/completion ports/kqueue() wait
  • callback is called on the correct event

Event loop pseudo code

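while (1) {
  /* Block until a descriptor is ready or the next timer is due. */
  nfds = poll(fds, next_timer());

  /* Nothing ready means the wait timed out: a timer fired. */
  if (nfds == 0)
     timer_callback();

  for (n = 0; n < nfds; ++n) {
     if (fds[n] == READY)
        callbacks[n]();
  }
}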
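
The same idea as a toy loop in plain Lua, assuming a simulated clock in place of real polling. `Loop`, `set_timeout`, and `io_ready_at` are hypothetical names, not the libuv or Luvit API.

```lua
-- A toy loop with a simulated clock; a real loop blocks in the OS poller instead.
local Loop = {}
Loop.__index = Loop

function Loop.new()
  return setmetatable({ now = 0, timers = {}, ready = {} }, Loop)
end

function Loop:set_timeout(delay, callback)
  self.timers[#self.timers + 1] = { at = self.now + delay, callback = callback }
end

-- Simulate a descriptor becoming readable at time `at`.
function Loop:io_ready_at(at, callback)
  self.ready[#self.ready + 1] = { at = at, callback = callback }
end

-- Remove and return the earliest event across both queues, or nil when idle.
function Loop:next_event()
  local best_list, best_index
  for _, list in ipairs({ self.timers, self.ready }) do
    for i, ev in ipairs(list) do
      if not best_list or ev.at < best_list[best_index].at then
        best_list, best_index = list, i
      end
    end
  end
  if best_list then return table.remove(best_list, best_index) end
end

function Loop:run()
  while true do
    local ev = self:next_event()
    if not ev then break end   -- nothing left to wait for
    self.now = ev.at           -- "sleep" until the event fires
    ev.callback()
  end
end

local loop, log = Loop.new(), {}
loop:set_timeout(30, function() log[#log + 1] = "timer@" .. loop.now end)
loop:io_ready_at(10, function()
  log[#log + 1] = "io@" .. loop.now
  loop:set_timeout(5, function() log[#log + 1] = "timer@" .. loop.now end)
end)
loop:run()
```

Callbacks run in event-time order, and a callback may register more work, which is how the I/O callback at time 10 schedules a timer for time 15.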
      

Zip

  • Lua code lives in a zip file
  • Small file to do upgrades
  • Fewer filesystem headaches
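
One way Lua code can be loaded from a bundle rather than from loose files is a custom module searcher. This is a sketch of the general mechanism, not Virgo's actual loader: the `bundle` table stands in for the unpacked zip contents, and the module name is hypothetical.

```lua
-- Stand-in for the unpacked zip: module name -> Lua source text (hypothetical).
local bundle = {
  ["virgo_example.greeting"] =
    'return { hello = function(n) return "hello " .. n end }',
}

local compile = loadstring or load   -- Lua 5.1/LuaJIT vs. 5.2+

-- A searcher returns a loader function when it can supply the module,
-- or a string explaining why it cannot.
local function bundle_searcher(name)
  local source = bundle[name]
  if not source then
    return "\n\tno bundled module '" .. name .. "'"
  end
  return assert(compile(source, "=bundle:" .. name))
end

-- Put the bundle ahead of the filesystem searchers.
local searchers = package.loaders or package.searchers
table.insert(searchers, 2, bundle_searcher)

local greeting = require("virgo_example.greeting")
local text = greeting.hello("agent")
```

With the searcher installed, `require` resolves bundled modules without touching the filesystem, so an upgrade only has to swap one file.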

Sigar

  • Process information
  • Network configuration
  • CPU, swap, load average
  • Usage statistics

That's Virgo.

Demo: Fixture server just for OSCON <3

<Thank You!>

http://github.com/racker/virgo