(fingerprint: B650 E003 4C51 B305 AC51 A17F B2C8 55ED 4C91 E8E3)
northern Los Angeles area
Designing, building, operating and securing Internet-scale services since before there was a cloud, for clients in a wide range of industries (from finance to advertising to entertainment to health care to ISPs to language) for 20 years. I specialize in leading teams on the design of new application infrastructures and the planning and execution of datacenter and cloud migrations.
I’ve worked in orgs from 20 to 10,000 people, and built infrastructures that range from a few dozen servers in a closet, to many thousands of servers in colocation facilities (or AWS availability zones!) on multiple continents around the world serving tens of millions of end users. I’ve worked on teams ranging from one (as an independent consultant) to several dozen people, usually in a leadership role.
I know Internet systems, services and protocols backwards and forwards, and can leverage them to support any business objective.
Will trade elegant solutions for interesting technical challenges.
Specialties: Amazon Web Services, system and application architecture, security, legal/regulatory compliance, scalable infrastructures, troubleshooting/detective work/software archaeology, Internet operations, technology leadership.
AWS Certified Solutions Architect Associate - AWS-ASA–29910
SDL - Maidenhead, U.K. (July 2013 - )
Principal Hosting Engineer and Team Lead (February 2014 - )
- I lead a team of engineers that manages a large-scale machine translation infrastructure. My team is also responsible for the design, build-out, consolidation and migration of this and other environments around the world, in support of the company’s mission to help businesses engage with their customers on a global basis, across channels, cultures and languages.
- I was technical lead on a datacenter migration that won “Cloud Journey of the Year” in 2016 from Datacenter Dynamics. SDL took first place from among dozens of entrants for our migration from colocation to private cloud, which was managed by a small team and completed on time and under budget.
Senior Hosting Engineer (July 2013 - February 2014)
- supported both the pure science R&D environments and the production infrastructure for machine learning and statistics-based language translation. I spent my time managing our NetApp and VMware infrastructures and doing config mgmt, capacity planning, technology roadmap/strategy, performance testing, production troubleshooting, and building/consolidating datacenter environments. F5, vSphere, NetApp, Linux (CentOS/RHEL), SaaS, VMware private cloud, Hadoop, Ansible, ahoy!
LivingSocial - Washington, D.C. (April 2013 - July 2013)
Senior Ops Engineer
- managed one of the Internet’s larger MySQL installations (dozens of clusters, each containing a multi-master setup and 2 to 6+ slaves; dozens of TB of data stored, managed and backed up) in one of the Internet’s larger Ruby on Rails shops, using Chef and GitHub Enterprise. Migrated authoritative and recursive DNS services from BIND to PowerDNS on Rails. Worked on an entirely distributed operations team spanning 5 time zones.
OpenX, Inc. - Pasadena, CA (June 2012 - April 2013)
Site Reliability Engineer
- member of a six-person SRE team that grew an infrastructure from 4000 to 5500+ servers in six months using slack, Perl, MySQL, SSH and rsync. That infrastructure (in six colo facilities on three continents) enabled apps built on Erlang, Cassandra, Riak, Hadoop, memcache and Akamai to serve tens of millions of end users (at peak, around 200K rps) in under 200 milliseconds per request (the SLA threshold set by most of our clients). We managed petabytes of storage (in multiple Hadoop grids of 400–600 servers each) and multiple gigabits of sustained traffic to the Internet. OpenX served over a trillion requests the month after I started.
Intuit Financial Services - Westlake Village, CA (November 2008 - May 2012)
Senior App Ops Engineer (Compliance Service Delivery)
Senior member of Compliance Service Delivery, the team responsible for shepherding new applications from inception/design/planning through build/test and QA into production release and handoff to the maintenance team, with a focus on apps that faced legal or regulatory compliance (PCI/SOX/FFIEC/SSAE16/etc.).
My team acted as both technical leads for all questions and as a “glue layer” or liaison between the other teams involved in project design: engineering, business, security, project management, networking, technical support, operations and customer service. The nature of the team’s role required all members to have a very broad knowledge that encompassed all the other teams we coordinated with, and an ability to effectively translate - we needed to have a strong understanding of engineering, security principles, business and product requirements, network design and interaction with a multitude of existing legacy products, all in service of flawless delivery for the customers (on time, under budget, reliable, scalable, no outages).
*highlight 1*: technical lead on the project team responsible for the design and implementation of the company’s first-ever highly-available, horizontally-scalable platform for Internet banking. Along the way, we discovered and squashed countless bugs both in our own legacy platforms and in Apache itself (several were sent back upstream to be included in future Apache releases, where appropriate). As a prerequisite, we designed and implemented a new centralized logging system to handle the audit trail for IB activity.
*highlight 2*: played “scout” role as initial team member involved in planning and migration from legacy datacenters to new, active-active platform in new datacenters (transforming app platform while migrating to new facilities). The difficulty involved was magnified by the complexity, size and byzantine nature of the legacy platforms and networks. Wrote voluminous documentation to train the rest of my team on SOPs, best practices and gotchas related to the transition to the new datacenter platform.
Lynda.com - Ventura, CA (March 2008 - October 2008)
part of Lynda.com’s first formal IT team (following 12 years of intermittent IT oversight); significant challenges involved in maintaining service availability while building out replacement infrastructure.
designed and built split-horizon DNS; built Red Hat template for VMware ESX; built and migrated to new mailserver and collaboration software; built, compared and maintained a number of local and remote-hosted network, systems and service monitoring utilities; built VMware ESX and VirtualCenter systems; deployed IP mgmt software; deployed CVS repository; deployed RANCID config mgmt for network gear; designed and implemented performance shootout among replacement CDN candidates while simultaneously testing competing web metrics providers.
maintained a 120TB Xsan, Cisco firewalls/routers, HP switches, wireless and wired links between multiple campus locations, a 40TB hot mirror of the critical Xsan subsets (Sun X4500 running my own custom-built synchronization software), and several pieces of critical interoperability gear serving a Linux/OS X/Windows mixed environment.
automated, consolidated, documented and simplified almost every aspect of what was initially a highly chaotic technology environment; significantly improved reliability and service quality to both internal and external customers; planned future site buildouts and network expansions for both SAN and IP networks.
Move, Inc. - Westlake Village, CA (April 2006 - March 2008)
UNIX System Administrator IV - Team Lead
During my tenure at Move, I played a lead role in one of the largest data center migrations of my career, as well as a number of other complex and high-visibility projects:
hired as a UNIX sysadmin and transitioned to the storage engineering team during my second year, while still training new sysadmins on the UNIX team. I trained half a dozen new sysadmins in my time at Move, and mentored many others outside my team on a wide variety of technology topics.
heavily involved in network engineering, site performance and capacity planning; made frequent contributions to the security team; was lead on the VMware and infrastructure (UNIX/core services) teams and coordinated several cross-org projects between software dev, ops, data aggregation, storage and network engineering.
created and maintained the infrastructure team’s wiki and CVS repository (and wrote the majority of the docs and nearly all of the code); reverse-engineered and rebuilt undocumented legacy systems following the departure of other admins; and worked on a team that reduced front-page load times from 4+ seconds to 0.8 seconds (I did the R&D and managed transition of media assets from in-house webservers to commercial CDNs).
I was part of the core team that designed, built and transitioned to what was, at the time, the world’s largest VMware VI3 production installation. We migrated hundreds of applications, dozens of verticals and scores of terabytes of production data and live sites from LA to Phoenix, with near-zero downtime, and with significant increases in performance, reliability, scalability and security.
I was responsible for much of the next-gen architecture for homepics.realtor.com (now p.rdcpix.com), a 260-million object, 2.5TB data set that, prior to redesign, was crashing the fastest NetApp filers on the market. I personally tracked down the source of the problem, at the filesystem/memory allocation level, and worked with NetApp engineering to develop workarounds while designing a permanent solution with our software engineering and data agg teams. (A solution that scales to 16 billion objects while maintaining a constant memory footprint!)
Previous positions are listed in my LinkedIn profile.
Skills (from familiar to highly proficient, in no particular order)
- AWS Certified Solutions Architect Associate - AWS-ASA–29910
- VMware certified on VI3 (VCP #10952)
- ITIL 3.0 certified (2011)
- big data/Internet scale: the majority of my experience in the last decade has been in building and operating web-scale infrastructures that served millions of end users.
- UNIX derivatives: FreeBSD, OpenBSD, Linux, OS X, Solaris, AIX
- open source software: Apache, nginx, RDBMSes (MySQL, PostgreSQL, Oracle), NoSQL (Cassandra, Riak, Redis), mail servers (sendmail, Postfix, qmail), version control (CVS, Subversion, Git)
- scripting: Perl, Bourne shell, Python, PHP, Ruby
- storage: NetApp, ZFS, EMC SANs; iSCSI, NFS and SAN environments
- scalability: load balancers (F5, Citrix, Radware, HAProxy), CDNs (Limelight, Akamai, Panther, CDNetworks, etc.)
- rule zero: it has to work; rule one: use the right tool for the job. (see also: the rules)
Missouri Southern State University - 1996 - 1999
- B.A. Mass Communications, 1999
- partial credit towards two additional degrees in computer science and network administration
John Brown University - 1994 - 1996