CHAOS ENGINEERING: Companies, People, Tools & Practices (Training…

- - - - People
        
        Yury Izrailevsky
        VP, Cloud Computing and Platform Engineering
        
        https://fr.slideshare.net/AmazonWebServices/ent101-embracing-the-cloud-final
        
        Ariel Tseitlin
        Investor, entrepreneur, and accomplished technology executive
        Former Cloud Director at Netflix
        
        https://fr.slideshare.net/atseitlin/aws-reinvent-2012-chaos-monkey-the-netflix-simian-army
        
        Nora Jones
        Senior Chaos Engineer at Netflix, formerly at Jet.
        Co-author Chaos Engineering (O'Reilly 2017)
        https://fr.slideshare.net/InfoQ/choose-your-own-adventure-chaos-engineering
        
        Free eBook at O'Reilly
        
        Casey Rosenthal
        Philosopher. Traffic and Chaos Engineering Manager
        
        Principles of Chaos Engineering
        
        Aaron P Blohowiak
        Co-Author of O'Reilly's "Chaos Engineering". Work on distributed system reliability and design @ Netflix.
        O'Reilly Velocity San Jose 2017: Precision Chaos
        
        Lorin Hochstein
        Putting the engineering in computer science
        and the science in software eng. Academic refugee.
        Chaos engineer, Netflix
        
        Ali Basiri
        Senior Software Engineer
        Wreaking Havoc
        
        Greg Orzell
        Cloud Distributed Systems Architecture Consulting, at Crispy Mountain GmbH
        Founded the Simian Army
        
        Luke Kosewski
        Senior Software Engineer and a founding member of the Traffic & Chaos team at Netflix
        https://fr.slideshare.net/InfoQ/chaos-kong-endowing-netflix-with-antifragility
      - Tools
        
        Simian Army
        
        Latency Monkey
        Injects delays at the communication layer to test how well a system tolerates degraded performance in an external component it depends on, up to simulating a complete outage (an infinite delay), without having to ask the partner concerned to cut off its service.
        
        Chaos Monkey
        The first tool Netflix developed: it randomly selects instances in the production environment and deliberately takes them out of service.
        
        Chaos Kong
        King of the gorillas: takes down an entire Amazon region.
        
        ChAP: Chaos Automation Platform
        ChAP enables engineering teams to run Chaos Engineering experiments on live traffic in production in order to build confidence that their service will degrade gracefully when non-critical downstream services fail.
        https://arxiv.org/pdf/1702.05849.pdf
        
        FIT: Failure Injection Testing
        Platform that simplifies creation of failure within our ecosystem with a greater degree of precision for what we fail and who we will impact. FIT also allows us to propagate our failures across the entirety of Netflix in a consistent and controlled manner.
    - - Tools
        
        Waterbear
        “application resilience” as a service
        
        LinkedOut
        Framework and tooling to test how user experience will degrade in different failure scenarios associated with downstream calls. It provides a seamless way to simulate failures across our application stack with minimal effort.
        
        FireDrill
        Provides an automated, systematic way to trigger/simulate infrastructure failure in production, with the goal of helping build applications that are resistant to these failures.
        
        Simoorg
        Open Source Failure Induction Framework
      - People
        
        Bhaskaran Devaraj
        Senior Director, Site Reliability Engineering at LinkedIn
    - - Tools
        
        Hailstorm drives integration tests and simulates peak load during off-peak times
        
        uDestroy intentionally breaks things so we can get better at handling unexpected failures
    - - Tools
        
        Nemesis
        Simulate error conditions using "disruptors"
      - People
        
        Shay Holmes
        Sr. Director, Engineering Services
        
        Suresh Visvanathan
        Nemesis Architect & Lead
    - - Tools
        
        Minions Bestiary
        
        https://fr.slideshare.net/madrockriss/paris-chaos-engineering-meetup-1
        
        Chaos Monkey
        Randomly selects instances in the production environment and deliberately takes them out of service.
        
        Processkiller Monkey
        Cousin of Chaos Monkey, a little more definitive...
        
        Latency Monkey
        Injects delays at the communication layer to test how well a system tolerates degraded performance in an external component it depends on, up to simulating a complete outage (an infinite delay), without having to ask the partner concerned to cut off its service.
        
        Fulldisk Monkey
        Fills a disk to test the resilience of an application, especially its logging.
        
        Properties Monkey
        Modifies an application's properties to test its resilience.
        
        Monké Go
        Not a monkey, but an automation platform to run monkeys during integration testing.
      - People
        
        Christophe Rochefolle
        Experienced IT executive providing technology & organization to improve quality & agility of IT systems, Chaos Engineering fan
        
        https://fr.slideshare.net/madrockriss/paris-chaos-engineering-meetup-1
        
        Benjamin Gakic
        SRE Architect
        IT & #ChaosEngineering
        
        https://fr.slideshare.net/madrockriss/paris-chaos-engineering-meetup-1
    - - Experiment
        
        Chaos Monkey
  - - - Tools
        
        Kube-monkey
        An implementation of Netflix's Chaos Monkey for Kubernetes clusters
    - - Tools
        
        Pumba
        Chaos testing and network emulation for Docker containers (and clusters)
      - People
        
        Alexei Ledenev
        Chief Research Officer at Codefresh
        
        https://fr.slideshare.net/alexLM/chaos-engineering-for-docker
    - - Tools
        
        Search Chaos Monkey
        Search Chaos Monkey has been instrumental in providing a deterministic framework for finding exceptional failures and driving them to resolution as low-impact errors with planned, automated solutions.
      - People
        
        Charles Torre
        Chaos Engineering, Programming, Technical Leadership
        https://msdevshow.com/2016/11/chaos-engineering-with-charles-torre/
        
        James Hamilton
        AWS VP, Ex-Microsoft Research
        About testing in production, 2007
        
        Heather Nakama
        Software Engineer at Microsoft
        https://azure.microsoft.com/en-us/blog/inside-azure-search-chaos-engineering/
    - - Tools
        
        Chaos Lemur
        Cousin to Chaos Monkey, but built for Pivotal Cloud Foundry
      - People
        
        Paul Harris
        Staff Software Engineer
        
        Sergiu Bodiu
        Passionate IT craftsmanship #blitzscaling, avid student of life, autodidact, #cloudnative evangelist.
        
        https://fr.slideshare.net/sbodiu/from-resilient-to-antifragile-chaos-engineering-primer-devseccon
    - - Tools
        
        Chaos Gopher
        Chaos testing/engineering in Go
      - People
        
        Matthew Campbell
        Ex-General Purpose GO Hacker at DigitalOcean
        Cofounder at Loom Network
        https://www.slideshare.net/MatthewCampbell7/presentationchaosmonkey
    - - People
        
        Bruce M. Wong
        Stitch Fix Eng - keeper of chaos, breaker of systems :: formerly practiced at Twilio, Netflix, Adobe
        
        https://fr.slideshare.net/BruceWong3/the-journey-of-chaos-engineering-begins-with-a-single-step
        
        James Burns
        Software Architect at Stitch Fix
        Former Tech Lead at Twilio
- - - - Experiment
        
        Storm
        To prepare for the loss of a datacenter, Facebook regularly tests the resistance of its infrastructures to extreme events. Known as the Storm Project, the program simulates massive data center failures.
      - People
        
        Jay Parikh
        Vice president and head of engineering and infrastructure
    - - Experiment
        
        Disaster Recovery Testing (DiRT)
        Google runs an annual, company-wide, multi-day Disaster Recovery Testing event—DiRT—the objective of which is to ensure that Google's services and internal business operations continue to run following a disaster.
      - People
        
        Kripa Krishnan
        Director, Cloud Ops & Site Reliability Engineering
        Google's Queen of Chaos
    - - Experiment
        
        DRT: Disaster Recovery Test
      - People
        
        Tammy Butow
        Site Reliability Engineering Manager
        Now at Gremlin Inc.
        
        Thomissa Comellas
        SRE causing chaos at Dropbox,
        previously at StanfordEng, TeslaMotors.
    - - Experiment
        
        Too big to test: Breaking a production brokerage platform without causing financial devastation *
        https://cdn.oreillystatic.com/en/assets/1/event/124/Too%20big%20to%20test_%20Breaking%20a%20production%20brokerage%20platform%20without%20causing%20financial%20devastation%20Presentation%202.pdf
      - People
        
        David Halsey
        VP, Performance Engineering, Fidelity Investments
        
        Kyle Parrish
        Innovative, multi-dimensional leader focused on Technology Risk and Information Security in Financial Services
  - - - Gameday AWS Interactive, six-part series to get hands-on cloud computing experience https://fr.slideshare.net/AmazonWebServices/game-days-crash-test-your-application-and-your-team
        
        Gameday AWS at Veolia Water Technologies https://www.slideshare.net/D2SI/aws-summit-paris-2017-gameday-veolia
        
        Gameday by DiUS https://fr.slideshare.net/DiUSComputing/gameday-achieving-resilience-through-chaos-engineering
        
        The Ultimate Resources to prepare your Gameday by DiUS
        
        Gameday at TIAD by D2SI
        
        https://fr.slideshare.net/TIADParis/tiad-2016-gameday-aws
      - Days of Chaos
        Inspired by AWS GameDays, a Day of Chaos tests the resilience of applications that teams volunteer. Every 30 minutes, operators simulate failures in pre-production. Teams earn points for detecting, diagnosing, and resolving them. This type of gamified event helps introduce development teams to the concept of resilience.
        
        Experiment
        
        Days Of Chaos
        
        https://www.slideshare.net/devopsrex/days-of-chaos-le-dveloppement-de-la-culture-devops-chez-voyagessncfcom-laide-de-la-gamification-80396202
    - - Jesse Robbins
        Former Amazon "Master of Disaster"
        OrionLabs Founder and CEO
        Creator of Gameday AWS
        
        https://fr.slideshare.net/jesserobbins/ameday-creating-resiliency-through-destruction
        
        Former fireman
    - - Gremlin Fault Injection Tool
- - - - Gremlin
        Framework to safely, securely, and easily simulate real outages with an ever-growing library of attacks.
        Chaos Engineering: the history, principles, and practice
    - - Kolton Andrus
        CO-FOUNDER & CEO
        
        https://fr.slideshare.net/KoltonAndrus/breaking-things-on-purpose-with-gremlin
      - Matt Fornaciari
        CTO - avid practitioner of #chaosengineering
        Formerly at Salesforce and Amazon
      - Matt Jacobs
        Engineering
        Previously at Netflix
        https://fr.slideshare.net/MattJacobs11/using-hystrix-to-build-resilient-distributed-systems-58836753
  - - - Sylvain Hellegouarch
        Engineering and Learning Chaos; ChaosIQ founder & CTO
        
        https://fr.slideshare.net/SylvainHellegouarch/mucon-2017-build-confidence-in-your-system-with-chaos-engineering
      - Russ Miles
        Chaos Engineering Officer (CEO) of ChaosIQ.io
        https://fr.slideshare.net/russmiles/chaos-engineering-101-by
    - - Chaos Toolkit
        Free, open source project that enables you to create and apply Chaos Experiments to various types of infrastructure, platforms and applications.
      - ChaosIQ
        Platform for your teams to apply Chaos Engineering to their rapidly evolving, business critical Cloud Native microservices and platforms so they can build confidence that those systems won't fail your users.