My name is Jonathan McCaffrey and I work on the infrastructure team here at Riot. This is the first post in a series where we’ll go deep on how we deploy and operate backend features around the globe. Before we dive into the technical details, it’s important to understand how Rioters think about feature development. Player value is paramount at Riot, and development teams often work directly with the player community to inform features and improvements. In order to provide the best possible player experience, we need to move fast and maintain the ability to rapidly change plans based upon feedback. The infrastructure team’s mission is to pave the way for our developers to do just that - the more we empower Riot teams, the faster features can be shipped to players to enjoy.

Of course, this is easier said than done! Given the diverse nature of our deployments, we face a range of challenges - we have servers in public clouds, private data centers, and partner environments like Tencent and Garena, all of them geographically and technologically diverse. This complexity creates a huge burden on teams when they're ready to ship. That’s where the infrastructure team comes in - we’ve recently made progress in removing some of these deployment hurdles with a container-based internal cloud environment that we call ‘rCluster.’ In this article I’ll discuss Riot’s journey from manual deploys to the current world of launching features with rCluster. As an illustration of rCluster’s offerings and technology, I’ll walk through the launch of the Hextech Crafting system.



Along the way, we moved towards leveraging Chef for many common deployment and infrastructure tasks. We also started using public cloud more and more for our big data and web efforts. These evolutions triggered changes to our network design, vendor selection, and team structure many times over.

Our data centers contained thousands of servers, with new servers installed for almost every new application. New servers would live in their own manually-created VLAN, with manually-created routing and firewall rules to enable secure access between networks. While this process helped us stay secure and clearly define fault domains, it was laborious and time consuming. Compounding the pain of this design, most new features at the time were being designed as small web services, so the number of unique applications in our production LoL ecosystem skyrocketed.

On top of this, our development teams lacked confidence in their applications’ ability to handle issues like configuration and network connectivity at deploy time. Having applications so tightly bound to physical infrastructure meant that the differences between production data center environments weren’t replicated in QA, staging, and PBE. Each environment was handcrafted, unique, and in the end, consistently inconsistent.



The infrastructure team set a goal to solve these problems for players, developers, and Riot for Season 2016. By late 2015, we went from deploying features manually to deploying features like Hextech Crafting in Riot regions in an automated and consistent fashion. Our solution was rCluster - a brand new system that leveraged Docker and Software Defined Networking in a micro-service architecture. Switching to rCluster would pave over the inconsistencies in our environments and deployment processes and allow product teams to focus squarely on their products.



The Hextech Crafting backend consists of three core components:
  • Loot Service - A Java application that serves loot requests over an HTTP/JSON REST API.

  • Loot Cache - A caching cluster using Memcached and a small Golang sidecar for monitoring, configuration, and start/stop operations.

  • Loot DB - A MySQL DB cluster with a master and multiple slaves.

When you open the crafting screen, here is what happens:

  1. A player opens the crafting screen in the Client.

  2. The Client makes an RPC call to the frontend application, aka “feapp,” which proxies calls between players and internal backend services.

  3. The feapp calls the Loot Server:

    1. The feapp looks up the Loot Service in “service discovery” to find its IP and port information.

    2. The feapp makes an HTTP call to the Loot Service.

    3. The Loot Service checks the Loot Cache to see if the player’s inventory is present.

    4. The inventory isn’t in the cache, so the Loot Service calls Loot DB to see what the player currently owns and populates the cache with the result.

    5. The Loot Service then responds to the calling feapp.

  4. The feapp sends the RPC response back to the Client.
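The lookup-then-cache flow above can be sketched in a few lines. This is a hedged illustration only - none of these names come from Riot’s codebase; `discover`, `cache`, and `db` are hypothetical stand-ins for the real service discovery, Loot Cache, and Loot DB.

```python
# Illustrative sketch of the feapp -> Loot Service -> cache/DB flow.
# All names and values here are hypothetical.

SERVICE_DISCOVERY = {"euw1.loot.lootserver": ("10.0.0.12", 22050)}

cache = {}                                # stands in for the Memcached Loot Cache
db = {"player-123": ["key", "chest"]}     # stands in for the MySQL Loot DB

def discover(service_name):
    """Step 3.1: look up the service's IP and port in service discovery."""
    return SERVICE_DISCOVERY[service_name]

def get_inventory(player_id):
    """Steps 3.3-3.5: cache-aside read of a player's inventory."""
    if player_id in cache:                # step 3.3: check the Loot Cache
        return cache[player_id]
    inventory = db.get(player_id, [])     # step 3.4: miss, fall back to Loot DB
    cache[player_id] = inventory          # ...and populate the cache
    return inventory

host, port = discover("euw1.loot.lootserver")
print(get_inventory("player-123"))  # first call hits the DB
print(get_inventory("player-123"))  # second call is served from the cache
```

The second call never touches the DB - that is the whole point of the Loot Cache layer in step 3.3.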

Working with the Loot team, we were able to get the Server and Cache layers built into Docker containers, and their deployment configuration defined in JSON files that looked like this:

Loot Server JSON example

{
  "name": "euw1.loot.lootserver",
  "service": {
    "appname": "loot.lootserver",
    "location": "lolriot.ams1.euw1_loot"
  },
  "containers": [
    {
      "image": "compet/lootserver",
      "version": "0.1.10-20160511-1746",
      "ports": []
    }
  ],
  "env": [
    "LOOT_SERVER_OPTIONS=-Dloot.regions=EUW1",
    "LOG_FORWARDING=true"
  ],
  "count": 12,
  "cpu": 4,
  "memory": 6144
}

Loot Cache JSON example

{
  "name": "euw1.loot.memcached",
  "service": {
    "appname": "loot.memcached",
    "location": "lolriot.ams1.euw1_loot"
  },
  "containers": [
    {
      "name": "loot.memcached_sidecar",
      "image": "rcluster/memcached-sidecar",
      "version": "0.0.65",
      "ports": [],
      "env": [
        "LOG_FORWARDING=true",
        "RC_GROUP=loot",
        "RC_APP=memcached"
      ]
    },
    {
      "name": "loot.memcached",
      "image": "rcluster/memcached",
      "version": "0.0.65",
      "ports": [],
      "env": [ "LOG_FORWARDING=true" ]
    }
  ],
  "count": 12,
  "cpu": 1,
  "memory": 2048
}
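To make the shape of these specs concrete, here is a small, hypothetical helper that parses a spec like the ones above and totals the resources the scheduler would need to find across all replicas. The field names (`name`, `count`, `cpu`, `memory`) are taken from the examples; everything else is illustrative, not part of rCluster.

```python
import json

# A trimmed copy of the Loot Cache spec above, keeping only the
# fields this sketch uses.
spec_json = """
{
  "name": "euw1.loot.memcached",
  "count": 12,
  "cpu": 1,
  "memory": 2048
}
"""

def total_resources(spec):
    """Total CPU cores and memory (MB) needed for all replicas of a spec."""
    return spec["count"] * spec["cpu"], spec["count"] * spec["memory"]

spec = json.loads(spec_json)
cpus, mem = total_resources(spec)
print(f"{spec['name']}: {cpus} cores, {mem} MB")  # 12 cores, 24576 MB
```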

However, in order to actually deploy this feature - and really make progress in alleviating the pain outlined earlier - we needed to create clusters that could support Docker in locations around the world, like North America, Europe, and Asia. This required us to solve a bunch of hard problems, such as:

  • Scheduling containers

  • Networking With Docker

  • Continuous delivery

  • Running dynamic applications



We implemented container scheduling in the rCluster ecosystem using software we wrote called Admiral. Admiral talks to Docker daemons across an array of physical machines to understand their current live state. Users make requests by sending the above-mentioned JSON over HTTPS, which Admiral uses to update its understanding of the desired state of the relevant containers. It then continually sweeps both the live and desired state of the cluster to figure out what actions are needed. Finally, Admiral makes additional calls to the Docker daemons to start and stop containers to converge on that desired state.
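The desired-state convergence loop described above can be sketched as follows. This is a toy illustration of the pattern, not Admiral itself: container state is modeled as plain sets, whereas Admiral would derive live state from Docker daemons and issue real start/stop calls.

```python
# Toy reconciliation step in the style described above: compare live
# state to desired state and compute the start/stop actions needed to
# converge. In Admiral, the "actions" would be calls to Docker daemons
# running on physical hosts.

def reconcile(desired, live):
    """Return (to_start, to_stop) container sets needed to converge."""
    to_start = desired - live   # desired but not running: start these
    to_stop = live - desired    # running but no longer desired: stop these
    return to_start, to_stop

desired = {"lootserver-1", "lootserver-2", "memcached-1"}
live = {"lootserver-1", "memcached-1", "memcached-2"}  # memcached-2 is stale

to_start, to_stop = reconcile(desired, live)
print("start:", to_start)  # {'lootserver-2'}
print("stop:", to_stop)    # {'memcached-2'}
```

A real scheduler runs this sweep continually, so a crashed container simply disappears from live state and is restarted on the next pass.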


Admiral is similar in many ways to the open source tool Marathon, so we are currently investigating porting our work over to leverage Mesos, Marathon, and DC/OS. If that work bears fruit, we’ll talk about it in a future article.

Networking with Docker



{
  "inbound": [
    { "source": "loot.loadbalancer:lolriot.ams1.euw1_loot", "ports": [ "main" ] },
    { "source": "riot.offices:globalriot.earth.alloffices", "ports": [ "main", "jmx", "jmx_rmi", "bproxy" ] },
    { "source": "hmp.metricsd:globalriot.ams1.ams1", "ports": [ "main", "logasaurous" ] },
    { "source": "platform.gsm:lolriot.ams1.euw1", "ports": [ "main" ] },
    { "source": "platform.feapp:lolriot.ams1.euw1", "ports": [ "main" ] },
    { "source": "platform.beapp:lolriot.ams1.euw1", "ports": [ "main" ] },
    { "source": "store.purchase:lolriot.ams1.euw1", "ports": [ "main" ] },
    { "source": "pss.psstool:lolriot.ams1.euw1", "ports": [ "main" ] },
    { "source": "championmastery.server:lolriot.ams1.euw1", "ports": [ "main" ] },
    { "source": "rama.server:lolriot.ams1.euw1", "ports": [ "main" ] }
  ],
  "ports": {
    "bproxy": [ "1301" ],
    "jmx": [ "23050" ],
    "jmx_rmi": [ "23051" ],
    "logasaurous": [ "1065" ],
    "main": [ "22050" ]
  }
}


{
  "inbound": [
    { "source": "loot.lootserver:lolriot.ams1.euw1_loot", "ports": [ "memcached" ] },
    { "source": "riot.offices:globalriot.earth.alloffices", "ports": [ "sidecar", "memcached", "bproxy" ] },
    { "source": "hmp.metricsd:globalriot.ams1.ams1", "ports": [ "sidecar" ] },
    { "source": "riot.las1build:globalriot.las1.buildfarm", "ports": [ "sidecar" ] }
  ],
  "ports": {
    "sidecar": 8080,
    "memcached": 11211,
    "bproxy": 1301
  }
}

When an engineer changes this configuration in GitHub, a transformer job runs and makes API calls to Contrail to create and update the policies for their application’s private network.
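The transformer step can be pictured as a small job that flattens the inbound policy into concrete (source, port-name, port-number) rules before pushing them to the SDN API. The JSON fields below match the examples here; the function itself is purely illustrative and is not the actual transformer.

```python
import json

# A fragment of the Loot Cache policy shown above.
policy_json = """
{
  "inbound": [
    { "source": "loot.lootserver:lolriot.ams1.euw1_loot", "ports": ["memcached"] },
    { "source": "hmp.metricsd:globalriot.ams1.ams1", "ports": ["sidecar"] }
  ],
  "ports": { "sidecar": 8080, "memcached": 11211, "bproxy": 1301 }
}
"""

def flatten_rules(policy):
    """Expand named ports into concrete (source, name, number) rules."""
    ports = policy["ports"]
    return [
        (rule["source"], name, ports[name])
        for rule in policy["inbound"]
        for name in rule["ports"]
    ]

for source, name, number in flatten_rules(json.loads(policy_json)):
    print(f"allow {source} -> {name} ({number})")
```

Indirecting through named ports means a port number can change in one place without touching every inbound rule that references it.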

Contrail implements these private networks using a technology called overlay networking. In our case, Contrail uses GRE tunnels between compute hosts, with a gateway router managing traffic entering and leaving the overlay tunnels on its way to the rest of the network. The OpenContrail system is inspired by, and conceptually very similar to, standard MPLS L3VPNs. More in-depth architectural details can be found here.


  • Integration between Contrail and Docker

  • Allowing the rest of the network (outside of rCluster) to access our new overlay networks seamlessly

  • Allowing applications in one cluster to talk to applications in another cluster

  • Running overlay networks on top of AWS

  • Building HA application edges in the overlay


Max Stewart previously posted about Riot’s use of Docker in continuous delivery, which rCluster also leverages.




By this point we’ve talked through how we build and deploy features like Hextech Crafting, but if you’ve spent much time working with container environments like this, you know that's not the whole problem.



  • How do we monitor the application if its capacity and endpoints are changing all the time?

  • How does one application know the endpoints of another if they’re changing all the time?

  • How does one triage application issues if you can’t ssh into the containers and logs are reset whenever a new container is launched?

  • If my container is baked at build time, how do I configure things like database passwords, or which options are toggled for Turkey vs North America?
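The last question, configuring an image that was baked at build time, is commonly answered by injecting environment variables when the container starts, exactly like the "env" arrays in the JSON specs earlier. Here is a hedged sketch of that pattern; the variable names are illustrative, not Riot’s.

```python
import os

# The container image is identical everywhere; per-region behavior is
# selected at start time via environment variables, like the ones in
# the "env" arrays of the deployment specs (e.g. LOOT_SERVER_OPTIONS).
# LOOT_REGION and LOOT_DB_PASSWORD below are hypothetical names.

os.environ.setdefault("LOOT_REGION", "EUW1")  # would be set by the scheduler

def load_config():
    """Build runtime config from the environment, not the baked image."""
    return {
        "region": os.environ["LOOT_REGION"],
        # secrets come from the environment too, never the image
        "db_password": os.environ.get("LOOT_DB_PASSWORD", "<unset>"),
    }

print(load_config())
```

The same image can then ship to Turkey, North America, or a QA environment, with only the injected environment changing.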






Part III: Part Deux: Networking with OpenContrail and Docker
Part IV: Dynamic Applications - Micro-Service Ecosystem
Part V: Dynamic Applications - Developer Ecosystem
Part VI: Products, Not Services

Posted by Jonathan McCaffrey