Running Online Services at Riot: Part I

My name is Jonathan McCaffrey and I work on the infrastructure team here at Riot. This is the first post in a series where we’ll go deep on how we deploy and operate backend features around the globe. Before we dive into the technical details, it’s important to understand how Rioters think about feature development. Player value is paramount at Riot, and development teams often work directly with the player community to inform features and improvements. In order to provide the best possible player experience, we need to move fast and maintain the ability to rapidly change plans based upon feedback. The infrastructure team’s mission is to pave the way for our developers to do just that - the more we empower Riot teams, the faster features can be shipped to players to enjoy.

Of course, this is easier said than done! Given the diverse nature of our deployments, we face a host of challenges - we have servers in public clouds, private datacenters, and partner environments like Tencent and Garena, all of which are geographically and technologically diverse. This complexity places a huge burden on teams when they're ready to ship. That's where the infrastructure team comes in - we've recently made progress in removing some of these deployment hurdles with a container-based internal cloud environment that we call ‘rCluster.’ In this article I’ll discuss Riot’s journey from manual deploys to the current world of launching features with rCluster. As an illustration of rCluster’s offerings and technology, I’ll walk through the launch of the Hextech Crafting system.

A Little History

When I started at Riot 7 years ago, we didn't have much in the way of deployment or server management processes; we were a startup with big ideas, a small budget, and a need to move fast. As we built out the production infrastructure for League of Legends, demands kept piling up - demands on the game itself, demands from our developers for more features, and demands from our regional teams launching in new territories around the world. We stood up servers and applications by hand, with little thought given to guidelines or strategic planning.

Along the way, we moved toward leveraging Chef for many common deployment and infrastructure tasks. We also started using more and more public cloud for our big data and web efforts. These evolutions triggered changes to our network design, vendor choices, and team structures several times over.

Our datacenters came to contain thousands of servers, with new servers installed for almost every new application. New servers would live in their own manually created VLAN, with manually configured routing and firewall rules to enable secure access between networks. While this process helped keep us secure and clearly defined failure domains, it was laborious and time-consuming. To compound the pain of this design, most new features at the time were being designed as small web services, so the number of unique applications in our production LoL ecosystem soared.

On top of that, our development teams lacked confidence in how their applications would behave once deployed, especially when it came to issues like configuration and network connectivity. Having applications so tightly bound to physical infrastructure meant that the differences between production datacenter environments weren't replicated in QA, staging, and PBE. Each environment was handcrafted, unique, and in the end, consistently inconsistent.

While we wrestled with these challenges and an ever-growing number of applications in the ecosystem, Docker started gaining popularity among our development teams as a way to solve problems of configuration consistency and development environments. Once we started using it, it became clear that we could do much more with Docker, and that it could play a key role in how we approach infrastructure.

Season 2016 and Beyond

The infrastructure team set a goal to solve these problems for players, developers, and Riot for Season 2016. By late 2015, we went from deploying features manually to deploying features like Hextech Crafting in Riot regions in an automated and consistent fashion. Our solution was rCluster - a brand new system that leveraged Docker and Software Defined Networking in a micro-service architecture. Switching to rCluster would pave over the inconsistencies in our environments and deployment processes and allow product teams to focus squarely on their products.

Let's dive into the tech a bit and explore how rCluster supports features like Hextech Crafting behind the scenes. For context, Hextech Crafting is a feature in League of Legends that gives players a new way to unlock in-game content.

The feature is known internally as "Loot," and it's made up of 3 core components:

  • Loot Service - A Java application that serves Loot requests via an HTTP/JSON REST API.

  • Loot Cache - A caching cluster using Memcached and a small golang sidecar for monitoring, configuration, and start/stop operations.

  • Loot DB - A MySQL DB cluster with a master and multiple slaves.

When you open the crafting screen, here is what happens:

  1. A player opens the crafting screen in the Client.

  2. The Client makes an RPC call to the frontend application, aka "feapp," which proxies calls between players and internal backend services.

  3. The feapp calls the Loot Server:

    1. The feapp looks up the Loot Service in "Service Discovery" to find its IP and port info.

    2. The feapp makes an HTTP call to the Loot Service.

    3. The Loot Service checks the Loot Cache to see if the player’s inventory is present.

    4. The inventory isn’t in the cache, so the Loot Service calls Loot DB to see what the player currently owns and populates the cache with the result (see the sketch after this list).

    5. The Loot Service replies to the calling feapp.

  4. The feapp sends the RPC response back to the Client.
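Steps 3.3 through 3.5 are a standard cache-aside read. As a rough illustration only - the real Loot Service is a Java application, and none of these type names come from Riot's code - a minimal Go sketch of that read path could look like this:

package loot

import "context"

// Inventory is a placeholder for a player's loot inventory.
type Inventory struct {
	PlayerID string
	Items    []string
}

// Cache and DB are hypothetical clients standing in for the
// Memcached-backed Loot Cache and the MySQL-backed Loot DB.
type Cache interface {
	Get(ctx context.Context, playerID string) (*Inventory, bool, error)
	Set(ctx context.Context, inv *Inventory) error
}

type DB interface {
	LoadInventory(ctx context.Context, playerID string) (*Inventory, error)
}

// GetInventory checks the Loot Cache first and falls back to the Loot DB
// on a miss, populating the cache with the result before returning.
func GetInventory(ctx context.Context, c Cache, db DB, playerID string) (*Inventory, error) {
	if inv, ok, err := c.Get(ctx, playerID); err == nil && ok {
		return inv, nil // cache hit: answer straight from the Loot Cache
	}
	inv, err := db.LoadInventory(ctx, playerID)
	if err != nil {
		return nil, err
	}
	// Best-effort cache fill; a failure here shouldn't fail the player's request.
	_ = c.Set(ctx, inv)
	return inv, nil
}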

Working with the Loot team, we were able to get the Server and Cache layers built into Docker containers, and their deployment configuration defined in JSON files that looked like this:

Loot Server JSON example

{
  "name": "euw1.loot.lootserver",
  "service": {
    "appname": "loot.lootserver",
    "location": "lolriot.ams1.euw1_loot"
  },
  "containers": [
    {
      "image": "compet/lootserver",
      "version": "0.1.10-20160511-1746",
      "ports": []
    }
  ],
  "env": [
    "LOOT_SERVER_OPTIONS=-Dloot.regions=EUW1",
    "LOG_FORWARDING=true"
  ],
  "count": 12,
  "cpu": 4,
  "memory": 6144
}

Loot Cache JSON example

{
  "name": "euw1.loot.memcached",
  "service": {
    "appname": "loot.memcached",
    "location": "lolriot.ams1.euw1_loot"
  },
  "containers": [
    {
      "name": "loot.memcached_sidecar",
      "image": "rcluster/memcached-sidecar",
      "version": "0.0.65",
      "ports": [],
      "env": [
        "LOG_FORWARDING=true",
        "RC_GROUP=loot",
        "RC_APP=memcached"
      ]
    },
    {
      "name": "loot.memcached",
      "image": "rcluster/memcached",
      "version": "0.0.65",
      "ports": [],
      "env": [ "LOG_FORWARDING=true" ]
    }
  ],
  "count": 12,
  "cpu": 1,
  "memory": 2048
}
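To make the shape of these specs easier to follow, here is one way the fields in the two samples above could be modeled in Go. The struct and field names are my own and only a guess at how such a spec might be represented internally, not the actual schema:

package admiral

// DeploySpec mirrors the fields visible in the JSON samples above.
// The naming is illustrative; the "ports" element type is a guess, since
// both samples show empty arrays.
type DeploySpec struct {
	Name       string          `json:"name"`
	Service    ServiceIdentity `json:"service"`
	Containers []ContainerSpec `json:"containers"`
	Env        []string        `json:"env,omitempty"`
	Count      int             `json:"count"`
	CPU        int             `json:"cpu"`
	Memory     int             `json:"memory"`
}

// ServiceIdentity names the app and where it should run.
type ServiceIdentity struct {
	AppName  string `json:"appname"`
	Location string `json:"location"` // e.g. "lolriot.ams1.euw1_loot"
}

// ContainerSpec describes one container image within the deployment.
type ContainerSpec struct {
	Name    string   `json:"name,omitempty"`
	Image   string   `json:"image"`
	Version string   `json:"version"`
	Ports   []string `json:"ports"`
	Env     []string `json:"env,omitempty"`
}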

However, in order to actually deploy this feature - and really make progress on alleviating the pain outlined earlier - we needed to create clusters that could support Docker in locations all over the world, like North America, Europe, and Asia. This required us to solve a bunch of hard problems, such as:

  • Scheduling containers

  • Networking With Docker

  • Continuous Delivery

  • Running dynamic applications

Follow-up posts will dive into these components of the rCluster system in more detail, so I'll only touch on them briefly here.

Scheduling

We implemented container scheduling in the rCluster ecosystem using software we wrote called Admiral. Admiral talks to Docker daemons across an array of physical machines to understand their current live state. Users make requests by sending the above-mentioned JSON over HTTPS, which Admiral uses to update its understanding of the desired state of the relevant containers. It then continually sweeps both the live and desired state of the cluster to figure out what actions are needed. Finally, Admiral makes additional calls to the Docker daemons to start and stop containers to converge on that desired state.

If a container crashes, Admiral sees the disparity between the live and desired states and corrects it by launching the container on another host. This flexibility makes managing our servers much easier because we can seamlessly "drain" them, perform maintenance, and then re-enable them to pick up workloads.
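Admiral's internals aren't public, so the following is only a heavily simplified sketch of what such a reconcile loop might look like, assuming a hypothetical store of desired and live state and a thin wrapper over the Docker daemons:

package admiral

import (
	"context"
	"time"
)

// ContainerState is a minimal view of one container placement: which spec
// it belongs to and which host it runs (or should run) on. For simplicity,
// placement decisions are assumed to have been made already.
type ContainerState struct {
	SpecName string // e.g. "euw1.loot.lootserver"
	Host     string
}

// Docker is a hypothetical wrapper over the Docker daemons being managed.
type Docker interface {
	Start(ctx context.Context, host, specName string) error
	Stop(ctx context.Context, host, specName string) error
}

// Store exposes the desired state (built from submitted JSON specs) and the
// live state (observed from the Docker daemons).
type Store interface {
	Desired(ctx context.Context) ([]ContainerState, error)
	Live(ctx context.Context) ([]ContainerState, error)
}

// Converge periodically sweeps desired vs. live state and starts or stops
// containers so the cluster drifts toward what the specs ask for.
func Converge(ctx context.Context, s Store, d Docker, interval time.Duration) {
	tick := time.NewTicker(interval)
	defer tick.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case <-tick.C:
			desired, err := s.Desired(ctx)
			if err != nil {
				continue
			}
			live, err := s.Live(ctx)
			if err != nil {
				continue
			}
			// Anything desired but not live gets started (e.g. replacing a crashed container).
			for _, want := range diff(desired, live) {
				_ = d.Start(ctx, want.Host, want.SpecName)
			}
			// Anything live but no longer desired gets stopped (e.g. draining a host).
			for _, extra := range diff(live, desired) {
				_ = d.Stop(ctx, extra.Host, extra.SpecName)
			}
		}
	}
}

// diff returns the entries of a that have no counterpart in b.
func diff(a, b []ContainerState) []ContainerState {
	seen := make(map[ContainerState]bool, len(b))
	for _, c := range b {
		seen[c] = true
	}
	var out []ContainerState
	for _, c := range a {
		if !seen[c] {
			out = append(out, c)
		}
	}
	return out
}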

Admiral is similar in some respects to the open source tool Marathon, so we are currently investigating porting our work over to leverage Mesos, Marathon, and DC/OS. If that work bears fruit, we’ll talk about it in a future article.

Networking with Docker

Once the containers are running, we need network connectivity between the Loot application and the rest of the ecosystem. To do this, we leveraged OpenContrail to provide a private network for each application, and we let our development teams manage their policies with JSON files in GitHub.

Loot Server network:

{
  "inbound": [
    { "source": "loot.loadbalancer:lolriot.ams1.euw1_loot", "ports": [ "main" ] },
    { "source": "riot.offices:globalriot.earth.alloffices", "ports": [ "main", "jmx", "jmx_rmi", "bproxy" ] },
    { "source": "hmp.metricsd:globalriot.ams1.ams1", "ports": [ "main", "logasaurous" ] },
    { "source": "platform.gsm:lolriot.ams1.euw1", "ports": [ "main" ] },
    { "source": "platform.feapp:lolriot.ams1.euw1", "ports": [ "main" ] },
    { "source": "platform.beapp:lolriot.ams1.euw1", "ports": [ "main" ] },
    { "source": "store.purchase:lolriot.ams1.euw1", "ports": [ "main" ] },
    { "source": "pss.psstool:lolriot.ams1.euw1", "ports": [ "main" ] },
    { "source": "championmastery.server:lolriot.ams1.euw1", "ports": [ "main" ] },
    { "source": "rama.server:lolriot.ams1.euw1", "ports": [ "main" ] }
  ],
  "ports": {
    "bproxy": [ "1301" ],
    "jmx": [ "23050" ],
    "jmx_rmi": [ "23051" ],
    "logasaurous": [ "1065" ],
    "main": [ "22050" ]
  }
}

Loot Cache network:

{ "inbound": [ { "source": "loot.lootserver:lolriot.ams1.euw1_loot", "ports": [ "memcached" ] }, { "source": "riot.offices:globalriot.earth.alloffices", "ports": [ "sidecar", "memcached", "bproxy" ] }, { "source": "hmp.metricsd:globalriot.ams1.ams1", "ports": [ "sidecar" ] }, { "source": "riot.las1build:globalriot.las1.buildfarm", "ports": [ "sidecar" ] } ], "ports": { "sidecar": 8080, "memcached": 11211, "bproxy": 1301 } }

When an engineer changes this configuration in GitHub, a transformer job runs and makes API calls in Contrail to create and update policies for their application’s private network.
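To make the GitHub-to-Contrail flow a bit more concrete, here is a minimal sketch of what such a transformer job could do; the types and the ContrailClient interface are hypothetical stand-ins, not the real OpenContrail API or Riot's actual tooling:

package transformer

import "encoding/json"

// NetworkPolicy mirrors the policy JSON shown above: a list of inbound
// rules plus a map of named ports. The samples show port values both as
// lists of strings and as bare numbers, so the values are left raw here.
type NetworkPolicy struct {
	Inbound []InboundRule              `json:"inbound"`
	Ports   map[string]json.RawMessage `json:"ports"`
}

// InboundRule allows a named source (app:location) to reach named ports.
type InboundRule struct {
	Source string   `json:"source"` // e.g. "platform.feapp:lolriot.ams1.euw1"
	Ports  []string `json:"ports"`  // names resolved via the "ports" map
}

// ContrailClient is a hypothetical wrapper around whatever API the
// transformer job uses to create and update policies in Contrail.
type ContrailClient interface {
	UpsertPolicy(app string, policy NetworkPolicy) error
}

// Apply parses an application's policy file from GitHub and pushes the
// resulting rules to Contrail for that application's private network.
func Apply(c ContrailClient, app string, raw []byte) error {
	var p NetworkPolicy
	if err := json.Unmarshal(raw, &p); err != nil {
		return err
	}
	return c.UpsertPolicy(app, p)
}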

Contrail implements these private networks using a technique called overlay networking. In our case, Contrail uses GRE tunnels between compute hosts, and a gateway router manages traffic entering and leaving the overlay tunnels on its way to and from the rest of the network. The OpenContrail system is inspired by, and conceptually very similar to, standard MPLS L3VPNs. More in-depth architectural details can be found here.

While implementing this system, we had to solve a few key challenges:

  • Integration between Contrail and Docker

  • Allowing the rest of the network (outside of rCluster) to access our new overlay networks seamlessly

  • Allowing applications in one cluster to talk to applications in another cluster

  • Running overlay networks on top of AWS

  • Building HA edges into the overlay

Continuous Delivery

Max Stewart previously posted about Riot’s use of Docker in continuous delivery, which rCluster also leverages.

For the Loot application, the CI flow works as follows.

The general goal is that when the master repo is changed, a new application container is created and deployed to a QA environment. With this workflow, teams can iterate quickly on their code and see the changes reflected in a working game. This tight feedback loop makes it possible to rapidly refine the experience, an important goal for player-focused Riot.
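Purely as a hypothetical illustration of the deploy step at the end of that flow, a pipeline job could push the freshly built container by submitting an updated spec (the JSON shown earlier, with its "version" field bumped to the new image tag) to the scheduler over HTTPS, as described in the Scheduling section above. The endpoint path below is invented for the example:

package deploy

import (
	"bytes"
	"context"
	"fmt"
	"net/http"
)

// PushToQA submits an updated deployment spec to a hypothetical scheduler
// endpoint so the QA cluster converges on the newly built container version.
func PushToQA(ctx context.Context, schedulerURL string, spec []byte) error {
	req, err := http.NewRequestWithContext(ctx, http.MethodPost, schedulerURL+"/specs", bytes.NewReader(spec))
	if err != nil {
		return err
	}
	req.Header.Set("Content-Type", "application/json")

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode >= 300 {
		return fmt.Errorf("scheduler rejected spec: %s", resp.Status)
	}
	return nil
}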

Running Dynamic Applications

By this point we’ve talked through how we build and deploy features like Hextech Crafting, but if you’ve spent much time working with container environments like this, you know that's not the whole problem.

In the rCluster model, containers have dynamic IP addresses and are constantly spinning up and down. This is a radically different paradigm from our previous static servers and deployment methods, so new tooling and procedures are needed to make it effective.

Some of the key questions this raises are:

  • How do we monitor the application if its capacity and endpoints are changing all the time?

  • How does one application know the endpoints of another if they're changing all the time?

  • How does one triage application issues if you can’t ssh into the containers and logs are reset whenever a new container is launched?

  • If I'm baking my containers at build time, how do I configure things like database passwords, or which options should be toggled for Turkey vs North America?

To address these issues, we had to build a microservices platform to handle things like service discovery, configuration management, and monitoring; we'll go into more detail about that system, and the problems it solves for us, in the final post of this series.
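The details of that platform come in a later post, so the following is only a hedged sketch of the kind of service-discovery lookup involved - for example, how a caller like the feapp might resolve the Loot Service by name before making the HTTP call in step 3 above. The Resolver interface and helper are hypothetical, not the platform's real API:

package discovery

import (
	"context"
	"fmt"
)

// Endpoint is one resolved host/port for an instance of a service.
type Endpoint struct {
	Host string
	Port int
}

// Resolver stands in for whatever service-discovery client the platform
// provides; containers register themselves and callers look them up by name.
type Resolver interface {
	Lookup(ctx context.Context, service string) ([]Endpoint, error)
}

// BaseURL resolves a service by name and builds an HTTP base URL for one of
// its instances (here, naively, the first endpoint returned).
func BaseURL(ctx context.Context, r Resolver, service string) (string, error) {
	eps, err := r.Lookup(ctx, service)
	if err != nil {
		return "", err
	}
	if len(eps) == 0 {
		return "", fmt.Errorf("no endpoints registered for %s", service)
	}
	return fmt.Sprintf("http://%s:%d", eps[0].Host, eps[0].Port), nil
}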

Conclusion

I hope this post has given you an overview of the problems we've been trying to solve in order to make it easier for Riot to deliver player value. As mentioned before, we'll follow up with a series of articles focusing on rCluster's approach to scheduling, networking with Docker, and running dynamic applications; as those articles are released, the links will be updated here.

If you're on a similar journey, or would like to be part of the conversation, we'd love to hear from you in the comments below.


For more information, check out the rest of this series:

Part I: Introduction (this article)
Part II: Scheduling
Part III: Networking with OpenContrail and Docker
Part III: Part Deux: Networking with OpenContrail and Docker
Part IV: Dynamic Applications - Micro-Service Ecosystem
Part V: Dynamic Applications - Developer Ecosystem
Part VI: Products, Not Services

Posted by Jonathan McCaffrey