Trading in the Public Cloud: Panelist Q&A

By Larry Tabb, Larry Ryan, Greg Allen and Evan Bauer, BJSS

Security and Resiliency are paramount concerns for Trading - what are the key challenges and possible solutions in this context in your opinion? And where do you see the role of Regulation?

Larry Tabb: People are worried about security, but I think to a certain extent, the Public Cloud guys have been on top of security, probably even more than the private data centre. The security needs to be configured properly and people need to be on it, but I’m not sure if this is any more or less of a challenge than having your own data centre.

I think the question would be: how confident are we that AWS or Google, or whoever the Cloud provider is, can provide a secure and reliable environment? Can you insulate my stack of infrastructure and the exchange stack of infrastructure from everything else, such as Netflix swallowing all bandwidth when a new movie is released? And I think the answer is probably yes.

Hopefully the Cloud Service Providers (CSPs) can shift workload in a heartbeat if a set of servers, for instance, goes down. That may mean that they need to create a financial services Cloud to support seamless failover in a secure environment. It might not be a bad idea having one major data centre in the New England area and another one mid-Atlantic, or elsewhere.

To a certain extent, the CSP’s main business is providing the hosting centre and the security facility, whereas a bank’s main job is to trade, make loans, process deposits or otherwise do the business of banking. A bank’s data centre and its Cloud business are an adjunct facility. Yes, they are core to its business, but they are not core to how it makes money. However, the Cloud infrastructure is absolutely core to how Google, Oracle, AWS and all these CSPs make money.

Larry Ryan: Cloud Service Providers (CSPs), such as Amazon Web Services (AWS), Azure and Oracle Cloud Infrastructure (OCI), provide a number of services that enable the development of secure and resilient systems. From a resiliency standpoint, these CSPs offer multiple locations distributed globally with the ability to quickly, and in some cases transparently, shift workload between data centres. Azure and AWS have published white papers recommending best practices in order to comply with Reg SCI. From a security standpoint, these CSPs also offer a number of access control features and features to encrypt and protect data at rest and in transit.

Greg Allen: In particular, CSP-deployed resources are more secure than our data centres because the CSPs have far bigger budgets to spend on cybersecurity, and they’re privy to patches before us, often before we even hear about the exploits.

For low-latency trading platforms, it really comes down to the right application architecture and the availability of a low-latency interconnect. As far as the role of regulators goes, that really doesn’t change. In effect, this is an outsourcing arrangement; the exchanges are still regulated in terms of security and resiliency. The fact that you outsource services to a CSP doesn’t change that commitment.

Evan Bauer: The application has to be architected to take advantage of Cloud infrastructure appropriately; existing applications designed for on-premise deployment don’t do this. They don’t take advantage of the unique resiliency and security capabilities of the Cloud. As has always been the case, infrastructure architecture and application architecture need to be seen together. Furthermore, there’s a real opportunity to define your infrastructure as code, i.e., to compose your infrastructure. This allows integrated testing and deployment of infrastructure and application together. As one example, OpStack measured extraordinary variance in the performance of identically provisioned cloud VMs, requiring a deployment strategy that tests each VM before start-of-day and discards the rejects – this is a tactic unique to the Cloud. The use of Public Cloud infrastructure provides new opportunities, challenges and unique options.
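
The start-of-day tactic described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the function name `select_vms`, the VM identifiers, the latency samples and the 99th-percentile acceptance rule are all hypothetical and are not taken from the panel’s measurements.

```python
from statistics import quantiles


def select_vms(latency_samples_us, p99_limit_us):
    """Split candidate VMs into (accepted, rejected) by 99th-percentile latency.

    latency_samples_us maps a VM id to latency samples (microseconds) gathered
    by a pre-open benchmark. Both the benchmark and the limit are assumptions.
    """
    accepted, rejected = [], []
    for vm_id, samples in latency_samples_us.items():
        # quantiles(..., n=100) returns 99 cut points; index 98 is the 99th percentile.
        p99 = quantiles(samples, n=100)[98]
        if p99 <= p99_limit_us:
            accepted.append(vm_id)
        else:
            rejected.append(vm_id)
    return accepted, rejected


# Hypothetical pre-open benchmark results for two identically provisioned VMs.
samples = {
    "vm-a": [10] * 100,              # consistently fast
    "vm-b": [10] * 90 + [500] * 10,  # same median, but a heavy tail
}
accepted, rejected = select_vms(samples, p99_limit_us=50)
```

Judging on a tail percentile rather than the mean is a design choice: two VMs can share the same typical latency and still differ widely in their worst cases.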

Larry Tabb: It will be interesting to observe how the regulators manage the CSPs going forward. God forbid AWS or Microsoft or Oracle or whoever goes down. A wide swath of infrastructure across all industries could go down with it. Years ago, and still today, trading venues deployed fault-tolerant architectures that they controlled. In the world of Cloud, the CSP provides that kind of redundancy, which means it is all operating under one infrastructure, which is a central point of failure. Eventually, the CSP data centres will be audited by regulators. The CSPs may want to cordon off these areas because they don’t want the SEC, Bank of England and FSA wandering around their data centres.

Are we differentiating between Private Cloud vs Public Cloud here? What specific benefits are we expecting to be driving the move to cloud deployment?

Larry Ryan: One key benefit is broadening market access. Today market access is through dedicated proprietary connections between market participants and the exchange. A CSP removes this physical constraint. In addition, you can co-locate CSP services, such as machine learning models, to discover opportunities and assess risk.

Evan Bauer: I think that for trading systems, because of their variety of sensitivities, deployments will straddle on-premise and CSP. The real advantage of the Public Cloud is for those pieces where you don’t need guaranteed, extraordinarily low latency. You have access to vast reserves of capacity available in a reasonable time period. The buy side can now use variable OpEx to continue their arms race – but now the sell side can take advantage of the Public Cloud’s cost advantage without making uncomfortable capital-intensive investments. Another advantage to all participants is access to a variety of scalable services, such as database-as-a-service, which you can scale up when you need to perform specific functions, such as analysis of trade history, and then scale down when not in use. These services are easily deployed and orchestrated via application programming interfaces (APIs).

Larry Ryan: The fundamental reason for markets is to facilitate the safe and fair exchange of assets, but there are other market participants, such as high-frequency and day traders, trying to make money on these transfers by applying trading models and responding faster to market changes. When considering new approaches to trading, such as migrating trading to the Public Cloud, we need to keep this idea in mind.

Greg Allen: I think the buy side is probably going to be ahead of the sell side on this because they are probably already in the Cloud and a bit more progressive in their thinking on this.

The main value is it levels the playing field and reduces the barrier to entry. It just becomes a commodity or utility. In fact, this trading utility enables participants to return to their business of asset transfer instead of the technology implementation.

Two questions: 1) Deterministic latency seems to be solved; how about security? 2) Cloud means “servers” can be anywhere; how is timestamping going to work with CATS?

Please read the discussion around the first part of this question in the section above. That discussion comprehensively answers this question.

Larry Tabb: I think timestamping for CATS is going to happen, and it’s going to involve multiple timestamp sources. The Exchanges will timestamp the message when it’s received, when it’s processed and when it’s responded to. The participant also timestamps the message multiple times, but these measurements may not align with the timestamps recorded by the Exchanges. CATS will consolidate these and allow a message replay from different perspectives. You can only hold each participant accountable for the timestamps that they can act upon.

Larry Ryan: Clock synchronization is an important consideration for CATS, and for MiFID II. MiFID II requires a one-microsecond clock precision and clock accuracy < 100 microseconds for a low-latency trading environment. About 18 months ago, we measured the clock accuracy between two different servers in a CSP and discovered that 84% of the time the clocks differed by < 100 microseconds. 99% of the time, the clock difference between the two servers was < 250 microseconds. Stanford University and Google have developed a clock sync protocol that they claim keeps accuracy to < 1 microsecond for servers in the same data center.
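
Figures such as “84% within 100 microseconds” are the share of offset measurements that fall under a bound. A minimal sketch of that calculation follows, assuming a list of signed clock offsets in microseconds between two servers. The function name and the sample values are hypothetical and are not the panel’s data.

```python
def fraction_within(offsets_us, bound_us):
    """Return the fraction of samples whose absolute offset is below bound_us."""
    if not offsets_us:
        raise ValueError("at least one offset sample is required")
    hits = sum(1 for offset in offsets_us if abs(offset) < bound_us)
    return hits / len(offsets_us)


# Hypothetical signed offsets (microseconds) between two servers' clocks.
offsets = [20, -50, 99, 120, -240, 300, 10, 80, -90, 60]

share_under_100 = fraction_within(offsets, 100)  # 7 of 10 samples
share_under_250 = fraction_within(offsets, 250)  # 9 of 10 samples
```

Running the same function against both bounds gives the pair of figures a firm would compare with a regulatory accuracy requirement.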

Larry Tabb: 84% is not really that great. It’s good, but you really need to get up to 99%.

With just several key Cloud providers, Cloud is (to be very soon) deemed systemically important – what is the way forward? Cloud as (regulated) utility, decentralization with in-built resiliency, more providers (multicloud ++), etc?

Larry Ryan: Whether the market is fair is an important question to ask. How does an exchange prove fairness in a shared responsibility model where they don’t fully control the infrastructure? I don’t think this requires dedicated kit in specified locations. I think a trading system can be deployed in a CSP and support a fair market.

Greg Allen: A lot of angles on this but I basically agree. Regulators don’t really care about the technology; they regulate the exchanges. And exchanges are regulated to provide a fair and a reliable market. If you decide that you’re going to outsource parts of your market, you’re still responsible for providing a fair reliable market. So, if an exchange decides to run their market in the cloud, it’s just a form of outsourcing, nothing changes. The regulators don’t really care what you’re doing behind the scenes as long as that market is fair, secure and reliable.

Evan Bauer: Traditionally, in the last 12-15 years, it’s been an arms race around reducing latency to access pricing and submit orders into the matching engine. If the regulatory mandate for fairness is migrating competition towards being smarter, not faster, then in fact investments in hardware will go down and instead be allocated to the quality of the mathematics and computer science that go into trading strategies. It starts to look like greater investment in people and in AI, as compared to investment in processors and switches.

Greg Allen: I think that is exactly the case; the arms race is over, effectively, and we’re moving towards better algorithms, where the Cloud provides more value to algorithms. It’s about levelling the playing field and removing the speed race, at least in a continuous market. In this environment, you can get close to fairness, level the playing field and lower the barrier to entry. You don’t have to have a huge budget to buy a bunch of specialized hardware and put it in an expensive ‘colo’ location. Instead, you install your algorithms onto the trading platform, i.e. Platform as a Service (PaaS), and you start trading. It comes down to algorithms, maybe executing quicker, but offering a differentiating business value proposition that attracts your customers.

Evan Bauer: Managing your risk, especially in today’s volatile market, is also a key differentiator. Even for markets where the actual trading engine needs to co-locate with the venue, the use of Public Cloud services to host risk management technologies can speed innovation and reduce infrastructure costs. Most risk management systems are not busy 24x7x52 and can be tens of milliseconds away from the matching engine – a perfect match for public cloud implementations.

Larry Tabb: Multi-cloud makes sense. Given that AWS is such a big competitor, firms want multiple sources of Cloud services, but firms are also concerned about physical gaps between infrastructure. And maybe we’ll see some of these CSPs share data centers, but then you want to ensure there is sufficient isolation between them, such as connections to different power grids.

Have you thought of approaching Public Cloud vendors to offer an industry-focused private Cloud region, similar to GovCloud in AWS? Cloud vendors like AWS provide the option to store data on dedicated or reserved instances, which mitigates the risk of sharing data in the Public Cloud. Do you think that gives some comfort moving into Cloud infrastructure?

Larry Tabb: I’m not aware of any CSP creating an industry-focused Private Cloud, such as GovCloud. It’s not a bad idea.

Greg Allen: We have not approached a CSP, but they are asking that question, and in effect, this becomes more like a Private Cloud. It’s essentially a private area in a CSP infrastructure that includes seclusion of data and resources. CSPs seem to have the edge on security and resiliency, simply because of their scale, but they seem to lag a bit when it comes to low-latency interconnects.

OCI seems to be ahead with their low-latency interconnect because it serves their high-performance database, but they’re all going in the same direction.

High-performance trading engines (matching engines) need to be deployed on dedicated, bare-metal, single-memory-space colocation in order to handle the high throughput and low-latency requirements of trading. They are flat out all day and it does not make sense to scale them during the trading day or use virtual machines (VMs). These systems don’t need a lot of hardware and perform all work in memory. Therefore, elasticity does not provide much value.

Evan Bauer: Greg is talking about the matching engine, which is about 5% of the allocated CPU and memory of the overall infrastructure. Other elements, such as the trade history database, testing of trading algorithms and risk models, require the majority of compute resources. Expensive commercial software can be migrated from license-based deployments to the Cloud and charged based on usage for substantial cost savings. Also, the peak compute capacity required for risk management is typically only required for 8-15% of the trading day. Cloud allows you to consume CPU as needed and allows smaller trading firms to run their models at capacities they could never before afford, avoiding capitalizing millions of dollars of hardware. The Public Cloud is like leasing a car. Instead of buying a car, you can lease the latest model for a fixed duration. When the next Ferrari comes out, you don’t have to lose money on the trade, just return the leased vehicle and get the next one. And if you occasionally need a truck, you can rent one on-demand.
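
The capacity argument above can be made concrete with a little arithmetic. The sketch below compares capacity-hours when hardware is provisioned for the peak all day against an elastic setup that bursts to the peak for only a fraction of the day. The function name and all the input numbers are hypothetical; only the 8-15% peak window comes from the discussion. It deliberately counts capacity-hours rather than dollars, since unit prices differ between owned and rented hardware.

```python
def capacity_hours(peak_units, baseline_units, peak_fraction, hours):
    """Return (provisioned_for_peak, elastic) capacity-hours for one trading day.

    peak_fraction is the share of the day during which peak capacity is needed.
    """
    if not 0.0 <= peak_fraction <= 1.0:
        raise ValueError("peak_fraction must be between 0 and 1")
    # Fixed hardware sized for the peak runs at that size all day.
    provisioned_for_peak = peak_units * hours
    # Elastic capacity runs the baseline all day and bursts only at the peak.
    burst_units = peak_units - baseline_units
    elastic = (baseline_units + burst_units * peak_fraction) * hours
    return provisioned_for_peak, elastic


# Hypothetical risk workload: 1,000 units at peak, 100 otherwise, 8-hour day,
# with the peak needed 15% of the time (the top of the range quoted above).
fixed, elastic = capacity_hours(1000, 100, 0.15, 8)
```

With these made-up inputs the elastic setup consumes roughly 1,880 capacity-hours against 8,000 for hardware sized for the peak.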

Larry Ryan: CSPs have recently made progress in low-latency interconnects. OCI supports RoCE, Azure supports InfiniBand and AWS has built a proprietary low-latency network that is exposed through APIs, such as Libfabric. We’ve conducted raw speed tests and measured latency at the single-microsecond level for OCI and Azure.

Storing data on a reserved or dedicated instance does not enhance data protection. You can restrict access to data by configuring firewalls and network access control lists (NACLs), but a common best practice to protect data in a CSP is to encrypt it in transit and on disk. CSPs can maintain keys in a cloud-based Hardware Security Module (HSM) or request keys from an on-premise HSM. The advantage of keeping keys on-premise is that you control data access. Encrypting and decrypting data increases latency; however, the latency increase is equal across all participants and does not impart an unfair advantage to any participant.

Evan Bauer: As an analogy, consider different types of sailboat racing. There is the America’s Cup, where millions are invested to gain a unique advantage (carbon fibre masts, stiff sails...). Alternatively, in Olympic class sailing, like the Lightning, everyone races with boats that comply with a precise specification. What matters is how good the sailor is.

Co-location was mentioned before. Given that, how will it work with buy-side clients? Will sell-side participants pull their buy-side clients to the Public Cloud or will it be the other way around?

Larry Ryan: The question is who initiates the migration to a CSP-based trading system? Marketplaces attract traders because they host pools of liquidity, but at least in the United States, there are multiple trading venues. It’s easy for participants to redirect flow to a different market if they don’t like the Exchange’s value proposition. Will the buy side drive this change because they direct flow? Fee structures and Reg NMS best price heavily influence flow.

Larry Tabb: The sell side would move first because they create the algorithms, are members of the exchanges and clear the transactions. The buy side is not really the executing party; they instruct the brokers to do what they need to do. So, the Exchange would probably need to be in the Cloud first. That would then drag the brokers, and then the buy side may or may not get there.

The other question is whether the brokers would go there without the Exchanges. That would mean they would need very high-speed connectivity between their Cloud and the physical data center at NASDAQ, NYSE or whichever exchange.

Given the current discussion on Cloud servers, any distinction between getting IaaS “servers” or a full solution or SaaS?

Larry Ryan: We were able to migrate a production trading system to OCI in < 1 week using an IaaS approach. In my experience, moving on-premise systems to the Cloud is doable, but as Evan mentioned, to take full advantage of the Cloud you need to reframe your application and infrastructure architecture into a cohesive architecture. When you do this, you can start adopting Platform as a Service (PaaS) and Software as a Service (SaaS) capabilities provided by CSPs and through CSP marketplaces. But this needs to be configured and deployed in a manner that supports a fair market.

Greg Allen: I’ve got my own ideas about building a fair market that might be considered a bit radical. This is going to be running on bare metal in the cloud. You solve fairness problems by running everything in a single memory space on a relatively large server. That single memory space guarantees fairness and the exchange really becomes the platform provider, i.e. Platform as a Service (PaaS). This approach solves the timestamp problem and the latency problem by eliminating the cable length issue. There are other issues to be considered, such as security within a single memory space.

Some trading firms are concerned about the “race to the bottom” in terms of speed. If the Cloud can give competitive speed performance (and it sounds as though it can) how does this affect the current drive to implement speed bumps?

Larry Tabb: Speed bumps are an interesting phenomenon, but they’re not very popular, at least in deterministic high-speed markets. For example, IEX has 2% market share at most, and the New York Stock Exchange implemented an IEX-type speed bump and then pulled it out. Symmetric speed bumps have been implemented in FX and in some of the over-the-counter markets, but that’s more like a last-look type of market speed bump, and I think that can be implemented in the same way.

Asymmetric and symmetric speed bumps are just software, and therefore I don’t think they would be a significant issue when migrating trading to the Cloud. The biggest and more difficult issue will be market access: is there a way to gain a speed advantage? You could achieve an advantage by making your software better, faster and more intelligent, but your colocation advantage would probably go away. How you access your market data may be interesting; there might be things you can do there.
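
To illustrate the point that a speed bump is just software, here is a minimal sketch of a symmetric speed bump: every inbound message is held for the same fixed delay before being released to the matching engine. The class name, the method names and the 350-microsecond delay are assumptions chosen for illustration, not a description of any venue’s implementation. The sketch also assumes messages are submitted in non-decreasing arrival-time order and uses a simulated clock.

```python
from collections import deque


class SymmetricSpeedBump:
    """Hold every message for the same fixed delay, preserving arrival order."""

    def __init__(self, delay_us):
        self.delay_us = delay_us
        self._held = deque()  # (release_time_us, message) in arrival order

    def submit(self, arrival_us, message):
        # Every participant's message gets the same delay, hence "symmetric".
        self._held.append((arrival_us + self.delay_us, message))

    def release(self, now_us):
        """Return, in arrival order, all messages whose delay has elapsed."""
        released = []
        while self._held and self._held[0][0] <= now_us:
            released.append(self._held.popleft()[1])
        return released


bump = SymmetricSpeedBump(delay_us=350)  # illustrative delay only
bump.submit(0, "order-A")
bump.submit(100, "order-B")
too_early = bump.release(349)  # nothing has waited the full delay yet
first = bump.release(350)      # order-A becomes eligible
second = bump.release(450)     # order-B becomes eligible
```

An asymmetric variant would apply the delay only to selected message types, which changes the `submit` rule but not the overall structure.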

Greg Allen: I don’t think about speed bumps anymore. In a continuous exchange-provided PaaS using a single memory space, the speed advantage is levelled, so the purpose of speed bumps goes away. Alternatively, you can move to an auction-type market.