>> ANNOUNCER: Please welcome Mark Russinovich >> MARK RUSSINOVICH: Good morning, everybody Good morning, everybody Try that again Thanks for coming It is Friday morning, a long week here at RSA Conference My name is Mark Russinovich I’m the Chief Technology Officer of Microsoft Azure, and I’m here to talk about — let’s see if I can make the slide go back here About — okay, title slide is missing I’m here to talk about open source security That might seem an odd thing to some of you for a Microsoft person to be coming up here to talk about open source security, those of you that haven’t been watching what Microsoft has been doing with open source security Microsoft, for those of you that haven’t been paying attention, is deeply an open source company at this point But then how does that relate to me being the CTO of Microsoft Azure? Why am I here to talk about open source security? The reason is that when we take a look at Azure as a Cloud platform, a ton of it is built on top of open source We consume open source We also publish open source And so when you think about the Cloud provider’s responsibility to our customers, they don’t really care what we build our platform on What they care about is that we’re keeping them secure And that means that with whatever we use to build that platform, the buck stops with us If there is a problem in our own software that we create internally, or software that we’re consuming, or hardware that we’re consuming from third parties, if there is a problem, customers expect us to deal with it and keep them secure And so this is very near and dear to my heart in terms of what can we do to make Azure more secure It leads into the supply chain of Azure, which includes open source I’m also here because the COVID-19 thing hasn’t gotten too crazy yet, so I thought I’d show up today and risk it But imagine a world where a hugely popular open source package gets compromised and people are consuming it and deploying it and operating it and the
world becomes compromised because of this open source dependency It sounds like something that is theoretical, but, in fact, it has happened and it’s happened multiple times and it continues to happen These are just some of the examples of open source being compromised and then being consumed in that compromised state Some of you might have seen these These are all from the last two years Webmin, an incredibly popular web portal administration toolkit, this was compromised for a whole year with a DDoS backdoor and a credential stealing backdoor It allowed an attacker to get into the web server that’s being managed by this thing and deploy arbitrary software onto it Downloaded a million times during that year that it was compromised Rest-client This is a package that a lot of developers used to authenticate to REST APIs This was backdoored to steal the credentials that came as part of that authentication Event-Stream, a package that was downloaded 1.5 million times a week This thing was backdoored And then finally, VestaCP, another control panel platform for managing websites, this was backdoored in a source control system and also served as a DDoS malware distribution platform for the attackers that got into it These are all examples, and there are many more Here is an example of one timeline for a very popular open source package called Bootstrap-sass This package has been downloaded in its lifetime 28 million times It was compromised, you can see, on this date The package was compromised in the NPM repository — in Ruby Sorry, it’s a Ruby package It was compromised by an attacker that got control of the package, removed the last good one, and then published one that had an exploit in it And you can see that at the time that it was discovered, at that point, there was no last known good to fall back on because they’d removed it And so at this point, the Ruby package management was scrambling to figure out how to get people that knew that they were
compromised to get back into a secure state and a good one was eventually published But you can see a total of eight days where really anybody dependent on the upstream Ruby package management system was in a state where they couldn’t get fixed if they downloaded the vulnerable package Event-Stream is an interesting one because this one has a story where it shows us how dependent we are on people that we don’t know who produce the software that we consume In the case of Event-Stream, the maintainer for this Event-Stream package — like I said, it’s incredibly popular – just felt he didn’t have time to maintain it anymore He got emailed by somebody saying, hey, you know what, I see you’re really busy I’m really passionate about this thing Why don’t you let me help? The maintainer gave management control to this volunteer This volunteer made some changes, published an update to it, everything looked good, and then after that they introduced a dependency, and that dependency was a malicious one aimed at stealing Bitcoin from users of a company called Copay, off of whatever wallets were sitting there on the servers that it happened to get installed on That’s an example of somebody kind of getting trust inside the open source ecosystem supply chain and then being able to infiltrate and get a backdoor in Fortunately, while this package is incredibly popular, the fact is that it was only targeted at this particular Copay Bitcoin wallet It wasn’t going after a general purpose compromise, in which case the effect would have been much broader Let’s talk a little bit about the agenda because I think the way that you have to frame the problem — and by the way, I want to make it clear that the problems that I’m talking about aren’t specific to open source in many cases They’re general software supply chain problems The fact is that open source is such a massive ecosystem that we need to go after it specifically and there are some specific implementation points in the supply chain that need to be addressed for open source Let me start by framing the whole problem of how we consume software as
a supply chain, and then I’ll get into the various steps of that supply chain, what the issues are, examples, and what is happening for the open source ecosystem to help address those points in the supply chain I’ll start here by talking about supply chain in the context of a food supply chain, which is one that we’re all familiar with A farmer is growing crops, a buyer is purchasing those crops, a distributor is taking and shipping the products into stores, and then a customer is buying them If you map this onto software, onto open source software, you’ve got open source developers, there are source code repositories where they are storing their source code, there are application developers that are consuming the output of that source code, and then they are providing an end product to their customers This is the simplified form It is actually much, much more complex and this is what makes the problem especially challenging because along the supply chain there could be dozens, hundreds, or even thousands of participants at different points of different source code repositories, package management systems, consumers, products, and other distribution points There are many loops here as well But if you take a look at it from this framing, then we can go after specific parts of it The question when you’re consuming from the supply chain, and this one applies to us in Azure, is how do we prevent unwanted products from entering the supply chain How do we make sure that only packages and source that we’ve got some assurance are trustworthy enter our supply chain? How do we know what’s in our supply chain? That kind of begs the question And how do we ensure that what we know about that supply chain is reliable? When somebody says this is trustworthy, hey, I’ve done code reviews on it, I’ve got MFA in place for checking in the source code, how do we know that that’s really truly the case? And then what do we do when there is a vulnerability?
How do I identify what is affected by that vulnerability and how do we roll back to the good version? Those are all the kinds of questions that we’ve got to answer when we take a look at this from a supply chain perspective First, how do we find vulnerabilities, how do we help the open source ecosystem find vulnerabilities in the source code at the source, kind of where the crops are being grown? And if you take a look at it from a crop perspective, this means how do we end up finding that there’s E.coli somewhere in those crops before it gets out We need to stop it from even getting in there in the first place Or if it gets in, stopping that from getting down into the supply chain And when we talk about vulnerabilities in source code, and this applies to open source and closed source as well, there could be credentials in the code, there could be improper validation of inputs, there could be the ability for an attacker to upload arbitrary code and for that to be executed by the code, and there could be denial of service opportunities where you’re taking a dependency, in a customer-facing piece of software, on a piece of open source or closed source, and it has got a vulnerability that lets somebody bring down your whole service One of the ways that something can get into the supply chain that’s malicious is a technique called typo squatting This has been known as a problem, especially in the Node ecosystem where package names are extremely short Has anybody ever used one of these packages in your open source projects? Maybe you have Maybe you haven’t But they all kind of look generic They look like, if you’re looking for something specific, like I’m looking for the dateutil for Python 3, and you know that there’s a popular one out there, you might go and pull this one The fact is that these are variants on well-known packages that are popular in the Python community You can see cross-env, and python3-dateutil instead of python-dateutil, and this kind of typo squatting is a way that we’ve seen compromised open source get into the supply chain There have been efforts to go figure out how we can prevent typo squatting There is something called the Levenshtein distance which measures how close two words are to each other, the number of single-character edits it takes to turn one into the other, and there have been proposals to say, hey, we don’t let packages come in that are too close to existing popular packages in terms of that distance But the fact is when it comes to Python and Node and others where they’ve got very short names, it makes it very difficult because this graph shows that a huge number of packages have very short distances to well-known packages This is a tough, challenging problem, how do we keep developers from getting compromised versions of things that have names close to what they’re looking for Static code analysis is another very popular way to go and address vulnerabilities, looking for the vulnerabilities in the code, scanning
automatically during a CI/CD pipeline, and really stopping the vulnerabilities from getting into the supply chain in the first place There are a few different open source efforts or products, services that open source publishers can take advantage of and open source consumers can take advantage of to make sure that vulnerabilities haven’t been entering them through this One of them is Coverity Scan, from Synopsys, which looks at every line of code and execution path looking for vulnerabilities and flagging them You can see that it’s got a huge number of vulnerability types that it will check for As is common in many cases, for open source projects, this service is completely free This is something that’s just easy for somebody developing open source to take advantage of Why not do this if it’s going to make your code more secure Microsoft purchased GitHub a few years ago, and one of the purchases GitHub has made since that time is a company called Semmle Semmle is a source code vulnerability analysis platform based off of a language that’s been renamed CodeQL, for code query language, that lets you do variant detection on common vulnerabilities And so the idea here is that if a vulnerability is detected in a piece of source code of a certain pattern, you can write a CodeQL query that will find not just that specific version of the vulnerability but also variants that are close to it through the query language, which has regular expression type syntax as part of it One of the services that GitHub is offering for open source developers is something called Security Lab which, for open source projects and for research, is free, which lets you go and scan open source projects using Semmle queries, CodeQL queries, a growing ecosystem of them There is a full ecosystem of CodeQL queries that look for common, existing kinds of vulnerabilities that have already been seen in source code and let you find them And then there
is snyk which is a way to stop open source vulnerabilities from getting into your supply chain, ones that you might be consuming as a developer If you’re going to pull from a package and that package has a known vulnerability, this will let you know It will also scan your source code repos looking for package dependencies and let you know if you’ve got vulnerabilities And, again, this is something that is completely free for open source repos I want to show you a quick demo of CodeQL so you get an idea for the power of this for open source projects
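In spirit, the check that a tool like snyk performs boils down to comparing your declared dependencies against an advisory database of known-vulnerable versions. Here is a minimal sketch of that idea; the package names, versions, and advisory entries below are invented for illustration, and a real scanner also resolves the full transitive tree and version ranges:

```python
# Hypothetical advisory database: package name -> versions known to be compromised
ADVISORIES = {
    "example-rest-client": {"1.6.10", "1.6.11", "1.6.12", "1.6.13"},
    "example-event-stream": {"3.3.6"},
}

def audit(dependencies):
    """Return the (package, version) pairs that match a known advisory."""
    return [
        (package, version)
        for package, version in sorted(dependencies.items())
        if version in ADVISORIES.get(package, set())
    ]

project = {"example-rest-client": "1.6.10", "left-pad": "1.3.0"}
print(audit(project))  # the backdoored version is flagged before it gets installed
```

Running this kind of check in the CI/CD pipeline, before anything is installed, is what stops a known-bad package at the door rather than after deployment.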

I have got a GitHub repo here for a project called java-test, and you can see I’ve made a pull request on it which is labeled with a name that describes just how bad this thing is A lot of users submitted content If I click on that pull request, because I’ve enabled Semmle scanning or CodeQL scanning on this, those scans have identified a problem This pull request introduces one alert when merging, view it on lgtm.com, which is where the free CodeQL service is hosted And at this point, what you can see is it will show me right in the source code the vulnerability and the path of the vulnerability I’m not going to spend time diving into the source code here, but what is happening here is that it’s taking unsanitized input from the user, deserializing that, and then executing it That’s this execute right up here, Mona.dance It basically deserializes this untrusted input into this dance method This is very obvious, of course, but it’s just here for demonstration purposes If I say show paths, this takes me and shows just the snippets of code from the source here where you get the input stream right there, which is the untrusted input, into the deserialization, and then finally how that gets packaged up and made part of that method All of that is just right there at your fingertips You can see which queries you’ve enabled on your open source repo, and the built-in Java queries that have been enabled on this one include, for example, character passed to StringBuffer or StringBuilder constructor This comes with a description of the type of vulnerability being scanned for with an example of it as well as recommendations on how to avoid or fix these kinds of problems Very powerful tool Like I mentioned, there is an ecosystem of these kinds of queries that has been built up Here is the GitHub repo for the CodeQL queries that are available and a bunch of different languages available here for CodeQL to go query Again, completely free for open source development Let’s go back to the
presentation One of the other types of vulnerabilities very common in both closed and open source software is uninitialized variables And one of the ways to fall into the trap of uninitialized variables is uninitialized heap and stack memory, which hits native code, C and C++ code, very commonly Heartbleed is one great example of it, where, with an uninitialized buffer, you could trick the OpenSSL server into passing a buffer back that was longer than what had been filled in with legitimate content, and then see what was beyond it, and what was beyond it could be information that would allow you to steal credentials and keys and compromise other users of that site There have been efforts here that have taken place The reason that I mention this is that it’s going after eliminating entire bug classes and doing it in a way that gets to the heart of open source ecosystem supply chains, which include compilers like Clang And so this is an effort that’s been supported by Microsoft and Google and others to go and modify these to default to initialization One of the reasons why these compilers historically have not initialized by default is because of performance concerns Going and zeroing memory is a waste if you’re going to just immediately overwrite it, but then you end up being susceptible to something like a Heartbleed The goal here is to change the default to initialize to zeroes, and then somebody that sees a performance issue with that, and understands that they’re not exposing themselves by disabling it for a particular buffer, can go and disable it if they need to for performance reasons Here is just an example of a tweet by one of my heroes, actually Just let me pull it back John Carmack himself getting bit by these things Number of days since uninitialized C++ variable caused me grief, zero This is something that even professional developers that have been in the business for a long time get hit by Another way to go find vulnerabilities is through a technique called
fuzzing that I’m sure everybody in this room is aware of Lots of different types of vulnerabilities can be detected through fuzzing The question is how do I get access to a fuzzer and how do I get access to a platform that will fuzz my code automatically In the open source world, Google stepped up here and created a platform called OSS-Fuzz This is a free service for open source projects
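To make concrete what a service like OSS-Fuzz automates at scale, here is a toy harness. The parse_record target and its input format are invented for illustration, and real fuzzing engines are coverage-guided rather than purely random, but the loop is the same idea: mutate inputs, run the target, and report anything that crashes.

```python
import random

# Toy target with a planted bug, standing in for real code under test
# (this function and its record format are invented for illustration)
def parse_record(data):
    if len(data) >= 4 and data[:2] == b"OK":
        length = data[2]
        return data[3 + length]  # bug: the claimed length is never bounds-checked
    return -1

def fuzz(seed, iterations=5000):
    """Randomly mutate the seed; return the first input that crashes the target."""
    rng = random.Random(0)  # fixed seed so the run is repeatable
    for _ in range(iterations):
        data = bytearray(seed)
        for _ in range(rng.randint(1, 4)):
            data[rng.randrange(len(data))] = rng.randrange(256)
        try:
            parse_record(bytes(data))
        except IndexError:
            return bytes(data)  # crashing input found: report it for triage
    return None

crasher = fuzz(b"OK\x00\x41")
print(crasher is not None)  # True: the harness finds an input that crashes the parser
```

A production setup like OSS-Fuzz runs this loop continuously against every onboarded project, with engines such as libFuzzer and AFL feeding back coverage to steer the mutations.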

You go and request access to it You provide pointers to the repo You provide an email address for the maintainer And then once you get onboarded to it, you get — your code gets automatically fuzzed by this service, and you can see that it’s been incredibly successful, 17,000 bugs in 250 open source projects already detected through this fuzzing service It supports a number of different fuzzing engines It supports C++ It supports Rust and Go This is an example of the kinds of efforts that we really support to get the whole ecosystem in a better place Let’s talk a little bit about dependency management now because we’ve taken care of trying to keep the vulnerabilities from getting into the supply chain in the first place in those crops, but how do we then determine when there is a vulnerability that’s been exposed, where that came from, and what’s impacted by it This is all about supply chain Here is a great example of supply chain in action in the open source ecosystem Electron, which is the rendering engine underneath a bunch of popular products, including Visual Studio Code, Atom, Slack, Discord This had a vulnerability introduced into it inadvertently that allowed Node.js integration to be enabled By default, when you’re running on one of these platforms like this, it’s disabled, and it’s disabled because enabling it allows somebody accessing the interface to deploy arbitrary Node code, JavaScript code, and have it executed in the context of that app The vulnerability that was introduced here allowed an attacker to override that control and turn the setting to allow execution of arbitrary Node code and that would then allow the attacker to take over the site You can see the downstream dependencies on something like that Every one of these projects that depended on that upstream Electron with that vulnerability now had to go and remediate But the first question is: How do you even know that you’ve got this dependency?
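Mechanically, answering that question means walking the dependency graph your package manager records. A minimal sketch follows; the graph below is invented, and a real tool would read package-lock.json or equivalent lockfile metadata instead:

```python
# Invented lockfile-style graph: package -> its direct dependencies
DEP_GRAPH = {
    "my-app": ["express", "left-pad"],
    "express": ["accepts", "body-parser"],
    "accepts": ["mime-types"],
    "body-parser": ["bytes"],
    "mime-types": [],
    "bytes": [],
    "left-pad": [],
}

def transitive_deps(package, graph):
    """Return every package reachable from `package`, excluding itself."""
    seen = set()
    stack = [package]
    while stack:
        for dep in graph.get(stack.pop(), []):
            if dep not in seen:
                seen.add(dep)
                stack.append(dep)
    return seen

# Two direct imports turn into six packages you now trust
print(sorted(transitive_deps("my-app", DEP_GRAPH)))
```

Once the full reachable set is in hand, checking it against a vulnerability feed, or a list of packages whose upstream just changed, becomes a simple set operation.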
How do you know that there’s a vulnerability? And then how does the whole supply chain through all the dependency chains update in a way that gets everybody as secure as possible And one of the issues this highlights is just how deep dependency chains are in the open source world Here is a little snippet of code where I am actually depending on the express web server This looks like I have introduced one single dependency Now all I have to do is monitor express, and if express is showing that it’s not got any vulnerabilities, I’m good But, actually, the story is a little bit deeper than that, because once I import express, I actually import a whole bunch of other things with it that it’s dependent on In fact, here is the dependency list for express It’s got 48 packages that it imports for its use And so what I’ve just done by importing express is import all of these I’ve imported all of the source code development practices of all of these packages I’ve imported basically all of the contributors to all of these packages as well and their intentions, which are hopefully benevolent Now I’m dependent on all of that I came across this tweet last month This is what somebody likens NPM install to, because the node ecosystem is one of the ones that has the deepest and broadest number of dependencies, and the reason is because people just import little snippets, and so a single project could literally import thousands or tens of thousands of these little node packages to make up a reasonably sized piece of software In fact, let’s just do a survey here What’s the average number of packages trusted by installing one node package? Anybody want to take a guess? Forty? Anybody else? Sixty?
The answer is 80, so not too far off The interesting trend we’re seeing in these ecosystems is that when something gets popular, it gets even more popular This is the average package reach over time You can see the packages in this ecosystem have gotten more popular The reach of these packages in terms of the number of services and other packages that take a dependency on them has continued to grow Eighty was just the average, but you can see the tail goes up into the hundreds And here are the top five most referenced packages in the node supply chain You can see that you’re talking now 150,000 dependent packages on one of these A vulnerability in one of these and the downstream effect is pretty monstrous Here is another great example Do you remember when we took a picture of a black hole last year? That was kind of a cool event, and it was done all with open source software Here is the project, ehtim, eht-imaging The interesting thing to note about this is if you take a look at the number of contributors, 15 contributors on this project involved with developing software that could go analyze these images and basically photograph a black hole But if you look underneath the hood, because of all the dependencies they took on other packages, there were really 24,500 contributors to this project This truly was a much broader ecosystem effort, really, if you take a look at everything that was used to build this It went way beyond the original 15 contributors So how do you know what your upstream dependencies look like? It depends on the type of software that you’re using, of course One of the tools in the node ecosystem for looking at this is a free, interactive tool where you can go take a look and get an idea for what you’re dependent on, but there are many automated tools out there for the different ecosystems that you should consider using Let’s go take a look at this one You can see here the URL is npm.broofa.com Up here, you can enter a module name and it will go look at the dependencies it’s got and show them to you, and it will also show you the contributors that were involved in making those dependencies Anybody got a node pack — anybody here a JavaScript developer? Just raise your hands Okay Name a package that you use Singular? Angular. Okay.
Angular I don’t know why that’s not coming up Zero dependencies I don’t know That doesn’t make sense Let me try and make sure this is working Yeah I don’t know why it came up with zero dependencies for Angular Maybe it’s fully self-contained But Express, which we were looking at earlier, 48 dependencies, 56 maintainers I was curious, like NPM itself, what that looks like When you run NPM for NPM install, you’re actually taking a huge set of dependencies, 380 dependencies, and the number of developers, 188 maintainers I want to make clear that that’s maintainers, not contributors The number of contributors is probably in the thousands or tens of thousands All of this is feeding into your supply chain When you run NPM on your server, you’re dependent on all of that All right Let’s go back to the deck The takeaway for this section is automate your dependency mapping and your build pipelines, apply those tools that can go identify dependencies so you understand what you’re dependent on, which will come in use when you’re looking for vulnerabilities It also becomes useful when you’re looking at importing things and making sure that they adhere to the policies you define and that they’re also being maintained in a way that doesn’t allow vulnerabilities to get into your supply chain Let’s take a look now at build systems and package managers because this has been another area where in the open source ecosystem itself — this is kind of like going to the grocery store and compromising the product there by putting some chemical on it and now people are walking away with it Because the farmer did the right thing, made sure there’s no E.coli, the shipper got it to the store in okay shape, but at that point now the consumer is walking in and picking up a bad product because the attacker was in the grocery store The same thing can happen here There have been multiple examples of the open source supply chain getting compromised at this point in the supply chain Here is an
example here where an account was compromised and this allowed somebody to put a backdoor into a specific package, the rest-client that I talked about earlier This rest-client, incredibly popular The kind of cool thing about this is that the developer, the maintainer of the rest-client package, basically stepped up and said, hey, this is my fault Here’s the hacker news thread where they’re like, hey, everybody probably knows right now that rest-client has been compromised, the reason why, I was reusing credentials, it’s a project that I started ten years ago before I had a password manager, and that password was compromised through some other site, and I reused the password and so that let somebody get in and compromise this rest-client They were producing good software, but at this point, the attacker was able to get in and compromise the package in the RubyGems package manager Here’s another example This one, though, goes right to the heart of the source code distribution for Canonical’s Ubuntu This isn’t a problem just for the guy that’s maintaining something in their spare time and kind of not applying best practices, but even major companies that are very involved with the open source ecosystem and have major impacts on it can make these kinds of mistakes How do we help everybody step up here?
The answer is to step up and get onto multifactor authentication Now all of the popular package managers support multifactor authentication All of the popular source code repositories support multifactor authentication This is something that we absolutely have to start moving towards mandatory enforcement on It is just beyond the point where we can allow people to introduce problems into the supply chain because they feel that multifactor is too onerous for them, because they’re having an effect on this massive downstream ecosystem at this point Another example where problems can show up is in build tampering The artifacts get produced, and at that point, as the build system is packaging things up for release, the attackers compromise the build servers or the release management servers and introduce these problems The Webmin case that I talked about earlier was an example of this The source code in the repo, the master repo for Webmin on GitHub, was fine, but in SourceForge, it wasn’t, and this allowed the attacker to get in because they put a script into the build server to inject the backdoor into it This is an example of, kind of in the middle of the supply chain, the producer of the software doing the right things, but the server being compromised The server could be compromised even if they were following all the best practices for securing that server But how do you detect that is kind of the question One of the tools that we’ve got at our disposal for detecting these kinds of problems is something called reproducible builds, and this is something that Microsoft is involved with pushing the industry towards Microsoft has been involved with reproducible builds for its own software for some time Windows is now reproducible, or at least large parts of it What reproducible builds give you is the ability to take some source code, and if somebody can rebuild it and know that given the compiler, given the
artifacts that are pulled into it, that when they do a build, the artifacts are going to have specific hashes, then they can verify that if somebody says here is a legitimate build, here are the hashes for it, they can verify that nothing has been tampered with by rebuilding from the source, that nothing got into the middle of that supply chain from build to release management Linux itself is moving towards reproducible builds as well The problem with reproducible builds is that it’s a very tough, challenging technical problem The build has to be entirely deterministic One of the challenges Microsoft had in making reproducible builds was getting time stamps out of the Windows binaries that were signed as part of the signatures And so if you take a look at the time stamps now in Windows binaries, they’re completely garbage, and that’s because they needed to be made garbage Effectively, they’re not time stamps anymore because we needed to get to reproducible builds And so that’s just one example of the efforts the whole ecosystem now needs to take to get to a place where we can have reproducible builds across all these different technologies Another way that problems can go wrong here in the supply chain is somebody yanking or compromising the distribution units, the package manager, and here is a great example This package called left-pad, that’s a node package called left-pad, is just about a dozen lines of code for left-padding a string that got incredibly popular Basically the whole web was dependent on it at this point back in 2016 The maintainer of that little snippet of code had another package called kik, and there was a product called Kik The lawyers for that product said, hey, you’re violating our trademark The maintainer said I don’t care So the lawyers went to NPM and said, hey, pull this guy’s kik package, with a legal order claiming violation of that trademark NPM pulled that kik package When the maintainer of left-pad and kik found out about that, they got upset and they said, okay, fine, I’m pulling everything that I’ve got off of the node package manager, and they were maintaining at that point about 100 different packages They pulled them all off, including left-pad, which ended up breaking the world And so at this point, you can see that NPM took this very unusual step to un-unpublish that package That’s down there in that tweet right there We took the unprecedented step of un-unpublishing I think at this point the maintainer wanted to un-un-unpublish the package, but they’d already lost control of it and so they couldn’t do that, unfortunately But this restored people’s build systems that were so dependent on this little left-pad snippet of code The lesson here is to mirror your repositories This is an effort that we’ve been taking internally at Microsoft, and this is a very tough one, to go find all the places where people are pulling directly from public repos into their build systems versus setting up mirrors and pulling from the mirrors The fact is that mirrors have a couple of different very valuable properties One of them is availability If the upstream package repo has a problem, it goes down for availability reasons, or it goes down because it gets cut off from your build system for some reason, you’ve got a place to go to But the other, kind of more serious, reason for security purposes is if there is a compromise of that upstream system, you can shield yourself from it You’re one level of indirection away and so you can sever and stop mirroring the compromised
package. In the case of the package that was pulled, you've got a copy of it locally. In the case of the bootstrap-sass package, where the developer pulled the last known good version, you've got a copy of it, so you can continue to build securely in the face of a compromise somewhere in the supply chain. It is just a very good best practice to mirror your repos.

Now this next topic, I think, really gets to the heart of addressing this problem. We've talked about ways developers can scan their source code for vulnerabilities. We've talked about how package managers should use multifactor authentication, and how you should use mirrors, if you're consuming open source software, to insulate yourself from problems at that point in the supply chain. But one of the challenges we talked about earlier is: when I'm consuming something, how do I make sure that best practices were followed, that the developers used MFA, that vulnerability scanning was performed? How do I check all of that? How do I ensure it was built with a reproducible build, and what artifacts were used to build it? All of these questions require the supply chain to cooperate and let this information flow downstream to whoever eventually consumes it. The effort that has started in this space is the software bill of materials, which lets you answer these kinds of questions. Where did things come from? Who produced it? Is there a strong identity? Do I trust them? That way I can say, I trust these publishers. What is the product itself? This is one of the challenges in the open source ecosystem: if you've ever seen a vulnerability in package version blah, what's the source code that went into package version blah? Maybe I consume that source directly. How do I know that it's been fixed?
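The mirroring practice described above can be sketched in a few lines: resolve packages from a local mirror and check each one against a pinned, known-good hash before it ever reaches a build. Everything here is hypothetical, including the package contents and the in-memory "repositories"; the point is only to illustrate severing yourself from a compromised upstream.

```python
import hashlib

# Hypothetical in-memory repositories: package bytes keyed by name.
# In practice these would be the public registry and your internal mirror.
upstream = {"left-pad": b"tampered malicious payload"}
mirror = {"left-pad": b"module.exports = function leftPad(s, n, c) { /* ... */ }"}

# Pinned, known-good hashes recorded when the package was first vetted.
pinned = {"left-pad": hashlib.sha256(mirror["left-pad"]).hexdigest()}

def fetch(name, repo):
    """Return package bytes only if they match the pinned known-good hash."""
    data = repo[name]
    if hashlib.sha256(data).hexdigest() != pinned[name]:
        raise ValueError(f"hash mismatch for {name}: refusing to build")
    return data

# The mirror keeps serving the known-good copy even if upstream is gone
# or compromised; a direct pull from the tampered upstream is rejected.
good = fetch("left-pad", mirror)
```

Swapping `fetch("left-pad", mirror)` for `fetch("left-pad", upstream)` raises the mismatch error, which is exactly the "sever and stop mirroring" decision point described above.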
How do I know what other dependencies that source has, and what other packages depend on that vulnerable piece of source? Those are all questions that tracing which source was used to build which artifacts would let you go back up the chain and answer. Licensing is another question enterprises have when they're consuming open source software, or when they're modifying open source software and then redistributing it. Is it MIT? Is it BSD? What license is it? Am I violating it? Am I conforming with it? And then, like I mentioned, how was the product created, and what materials were used to build it? A software bill of materials tracks all of this information as it flows down the supply chain.

Let me give an example of how it might defend against an advanced persistent threat sitting on a build server. In the normal path, a developer commits some code, the build system builds the artifacts, and the release management system takes those artifacts, packages them up, and sticks them in a package manager or releases them as part of a product. In the malicious case, we've got somebody tampering with those artifacts on the build system; this is like the Webmin case, for example. The release management system creates a malicious release from that build, and everybody downstream is compromised. Now add a software bill of materials to this flow: the developer commits the code; the build system compiles it, generates and signs the SBOM, and publishes the build artifacts; the attacker tampers with those artifacts; release management creates a release from that build, malicious because some of the artifacts have been compromised. Downstream, the release management system attempts to verify those artifacts against the signed SBOM and sees that they don't match: the ones that came from the build system don't match the ones being released. At that point it says, wait a minute, something has been compromised upstream, so I'm not going to release this. Another example is making sure there are policy gates in place: the software I'm consuming had to go through specific code verification, either static vulnerability analysis or a check for credentials in the code. How do I know, when
I'm consuming the software, that that's the case? Again, with a software bill of materials in the mix, checks along the way can make sure that what's consumed downstream matches those policies: we won't release software that doesn't adhere to them. I don't want to trivialize this effort. If you look at what's required here, it takes massive changes to infrastructure everywhere to support software bills of materials. But the fact is that Microsoft, Google, and others have all come to the conclusion that this is absolutely necessary, and so there is a massive industry-wide effort now, first to define a software bill of materials and then to standardize it. There are a couple of existing projects that could be suitable. One is called SPDX, the Software Package Data Exchange, which has a declared format. It was originally targeted at licensing, tracking licenses through the supply chain, but it is now looking at extending the format to support the requirements of a software bill of materials, including strong signatures, identity, and the policies that go along with it. Another one is called in-toto. In-toto was created specifically to target the security side of the supply chain with a software bill of materials. It defines different personas, and those personas can apply policies and use the in-toto tools in their build or release systems to augment artifacts with in-toto SBOMs; those SBOMs can then be verified against policies, again using the in-toto tools. The goal, with all of these companies looking at this, is to come up with a standard SBOM specification, get it adopted as an industry standard through this or some other means, and then have the whole ecosystem start building tooling and putting it in their infrastructure, so that this information flows for everybody. I want to give you a quick demo of in-toto so
you kind of see what it can do. I've got Visual Studio Code here, and a package where you can see I'm using in-toto-run. I'm going to clone a GitHub repo called demo-project-jekyll and cd into it. What happened right there, by the way, when I did that clone: in-toto, because it was monitoring, kept track of the command and the artifacts it produced, including their hashes. Detailed tracking. I'm going to move that into this folder called verification. The next step is to run a linter on it right here, an HTML linter; this would be me doing static code verification. Again, that ends up creating a linter artifact for that command, with the same kinds of annotations we saw. I'm going to copy that into the verification folder. The final step is a docker build, to build this thing into a container. Now I've got my docker build and one last in-toto SBOM artifact, which I'm going to drop into the verification folder, if I can reach it from here. There we go. At this point, I've got all those artifacts, and I can run in-toto-verify on the whole project. It looks like I've made a mistake somewhere in the flow; what should happen is that this doesn't find any errors. But you get the idea: what this should come back with is a verification of the artifacts against those hashes. I can even have a policy file that says I want these policies, which are also signed as part of those artifacts, to be enforced, and in-toto will ensure that that's the case. That's a quick look at in-toto.

Finally, let's talk about responding to the threats, because there will be more compromises even with all of this in place. When it comes to food contamination, everybody gets food safety alerts. The goal is to find the original crop where the E. coli came from and recall only what came downstream from it, not all lettuce in general, ideally. The same thing applies here: we need to understand both the severity of the vulnerability and what parts of the supply chain are impacted by it. This is tracking the contaminant through the supply chain. Everything I talked about earlier feeds into this, especially SBOM. With SBOM, with strong naming, with the hashes, with the source attribution, ideally, in a world where everything is flowing through
SBOMs downstream, you can check all of the BOMs that came with the things you've pulled dependencies on, and with automated tooling walk all the way back upstream to the source files. I asked this question a little earlier: when there is a vulnerability in a package, and that vulnerability came from a piece of source code, what other packages depend on that source code? That is something SBOMs can ultimately answer, just by automatically walking back through the chains of dependencies in this metadata. So I just want to emphasize how useful this is. That's not to say the things already in place today aren't useful before we get to this world; automated dependency tools can tell you which of your projects leverage a given package, so you can make sure they're all patched and using the latest version. But at least for projects you're starting up, if you're producing open source, I recommend you already start generating SBOMs, software bills of materials, just for your own hygiene.

That brings me to one point I want to make here: the way we look at open source security, this isn't something we want to differentiate on when it comes to Azure. Our goal is to lift all boats in the whole ecosystem. That's why we're working with a bunch of other companies, including some that I mentioned, on this problem, and why I'm here to socialize it and to tell you that there are growing efforts across the industry that you should be looking for chances to participate in. The fact is that none of us are competing on the security of open source. We all want to make it better. Making it better keeps our software more healthy, our services more healthy, and our customers secure and healthy as well.
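The "walk back through the chains of dependencies" idea can be sketched as a simple graph traversal over SBOM metadata. The SBOM entries and file names below are made up for illustration; the point is that once each package records which source files and packages went into it, finding everything downstream of a vulnerable source file is an automated reverse walk.

```python
from collections import defaultdict

# Hypothetical SBOM metadata: each package records the source files it was
# built from and the packages it was built on top of.
sboms = {
    "lib-a": {"sources": ["pad.c"], "depends": []},
    "lib-b": {"sources": ["util.c"], "depends": ["lib-a"]},
    "app-1": {"sources": ["main.c"], "depends": ["lib-b"]},
    "app-2": {"sources": ["other.c"], "depends": []},
}

def affected_by(vulnerable_source):
    """Return every package that transitively built in the given source file."""
    # Invert the dependency edges so we can walk downstream.
    dependents = defaultdict(set)
    for pkg, meta in sboms.items():
        for dep in meta["depends"]:
            dependents[dep].add(pkg)
    # Seed with packages that compiled the vulnerable source directly.
    frontier = [p for p, m in sboms.items() if vulnerable_source in m["sources"]]
    impacted = set(frontier)
    while frontier:
        pkg = frontier.pop()
        for downstream in dependents[pkg]:
            if downstream not in impacted:
                impacted.add(downstream)
                frontier.append(downstream)
    return impacted

# A flaw in pad.c contaminates lib-a plus everything built on it,
# while app-2, which never touched it, is left alone.
print(sorted(affected_by("pad.c")))  # ['app-1', 'lib-a', 'lib-b']
```

This is the lettuce-recall property from the talk: you ban only what is downstream of the contaminated crop, not the whole ecosystem.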
Now, the concrete things you should take away from today, besides the knowledge that these efforts are happening and that you should look for ways to participate more deeply in them, like the SBOM effort: if you're producing open source software, actually, if you're doing anything with software or systems, enable multifactor authentication, run static analysis tools, onboard your project to reproducible builds, and understand your dependencies. As a consumer: know what you're consuming, automate the mapping of your open source project dependencies, and learn more about software bills of materials. And this is a very critical one, I think: if you've got a mission-critical business where you need to be able to build something at a moment's notice, or you're building a product you need to update regularly, and you're dependent on upstream open source, mirror those repos, mirror those package repositories. That's one of the big takeaways.

That brings me to the conclusion of the talk. I hope you found this informative. I hope it opened your eyes, if you hadn't been aware of the kinds of issues we need to address together, and that you're walking away with something concrete to help your open source security posture, and even your proprietary security posture as well. The big takeaway, again, is that participation from the whole ecosystem is required to get us to a better place. With that, I want to thank you very much. I hope you had a great RSA Conference, and I hope you stay healthy yourself. Thanks
