
Understanding Content Marketing and Content Management: Key Points

The most important aspect of content fundamentals is defining content marketing and understanding why it matters. There is a lot of noise about content marketing today; everyone is talking about it, and there is a ton of content about it, but many marketers are still not sure how to define it or how to fit it into their marketing strategy. Content marketing is the process of creating valuable, relevant content to attract, acquire, and engage your audience. That is just one of many possible definitions, but it is one that captures what content marketing can do for your business. Businesses today need content marketing; it is a critical part of being visible in today's busy digital world.

Why? There is an abundance of information on the internet, and your customers are sifting through hundreds or thousands of messages each day, whether they are doing their own research online, receiving push messaging from marketers, or wading through the emails landing in their inboxes. It is extremely difficult for you as a marketer to be heard through all of that noise. We also live in a multi-device world: your customers move seamlessly from their laptops to their phones to their tablets, carrying information in their pockets and accessing it at any minute, 24/7. And we live in a multi-channel world as well. Your customers are on social media, on search, on your website, and on review sites like Yelp and Google; there are a ton of different channels where your customers live, and as a marketer it is important to be present and relevant on all of them. All of this information, across all of these devices and channels, adds up to attention scarcity: your customers' attention is divided, so the attention they actually have for you and your messaging is scarce. Content marketing helps you break through that noise. By providing thought leadership and creating valuable, relevant, educational content, you differentiate your brand from all the other brands out there, show up in front of your customers on those different channels and devices, and become a trusted resource amid all of that information.

Content marketing has seven qualifications. As you plan your content marketing strategy, and as you think about content marketing in general for your business, make sure these seven elements are present. Number one: content marketing engages individuals on their own terms, which means being available and relevant where your audience lives; this is the multi-channel, multi-device concept. Number two: your content marketing is based on interactions with your buyers, so you should create content that is relevant to what your buyers want to hear and what they are searching for. Number three: your content marketing should tell a continuous story, not only your business's story but also your customers' story, and it really does have to be a narrative that is interesting and engaging. Number four: make sure content marketing is the right fit for your channels. You need to create different types of content for different channels; content that is great for social media might not be right for email marketing or a paid program, so think about being channel specific. Number five: your content has a clear purpose. Everything you create should have a goal in mind, whether that is thought leadership, brand awareness, or lead generation. That leads to number six: your content marketing has predefined metrics. Create content with end goals in mind, whether that is a certain number of new customers or new followers on your social networks; whatever your predefined metrics are, determine them before you go in and actually create the content.

And number seven: content marketing is almost always evergreen. Keep your entire lifecycle in mind as you think about your different content; content should last months, even years. It is not always like this (sometimes you will create a trend-related piece for social media that is not evergreen), but for your large pieces and your ebooks you want content themes and arcs that span multiple quarters, so you get the most bang for your buck.

Here are some additional stats to keep in mind as you think about your content strategy and as we define content marketing further. Seventy-one percent of consumers trust solutions that provide useful information without trying to sell something, so your content should be educational thought leadership that builds trust and relationships over time. And 62 percent of consumers trust solutions that provide information and best practices for the tools they have bought, so teach your customers how to use your products and services better; educating them over time enables lifelong value and lifelong relationship building. Given our definition of content marketing and these seven qualifications, take some time to sit down and determine how your content marketing strategy fits each of the seven items. By mapping out your reasons for content marketing and defining what content marketing means for your business, you can truly create a content marketing strategy from A to Z.

Understanding the benefits of content marketing is critical to creating a strategy that works for your business. Many of the marketing activities you take part in, such as attending a tradeshow, running a banner ad, or paying for a pay-per-click ad, are essentially renting attention: you pay a fixed fee to rent another vendor's audience. At a tradeshow, for instance, you pay a certain amount of money for a booth and get the benefit of that show's audience. Content marketing, in contrast, is like owning your own attention. You create the thought leadership and the content in-house, and even though there may be an initial spend, you are gaining your own attention over time: you promote your own content and build your own audience instead of continuously spending money to rent other people's thought leadership and the audiences they have built. Brands today need to become their own publishers, and content marketing is how to do that and create your own thought leadership. Your audience is out there doing their own research and downloading information online, so why not be the place where your customers find that information, and the creators of that research? That is how brands today become their own publishers.

So what are some additional benefits of content marketing? Benefit number one: brand awareness. Up to 93 percent of buying journeys start with an online search, and content builds organic awareness through search and social. You want to be available when your buyers are searching for you; by creating content, you build brand awareness so that when your buyers do search, your content and your educational materials come up. Benefit number two: create brand preference. Thought leadership builds trust and brand preference, and people are more likely to purchase from companies they trust. By creating that content and becoming an educational resource for customers, you start to build that relationship and that trust; if a customer keeps searching for content on a certain subject and your content is what continuously comes up, when that customer goes to make a purchase they will most likely think of you first. Benefit number three: reach more buyers at a lower cost. Unlike renting attention, where you pay for other people's audiences, the return on content happens over a longer time. Because your content is evergreen and built to last, and because you are building and owning your own audience, you can reach more buyers at a lower cost over time.

Given the benefits we have discussed, I hope you can now apply this to your own content marketing and start thinking about how it can help your business over time. Take these benefits to your internal stakeholders so you can start creating a plan for content marketing.

Now that you have a solid plan in place, you have to start writing and editing to actually get that content done. Writing content can be extremely hard work, and building a repeatable process helps, so let's look at some steps for building a repeatable process that ensures all your written content is consistent, on brand, and engaging.

Step one: find a subject-matter expert. Determine who in your organization knows about your topic, then schedule and record a brain dump: a thirty-minute to one-hour session, in person or over the phone, where you ask that subject-matter expert various questions about the topic and essentially have them do a brain dump. If there is no subject-matter expert, be prepared to do the research yourself.

Step two: create an outline. An outline is extremely important for keeping you organized and on track with your content creation. Include your thesis, the point you are trying to get across with this particular content asset, and set up your different sections, particularly if it is an ebook. Then socialize your outline with people outside your content team to make sure that what you are writing about makes sense and is the right thing. The outline should have the title, the different parts, an explanation of what each part covers, and some links so that you have places to reference when you write the full piece.

Step three: write your first draft. This is arguably one of the most difficult parts of content creation. Make sure your thesis is clear up front and throughout the document, and keep referring back to your main point. Break your content up with H1s, H2s, and other headers; you can find these in the word processing program you are using, like Word. Breaking the document up this way lets you know where each section lives. Use lots of bullets, lists, and numbers; this is important for scannability, because your reader will not necessarily read every single word, and bullets and lists make the piece easy to scan. Make sure you have an intro and a conclusion; you want the document to flow.

Step four: always review. This is a critical part of content creation, and the more people who review, the better off you are. Typically have one to two people review each draft, more if you have the ability, since each person will generally catch different mistakes. Refer to a style guide that covers tone and brand, and keep coming back to it. Always copy edit for grammar and structure, so the content is grammatically correct and structurally sound, and always edit for content and concepts, going through the document to determine what makes sense and what does not. Use track changes and commenting in Word for optimal editing; this is definitely recommended if more than one or two people are writing and reviewing the document, because it is important for collaboration.

Step five: write a second draft. Once you have your first draft and all of your edits, incorporate them, then read the piece over another one or two times to ensure it is final copy. Make sure all your stakeholders have seen it, that everybody agrees the content is correct, and that everyone who needs to has signed off. Once you have your final draft, you are ready to send it to design.

If you are a content marketer with a small budget or a small team, consider working with lean content creation. Most content marketers are on a budget; that is simply a fact, and you have to do more with less, especially with the increasing demands on marketing teams and content marketers today. So how can you really get it all done? I like to think about the turkey dinner analogy from Rebecca Lieb of Altimeter Group: create one large content piece, then slice and dice it to create additional, smaller supplemental pieces, so you get more content with less. Let's take a look at how this works in action.

Step one: create a big rock content asset. A big rock content asset is a piece you put a lot of effort into; it could range anywhere from 50 to 160 pages in length. If you have the time, it is really worth the effort, because from this one piece you can generate a ton of additional content, and that means more material for all of your programs.

Step two: create ebooks from the chapters. We took that one piece and broke out each individual chapter into additional ebooks, each with a different cover and title so they differentiate themselves. You can see a common theme running through all of the ebooks, but each is shorter and on a different topic, so you get more content to source out and promote.

Step three: create cheat sheets. Cheat sheets are smaller, more digestible pieces of content that range from about one to two pages. Go through your big rock piece of content and pull out items that could make a good cheat sheet; for example, a simple, easy exercise became a one-to-two-page cheat sheet, and mapping lead generation to your sales funnel became another quick one. Some of these can even be printed out by your readers and posted at their desks. The idea is to give them something easy to digest and scannable.

Step four: create visuals. To extend the life and value of your big rock content campaigns, create visuals from each of these pieces: an infographic, a visual slide deck, or a workbook. You are taking the same content and the same statistics from your big rock piece and repurposing them in a different, more engaging, visual way, and these are great assets to share on social media as well as in email campaigns.

Step five: create a video. Take the content from your big rock piece and create a fun video. It does not have to be highly produced; it could even be shot on your iPhone or created in-house. The idea is to have an interesting, engaging, fun supplemental piece of content to promote your content asset.

Step six: create blog posts. From the content within your big rock asset you can create dozens of blog posts; just slice and dice it based on what makes sense for your blog. As an example, for the big rock piece the Definitive Guide to Lead Generation, we generated about twelve to fifteen blog posts that we spaced out over two to three months. Your blog content can really go on forever, so keep this in mind as you write.

Now that you know the steps it takes to create a big rock content piece and to slice and dice the content to generate more with less, sit down with your team, determine what your topics should be, and get to writing.

Once your content is written, it is time to send it to design. Good content equals good design: all of your content should be highly visual, which is an important part of creating content in today's market. So how do you make all of your content visual? Here are a few design guidelines to keep in mind. First, make sure your logo and icon usage is on brand and correct. Take typography into consideration; you might have a certain font that is appropriate for your brand and needs to be used. Use brand colors consistently; even though you can be creative with your content design, you want it to look like your company and stay on brand. And use photography and image styles that make sense for your brand and that are engaging and visual.

Let's walk through some major types of content design, starting with ebooks. Your ebook cover should be visual and engaging; it is how you get people to download your ebook and what attracts them to your content. The title should be clear; don't pile so much design around the title that it gets obstructed. Watch for proper templating and keep your design clean, so that both the cover and the interior are clean and easy to read. And use images that relate to the content; something to watch for is an ebook full of images that have nothing to do with the text, so make sure your images are relatable and make sense. Looking at an example interior page from that same piece of content, the page should be clean, the templating should make sense, call-outs should pop, and the titles of the different headings and sections should be obvious. You want to make sure it is easy to read.

Now let's go over infographic design. Your infographic design should be engaging and shareable; this is definitely your opportunity to be creative and to pop. Consider a theme for your infographic, and make sure your statistics are presented in an interesting but legible way, so it is clear what the stats are and your readers are not left with questions. Don't make your infographic so long that people have to scroll endlessly, and don't make it too copy heavy.

Once you have your content designed, it is important to edit that design. When editing designed content, here are some items to watch for. Page layout: make sure all of your pages are laid out appropriately and that they are easy to follow and clean. Orphan words: these are words or short lines at the beginning or end of a paragraph left dangling at the bottom of a column, separated from the rest of the paragraph; if you see any, fold them back into the paragraph above. Photography and image usage: make sure the images go along with the text and that their placement in the document makes sense. Typography: make sure everything is a consistent size and that your brand font, if you have one, is being used. Then add your edits using commenting in Adobe; this is the best and easiest way for designers to see your edits. Sit down and review all of your items once they are in design. If you don't, you run the risk of unfinished content, so look over all of your designed content and edit it to completion.

You also want a solid content mix, which means choosing your content types. When it comes to content there are a lot of types to choose from, and you can choose based on your brand look and feel, the content topic, your content goals, and your proposed promotional channels.

First, blog posts. Blog posts are short pieces of content that are regularly updated on your company blog. They can range from about 300 to 700 words and cover a variety of topics. Blogs are fantastic for trend-related content and for when you need to put out a point of view at a very fast pace.

Cheat sheets are short pieces of downloadable content created to give the reader quick access to a series of tips or best practices. They are typically one page, possibly front and back, without too much design; simple, easy-to-digest content assets.

You may or may not want to create a content piece similar to a definitive guide. A definitive guide is a very large content piece, often over 60 pages, that you can then break apart into smaller supplemental content items. These take a lot of effort and often go hand in hand with a large promotional plan, and they can really anchor your content themes for a quarter.

Ebooks are the bread and butter of your content strategy, particularly if you are a B2B company. An ebook is an electronic version of a shortened book, designed to contain thought leadership and best practices on a particular topic. Ebooks can be short, four to five pages, or much longer at around 50 pages. They should be designed to fit your brand and give you the opportunity to present your information in a creative way.

Infographics take information like statistics or best practices and present it in a visual way. They are generally vertical graphics that are short and easy to consume, and they should be highly visual with an interesting hook. Typically infographics are created to be presented on your blog and then promoted out for inbound links and media attention.

Reports can be created by collecting survey and industry data and presenting it in a comprehensive document. Reports often contain lots of statistics and a strong point of view and are generally formatted like an ebook. They can be a critical cornerstone of your content and gain a lot of recognition in your industry.

Don't just stop at downloadable content. Think about what else you can create that is interesting, fun, shareable, and engaging for your audience; think outside the box, because content can take many forms.

Videos are an engaging, visual form of live-action content that can be filmed and promoted on YouTube, social channels, and your website. They can range from product-related topics to stories, music videos, and more, and they can run anywhere from 30 seconds to 10 minutes, though best practice is for most of your videos to clock in at around one to two minutes, since people have short attention spans.

Visual slide decks present information, statistics, and best practices in a highly visual format using slides; often a slide deck is almost like an infographic chopped up. You can publish them on SlideShare and promote them on your website and social channels.

A caution here: make sure each of your slides is professionally designed and does not include too much text.

A webinar is either a live or recorded presentation that a speaker delivers along with a slide deck. These are generally events your audience signs up for, attends, and asks questions at. Webinars are usually topical and can easily be recorded for later viewing. A downloadable workbook provides your readers with an actionable template to fill in their own answers to questions; these are great if you are trying to teach your audience something in particular using a checklist, a table, or fill-in-the-blanks.

Now that we have broken down the different content types, you can start to formulate what your content marketing mix will look like. Once you have a good idea of what types of content you will be creating, you have to plan your editorial calendar. Before you create your calendar, think about your content mix: what are the different types of content you are creating, and how will they fit into your plan?

To build a diverse content plan, think about your content in food groups, a great analogy developed by Ann Handley to help you determine what should be in your mix. First, you have your roasts: your large content initiatives, such as a definitive guide, a large ebook, or anything with an extensive promotional plan behind it; these are your big rock pieces. Next, make sure you have your Raisin Bran: everyday pieces of content that are quick and consumable, like cheat sheets, checklists, and best practices; these should also be a big part of your content mix in addition to the roasts. Next comes your spinach: content items packed with nutrients, your high-level thought leadership. This could be a report presenting findings from a survey, a really high-level ebook, or an executive piece of thought leadership; either way, make sure these items are included in your overall content plan. Then there is chocolate cake: fun, light-hearted, indulgent pieces of content, such as infographics, special projects, and fun videos; ideally these are pieces you share on a regular basis on social channels. And finally your Tabasco sauce: content with some spice, content that challenges, asks hard questions, or provokes responses. Tabasco content is often best as a blog post, so take that into consideration, but make sure you are not saying anything off-brand.

A content plan and an editorial calendar help you stay organized, so make sure you are adhering to this mix; you don't want your content to be homogenized, and you don't want to put out the same things day in and day out. An editorial calendar also increases visibility across your organization, so people in your company know exactly what content you are creating and when, and it helps align your teams, since you might be creating different content for demand generation versus customers versus brand. So how do you organize your calendar? First by the content mix we spoke about earlier, then by the different teams you are creating content for, any ongoing campaigns you have to keep in mind, and strategic initiatives. All of these should be on your editorial calendar.

What calendar platform should you use? There are many options; you basically just need to put everything down in calendar form. You can use the calendar in your content management tool, a Google Calendar that you share with folks across the organization, a Google spreadsheet if you don't necessarily want a calendar format, or your marketing automation calendar, since many marketing automation platforms today have calendar functionality you can add your content mix to. Who should see your calendar? Your editorial calendar should be available to marketing, sales, customer service, executives, and anybody else who wants to know what specific content you are creating.

Here are a few examples of different types of calendars. Calendar example number one is a detailed spreadsheet. A spreadsheet can be useful if you have multiple content types; you might include the status, the production start and completion dates, the business unit it is for, the persona, the resource section, and the content type. If you have lots of different items to put into calendar form, a spreadsheet format often works great.
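To make that concrete, here is a minimal sketch of what such a spreadsheet might hold, written as a small Python script that produces a shareable CSV. The column names mirror the fields just mentioned, while the example titles, dates, and the extra "campaign" column are hypothetical placeholders rather than a prescribed template.

```python
# A hypothetical editorial-calendar sketch: the columns mirror the fields
# mentioned above (status, production start and completion dates, business
# unit, persona, resource section, type), and the rows are placeholders.
import csv

COLUMNS = [
    "title", "content_type", "status", "production_start", "completion_date",
    "business_unit", "persona", "resource_section", "campaign",
]

rows = [
    {
        "title": "Definitive Guide to Lead Generation",  # the "big rock" piece
        "content_type": "definitive guide",
        "status": "in design",
        "production_start": "2024-01-08",  # dates are made up for illustration
        "completion_date": "2024-03-01",
        "business_unit": "demand generation",
        "persona": "marketing manager",
        "resource_section": "Lead Generation",
        "campaign": "Q1 lead gen theme",
    },
    {
        "title": "Mapping Lead Generation to Your Sales Funnel",  # a cheat sheet
        "content_type": "cheat sheet",
        "status": "draft",
        "production_start": "2024-02-05",
        "completion_date": "2024-02-19",
        "business_unit": "demand generation",
        "persona": "marketing manager",
        "resource_section": "Lead Generation",
        "campaign": "Q1 lead gen theme",
    },
]

# Write the calendar out as a CSV the wider team can open in any spreadsheet
# tool (or paste into a shared Google Sheet).
with open("editorial_calendar.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=COLUMNS)
    writer.writeheader()
    writer.writerows(rows)
```

A shared Google Sheet or your marketing automation calendar with the same fields works just as well; the point is simply to keep the columns consistent so every team reads the calendar the same way.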

Calendar example number two is within a content management platform; this one is from DivvyHQ. It has an actual calendar format where you can toggle based on specific due dates, which business unit the content is for, the team member it is assigned to, the content type, or your content strategy. No matter how you decide to organize your editorial calendar and what you include in it, make sure it reflects the content mix we spoke about and that everything is organized. So go ahead: write down your plan and put it in a calendar.

Before you start promoting your content, you need to determine your gating strategy. What is gating? Gating means putting a form in front of your content for the purpose of gathering lead contact information. It usually sits in front of a content asset that someone wants to download, and it typically asks for first and last name, and sometimes job title, email, phone number, company, or role. When it comes to gating, there are a ton of different choices in how you approach it, so let's look at some of your options.

Option number one: gate all of your content. By gating everything, you gather lead information every time somebody downloads a content asset. This can be a good option for small companies and startups building their database, and it gives a consistent user experience across all of your content, since your readers see the same thing each time. Option number two: gate none of your content. Gating nothing removes the barrier of entry for your content, so somebody downloading it does not have to put in their information. It helps you grow your thought leadership and branding over time, and it is a good fit if you have a lot of early-stage content, because that content is so educational that people might not want to hand over their information to get it. Option number three, and the option I personally prefer, is gating based on buying stage. This lets you gate only the assets that show buying intent when downloaded, meaning you gate only the items that are really close to your product, so that when somebody downloads one you know they are closer to becoming a customer. It also helps you score your leads and determine where they are in their buying cycle. Scoring is something you can do in your marketing automation system, and it lets you determine where a person is in their buying journey; if you are gating only some of your content, you can give a higher score to people who download the pieces you gate.

Here is a sample strategy for gating based on buying stage. For early-stage pieces, the real thought leadership, education, and best practices, don't gate: those assets are great on social, they work well in promotion plans, and they let people introduce themselves to your company. Mid-stage assets are great to gate, because those are the assets that really show buying intent; if somebody downloads an ROI calculator or a buying guide, you want to know who that person is and score them appropriately. Late-stage items you don't have to gate; these are the customer case studies and pricing sheets, and you want all of that easily accessible to somebody who is close to becoming a customer. Gating based on asset type also lets you treat some content as more premium: put a form in front of the items you spent a lot of time on, since you don't want those to be as easily accessible, and create variable content value based on effort and output by leaving the less time-intensive items ungated. Now that you know your gating options, sit down with your team and stakeholders to determine what makes sense for your unique business case.

Before you actually go out and promote your content, you need to outline your promotion strategy; once you publish content online, you then have to promote it through various channels. Let's go step by step through what you need to think about when outlining your promotion strategy. Step one: create a strategy document. Once you have published a piece of content, send your team a strategy document that outlines the following: the topic the content covers, the audience it addresses, the goals it addresses, the thesis that runs throughout the content, the table of contents if available, the buying stage the content was created for, and any promotion suggestions you as the content team might have. By creating the strategy document, you help guide and outline your promotion team.

Step two: always score your content. You can use your marketing automation tool to score your content, basing your score on the content type, the buying stage, and the persona; you can often toggle the score up or down based on the importance of the piece. By scoring your content, you can help determine how close a person who downloads it is to becoming a customer (a simple scoring sketch follows these steps).

Step three: meet with your stakeholders. Set up time with your various promotional teams to go over goals and describe what the asset is about and what it should be used for. The teams to meet with include demand generation, social media, your online marketing teams, product marketing, and public relations.

Step four: create a promotion plan. Work with each team stakeholder to create an overall promotional plan for each asset. For large assets, consider meeting weekly or bi-weekly until launch. Assign a project manager, which might be you or a member of the promotional team, and make sure that person stays on task over time.

Step five: make sure your plan is multi-channel. You want to be where your buyers are; they might be on social media or on your website, but wherever they are, be sure your promotion plan touches all of those areas. By promoting your content on multiple channels, you can ultimately maximize your exposure. Also remember to promote content on channels appropriate for the content type; not every channel makes sense for every single piece. And always test your promotion strategies: you might think you want to promote a content asset on a certain channel and then discover it is not the best channel for that particular asset. Now that you have sat down with your stakeholders and outlined your promotional plan, it is time to actually get promoting.
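Because the actual setup lives inside your marketing automation platform and every platform differs, here is only a minimal sketch of the kind of scoring logic step two describes; the content types, buying-stage multipliers, and point values are illustrative assumptions, not recommendations.

```python
# Illustrative content-scoring rules: points by content type, multiplied by
# buying stage, with a small bump for persona fit. All numbers are placeholders
# to be tuned (or rebuilt as rules) inside your marketing automation tool.
TYPE_SCORES = {
    "blog post": 1,
    "cheat sheet": 3,
    "ebook": 5,
    "webinar": 8,
    "buying guide": 15,
    "roi calculator": 15,
}

STAGE_MULTIPLIER = {
    "early": 1,   # ungated educational thought leadership
    "mid": 2,     # gated assets that show buying intent
    "late": 3,    # case studies, pricing sheets
}

def score_download(content_type: str, buying_stage: str, persona_fit: bool) -> int:
    """Return the points to add when a lead downloads a content asset."""
    base = TYPE_SCORES.get(content_type, 1) * STAGE_MULTIPLIER.get(buying_stage, 1)
    return base + (5 if persona_fit else 0)  # toggle the score up for the target persona

# Example: a target-persona lead downloads a gated, mid-stage buying guide.
print(score_download("buying guide", "mid", persona_fit=True))  # 35
```

The sketch just makes the logic explicit; in practice you would configure equivalent rules in your platform and let it adjust scores automatically as people download each asset.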
One of the most important aspects of promoting your content is merchandising it on your website. Your website is the key place where you want leads to download your content, so you must merchandise it correctly in order to optimize for conversions. Where can content live on your website? In a specific resources section dedicated to content, on your home page, or in ads throughout your site. Let's take a minute to walk through each option.

Your resources section is where your content can call home: a section of your website dedicated to thought leadership. I definitely recommend that every company engaging in content marketing create a resources section. It should be easy to navigate, so it is very obvious to the reader where to find each content asset; it should be SEO optimized, calling out the keywords that matter for search engine optimization throughout; it should be categorized by the key topics that are important to your business; it should include visual images throughout; and it should include forms for lead generation. Content on your homepage should be designed for conversion: add your best content to your home page rotator, or include a call to action in a prominent place. You can also promote content throughout your website using banner ads and other images; consider promoting content on your product pages where applicable. Once you have your content published on your website, it is important to think about how to optimize it for conversion.

Promoting content on your blog is a great way to get any new content asset in front of your audience. Your blog attracts early-stage buyers, is great for lead generation, ensures more eyes get on your content, and is a great place for social-ready content to live. Do not discount the importance of your blog when it comes to promoting any new content you may have created; promoting content on your blog enables you to drive even more traffic to both your blog and your content over time. So here are a few tips for promoting content on your blog. Don't just promote your ebook; give your audience actionable takeaways. When you write a blog post that promotes a new ebook, don't just say, "Hey, here's our new ebook," and include a download link; take valuable content from within that ebook and make it a standalone blog post, so the post is a piece of thought leadership in and of itself. Use bullets and headers for easy skimming; this applies to your blog as a whole, since most people won't sit and read a blog post word for word, so include multiple bullets and different headers so your reader can easily skim through the post. And include a hero image of your asset: just take a screenshot of your ebook's cover and use it as the primary image for your blog post.

And always include a strong call to action to download the ebook at the beginning of the blog post, at the end, and perhaps in the middle; you want to make it clear to the reader that you want them to download the ebook for additional information. So once you have your content published on your website and on social media and have sent out an email, make sure to create a blog post that effectively promotes your content.

One of the most important channels you can promote your content on is social media; it is a great place to promote all of your new content. Here are some key tips for social media promotion success when it comes to content marketing. Number one: don't take yourself too seriously. Remember that social media is a place where people congregate to share their thoughts and ideas, and it may be a personal network you are sharing your content on. Number two: organic isn't enough on its own, so don't be afraid to boost your presence. It is one thing to post your content organically through your social channels, but you want to put a little paid boost behind your content to optimize it for conversion and get it seen by a lot more eyes. Number three: focus on valuable content and solid offers. Number four: create strong calls to action; when you post a content asset, make sure the audience knows exactly what to do, whether that is downloading an ebook or sharing an infographic. Number five: always provide value; make sure the content you post on social media is educational and helps your audience do their job better. And number six: never forget that social is a two-way street; constantly listen to your audience and take their feedback into consideration.

Now let's go over the different social networks and the different ways you can promote your content on each. Let's start with Facebook. Facebook is a highly visual medium, so it lends itself very well to content promotion. You can promote your content through organic posts on your company page, and you can boost those posts with paid ads for additional reach and lift. You can create Facebook ads that appear on the right side of the newsfeed, create custom cover photos that appear behind your profile image, and create apps and tabs on your business profile page, where somebody can click a tab and download a content asset by filling out a form.

Twitter is a very fast-moving social network and also a great place to promote your content. You can promote your content via organic tweets from your business Twitter profile, promote those tweets using paid ads, and promote your content hashtags using promoted trends, where you put paid spend behind those hashtags so they are more easily discoverable by your network. You can also use Twitter lead generation cards, where somebody can fill out a form with their information right on Twitter and download an ebook.

LinkedIn is particularly useful for the B2B market, since people are generally on LinkedIn for business purposes. You can promote your content through LinkedIn updates from your company page and then sponsor those updates to get additional reach and visibility, create LinkedIn ads that appear on the right-hand side of the LinkedIn profile, and join LinkedIn groups that are relevant to your particular topic and promote your content through those groups.

Next up is SlideShare, the perfect place to promote your visual content such as slide decks and infographics. You can optimize your SlideShare postings for search engines by using keywords in the descriptions. SlideShare also offers lead generation forms that sync automatically with your marketing automation tool, so somebody has to fill out their information in order to download one of your content assets, and you can also appear in the featured presentations on the homepage.

Pinterest is another highly visual medium that is great for promoting content like infographics and slide decks. On Pinterest you can share your content and engage your audience by creating boards and pins. Pinterest is particularly great for a B2C audience but also works well for B2B, especially if you are creating a variety of visual content.

So now that you have your content published on your website and selected, it is time to determine which social channels work best for each content asset. Note that not every social channel will work for every single content piece, so be very particular about where you post your content; take your content, post it on social media, and engage your audience.

One of the best ways to get your content out in front of your audience is promoting it via email; email increases the reach of your content and gets it in front of your customers. Here are a few tips on how to make sure your content emails are engaging and relevant, to increase opens.

Tip number one: always segment your email sends. Send your content only to people who will find it relevant, and align your content emails to buying stage and role, which should be easy since your content should already be aligned to each stage and role. And don't forget about sending content to current customers. Tip number two: design your emails to be engaging; you want the people reading your emails to want to download the content asset inside. Create a custom header using the design of your content asset, include brief copy with bullets so the email is easy to scan, include a strong and clear call to action, which will often be "download this ebook," "download this infographic," or something along those lines, and include a hero shot of the content asset, so the cover of your ebook or cheat sheet appears in the email. Tip number three: send an email to your entire database. Hopefully you have a large marketing database; once you publish a new piece of content, send an email out to your entire database using the segmenting rules discussed above. Tip number four: consider a sponsored email, which is when you pay a trusted vendor to send out an email including your content on your behalf. Be careful here, and make sure you only use vendors that are not only reputable but also have a target audience that matches yours. Tip number five: always add your new content to your lead nurturing programs. If you are using marketing automation, you should be nurturing your leads once they hit your database, so add any new content into the relevant lead nurturing tracks to ensure that even people who have already been in your database for a while receive your content. Email is a fantastic way to get your content in front of your audience; if you remember to segment and keep your emails engaging and relevant, you can increase the chances of content downloads.

Using pay-per-click ads to promote your content increases reach. Pay-per-click ads are highly targeted paid ads that drive traffic to a custom landing page where you can generate leads. Let's go over a few types of ads you might want to take into consideration. Search engine ads appear on a search engine and are targeted based on keywords and audience. Banner ads appear on various websites and are targeted based on demographics. Retargeting ads appear on a network of websites and are behavior-based; they look very similar to banner ads, but they are triggered by behavior. A few logistics for your pay-per-click ads: you pay only if a visitor clicks on your ad, and you can target your ads using demographic, behavioral, and search criteria.

Here are a few tips for using PPC ads for content promotion. First, consider using high-value, mid-stage content for your ads; because you are spending a lot of money on your ad program, you want to use only your most relevant content. Your ads should have a fun visual or engaging copy; this makes a difference whether it is a banner ad or a search engine ad, since a search engine ad generally only includes copy, which needs to be engaging, while a banner ad can carry a fun visual. And all ads should go to a landing page, not just to your website; if your ads go straight to your website, you run the risk of your leads leaving or wandering all over the place, whereas a custom landing page shows your lead exactly what they should be doing, particularly if that is downloading a piece of content. Plus, you want to collect all of their information in a form, so all landing pages for your ads should have a form. Now take your high-value content and determine what types of ads might be appropriate for promotion. Because ads can get very expensive, my advice is to choose only your best content that shows buying intent, then create your copy, develop your custom landing pages, and purchase some ads. Now that you have created your content and promoted your content, it is time to start measuring your content.

But first, you have to define your measurement goals. Many content marketers think content ROI is essentially a mythical beast. Why is content ROI such a struggle? Marketers often don't think about ROI during content creation, and you must think about how you will measure your content before it is even created. Marketers aren't aligned on metrics across teams, which happens when content is created in a silo: one team might be tracking one metric and another team a different one. And marketers don't know how to measure content effectively, because content marketing is a relatively new discipline within businesses and there aren't established measurements in place that content marketers can simply adopt. But measuring content can be done, so here are a few steps you can take to ensure your content is created with measurement in mind.

First, establish goals and measurement estimates up front: every time you create a content piece, determine what you hope to get out of it in terms of measurement. Define what you want to measure for each content piece: are you looking for social sharing, increased traffic, or revenue and pipeline? Define exactly what you are looking for for each piece. Align with all key stakeholders and get their input; it is important to sit down with the executives in your business to determine what the proper content measurement should ultimately be. Then budget it out. In order to measure content you need an actual content budget, and you need to know exactly how much you are spending on content, because that makes a difference for the return on investment in the end; by putting together a sample budget, you can better determine exactly what the ROI on your content is over time. Finally, look at what other marketers are measuring for their content. Now that you have thought about goal creation for your content marketing analytics, make sure every single piece of content you make is created with measurement in mind. You can track your metrics in your marketing automation tool, or create a spreadsheet that tracks monthly metrics over time so you can see exactly how your content is performing.
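To show what that monthly tracking and the resulting ROI arithmetic can look like, here is a minimal sketch; the metric names, dollar figures, and the simple (return minus cost) over cost formula are hypothetical placeholders standing in for whatever you actually pull from your marketing automation tool or analytics.

```python
# Hypothetical monthly tracking for a single content asset: spend versus the
# pipeline and revenue attributed to it. The figures are placeholders.
monthly_metrics = [
    # (month, downloads, new_names, attributed_pipeline, attributed_revenue)
    ("2024-01", 420, 180, 30_000, 0),
    ("2024-02", 610, 240, 55_000, 12_000),
    ("2024-03", 530, 205, 48_000, 25_000),
]

content_cost = 9_000  # writing, design, and promotion budget for the asset

total_pipeline = sum(m[3] for m in monthly_metrics)
total_revenue = sum(m[4] for m in monthly_metrics)

# Simple ROI: (return - cost) / cost, expressed as a percentage.
roi = (total_revenue - content_cost) / content_cost * 100
print(f"Pipeline touched: ${total_pipeline:,}")
print(f"Revenue ROI: {roi:.0f}%")  # (37,000 - 9,000) / 9,000 is roughly 311%
```

Whether you count attributed revenue, pipeline, new names, or social shares as the return is exactly the kind of decision to settle with your stakeholders up front.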
Structuring your content organization is critical to having a content marketing strategy that works. Many organizations have trouble structuring content teams, because content marketing is a relatively new function within the marketing team, and many organizations simply don't know where it should sit or how it should be set up. A question I get asked frequently is: what team within marketing should own content? Should the content marketing team live with demand generation? How about product marketing? Should it be in corporate marketing, inbound marketing, or brand marketing? These are all areas of the marketing organization that create content, so content marketing could potentially live in any one of these marketing orgs, and everyone wants content to align with them, simply because all of these teams create content on a daily basis.

But if content gets aligned with only one team, content gets created in a silo. What do I mean by that? If content is aligned only with demand generation, then content's goals would only be, say, lead generation; if content is aligned only with product marketing, then content would carry only product marketing's goals. When content is created in a silo, the other teams are not aware of it and don't get input, no one talks to one another, and the content messaging is inconsistent and doesn't reflect the brand. This is really critical: content marketing must be looked at in a holistic way that reflects the brand messaging, the brand look and feel, and the way the brand wants to speak about itself across the organization.

But what if we didn't align content to any one team within marketing? What if content marketing were actually a neutral team within the marketing organization, so that every team could benefit from centralized content? That way your organization has a much more holistic view of content, and content gets to be much more strategic instead of siloed. Think about content as a strategic service bureau. The content service bureau model determines a content roadmap, so your content team helps determine what the content strategy is. A content service bureau supports requests from the various teams mentioned earlier (demand generation, product marketing, your brand teams), because they all have content needs. It oversees the content creation process, ensuring consistency of message and voice across the content you create, and it helps streamline content creation for the scalability you are looking for while reducing duplicate effort, because all of the teams are aligned.

So what sort of content should your content service bureau help create? Your content service bureau should span all of the different departments within marketing and support them with different types of content. It should create demand generation content, the content that helps drive demand for your business, such as ebooks, infographics, and slide decks; this is the content being promoted out through social channels and through your programs on a daily basis. It should also help create customer content: the customer case studies and testimonials that show the external community how your product or service has helped customers. And it should help create brand content: the really high-level thought leadership, such as contributed articles and executive content, the content you put out there to show who your brand is as a business. Your content service bureau should help create all of these different types of content for the different teams within marketing.

So how should this content service bureau be structured? On top of your content service bureau sits a content committee made up of executives within your organization; the content committee consists of various stakeholders from those different marketing teams, and they help drive the strategy and ensure that everything the content service bureau comes up with is on brand and right. Next you have the content service bureau itself, often made up of a managing editor and an associate editor; your managing editor is the person in charge of content strategy, the associate editor sits underneath the managing editor, and various writers, internal or external, might make up the bureau as well. Content requests from the various teams come into the content service bureau, which ultimately determines the overall content strategy. So as you sit down to think about what your content strategy looks like, and once you have hired the right team, think about how your team is structured; a content service bureau is a great way to structure your team as a neutral, scalable part of your marketing organization.

In order for your content marketing to be relevant and effective, you must create your buyer personas and buyer journeys. Most companies have many different types of buyers; these are your personas, the people you connect with each day through all of your marketing activities. Most companies have multiple different people who buy their product or service, so you want to create content that is relevant and effective for each individual person. Each of these personas will have their own buying journey: the journey they take from when they initially start researching a topic to when they make a purchase. Each buying journey has multiple steps, and it is critical that your content marketing address the questions and concerns a buyer has at each step of the process. For your content to be relevant and effective, it must speak to both your buyer personas and their journeys.

Creating buyer personas essentially helps you choose what type of content to create, because not every type of content will work for every single buyer. It also allows you to target the right topics, the ones your buyers actually care about; by doing the research to determine what your different personas care about and what they want to read, you can target what type of content you create. Your personas also help set the tone and the voice for all of your content assets: one buyer persona might like content delivered in a more conversational way, while another might want a more authoritative, research-driven type of content, and buyer persona research helps you determine what the type, tone, and voice of your content needs to be. Buyer journeys help you know what content to create for each buying stage; as I said, each journey has multiple stages, and a proper content plan will have content that speaks to the questions and concerns your buyers have at each one. Buyer journeys also help you know where your buyers consume your content at each stage: somebody who is initially researching your product might be finding your content on social media, whereas somebody close to making a purchase decision might be looking at a content syndication site or your website, so it is important to know exactly where your buyers look at each stage of the journey.

So how do you find out this information? How do you create your personas and your journeys? You are going to have to conduct interviews and research. First, speak with your sales teams. Your sales teams are really on the front lines, speaking with your prospects each and every day.

speaking with your prospects each and every day they know exactly what types of questions and concerns your potential customers have so sit down and speak with your sales team to determine what exactly they know about your current customers your customer service teams are also on the front lines with your customers each and every day in fact they know very closely what type of concerns your current customers might have so speak with your customer service teams to determine what type of issues problems and questions your current customers ask you also want to speak with current customers it's important to get both sides of the story so speak to both happy and unhappy customers this will give you a well-rounded perspective when determining what you should write content about and then also if possible speak with prospects the people that have yet to become customers but who hopefully will in the future during your interviews you want to find out some of the following information and this could be limited to what's here or you could expand it into what makes sense for your own business the type of information you'll want to find out to actually create your personas is information about their background potentially job details the goals and challenges of your personas where do they source their information what's their preferred content medium some people like to read content in an e-book form other people might want to watch a video get some actual quotes from customers what about their objections what type of objections do they commonly raise if they're speaking to a salesperson and then what's their role in the purchase process you might have multiple personas that are the decision-maker and then some personas might also be assistants to the decision-maker so lots of sales especially those with long sales cycles these days have large buying teams so you might want to create personas for each member of the buying team and create a sample marketing message for each one of your personas so that you know exactly how you're speaking to them in addition to your sample buyer persona you also want to create a sample buyer journey the information from all of your interviews will lend itself very well to creating this type of journey remember that a buyer journey has different stages and they might be different depending on your own sales cycle and your own buyers but here's just an example it could be calm status quo shattered search around problem frame problem and solution consult peers and experts engage potential providers so those are just an example of some of your buyer journey stages and then in addition to each stage you want to provide an explanation and potentially any questions or concerns that buyers bring up at each stage now that you've created your personas and your journeys it's time to align your content it's extremely important to create content that speaks to each persona and each stage in the buying journey and that'll make up your content mix so now that I've gone over persona and journey sit down with your stakeholders in your organization determine what questions you should ask and now create your own to make sure that your content is consistent you want to work on developing your brand voice your brand voice presents a consistent experience across channels giving your brand a recognisable persona this is extremely important because your brand voice will ultimately help you stand out there's a ton of information out there and a ton of noise out there online so you want to make
sure that your content is specific and unique and by developing a brand voice that really speaks to your organization your content will stand out from the noise first you'll want to create your brand voice persona your brand voice persona is essentially the feeling that you'll want your content to convey your persona might be conversational accessible humanistic educational authoritative professional there's a wide variety of attributes that you can apply to your brand voice persona next you'll have the brand voice tone your brand voice tone is essentially the way that your content will sound it could be friendly direct honest formal perhaps it's scientific humanistic there's a wide variety of attributes and ways that your content can sound next you'll want to determine your brand voice language this is essentially the language in which your content is written this could be simple wordy complex jargony it depends on your business and who you're selling to the next step is to create a style guide once you have your persona and your language your tone all dialed in you'll want to put that down on paper so that you your content team and any external writers that work with your company can determine the exact style in which you write for your style guide you might want to answer questions such as who is your company who do you sell to who makes up the content team your style and writing tone specific grammar guidelines

as well as content types and structure this could be as short or as lengthy as you want the more information you put in there the easier it'll be to train incoming content marketers on your company and your style one thing to know is that your tone might differ slightly for each persona as you build out your different personas and learn what each persona likes for their content you might learn that one persona might have a different tone than another so it's important to make sure that you're writing content that's relevant for each persona but overall your voice should stay consistent throughout your content now that we've walked through how to create brand voice and tone it's time to sit down and write your own style guide get together map out what you want your content and your brand to look like put it down on paper and put it into action in order for your content marketing to be successful you must set your content goals you should always have goals for your content marketing and they should be the first thing that you set up goals help you choose the right assets for your organization design content to be measurable upfront and help you define success a content marketing plan without goals will simply not go anywhere so goals are critical to ensure that the content you're creating is the right content and that you're able to appropriately define success so what are the steps you need to take in order to set your content marketing goals step 1 meet with stakeholders to get their point of view on goals you need to determine who is involved in creating content there's most likely a variety of teams and a variety of team leaders who are involved in creating content and promoting it so get together your marketing team leaders sales team leaders and customer service team leaders step number two ask yourself some foundational questions why are you creating content what programs are you planning on using your content in what are your short-term goals and what are your long-term goals asking yourself and your stakeholders these questions will help you set a solid foundation for creating your content marketing strategy step number three define your qualitative goals these are goals like brand recognition thought leadership social engagement relationship building trust these are all extremely important goals for content marketing so although these particular goals might not have a number assigned to them they are critical step number four define your quantitative goals so these are goals that actually do have a number associated with them so these might be number of new leads number of downloads for each content asset specific number of social shares percentage of leads that turn to opportunities that turn to closed deals how much pipeline your content has generated how much revenue all of these quantitative goals are measurable and should be measured and they should be measured in addition to the qualitative goals so that you have a holistic view of all of your content marketing how do you track your goals you can track your goals in your marketing automation platform this is great for tracking those quantitative goals like number of leads downloads and revenue Google Analytics is great for traffic and conversions you want to track your social channels for increases in social shares in your CRM tool you can track customer engagement and in your content management tool you can also track things like downloads and engagement and then add your goals to a plan put it down on paper you'll want to
socialize your goals with your team present it to the stakeholders you can schedule weekly or monthly metrics check-in meetings and hold yourself accountable to your goals you can't create a content plan in a vacuum so by showing your organization that you are able to tie your content efforts to goals you ultimately get more budget and more team bandwidth to create more content over time now that we've sat down to discuss how to create your goals for your content marketing sit down talk to your stakeholders ask yourself questions put everything down on paper and ultimately create your plan before you can start actually creating your content you need to brainstorm your ideas and create your content arcs coming up with content ideas can be daunting it is easy to have writer's block when it comes to content but before you start actually writing your content you need to determine what topics you should consider for your content and this will vary based on your business often it's great to first start by taking into consideration your business priorities this could be new markets that you might be trying to launch in additional product launches or service launches branding initiatives within the business or different thought leadership topics that you want

to become leaders in the industry about these are all great topics to write content for you also want to take into consideration your personas and who you want to sell to you might be writing different content for different personas you might find that different topics will resonate with each one of your personas so you want to also take that into consideration industry trends are a great place to start you want to make sure that you're on the cutting edge of what's going on in your industry many businesses want to be on the forefront and you want to be able to write thought leadership about what's going on with your peers so creating content on industry trends is certainly a great place to start search engine optimization priorities are also an important thing to keep in mind many companies will want to make sure that their content is SEO optimized meaning their content will show up in search results and you might have specific keywords that are company priorities for example if your company sells mobile applications mobile might be one of your keywords and you want to make sure that you write plenty of content around mobile competitor content is also an important thing to keep in mind take stock of what exactly your competitors are writing about you can find this out through following them on social media using a competitor content tracking platform or just keeping up to date and researching and then do some of your own digging within your organization to determine what other teams want you to write about meet with key stakeholders in your company to figure out what their key priorities are make sure you get your product or service roadmap so you know what exactly is coming down the pipeline ask your sales and customer teams what type of content would help them not only sell deals but also keep customers listen on social media to determine what your market is talking about you also want to ask your audience on social media what they might want to hear send out a tweet put a Facebook post out there and simply ask your followers what type of content they'd like to see from you once you have a solid brainstorm of topics that you want to write about then you can create your content arcs content arcs are monthly quarterly biyearly or yearly content themes that you can write about by organizing your content into content arcs it makes it easier for you to determine and select what to prioritize when writing your content but then you also have ongoing initiatives throughout each of these quarters these might be product launches or items based on the type of business units you sell into so in addition to your arcs you also need to take into consideration any ongoing initiatives that you might have so now that you've sat down and done some serious brainstorming about what type of content to include sit down write your plan map out your arcs and let's get started mapping your content to your sales funnel helps to ensure that you're moving new buyers to actually become customers once you understand your buyer journey and have created your content themes and brainstormed ideas you need to map your content to your sales funnel why well you need to create content that speaks to your buyer at every stage of your sales funnel first you need to know your sales funnel this could be different based on your business whether you're a b2b business or b2c business that'll make a difference your sales funnel typically starts at a top of funnel stage where most of your new buyers and your new leads are coming in it
gets a little narrower in the middle of your funnel where you're actually starting to nurture your leads and you know who they are and then at the bottom of your funnel you have leads that are very close to becoming customers and then they are customers so again this will vary based on your business but it's important to understand your funnel and to start to map your content to these different stages so let's walk through let's start with your top of funnel tofu content this person is at the beginning of your sales and marketing funnel she is aware of your service but she is not ready to buy this person might have found you through social media she may have gotten on your website offer types for your top of funnel content are educational and thought leadership do not mix this with content that has too much product information let's take a look at some tofu content in action all of these examples are ebooks that are best practice and thought leadership that each of these vendor publishers have created to help educate their audience so now let's go to the middle of your funnel your mofu content this person has displayed buying behavior is engaged with your content and is potentially a sales lead so this person knows who you are and you might be nurturing them over time offer types here are third-party reports return on investment calculators for your products or buying guides to help these people make a decision to purchase your product so now let's take a look at mofu content in action in contrast with the tofu content assets all of these three examples actually speak a bit about the

product and the core business case that each of these companies is trying to sell so the one on the left is developing a business case for marketing automation the one in the middle is a third-party asset that talks about the total economic impact of the company and then the one on the right is a buyer's guide so all of these pieces are working to push the buyer through that funnel now let's talk about your bofu content that bottom of the funnel content this person is very close to becoming a customer your offers here are very specific to your product or your service so your offer types are promotional and focused on proving return on investment some examples here are assets that prove competitive advantage and value so some ideas could be pricing sheets these could be customer case studies you really want to show a customer in the bofu stage that your product is the correct choice now that you understand your own sales funnel you can take your content and you can map it to each of the stages in your funnel then you can make sure that you're pushing leads to eventually become customers

Uncategorized

Jaspersoft Tech Talks Episode 06: Embedding Jaspersoft into your PHP Application

thanks for joining today's Tech Talk this is Tech Talk number six embedding jaspersoft into your PHP application today's the 22nd of January I'm joined by Mariano Luna he works on the same team as I do he's joining us from Houston Texas I'm in Dublin Ireland so if you haven't joined these Tech Talks before the way this works is we have a presentation and during the presentation we'll try to make it as interactive as possible so if you have questions you can submit them in the question box and we'll do our best to answer those live and if you have questions that are not related to today's topic don't hesitate to ask those as well so anything else that you have with Jasper stuff that you need help on we can do our best to answer them live so we're gonna start off today with a poll just asking you a little bit about you know why maybe you're here today to embed jaspersoft into your solution so we'll put up that poll question now and give you about a minute to vote so it's really about how you use or plan to use jaspersoft whether you're using it as a standalone app or embed it into other applications all right so it looks like most people have voted now so we're gonna put up the results so you guys are aware okay so looks like yeah a good spread no Java users but about a quarter of the folks are embedding into a PHP app so great all right so with that we'll get Mariano to get started welcome to the show Mariano and thank you because most things we're going to be talking about will relate also to embedding the application into different technologies beyond PHP but we will focus on PHP for the samples and also to present the new wrapper so a little bit of the overview of what we are going to talk about today I will initially start with talking about the REST API that Jaspersoft exposes and allows us to use the Jasper services from different applications as you know rest is pretty standard and you can operate and work with it from different technologies so it's a really good way to embed the application into your current application and have full reporting functionality and a BI stack in what you already have whether that's an existing commercial application or an intranet it's good to give your end-users a seamless experience with their data I will talk a little bit about the iframe integration and REST API integration and how each one works and what benefits we get from either side the good thing is that we can mix and match them while embedding and we will probably do that on a regular basis I'll give a quick overview of the single sign-on and user management that's probably a topic for a full tech talk by itself but I will explain at a high level what that entails since we're talking about two different applications that have security we will be able to let Jasper know that some other application is the one in charge of that security and be able to create a single sign-on between the two applications so again you give the end-user that one point of access experience and then of course when integrating and embedding the Jasper server themes are pretty interesting because it gives you the capability of working with the Jasper UI

to mimic what you already have company wide in the application you're embedding and of course since we will be talking about PHP we will be using the new wrapper library that came up with version 5 and from the PHP environment that gives you a full library that abstracts the work with the server so it's a lot simpler than having to create your own REST client and lets you integrate in an easier way from the programmer side ok so you have probably seen this a couple of times if you have been around Jasper but the main thing that I want to point out here is that Jasper was built to be embedded so you have a lot of things that you can leverage here the full UI is CSS based so it's very easy to customize to fit your needs those CSS and images are actually stored in the Jasper server repository so you can have more than one look and feel for your application and there are URL parameters that you can access through the HTTP API that allow you to change that on the fly while you're requesting pages from the Jasper server and of course rest will allow you to access all the underlying services of Jasper this is just a quick overview of the stack and it gives you kind of a bird's-eye view of how Jasper works internally and that way you can leverage those functionalities and those services from outside Jasper for instance to bring that into a different environment okay a little bit about the REST API so we know that the jasper core server has its own UI and you have a lot of services that are provided there the thing is that most if not all of those services are available through the REST API so you can manage elements in the Jasper server repository and you have report services that allow you to receive a rendered report or receive an export of that report in a PDF or Excel format you can access the scheduler input controls and everything and also a full set of admin services that allow you to control users on the jasper side access control repository resources and schedule jobs and manage those schedules so you can check how many jobs you have running currently what the next triggers are and so forth and that allows you to interact with the server and build layers on top of that that can simplify either your user access or your admin access depending on which side of the fence you're in and what parts you want to integrate so on the REST side that's exactly the same for any technology that you want to use and this is fully documented in our web services guide in particular for PHP with version 5 there is a new project that was developed by Jaspersoft and it's part of our community projects and it is a wrapper library for PHP and that wrapper library will allow us to abstract all the interactions from our PHP application to Jasper in a very simple way and you will see that in a sample in a couple of minutes but the idea is that you can work with the server by handling a class in PHP the REST API as you know will work with either JSON or XML elements the wrapper will take care of those elements and transform those into objects that you can easily manipulate from inside PHP so it gives you very rich functionality for working with the server without having to handle everything on your own Mariano I'm gonna stop you for just a second since we have a lot of customers probably using many different versions of the product just want to point out a couple of things the REST API was introduced in version I might
be lying but I think it's four point two and then yes so four point two and above has the REST API and then the PHP wrapper library was introduced in

version 5 now whether that's backward compatible with other versions of the REST API I'm not sure it was tested with version 5.0 so just be aware of that in previous versions the only API that existed out of the server would be the soap API now that doesn't mean that if you're on version 4 you can't build a PHP application it just means you won't get our wrapper library ok that's just something I wanted to point out and you can continue Mariano thank you yeah and going forward with the wrapper the actual REST API got a big overhaul in version four point seven so most of the things that you will see on the PHP wrapper leverage those services that were exposed in the four point seven and five point oh versions okay would that be rest v2 if you're looking at the API yes okay if you look at the API doc it will be rest v2 and most of those improvements were in simplifying the access specifically to the management of the input controls and management of the resources in the repository so if you're on a different version of Jasper or a four point x version you will see that if you grab the latest API documentation it's a lot easier now to manage those resources and to work with those input controls with you know just one call to the server instead of having to do multiple calls as you have to do with the initial versions of the REST API so this is just to show you typically how this architecture will work from your application so we know PHP works in a LAMP or WAMP environment fairly straightforward so you will still have your Apache web server or any other web server that will be handling your PHP code in the front and connecting to your clients and managing all the operational level access to your database from the PHP application and now when you want to add the reporting piece to that application instead of having to build everything from scratch what you can do is just leverage everything that Jasper provides so you can have a jasperreports server running and this is actually separated out but you can have them running on the same instance that will depend on the scalability that you're looking for and your specific deployment but JasperReports Server will be running on a Tomcat container that can run directly on the same systems where you run your PHP application the connection or the communication between your application and the server will be using either REST or SOAP or the HTTP API and Jasper will also access the same database for generating the reports that you need so the good thing here is that from the PHP side you don't have to develop a full reporting environment just to give your users access to the data that they're building or creating at an operational level and you can leverage the full UI and all the elements that the Jasper reporting library gives you to bring pixel perfect reports or analysis to your customers so a little bit going into what the client gives us and that wrapper that we were talking about so the client is fairly straightforward to use it is a PHP class and you can build your own wrapper for other technologies if you want the idea of creating a wrapper on top of an existing API is so you don't have to take care of all the internal communications and the transformation you need to get those objects that you have in either JSON or XML and build something that is easier for you to manipulate in your technology so invoking the client is pretty straightforward you just call the class and with that class you have to create a connection before
using it as you can see in this example I have everything on my localhost so I'm just connecting to the jasperserver instance on the standard port and you give the username and password that you're going to be using to connect
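As a rough illustration of what that connection step amounts to at the REST level, here is a minimal PHP sketch that checks the credentials against the server's serverInfo service. The host, port, and credentials are placeholders, and endpoint availability depends on your server version, so treat this as a sketch of the idea rather than the wrapper's actual code.

<?php
// Minimal sketch of what the wrapper's connection step boils down to at the
// REST level: one authenticated call against the server. Host, port and
// credentials are placeholders -- adjust them for your own install, and for
// a multi-tenant server the organization is typically passed along with the
// username (for example jasperadmin|organization_1); check your version's
// Web Services guide for the exact rules.
$base = 'http://localhost:8080/jasperserver';
$user = 'jasperadmin';
$pass = 'jasperadmin';

$ch = curl_init($base . '/rest_v2/serverInfo');
curl_setopt_array($ch, array(
    CURLOPT_RETURNTRANSFER => true,
    CURLOPT_HTTPAUTH       => CURLAUTH_BASIC,
    CURLOPT_USERPWD        => "$user:$pass",
    CURLOPT_HTTPHEADER     => array('Accept: application/json'),
));
$body = curl_exec($ch);
$code = curl_getinfo($ch, CURLINFO_HTTP_CODE);
curl_close($ch);

if ($code === 200) {
    $info = json_decode($body, true);
    echo 'Connected, server version ' . $info['version'] . PHP_EOL;
} else {
    echo "Connection check failed with HTTP $code" . PHP_EOL;
}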

and here's where a lot of things come to your mind because in this case I'm just hard coding those but you have the option of using either a single sign-on process so when you're calling the Jasper client you will be using your own users the ones that you have created in your application or if you created a full SSO implementation you can rely on an external provider so let's say your application can rely on LDAP or CAS for user authentication Jasper can do the same so you will have that information to pass on through the client and of course if you're using multi-tenancy in the connection you have to pass the organization name so Jasper can connect and know exactly which part of the repository it should expose ok so one thing Mariano is maybe you could imagine that here you would just add a check maybe if the user exists on the Jasper side log in and if he doesn't exist create him that sort of thing exactly and since you have services through REST that allow you to manipulate users in Jasper you can create them on the fly at that point instead of using something necessarily implemented on the Jasper side you can implement that directly in your logic so you can connect with an admin privileged user to your server create that user object and then log back in so you can access whatever that user's role will enable them to so the other snippet of code we have here is just a simple way to get the elements of the Jasper server repository so the repository in Jasper is where you have all your report units and dashboards and everything that can be consumed by your end user so the client has a really nice way of getting the full repository and giving you an object that you can iterate and get all the elements for each one of the resources you have there this is just a simple query I'm doing on the repository with no arguments which will give me the full list at that level and just iterating through that I'm just printing out the name you will see in the sample that I will show you in a minute that I do a little bit more than that so I can manipulate those objects accordingly so if the resource is a folder I can link that to a subsequent call to see the contents of that folder if the resource type is a report unit I can link that to the part of my code that will be handling the rendering of those reports but the idea is that you receive a full resources vector that will give me all the elements of that particular resource once you have that resources descriptor and you're working with a particular report unit you will see that each element has what we call a URI string so that URI string is the location in the jasper server repository so with that URI string you can use the same client library to generate renders of that report in any of the same formats that we support from the UI so in the first sample I'm just executing a simple report passing no parameters at all and just asking for an HTML rendering of that and since I receive the HTML as a string I can just echo that out to the screen but in a particular implementation you will want to do a little bit more there you can run the report and ask for a specific page of that report in the descriptor you will have all the elements that the report needs to be executed so if you have input controls like in the second example you can query for that in the resource and use the client to get the input controls content and you can render those directly for the user
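To make those two calls more concrete, here is a hedged sketch using raw REST instead of the wrapper: one request walks the repository and one renders a report unit as HTML. The folder, report path, and credentials are placeholders, and the repository listing endpoint shown varies by server version (older releases expose it as the v1 /rest/resources service instead), so check the Web Services guide for your release.

<?php
// Sketch of the repository walk and report render described above, using raw
// REST calls instead of the wrapper. Host, credentials, folder and report
// paths are placeholders.
function jrsGet($path, $accept) {
    $ch = curl_init('http://localhost:8080/jasperserver' . $path);
    curl_setopt_array($ch, array(
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_USERPWD        => 'jasperadmin:jasperadmin',
        CURLOPT_HTTPHEADER     => array('Accept: ' . $accept),
    ));
    $body = curl_exec($ch);
    curl_close($ch);
    return $body;
}

// 1. List report units under a folder; each entry carries the URI string
//    that every later call needs
$json      = jrsGet('/rest_v2/resources?folderUri=/reports&type=reportUnit', 'application/json');
$resources = json_decode($json, true);
foreach ($resources['resourceLookup'] as $r) {
    echo $r['label'] . ' -> ' . $r['uri'] . PHP_EOL;
}

// 2. Render one report unit as HTML and echo it straight into the page
echo jrsGet('/rest_v2/reports/reports/samples/AllAccounts.html', 'text/html');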
remember that when you're using APIs like in this case you are in charge of rendering all that information to your user so

Jasper will give you all the information you need with a get report input controls call you'll get the full set of input controls they will have an ID they will have the type of input control that you need to use and you will have to render those depending on the display level that you have in the case of PHP here we will assume that you're working with a web application so you will have to render that either as a select box multi select or checkbox depending on the type of control that you receive and all that information is in there in the input control itself once you get the user input for those controls for passing those to the run report unit you just pass those as an array so in the case of a multi select you pass the array of the selections of the user if it's a single select just the value that you need and by passing those controls to the report unit you can get the rendering of that report with that specific input one thing that you will notice is that Jasper also supports what we call cascading input controls and the idea of a cascading control is that you will need the input of a previous parameter to render the second linked control the good thing there is that you can use the same API to interact either through your PHP application or from JavaScript and get the JSON for those controls and mimic the functionality that comes with Jasper directly from there so it gives you a lot of flexibility on how you can render that you can work from the PHP side or you can work from the JavaScript side and get the information that you need to render that particular set of filters cool quick question Mariano so you said the input control values would come back as JSON but everything else comes back as XML is that right you can decide how you want to do the call so you will see on the API and also in the documentation for the wrapper that on the get report input controls call you can ask for a specific format so when you're doing the call to get the input controls you can request that response in XML or in JSON oh that's great thank you so normally when you're calling from within PHP you will receive the XML format because it's easier you have a SimpleXML class that allows you to iterate through that very easily but when you're working from JavaScript it's way better to receive JSON okay so this slide is just for helping me to show a little bit of what we have talked about up to now and why REST and iframes and what each one of those implementations brings to the table so up to now we talked about the REST API so the REST API is great because you have full programmatic control of the rendering of the elements of the calls that you do to the server and everything that you want to do with the jasper server back-end so it's great when you're looking at working with canned reports so reports that exist in the repository that you want to expose to your users the other thing that you want to think about is that Jasper has some great functionality in terms of interactivity and a good UI for analysis instead of having to recreate all that in your application and doing thousands of calls to Jasper to bring that interactivity to life it's way better to leverage the UI Jasper has working with themes to customize the UI and bring that in an iframe to your application so in the sample that I have and I'll just jump into that after this slide
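Before moving on to the sample, here is a small sketch of that input-control round trip over raw REST: fetch the control definitions for a report unit, then re-run the report with chosen values passed as query parameters (a multi-select is just the same parameter repeated). The report URI and parameter name are placeholders loosely based on the sample data shipped with the server, so swap in your own report and controls.

<?php
// Sketch of the input-control round trip over raw REST. The report URI and
// the Country_multi_select parameter are placeholders -- adjust them to the
// controls your own report unit declares.
$server = 'http://localhost:8080/jasperserver';
$auth   = 'jasperadmin:jasperadmin';
$report = '/reports/samples/Cascading_multi_select_report';

function jrsCall($url, $auth, $accept) {
    $ch = curl_init($url);
    curl_setopt_array($ch, array(
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_USERPWD        => $auth,
        CURLOPT_HTTPHEADER     => array('Accept: ' . $accept),
    ));
    $out = curl_exec($ch);
    curl_close($ch);
    return $out;
}

// 1. Ask the server which input controls the report unit declares
$json     = jrsCall("$server/rest_v2/reports$report/inputControls", $auth, 'application/json');
$controls = json_decode($json, true);
foreach ($controls['inputControl'] as $ic) {
    // the id and type tell you what widget to render (select box, multi select, ...)
    echo $ic['id'] . ' (' . $ic['type'] . ')' . PHP_EOL;
}

// 2. Run the report with chosen values; a multi-select is simply the same
//    query parameter repeated once per selected value
$params = 'Country_multi_select=' . urlencode('USA')
        . '&Country_multi_select=' . urlencode('Canada');
echo jrsCall("$server/rest_v2/reports$report.html?$params", $auth, 'text/html');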
you will see that I use both methods because each one has its benefits and its own way that you control it you see a couple of screenshots here this is the sample we're looking at this is an actual application

that is using rest Web Services to bring report units so you can control the way you show the repository and everything that's all up to you to generate the code to build that part on the iframe side it's a little bit different you customize the JasperReports Server UI and you bring all that functionality that's already built to your application so you have benefits on both of them and I'll say that most implementations will use a mix of them because it enables you to have really good interactivity on one side and full control of the server and the output of those reports on the other hand so let me jump into the sample so we can see a couple of things we have been talking about up to now in action so for this sample I will give you the link at the end this is actually available for you you can go ahead and install it and play with it so for the single sign-on part and we'll talk after that about the options you have there I decided to go the lazy way here since my PHP application the only thing it does is actually integrate Jasper and it doesn't do anything else in particular I decided to use Jasper as my authentication provider so instead of having my own authentication system in PHP I'm just relying on Jasper and REST the rest API has functions that allow you to get users authenticated it will give you a session ID on Jasper so you can use that session ID to initiate iframes where the user will be already logged in so let's jump into that you will see when you install it there's a simple readme that talks about what the sample does you have access to the wrapper library documentation which is really useful to show you exactly what you can do with the PHP wrapper I have two tabs here one uses web services and the other one uses the UI integration through iframes so as you can see with web services I can do those calls similar to the one we saw in the code snippet before just to bring my full repository and just render that in this case it just uses a simple list with icons to show the folders since this is a sample you'll see that if you hover over any of the elements I'm just putting the full resource descriptor XML there so you can see all the elements that Jasper will actually give you when you do a call to the repository so you see that you have the type of the resource you have the URI string which is important to make all subsequent calls to the API you have the resource type properties that pertain to that particular resource and the way that I am integrating these is if I'm working with folders I do a call back to my application to receive the contents of that folder and that way I can go further down the line to list the report units so a quick question Mariano when you were hovering over there was something like permission mask prop security permission mask yeah does that mean that the repository security is ignored by the rest api and then you have to rebuild it or is it something else no that's actually something else I logged in myself as an admin and any user that has access to the security of an object will receive that and what that is showing you is actually the mask of the permission setting that that particular object has if you log in with a different user and I'll do that in a minute you will see that the elements that you cannot see in the repository will not be sent to you because the security layer is still there on the Jasper side that's just the mask when you go and right-click on the Jasper UI and see the
properties of an object you will also see the permissions of that okay the permissions there okay all right but I guess there is an API where I could say for the employees account change the mask to something else exactly you have control of the security and you can do that of course your user will have to have enough permissions on

that particular resource to change the permission settings so you need admin privileges if not the API will respond that you don't have enough permissions to do that operation all right cool thanks for that ok so as you can see I did that for all of them and since this is a sample the idea is that you can see a lot of things printed out more for knowing what type of resources you're getting and doing some debugging on the implementation so in this case this employee accounts is a report unit as you can see you do have properties for prompting for input controls so the input controls that you have if I go to another one you may have different values there you can choose to ignore those elements but they are there for you to actually read and figure out okay what type of questions should I ask the user to handle that particular resource so if I click on and run the report I just have a simple view report page I just listed a couple of the export formats here so I can play with them as you see this report has an input control so I had to go and render that input control which will give me all the information so as we saw in the code snippet before I can do a call and I will receive all the elements to populate my select and I just printed that out so you can see that the input control type is a single select so that's what I render here and I can choose any of the elements and pass that to the report and in the case of HTML I decided just to print out the output directly here without any type of paging or anything you can implement your own pager and the API allows you to ask for a specific page in the report so that's the way you can do it you can get the full report or the total amount of pages and instead of rendering everything like I did here and just printing it out one page after the other you can ask for a specific page bring that one and go forward from there and so that's great because one of the questions was how does the pagination work in the API so that answers it and then I've got another question which is can you pass any sort of export parameters to the jasperreports library at this point like I know HTML has you know ignore pagination or you know XLS has some is there any sort of exposed stuff in the rest api as far as I see now no there's not and I will tell you why because those parameters are set in the report unit themselves in the JRXML so at this point I'm just executing that JRXML but what you can do with the rest api is deploy a JRXML on the fly so if you want to go the whole nine yards you can have like a standard JRXML do those modifications on your side deploy that to the repository and ask the server to run that particular report but from the run report command you're running what's already on the server that's why you cannot pass those parameters okay makes sense thank you okay so for the export formats I just listed a few here I just hard coded those but all the export formats that are available from the server are available from the API so you can get PDF Excel and the other generic formats of course so you can get a full PDF there I decided to just force the download of the PDF you can show that inside your application but the way it works is very similar to what you will see from the Jasper user interface and as you see in the case of a format different than HTML I'm just choosing to expose that and let the user download the document instead of just printing whatever I received there
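A hedged sketch of the paging and download behaviour just discussed is below: the page parameter of the v2 report service returns one page of HTML at a time, and requesting .pdf returns the binary stream, which the PHP script simply passes through to the browser as a download. Host, path, and credentials are placeholders.

<?php
// Sketch of paging and forced download against the v2 report service.
// The report path and credentials are placeholders for your own repository.
$server = 'http://localhost:8080/jasperserver';
$auth   = 'jasperadmin:jasperadmin';
$report = '/reports/samples/AllAccounts';

function jrsFetch($url, $auth) {
    $ch = curl_init($url);
    curl_setopt_array($ch, array(
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_USERPWD        => $auth,
    ));
    $out = curl_exec($ch);
    curl_close($ch);
    return $out;
}

if (isset($_GET['format']) && $_GET['format'] === 'pdf') {
    // Force a download instead of printing the binary stream into the page
    $pdf = jrsFetch("$server/rest_v2/reports$report.pdf", $auth);
    header('Content-Type: application/pdf');
    header('Content-Disposition: attachment; filename="AllAccounts.pdf"');
    echo $pdf;
} else {
    // Render one page at a time instead of echoing the whole report
    $page = isset($_GET['page']) ? (int) $_GET['page'] : 1;
    echo jrsFetch("$server/rest_v2/reports$report.html?page=$page", $auth);
}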

if you think about it this is pretty powerful because you're receiving the full stream so from your application you can decide to write that to your file system to deliver that in an email to do other things with the report itself because in this case I'm just grabbing the string and shooting that back to the browser but you can add your logic into that okay so from web services the other interesting point is that you can also access the scheduler so if you want to schedule a job on the Jasper back-end you can do that directly from the web services yeah you have a full set of calls that deal with the job scheduling and one of the things that is interesting to know is that the services that the Jasper scheduler offers through the API are richer than the ones that you have from the Jasper UI so it will give you the ability to create calendars on the scheduler meaning that you can set designated days as holidays or non-working days and then create a scheduled job that will run a report only on working days in your calendar and you can create as many calendars as you want on the back-end scheduler the other part of embedding that we talked about in the last slide is leveraging the Jasper UI so in this other Jasper UI integration tab that I have here I did something completely different I created a small navigation menu in my application to access different elements in the Jasper repository so as you know Jasper has what we call an HTTP API and it's a way of grabbing specific elements in the repository and addressing them through a URL so if you look at the Jasper URL when you're using the Jasper UI you will see that you have a flow with the flow name which tells you which part of the application you're in and that flow name can have parameters and you can execute everything from there so leveraging that and leveraging the themes which I will talk about in a minute I can bring part of the Jasper UI into my application so this is the library page for the ones that have used Jasper four point seven and five you know that this new library page gives you a list of all the report units dashboards and everything that you have on the application and allows the user to interact and run or schedule those reports so it's an easy way to bring a full reporting environment without doing a lot of programming work so I just have everything set up on the Jasper side and I expose this library to the user you can go here and do quick searches and go ahead and execute them or a specific dashboard and it's a lot simpler than what I have on my REST API tab because I'm just leveraging everything Jasper will give me so I can click on a dashboard and get that directly on my screen if I choose the dashboard that actually works that would be great I can go to the repository view and see the full objects that I have in their own containers and you know the CSS here I did a very mild change I just took out everything that was Jasper branding and the background color and I left everything else the same but you can work on changing those icons and changing font size and all the CSS to match your application yeah that's definitely a key point there and it seems like the CSS matches your application pretty well just by default but yeah certainly know that it's fully customizable so you can match the fonts the colors the spacing the boxes the shading all that stuff can be completely changed exactly
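For the iframe side of the integration, a minimal sketch is shown below. It builds a viewReportFlow URL through the HTTP API, hides the Jasper decoration, and passes a theme name as a URL parameter the way the talk describes. The report path and theme name are placeholders, the user is assumed to already have a Jasper session through whatever single sign-on approach you chose, and the exact parameter names should be checked against the HTTP API documentation for your server version.

<?php
// Minimal sketch of the iframe-style integration: the report viewer flow is
// addressed through the HTTP API and framed inside the PHP page. The report
// path and theme name are placeholders; an existing Jasper session (single
// sign-on) is assumed.
$server = 'http://localhost:8080/jasperserver';
$report = '/reports/samples/AllAccounts';

$viewerUrl = $server . '/flow.html?' . http_build_query(array(
    '_flowId'    => 'viewReportFlow',    // the interactive report viewer flow
    'reportUnit' => $report,
    'standAlone' => 'true',
    'decorate'   => 'no',                // drop the Jasper header and menus
    'theme'      => 'my_embedded_theme', // custom theme stored in the repository
));
?>
<iframe src="<?php echo htmlspecialchars($viewerUrl); ?>"
        width="100%" height="700" frameborder="0"></iframe>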
and then since I built this sample I actually used a color scheme that was pretty similar so I didn't have to change that but yeah everything is customizable there and you will see that I have a couple of slides where we talk about that but you have full control of everything here and that's kind of the good part of using

this UI integration as you can see I have a link that goes directly to a dashboard you have all the functionality of the new charting libraries and you don't have to do anything from your side to do this just bring the iframes and get the single sign-on working in terms of executing specific reports well now Jasper takes care of my input controls so I don't have to do anything on my side and I can just apply those and work with what Jasper gives me with the same UI I can now leverage the full interactivity of the interactive viewer of Jasper so everything that I have from the Jasper UI I bring the same thing to my application let's say I want to work with changing the formatting in my columns sorting them having the asynchronous loading of my reports since I'm bringing the Jasper UI I have all that for free and as you can see it's a minor thing but in this actual report that I'm linking directly I took care in the theme to get rid of the back button here since there's nowhere to go back to from my application and you can do the same for the Save button or the export if you don't want the user to export that okay yeah that's really nice it seems like you're talking us out of using the PHP API because it would be like quite a bit of work to build my own pagination build my own input controls I lose interactivity so it's compelling I'd say to do it this way versus the PHP API way that's why I said that you will probably use a mix of those because you know I can use the PHP API to get the exact URI of this report so I don't have to hard code anything in my application so I can jump from something like my view of the repository and when I click on a particular report unit instead of rendering that through the API I can just bring that same thing the same way I did here in an iframe so the PHP API is still a good thing because it gives you lower level access to the repository and as I said before you can deploy a report on the spot when you go to create an ad hoc view and get all the ad hoc functionality of Jasper you can deploy those topics or a full domain from the API directly for that user so I think that the answer is you will use both and the right tool for the job is the one that will give you most of the functionality without having to do a lot of your own yep makes perfect sense so before we run out of time and then we can get back to more questions I want to just go over a couple of things that I talked about but we didn't get too deep into like single sign-on and the themes so as you saw in my sample I cheated on this as I said I'm using Jasper itself to authenticate so it's a little bit easier but Jasper gives you a variety of providers we are built on spring security for those that know Java so you can easily integrate Jasper with LDAP or Active Directory or use CAS as a single sign-on mechanism so if you have that in your application it will be a good thing because you can do the same with Jasper and use the same authenticator for both of them the other option that you have is of course to extend the security of Jasper and your application can be the security provider so I'll give a quick overview of how that works internally as I said this is probably a full topic for another Tech Talk but the idea of spring security is that you have entry points and filters that you can hook into so in this case the SSO server can be your LDAP server which you have configuration
from before but this can be your own application so when Jasper receives a request you have an entry point there

and you can extend that to redirect your login to your application and then when that happens redirect back to Jasper from the provider and get either a validation ticket or get Jasper to go directly to your database and grab the token for the session of the user you name it you have a lot of points there that you can use to interact with your server so you have entry points there are provider validations in between post-processing of authentication and what Jasper will do at the end is just sync the internal database to what you have passed so if you do a full SSO integration you can create users on the fly through the SSO process which is a really nice thing and the other part that we talked a little bit about a few minutes ago is working with the themes and working with the Jaspersoft UI so in Jasper you will see that if you log in as an admin you have this themes folder in the UI this is the full CSS stack that Jasper uses you can override any element you have here meaning that you can override the CSS you can override all the sprites and icons which are in the image folder and change fully how this works so if you don't like the way that all the icons and the toolbars render you can change all that and have full control over how the user interface looks for the themes if you go through the user interface and right-click on a theme you're able to download that and it will give you a zip file with all the CSS and the images and the other interesting part there is that when you create a new theme on Jasper if you don't define something in your own theme Jasper will look to the default theme for that definition so it works like overlays over the default theme and that's actually a really good way to implement that you can have a master theme with the UI that you need and then just have different overlays depending on how you talk to the server and the theme that you want Jasper to render with can be passed as a URL parameter which is also really interesting and that's the way I'm doing it in the UI integration sample that I have and if you want there's documentation of the UI but I think that the best thing is if you log in as an admin and go to the View menu there is a samples UI page and that page will actually show you all the elements rendered and give you the classes for the CSS for each one of the elements you have the standard layouts all the components of the UI so when you're designing your UI this page will be your friend because you can see exactly how everything gets rendered without having to poke around the UI to get everything to render ok so back on our end we have some more questions and yeah there's a few questions that have come in let's do another poll question now it's just about your level of knowledge of jaspersoft just so that we can help guide future shows and the content and then we've actually only got a few questions from the audience today so we'll maybe wrap up a little bit early but so what's your level of expertise with jaspersoft if everybody could just vote on that beginner intermediate or a Jedi now unfortunately we can't vote so one of the questions while people are voting Mariano is about downloading the example application if you have a way to get to your github account for the full sample okay if you send that to the chat I can repost it to everybody so they

have the actual URL okay so then I'll just go through some more questions here is soap web services going away in favor of rest or are both here to stay and that's actually a really good question I don't want to get into too much trouble without having somebody from product management talk about that specifically but you might notice the new features tend to be going into the REST API versus the soap API so the dashboards API and things like that we're introducing are starting to go rest only so while there are no official plans to end of life soap you know anytime soon we're trying to entice people to move to rest and that might be a hint as to what may happen in the future I don't know if you have a better answer than that but no I think that you will notice if you go through our projects that the SDKs for mobile access both of them are using the REST API also so as you said there are no actual plans on suspending the support for SOAP that is still there in the product and for the foreseeable future it's there but for the new features you will see that rest will get a little bit of priority on getting those in and it gives you an easier way to handle all the services from Jasper excellent so we've got the poll response here looks like there's a lot of people on the dark side nobody has become a Jedi all right that's my job all right so we've posted a couple of links there if you're interested in downloading Mariano's PHP sample you can get it from github there so get that branch and in the next couple of days that will be merged on the master branch but the different wrapper branch is the newest one there alright it looks like your github icon is a South Park character alright I've seen worse so on to the final question today unless there's oh yeah there's some more coming in so does the rest API support the ability to read a security XML file from a domain object and then write or update that same security file so he's asking can I download the security file make a change to it and then upload it I don't know if the answer to that is available you can replace a repository resource I'm not sure you'll have to hold me on this one but I think that you have to update the full resource meaning that you download the whole domain and you'll have to upload the whole domain but I'd probably have to check the documentation okay there is the option of getting resources so just talking about the PHP side in the sample that I have I gave access to the PHP client class so on the repository service you will see that you have options for uploading new resources and updating an existing resource in the server so you can also delete if you have that access level and you can get the full domain definition if you want but I'll check and get back but I think that you have to upload the full domain definition with your security in it ok so just for the questions that don't get fully answered live really all the

questions we actually typed them out onto the page and also upload a YouTube video so just be aware that if you want to review anything that was shown or we didn't answer a question live it'll be answered in the next few days one thing to remember is that when you define a domain your internationalization bundles and the security file can be linked to that domain and they can be repository elements by themselves which is a best practice there for you so if that's how you define your domain you can just get that repository element which is the security file and work with it with no problem ok yeah that's a very good point Mariano so yes it is supported whether it's a direct call to the domain API or it's just a regular update you know plain old update resource alright very good Mariano one last question does Jaspersoft support give support for the REST API yes that is true you will get your questions answered now if your question is I'm trying to make a call you know from my application how do I do it you can't come and ask support how do you do it but you can ask support I'm doing it this way and it's not giving me what I thought I should get if that makes any sense so don't expect them to write your code for you our services people would have to do that as part of a paid engagement but in general it's more specific questions about the API not open-ended ones about the services that are provided yep exactly okay guys so the next Tech Talk is here in Europe it'll be at 10:00 a.m. GMT so it will be about 4:00 in the morning for those in the US so you can watch the recording but it's about using iReport with domains so it'll be an interesting one because it's a hot topic a lot of the times when I talk to people so ok great well Mariano thank you very much for presenting in today's show it's a very nice demonstration you put together for everybody I'm also curious if you'd be able to share the slides as well because it seems like a good you know how to get started with PHP and jaspersoft resource that seems to be like the best resource I've seen so I don't know if you have a SlideShare account or if I can just post them yeah right I can get that to you so we can link those from the Tech Talk page okay great well thanks everybody please tune in to future shows as well your participation is very appreciated and I hope you got a lot out of today's session bye everyone bye thank you

Uncategorized

Use-Cases and Methods for Scalable Enterprise HPC in the Cloud

Yes, thank you for being here on a Saturday afternoon in Austin. I'd like to talk today about HPC in the cloud, scalability in the cloud, and some of the use cases we are seeing from customers deploying on AWS at scale, across enterprise applications, across industries, as well as research. I want to talk about scalability in particular: I'll show some examples of customers running large-scale simulations on AWS, how they're doing it, and some of the automation they're putting in place. We'll go through those use cases fairly quickly and then dive down into some of the capabilities of the cloud — what you get in a virtualized environment, how you use automation to stand up HPC clusters, and how you think about parallelism and job scheduling in a different way on the cloud. Finally, I'll give you some pointers on how to get started. For attendees here at the conference we have some credits, so if you want to get started with AWS at zero cost, we can show you how to run some HPC simulations on the cloud and provide the credits to do it — before you leave, be sure to get the cards we're handing out that point to those credits. I should also mention, in the back of the room, Linda Hedges and Dougal Ballantine from our HPC solutions architecture team; if you need questions answered after the session and want to go deep, those are the folks to talk to.

I'm going to move fairly quickly in the interest of time, but why are customers — why are HPC users — going to the cloud now? It is primarily about getting scalability, not so much about reducing costs, although it can be dramatically cheaper to run HPC in the cloud; it's more about getting time to results much faster, scaling your simulations much higher, running many more parallel jobs, and getting things done quicker without the need for a job scheduler — a concept I'll come back to in a few moments. It's very powerful when you can scale to tens of thousands of cores on demand, very cheaply. Perhaps more importantly, you can put other applications in the same place — big data applications, for example. Data, of course, has gravity, so if you can keep the data and the compute in one place it's far more effective, and it's far more secure as well. I also want to mention that global scalability is critical: we have customers operating very large compute grids on AWS that in many cases span the globe. You don't have to — you can keep all your compute in one region — but we continue to expand; we have regions across the world, as you can see, and we've recently announced regions coming soon in India, in Korea, and in London. When we talk to simulation customers — whether they're commercial customers in industry doing large-scale engineering projects, life sciences customers, or customers in research — we hear, of course, that they're looking for scalability: the ability to scale up very fast and scale back down when they no longer need that cluster. But they also talk to us about the need for secure global collaboration. I touched on this a moment ago: the idea that a researcher in Europe can collaborate with researchers in the US and in Asia on the same data in a very secure way. That leads to the discussion of data governance — for many of the customers we work with, data sovereignty and data provenance are very important — so doing this in a very secure, monitored, and managed way is very useful. We now have customers publicly stating, including NASA recently, that what they can do on the cloud is more secure than what they could do on-premises.

When we look at the use cases out there on the cloud today, we see a lot of legacy HPC applications now moving to the cloud. Good examples in the simulation space are things like antenna simulations and life sciences simulations — genomics and proteomics. There's actually a lot of CFD now being run in the cloud, which is not something you would traditionally think of as a cloud-based application, but there are lots of innovative companies offering CFD as a service, and individuals running CFD and similar applications at scale to do very large parameter studies. Finite element analysis, molecular modeling, oil and gas reservoir simulations, weather simulations — all of these are now running at quite high scale in the cloud. Which brings us to traditional methods of deploying HPC: if you're an HPC user today, you understand the challenge of capacity.

Whether you're a small manufacturer or a large research organization, if you're deploying on-premises — or owned and co-located — HPC resources, you face this capacity challenge. In an ideal world you would always have enough resources available to run every job you need to run, when you need to run it: you keep adding new servers and new racks, you keep up with the latest innovations, and you always stay ahead of your capacity needs. Of course that never happens. What happens instead is some compromise between your peak needs and the valleys, because you want to operate at high utilization. The result is that jobs get delayed — you use job-queuing software — or perhaps you delay entire projects until the point at which you do have enough servers. That's the reality today. Our take on this is that HPC job queues, in particular, are evil, and you should get away from them. There is always a need for job scheduling and workflow management tools, whether you're running in the cloud or on premises, but the basic concept of managing to a fixed piece of infrastructure — one that has only so many cores, with so many jobs being thrown at it — becomes a bit of an archaic idea when you move to the cloud, because you can scale up as needed, almost ridiculously so, when you need to run those jobs quickly. In a traditional HPC environment, whether enterprise or research, you've got an inherent conflict of goals: a central IT team manages the cluster and wants to run at high utilization for cost optimization, while the actual consumers of that resource are throwing jobs at the queue and expecting fast results — they don't care about utilization, all they care about is getting the job done fast. Another way to look at it: if you use job-queuing software or cluster management tools, you probably have graphs lying around that look like this. The central IT team that manages the HPC resource may be very proud of a graph showing very high utilization — in this case over eighty percent, which is almost ridiculously high in many organizations, though in some well-managed organizations this is what you do see — because it lets them report to the people writing the checks that they're getting great value from the resource. But the flip side is what it looks like to the end users: if you look at how many jobs are sitting in the queue over a given time frame — a month, a year — the end users are impacted by the jobs in the queue and the time spent waiting in it. That is a serious problem in HPC environments.

So again: get rid of that constraint. Scale up clusters as needed; use cloud resources to run very large numbers of jobs in parallel; spin up different kinds of clusters that are application-optimized — for example, if you have certain codes that require high memory versus codes that require high CPU density, spin up different types of servers for those in the cloud, on demand — and use automation. You can use automation in the cloud, within something called a virtual private cloud that I'll talk about in a moment, and with automation tools that we provide, to do a lot of the same job management and workflow management you used previously on premises, but in a scalable cloud environment: automatically scaling the clusters up and down as needed and using traditional job management and job-queuing software — traditional schedulers — just as you did before, only in a more scalable manner. Now I'm going to flip over and talk about some examples. We have so many public cases now of customers running HPC at scale; I'd like to focus in particular on manufacturing customers. We talk to a lot of manufacturers that have a need for scale and have new simulation requirements that perhaps they didn't have even quite recently. A good example is HGST, the disk drive manufacturer. They've spoken publicly a number of times about their path to the cloud, a migration that was really triggered by the divestiture of the disk drive division of Hitachi — Hitachi Global Storage Technologies — and its sale to Western Digital a couple of years ago. As a result of that divestiture they lost access to a lot of Hitachi's central IT HPC clusters and the like, and they had to rethink what they were doing as an organization in support of their engineering needs — at the same time that their simulation requirements were going through the roof.
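As a rough illustration of what that kind of automation looks like at the lowest level, here is a hedged Python/boto3 sketch that launches a batch of compute nodes of an application-appropriate instance type and tags them as one cluster. The AMI ID, key pair, and subnet are placeholders, and a real deployment would normally sit behind a scheduler or a provisioning tool rather than raw API calls.

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# Launch a batch of compute nodes for a memory-heavy solver; all IDs below are placeholders.
resp = ec2.run_instances(
    ImageId="ami-0123456789abcdef0",      # hypothetical HPC-ready AMI
    InstanceType="r3.8xlarge",            # pick a high-memory type for this particular code
    MinCount=16,
    MaxCount=16,
    KeyName="my-hpc-key",                 # assumed existing key pair
    SubnetId="subnet-0123456789abcdef0",  # assumed subnet inside the VPC
    TagSpecifications=[{
        "ResourceType": "instance",
        "Tags": [{"Key": "cluster", "Value": "fem-run-42"}],
    }],
)
print("launched:", [i["InstanceId"] for i in resp["Instances"]])
```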

HGST develops very advanced disk drives — perhaps you use them in your own installations; these drives today are helium-filled and extremely dense — and developing them requires tons of simulations: magnetics simulations, thermal simulations, CFD simulations. The CIO of the organization, Steve Philpott, made a decision that they would go cloud-first: they would run everything they could in the cloud, including these large simulations. Working with partners like Cycle Computing, an AWS partner, they are now able to deploy sometimes seventy or eighty thousand cores at peak devoted to a single set of simulations — parameter sweeps for the large magnetics simulations needed for disk drive design, investigating perhaps millions of different drive head designs. That's a very typical use case we're seeing in manufacturing: the idea that you want to explore the design space much more widely, and do it using the cloud. Trek Bikes recently spoke at one of our events about a very similar use case; in their case it was CFD, using STAR-CCM+ and software from Rescale, another AWS partner, to scale in the cloud and do those design explorations — very wide parameter studies for bicycle design. There are lots and lots of examples like that out there today.

What enables this for these customers — whether they're looking for security, job management capabilities, or automation capabilities — is the virtual private cloud provided on Amazon Web Services. With a virtual private cloud, a VPC, you can really stand up a virtual data center in the cloud: all of the security and compliance requirements you need, the ability to set up your subnets, set up your identity and access controls, and monitor the system — all the things you would expect for your on-premises clusters as an enterprise or an academic and research installation, and far more as well. Many of our customers in the enterprise domain, in manufacturing and other domains, want a hybrid approach where they still have some on-premises infrastructure connected into this virtual private cloud, either over VPN or via a service we call Direct Connect, which gives you a private connection into AWS from your on-premises environment that never traverses the public internet — a very common pattern now for enterprises moving to the cloud. Another benefit of the VPC is that you can set up the network topology so that it mirrors what you've got on premises: you can extend your subnets from the on-premises world out into the cloud, you can extend license servers across that border, and you can lock down sections of the VPC for specific users and specific applications, under the control of your organization's security and compliance teams. There are tons of examples of this in the life sciences space: DNAnexus has been a great story for us, scaling up quite dramatically over time in support of life sciences customers. Recently they published, with Baylor, their CHARGE project, doing genomics analysis for thousands of participants — tremendous numbers of runs, scaling up using the spot instances we'll talk about in a few moments to run very high-scale workloads on AWS. Like HGST, they are using spot instances to scale up to tens of thousands of cores in support of this analysis and to bring the time to results down dramatically. Another great example, made public quite recently: Walt Disney has talked about how they're doing rendering in the cloud — proof-of-concept projects at this point, but they have talked about production rendering for major projects in the future. You can see their full presentation on SlideShare if you're interested; it's a very good one because it goes quite deep into their use of spot instances and their use of a VPC to create a hybrid environment of on-premises rendering with the cloud, and like the other use cases they're spinning up tens of thousands of cores on AWS at very low cost. If we dig a little deeper into that rendering use case — or into the simulation use cases — using a VPC, the pattern we're seeing is becoming quite common, even templated.
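The VPC setup just described — your own address space, subnets, and access controls — can itself be scripted. Here is a minimal, hedged boto3 sketch; the CIDR ranges and names are arbitrary examples, and a production VPC would also need route tables, gateways, security groups, and so on.

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# Carve out an address space for the virtual data center (example CIDR).
vpc = ec2.create_vpc(CidrBlock="10.20.0.0/16")
vpc_id = vpc["Vpc"]["VpcId"]

# One subnet for compute nodes and one for login/head nodes, mirroring an on-prem layout.
compute_subnet = ec2.create_subnet(VpcId=vpc_id, CidrBlock="10.20.1.0/24")
login_subnet = ec2.create_subnet(VpcId=vpc_id, CidrBlock="10.20.2.0/24")

ec2.create_tags(Resources=[vpc_id], Tags=[{"Key": "Name", "Value": "hpc-vpc"}])
print(vpc_id, compute_subnet["Subnet"]["SubnetId"], login_subnet["Subnet"]["SubnetId"])
```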

In the VPC, as you'd expect, you might have servers being stood up on demand, perhaps scaling automatically in response to need, connected into the on-premises resource using either Direct Connect or a VPN. If you want to keep everything in the cloud — not only the compute and the storage but also the login nodes — and just push pixels back and forth, that can be enabled through the use of GPUs in the cloud to create a graphics workstation in the cloud; lots of customers are doing that quite successfully for engineering simulations and for energy-sector use cases. Digging a little deeper, in the lower boxes you can see some storage layers: this could be done using, say, Intel Lustre to set up a shared file system in the cloud, or using third-party partners of various sorts to create storage caching layers within that VPC. On the left you see Amazon S3 — an object store with eleven nines of durability, a highly redundant storage system — and for most of our customers that is the source of truth: their long-term data lives in S3, and as needed they move that data from S3 into their VPC to do their runs and their processing, perhaps using something like Intel Lustre during the course of the computation, then move it back into S3 when they're done, accessing it from other applications or remotely.

A number of ISVs have now embraced the use of the cloud for simulations, and one of them is here today: Altair. Altair has HyperWorks Unlimited, an appliance approach, somewhat like software as a service, where you as a customer sign up with Altair and they deploy the suite of Altair solvers and a remote desktop environment on the cloud using AWS. They've been showing this at conferences, and it's actually quite impressive how it works — lots of solvers, their job management tools, and the like. ANSYS has done something similar, which they call the ANSYS Enterprise Cloud. This is a little different from the Altair approach: it's not software as a service, it's more of a portal — a way for you as a customer to deploy ANSYS software into your own AWS virtual private cloud. A slightly different approach, but the end result is the same: you're able to run your simulations, store the data, and visualize the results graphically, all in the cloud.

Let's switch now and talk about capabilities. We've talked about use cases and what customers are doing; let's talk a little more about how they're doing it and how they're getting the performance needed for these solvers and applications. We're very much in line with Intel here: we offer our customers environments in which they get very low-level access to the capabilities of Intel processors. It is a virtualized environment, but in the case of our larger instance types — the ones typically used for HPC and solvers — you get extremely high performance with very low virtualization overhead, and you get access to capabilities such as, on Haswell, the AVX2 instruction set and the ability to control power states — C-states and P-states — on the instance types we offer. By way of a little vocabulary: when I say "instance type," we're really talking about a virtual server in the cloud that has a particular set of characteristics — a certain generation of Intel processor with a certain amount of RAM and a certain storage architecture. That is an instance type on AWS, and we have many instance types today. The most recent ones, and the ones of most interest to the HPC community, are our C3s, C4s, and now our M4s. As I said, these use the latest generation of Intel processors, they're not oversubscribed, the virtualization overhead is quite low, and you do get access to these advanced features if you're using Intel compilers, for example, and taking advantage of AVX2. These are increasingly dense: we're offering up to 40 vCPUs on our new M4 instances, and we've recently announced an instance called X1 that will have up to 100 vCPUs. When we use the term vCPU, by the way, we're really talking about hyperthreads that we're exposing to the customer, so if you're interested in physical core counts for these largest instance types, simply take that number and divide by two: a 100-vCPU X1 would be a quad-socket E7-based server offering you 50 physical cores — 100 hyperthreads, 100 vCPUs.
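The staging pattern described above — keep long-term data in S3, pull it into the cluster for a run, push results back when you're done — looks roughly like the following hedged boto3 sketch. The bucket, keys, local paths, and solver command are invented for illustration, and a real cluster would stage into a shared file system such as Lustre rather than one node's local disk.

```python
import subprocess
import boto3

s3 = boto3.client("s3")
BUCKET = "example-simulation-data"   # hypothetical bucket: the long-term "source of truth"

# Stage the input model from S3 into the cluster's working storage.
s3.download_file(BUCKET, "inputs/case42/model.tar.gz", "/scratch/case42/model.tar.gz")

# Run the solver against the staged data (placeholder command).
subprocess.run(["./run_solver.sh", "/scratch/case42"], check=True)

# Push the results back to S3 when the computation is done; the cluster can then be torn down.
s3.upload_file("/scratch/case42/results.tar.gz", BUCKET, "results/case42/results.tar.gz")
```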

Our website lists the details of all of these; I'm just going to show a few here. If you think about an instance type like the C4, it's based on E5 processors — in this case actually a custom processor developed in conjunction with Intel, the E5-2666 v3. You won't find it in the specifications on the Intel site; it's specific to Amazon. We worked very closely with Intel to develop an instance family, the C4, with the performance and capabilities we need in a cloud environment, and it provides very high performance for our customers, with access to Turbo and, as I said before, AVX2 and other capabilities. All of these instance types — C4, C3, R3, the names you hear — come in different sizes; some people like to think of them as t-shirt sizes. Here, for example, there are five, ranging from the c4.large, listed as having two vCPUs — which really means a single physical core — and a certain amount of memory. There's no attached storage on these particular instance types; it's all Elastic Block Store, a network-attached, SSD-based storage service with quite high performance. They range up to the largest, the c4.8xlarge, which has 36 vCPUs — 18 physical cores. Our newest family, the M4, is similar but has more memory: it's also a Haswell-based instance type with about double the memory of the C4s and a slightly lower clock speed, so there's a trade-off between clock speed and the amount of RAM; the M4 is more of a general-purpose instance type. We have other instance types, like the R3, with substantially more memory, and we've recently announced the X1, coming out in 2016, which will have up to two terabytes of memory in its largest size. That's a very high-performance, high-memory instance type primarily intended for applications like SAP HANA and other in-memory workloads in the IT space, but we also have a lot of feedback from the HPC community that it will be a very important part of a typical HPC cluster, whether as a head node or as a worker node doing memory-intensive computation.

Let me talk a little bit about networks. Amazon Web Services is operating at such scale now that we design and build everything at the network layer, and our networks are nothing like you've seen before — there have been presentations at some of our events, our re:Invent conference for example, that describe this in more detail than I will today. It's a proprietary 10 Gigabit Ethernet network built for extremely high scale and high consistency of performance, and it actually scales quite well for many traditional HPC codes — I mentioned CFD before. It does not have the low-latency characteristics of solutions like InfiniBand, so scalability for some CFD and MPI types of applications can be limited; but we're finding with our enterprise and research customers who are doing benchmarks — actually testing CFD, weather simulations, and other applications — that the scalability of these instance types is quite good up to some scale, and that scale depends entirely on the application and the models you're attempting to simulate. The major benefit of running on AWS is that you can run tremendous numbers of these simulations in parallel. Think about a parameter study in CFD where you want to run a thousand models in parallel with different parameters: you can do that on the cloud and scale extremely wide, and even within each one of those simulations you can scale surprisingly large on a typical HPC job, even a tightly coupled MPI job — but you really need to test. Again, with the credits we're offering today, we have a tutorial, which I'll talk about in a moment, where you can run some CFD tests for yourself and see how it scales. Now let's talk about the consumption models for the cloud. There are three basic ways you can consume EC2 — Elastic Compute Cloud — instances: three pricing models, if you will.
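If you want to check the vCPU-versus-physical-core relationship mentioned above for yourself, the EC2 API exposes it directly. A small hedged boto3 sketch — the instance type names are just examples, and this particular API call is a later addition to EC2 rather than something shown in the talk:

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

resp = ec2.describe_instance_types(InstanceTypes=["c4.8xlarge", "m4.10xlarge"])
for it in resp["InstanceTypes"]:
    vcpus = it["VCpuInfo"]["DefaultVCpus"]
    cores = it["VCpuInfo"]["DefaultCores"]        # physical cores (vCPUs are hyperthreads)
    mem_gib = it["MemoryInfo"]["SizeInMiB"] / 1024
    print(f'{it["InstanceType"]}: {vcpus} vCPUs, {cores} cores, {mem_gib:.0f} GiB RAM')
```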

The first of these is on-demand: pay by the hour. Some people call this coin-op computing — you just keep feeding the meter, and as long as you do, you can continue to use that instance on a pay-per-hour basis with no commitments. As long as you want to use it, you pay for it; when you're done, shut it off and stop paying. It's simple, it's easy to get started, and it's a great way to get going. If you want to run over the long term or at very high scale, you should think about alternative ways of paying for the service. One of these, a long-term commitment of one year or three years, is Reserved Instances: if, for example, you want to run a database, or the head node of your cluster, or you have some base level of workload you need to run every day, you can purchase a reservation that gives you a significant cost benefit, and you also get a capacity reservation. The third model is the one that's most interesting for HPC, and in fact all the examples I cited earlier — Disney doing animation, HGST doing disk drive simulations, Honda, who have spoken about how they run materials science simulations on AWS, Novartis, and many others — use spot instances. If you haven't heard of spot instances, they are incredibly powerful. It's a bidding mechanism, an auction mechanism, to get access to our excess capacity — and our excess capacity at any given time, across the different instance types and the different regions around the world, is very large — so we have thousands of customers at any given time running spot instances for high-scale workloads. The trade-off with spot, because it's a bidding mechanism, is that if you are outbid during the course of your job for one server — or perhaps multiple servers — your server can be terminated with two minutes of notification. That sounds terrible if you're running a database, but it's just fine if you're running large-scale Monte Carlo simulations or, as I said, parameter sweeps, where you can simply restart the job or shift it to another instance type. (Or mine bitcoins — though that's not cost-effective these days.) In reality, what most of our enterprise customers do is run a combination of these models. Reserved Instances they'll use, as I said, for things like license servers, cluster head nodes, and perhaps databases, because those are long-running and they want the price advantage and the long-term capacity reservation. Workloads that are not fault-tolerant — where the cost of losing an instance would be high because of the need to restart — they'll run on demand. But the difference in price between running on demand, hourly, and running on spot can be as much as ninety percent off — up to a 10x reduction in price on spot. So customers that start out running on demand, scaling up and down, pretty quickly figure out how to put automation in place and use the capabilities we provide to manage bidding across different instance types, and what we see again and again is that once they get to a certain scale, their use of spot grows dramatically while their use of on-demand tapers off, because they've learned how to manage interruption.

I want to spend a little more time on spot because it's so important for HPC, and it's how I'd really like you — if you use these credits and the tutorial we'll provide — to take advantage of AWS, so that you can run at high scale and pay very little for your compute. Spot is basically an auction model: you bid a price, the maximum you're willing to pay for a given hour of compute on a given instance type. As long as the supply of that instance type is greater than the demand — more supply than there are bidders — the price stays at a very low level, well below your bid, and that's all you pay: you don't pay your bid price, you pay that floor price. Only when there's more demand for that particular instance type in a particular pool of capacity does the price start to come up, because now there's a bidding situation between customers, and as soon as the market price goes above your bid price you get a two-minute notification and your instance is shut down. So the rules of how it operates are fairly simple; to use it effectively, there are some things you need to keep in mind.
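To see where that floor price currently sits before you bid, you can query the spot price history. A hedged boto3 sketch — the instance type, region, and time window are arbitrary examples:

```python
from datetime import datetime, timedelta
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

resp = ec2.describe_spot_price_history(
    InstanceTypes=["c4.8xlarge"],
    ProductDescriptions=["Linux/UNIX"],
    StartTime=datetime.utcnow() - timedelta(hours=6),
    MaxResults=20,
)
for entry in resp["SpotPriceHistory"]:
    # Prices differ per Availability Zone, which is one reason diversification pays off.
    print(entry["AvailabilityZone"], entry["InstanceType"], entry["SpotPrice"], entry["Timestamp"])
```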

You need to think about building stateless applications — applications where the data is not necessarily resident on that instance — and you have to checkpoint regularly, or save state somewhere. Fault tolerance is a big deal: if you can figure out, for your applications, how to automate things so that when a given worker node goes down you just spin it up elsewhere and restart that job, you're in good shape. There are certain application patterns out there, like Hadoop, that are just inherently fault-tolerant and stateless; they work extremely well on spot, and we have very high-volume customers running spot Hadoop jobs on a daily basis. Monte Carlo simulations are another great fit — fault tolerance is almost inherent in them. Diversification is also important: if you can diversify across different instance types, multiple Availability Zones on AWS, even multiple geographic regions, you'll have a much higher success rate running very high-scale jobs. And think loosely coupled: think of spot as the world's largest compute grid, where you can go out and scavenge cores for your very large runs. As I mentioned before, instance flexibility matters as well — if you can run on a C4, on a C3, on an M4, set up your spot environment so that you're diversifying across instance types and managing your deployments as prices change; that's how you get to extremely high scale, and there are third-party products that can help you do it. Spot is organized into different capacity pools, so the price for, say, a C3 of a certain size in a certain Availability Zone may be different from the same instance type in another Availability Zone. Customers taking advantage of this use automation, or capabilities we provide, to examine prices and automatically shut down servers in one pool and start them up in another based on the observed prices; customers that do that can operate at extremely high scale with high reliability, day after day, running many thousands of cores. Lastly, I want to repeat that the price you bid on spot is not the price you're going to pay: there's always a market price, and when the current supply of that instance type in that pool is higher than the demand, the price is extremely low — often ten percent of the typical full on-demand price. One last note about spot: we've been rolling out a lot of new capabilities, and one of them is called the Spot Bid Advisor. Rather than just guessing where you might get the lowest likelihood of termination and the most efficient pricing, you can go to the Spot Bid Advisor — it's on aws.amazon.com, in your console — filter down to the types of instances appropriate for your workload, and find out the likelihood of termination and the observed cost savings over on-demand that customers are getting on that instance type. It's a very useful tool if you're using spot at scale.

To wrap up, we have a tutorial stood up so you can get started with AWS at no cost: you create an account, we'll give you the credits — a hundred dollars' worth of compute credits — which, if you play the spot market intelligently, on a per-vCPU basis could actually get you as many as 10,000 vCPU-hours on spot. So you can run lots of different simulations and try things out over the course of a few days, and with this tutorial you can stand up a large spot cluster — or a small one that you want to run many jobs on — very quickly, in a matter of minutes. I'm not going to go through the whole tutorial in the few minutes we have left, but we have cards we can give you with the web address so you don't lose it, or you can take a picture of this link — just do not tweet it. The reason I say that is there's a link to a registration page, and that page will only accept the first 250 requests for those credits, so if you want them, don't tweet this out — certainly not for a few days; after that we'll let anyone who's still interested go get them. It's a very useful tutorial and highly automated: once you sign up with your AWS credentials and apply for the credits, you walk through the tutorial — it shouldn't take you more than 20 or 30 minutes to get through the whole thing — and you can run some OpenFOAM simulations across multiple instances.
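Putting the bidding and diversification advice together, here is a hedged boto3 sketch of placing a one-time spot request for a batch of workers. The AMI, key pair, and maximum price are placeholders; production setups usually spread requests across several instance types and Availability Zones, often via a fleet or a third-party tool, rather than a single request like this.

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

resp = ec2.request_spot_instances(
    SpotPrice="0.50",                        # maximum you are willing to pay per instance-hour
    InstanceCount=32,                        # worker nodes for a loosely coupled parameter sweep
    Type="one-time",
    LaunchSpecification={
        "ImageId": "ami-0123456789abcdef0",  # hypothetical AMI with the solver baked in
        "InstanceType": "c4.8xlarge",
        "KeyName": "my-hpc-key",             # assumed existing key pair
    },
)
for req in resp["SpotInstanceRequests"]:
    print(req["SpotInstanceRequestId"], req["State"])
```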

I think it's up to 32 physical cores — 64 hyperthreads — of OpenFOAM for these jobs, and you can modify the scripts and the tutorial. Under the hood this is based on a tool we provide called CfnCluster; if you want to dive deeper into how to deploy on AWS, go to our aws.amazon.com/hpc page and look at CfnCluster. It's a great set of CloudFormation templates that will help you get started and understand more completely how to run simulations on AWS. This is the link I didn't want you to tweet — the SC15 giveaway. Again, we've got cards you can take on the way out with this information and the hundred dollars of credits, which on spot will buy you quite a bit of compute. (Someone just tweeted it anyway — no worries.) Another great resource is our AWS architectures page: lots of use cases are described there, mostly outside of HPC — there is at least one HPC architecture example — and it's a great page for understanding how to create scalable websites, how to stand up a disaster-recovery type of environment using a VPC, and how to do big data and Hadoop analytics; it's very useful as a quick introduction to standing up these different architectures on AWS. And again, the /hpc page is a great place to go: there's a white paper there you can download about HPC on the cloud and some of the use cases I described, as well as links to the CfnCluster page I mentioned earlier.
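CfnCluster drives CloudFormation under the hood, so "standing up a cluster from a template" ultimately comes down to a stack-creation call. As a rough, hedged illustration at the API level, here is a boto3 sketch — the template URL and parameter names are entirely hypothetical and are not the actual CfnCluster templates:

```python
import boto3

cfn = boto3.client("cloudformation", region_name="us-east-1")

stack = cfn.create_stack(
    StackName="hpc-tutorial-cluster",
    TemplateURL="https://s3.amazonaws.com/example-bucket/cluster-template.json",  # placeholder template
    Parameters=[
        {"ParameterKey": "KeyName", "ParameterValue": "my-hpc-key"},
        {"ParameterKey": "ComputeInstanceType", "ParameterValue": "c4.8xlarge"},
    ],
    Capabilities=["CAPABILITY_IAM"],   # cluster templates typically create IAM roles
)
print("stack id:", stack["StackId"])
```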

Uncategorized

Sucuri Webinar: Website Hacked Trend Report Q1/2016

OK, my name is Kristin Thomas and I am the manager of community here at Sucuri, and I will be your moderator. We do have a few housekeeping items to go over before we begin today's presentation. First, we ask that you please ask questions while the presentation is happening — if you think of a really important question, go ahead and post it to the Q&A box right away so you don't forget. Just note that we will only answer questions during the Q&A portion of the presentation, so please keep questions as concise as possible and focused on the subject of website security. We'll do our best to answer as many as we can during that portion of the webinar, but not to worry — we will follow up with answers to any unanswered questions in the following days. Second, at the end of the webinar a brief survey will open up on your screen; we ask that you take a moment to complete it so we can learn more about you and continue to offer you meaningful content. We do plan to make a video of the presentation available to all who registered — you can expect a follow-up email in a couple of days with a link to the video and a copy of the presentation slides. Lastly, as a thank-you for joining today's live webinar, we are extending twenty percent off any of our security solutions until Friday; please use the code you see on the screen now — hacked report 21 — and our chat representatives are standing by to help you with your order. Now I'm going to hand it off to Tony and Daniel.

Hey guys, how are you? My name is Tony Perez — some of you may remember me from previous webinars where we talked about the impacts of compromises and how they happen. With me today I have our founder, my business partner and friend, Daniel Cid. This is his first public speaking in a while, so please be welcoming. He's going to be walking us through some of the data we've collected and analyzed over the first quarter, and I think it's really exciting — we tried something like this back in 2012 and it didn't get much momentum, but now we're ready and well positioned, and we have much better telemetry data to work with. So with that, I'd like to turn it over to you, Daniel. Hey guys, thanks for joining us on our first hacked-website report webinar. I'm personally really excited to be here and share these stats, which are based on years of work that my team has been doing, mostly behind the scenes. This is my first webinar, so if I screw up, just be nice to me — and we promise not to get physical during this interaction. The report itself is already published, so if you haven't read it, please read it after we finish here; the link is over there on the slide. What we'll try to do today is share the important takeaways and insights from what we learned through this data. I think it's important that you get to the document, read the details, and draw your own insights from it, but these are specifically the things we felt were high-level enough, and things you can take action on, maybe with some of our own opinions mixed in — and we'll give some homework, even though I know people don't like homework. Anyway, let's get started. First, we want to clarify what this report is about. Here at Sucuri we do incident response for thousands of websites every single day — and incident response is really just a fancy term for website cleanups. Most of the sites come to us when they are infected, they're blacklisted, they have spam SEO, they have whatever, and they come to us for help. So this data is based on those sites, and we work across all the major website CMSs, web servers, language types, and website types, so we have really good visibility into what's happening across the overall population of compromised sites — and that's why we decided to share it, to help individual site owners understand what's really going on. An important point: this telemetry is based specifically on our audience and the data we've collected, so it doesn't necessarily map one-to-one onto the larger market or onto how each platform fits into the larger scheme of the internet, but it is enough data to provide good correlation — what's going on, why it's happening — and for us to offer some thoughts around that. Hopefully, when you look at the data on the compromised sites, you'll see the similarities — the behaviors all these compromised sites shared that caused them to get hacked — and if you avoid those behaviors, you'll probably be in a good place security-wise. A lot of emphasis will be placed on the open-source CMSs — the Drupals of the world, the Magentos, the WordPresses, the Joomlas — and those users will probably be able to take the most from this.

Before I start, I want to give credit where credit's due: this is our remediation group, the people who do the hundreds of cleanups, working 24/7 cleaning up sites for you. (I think the picture is a little misleading — they actually do like their job; that's why they're smiling.) Our remediation team is organized into two groups: one is the researchers, the other is the incident responders who actually go into your site, and they work side by side, analyzing new malware and analyzing issues, making sure that together we can clean the website as fast as possible and in a way that keeps you from getting reinfected. So you have the incident response team working with customers day to day, engaging with them, collaborating, fixing the security issues, whether they're infections or not, and they work hand in hand with the researchers; together they stay ahead of emerging threats, and that drives both the kind of information we collect and the kind of information we share through things like our blog and social channels. If you haven't read our blog, check it out — it has a lot of the insights we've learned over the last few years.

Now we're ready to get started. The first number I want to point out is that we clean thousands of websites every month. During the first part of the year we cleaned around 15,000 sites, but we only took about 11,000 of them — the ones where we had enough organized data to dig in and see what was going on during the compromise — and that's where these numbers come from. Of all the compromised sites we worked on, seventy-eight percent were running WordPress, fourteen percent were running Joomla, five percent Magento, and two percent Drupal. I want you to hold on to those numbers, because I want to compare them to the overall market share of these CMSs. When you look at data provided by BuiltWith, WordPress has around fifty percent of the CMS market and Drupal around six; if you look at W3Techs, which also tracks CMS usage, they put WordPress at around fifty-eight percent and Joomla at around six. When you compare those market-share numbers to the shares of compromised sites, it doesn't seem to make sense, right? Yeah — the market shares are fundamentally different from the distribution we're seeing in our telemetry of people coming to us for support. We would expect them to roughly match, unless one CMS is more insecure than the others. So does this data tell us that maybe WordPress is more insecure than the others, and that Joomla is probably the most insecure, given that its share of compromised sites is about two times its overall market share? I'm going to keep an open mind on this one — what do you say? I would say that, on the surface, yes, you could read it as Joomla and WordPress being more insecure. But is that really true? Is the platform itself actually the vector? When you look at the core code of WordPress, Joomla, and Drupal, it is really, really secure. The developers behind these platforms know what they're doing, they're responsive, they patch things quickly, and they have security in mind. So why the heck do we see these numbers? Generally, as a technology, you don't see many security issues in WordPress core, or in Joomla core. I think the problem is at a higher level: it's how these websites are being deployed, how they're being managed, how they're being maintained and extended. OK, I see where we're going with this — I think there are a lot of people on the webinar right now holding their breath at what you just said, and I can already see people on Twitter getting ready to bash us — but I understand where you're going. What you're trying to say is that, from a holistic standpoint, security doesn't just mean development; it includes web administration, and as a whole there's a problem there. From a core perspective, in the platform development process, there is a lot of emphasis on security and the platforms themselves are secure; but the way the platform gets distributed, the way it gets managed, the way it gets maintained, the way it's communicated and marketed to its audience — that is what makes the platform insecure in practice. And this becomes really interesting when you look at the different platforms. WordPress, for instance, targets a market that is predominantly do-it-yourself.
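One way to make the comparison being drawn here more concrete is to compute how over- or under-represented each CMS is among compromised sites relative to its market share. A small sketch using the rough, ballpark figures quoted in the discussion (these are the speakers' approximations, not authoritative market data):

```python
# Share of compromised sites in Sucuri's Q1 2016 data versus the approximate
# overall CMS market share mentioned in the talk (both are ballpark numbers).
compromised_share = {"WordPress": 0.78, "Joomla": 0.14, "Drupal": 0.02}
market_share = {"WordPress": 0.58, "Joomla": 0.06, "Drupal": 0.06}

for cms in compromised_share:
    ratio = compromised_share[cms] / market_share[cms]
    print(f"{cms}: {ratio:.1f}x its market share among compromised sites")
```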

There are a lot of end users; it's about getting up and running quickly — they have that famous five-minute install — and that speaks to your point: the messaging and marketing around these platforms is "hey, get online quickly," and open source really has changed the landscape of online websites. Absolutely, and here's the thing: it takes a lot longer than the famous five minutes to install WordPress properly, and by properly I mean hardened. That goes back to what you're saying — sure, WordPress is secure, but deploying WordPress securely takes a lot more than five minutes. And you can't draw a line and say, from the developer's standpoint, "I'm secure, the rest is on you." As communities, regardless of which platform we're on, we have a responsibility to the larger ecosystem, and we have to ensure that security applies across the entire stack. So when we're communicating, it shouldn't be about a five-minute install; it's a continuous process, and the install is just one small piece of the overall lifecycle. Absolutely — and 140 characters really restricts how much of that point you can get across.

That moves us to the second number we want to share. Of the WordPress websites we worked on — which, remember, were about seventy-eight percent of the total — fifty-six percent were out of date at the point of infection, and that's just the core of WordPress. For Joomla it was eighty-five percent, for Magento ninety-seven percent, and for Drupal eighty-one percent. That's a lot of outdated sites. This is a really interesting insight — when I first saw it and we were talking about it, it makes you wonder what's happening. WordPress, for instance, where we have the most telemetry, still comes in at around fifty-six percent out of date, and they have automatic updates, forced updates for some things, a really simple one-click update, a lot of emphasis on backwards compatibility compared to some of the other platforms, and an update process that is a lot easier in most hosting environments — a lot of hosts are quick to patch. Even then, more than half of the WordPress sites that came to us infected were out of date. It begs the question: what's happening with the other platforms, especially when you think about Magento and Drupal, which target fundamentally different audiences? Drupal we see a lot in the enterprise and in large organizations, and what do most of those organizations have? Pretty stringent security governance that dictates how things get to production. If we're suffering at this scale, it makes you wonder what real impact we can make on the problem. What these numbers really tell me is that we're bad at website management, because updating is part of website management: if your site is out of date, the easiest thing you could do is just update — the update process is simple, you just click the button, it couldn't be simpler — and we're still not doing it. And that goes to what I was saying: in security, in large organizations, we understand the importance of patch management and vulnerability management; the real challenge is how we convey that to the end users. It's easy for us from a developer or practitioner standpoint, but how do you explain it to the DIY user, the small business owner who just wants their website to work? And people say, "oh, less than one percent of sites get hacked" — but if you're the one, it's one hundred percent to you; the percentage doesn't matter. So let's go back to the question we asked before about WordPress and Joomla: percentage-wise they have a lot more compromised sites than their overall market share would suggest, and the answer is not that the core is insecure — it's that they are being used insecurely. This data tells us that people are deploying and managing their websites in insecure ways, and that's the root cause of what's happening. The infections and the out-of-date installs are not the problem; they're just the consequence, the result of bad website management. People are not managing their sites properly — that's why we see so many out of date, and that's why they don't execute on any part of the site lifecycle, which starts with deployment and extends through management and maintenance. And you touch on a very interesting point there: these are the results of a lack of asset management. A lot of organizations just don't have any kind of inventory, or don't understand what they have. I've talked to large organizations — as have you — where the conversation goes, "hey, we'd love to help you; how many websites do you have?" "I don't know." Really — no idea what you have? And if an organization with designated staff and governance in place can't do it, how is the everyday website owner supposed to? It's one of those things we always say in security: you can't secure what you don't know you have.
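Since so much of this comes down to simply knowing whether core is current, here is a small hedged sketch of one way to automate the check for WordPress: it compares the version you've recorded for each site against the latest release reported by the WordPress.org version-check API. The endpoint and response shape are given to the best of my knowledge and should be verified, and the site list is invented for illustration.

```python
import requests

# Latest WordPress core version, as reported by the wordpress.org version-check API
# (endpoint and response shape assumed; verify against current documentation).
resp = requests.get("https://api.wordpress.org/core/version-check/1.7/", timeout=10)
latest = resp.json()["offers"][0]["version"]

# Hypothetical inventory: site -> core version recorded during your last review.
sites = {
    "https://example-shop.com": "4.4.1",
    "https://example-blog.com": "4.4.2",
}

for site, version in sites.items():
    status = "up to date" if version == latest else f"OUT OF DATE (latest is {latest})"
    print(f"{site}: running {version} -> {status}")
```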

Then we start looking at the applications themselves and the extensibility around plugins, modules, and components — and there it gets worse. We haven't even gotten into that yet; we've only been talking about the core of the application. What happens when we look at that extensibility? It's even scarier when I look at these numbers, and that's the next number we're going to share today: out of the out-of-date WordPress sites we reviewed, twenty-five percent were running out-of-date, vulnerable versions of just three plugins — TimThumb, RevSlider, and Gravity Forms — and TimThumb alone accounts for roughly seven percent of all the compromised sites. That's crazy — these vulnerabilities have been known for years and it's still going on. An important thing to understand here is that it's not just that these plugins are out of date; these are out-of-date plugins with known vulnerabilities. Being out of date is one thing — patches address all sorts of issues, not only vulnerabilities — but these twenty-five percent were running a vulnerable version of the script or plugin. TimThumb goes back years, and RevSlider, from a few years ago, spread everywhere — it was bundled into countless themes — and there are even tools out there that let attackers find vulnerable sites, yet people still haven't gotten the message. So we have a challenge: we just talked about keeping the core of the application up to date, and now we look at these scripts — RevSlider, Gravity Forms. RevSlider is one of those plugins that was vulnerable — yes, a couple of years ago — but it was embedded within themes and frameworks, so how many organizations have come to us with a compromise where this was the vector and said, "I didn't even know I had it"? The developers of Gravity Forms and RevSlider did everything right: they notified their users, they published on their blogs, they pushed out fixes — and users either aren't listening or genuinely don't know. I think it's a combination of both. Take Gravity Forms: I've spent a lot of time talking with Carl, the owner of Gravity Forms, and what he told me is, essentially, "What are we supposed to do? We do everything we can as an organization to get the information out to website owners, but whether or not they're on our update plan is out of our hands." And that's a fair point: as a business, do you just give the update away for free? Then what does that do to your business model, which depends on sustainable licensing, if people don't want to pay for it? It's a big challenge, and I think organizations have to start looking at complementary technologies to address it — and if you run a website, its security is ultimately your responsibility. A classic example is e-commerce: a lot of organizations misinterpret what PCI means, and if you have an e-commerce site and credit cards get stolen, Visa and MasterCard are going to penalize you, the merchant, because that's the role you've taken on.

That brings us back to how to fix this website management problem. There are a lot of things that could be done, but I want to start with something concrete, and that's going to be the homework for the day. A lot of this is happening because people don't know what they have on their sites, what sites they have, or what's going on in their environment. So to start, I'd really like you to create an asset list of all your sites. It's pretty simple — a simple spreadsheet. List every site you have, one by one; go to your registrar — GoDaddy, Namecheap, wherever — and get all your domains, then sort out which ones are parked, which ones should be active, which ones you're actually using, and which ones shouldn't be pointing anywhere at all (or could be released). Then, as a second step, list all the plugins and modules you need to be running — the ones you really need right now — in the same spreadsheet. Then go a little deeper: note who is supposed to have access to each of these sites — who should be an administrator, who should be a developer, who should be an editor — and list each one of them. After you've done that, remove everything else. One caveat: please back up first — the last thing we want is people coming to us saying, "hey, I removed everything like you said," and finding they removed something they needed. And I don't want to see any forgotten demo sites or test sites left around, which we see all the time: hosts let you run multiple sites under one account, people install whatever they want, and it leads to cross-site contamination and a lot of lateral movement within those accounts. After you've done your cleanup, everything that's left — all the plugins, all the sites — you review, and you repeat this every month. Microsoft has an interesting concept: the second Tuesday of every month is Patch Tuesday, and every Windows administrator knows it and has built a process around it — they know when the patches will be available and that they have to apply them. Maybe we can borrow that idea and have a Patch Tuesday for websites — the last Friday of the month, or whatever day works — where you go back to your asset list, your inventory.
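As a starting point for that homework, here is a small hedged sketch of what such an inventory can look like in machine-readable form: a CSV listing each site, its CMS, its plugins, and who has access, plus a tiny script that flags anything marked as no longer needed or still granting access to someone who has left. The file contents, columns, and names are all invented; the point is just how little structure is needed to get basic asset management going.

```python
import csv
from io import StringIO

# Hypothetical inventory; in practice this would live in a shared spreadsheet or sites.csv.
INVENTORY = """site,cms,plugins,users,still_needed
example-shop.com,Magento,onepage-checkout,alice;bob,yes
example-blog.com,WordPress,revslider;gravityforms,alice;carol;old-dev,yes
demo.example.com,WordPress,revslider,old-dev,no
"""

for row in csv.DictReader(StringIO(INVENTORY)):
    if row["still_needed"].strip().lower() != "yes":
        print(f"REMOVE: {row['site']} (flagged as no longer needed)")
    if "old-dev" in row["users"].split(";"):
        print(f"REVIEW ACCESS: {row['site']} still lists old-dev as a user")
```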

any new sites that you created you add a new bloody if I developer laughter company remover on them and go to move from the website I think this is really your physics everything is left I think this is a really interesting idea that’s a very very simple way to start fixing the website menu it’s a lot more stuff to do okay me wrong maybe the next web next we’re going Jimmy Jimmy keep what you do well that’s I just want to say this super simple stuff I think one of essence is this would allow is for not just from the basic maintenance per se an organization partner with an agency or maybe they partner with the service provider like a maintain calm or something that allows you to manage the environment they say mostly they can communicate with them to say hey this is what I have right now a lot of organizations will go to these service providers and they’ll say hey I just take care of it but they have no visibility what should be active but should be active but you can be maintained if you are it’s provide that you manage your client site you have to let you know how much help you and if you do want on your site’s that’s the perfect time to read all your site your movie to make sure you pay a little for any training them both of time Messick you teachers because I know most of you have Martin message which that’s my own clients in my George then have a listen to those elements yeah some disabled some enabled so much as a dancer and I’m very food in really care about but guess which was I always get a kid first he’s doing with a different what about oh yeah for sure I don’t know what happened it’s alright so for anything what happens if they don’t do that anyway what are the implications of us then we move to the lexus is the mall orphanage division your website gets Hackett and it gets injected with power and as I wanna talk about the first row here is back doors I almost seventy percent of every side to dream has a back door of some kind resident and you know when we go deeper inspectors and you find the command control that the attackers are using it when our eyes then we see they have this list of all their sides detect signs one two three four five and they have how attracts the back door back door one two three four five objects respiratory time is that the attackers are actually implementing good asset management a second innocence their sights the own because when they the site they miss disturb their own websites so they managers actually own when you are and the asset is kind of nice they have this site the hopes the PG version a version that’s really balances installed the losers they do anything on online faction last year on duna duna we’re seeing a text they were hacking the site to the trip on a village and patching the side frame that’s really very low way because they are on the side and they want to keep their property themselves did not anywhere else you know it reminds me up is like when you have a little brother your brother and somebody else tries to beat up in is my little brother I not you same exact thing I got it oh yeah so that if you don’t know the proper management of your site and the SS management the attacker to do for you but what you’ll do is that they will eject whatever they want the music back door is the first one then sixty percent have Gnawa which is supplied by dollars that try to compromise your visitors or is very co which actually didn’t want to I want to talk about that this number has been growing year over a year over year you’re missing more and 
We're seeing more and more sites with SEO spam, and it's one of those infections that doesn't really affect your users; you probably won't even notice it until you go to Google, search for your site, and find results for Viagra, Cialis, knock-off shoes, online casinos. I think one of the biggest drivers for SEO spam was the pharma hacks back in the day; about three years ago there was a good study done by a university, I forget the name, but they estimated that the pharmaceutical affiliate programs generated close to 20 to 30 million dollars a year, because it was all impression based, so there's a lot of incentive for black hats to get into this. What's actually really curious is that when you look at the statistics around blacklists, whether it's antivirus vendors or search engines, this category isn't accounted for; the only things they flag are the things that have a nefarious effect on the end user at the endpoint, because all these blacklists care about the visitor. So when you go to visit a site, if the site has a compromise that can affect you, they'll blacklist the site, but SEO spam, the kind that doesn't affect you as a visitor, won't get flagged. However, Google, being also a major search engine, will see the SEO spam, and that's where it hits you. Unlike some of the other infection types that affect end users, SEO spam is one of those things that affects the business owner, the website owner, the webmaster, because it affects your search rankings, your search engine result pages; you lose ranking, you lose your domain authority and things like that, so it has an adverse effect economically and potentially in other ways as well. And a lot of these are conditional injections, where they only display the spam when the request is coming from Google or from Bing; if you visit the page yourself and do a view source, it looks good, and assuming that means you're clean is a mistake, always a mistake.
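As an illustration of that conditional behaviour, here is a minimal sketch, assuming a placeholder URL and illustrative spam keywords, that fetches the same page as a normal browser and as a search-engine crawler and compares the two responses; real cloaking often keys on crawler IP ranges as well, so a clean result here is not proof of a clean site.

```python
# Minimal sketch of checking for conditional (cloaked) SEO spam.
# The URL and keyword list are placeholders for illustration only.
import requests

URL = "https://example.com/"  # hypothetical site

BROWSER_UA = "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"
CRAWLER_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

SPAM_HINTS = ["viagra", "cialis", "casino", "payday loan"]  # illustrative keywords

def fetch(user_agent: str) -> str:
    # Fetch the page pretending to be the given client.
    resp = requests.get(URL, headers={"User-Agent": user_agent}, timeout=10)
    return resp.text.lower()

browser_html = fetch(BROWSER_UA)
crawler_html = fetch(CRAWLER_UA)

if browser_html != crawler_html:
    print("Page differs by user agent; possible cloaking, inspect further.")

for hint in SPAM_HINTS:
    if hint in crawler_html and hint not in browser_html:
        print(f"Keyword '{hint}' is served only to the crawler user agent.")
```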

But when you actually go to Google, or fetch the page the way Google does, and check your site, the spam is right there. And it's actually interesting: we've seen SEO spam grow, but we see the inverse with other infection types; defacements are the one thing we see going down. Five to ten years ago, every time I saw a hacked site it had a defacement: I just hacked your site, your security sucks, that type of message. Now we rarely see that anymore, and the main reason is that before, a lot of the hacks were for political or personal reasons, to show off skills, look what I hacked. You're talking from experience, right? No comment. Now it's all about money, and that's exactly right: if we think back maybe five or ten years the environment was fundamentally different; now we have malware as a service, the economics of the industry are much bigger, so there's real incentive to go after this stuff at scale. The other key point is that defacements announce the hack, so they have a much shorter life: as soon as visitors see the site has been hacked, it gets cleaned up. With SEO spam, the attackers are trying to hide from the webmasters; with drive-by downloads they're trying to hide from everybody; and think about malvertising, which is a little different to understand: you don't necessarily compromise the website at all, the payload is distributed through an ad that looks legitimate, which makes it very hard to replicate in some instances. Their goal is to maximize the time they stay on your site and to maximize what the site can do for them. If you look at these numbers, you'll notice they don't add up to a hundred percent; that means when someone hacks your site and uses it to distribute SEO spam, they'll also use it for phishing, or as a mailer, or to distribute something else. That's interesting, because it's not mutually exclusive: just because you have a backdoor doesn't mean you don't also have phishing or SEO spam. In fact it's the inverse: you can expect roughly seventy percent of hacked websites to have a backdoor that lets the attacker bypass your controls while the site is still distributing malware, or being used for phishing, maybe a spear phishing attempt to gain entry to an environment or some kind of information. They combine these things, because not only do these bad guys do proper asset management, they also utilize their resources well: if they own a resource, they're going to fully exploit it, and that's what these numbers show. And as you can see year over year it shifts in some areas: with SEO spam we see growth, and actually the biggest growth has occurred there over the past years, which I think speaks to the potential economic return; we've also seen the reverse happening with defacements, and everything else has been pretty consistent. Malware distribution actually looks to be increasing from around forty percent to close to sixty percent of what we see, which speaks to how websites specifically are still the number one distribution mechanism for things like drive-by downloads. And that's pretty much it; those are the key points I wanted to share.
The one thing I really want you to think about, and your homework now, is to start with asset management. What do you do when you think about your website? What do you do when you think about security? Where do you start? I want you to think about that: this site, this website, start by inventorying it. On the next webinar we'll probably be adding more and more to this; maybe we'll see who's listening, and they can send us their information and we can take a look and review it. We're talking about maybe a simple spreadsheet or just some basic questions, and if it's valuable, once we see that information we'll share more on how to continue from there. When you think about security, people tend to fail in very complex ways, but the fixes are usually simple. Well, thank you so much; I think this is great. I think the other homework I would add is that we encourage everybody to go read the report, ask us any questions, and share it and get the information out, because for us it's all about education and awareness. And then I want to tie in a little bit on ways to think about security: how do I take the information Daniel just shared and make something actionable out of it, or how should I be thinking about security? The first place I always like to start is that security is a continuous process; it's not a static state. And it's hard, right? As a business owner you always have challenges: marketing is a continuous process, sales is a continuous process, and guess what, security is no different. The same way you update your design is the same way you update your security, because the threats continue to evolve. Could you run your site and never update any content? Exactly: no pricing changes, no product changes, no change to the demographics of your audience's personas; it's the same thing.

The other thing to think about is that our attack surface is much larger than the specific application we're running, whether it's WordPress, Joomla, or whatever the case may be. In fact it's more complicated than that: it's not just about the environment the site sits in, but the environment in which, and how, you engage the web. Take PCI as an example: PCI DSS is focused on e-commerce sites, and when you go through the requirements they cover everything; they talk about your physical security, they talk about your networking, they talk about your servers, they talk about your applications, and they also talk about how you connect to them. Even the laptop that connects to the CDE, the cardholder data environment, has to be in compliance, because it's a chain: if the weakest link breaks, the attacker gets in. And I think that's an important thing for every website owner to understand: everything is interconnected. You may have the best configuration at the server level, or at the application level, but that doesn't necessarily mean you're secure locally. Maybe you're running a Windows XP box with no antivirus on the endpoint, and you spend the evening surfing who knows what websites, and then of course you wake up in the morning and it's the gift that keeps on giving. So I think a good way to think about this is a five-pronged approach. We always talk about protection, and we always have this debate between protection and detection, which one is better than the other; we like to think it's not one or the other. So: how are you protecting your environment; how are you detecting, in the event that protection fails or some other indicator occurs that leads you to notice something has happened, because we just saw how complex the landscape is; and then, of course, what is your response going to be. That much is very common in the security world, and it's becoming more common in the website world, but we like to extend it to include things like access control and system administration, and the reason we emphasize this is that the everyday website owner doesn't place any emphasis on it. Nobody really talks about system administration because it's not sexy and it doesn't generate sales. But let's talk about best practices: if every user that comes to the website is in an administrator role, well, I highly doubt they're all administering that website. Think about things like least privilege and defense in depth; those are really important concepts. In terms of operating system security we learned this long ago: nobody in their right mind logs in as root; you log in as a user, and if you need to run administrative commands you use sudo, or su to root. But when you go to WordPress you see every single user logging in, even under different usernames, with an administrator role. The internet has been around for twenty years or more, but when you look at how young the web ecosystem is, specifically open source CMSs like Drupal, Magento, and WordPress, it's fundamentally different. I usually joke that the current state of CMSs, and how people use them, is kind of like Windows 95: everybody was an administrator, everybody was in the kernel.
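As a minimal sketch of auditing that least-privilege point, the Python below asks WP-CLI how many administrator accounts each site has; it assumes WP-CLI is installed on the server, and the site paths are hypothetical placeholders.

```python
# Minimal sketch of a least-privilege audit for WordPress installs.
# Assumes the "wp" (WP-CLI) binary is available; paths are placeholders.
import subprocess

SITE_PATHS = ["/var/www/site-one", "/var/www/site-two"]  # hypothetical installs

for path in SITE_PATHS:
    # List the login names of all users holding the administrator role.
    result = subprocess.run(
        ["wp", "user", "list", "--role=administrator",
         "--field=user_login", f"--path={path}"],
        capture_output=True, text=True,
    )
    admins = [u for u in result.stdout.splitlines() if u.strip()]
    print(f"{path}: {len(admins)} administrator account(s): {', '.join(admins)}")
```

Any account in that output that does not actually administer the site is a candidate for demotion to an editor or author role.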
We just have to change this mindset, and I think a good way to start is to stop looking at technology as that silver bullet or that easy button, like if I just install this plugin, or if I just install these ten things, I'm safe and secure. It's not necessarily like that, because security is a much more complex and involved process; it's people, process, and technology working together. A lot of websites, the majority of the ones we clean, already have a security plugin installed, almost all of them. So why, if you've been running that, did you still get hacked? Because they only thought about the technology, maybe if I install this one plugin I've fixed my security, and they forgot about the process and the people; nothing else around it was secure. I always like to joke that it's like buying a hardware firewall, putting it into my network and saying hey, I bought this firewall, I put it into my network, I plugged it in, I turned it on, and I left, and now I'm secure. Yeah, you're not: you haven't configured anything, you don't have a single rule in there; you've essentially installed a very nice door and left it wide open. We had a customer do exactly that: they bought a plan, didn't add their site, didn't install anything, didn't communicate with us, they just purchased and said, I'm secure. Well, buddy, you bought a service, but you have to use it; buying it and using it are fundamentally different. And so we like to remind everyone that security is not a do-it-yourself project, contrary to what the platform you're using may be communicating; it just isn't, and I think this data accurately reflects that. And so here at Sucuri what we focus on is providing a comprehensive package where we work in conjunction with website owners; it's not a matter of come to us and you never have to think about security again, it's work with us, let us be your complementary resource.

We provide three core pieces, and anything beyond that is a lie: if somebody says we will keep you a hundred percent secure and you have to do absolutely nothing, it's not going to happen, because it's a collaborative process; we can do a lot of things, but we can't do everything. So we provide protection, where we try to mitigate exploitation attempts against vulnerabilities; we have continuous monitoring, a detection engine that's looking for potential indicators of compromise; and we also provide a professional response team that will go in and address the actual infection. Our goal is to give you all the tools to have a secure site. Do you have to install a few of them yourself? Yes, of course, and we will help you figure it out and help you install them; we just require communication, and there has to be some level of involvement from the website owner. And so, with that reminder, we're providing a discount of up to twenty percent for anybody, valid through Friday, and if you have any questions I'd like to turn it over to you, Kristen, and see if we can help out with anybody's burning questions. You have a lot of burning questions, so we will give a few to you so that you can respond. The first one: how is it that my hosting company was unable to detect my site being hacked, but Google was able to locate it, and even after the host ran a site test they were still unable to detect it? Yeah, the first thing is that hosts are not security companies; they know hosting really well, but they don't know security well, especially when you're going deep into the website, so missing malware or missing a phishing page is pretty common. A lot of their tools are server-level scanners, like ClamAV; absolutely, they'll find server-level malware, but they won't necessarily find web-based malware, which is kind of why companies like ours exist. With that being said, we have to remember the intent of the organization: a host's intent, from a security standpoint, is the perimeter and their environment, which is why they provide you an account. One of the biggest challenges that hosts have is how to address the end-user problem, because if I give you an account and allow you to do anything you want, because that's what you paid for, how do I address that? It's been a long time since we've seen mass compromises of an entire shared host; what we see is compromises within a single account on shared hosting: I purchase an account at a host, I install 150 different sites on it, and then I get infected. In terms of their role, they give you the physical security and they give you the service; what happens within your space is your responsibility. For example, if you have a desktop on your network and you get a virus on it, you don't blame your ISP and ask why they gave you the network connection. And to jump to the specific question of why it couldn't be detected: we have to stop thinking in absolutes; security is not an absolute, and it depends what the infection was. If Google detected it as SEO spam, well, we just talked about SEO spam and how difficult it is to detect. A lot of organizations, again, focus on system administration and on providing you the environment.
They're not security companies, they're not designed to detect this kind of thing, and Google's resources dwarf everyone else's. And remember, we talked about conditional payloads: malware specifically designed to target Google, so Google may see something that nobody else is going to see, let alone a basic scanner. I can assure you that the 3.99 you paid that month is not paying enough overhead to provide the security services you might have been expecting; maybe you were simply unaware of that, but people keep expecting it, just from not knowing. So with that, Kristen, what's your next question? All right, the next question is whether you have any insight into the percentage of hacks you found that were targeted versus those that were more automated. Can you ask the question one more time? Do you have any idea what percentage of the hacks that happened in Q1 were targeted attacks, meaning targeted at a particular business or website, and how many were just automated attacks against the platforms and plugins? Yeah, that's a very good question. We're going to provide more information in the future on attacks and how they happen, but for those that attended the last webinar, where we talked about how hacks happen, we would say that greater than ninety-five percent of the attacks we see are automated; very rarely do we see targeted attacks, and those targeted attacks are usually against a big brand or a big enterprise, very rarely against an individual site owner. At Sucuri, for instance, we get a lot of targeted attacks, because a lot of black hats don't like us for a variety of reasons, but I can assure you that the majority of websites and their owners out there don't suffer from targeted attacks; most of what they see is automated, and the old timthumb vulnerability is a perfect example.

There's a known vulnerability the attackers can exploit, and most of them don't rely on just one: they have timthumb, they have RevSlider, whatever; they have a big list and they go against every site trying all of them, and if you're vulnerable to even one, they're going to hack it. Next question, Kristen. All right, the next question is: is a theme hack the same as SEO spamming? We used to have a theme that was hacked and customized, but discovered years later that the theme was not clean. What does that mean? So the first part of the question: is hacking the theme the same as SEO spamming? Well, hacking the theme can be used for anything: they can hack your theme to inject SEO spam, they can hack your theme for drive-by downloads or even phishing, so the theme is just where they keep the payload hidden; it's not the infection itself. Exactly, and that's thinking about it the wrong way: whether they hack the theme, the core, or a plugin, a hack is a hack; what matters is what they did once they were successful, whether they distributed SEO spam, defaced the site, or distributed malware; they took some action, the distribution of something, regardless of what they actually compromised. The theme just happens to make sense because they can inject a function into the header or the footer so the payload loads every time the theme loads; and there's lots of malware that hides in the .htaccess file, which loads before everything else, or even in the database, so before you can even see what's going on, the payload is already loaded at the server level. So I think the important thing to take away is that SEO spam is the result of what they did once they successfully compromised the environment, the same as malware distribution or phishing; it's all part of the same family, and it doesn't really matter whether they ended up hacking a theme, a plugin, or the environment, whatever the case may be. OK, the next question is: when a website is compromised, in your experience, does the attacker routinely log in to install additional software, manage it, create other users, etc., or is this an automated process done strictly through bots or other means? It's mostly automated. For example, if you look at the RevSlider attacks, the first thing they do is upload a dropper, and from there they install the backdoor and inject the payload, the SEO spam or whatever else, but the first step is always a bot. What I'll add to that is that while the attacks themselves are automated, attackers do have a tendency to install users or plugins or other things; the fundamental difference is that they don't do it manually, it's automated. We've seen attacks where they log in to, say, a platform's admin panel, and we can see in the logs that within half a second they've already modified the post and modified the settings; that's automated, you don't see somebody clicking that fast, it's practically impossible. When you control thousands of websites it's not feasible to manage them one by one, so most of it is automated.
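Tying back to the earlier point about payloads hidden in theme files and .htaccess, here is a minimal sketch, with a placeholder web root, that flags a few common PHP obfuscation markers for manual review; a match is only a hint, not proof of compromise, and plenty of malware will not match these patterns.

```python
# Minimal sketch of flagging common obfuscation markers in PHP files and
# .htaccess. The web root is a hypothetical placeholder; review matches by hand.
import re
from pathlib import Path

WEB_ROOT = Path("/var/www/site-one")  # hypothetical path

# A few patterns frequently seen in injected payloads; far from exhaustive.
SUSPICIOUS = re.compile(
    r"eval\s*\(\s*base64_decode|gzinflate\s*\(|str_rot13\s*\(|preg_replace\s*\(.*/e",
    re.IGNORECASE,
)

for path in list(WEB_ROOT.rglob("*.php")) + list(WEB_ROOT.rglob(".htaccess")):
    try:
        text = path.read_text(errors="ignore")
    except OSError:
        continue
    for lineno, line in enumerate(text.splitlines(), start=1):
        if SUSPICIOUS.search(line):
            print(f"{path}:{lineno}: {line.strip()[:80]}")
```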
And yes, they do install users, they do install plugins, they do make configuration changes, but again, all automated. OK, and then we're going to take our last question, which is: most clients don't want to pay for security management; what do we do, wait till they get compromised and then get paid? No, but you can start with asset management, and you can do a lot of this stuff for free: you can keep the asset management in a spreadsheet, you can install OSSEC, which is open source and free, on your server, and there are other free open-source tools as well; there's a lot you can do for free, if you know how to do it. If you're technical enough, you understand how to manage your server and how to configure it, it's your job, then you're fine, you don't have to pay; absolutely. But if you're not technical enough and you don't know how to do this, and you try, you're going to make mistakes and probably make things even worse. Yeah, I'm being facetious when I say wait till they get compromised, but that is actually the reality: there are a lot of things they can do, they can go read articles, they can configure all these tools, but the fact is a lot of organizations aren't going to do that, they're not going to invest the time. So it becomes a balance, and the organization has to weigh the risk of saying, OK, I'm not going to pay. In my experience, at least when we talk to end users who own websites, they usually recognize the value once they feel the pain.
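In the same do-it-yourself spirit, here is a minimal sketch of file integrity monitoring, again with a placeholder web root; dedicated tools such as OSSEC do this, and far more, out of the box, so treat it purely as an illustration of the idea.

```python
# Minimal sketch of do-it-yourself file integrity monitoring.
# The web root is a hypothetical placeholder.
import hashlib
import json
from pathlib import Path

WEB_ROOT = Path("/var/www/site-one")   # hypothetical path
BASELINE = Path("baseline.json")

def snapshot(root: Path) -> dict:
    # Hash every file under the web root so later runs can spot changes.
    hashes = {}
    for path in root.rglob("*"):
        if path.is_file():
            hashes[str(path)] = hashlib.sha256(path.read_bytes()).hexdigest()
    return hashes

current = snapshot(WEB_ROOT)

if BASELINE.exists():
    old = json.loads(BASELINE.read_text())
    added = set(current) - set(old)
    removed = set(old) - set(current)
    changed = {p for p in set(current) & set(old) if current[p] != old[p]}
    for label, paths in (("added", added), ("removed", removed), ("changed", changed)):
        for p in sorted(paths):
            print(f"{label}: {p}")
else:
    print("No baseline found; writing one now.")

BASELINE.write_text(json.dumps(current, indent=2))
```

Run it once to record a baseline, then rerun it after updates or on a schedule; any file that appears, disappears, or changes outside a deliberate deploy deserves a look.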

It's very difficult to convey the impact to them if they haven't felt it themselves: oh, I've had my site for ten years and I've never been hacked. Well, the landscape has changed a lot in ten years. It's like owning a house: when something breaks, if you're handy you can fix it yourself; if you're like me, you'll have to hire someone, because every time I try I make it worse, I spend money and time and it never gets fixed. And remember, it's not a static process, it's not a static page, it's a continuous process; configuring the tools and applying the configuration hardening is but one piece of it, and now you have to stay ahead of the threats and keep adjusting the configuration as the environment changes. So don't pay if you don't want to, that's fine, and hopefully it will all work out; if not, you'll find yourself in a predicament, suffering a lot of anxiety and stress over something that could have been addressed proactively. So with that, no more questions right now, right, Kristen? That was our last question, so that brings us to the close of our webinar. I want to thank everyone for their participation. Are you guys going to go to Twitter and take some more questions? Yeah, for sure, we're always on Twitter, so come at us on the hashtag and we'll continue to answer as best we can, and like Kristen said, we'll be responding to all the questions as best we can over the next couple of days; be mindful that it's Daniel and I actually responding, so it might take us a little bit of time. And also we want to remind everyone about the twenty percent off promo code, hack report q1, which will be valid until Friday, and keep a lookout for the recording of this webinar in your email, as well as the answers to any unanswered questions. Thank you. Thank you, everybody.

Uncategorized

Britain's Worst Weather – Flood

what do we talk about most in this country? The great British weather. We're obsessed with it; it's virtually a part of our culture. But sometimes it can turn from being a mere talking point into something deadly. In Britain's Worst Weather I'll be investigating some of this country's most extreme flooding events, to find out how they're caused and why they're so dangerous. Flooding is devastating, and as a lecturer in geography at Oxford University I want to assess the scale of the threat to Britain. In England and Wales alone, over 5 million people in 2 million properties are currently at risk, and as global temperatures and sea levels rise, so will the numbers of those in danger. Flooding is something we as a country take very seriously; it's probably the biggest impact our weather has on us in the UK. So I'm on a journey around flooded Britain to examine how this most destructive force of nature threatens both property and people: from the raging torrent in Boscastle, which swept away over 115 cars, to the flooding in November that paralysed Cumbria. But it's not only rivers that flood; a far greater threat lies all around our coastline, in the sea. When whipped up by hurricane-force winds, a tidal storm surge could one day race up the Thames and drown London. The Thames Barrier has been used more often than it's ever been before, but it's getting to the stage where it's virtually past its sell-by date. One of the most dangerous and destructive forms of flooding is the flash flood. It can come to your town, apparently from nowhere, within minutes; what seems like just a heavy rain storm can turn into something that's actually very dangerous. In just a few hours sewers can overflow and rivers burst their banks, causing misery to homeowners. Caught in a storm like this, it's tempting to take cover in your car. Oh, that's better. Although actually it's probably not: I've potentially made a fatal error. In floods, more people drown in their cars than anywhere else. It takes just two feet of water to sweep a car away, and once it does, there's virtually nothing you can do. Okay, cut, cut, I said stop. Well, as you can see, we were cheating; this was just a simulated flash-flood experience, but that was bad enough, I can tell you, and this research tank at HR Wallingford can only recreate a fraction of the power and energy of the water in a real flood. The sight of cars being tossed around in a raging torrent may look like a scene from a Hollywood blockbuster, but this scenario became reality for the small Cornish village of Boscastle in August 2004. This was one of the worst flash floods Britain has ever seen; it took just eight hours to turn this picturesque village into a scene of total devastation. At 12 noon a cloudburst of biblical proportions began over Boscastle, and by the end of the day it had dumped nearly 8 inches of rain straight into its main river, the Valency. Bursting its banks, this normally innocent-looking stream sent an estimated 440 million gallons of water cascading down into the town. As the first of many cars were swept along the main street, it was obvious that a major disaster was unfolding. The volume of water that went through Boscastle was about 180 tonnes per second; that's the equivalent of the Thames flowing through Boscastle. At 4 o'clock the Boscastle visitor centre became totally cut off by the rising water.

Trapped inside an attic space were two families with young children, along with the centre staff and their manager, Rebecca David of the Boscastle visitor centre. The children were beginning to panic, so I started singing The Wheels on the Bus, and we got to the wipers on the bus go swish, swish, swish, and with that there was a huge crack and the whole building shook. A huge tree had destroyed two-thirds of the building; miraculously, the only part that remained was the small section of roof that Rebecca's party, including the children, were now clinging to. Luckily, the way the roof fell, the slates fell down and so protected us. At 4:45 the first of seven search-and-rescue helicopters were scrambled; all over Boscastle, tourists and residents were trapped in buildings totally overwhelmed by the water. Ninety minutes after becoming trapped, one by one, Rebecca's group were winched to safety. The helicopter was actually overloaded, but they knew they had to get us all. By eight o'clock that evening, eight hours after the rain had started and just five hours after the river burst its banks, the water finally receded. It was only then that the true cost could be counted, but with the harbour containing over a hundred cars, no one knew how many bodies there might be. When you looked at all the divers in the river and down in the harbour, you kept on thinking, any minute now they're going to be pulling bodies out of these cars. Three days later the emergency services announced what has since been called a miracle: astonishingly, no one had been killed. Boscastle has a history of flooding, and each time a number of meteorological factors come together to cause the chaos. Essentially what happened was we had an awful lot of energy in a cloud system, a huge amount of energy. In 2004 the remnants of a category 2 hurricane made their way east across the Atlantic, and when its moist tropical air collided with the southwesterly winds over the Cornish coast it created an enormous thunderstorm. The clouds were effectively held over Boscastle for several hours; it wasn't like a storm which moved through, which drenched the place for a few minutes and then moved on, it was this continuous deluge. This deluge dumped over eight inches of rain in just eight hours onto Bodmin Moor, source of Boscastle's main river, the Valency. Once the downpour started, the moor quickly became saturated and water poured straight into the river, which falls from a thousand feet to the sea in just a few miles, giving the Valency enormous energy and power. Water is driven by gravity and is trying to find the lowest point, and so Boscastle acts like a funnel, and all the water effectively comes down that funnel into the sea. Flash floods are virtually impossible to predict: Boscastle was caused by intense localized rainfall that current technology just cannot foresee, and the fact that no one died is more down to good fortune and blind luck than adequate warning. With the risk of flooding on the increase in this country, more people are going to die. In 1952 the Devon village of Lynmouth suffered a remarkably similar flash flood, but this time it occurred at night, and with no rescue helicopters available the result was a tragedy. Like Boscastle, Lynmouth was a disaster waiting to happen: a popular 1950s holiday destination, it had grown up along the River Lyn, but lying at the bottom of a steep ravine, this place was in the firing line when, on the night of the 15th of August 1952, the river burst its banks. Thirty-four people died, most in houses that were simply crushed and swept away. Seventeen hundred feet above the village lies Exmoor.
Exmoor is a vast expanse of well over 200 square miles, and this moor is the source of the River Lyn. The summer of 1952 had been abnormally wet and the moor was already totally saturated; the water from the storm that raged all night had nowhere to go but straight into the river and down into the town.

The moor effectively just says, look, I can't hold any more water, and it floods very, very quickly, draining into what are very, very short rivers, and very narrow rivers as well, so the force of the water is very powerful. Back in 1952 this river channel was less than one third of its width today; houses and hotels had been built right on top of the banks, hemming in the river. On the night of the flood, over 300,000 tons of rock brought down by the raging water smashed through everything in its path. The lower part of the village was almost completely erased, and it would take six months just to clear the debris. Lynmouth is now a changed place and has learnt from its mistakes: the river channel has been widened and straightened to give the water a direct path to the sea. But all over the country we're increasingly building closer and closer to rivers, and in doing so we're storing up misery for those who find that their homes and businesses are now under threat from rising waters. Flash flooding can be catastrophic and devastating, but more damage is done by the slow, steady rise of a major river bursting its banks. Autumn 2000 was the wettest for 270 years, and October the 9th saw the beginning of the most widespread flooding Britain had ever seen: 10,000 homes and businesses were flooded in an extraordinary 700 locations. During the next 90 days the Environment Agency would issue over 1,400 flood warnings countrywide. Everywhere from Lewes on the south coast up to Newcastle upon Tyne the picture was the same, as local residents tried in vain to save homes and possessions from the advancing flood waters. We had two foot in the house, and we've got about seven inches, I expect, now. One of the worst hit places was York, where the river Ouse peaked at an amazing five and a half metres above its normal level, causing millions of pounds' worth of damage to the ancient city. The emergency services, the army and the general public found themselves in the front line on the night of November the 4th, when 900 people deployed over 50,000 sandbags to hold back the river. This Herculean effort saved about five and a half thousand homes, but these were the lucky ones: in Hampshire a motorist died after his Peugeot car was swept away in a flooded stream. In total, five major rivers recorded record flood levels, causing an estimated 1 billion pounds' worth of damage. My journey to investigate some of Britain's worst flooding took me to a city that was completely drowned by a river bursting its banks. With a population of just over a hundred thousand, Carlisle has grown up on the floodplain of the River Eden. In January 2005, as the eyes of the world were on the devastation caused by the Boxing Day tsunami, Britain had its own crisis. In Britain's Worst Weather I'm looking not only at how flooding is caused but also at the huge damage it leaves behind. On January the 8th 2005 the River Eden burst its banks, devastating the city and inundating about 2,000 homes and businesses. Carlisle lies at the end of one of the longest rivers in England: the Eden flows for 90 miles from its source in North Yorkshire, and by the time it reaches the city it has meandered through a huge catchment nestling between the Lake District and the Pennines. Over the years the river's natural floodplains have been drained and used for agriculture, so now, when the river rises, the flood water has few places to go until it reaches the city. The weekend of January the 8th 2005 had been exceptionally wet, with over a month's rain falling in just 36 hours.
We'd had a whole succession of strong areas of low pressure, large depressions, lots of southwesterly winds, piling up water and rainfall over the hills over a long period of time; but the water has to go somewhere, and essentially the catchment gives up, and that's when you get the cities flooded. Places like Carlisle that have grown up along riverbanks have a long history of flooding.

1857, 1856 here, so pretty close together. Yep. And this is where the 1822 flood was; that was the biggest recorded flood previously. But astonishingly, the flood in January 2005 was considerably higher? It was up here. But still not the top? That's right. You can see that that was the biggest known flood up to then, and we topped it by more than a metre. It's more than a metre? Yes, that's right. Glyndebourne was the Environment Agency officer charged with investigating the flood, and some of his findings are startling. How high did it get? During the night it came to within a metre of the top of the arches of the bridge here. And how much water is that? Well, at the peak there was 1,600 cubic metres a second coming through the town; that's the equivalent of six million tonnes of water per hour. That's six billion litres of water per hour going through the streets of Carlisle. Overnight on Friday the 7th of January the river rose nearly 15 metres, and when dawn broke on the Saturday the full scale of the crisis became apparent, as residents realised half their city centre had been drowned. The whole of Carlisle woke up absolutely stunned to see millions and millions of gallons of water, and a lot of it was very brown, sewage-contaminated water, and we just looked and thought, is this happening to us? The whole of the area where we're now walking was completely submerged, and that morning, like everybody else in Carlisle, Paul Hendy had a hard time comprehending what had happened. I remember coming down to near the hospital and there was an ambulance stuck, half of the ambulance flooded with water, and you just thought, wow, this is a story we'll be remembering for years to come. The breakfast show with Martin Plenderleith, good morning to you, sir. Martin Plenderleith presented the breakfast show on BBC Radio Cumbria until 9:00. Martin was on duty the morning of the flood, and his show was proving very popular. We had listeners like we'd never had before, really, the number of people who came to what turned out to be the only show in town, because the commercial station lost power and the local television station lost power. The fire station was flooded, the police station was flooded; Carlisle was facing a major emergency. Half the city centre was underwater, thousands of people were trapped in flooded houses, and the electricity substation had been inundated by the river, leaving a hundred and ten thousand homes without power. As the only source of information, Radio Cumbria had become a lifeline, and even they were working by candlelight. The temperature was barely above freezing, and all over the city the emergency services had to brave the icy water, rescuing people from flooded homes. When the water finally receded, three people had lost their lives, and Carlisle now had a population of six thousand homeless citizens who had lost everything. Suddenly you find sewage floating around your home, all your possessions are now contaminated, everyday things we take for granted suddenly gone and never to be replaced. It was just two years later, in 2007, that the country saw similar scenes: whole towns and cities were submerged, coastal defences were stretched to their limits, thousands lost their homes, some lost their lives. 2007 was the wettest May, June and July on record, and Britain suffered an extraordinary year of floods: three billion pounds' worth of damage, and forty-eight thousand households affected.
Cities were fast turning into lakes. The June rainfall that deluged towns, especially in Yorkshire, was unprecedented; some areas had received 300 times their normal amount of rain. But where had it come from? Britain's weather is driven by a system of ocean currents and high-altitude winds that encircle the globe.

The dominant force is a 100-mile-wide channel of air at 36,000 feet, charging above us at 300 miles per hour: this is the jet stream. The jet stream's position over Britain determines the weather we experience, high pressure and sunshine or low pressure and rain. But in May 2007 something unusual happened: the jet stream changed course. It ended up 1,500 miles south of its normal position, and this threw Europe's weather off balance for the whole of the summer; in Britain that meant being trapped under low pressure systems for months. Basically we got on the wrong side of the jet stream, so instead of the jet stream bringing us nice, warm, settled, quiet weather, it brought us low pressures, frontal systems, thunderstorms and, as we saw, lots and lots of rain. By July 2007 the jet stream was fixed to the south; low pressure and unending rain followed, and Britain was about to be tested by a second wave of floods. What's this little lot? It's beginning to wind itself up and set itself into a very thick line of constant rain; another set just coming in across the south east, and then gradually it forms this solid mass of heavy rainfall. A pink pixel, that's some very heavy rain; a white pixel, that's even heavier rain. The floods were Britain's costliest ever natural disaster, with a hundred and eighty thousand insurance claims; in 2007 Britain was truly under water. It's always the north that seems to get hit the worst, and it wasn't long before the North West was hit again; this time it wasn't just one city so much as an entire county, Cumbria. In November 2009 floodwaters swept down the Derwent River in Cumbria and through the towns of Keswick, Cockermouth and Workington. It's estimated that 1,300 homes were flooded, several hundred people displaced and a thousand households left without power. A major rescue operation took place, involving the police, the Coastguard, the Fire Service, the RNLI and the military. Two hundred people had to be airlifted from Cockermouth; they'd been completely cut off by the waters, which were nearly 8 feet deep in some streets. The flood was so powerful it destroyed stone bridges that had stood for a hundred and fifty years. Water is a highly destructive substance: give it a lot of power, a lot of energy, and in vast quantities it's able to wash away most man-made structures. The 140-year-old Northside Bridge in Workington developed cracks early on November the 20th. PC Bill Barker was on the bridge, redirecting traffic away from the danger; he couldn't escape when the bridge collapsed without warning, and his body was found 10 miles downstream. The Calva Bridge, upstream from Northside Bridge, also began to crack and was condemned, and a third bridge, a footbridge, also collapsed, leaving the residents of Northside cut off from Workington. In total six bridges collapsed under the extreme force of the floodwaters, and another 1,200 had to be checked for structural damage. So what had caused the inundation of Cumbria? In the hours before the flooding started, the UK's heaviest rainfall ever recorded in a 24-hour period fell in the county. The prolonged downpour was caused by a conveyor of warm, very moist subtropical air: when this mass of warm air was blown by high winds over Cumbria's mountains it cooled, causing the moisture to condense and fall as heavy rain. What made matters worse was that the weather system got stuck over the heart of the Lake District. This is home to Seathwaite, on average the wettest inhabited place in England.

In November the prolonged, steady downpour which triggered the flooding lasted thirty-four hours and deposited fourteen point nine inches of rain; it takes eight months for that much rain to fall in London. Over Cumbria the rain fell on the mountains and entered the region's rivers, which were forced to carry exceptionally high water loads, causing a once-in-a-thousand-year flood. The ground was sodden and the rivers overloaded; the water had to go somewhere, and the results were devastating. The residents will be repairing the damage for months to come. Cumbria and the North West have a history of heavy rainfall and flooding, but the worst danger is in fact in the south. Imagine it's 2030, and Londoners have just been warned that a tidal storm surge is racing its way up the Thames Estuary. It's expected to climb on the back of a high spring tide and overwhelm the Thames Barrier's seven-metre steel gates by a massive two metres. With up to 65 square miles underwater, the nation's political and financial capital will be paralysed. One and a half million Londoners who live and work in the flood risk area will have to evacuate; the roads will be gridlocked, the tube system lost, and there will be panic in the streets. Central London will be without heat, without power, without sewage, without water, without gas; there will be tens of thousands, if not hundreds of thousands, of vehicles ruined and damaged, and there will also be widespread loss of life. Fortunately this catastrophe hasn't happened yet, but some experts believe it will. In 1953 thousands of people on the east coast of Britain felt the full force of a storm surge. These scenes, filmed from a plane, show the full extent of Britain's latest flood disaster: for mile after mile along the east coast the sea has broken through the defending wall; for mile after mile it turns the countryside into a picture of desolation. The sea has conquered. On January the 31st 1953 a vast swathe of the east coast, from Hull down to Deal in Kent, was inundated by the North Sea, breaching over a thousand miles of defences, flooding twenty-four thousand homes and killing three hundred and seven people. This flooding was caused by one of Britain's most feared and dangerous weather phenomena: the storm surge. A storm surge can be described as the ocean coming ashore, the ocean climbing onto the land; it's also been described as the tide that doesn't go out. The surge is caused by an area of very low pressure being driven across the ocean by hurricane-force winds; the low pressure means that the sea swells more than usual. In January '53 the deep depression caused the sea to rise by nearly half a metre; combined with strong winds and a high tide, this forced the waves into a gigantic hump of water that raged round the coast of Scotland and down into the North Sea. Now that's very dangerous, because the North Sea both narrows and shoals, that is, it gets shallower towards the southern end, so it has a funnelling effect, and as it moves down the North Sea it's squeezed and it gets higher and starts to move faster, and the Coriolis force, that's the force of the earth turning, throws it against the east coast of Great Britain. The sea rose nearly three metres above the normal high water mark, and all along the coast defences were either destroyed or simply overtopped by the giant waves. One of the worst affected areas was Canvey Island, which sits exposed in the mouth of the Thames Estuary. In total 59 people died there that day; on Canvey, half of those had succumbed to exposure while waiting to be rescued. Today the island has some of the best sea defences in Britain.
These walls were built as a direct result of the 1953 flood, but they remain under siege from the sea; some estimates suggest they will need to be at least double their current seven-metre height by the end of this century. The tragedy of 1953, which also killed nearly 2,000 people in Holland, prompted the British government to take seriously the threat to London from a tidal storm surge.

Although London is currently protected by one of the engineering wonders of the world, many are worried that it's not a question of if but when a storm surge may hit the capital. If sea levels continue to rise in a warmer world, this premise could be more than just the plot of a disaster novel. Richard Doyle spent five years researching this threat and is convinced it's not science fiction. I've talked to everybody, particularly the emergency services and the Environment Agency, and although on the surface many such people will tell you, oh, it's quite all right, we have the barrier, privately they all admitted to me, we are absolutely terrified, because the barrier, we now realise, is simply not going to be up to the job. A combination of melting ice caps, expanding oceans and the fact that the south east of England is sinking means a sea level rise of at least half a metre is expected in the next century. If you add this to the continued threat from storm surges combining with high tides, it becomes apparent that the Thames Barrier will not last forever. The design criteria the barrier had to meet was to protect against a one-in-a-thousand-year flood event in the year 2030. Is there a risk of a sort of over-reliance on the barrier, or is it completely foolproof? As far as the tidal threat is concerned, the Thames tidal defences provide a world-class level of protection, and therefore we are able to say, basically, that the threat from tidal flooding has been removed. But not all scientists put so much faith in the barrier. The Thames Barrier is being used more often than it's ever been before; at the moment it's holding its own, but within 25 to 50 years it's difficult to see if it will still be adequate; it's getting to the stage where it's virtually past its sell-by date. This is a threat we have to take seriously, as the potential consequences are terrifying. Big floods have happened in the past and they will happen in the future. The blue bit of the map is the ten-metre contour line, and anything blue would get it in a big one: Waterloo Station, Victoria Station, Westminster Abbey, the Houses of Parliament, Guy's Hospital, St Thomas' Hospital, the financial services of Canary Wharf, the City of London, all in the firing line. London is overcrowded; we need more homes, schools and hospitals. One place the government is building some of this infrastructure is the Thames Gateway, an area of land that extends eastwards from the site of the 2012 Olympics in Stratford. It's cheap, it's flat, and it's right next to the Thames. If you were to look at the map and think, where's the best place in Britain to develop, you certainly wouldn't be looking at the Thames Gateway area in east London. Darren Johnson is Deputy Chair of the London Assembly's Environment Committee, and he's concerned about the continued expansion of London onto the Thames floodplain. Much of the development area is floodplain, and whilst I'm not saying let's not have any development at all, that would be ridiculous, what we do need to do is look seriously at some of the high-risk areas; we've got time, but we've not got much time, to really get serious about the flood risk issue. The government is currently in a period of consultation over London's flood risk management for the next hundred years, and one of the proposed options is a new barrier further downstream in the mouth of the estuary. But with conservative estimates putting the cost at nearly 10 billion pounds, perhaps what's really needed is a change in political attitude towards the threat that Britain faces from flooding.
It's such a long-term project that it needs a national consensus to do this: better to spend 10 billion over ten years than to face the loss of 80 billion overnight. My tour of flooded Britain has shown me the scale of the threat we face from flooding over the coming decades. Whilst we can't do much about our weather, we can do something about the desire to build on river and coastal floodplains. It's a delicate balancing act, but one we have to strike if we're to protect ourselves from the very real dangers that flooding presents.

It's like being in a casino: it's a gamble on our future. The odds are long, but the chance is there, and the stakes are very high.

Uncategorized

5.16.19 21st Annual Top 10 Tech Trends

>> Our mission at Churchill Club centers on strengthening innovation, economic growth, and social good Something we value as an organization is the role of art in upholding human creativity, which is so crucial to the work that all of us do Tonight, we are thrilled to debut a new site installation by our artist in residence, Pamela Davis Kivelson Pam and her collaborative called Neur-on are breaking ground by creating an artistic dialogue with artificial intelligence With this installation entitled Breadcrumbs, the portraits you see on the monitors and screens were created by an AI that was exposed to a training set encoded with aesthetic judgments of the artist The AIs initial output was further transformed through drawing by the artist, and the process was iterated The result is a new artistic language, abstracted and transfigured by a combination of machine learning, meaning AI, and the human mind and hand The grid of faces on the large screens shows some of the diverse styles the AI applied to Pam’s images The video now playing shows portraits of people who inspire the artist with their work to solve huge problems we all face as a global society The inspiration for Breadcrumbs came from observing the current trend of surveillance capitalism This is where we draw digital self-portraits with our devices, by our actions over time Our breadcrumbs, or digital artifacts we leave are constantly sketching us in We don’t necessarily own, control, or even know about these crumbs that trail us or how they are being used This has happened very fast Increasingly people have lost trust and felt less secure Using art as perspective on this cultural shift is a way to focus on the human in this era of interdependence with technology Pam and Neur-on, congratulations and thank you for this wonderful contribution Please stand so everyone knows who you are (audience applauding) >> I’m right here >> Oh, I’m sorry, Pam is standing right in front of me (laughs) Everyone, I hope you’ll introduce yourselves to Pam and the team after the show Now, some further thanks are in order Accenture, thank you for being our platinum sponsor tonight We cannot thank you enough for your continued support, thank you (audience applauding) Thanks also to Forbes, our co-presenter this evening You have been an amazing partner, and your participation continues to make this event one of a kind year after year (audience applauding) Our gold sponsor is Alaska Airlines, who is joining us for the first time (audience applauding) Thank you and welcome And our silver sponsors, British Columbia Trade and Invest (audience applauding) Huawei (audience applauding) And M12 (audience applauding) Thank you also to our friend and collaborator Chris Stedman for his help in putting together tonight’s panel (audience applauding) Our polling partner tonight is Poll Everywhere I now invite Aimee Escobar of Poll Everywhere to demonstrate the easy voting system that will be used Please take out your cell phones and welcome Amy to the stage (audience applauding) Welcome

>> Hello everyone All right, so we’re super excited to partner with the Churchill Club today for this annual polling event We’re thrilled to provide the platform that allows the audience to engage with the presenters this evening, and be part of the conversation So, to get started we wanted to do a test poll So if everyone can pull out their cell phones today, and we’re going to go to pollev.com/topten on a web browser All right, so once we’re there we’re going to vote on whether we’d prefer to be lucky or smart Hard decision, only one vote All right you can see the votes there on the screen (audience laughing) Most people would rather be lucky (audience chattering) Looks like the votes are changing a little bit there Lots of people want to be smart Awesome, we’re going to go through the tech trends tonight and you’ll be able to see the live voting right on the screen Thank you (audience applauding) >> Thank you Aimee Now, let’s start the program by inviting Olan Kenneally, managing director of Accenture up to the stage to kick off the show Come on up Olan (audience applauding) >> Thank You Marcia Good evening everybody Great to be here tonight It’s a lot of fun for me as a longtime fan of the Churchill Club It’s exciting to be here at what I think is one of the most fun events in the Valley And you might think that I’m making that up but I had tickets to the Warriors game tonight and I’m still here so (audience chattering) I’ll be doing a live tweet of the score on this polling thing later on So, congratulations to the Churchill Club There’s something magical about 21st so congrats on coming of age for this spectacular event, and I’m happy to be here to represent on behalf of Accenture So, because we ask all of our pundits to come up and talk about non-obvious trends that are going to have potential for explosive growth in a five-year horizon, I thought I’d look back five years to 2014 And I’ll just pick three of the then predictions, trends, and let’s see how they’ve played out So the first one we are going to talk about is the last-second economy And James Slavet from Greylock predicted that the mobile, this whole mobile on-demand economy, well beyond just taxis and ordering food, would change how we live, how we work, how we play, and he was spot-on I mean we’ve seen it, it’s pervasive in every part of our lives And in probably the 60 seconds I’ve been up here there have been about 1.6 million Tinder swipes, and a few of you guys not concentrating back there I’m not sure what’s going on But it’s something that certainly the VCs in the room have certainly done very well from As a retail investor I’m looking forward to my post IPO stock improvement over the next few months in that world The second trend that I thought was fascinating and interesting was when we start to look at the home automation space And the trend really was thinking about the routers and the idea of the dumb router, and the transformation into an intelligent network, a space where it is going to be software-driven networking that interacts with all of the devices and all of the products that are in our homes And in that space, you know we’ve seen it, we all have our Blink and our Ring Of course they’re now both bought by Amazon You know you think about Nest, and of course being acquired by Google And my favorite which of course is Alexa This relentless ability to answer the really annoying questions from my little nieces that seem to have a tsunami of questions that come my way Now Alexa takes care of that for me And 
so from that perspective again spot-on, maybe a more obvious trend, but one that is obviously phenomenally explosive, and something that has come true so another great trend And the third one I thought that was interesting was Rebecca Lynn, who’s sitting here with us tonight, and Rebecca talked a lot about data from the gut, and how that’s going to transform the healthcare industry through data and analytics,

and really take healthcare from being what we see today is more of an art and become a science And I think while we might not, it’s not as obvious to us, it’s probably going to impact every single one of us in this room in our lifetime So make sure you eat your probiotics every morning That’s a tip from Rebecca So with that, I think it’s time for us to really start to focus on what’s the next set of trends And one thing we know for sure is innovation is never going to be slower than it is today And so I’m really looking forward to hearing from our 2019 pundits All of them here graphically represented nicely Thank you guys, it looks great So with no further ado, let me start the proceedings Returning to the top ten stage let me welcome up, of course General Partner at Canvas Ventures, Rebecca Lynn (audience applauding) Rebecca, this is a test, you’ve got to find your resemblance and sit in that seat (Rebecca laughing) And so we’re going to keep an eye on you >> Thank you >> Joining us for the first time Aspect Ventures Partner Lauren Kolodny, welcome up (audience applauding) Managing Partner, Norwest Venture Partners, Jeff Crowe, welcome up (audience applauding) Mayfield Managing Partner, Navin Chaddha, come on up (audience applauding) He’s not here, there he is >> There he is >> And of course Partner at Venrock, Brian Asher And that is not a fashion faux pas, that is a wetsuit (all laughing) >> That’s the best line >> Welcome up And of course our emcees for the evening probably don’t need an introduction, but I will go ahead and do that Our CEO of Forbes, welcome up Mike Federle Otherwise known as Fed (audience applauding) Irishman with a German last name And of course we have our prolific author, speaker, co-founder of the Churchill Club, Forbes publisher, a man who has an ability to make me feel like an incredible underachiever, who have four weeks ago published his book “Late Bloomers”, and two weeks ago had the headline article in The Wall Street Journal weekend edition Rich, come on up and join us here on stage (audience applauding) Okay, over to you guys, on with the show >> All right, thank you very much Olan It was a great introduction, and we are so delighted to be here It has been eight years or so I think with Forbes, and it’s great to be here We appreciate how easy Karen makes it every year for us to participate, so thank you Karen and the Churchill Club for having us back And hi everyone, it is excellent to be here Rich, my cohort here at Forbes, and friend, and we have these very five very smart VCs who are going out on the limb with their presentations As a matter of fact back in the green room or the locker room as we called it, it sounded like it was going to get a little rough and tumble There might be some good debate going on And with that, Rich and I thought maybe we should put on our umpire referee uniforms here So Rich, let’s do so all right (audience laughing) (audience applauding) >> Do you have something for us like boxing gloves? 
>> You’ve got paddles to hit each other with >> I thought those were for ping pong >> We expect a good clean game tonight but we will be here officiating We will call foul when we see one but we’ll have an excellent time By the way now, Olan said he’s going to give the scores to the game but he was quickly shut down on that that the Warriors are of course playing as they were last year when we had this event, and several people in the audience actually many have said please don’t announce the score They’re videotaping it or DVRing it at home so they don’t want to know so keep that to yourselves and keep it quiet And what we don’t want is for people to cheat on the score and leave breadcrumbs which is a ongoing theme we see here tonight So thank you very much, and we’ll be here all night with you >> And it could be a long night >> And it could be a long night >> It’s going to be Settle down this is going to be what a great night We’re always happy to do this event and tonight is going to be especially fun And Karen wanted Mike and I to make our own comments

since we get around in the world, various conferences, various things that are covered at Forbes and maybe throw out an idea or two and see if our ideas match up with what the panelists are going to be talking about tonight And one thing I've observed over the last year is the kind of quiet rise of enterprise software You look at the once and future king of the market cap tables, it is now Microsoft again, as it was 20 years ago And now with all the hullabaloo about Uber last week and its disappointing IPO, valued today, I think at the end of the trading day, at $72 billion, people have kind of forgotten that VMware is worth $60 billion or that ServiceNow is over $50 billion And so I'm looking into the future and we think about digital transformation of the rest of the economy and wondering if that will come up or if it's going to be more consumer oriented >> Rich it's why you write books I think you got a good one there but perhaps it's 'cause I'm from New York, the financial capital of the world, at least we'd like to think of it that way, and our investors in Forbes are in Hong Kong A lot of the conversations I'm involved in are about cryptocurrencies and I believe we're coming out of the Crypto Winter It's going to change its name to digital assets and we're going to see companies, we've seen Chase We're hearing about Facebook possibly with their coin We're going to see this really become a major and significant force which is obviously disintermediating a lot of the middlemen in the finance world So I don't know if that's going to play out here as much I remember last year I mentioned blockchain and everyone laughed at me so we'll see where the thoughts go on the panel tonight >> Well since we've already gone halfway rogue, why not go full rogue? At the end of this while we're tallying up the votes some five hours from now (audience laughing) Two hours from now precisely 'cause it's so well scripted Mike and I are going to present our own Rich and Mike gold star award for the trend we think represents the criteria of this event The not obvious which is hard to do and will have the biggest impact (audience laughing) Who wrote hugest on it? Where did hugest come from? Have the hugest impact >> We'll throw a flag on that one >> Hugest impact on the economy and society and we'll see how that works out Audience please do not fail to vote for every speaker's trend by the way on the Poll Everywhere app It's up to you to select the top trend and I think we'd really like to get a true read from the 400 people in the room This is the gold star award We're going to do it while you're voting so we're not going to try to put our thumb on the scale or maybe we will a little bit Well heck, we should We sit higher than they do, can't we?
>> We’ll see how the audience– >> We’ll see how it works out >> Definitely vote but last year’s trend wizard was Tomasz Tunguz with the hunt for authenticity About the growth opportunities around determining what is authentic and what is not So let’s get started and hear what comes out on top this year Now here are the ground rules each speaker will deliver two trends based on the following criteria One, they are not obvious today and two, we may see explosive impact in these areas in about three to five years Now here’s how it’ll play out Rich and I will give you a brief description of the trends one-by-one then the owner of the trend gets 2 1/2 minutes and we will stay to the clock here on the stage, to persuade everyone in the room that his or her trend is the top trend The panel will vote and comment with their paddles The speaker gets one minute Red they don’t agree, green they do agree and then the speaker gets one minute to rebut The audience then will vote and we’ll see where we end up Now don’t forget pundants the magical wizard wand will go to tonight’s winner As I was saying back in the green room that’s akin to a Nobel Prize or the Pulitzer or something very big for the Churchill Club >> Can I have my book? >> See someone needs to plug this >> Yeah someone needs to plug it My publisher as I said would kill me if I didn’t use this opportunity So never mind I’m just going to put it right up here It’s going to be nice and handsome facing the audience Okay let’s dive in, trend number one The rise of functional medicine It’s Rebecca’s keto paleo and gluten-free

are here to stay Look out for one-stop-shop companies that facilitate functional medicine and do more than just provide diet or supplements but create packaged, actionable results based on diet and diagnostics Again, Rebecca over to you >> Thank you for that, so we have a big problem here in the U.S. According to JAMA, the Journal of the American Medical Association, 45% of adults in this country are obese or near obese and 60% of adults have at least one chronic illness and the current approaches in modern medicine and primary care are just not cutting it and solving the problem, and they're often making it worse Functional medicine is the answer It's often referred to as holistic or integrative medicine, and the point of it is to determine how and why the illness actually occurred in the first place instead of just trying to put a pill into it And they heal by addressing actually the root cause of an issue on an individual basis This can include things like sleep, diet, environmental toxins and mold and lifestyle Functional medicine also has a focus on the link between the gut and the brain And in the last ten years they've actually found a new lymphatic system in the body that provides a direct link between the gut and the brain And they use data based upon what's happening there as well as your personal genetics Things like the MTHFR genetic mutation, 30% of you in this room have it and if you do, you shouldn't be taking a normal B vitamin You should be taking something that's methylated And things in the blood such as a vitamin D deficiency which a huge amount of people have, which normal medicine doesn't really test for, and which is the number one correlated element to a host of diseases including MS, Alzheimer's and arthritis We've already started to see this trend explode Meditation apps including Calm and Headspace The rise of supplements personalized to you with Care/of, Ritual, Bulletproof and Thrive Market and a cult following with Goop and Well+Good And we've also seen that people try to dismiss this saying this is not rooted in science and it actually very much is Unfortunately, it takes about 17 years for things in studies to make it into mainstream medicine and the good news is that those studies happened in the early 2000s Real doctors such as Dr. Terry Wahls who reversed her own multiple sclerosis diagnosis, Dr. Mark Hyman at the Cleveland Clinic and Dr. Dale Bredesen at the Institute of Aging He wrote the book, "The End of Alzheimer's" They are all behind it And so in the future look forward to the rise of functional medicine to solve some of our major health concerns in our era (bell rings) And it will happen in the next three to five years There we go (audience applauding) >> Okay, let's go down the line starting with you Lauren, a good idea or a bad idea? Give us your one minute opinion >> Are we going to vote? >> Yeah >> Oh we vote for– >> We can vote for ourselves >> You'd think I've not done this before >> And hold them very high so everyone in the back can see them >> Red or green?
>> Agree or disagree, all right Two agrees and well you’re going to agree of course but two agrees and two– >> Got to vote for yourself >> Okay, now we’ll start down the line Lauren, you like it, tell us why >> Yeah, no I think it makes a ton of sense Some of these fad diets have taken off for a reason they work for some people but not for all And I think we see these come into trend and then out of trend but a lot of them are rooted in science, and I think that the functional medicine approach that really drives a more personalized understanding to why a particular diet may or may not work for you is what we need And so I mean I think about when I was growing up, fat was considered terrible for everyone And now fat is cool but in reality, it’s good for some people and not for others And I think that the more holistic functional approach coupled with new personalized data from genomics and biome data and glucose monitors make for a really interesting and more personalized solution to many diseases >> Jeff >> So I actually think that there is a lot to say for functional medicine and I think it’s the combination of data the personalization I think these different health practices I think actually can have significant impacts The only reason I actually had it as a red was simply that I think that the pace of change in the medical community is just so slow In terms of the average GP or other doctors actually rolling out with functional medicine that I think it just takes a long, long, long time

for the medical community to change the way they train and they behave I have a daughter in medical school right now I can tell you they're not teaching functional medicine in medical school the way Rebecca articulates it So I actually think it has a lot of merit to it I just think will it be prevalent in five years? I think it's just going to take a lot longer for that to roll through the medical system >> Navin >> Yes, so I commend this trend right and I'm a personal believer in it But the purpose here is to take a lens outside of Silicon Valley, right? So I think the technology exists and there'll be a lot of companies going to go after this problem but one of the things having done a lot of consumer investments We have seen, it's hard to change consumer behavior It's hard to change the behavior of the practitioner as you said One of the things people care about is, hey, am I paying for this or is the insurer paying for this? And if the insurer is paying for this, you need efficacy You need to prove results that this is actually making change, right When I take a drug, I have diabetes, you can control it I keep away from the hospital and then finally I'll say this in a joking way One of the most successful people in the world, Warren Buffett, says, "Hey, I invest in sugar I invest in Coca-Cola" We all know it's poison but it's addictive, it's cheap Everybody wants it and that's an example of how human behavior is impossible to change Even water comes with all kinds of sugar, right Artificial sugar, non artificial sugar and I think that's where I struggle when we are trying to go for these things Are these niche solutions and when we say, yes these are being used, are they being used in the mainstream? Are these for like tens of millions of people, 50 million people, 100 million people, or 100,000 people so that's where I struggle with these things >> We need to get Brian, Brian >> I liked it because we're already seeing the tip of the iceberg on this trend There's a company out there called Virta that through a daily diagnostic, a little pinprick blood draw and coaching can reverse Type 2 diabetes No drugs, just take your blood every day, run it through the diagnostic It's a digital coach and an online coach controlling your diet and it cures Type 2 diabetes And I hear you on behavior change is hard but when you get letters and see the tears on people's faces who thought they were going to have to take insulin every day for the rest of their lives and they're cured I think that's going to catch on >> Thank you for that It's my turn So I hear you and it's the VC response I would expect, I'm sorry, except for Brian I grew up in the Midwest, I grew up at Procter & Gamble, right I know P&G is actually the one that planted in everyone's head here that sugar was okay and you should eat like basically poisonous things like Wesson oil and Crisco, and stuff that should never be in anybody's diet by the way, and that somehow animal fat was bad P&G paid the American Heart Association in the 1950s $1.7 million to basically go against fat and to support sugar and there are absolutely no clinical studies to support that fat is bad for you like absolutely not And so there is a big media piece behind it I totally agree with you but what we're seeing or let me just read off the billion dollar companies in this space to make it mainstream >> Be quick because we need to go to a vote >> Sure, Care/of, Ritual, Bulletproof, Thrive, Calm, Headspace, Goop They're all billion dollar plus companies that have already
hit mainstream America, and quite frankly I think people are sick of doctors making them sick You know prescribing statins to you is not a good idea It totally kills your Coenzyme Q and people are tired of it, and I think weird Midwest has spoken (audience applauding) >> Okay, well it’s the rise of functional medicine Will it happen as Rebecca described it in the next three to five years? What are your thoughts? >> I think I’m in agreement with the majority My daughter is already way into this so I’m a green panel on this one >> I think you’d find the votes reversed in red state country though >> That’s close >> It’s hardening up like your arteries will if you eat Wesson oil (audience laughing) >> All right, we have gone, 15 more seconds for those last minute voters All right it looks set

>> Yeah >> Think so, I think we’re good, very good (audience applauding) >> All right let’s jump right into trend two, flexible fertility management will become mainstream The U.S. and the earth is in the early stages of a seismic cultural shift around how and when people have families As more women delay having children to stay in the workforce as the average life expectancy increases, and as more same-sex couples start families, flexible fertility services will become commonplace Lauren tell us more >> Yeah so as a Millennial on the panel, I thought it would only make sense for me to talk about putting off having babies but seriously, 2017 was the first year in U.S. history that more women had babies in their 30s than their 20s Think about that for a second It’s a really big change And as life expectancy continues to increase by generation and we start to see the economic impact of women staying in the workforce longer, This trend is here to stay Women who wait to have children have salaries 9% higher for each year they wait That means a woman who waits until she’s 35 years old to start a family earns 45% more than if she started her family when she was 30 Family composition is changing dramatically For example, the number of same-sex couples raising children is up 74% from its 2000 numbers Unfortunately today, fertility treatments are extremely expensive, tough to navigate and inefficient 70% of people go into meaningful debt One cycle of IVF costs about $15,000 but successful IVF requires multiple cycles for most people which means the average successful IVF cost $61,000 and surrogacy is at minimum $90,000 So next, yeah that’s the right slide Increasing demand coupled with the clear economic benefits will drive more affordable and accessible fertility services Fintech has a great, big role to play in this We’re already seeing a lot of innovation around the way that financing is serving this industry In addition, I think there will be new savings instruments for fertility like almost like a 529 for college education Increasing demand and clear ROI will drive affordability because employers are going to start providing this as benefits, and I don’t just mean Google and Facebook We’re a little jaded here in Silicon Valley This does not exist more broadly right now And I think treatment care models are going to improve significantly In particular, clinics are going to become more efficient and even automated, and science is obviously playing a big role too Genetic screening on embryos is improving as is measuring the health of eggs which will allow for fewer fertility treatments to result in success I think Gen Z is going to become more proactive and as they see Millennials struggle with this and actually start egg freezing And software is going to play an important role in eliminating some of the confusion (bell rings) by providing tailored fertility navigation Next slide, so what will this amount to? >> Oh hasta luego (bell rings) >> The biological clock is not going to matter the same way it does today (audience applauding) >> Thank you Lauren All right panelists, hold your paddle high Red or green? 
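(A quick, hedged check of the arithmetic in this pitch, assuming the quoted 9% is a per-year salary premium: the 45% figure treats the premium as additive over five years, compounding it would put the gap closer to 54%, and the quoted costs imply roughly four IVF cycles per successful outcome)

```latex
5 \times 9\% = 45\%
\qquad (1.09)^5 - 1 \approx 53.9\%
\qquad \frac{\$61{,}000}{\$15{,}000\ \text{per cycle}} \approx 4\ \text{cycles}
```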
Hold them high so they can see them in the back Oh look at that A unanimous green across the line Brian, why don’t you start down at that end and keep your comments to within one minute if you could >> Sure, I went into this wanting to say it’s obvious and then I got shamed into realizing okay, it’s obvious at Google and Facebook, and there’s a few startups but we’re still so early across all of America in this trend and it will happen And again we’re probably at just the beginning so I agree >> Very good, Navin >> I agree too, I think this is much needed I think the only thing that needs to get seen is what’s the business model? How does the pricing come down? And do employers or even insurers take this into account like if you’re going to be a growing society with a growing mindset It remains to be seen but I’m a big believer in this trend Thank you >> Jeff >> So I agree with it I agree with a lot of the specific forces that are coming to bear on this that Lauren mentioned I also think it’s just ends up being a national imperative for the country I think it’s going to be, it’s a big deal for America that we not go the way of what’s happening in countries

like Russia or Spain where populations are starting to actually shrink and growth is being questioned And so I think it's going to get a lot more, it's not only an issue that I think couples are going to be considering very seriously I think you're going to see over the next five years a lot of broader societal and frankly government and economic tailwinds pushing behind trying to do something about this because we're going to realize it's going to become a big national problem for us So I think there's going to be a lot of more broad national focus on it as well >> Interesting, Rebecca >> So I've looked at this a lot and I don't disagree and I've wanted to love it It is like a coming trend but I haven't seen it I come from the Midwest Most of my family is back in the Midwest We come on a day when Alabama just voted in the toughest abortion laws ever, right And guess what, it really is a New York, Bay Area thing It's not even a Fresno thing, right It is New York and Bay Area and it is In San Francisco, they'll pay for a sex change, right And in most places, they'll pay for your Viagra and they won't pay for like birth control pills right, and so I think this is a real issue It's a real issue for educated women Half of the women graduating from Harvard aren't working full time ten years out, and half the MDs that are women, actually minted as medical doctors, don't work full time I think five years out, they're half time and it's causing a real issue right but I don't think this is a problem outside the Bay Area or New York, I think As we, I've jumped into this looking at financing solutions and things like that It actually, for me personally, was more of a functional medicine issue I had three miscarriages between two kids because of a bad thyroid that my primary couldn't somehow catch And so there are a lot of those blocking and tackling issues as well that sort of factor into this So yes, it's a big problem I see it with my friends in the Bay Area I don't see it with people back in the Midwest >> I'll qualify that as a Midwestern green You did hold up green, right? >> I did, yes >> Okay good, now let's go to the crowd and vote Oh, rebuttal, sorry Lauren >> Well I'll keep it brief Rebecca, I appreciate your commentary I will say that, and it's all anecdotal, but my husband is from rural Virginia and we have friends from there that have taken out second mortgages on their home in order to pay for fertility treatments And so I think the problem varies in different geographies but I really do think in looking at the data and this anecdotal experience that it's a countrywide problem and it's going to get more prominent >> Good, but voting has begun but keep it going Give it a little bit of time here What are your thoughts Mike?
>> I think it’s, I agree it’s a global issue And I thought it was very interesting, the comments on, it’s also an economic and maybe even security issue in terms of encouraging continued population rather than reverse population >> Yeah, about a month ago on stage in LA, I interviewed Mike Milken and he said something provocative He said, “Social science is the best market forecaster “and when you look at these big problems “eventually solutions are found to address “these big problems.” And I agree that may not be as popular in Fresno as the San Francisco Bay Area but Singapore and Hong Kong There’s so many places around the world that this problem seems to be a ready market for this idea >> Very good, we’ll give it a few more seconds Nope we just ran out of time so it looks like 57, 43 It’s a winner, so agree (audience applauding) >> Trend number three, real estate is often listed as one of the last sectors to be impacted by technology Get ready for some major changes The way we buy sell, lease, build, move to, live in, furnish and secure physical spaces are become faster, simpler and cheaper for both our homes and our offices How we utilize space will continue to dramatically change in the next few years due to macroeconomic shifts on demand preferences and lack of space in urban areas Jeff, take it away >> So let me start by saying why I think this is not an obvious trend, and that is let’s start with just homeowners We’ll talk about residential and commercial but for homeowners if you go to buy a house today other than maybe looking at some listings through Zillow or something Basically buying a house today is pretty darn close to where buying a house was a generation ago If you go to rent an apartment today,

again other than online listings, renting an apartment today, moving in and furnishing it, pretty much the same as it was a generation ago And by the way in the commercial side for those of us who are in venture capital Our portfolio companies find offices and sign long-term leases and take months to move in, have a lot of CapEx And I’ve had multiple companies go through that this year same as it was a generation ago So in fact I would argue that the real estate industry, the motto is same as it ever was And these are trillion dollar industries in both both commercial and both residential that we’re talking about yet I think if you look underneath the covers, there are changes that are occurring that are being driven by software, hardware, building design business processes that are going to really become prevalent over the next five years Let me give you some examples of those So getting rid of the friction, you’re going to be able to sell your house with just a couple of clicks and actually get the cash in three days You’ll be able to buy a house in a very similarly frictionless fashion You’re going to be talking about trading up a house the way you trade up a car That is starting to happen today that will be available in the major 15 metropolitan areas around the U.S. in five years It will be prevalent In some cities, it’ll be 10% to 20% of the homes are going to be bought and sold this way Going cheaper, you’re going to see Millennials in cities living in purpose-built buildings that are basically called co-living where it’s a smaller physical footprint for the bedroom but larger common areas All managed through software Better amenities, better group activities It’s cheaper rent for the individual but it’s more dollars per square foot for the landlord that is going to become prevalent in the next five years Virtually decorating a room, take five pictures or eight pictures from around the room and have software do a 3D rendering of your actual room decorated according to your preferences So you can do a 3D tour of it as opposed to getting random designs from an interior designer and go I want to buy that, I want to buy that, I want to buy that Cheaper security, you’re going to secure your house with $20 cameras and a $20 motion sensor Gone is going to be spending hundreds of dollars or $50 a month to secure your house that will be cheaper (bell dings) It is by the way, same thing is true in the commercial side It is all about flexibility, lower cost, speed of transaction and transparency >> All right fellow panelists, say yea or nay Hold them high >> Split decision >> Yeah okay, why don’t we start on the other end Brian, that was a pretty resolute red holding in your hands >> Well, I feel like there’s already been massive change and you know Zillow is now like a sport We watched the Warriors and we check the home prices of our neighbors (Rebecca laughing) And agents will tell you that they have to become digital marketing experts and the common application So I hear you, there’s a lot more to go I totally agree with that I just think that what hasn’t already been done over 20 years of eCommerce and online agency are the really hard parts, the friction-prone I don’t want to sell my house in three clicks There are certain things that are just so important You got to slow down and do it right And there’ll be a little bit of enablement from digital but like I don’t want to buy a house in three clicks, I don’t want to sell it in three >> Okay that’s your minute very good, Navin >> Yeah >> 
And all of you, keep one eye occasionally on that clock >> So yeah, I'm a green on this right Real estate is one of the biggest markets that doesn't use much technology Primarily the innovation has been more on the information side with Zillow and Redfin but as we look at the Millennial demographic and as an example Millennials are changing their homes and apartments 10 to 15 times before they get settled and just simple needs like hey, what kind of furniture do I need? Are they going to keep buying it or are they going to get it as a service? Similarly right like what do I do with my art? Food is a huge issue for people So I think beyond the buying and selling of homes, I think there are very interesting problems to be solved here and the question is whether it happens in two, three years but for the Millennials I think this is a big trend that we'll see >> Well done, okay Lauren >> Yeah, I look at this similar to how I look at the FinTech category more broadly which is the last 10 to 15 years have really been characterized by bringing a lot of existing, in FinTech's case, financial products online In real estate, bringing existing business models online so better advertising, yes Better discovery and content but still fundamentally the same And I guess what has me more convinced that we might be at an inflection point in terms of really transforming the fundamental sort of behavior and business models is when you look at companies like Jeff's company

where he’s an investor Open Door or Knock where you’re starting to see real momentum around these around fundamentally different models for buying and selling homes So I think we might be there and I think potentially within the next five years, it will will have a meaningful impact >> So we’ve seen a lot of change I mean Zillow is a sport Whenever you go anywhere people are popping up in the app to see what the home prices are I think it’s fundamentally different if the home is Canoe, people feel it to be an asset that they’re buying and selling versus their home And so Millennials, well I also was younger at one point in time I probably lived in 10 different places It’s not that different right but they’re going to one day have kids and guess what when they have those kids like letting go of that house becomes a much more emotional issue The place we lived in, it took us six months to convince the buyer to really like actually let go of it and I think for people that if it’s their actual home It’s got that emotional piece to it If you’re buying and selling, you know a single-family home, I like we’re in Roofstock for example and that’s a very different thing that’s just, it’s like a physical asset that you can trade So I think there that we’re seeing some of it but I think when you get into single-family homes, it’s a different story Largely where people are living in their own homes I do love the co-living piece of it though I think that the common collective, all those people have house and others, I think that that’s just, it’s great It’s just like you know post college dorms You know that makes a ton of sense >> Jeff, you got a minute You had a mixed verdict here and you’ve got a minute to rebut your critics >> So it actually didn’t surprise me at all that people in the Bay area wouldn’t be saying, hey, I don’t want to sell my home in a few clicks This is actually a trend that isn’t hitting the coasts first This is a trend that’s happening in the middle of the country first and in fact thousands of families are doing this in Phoenix and in Raleigh and San Antonio like as we speak I mean this is a thing and it is starting to happen So I actually think there’s been way too much friction in the system for too long, and you’re seeing a lot of families, it’s not everybody but I say prevalent 20%, 30% of homes in some of these cities in the south and central parts of the country are going to be bought and sold (bell rings) this way in five years from now >> All right, now audience it’s your chance to vote on Jeff’s proposal and we start up a dead heat here And look at it, we got a horse race >> Not convincing folks >> Jeff, I was hoping you were going to go towards the technology that allowed me to get a house in Atherton for 1/10 of the price (audience laughing) >> Faster >> Well if you want something else that you’re not likely to believe in, there is going to be in five years, you’re going to see your bed and dresser drawers stored on ceilings Mark my, you heard it here first folks Not on the floor, not on the walls and stored on ceilings Voice control bring it down at night Really, really cool stuff happening in real estate >> We get to pick up all the shit on the floor >> Really? 
(audience laughing) >> If you were Mitch Kapor and you put $75,000 into Uber and you're now worth an extra $325 million from that investment then it wouldn't be an issue Mike All right, I think we're done with that >> All right, 51 to 40 that was a– >> Pretty slim >> This one's going to have to come down to the hanging chads I think (audience laughing) >> All right on to trend four Biology as technology will reinvent trillion dollar industries So biology as technology uses design and programming to save planetary and human health It's reinventing trillion dollar industries in food, fuels, materials, diagnostics, therapeutics, computers and more Navin, the floor's yours >> Absolutely, I think the purpose of my trend is to bring science back to Silicon Valley We talk a lot about software We talk a lot about eCommerce We talk a lot about this being media but let's get back to our roots, to what made Silicon Valley Silicon Valley So let's start with what is the problem we are facing today? It's in both planetary and human health With the rise in global population and people's incomes going up especially in developing countries, consumption is doubling every 10 to 15 years and it's depleting our natural resources At the same time as Rebecca said there are more sick and obese people leading to a higher mortality rate If you look at this slide, I look at the evolution of biology the same as the evolution of mankind The left side is the monkey The middle is the chimpanzee I think and then it's the human being The 1970s was the birth of biotech

where biotech companies spurred use of biologics like insulin to develop new therapies to treat diseases What's my prediction? My prediction is we are entering a global and golden era of biology where biology now is a platform like information technology With all the innovations that are happening in things like CRISPR, DNA sequencing, gene editing, cell engineering and bioprinting not only will we fix things but we'll also make new things So bio will continue to deliver therapies, and not only will we treat diseases, we'll cure them forever But now, I don't know whether these things will happen in the next one year or two years or five years There are already startups which have gone public which are making meat without the cow, cheese without the milk, wood without trees, whiskey from molecules This is for people who like alcohol Silk fibers from bacteria, jet fuels from sugar and corn, concrete from spores, leather from mushrooms and even skin cream with engineered bacteria At the same time, bio might reinvent information technology Our friends at Microsoft are working on how to store information in a few grams of DNA equal to an entire data center So if I have to summarize, biology as a technology will have massive impact to not only save our planet It will save lives and as a result reinvent trillion-dollar industries Thank you >> Right on the dot >> Yeah (audience applauding) Panelists put your paddles high Oh, look at this >> Put them high, please put them high Oh three to one >> That's good There you go All right well, we have three to one and why don't we start down on this side Rebecca, what's your commentary? >> And I think the commentary is I would also love to bring science back to Silicon Valley I'm a chemical engineer at heart so I can make, what is it, beer, bombs, booze and all kinds of different drugs but I think it's a timing issue There have been a lot of these really exciting companies that can make things at bench top scale and then struggle to really do it at high production volumes and bring it to market So I want to see it so it was a hard vote 'cause I do agree with you but I think it's a timing issue >> Lauren >> Yeah if I could have done a half paddle on this one, I probably would have I was on the fence I think I totally believe in the theme and I think longer term we're going to see massive innovation on this front The reason I gave it a green is, so I think most of this is more than five years out, but when you look at the subset around food that Navin mentioned that's where I see a ton of innovation today, and like real productized offerings obviously I mean you look at Impossible and a number of others and I think that we're starting to see that become real So on that dimension I am supportive of this trend >> Jeff >> So I think it was just for me a timing issue I think directionally I would agree with Navin but I think these trends, I think people are starting to work on them today but I think they'll have a massive global impact, with some exceptions like Impossible It's a massive global impact I think it's more than five years And so for me, it was just really a strict interpretation of the timing rule in saying that I think it'll take more than five years for this to play out I think if you said hey what's a 20 year, or 10 to 20 year, horizon then I'd just say one of the biggest trends that I think is going to come out of the whole innovation world will be this If you take a really long-term view >> I agree, it's purely a timing issue and there
are three things that hold it back The FDA and clinical trials The expense, and the approval process is so slow The cost curve for some of these things We have such a long way to go on the manufacturability and scalability of things like synthetic protein And then lastly while we can read DNA cheaply now and we can even begin to write it, there's so much about human biology we don't know We just don't know what all that DNA means and it's going to take a lot of time to figure that out >> Navin >> Absolutely right, I think what I would say is some of these things we're already seeing with the IPOs of some of the food companies so those are here now Similarly, a lot of the material things where there's no need for FDA approval are beginning to happen in developing countries because there's a lot of pressure as you build housing,

as you go to fuels right You don't have natural resources So things like Amyris and others are beginning to get traction outside the US but then I really like this point right There is a famous consulting company, AT&T asked them, hey, will wireless ever happen? And they said, no and the rest was history so my belief is it's good to be contrarian Some things will happen out of these things in the next one year We are seeing a huge pipeline of companies going public The biggest pipeline of companies which are going public in the pharma space are also bio which are trying to cure diseases not just treat them And in many other areas, there will be opportunity but it remains to be seen what will happen and I think there'll be a continuum as we have all agreed There'll be things on a one year timeline, three year timeline, five year, 10 and 20 but I think this is the future Thank you >> That's great Now to the audience vote (audience applauding) Let us get into some good mo >> Yeah, I wonder about some of the informed criticism about the regulatory apparatus that stands in the way, and then I think I wonder if companies will be able to penetrate that in the same way Uber and Airbnb penetrated or ran around the regulations in their space? They simply moved so fast and they provided benefits so tangible that the regulators felt powerless to regulate >> I think it's good I do wonder on the timeline on it I don't doubt the notion that it's going to change things going forward All right, we'll give it five or 10 more seconds Get your votes in I believe this is our top vote-getter at the moment, right All right, let's shut down the polls 60, 40 in agreement Good job Navin (audience applauding) >> All right, that's why his venture is called Black Swan (audience laughing) Trend number five, VR-AI-Bioelectronic wearables will program our emotions in healthful and intentional ways Consumers wake up to our unhealthy addiction to the dopamine hits of social media Immersive, non-invasive wearables biohack us into states of relaxation, mindfulness and peak performance without ads Brian, over to you
Our inherent need to feel significant and our innate desire for social approval has smartphone users checking their phone 80 times a day and spending 134 minutes a day on social media We all know this is bad for us, causing loss of focus, depression, anxiety, social disconnection which is really ironic for social media and loss of privacy Yet engineers have so cleverly designed social media messaging apps to optimize daily active users and ad revenue that most of us are powerless to quit but smart entrepreneurs are going to take all that we've learned about digital engagement, immersive experiences and the human brain to build new products optimized for psychological health rather than creating addictions to drive ad views We've already talked about mindfulness apps like Headspace and Calm, and how popular they are It's great, they're simply podcasts Within five years, there will emerge virtual reality, bioelectronic wearables that engage our senses so completely that they flood the brain with positive input hacking us into states of calm, mindfulness, happiness, peak performance and flow, and they'll do this non-invasively and without drugs VR, virtual reality is already being used to treat phobias and pain and as devices improve, they become more affordable, and we better understand bioelectronic feedback loops and how to tune them, and application creators are attracted to the growing installed base We're going to start using these new devices as part of daily habits much the way millions of people do yoga or try to meditate These are practices that haven't changed in 5,000 years So it feels inevitable to me that immersive technology can push mindfulness and human psychology forward by building on business models aimed towards wellness not monopolizing our attention to feed us more ads

Think of this prediction as Deepak Chopra meets Ready Player One >> Oh very good (audience applauding) >> Well that ought to get the paddles flashing So let’s flash our paddles, flash them high so the audience can see (audience murmuring) Three to one >> Am I first or last? >> Let’s go down the line >> Me first >> Rebecca >> So, we’re already seeing some of this, right with the Call Map for example, and there was already Headspace which was a billion dollar plus company and now Calm, and literally what you do is you watch this little bubble You take a deep breath as the bubble expands and then you let it out as the bubble contracts and it’s a million dollar company (audience laughing) But we’re seeing this and this all goes to mindfulness and well-being and and stress reduction, right I mean you laugh It’s a huge company and so I do think this is going to be, it’s going to be something that people look for and will pay for >> Yeah, I guess it’s true that there are some very significant applications in mindfulness and I think that they are quite helpful although I would also say that I don’t think that private market valuations are necessarily indicative of impact especially in this space I think that the wearables to me, it’s sort of the antithesis of mindfulness and disconnection and when you look at the data around wearables today and their ability to really make behavioral change So like fitness apps, most people look at the data and don’t change their behavior Many studies have shown that and also the adoption of wearables has flattened over the last two years And so for all of those reasons, I am doubtful that this trend will materialize in five years >> So sometimes I don’t know if anybody ever remember the Bob Newhart Show where the person comes in and Bob’s this like psychologist The person comes in and have a serious problem and Bob’s advice is just stop it And sometimes I feel like that’s what we ought to just be doing with social media So in response to social media, we’re going to have some some biodevice that’s going to put us into virtual reality and send deep messages into our brains, and I’m going like oh, man That’s my response to social media It’s like maybe I can just can take a walk outside in the park or something so I think it’s going to end up being, I don’t think this is going to be a pervasive way in the next five years to counter impact what Brian very accurately describes, I think is a deep, deep problem caused by social media and cell phones I don’t end up thinking this is going to be a pervasive way to answer it Maybe on the fringes but it’s really not for mainstream Americans >> I think I end up in the same place I love to use these things but as I look at mainstream adoption, the issue is the moment you go to variables besides Fitbit and the Apple watches, these things are like clunky, all right Like basically using them is not easy The price points are high and we’re still trying to solve the basic problems on packaging, pricing, positioning And since we are talking about three to five years, I’m just trying to wonder which company is incentive to essentially make this happen, right So I completely believe that there is a need People in Silicon Valley can afford this stuff We have the resources We can go to the right places and get it, and I love by the way the Calm and the other things right but they’re simple So simplicity, ease of use but the moment it gets into a our VR, AI and this bioelectronic, my head spins And I’m better meditating on my own without 
any of these things >> Okay Brian, you have what’s left on the clock here to defend your thesis >> Sure so I think the term wearables in this case was misconstrued I do not believe that people will be wearing them day-in day-out in their glasses This is virtual reality, fully immersive for 15 minute hits a day Kind of like you try to meditate and it’s really hard We all suck at it and so if you have a 3D visual environment that is responding to you, bringing your biorhythm down into a state of calm that’s much more what I’m talking about and I think there’s a massive demand for this I think the reason that EDM festivals has become so pervasive It’s cause people go and they lose themselves for a couple of days and they feel different I don’t think this is going to be a cure where it could stop social media This isn’t an alternative This is that 15 minute a day maybe twice a day way to reset and we could just do better than sitting cross-legged in the dark

>> All right audience, it's up to you (audience murmuring) Where's your creativity? >> I feel the need to put my thumb on the scale on Brian's side because every high performance sports team is moving in this direction When you're trying to optimize how you're preparing athletes for a competition, and I think the only mistake you made Brian was to tie it to social media There are any number of things that are adding to anxiety, depression, stress in American society today Explosion of traffic, or look at something I wrote about in my book, the incredible pressure that kids and families are under to get great test scores and grades to get into elite schools People are stressed out Millennials are unbelievably stressed out >> Rich if I might, I'm not sure I'm allowed to break the rules here but the point of social media wasn't that it's the only thing making us feel like crap It's that that is proof that we can use software to make us feel like crap We can use it to make us feel good too >> Okay I will say, just one bit I have a confession to make I actually shared this one last night with my sales staff and you lost them at no ads so thanks that was a good one (audience laughing) >> So thanks, that was a good one >> Yeah that was a good one All right, as we go to trend number six, 70, 30 against on that one As we go to trend number six, so we're now in round two of our competition here and trend number six is distributed workforces will become the norm Facilitated by collaboration software like Notion, Zoom, et cetera Most Dev teams will have a global presence Eastern Europe and the Balkans will be hubs for outsourcing which in turn will create domestic tech industries in the next 20 years So Rebecca explain that one to us >> So my trend here is that remote workforces work and there's a couple data points to back this up So at Google, 48% of all meetings within Google already happen across multiple cities, multiple geographies within Google, and Google as you can imagine does a lot of studies They found no difference in effectiveness, performance ratings or promotions for people working in different locations across teams This month Stripe announced that their fifth engineering hub would be entirely remote and they weren't planning on it but what they found was that remote employees have outperformed all of their expectations And they pointed out things that have come up like Google Docs, Slack, GitHub and Zoom are how most of the work is happening anyway So even if people are in the same location for the social piece, they aren't really talking to each other So why pay for the expensive real estate, set them all in one spot and have them just on Slack anyway?
Agile work models are becoming absolutely the norm Upwork states that 69% of Millennial and Gen Z managers already are allowing teams to work remotely and are 2 1/2 times more likely to use freelancers on an ongoing sort of permanent basis And Upwork also believes that two out of every five FTEs in the future will be working remotely in the next three years There's my timing, there you are Okay, on the other hand, people are really happy telecommuting This is something that comes from both sides and according to a survey by remote.com, 60% of all people surveyed said if they could, they would leave their current job and work full-time remote for the same money And the other major drivers, so there's a couple other things fueling this One is that it's really hard to find good talent and good talent is becoming much more specialized So being able to find that talent in other locations is very helpful, and also the whole emergence of co-working So it makes a ton more sense to save the money on office furniture and let people work remotely and then come into the office, and meet at a very nice WeWork facility with lots of coffee and facilities provided And this does extend already far beyond tech into accounting and finance, marketing and the like And so again, the trend here is to watch for distributed workforces becoming the norm not the exception in the next three to five years and also additional tech to sort of pop up to support that (bell rings) Oh my God (audience applauding) >> Very good, right at the bell Hold your paddles high panelists All right, a three to one for it Why don't we mix it up this time and Jeff take it from the middle >> Sure, so I voted for it I think everything that Rebecca said, I think is pretty straightforward and it is true It's demographic trends, it's skill trends, it's technology trends

All of that is true, I think, and is basically making distributed workforces a reality The only thing I would quibble with, I think, with what Rebecca said is I actually think it's here already I think it's here now I think it's a thing today and I'm not sure it's not obvious So our portfolio companies are all doing it already and they're not even big companies Companies of 20, 50, 100 people are already distributed around the country and around the world with Dev centers in the Ukraine and Pakistan, in Argentina, Costa Rica that's already happening So I'm like there's no way I could say red because I think it's already a thing >> Brian, what do you think down at the end? >> I have the same argument but I think when it's that obvious you are supposed to say it's red The criteria is not obvious so I don't know I just feel like this has happened >> Maybe we need to change man I don't know the rules, I'm still learning Okay, a reason I put it green is at Mayfield, we have been involved with this globalization with China and India And not unlike you, over the last 20 years all our portfolio companies even startups are global, right? So it's a thing The reason I put a green is the rise of Eastern Europe and the Balkans which is truly showing that it's not just going to be India, China, Taiwan And basically a few other pockets where call centers were, like the Philippines It's going to be the rest of the world and technology will be the great equalizer but somehow the last point got deleted that domestic tech industries will get built there in the next 20 years, and I'm a no on it And the main reason is I've been investing in India for 20 years, nothing has happened, right So look, there's no big company which has come out of India, that is, besides IT offshoring and other things, right So, it's something we can take offline but that was not what was presented so I still maintain I'm a green Just to make her happy and remove the blood pressure (Rebecca laughing) >> As a former Googler, which I'm sure there are many of in the room, I could say not only did we take meetings with remote offices on Google Hangouts but we would do it with people in the next building to save time And so I think the technology is certainly there I think to me what is exciting and what I really agree with is these new technology hubs popping up in other parts of the world And I hear what Navin is saying It may not be that we're going to have the next Silicon Valley in Eastern Europe, however I look at what has been done in Sub-Saharan Africa with training up talent and placing them with global companies I see what's happening in Brazil with Nubank becoming a major player and will likely produce a lot of really high-quality talent and new companies And I think that this is just the beginning of seeing a massive change in these markets that frankly have not been a big part of our story from a technology perspective today >> So I think a couple things, so one on the already obvious I've been in venture for 11 years now and what we still are seeing is you'll outsource your scaling to Argentina or your QA to Russia or you'll take a piece of it, right And very few companies right now are already truly international They're taking pieces that are more tertiary items and popping them somewhere else What we have seen just happen, I mean in the last, in a pretty short time period, is the entire Dev team sitting in Poland The entire Dev team is sitting in Estonia The entire Dev team is in Portugal And now we're actually moving marketing there too and so
you know we used to see QA scaling things like that then you started seeing engineering efforts going over And now you’re seeing like core accounting, marketing functionality and you’re seeing companies like Zapier being a 100% remote And it used to be a VC saw that you know and they said, oh well, the team needs to be in Silicon Valley and we need to be able to touch every one They would say even even a couple years ago, they would say that even today I would say a lot of people would say that And now it’s actually just beginning to feel like it’s acceptable and I think as we go in the next three to five years, it’s going to be the norm that we’re like where are you going to put your teams? Where you going to put your marketing team, where are you going to put your Dev team? And this notion of (bell rings) distribution is what people want >> Very good, out to the audience for a vote >> It’s a nail biter on this one >> Yeah

>> Also a good one >> Thank you Navin >> It reminds me of something Bill Joy said a long time ago when he predicted in the early 90s before the web existed as we know it today When we had a proliferation of the internet, when we had high speed networks He said, “The majority of the smartest people will not be “within your company.” In other words, they’ll be distributed around the world >> Oh yeah, something to stress on So a few years ago when Marissa took over Yahoo, it was really funny because she didn’t know where most of the people were She actually had this work from home policy and she actually had to make people, if you all remember that headline You had to actually come in and check in ’cause she didn’t know where a lot of those people were and for the most part it was true for a lot of them But today, all these software platforms have taken over to where you’re tracking everything people are doing so you don’t have that issue anymore >> That’s great, all right at 53, 47 in agreement so thanks and good job and on to the next >> All right this one is Lauren’s trend number seven A democratized network of trust Driven by technological advances, increasing regulation around privacy and consumer demand for both personalization and efficiency A massive network of verified identity will make all of our digital activities more seamless and secure This network will connect new clearing houses that verify our most important attributes without unnecessary redundancy Well from your lips to God’s ears Tell us more >> So the internet was built to identify endpoints on a network not humans but as internet services have proliferated, we as end-users are buried in passwords, and we spend meaningful time verifying and reverifying ourselves online As our offline interactions are increasingly facilitated by online brokers who we rely on as arbiters of trust We can’t help but notice that many of them are failing If you look at the news from care.com recently It’s pretty terrifying Nine times in the last six years, caregivers in the U.S were put on the platform with police records for alleged crimes including theft, child abuse, sexual assault and murder And later, those nine people were accused of committing crimes while caring for children and the elderly These online marketplaces need help and we’ve seen a trend towards some seamless identity experiences with federated accounts So if you think of Facebook Connect or Shopify Pay for universal checkout, we’re starting to see what’s possible when you can use a singular login or payment credentials across the internet but these are insufficient We don’t trust Facebook or Shopify with our most important identity attributes We need a company or group of companies built from the ground up on trust and security And never has there been more need for this or more opportunity A new network of verified identity and identity attributes circulated around trusted clearing houses is going to come together and it’s going to be driven by tech advancements like zero trust security, proliferation of APIs, computer vision and ML, advanced biometrics, et cetera And the regulatory environment is helping to drive it forward too with increasing privacy regulation So it’s time for this identity layer to exist for the internet What does that look like in practice?
So when Patty gives me my open water credential for scuba diving, it would get logged into a network such that the next time I go to a dive center, they can look up that I’m open water certified And then when I do a dive with that dive center they can record it against my more advanced certification Even my experiences have value as part of my identity if I want them to, HBO could verify that I watched season eight episode five of Game of Thrones on Sunday so that all the content apps on my phone can send me (bell rings) notifications about how the series just burned itself to the ground (audience laughing) So for all of my own identity data, I should have visibility into it and third parties who want to access it should be allowed to do so by me on a need-to-know basis with as little underlying data as possible You can even imagine this with the State Department at some point in time probably not in the next five years but verifying my passport information so that the hotel I stay at doesn’t need my actual passport copy >> Okay we need to wrap up You’re three secs over So panelists hold your paddles high but Lauren I give you credit for interjecting Game of Thrones for some cheap votes >> Sorry >> That’s pretty good >> I was so into Game of Thrones, I missed the bell, apologies (audience murmuring)

>> Hold it high, there you go >> High, high, high >> Okay Jeff >> Three against >> Yeah, I detect some sympathy vote here >> Yeah, it might be a sympathy vote >> Yeah, well Jeff since you were the last one up, you’re going to be the first to make your comments >> I think Lauren describes a massive problem, And in fact, Tomasz Tunguz described the hunt for authenticity last year It’s very similar kind of I think not the exact same but similar kind of issue and I actually think we have to solve this problem I think we have to solve this identity problem I’d debate on the timing Does it get solved in the next five years in the way that Lauren was describing it But to me I think this idea and I think a lot of it could end up being based on blockchain which is going to be this immutable record about who you are and what you’ve done and what you’ve earned and where you I think that is going to ultimately come to pass it’s just a question of timing but there’s too many forces behind us that driving the need for this solution And so I think it’ll happen which is why it’s was a green >> Okay Navin >> Yeah absolutely, I think this problem needs to be solved but as you say right like when you’ve been in the business too long you just get colored Honestly, we have tried everything from decentralized computing to wallets to God knows any other thing And the only thing I can think about where this is going to get solved is by interference from government And if you look at what’s happening in Europe with GDPR, what’s happening with CCPA in California, those are some who are going to be the drivers But even if we create yet another mere middleman, it’s back to the same Facebook problem right So if truly decentralization doesn’t happen and that’s in its infancy, right It’s the biggest hype, right? 
Like basically that exists So my feeling is this has to be done in public, private cooperation Governments have to step in EU is looking at it California is looking at it There will be solutions but to do this without the government who anyways wants every information on us It’s going to be just very, very hard This is not China >> We need to get to Brian and Rebecca and Brian >> Yeah, I want this to happen It is a timing question The centralized approach is counter to the privacy trends Government like we’ll be waiting forever for them to figure this out Decentralized is going to wait for blockchain and that just feels more than five years away and then lastly, this is a complicated concept for consumers to really grok It sounds like a lot of work and effort and so far they use, the way we give away our identity and privacy in order to use social media, it just seems like most people don’t care enough >> Rebecca >> Yeah so just echoing a little bit of Brian It’s totally counter to the the privacy trends that I was almost having flashbacks of China’s social credit system right which could fold in It’s just a little bit terrifying I think it goes counter to the GDPR and all the privacy things that we’re seeing in the market today And the other thing too is I mean they’re small They’re things that we all wish could happen I mean just like the care.com issue that’s something that’s been in existence forever And the problem is city, state and local crime databases are not centralized and neither by the way is your doctor’s medical malpractice record You can kill a lot of patients in one state and just go to the next one And sorry so it’s really amazing even it’s not Game of Thrones but like really important like life and death stuff is not centralized at this point in time And so I would hope we’d start there and then go beyond it >> All right Lauren, a minute to answer your critics >> Yeah so, I think these are all fair criticisms and then trying to present a vision I think what is missing from the description is the path to get there and I think that’s the most important So to me, it’s a big distribution problem, right Consumers don’t adopt this on their own You need to find a hook into the market and you need to find a hook into the market that is not highly regulated So that’s why verified passport information is much further out What I think is much shorter term is verified work credentials It’s background checks which Rebecca’s right Those databases are extremely distributed Some of them still live offline but that can be addressed by technology today, and I don’t think you need the data stored in a centralized system I think end-users can choose where they want it stored whether they want to store it themselves (bell rings) or in other third party databases >> All right audience, now it’s your turn >> I bet it would flip

if you said people want to agree I think everybody overwhelmingly wants to agree but maybe there’s too much cynicism here, maybe too much skepticism >> Well the privacy thing is an issue I think it might be a reflection (soft piano music) All right, looks like we’re locked on that >> Yeah, all right, I think we’re locked on that one Six feet, last vote Oh a couple more >> I think we’ve have seven seconds here You had it right >> All right that’s it 61, 39 against on that one Where are we? At trend number eight already Digital technology makes positive inroads on mental health Over the last five years, heavy use of smartphones and our frenetic always-on culture have been cited as causing increases in depression, anxiety in isolation Digital technology fights back from texting with your remote therapist to machine learning unique combinations of therapeutic problems In response digital technology democratizes access to effective mental health treatment, Jeff >> Over 1/4 of the U.S population suffers from anxiety or depression and only a very small percent of that a group gets treated It’s even more serious among college students among the Gen Z crowd and Millennials, it’s getting worse As Brian pointed out earlier, I would agree I think social media is exacerbating this Why is it that only a small percentage gets treated today? It is because it is too expensive to get help Visiting a therapist once a week cost $150, $200, $250 There’s too much stigma associated with getting help You can’t afford to leave work or let anybody know what you’re going to do and it’s not timely Often times you have a crisis If any of you have ever had folks with depression and anxiety in your families among your loved ones, you know it’s very hard to wait till next Thursday to see somebody Cognitive behavioral therapies actually proven this statistically in many many, many studies to be very, very effective It’s the delivery mechanism that just works for too few who can afford it I actually speak from experience I have a wife who is a clinical psychologist I think the way that this happens is that digital technology now makes it possible to interact with therapists on a 24 by 7 basis I’m not talking about videoconferencing with your therapist I’m actually talking about a new modality which is texting with the remote therapist Many people in this room might say texting, what texting with your therapist does that even work? Is that even a thing? 
It turns out it is a thing, and studies at Duke and Columbia have shown that it is actually more effective for certain populations than actually speaking to the therapist Certainly as effective and it’s available 24/7 it’s available at a quarter of the cost and it turns out to be broadly and very timely What’s going to happen from there and so this is already think is it become very prevalent Consumers are doing this already today, you’re going to see this available in your workplaces It has already now just been rolled out within the last month to all employees at Google This is going to be rolled out to as a benefit because companies want to make sure that their employees have good mental health Ultimately there’s going to be, technology then used to enhance this For the first time in human history, you’re going to have a database of machine readable, anonymized but machine readable (bell rings) data that is going to allow basically machine learning to figure out what works in the therapeutic give-and-take process first time ever This has never been done before and that’s going to allow basically software to instruct the therapists who over time are texting back and forth with their patients as to why don’t you try this? Take this approach This is going to work better You’re going to start to see AI assisted therapy in ways that you’ve never seen before in conjunction with this remote text-based therapy And I believe this will be a prevalent trend over the next five years >> Thanks Jeff, vote panelists Raise them high One, two, three, three to one against Rebecca, why don’t you >> It’s a tough crowd tonight >> Yeah so I do

So we’re at a company called Vida and I do see this AI driven therapy I’m anxious to hear what they say I will say I have depression and anxiety I just had a lunch with a professor today and she was that and it was a vitamin D deficiency so I think it goes back to functional medicine >> Still voting for that– >> And studies have shown that omega-3 in high doses work better than antidepressants >> Whose session is this? >> Actually make antidepressants work better (audience laughing) >> She’s adding a minute to her presentation >> Yeah, good work, Lauren >> I mean I guess when I think about the professions that can be automated effectively, this is about the last one on my list I think that there is something so inherent with the human contact around therapy I think there is a role for the internet to play in matching of therapists And in video or even phone delivery of care but I think it’s dangerous to offer a solution that delays effective treatment So the idea that someone that has anorexia or PTSD could be served by someone on text message, it’s just scary to me and I think that it’s a lot of murkiness in the regulatory environment around it And I think that that’s going to come to the forefront in not-too-distant future around this category >> Navin >> Yeah, I think the technology for this and the use case exists Now you might say we are biased We have invested in a bunch of telemedicine companies, and this is the only area which couldn’t get to mass market adoption And the main reason after we studied for it was the digital devices are the problem themselves People in one window are talking to whoever the consultant is in the other window are chatting to their friends but what we realized more and more is people really want to see their specialist This is not I have a cold, right like give me a medicine I have a rash, let me take care of it and then the other big problem was the business model For this to get into mass market, who’s going to pay? Right, are the consumers going to pay or the insurance going to pay? And you start hitting scale limits so for this, the idea is good There will be early adopters but at least we haven’t seen mass market adoption of the strengths for this particular case For telemedicine, there have been other cases not in mental health The low-hanging fruit is happening >> Yeah, I have similar reasons All of the telemedicine providers offer this as a service Teladoc, Doctor on Demand, MDLIVE, American Well They’ve all have partnerships with the major health plans and they’re offered through your employer It’s the easy one, right You don’t have to see a patient you’re just talking but adoption rates have been low I do like your idea around evidence-based therapy, let’s actually see what the data says is working but again until you get the massive scale of usage, it’s going to be really hard to close those feedback loops in any predictive way >> Roberto, yes >> So when I first described this to my wife who’s a psychologist, she had the exact same reaction Lauren had You got to have face to face, it will never work otherwise My daughter just got her PhD in clinical psychology as well I’m surrounded by it at the dinner table My daughter said, “This will work “My generation will use this.” So this rolled out in JetBlue Do you know how many people, JetBlue employees are using this service right now? 16% in the first two months Think about that so for those people who said this isn’t going to happen, it is happening right now >> How many are pilots? 
(audience laughing) >> I think he’s saying we’re old >> The point is this is not videoconferencing with a therapist This is texting with your therapist as you’re feeling and your therapist is texting back This turns out to be a thing and it is now rolling There’s tens of thousands of consumers using this This is not Telavideo with the psychiatrist This is actually happening today so I would argue that there are popular misconceptions Consumers are using this in droves and this is why now all the big insurance companies are signing up for this to provide this as not just the EAP, Employee Assistance This is actually becoming a mental health benefit Google ran a pilot at YouTube with this and they saw the uptake at YouTube like nothing they had ever seen in either telemedicine or mental health And that’s because I don’t see many Millennials in this audience but Millennials and the Gen Zers are into this in a big way Sorry if I insult the Millennials >> I guess we’re here– >> Sorry if I insult the Millennials

(audience laughing) >> All right, well let’s go to the audience here What’s the vote? >> Not buying it? >> Doing well man, doing well >> I liked it >> Oh wait, the Millennials are kicking in Yes, thank you A lot of them here Come on vote Millennials, vote Millennials Prove your elders wrong >> Yes, exactly I thought there was a little bit of a connection with Lauren’s idea that you would need verified identity if you’re going to go to therapy Let’s go, we’re down here >> This is actually not the easiest– (panelists talking) >> And now I’m going to give example of– >> It’s a top space, yeah >> Oh, they got >> The two years, I’m going to >> You turned them there with that Millennial comment >> Thank you, very good >> Keep voting Millennials Keep voting, don’t stop Vote early, vote often >> All right three, two, one Votes are in at 54 to 46, very good >> Nice (audience applauding) >> Nice work >> Trend number nine I think this is a big one and one I’ve been waiting to hear Navin talk about The Renaissance of Silicon will create industry giants All right, take it away Navin >> Absolutely, how many people– >> Oh yeah I got to read this I was so anxious (audience laughing) I got to preface it, just give me two sentences here >> Don’t take my time >> We’ll give you your time >> With the end of Moore’s Law, new semiconductors are required for a Cloud-native, Data-dominated, AI-powered, IoT world The rise of these new players will put the Silicon back into Silicon Valley Navin over to you >> Absolutely, how many people in the room work in the silicon industry because I just want to see if people even understand Moore’s Law How many people? >> Navin, does it count if the Churchill Club’s first-ever speaker was Bob Noyce back in 1985? >> My time, my time (audience laughing) So I think, I need to get back 10 seconds >> We’ll give you 10 >> So let’s talk about the trend right so I’m personally very excited to see Silicon coming back to Silicon Valley, and taking us back to our roots As you know Silicon is the ingredient of the information technology industry Nothing can happen in any field without Silicon In the last, can we put the slide up? In the last 40 to 50 years, there was something called Moore’s Law where processor speed doubled every 18 months and what happened in the x86 era is everything converged into a single CPU where all functions happened on one chip and software was good enough, But today if you go to the center column, we are hitting a plateauing of Moore’s Law due to limitations in physics, due to limitations in size, limitations in clock speeds You can’t double the CPU processing power every 18 months so is the valley going to give up performance? No, we are not We’re all driven by price and performance so what is going to happen?
We’re going to see a flood of Silicon architectures where the CPU will still exist and do what it does well but it will be surrounded by other components We have already seen that with the rise of GPUs with Nvidia, a hundred billion dollar company We have seen the rise of FPGAs, Intel buying Altera for $17 billion SmartNICs basically, Nvidia bought Mellanox for $6.9 billion So in today’s world, 32 seconds plus 10 extra I need to get Basically, I predict a few Silicon things will happen The first one I predict is a company which is already in product, from the founder of Juniper Networks, and because of the three to five year rule I need to remember, it’s building what’s called a data processing unit Data speeds and usage are growing 100x due to apps like Netflix and YouTube CPUs are terrible for handling data intensive applications These applications will run on the DPU Similarly there’ll be an (bell rings) emergence of AI and ML chipsets which the general purpose CPU ain’t going to be able to do Intel just invested in SambaNova, they acquired Nervana And similarly, there will be special-purpose semis for ultra low power IoT applications In a nutshell, there will be many, many Silicon companies which will get created which will reinvent the IT industry And I’m proud of the entrepreneurs and the companies in the room all working on getting

Silicon back into Silicon Valley Thank you (audience applauding) >> All right, let’s see those paddles raised high >> All right That’s good, okay Well let’s start with the critics down on this end Rebecca, have at it >> Oh me first? Yeah I think that getting Silicon back in Silicon Valley is an interesting concept I think there’s very few people here who probably remember that to some extent and I don’t know, >> We have more Millennials than we thought >> Yeah, you look at these investments and I have a couple actually in the older portfolios that are like they’re, it’s such a hard slog It’s so capital intensive still and it’s just a tough place to do it out here It’s not the most efficient place in the world The best teams in the world right but not the most efficient place to do it >> So I agree that this is going to play, GPUs and more advanced trip technology around AI is playing a really important role for us But in terms of building big businesses, to me a lot of the stickiness will exist on the provisioning and management layer And so if you have chip companies coupled with that software maybe you can buy into it I think that’s kind of where Nvidia is going but otherwise I think that the big businesses around new Silicon technology will be built at the software layer Really around deploying because deploying AI and machine learning models is much more complex than traditional software, and that’s where I think you’re going to find the hardest problems and the stickiest problems >> Jeff >> So I certainly don’t think that Silicon, the next wave of Silicon companies will end up being necessarily as big and impactful, certainly as say the Intel’s of the world But I would agree with the Navin that we are seeing, I think a proliferation now of companies for doing data processing units or doing Silicon for AI We’ve invested in a low-power chipset for IoT that is just going gangbusters so you can see for the prevalent applications of tomorrow I think have to take advantage of high-performance Silicon that is purpose-built for those functions And so there are going to be a collection of companies that do that maybe they don’t get to the size of an Intel as an example but I think they’re going to be important companies that’s why I voted green >> Brian >> Yeah, I think with the edge computing trend, we’re back to the future and every device will have Silicon in it And I take your point quite literally, we’re not saying that it’s going to be easy to create chip startups all of a sudden but the opportunity will be there, market size there for a few brand new behemoths >> All right, okay >> Nobody today– >> They were so quick, you can have a two full minutes here >> Absolutely, right >> ‘Cause I know you’ll take it (audience laughing) >> I would love to So I think first of all right, this innovation is not only about startups If you look at every big cloud company and every big, and I didn’t say this is about startups Every big hyper scalar Amazon is building chips, Google is building chips, Facebook is building chips, Microsoft is building chips It’s impossible to retain EE engineers in startups today Look at Apple, doesn’t buy products from anybody, the chips It’s all built in-house so the amount of innovation which was my point which was happening in Silicon Valley had stopped 20 years back Now it remains to be seen how much of it happens in the big companies because they have already realized We hear software is eating the world, it actually did It’s finished You need to innovate on the 
hardware side too You’re reaching limitations of what CPUs can do They are good at certain things, for running application code They’re bad at running everything else Why is Qualcomm 100 billion? Why is Nvidia 100 billion? Why is Broadcom 100 billion? I would love software companies to be worth 100 billion My partner Rajeev can count on his hands how many software companies are worth 100 billion, right And then my question to the venture community is I understand it’s been hard to find Silicon companies but I think that’s a thing of the past $130 billion got invested last year The VCs if you can’t even provide capital, what are we doing? I think like we need to get into the real estate business maybe or the hedge fund business I think my call to people is if they understand anything about physics if they understand anything about technology,

go back to the basics, right We have taken the easy route of (bell rings) taking shortcuts but it’s time to go back to the basic Solve innovative problems and face physics and sciences and there’s enough money available Remember there’s soft language in fund >> Let’s see what the audience has to say >> Oh, Navin, we may have a winner >> The audience loved it (audience applauding) >> Nice >> I’m strongly in your favor on this one One of my mentors in early days at Forbes is George Gilder We work together on a lot of long pieces that he wrote and I was as editor He’s got a book called “Life After Google”, and he writes about companies like Bitmain in China that was really built to power cryptomining but crypto may be in a winter and it may not emerge, who knows? But some of the Silicon that emerged out of the effort to mine faster rates is pretty darn interesting >> They loved it >> Well that constitutes tonight’s first runaway winner Congratulations Navin >> Nice job Navin (audience applauding) >> It was that extra 10 seconds I think That was pretty good Very convincing >> That was the lucky charm >> Very good >> You helped me >> All right trend number 10, our final trend of the night is Federal Defense and intelligence budgets will dramatically unlock for startups Cyber warfare is at DEFCON1 Our democracy has been breached Traditional military industrial contractors lack the software skills and agility to counter the threat while big tech companies are held back by picketing employees Startups will fill the gap Brian tell us about this one >> This is an ambitious one >> All right the United States is under relentless cyber attack from our foreign adversaries Military networks and systems have been breached Secrets have been stolen as was our last presidential election (audience laughing) Modern warfare has shifted from tanks, planes and ships to software data, autonomous systems and AI Cyber attacks are an existential threat to democracy the U.S. Department of Defense budget for 2019 is $687 billion of which $50 billion is for software and information technology Traditional military contractors like Raytheon, Lockheed, Northrop Grumman have been around since World War II and if you give them $10 billion in a decade, they’ll build a great aircraft carrier but world class software is not their thing And historically, VCs have advised their portfolio away from the government due to opaque bureaucracy, long sale cycle, barriers to selling direct, complicated requirements and contracting procedures But my prediction is that defense and intelligence will emerge as viable tech market opportunities over the next five years yet at exactly the moment that our nation needs Silicon Valley technology to help defend itself, world-class tech companies like Google, Amazon and Microsoft have their employees picketing (bell rings) against doing business with the U.S. 
Defense A small but vocal minority with the power to coerce Google to drop out of projects like Jedi which is a $10 billion program Over the last 18 months, I’ve started to see Defense Innovation Unit, SOCOM, Navy SEALs show up on customer logo slides and startup pitch decks And these entrepreneurs are almost sheepish about admitting it because most VCs shoo them away but we think over the next five years, we’ll see tech startups and cyber security AI and autonomy develop strategies and go to market play books to tap into the massive national defense budget and serve our nation in its time of need >> All right as the last one, we’d allow to go over for a few moments so go to vote panelists And this is split again, no at three to one, I think Yeah okay, why don’t we start down with Navin Why are you green? >> The reason I’m green is this is a big need and I think it’s going to create opportunities for companies all across The only thing we need to watch is can the startups actually win this business? Are they going to be dependent upon the big contractors and the big cloud providers because they’re the ones who going to own the account but since the prediction is, there’s opportunity for startups, absolutely

How much of it goes to them versus the big cloud providers and the big integrators is unknown to me but opportunity exists >> Yep, Jeff >> So I voted red for basically the exact same analysis that Navin just had which is I think the defense budgets are controlled by the very large contractors and a very, very large defense companies and I think it is really, really hard for startups to penetrate that world I think there will certainly be some activity on the fringes but I think as a mainstream trend making a major, major dent where defense dollars are going over the next five years Well I’d like that to be the case ’cause I certainly agree with the problem statement I just don’t see startups breaking through with just massive labyrinth of these defense contractors, consulting organizations and large companies who really have a stranglehold on the defense budget so I just don’t see the startups breaking through >> Lauren >> Yeah so I’m actually right in between the two of you I would say I think it’s really unlikely for startups to become kind of full-stack defense contractors I think Palantir was a complete anomaly in that sense for lots of reasons but I do think that and I believe that the defense contractors more broadly will continue to own this market but I think that there will be technology solutions developer platforms that get put in to the solutions being aggregated by the defense contractors that are really meaningful and without which won’t solve these problems So for example like deepfake detection technology, I don’t think that’s going to be built inside Lockheed I think that’s going to come from Silicon Valley or beyond and plug in or someone really like AR visualization layer for drones, I think that will come from startups but via an SDK will be plugged into the solution by defense contractor >> Okay, you better stand out >> Let’s just say this will absolutely happen and I don’t how much I can tell you We’ve had companies that have quickly and about matter faster than an enterprise software company can move land multi-million dollar deals with directly with the government We’ve had bigger deals come in via defense contractors who desperately need this technology It’s ironic and a little hilarious to me that Google did what they did given they think that money from Saudi and stuff is okay but we won’t go there So yeah I think this is absolutely happening We looked at one of your companies pretty closely and know what’s going on there but the CIA is out here talking to a lot of companies all the time They’ve devoted $600 million to the Amazon platform and others to get it cloud-based and there’s a lot of effort in the government today because they need things like video and face detection They can’t handle things like the Vegas shooting happening again and how long did it take them to get that tape together And then what they can’t rely on is China for it right so we need that technology and if Google’s not going to serve it then somebody has to So yeah there’s a huge opportunity for these companies and the government’s willing to come and just say how much >> Brian >> I just think times are changing because the need is so acute Palantir has done a billion dollars in Defense Department contracts And while the founders were special and had unique pedigree, the reason they got that money is because the need was there I don’t think– >> They’re trying to replace Palantir now The government is coming out here saying now we want to replace it >> When you’re done with startups >> Do 
it again >> The primes won’t go away but they simply don’t have the talent or the agility to get these projects done on time So they will subcontract out and these contracts aren’t million-dollar contracts They’re tens, they’re hundreds of millions and they are being fast-tracked like never before because we’re out of time So I do think that it’s a new day and if you’re ignoring this, it’s at your own peril Just this morning, I got an email from a former eBay employee saying he landed a big Air Force contract I don’t even know what that means yet (bell rings) (audience applauding) >> Very good Okay out to the audience for a vote >> Another strong one I think I read somewhere that the first billion dollars of Silicon Valley’s memory chips in the Silicon era were bought by DARPA or NASA and without that early boost in the Silicon industry maybe Silicon Valley wouldn’t have happened >> Siri was a government contract too

It says CALO– >> Good job man >> Thanks >> Siri was started as a government contract? >> Siri was the CALO project with the government >> It’s going to happen, yeah >> Okay, get your votes in We’re at the last ten seconds All right, that’s a strong one, 70, 30 for (audience applauding) >> Okay and now our supercomputer powered by some of Navin’s new chip companies is tallying the ultimate results to reveal the overall winner Let’s take a look at– >> They’ve already done it >> Oh they have, oh okay >> It’s the DPU, the DPU who did that >> Yeah, well the wizard wand Do you want me to still talk about the wizard wand here? Okay, well the winner is going to get a commissioned item called a wizard wand created by the ACME Wizard Wand Supply company whose slogan is when it’s just you and the Prince of Darkness, only an ACME will do >> Actually can I give it to somebody else? >> Yeah >> I’ll be very dangerous with it >> You can present the wand >> Oh, that’s beautiful The wand is fashioned of blue and white celluloid with sterling silver accents and a crystal sphere end cap >> I wouldn’t worry >> You can’t hurt anyone with it >> Yeah, I was worried with the anger (audience applauding) >> Very good Navin >> Thank you very much >> And Navin if anybody ever doubts you on one of your funding decisions at Mayfield, you pull out your wizard wand and you say that you see the future >> Absolutely, I have an LP meeting coming up next week I’m going to go and hold this up >> And if they still disagree you’ll poke them with it >> No, that I won’t do >> Well done >> What about the gold star? >> Oh gold star, well Mike and I had already decided we were going to award the gold star to number nine anyway The revival of Silicon in Silicon Valley and so I think it’s almost redundant at this point but I’ll do it anyway It pales in comparison to the fine wizard wand It’s a good slide (audience applauding) >> Thank you >> Lauren I will tell you, I voted for your fertility one but Rich is a Silicon Valley guy through and through >> I think your Bob Noyce thing really helped I think we brought the tradition back, the first speaker probably I’m coming here the last time >> You’re coming full circle with Bob Noyce >> Well that’s great Well that is our evening ladies and gentlemen We got it done >> Well done, well done >> Congratulations (audience applauding)

Uncategorized

Integration Best Practices – Import Sets

welcome to video 2 of our integration best practices training. Again, I'm David Gatling, the senior certification engineer for the certified integration team, and with me I have John Anderson, a solution architect for the pre-sales team. In this video we're going to be covering several of the best practices that relate to a data import into ServiceNow. We're going to provide you an introduction to import sets, we'll briefly describe the differences between web service imports and data source imports, and really show you the details of a data source import. Make sure to catch video 3 of this series, in which we'll go into greater detail on a web service integration. Additionally, we'll be creating a data source and showing you many of the considerations, best practices and benefits around managing a data source integration, and to cap this off we're going to be showing you a scheduled data import, just at a high level, to get you introduced to the concept.

An import set table provides you with a temporary staging area to bring data into ServiceNow, which then provides multiple layers of added functionality, from mapping to transforming that data into your destination table. These really give the customer far greater control over data coming into their instance and provide a much more elegant solution for bringing data into, and managing that data within, ServiceNow. Like every other table within ServiceNow, an import set table also has web service APIs extended to it; additionally, we can use a data source to bring data into this import set table. In this video we'll be outlining exactly how to define and configure that data source, as well as the various pieces that go along with it, like a MID server. John will be walking you through the various steps around creating this data source and configuring it for your import set table and your target table.

In this scenario we're going to assume that we have an application within ServiceNow that handles our research items on a particular topic, so we can go in here and we would have a list of research items based off of website URLs, text from that URL, as well as any discussion comments that our team has on that research item. Research items can go through different states; they have a life cycle of new, in progress or done, and they can be assigned to different users for completion or other assignments. In this scenario we're going to integrate with a third-party research company. That research company will do some research on our behalf and push that research to us. They also have a staff doing the research, and they want to move their user records into our user table as well, and we want to be able to tell the difference between our system users and the users they've pushed over to us from their system. So in this integration example we're going to create a data source that makes a JDBC call to their database, which is a MySQL database with a user table. We're going to pull those users into our user table and associate them with a correlation ID and display value so that we can tell that they're from this third-party source. That way, when we get research items that may be handled by that third party, we'll see who is assigned to that specific research item in that company.

To do this, the first thing we're going to do is modify the user table. Let's first make sure that our update set is selected, which it is, because we want to be able to capture these changes. Then we'll go to our user table and add a couple of fields to show correlation, so we'll search for the user table and choose the sys_user table. A common practice, or a best practice if you will, is that if you are integrating data from one system to another, some tables, especially tables that are built off of the task table, will have two fields: a correlation ID field, which stores the third party's unique identifier for that record, and a correlation display field, which is a free-form text field that identifies the source from where that record came. The user table in this case does not have those two fields, so we're going to go ahead and add them.
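As a quick illustration of why this convention pays off, here is a minimal server-side sketch of looking up the local copy of a third-party record by its correlation fields. It assumes the two new sys_user columns end up named correlation_id and correlation_display (your instance may create custom columns with a u_ prefix, so adjust the names), and the ID value shown is just a placeholder.

// Minimal sketch: find the local copy of a third-party record by its correlation fields.
// Column names and the ID value are assumptions; adjust to what your instance generated.
var user = new GlideRecord('sys_user');
user.addQuery('correlation_display', 'Acme');   // which external system the record came from
user.addQuery('correlation_id', 'a1b2c3d4');    // that system's unique identifier (placeholder)
user.query();
if (user.next()) {
    gs.info('Found the Acme copy of this user: ' + user.getValue('user_name'));
}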

We'll add correlation ID, which will just be a string of 40 characters, then we'll add correlation display, which will also be a string of 40 characters. We'll update that and just verify that those were created, and there they are. Okay, so now we've added those fields.

Now we want to create a data source that will allow us to connect to the third-party database, so we'll go into our data sources list. In order for this to automatically show up in our integration menu, let's always start the data source name with Acme Research, and I'll say Users here, and let's give it an import set table name; we'll just call it acme research users. In this case we're going to be pulling from a database, so we're going to be leveraging the JDBC protocol. Now, while it is technically possible to make database calls directly from the instance to your database endpoint, it is a best practice to leverage a MID server, both for security and because it offloads the work to the MID server; if you have a long query, it's not tying up the instance at all, we're letting the MID server handle it. So I'm going to select a MID server that we have up. The third-party database format is MySQL, the database name is test in this case, the port is 3306, the user information is there, and I'll put in the server information. Now, we can either bring in all rows from that table or we could do a specific SQL statement; in this case I'm just going to bring in all the users from the table, which I believe is called research_user. A best practice is to use the last run datetime if at all possible. This is a delta query: it lets us keep track of the latest modification timestamp we got on a database query, and we save that, so in subsequent queries we automatically add a where clause that filters to records modified after that timestamp. This particular table does not have a timestamp field, so we can't leverage that, but if it did, it would be a best practice to do so. So we're going to go ahead and click Submit.

Now that we've set up the connection, let's test it really quickly, so we'll jump over here and do a test load of 20 records. Now we see that the MID server has completed that query and it brought in 8 records. We can look at the raw data it brought in. As we can see here, there's a schema for this database table: it looks like user names are in a user token field, first name is a field called first, last name is a field called last, there's a unique identifier ID field for that database that we can grab and leverage, and there's some encrypted password information. We don't need the passwords in this case, so we're not going to worry about them. It also looks like the phone numbers are stored in the database with some sort of encoding: "bus" is for the business or office phone and "cell" is for a mobile phone, so we'll want to handle that as we bring in that data. So that's our loaded data.

The next step for us is to create a transform map. Now that we know the third-party database schema, we can create a map that maps that schema over to our ServiceNow system user table. We'll name this transform map research user transform, and we want this data to go into our user table, the sys_user table. There's a tool to help us with this mapping exercise: on the left will be the schema from the database and on the right is the schema from our system user table. So we'll scroll down and say we want the first name and we want the last name; we don't worry about the password, we're not interested in it, and we don't want phone at this point because we need to do some transformation around it, so we're going to skip phone as well.

We do want the user token and we want the UUID, and I believe that's it for now. Let's go ahead and match these up to user fields: the first name field and the last name field, the user ID field for user token, and let's set the UUID to the correlation ID field that we created on the user table, and we'll go ahead and save that. This now gives us the mapping between the third-party schema and the ServiceNow schema.

Note that I also want to set the correlation display field to indicate that this data is coming from the Acme research system. Though I don't have a corresponding source field that I'm mapping from, I can still create a field map for it. On the target field we're going to select correlation display, and instead of mapping it to a field on the database, since there is no field that matches, we're going to use a source script. Source scripts are scripts that run during the mapping process of a single field, and they really should only be used if you're transforming something quick and easy on that one field. In this case, all we want to do is hard-code a string to that field to indicate that this record is part of the Acme system. Whatever the script stores in a special variable called answer is what will end up being stored in the target field, so since we're just hard-coding a value, we're going to say answer equals 'Acme'. This will set the value of Acme in the correlation display field.

Now we need to handle the telephone number. The telephone number can either go to the mobile phone or to the office phone, and there's not really a good way to do that inside of a field map, because the target field might be either the business phone or the mobile phone. So we'll create a new transform script to handle this, and we want it to run just before the third-party data is saved to the target table. There are two special variables available in transform scripts: source and target. Our first step is to split up the phone value on the colon character, so we take the phone field coming from the source and split it on the colon. Then we say: if the first part is "bus" for business, we set the actual phone number on our business phone field; in ServiceNow the business phone is just phone, so we say target phone equals the second part, the other side of that colon. Otherwise, we take the target value for mobile phone and set it to the phone number. We'll go ahead and save that.
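Putting those two pieces together, here is a rough sketch of what the scripts just described might look like. It assumes the import set columns were generated with names like u_phone (import set columns are typically u_-prefixed) and that phone values arrive as "bus:555-0100" or "cell:555-0100"; treat it as an illustration under those assumptions, not the exact scripts from the video.

// 1) Field map source script for correlation display: whatever ends up in 'answer'
//    is written to the target field.
answer = 'Acme';

// 2) Transform script (Run when: onBefore), executed just before each staging row is
//    written to sys_user. 'source' is the import set row, 'target' is the user record.
var parts = (source.u_phone + '').split(':');   // e.g. "bus:555-0100" or "cell:555-0100"
if (parts.length == 2) {
    if (parts[0] == 'bus') {
        target.phone = parts[1];                // business/office phone field on sys_user
    } else {
        target.mobile_phone = parts[1];         // anything else is treated as the mobile number
    }
}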
All right, now we're transforming the data: we're setting the correlation display field to a hard-coded Acme string, and we're also bringing in the UUID and setting it on the correlation ID. The next thing we need to do is set a coalesce. A coalesce field is essentially a type of primary key: we're saying that if data coming in on this field, or on a certain set of fields, is the same as data that already exists on the table in those fields, we're going to update that record rather than create a new record. In this case, what we want to say is that if the correlation ID already exists in the user table, we want to update that record with the incoming information rather than creating a new user that might have a duplicate correlation ID. If it holds true that the UUIDs are unique, then the correlation ID should always, in this case, be unique in our user table, so we're going to select it as our coalesce field.
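To make the update-versus-insert behavior concrete, coalescing on correlation_id behaves roughly like the lookup below. This is a conceptual sketch only: the import set engine performs the equivalent check for you, and the column names are the assumed ones from earlier.

// Conceptual sketch only: what coalescing on correlation_id means for each incoming row.
// You do not write this yourself -- the transform engine does the equivalent lookup.
var existing = new GlideRecord('sys_user');
existing.addQuery('correlation_id', source.u_uuid + '');   // the coalesce field
existing.query();
if (existing.next()) {
    // a match exists: the transform updates this sys_user record
} else {
    // no match: the transform inserts a brand new sys_user record
}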

Now, one best practice you need to be aware of: if you're going to be dealing with large or growing target tables, it is always wise to create an index on the target table for the coalesce field. In this case, if there's a good chance that our user table is going to hold millions of records and we're going to be bringing in a large number of records with this import, we would want to have ServiceNow customer support set up an index on this correlation ID field on the user table, so that when the system tries to look for a particular value it's looking against an index rather than making a full query on the table.

All right, so now that we have this, we're going to save it, and let's jump back to our data source and perform the actual load. We'll click on Load All Records. Okay, that was successful, so now let's run the transform, and the transform is complete. Now if we jump to our user table, we can set up a filter on correlation display starts with Acme, and these are the eight user records that were brought over from that database system. Let's check to see if our phone numbers came in: okay, we got our first name and our last name, and the mobile phone field was filled out for Angela Mia, so it looks like our transform script is working nicely as well.

Notice that this was a polling mechanism when we used a data source. David Gatling also mentioned that we have web services attached to our import set tables as well. Whenever you're doing bulk data transfer, it's best to use data sources and pull that data in; they're more tailored to a bulk import and they offer better performance on the system. The web services are a great alternative if you want to push data into ServiceNow in a transactional fashion, meaning one record at a time. So your best-practice rule of thumb is: if you're dealing with bulk data, go through data sources to pull that data; if you're dealing with transactional data, try to set it up so the third party pushes that data in through the web service API for your import set table.
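For the transactional push case, the third party could post one record at a time into the staging table over REST. The sketch below assumes the import set table was generated as u_acme_research_users and that the instance's REST Import Set API (POST /api/now/import/{stagingTable}) is available; the host name, credentials and column names are all placeholders to adjust for your instance.

// Sketch of a transactional push from the third party's side (Node 18+ for global fetch).
// Table name, credentials and column names are assumptions -- check what your instance generated.
const instance = 'https://yourinstance.service-now.com';
const auth = Buffer.from('integration.user:password').toString('base64');

async function pushResearchUser(user) {
    const response = await fetch(instance + '/api/now/import/u_acme_research_users', {
        method: 'POST',
        headers: {
            'Content-Type': 'application/json',
            'Accept': 'application/json',
            'Authorization': 'Basic ' + auth
        },
        body: JSON.stringify({
            u_first: user.first,            // staging columns mirror the source schema
            u_last: user.last,
            u_user_token: user.token,
            u_uuid: user.uuid,
            u_phone: user.phone             // e.g. "cell:555-0100"
        })
    });
    return response.json();                 // per-row import/transform result
}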
Now, the next thing we're going to do is set up a scheduled import so that we don't have to do this manually all the time. To do that, we'll go to Scheduled Imports and create a new schedule; we'll call this Acme Research Users. If you were to do this on a daily or weekly basis, and this has a good possibility of being a large import, you would want to make sure you set it for a time that's going to have the least possible impact on your instance. We'll choose our Acme Research Users data source and go ahead and submit that.

Now, if you recall, in our previous training video we created a menu system for our integration; we did not, however, include a scheduled data import module, and since we're leveraging that, we really should add it to our integration menu. So I'm going to right-click and edit the application, and we'll add a new module. We'll call it Scheduled Imports and say this is a list of records; we're going to take it off of the scheduled data import table, and we only want to show the ones where the name starts with Acme Research, and we'll go ahead and submit that. We'll also want to put an order on it; we'd probably want it to show up just after data sources, which looks like it's at 2030, so we could set scheduled imports to 2031. That will refresh the navigation pane, so if we come here and type in Acme, we can see that just under data sources are scheduled imports, and if we click that, there's the scheduled import we just created, which tells the system when to import those users.

There are a couple more things I'd like to highlight very quickly about transform maps and import sets along the lines of best practices. You might notice that a data source can have more than one transform map associated with it. If at all possible, we recommend that you keep this trim and keep it down to one transform map.

Although we do allow you to have more than one, and on some rare occasions it does make sense to do more than one transform map, just be aware that if you have multiple transform maps, say three transform maps associated with one data source, every record that comes in through that data source is going to generate three records in the staging table, one for each transform map. So if you're ever tempted to have more than one map, you need to ask yourself whether that's really necessary, or whether you can do it in a way that keeps you to just one transform map.

Another thing I wanted to mention relates to the transform scripts themselves within the map. The particular transform script in our example is pretty simple: it does a string split and sets values to fields based on the split results, so the processing time and the hit to system resources are very low. However, if we were to do lots of database calls using GlideRecord, or web service calls to a third party, during that transform, that script is likely going to execute with every record coming into the system, and that's going to slow down your import and consume resources. If you are bringing in thousands of records through a database pull, your transform script would execute thousands of times, and if your script is inefficient or interacts with other systems in a slow manner, that's going to slow down your import and consume valuable resources within the instance. So the end statement here, and the best practice, is to keep these scripts as concise and efficient as possible so that they can handle large amounts of data coming through them.

All right, thanks John, that's fantastic; I can see that the force is strong with you still. The best practices shown today really focused on how you manage import sets coming into ServiceNow: specifically, that we prefer import sets over direct web services, proper coalescing, having JDBC imports utilize a MID server, keeping your transform scripts simple and concise, and avoiding multiple transform maps per import set. Additionally, indexing on a coalesce field for potentially large target tables reduces any potential for performance impacts. Thank you for watching video 2, the import set best practices; be sure to check out the next video in the series, which is the inbound web services best practices, or video number 3 in the series.

Uncategorized

Bear-proofing My Log Cabin

Hello friends! Three years ago I installed a bear-proof door on my log cabin that I’m building in the remote location of Karelia the door has a time-tested design with a couple of interesting design ideas that are worth mentioning here a few words for my first-time viewers I’ve been developing my log cabin camp in the northern forest away from people and roads since 2014 during our short summer we have White Nights so I could accomplish a lot even in the span of one month summer vacation here are a few examples of what I was able to build here in five years I made this primitive fire-carved log furniture using trees downed by a severe storm I built dam and a small pond with a step-down ladder at the local stream built a hybrid tent / bed from recycled materials such as this PET rope cut from plastic bottles I also made a storage dome for slabs and boards using sticks and stretch film in addition I cleaned the campsite from fallen trees which I recycled for my log cabin construction I decided to build my cabin without chopping down a single live tree using a round notch reinforced with a square peg also known as the saddle cope many of the downed trees were huge which is why my cabin was atypically built from extra-thick logs the roof was built from halved logs with a polymer underlayment installed under the sod to protect the roof members from moisture and UV light some of my other bushcraft projects didn’t get into this footage but I will leave the links below let’s get back to the door story while installing the bottom logs I drilled a mortise for the door’s bottom wooden hinge and the upper mortise was drilled a year later so now we need a door with two rounded build-in tenons that will act as primitive hinges to complete the set two years ago I cut narrow boards using a primitive sawmill jig made from two jointed boards and a couple of screws however these boards are too narrow for my door design I wanted to make it bear-proof using only two extra wide boards that milled and dried last summer I even bought a more powerful chainsaw with a longer guide bar to be able to milll extra wide slabs before milling slabs I resharpened an original crosscut chain to 10 degrees using my improvised vise made from a stump besides acquiring a better chainsaw for the task I made a sawmill similar to the Logosol or Alaskan Mill however I realized the jig will not work well for the boards required width as my Stihl MS260 couldn’t handle even smaller than needed logs that’s okay though as I know how to cut slabs freehand it is time to go to the place where I found a huge dried pine my traditional packframe made from pine planks a bird cherry branch and a rope cut from the bottle will help me to bring my tools food and video equipment to the site in one trip I highly recommend you to make or buy a packframe if you do similar activities in the woods I found this dried pine last summer not far from my camp the bottom section has started to rot but the rest of the trunk is still intact and dry it is quite dense to it is not difficult to cut a branchless tree trunk all you have to do is to cut out a deep wedge on one side and make a partial straight cut a little higher on the opposite side of the trunk then you simply hammer in a wedge into the straight cut this is one of the easiest and safest ways to cut down a tree however if the tree is asymmetrical in shape or has a lot of its branches on one side you will might need to use a different method I dropped this dried pine on a few skinny logs that I laid out 
This way it will be easier to mill the door slabs later. I have two doorways in my cabin, so I need to cut four wide slabs for two doors. I cut the slabs longer than needed, as you can always cut them shorter. Naturally dried northern pine is called "kelo" in Finnish, and it is considered valuable. In natural circumstances very old pine trees usually die while standing and then slowly dry for years. Kelo logs and slabs are fairly dense and don't crack, as they lose wood stresses during the natural drying process over the years while standing vertically. The core of the kelo is light, while the sapwood is colored gray by fungus that lived in it during the drying process. The fungus has been dead for years, and you can safely use such a slab for construction. I actually kind of like its golden-grayish glow when you apply oil to it: you can tell that a kelo board is very old and has seen a lot during its lifespan. All of the kelo's sap has fully crystallized by now, making the wood extra dense, which means you have to resharpen your carpentry tools more frequently.

I have shown how to effectively cut a log into boards without any sawmill attachments in one of my previous videos; I will leave a link to it below in case you want to see it in more detail. For now I will just say this: it is easier and faster to cut large slabs with the bottom tip of your chainsaw using swinging motions than with mini sawmill attachments. This freehand slab-cutting technique is three times faster and takes half the amount of gas; those are the numbers I was able to measure. Okay, the last slab is milled and I can now check their quality and geometry. All four slabs passed my quality control, and now it is time to take them back to the camp, closer to my log cabin. I put a glove under the slab to cushion my shoulder and started to make my way back. While carrying the slab I couldn't help being glad that the kelo wood is so dry and comparatively light. I was also happy that this log was located right on the trail that I cleared a few years ago, so I didn't have to struggle through shrubs or typical Karelian rocky terrain. Those soothing thoughts were giving me needed strength. And again, a few good words about my trusty packframe: after feeling pretty tired from carrying the first slab, I decided to carry the rest of them using the packframe's upper arch as a back support. This unusual carrying technique is a lot more ergonomic, spreading the slab's weight evenly on my arms and shoulders.

There was another log section from the same tree that I wanted to mill into slabs, but I had to leave it to rot there because my Stihl MS 260 failed me the next day. Look at what happened to its cylinder-piston group. The Stihl service center told me I was using a bad fuel mixture, but they couldn't explain why the same fuel mixture, poured from the same tank, didn't do any damage to my 20-year-old Stihl MS 180. I wonder if the real reason is that the new MS 260 was made in Brazil, while my trusty MS 180 was still made in the USA two decades ago. There is no time to get upset though; I need to make a temporary woodworking bench under the open sky. I just milled a horizontal surface on a fallen tree and stretched a tent above it to get an all-weather temporary woodworking shop with a steady workbench.

While making my door with round tenon hinges, I decided to cut out the round tenons first; this way the large board is cut to its size and it is easier to shift it around on my workbench. I took my time with the layout job because there was only one chance to do it right: I don't have spare dried boards of that size lying around. This is when the carpenter's mantra "measure twice, cut once" really came in handy. I made a couple of precise cuts and carefully shaped the round tenon. This slab got noticeably lighter.
There will be two tenons on the door, so I shaped the second tenon in the same way. The second board just needed to be cut to size to tightly fit the doorway that was installed into the cabin two years ago. This old, inexpensive hand plane is still doing a decent job of smoothing the slab. Ideally a jointer plane would be better, but it was too heavy to bring here, so I will have to work with what I have. It is a pleasure to make shavings with a hand plane; arguably it is the most classic woodworking activity of them all. Now we can shape the square tenons using a handsaw, a chisel, and a hammer. Note that the round tenons are 52 millimeters (about two inches) in diameter, while the slab is 80 millimeters (about three and a quarter inches) thick. Each tenon is cut asymmetrically to be flush with the inner side of the door; such asymmetry is needed to maximize the door's range of motion and to prevent drafts. Okay, both door tenons/hinges are cut, and we can join the two boards together. My new two-and-a-half-inch-wide (6 cm) chisel would have come in handy here, but I still managed to do a decent job with a small chisel. Back then I cut the angled sides of the sliding dovetail joint using a regular hacksaw; I know it is not an ideal saw for such cuts, but again, I used what I had at my disposal.

It is raining, but everything is dry in my woodworking shop thanks to the canopy stretched overhead, and I can start to fit my sliding dovetail joint. I rubbed a piece of charcoal on the plank, and it shows where the plank is getting tight inside the sliding dovetail. All you have to do is use a sharp chisel to slightly remove the marked area and methodically repeat the procedure until you achieve the desired fit. Chiseling is simple but tedious work that I kind of welcome at times like this, when it is raining outside and I can't really do any other projects at the camp or in the woods. Of course it is not necessary to have such a perfect joint fit in a log cabin's door, but since I'm enjoying the process and I have time, why not? When I finished fitting the horizontal door rails, I rounded over the door's back edge. The rounded edge will allow the door to open all the way. Note that it is important not to remove too much of the door's material, as that could create a gap between the door and the doorway in the closed position.

Okay, the rain finally stopped, and I can take the hinged board to the log cabin and try it on. I would say the slab lost almost half of its weight, by feel, through the whole process, or perhaps I was just too excited to start the door fitting. I tried the board in the doorway many times, each time making small adjustments to the wooden hinges, but it was an enjoyable activity. I've never seen doors with built-in wooden hinges before, which is why even a partial success would have made me happy. As I continued the door fitting process, I grew to like its primitive and reliable design even more. When you come to knock on such a door, you can instantly tell that the people who live in the house are hardly superficial. In order to do the full installation of the door with built-in wooden hinges, I had to lift a large amount of weight: the doorway's head along with the whole roof, which is made of logs and halved logs, plus the weight of the wet sod. As you probably guessed, I used two wedges to accomplish this task. The wedges have to be strong, so I made them from the oiliest, most resin-rich branches I could find. I will be honest, I enjoyed that fidgeting process and even thought to myself that perhaps the second door should be a double door, like in a saloon, but maybe I watched too many Westerns back in the day. I owe you an explanation as to why I'm making two doors in my log cabin and why they open outwards.
Some will say they need to open inwards, especially in this area where snow can block the door from the outside. Why do you think I made them open outwards nevertheless? Talk to me in the comments below; I'd like to hear your version. I have probably tired you with my comments by now, so I will let you watch the rest of the fitting process without them.

Okay, now we can assemble the two slabs and two rails into the door. To avoid gaps, I decided to plane matching grooves and put a spline in between the boards. I had never used a plough plane before, but I like cutting long grooves with it a lot. A plough plane is a very simple tool: it's literally a chisel-like cutter affixed in a jig at an angle, plus an adjustable side rail. If you leave the side rail in the same position and make two long grooves in both boards, they will inevitably match like a mirror image. Okay, now I will make short perpendicular splines to install into the corresponding grooves. At that stage I had some difficulties, because my makeshift workbench was not specialized enough for the task, and I was using a ratchet strap to hold down a short plank. This is when I decided to make a shaving horse the next season; here is a little preview of how it was made the next summer. If there is enough interest from my audience, I will make a separate video about the shaving horse, but let's get back to the door project.

I finally have all of the necessary door members in place, and it is time to put it all together. If I had made any mistakes in my calculations, I would have had to make another door, as the doorway is already installed. But the careful measuring and layout paid off, and this door turned out to be nearly perfect in its dimensions. The assembly process is fairly straightforward, and I will let you watch it while enjoying the sounds of pure bushcrafting. I will only add that I reinforced the rails with dowels and made sure there is an adequate gap between the door and the doorway to compensate for seasonal wood expansion and contraction. This is Max Egorov from Saint Petersburg, Russia. If you liked this video, perhaps you could share it with your friends; let good people watch good videos.

P.S. I only produce one or two videos a month at most, so if you don't want to miss new content like this, subscribe and click the notification bell to stay up to date with all of the latest videos. Due to YouTube's new recommendation algorithm, its notifications have become more erratic and unstable. Otherwise, I hope to see you back on Advoko MAKES.

Uncategorized

SB 20150413 Predictive Analytics for Sales

...people to sell better, to sell more effectively. We sell an enterprise solution into the 1.4-trillion-dollar U.S. market that companies spend on budgets and sales teams, and in that context we apply a lot of data science and machine learning. So, as a recruiting pitch: we are hiring, looking for people with expertise in big data, large clusters, and applying Scala to solving interesting problems, translating algorithms and prototypes from, say, MATLAB or R into the big-data context of enterprise software. We're looking for back-end engineers, front-end engineers, UI, anything and everything that is involved in building a great business application. So if you're interested, I have flyers out in front; please grab one as you exit and consider Clari as the next place to develop your career. If anyone else has announcements, this would be a great time to grab the floor and describe what your company does and why it would be a great place for future employees to join.

Okay, we have one more recruiting announcement. I'm here representing a company that is developing an advanced SEM marketing platform. We were actually early to deploy code in Scala; all of our back end has been built on it for almost three years. Brian Watson, another one of our architects and lead developers, is here as well; you can talk to either of us about the opportunities. We're located over in Redwood Shores, and the company has been expanding and growing quite a bit. Thank you.

Thanks. Any other announcements before we begin? We have Dan in the back. I'm not recruiting, I'm actually teaching a class on Sunday. Hello, I'm Dan, and I'm teaching a class this Sunday at Hacker Dojo; I invite everyone. This Sunday we will be wrestling with Scala. Before, we had been teaching Python, so this will be the first Sunday where we're teaching Scala, and we have an interesting problem that we're wrestling with: we will be predicting the stock market. So I invite you. Thanks.

Anyone else? Going once, going twice, that's it. So the next part of the agenda is the speaker for today. I am honored and privileged to introduce the speaker for today, Dr. Lin Tang, who is heading data science at Clari. Dr. Tang previously was at Walmart Labs, processing billions and billions of purchases through walmart.com, where he made significant improvements in product click-through rates on email campaigns and demand generation. Now he is at Clari, transforming the way selling happens in enterprises, again by using data science and machine learning. Dr. Tang has brought in a new way of looking at the whole process of selling. He is going to touch on a few elements of machine learning, data science, and how Scala is used to implement these algorithms. So without much further ado, please welcome Dr. Tang.

Okay, thanks for the introduction.

Today's talk is about predictive analytics for sales. Clari is actually doing a very challenging job: you know what salespeople do, and we're selling a product to salespeople, which is one of the most challenging tasks. I'm leading the data science efforts there.

Right now in Silicon Valley, data is everywhere, across different domains, not just sales: marketing, sports, even agriculture are using analytics. Almost everybody has heard about big data and data science, and most people picture it like this: once you have data, you can figure out a way to convert that data into some value, so if you have data, you have a gold mine. But in reality, this is what most companies look like when dealing with their data: it is dumped somewhere in a hidden corner, nobody knows exactly what is going on in there, there is probably a lot of data corruption and some workflow issues, but it just sits there. It's a dump; it's not yet value. That's where analytics comes into play: we are trying to mine this data, to mine the gold out of the data, and that's why analytics gets called the next new era.

When we talk about analytics, there are actually different types, so here is a very brief overview. If you look at all the companies claiming to do analytics, most likely they are doing descriptive analytics: scanning all the historical data to see what is going on. It's more like hindsight, telling you what happened, and based on that you get an aggregated view or some nice visualization with a dashboard showing all the events that have happened. The next step is predictive analytics: based on what has happened, you try to figure out what will happen in the future, which essentially gives some insight to customers. At the same time you also want to provide diagnostic information, because typically when you give a prediction, most people ask why you predicted it this way, why this would happen; so when you provide predictive analytics, I think you should provide some diagnostic metrics as well. Beyond this insight information, the next step is to provide foresight: say your customer does not like your prediction and wants to change the outcome; then you are trying to provide prescriptions, telling the customer which actions to take if they want to change the outcome. That is essentially prescriptive analytics, analytics that suggests actions. As you can see, these four types of analytics are all about converting data into value, and that is the span of what Clari tries to provide, from descriptive analytics to prescriptive analytics.

At Clari we are working on one particular domain: sales. In case you are not familiar with the sales process, here is a very brief introduction to the typical sales process and the pains salespeople have. A sales funnel typically works like this: through some marketing effort you get leads or prospects who are potentially interested in your product or service; from there you try to identify which ones are likely to buy, and you convert them into opportunities. Once an opportunity is created, the salespeople jump in, contacting the account, negotiating the price, setting up trials; they are basically trying to convert this opportunity into a customer. For the first step, there are many companies doing lead scoring, taking all these prospects and trying to rank which leads are the most likely to convert into customers.

But what we focus on is sales management: once an opportunity is created, what kind of help can we provide to salespeople to convert that opportunity into a customer? When you look at the opportunity lifecycle, especially in the B2B world, the cycle is very long; there is a saying that on average it takes nine months to convert one opportunity into a customer. This is very different from what I had been doing at Walmart Labs or at Yahoo. In the consumer market you just have one click: you see something you like on Amazon and place an order. In the B2B world there is a lot of contact, building up the relationship, identifying what your customer really needs, negotiating the requirements, and finalizing the price. Throughout all of this, one critical factor is time: the longer it takes, the less chance you will close the deal. There can be different reasons; maybe there was budget at the beginning but later it was eliminated or put on hold, or you identified a champion who had been saying a lot of good things about your product but then moved to another company, and you basically have to start the whole process over. So what we are trying to do is provide insights that tell salespeople which opportunities they should pay more attention to. A sales rep typically manages hundreds of opportunities; you don't want to keep track of each one by yourself, you want a way to prioritize all these opportunities and see which ones need attention.

What Clari provides has essentially two components. One is a mobile app for sales reps, so they can easily update CRM records, check the status of contacts, pull in connected LinkedIn information, and see related news about accounts. Because it is so easy to use, it engages the sales reps, and based on that we collect all of this related activity data on our servers. Then, based on the data we process, we can provide data science insights to managers. For sales managers we provide a deal progression view, essentially a view of all the opportunities owned by the manager and by his team, so you can see what is going on and which deals have been pushed to the next quarter. On top of that we also provide an analytics dashboard, visualizations of what is going on with all these deals, which is essentially descriptive analytics. The next thing we provide is the Clari Score, which tries to assess how likely a deal is to be closed as won. This is information salespeople are looking for, and from this risk analysis of the opportunities you can aggregate up to do the forecast.
Because the Clari Score plays a very important role across the product, tonight I'm going to focus on our efforts to compute it. Here is a picture of what the Clari Score looks like. When you have a deal, say one with five days left to close, next to it you see the Clari Score. The score tries to predict how likely the opportunity is to be closed as won, in other words how likely you are to win this opportunity. The score ranges from 0 to 100: if it is 0, the deal is lost; if it is 100, the deal is won; if it is in between, it is basically a probability estimating how likely the deal is to be closed as won.
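As a rough illustration of that scale (my own sketch, not Clari's code), the mapping from a model's win probability to the displayed 0-100 score could look like this in Scala:

```scala
// Hedged sketch (my illustration, not Clari's code): mapping a model's win
// probability to the 0-100 score scale, with closed deals pinned to the endpoints.
object ScoreScale {
  sealed trait Outcome
  case object Won  extends Outcome
  case object Lost extends Outcome
  case object Open extends Outcome

  def score(outcome: Outcome, winProbability: Double): Int = outcome match {
    case Won  => 100                                    // closed-won shows 100
    case Lost => 0                                      // closed-lost shows 0
    case Open => math.round(winProbability * 100).toInt // open deals show the estimated probability
  }
}
```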

In the Clari product we also provide details that give some diagnostic information about the score. For example, in this case we show that the opportunity scores the way it does because the close date has been updated and the forecast has changed from best-case to commit; all of these factors are signals that the deal is moving in the right direction, making the right progress. This tells sales managers why a deal is or is not at risk, so they can make a decision and take action on it.

To come up with the Clari Score, we formulate it as a machine learning problem. We collect historical activities: mobile activities, email activity (whether the sales rep has sent emails to the contact), and calendar meetings (if there are a lot of meetings, that's probably a good sign). Another important source of information is the CRM records, which right now come mostly from Salesforce. From there we get all kinds of events about these sales reps associated with opportunities, and we convert them into labeled samples: we identify all the events associated with one opportunity, and because we scan the historical records we know which deals in the past few years were closed as won and which were closed as lost, so we can construct the label (won or lost), feed the samples into a machine learning engine, and train a model. In the prediction phase you just apply the model to compute the score.
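A minimal sketch of that labeled-sample construction, with hypothetical types of my own (not Clari's schema), might look like this:

```scala
// Rough sketch of the labeled-sample construction described above.
// Types and field names are hypothetical, not Clari's schema.
case class OppEvent(oppId: String, field: String, value: String, timestamp: Long)

sealed trait Label
case object WonLabel  extends Label
case object LostLabel extends Label

// outcomes: final result of each opportunity, scanned from historical records
def buildSamples(events: Seq[OppEvent],
                 outcomes: Map[String, Label]): Seq[(Label, Seq[OppEvent])] =
  events
    .groupBy(_.oppId)                                   // all events associated with one opportunity
    .toSeq
    .collect { case (oppId, evts) if outcomes.contains(oppId) =>
      (outcomes(oppId), evts.sortBy(_.timestamp))       // label + time-ordered event sequence
    }
```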
When building this machine learning engine there were several challenges. The first requirement is that we basically want a customized model per customer: if we have 1,000 customers, we probably need to build 1,000 different models, at least one per customer, and we want to automate the whole process and make it scalable; we don't want manual configuration to figure out which fields or events should be used. Another requirement is agile model updates and deployment: in data science you keep exploring new ideas and seeing which ones work, and you want to deploy a model as soon as you find that it works, so that process should be automated as well. We looked at a couple of options for setting up this machine learning engine, and what we ended up with is the predictive component as a microservice.

The first thing we had to decide was which language to use for development. Most of our code base is in Java, so that was one consideration. But the issue with Java is that it is really verbose for certain operations, especially collection operations, and it is error-prone; for data scientists, Java is probably the least favorite language to use, and it requires a lot of effort even for simple tasks. Then we considered Python and R, the two most popular languages for data science. From my past experience at Yahoo and Walmart Labs, people have to separate data science from production: you use Python or R for prototyping and see what works, then convert the model into production code. Using these two is convenient for quick experiments because there are so many existing libraries, but the problem is that deploying those models essentially means rewriting everything, and that really requires a lot of effort.

So in the end we decided not to pursue those two. The remaining option we considered was Scala. The first attraction for us is that Scala is compatible with Java, so many of our existing functions and APIs can be called directly; it's an interesting setup because it is not pure Scala, it's a mix of Java and Scala. Scala also has many features that are really convenient to use; I guess most people here who are interested in Scala have already heard a lot about its advantages. Another reason we picked Scala is Spark: Spark has a machine learning library, and it is written in Scala, so that put another vote in for this choice.

We also wanted to separate model improvement and deployment from product releases. What we do is hide the predictive component behind a REST API, so that we can build or change the model at any time without affecting a product release. This way we can keep improving the model continuously and deploy it directly, while the other components of the product just talk to this REST API.

In terms of machine learning libraries, as I mentioned, Spark has MLlib, but I think it is still pretty preliminary: it only covers the basics, like k-means clustering, simple classification, linear regression, and logistic regression, and most of these methods are implemented on top of gradient descent.

At this point there was a comment from the audience: you mention you are trying to machine-learn the salespeople's behavior, but I have a different opinion. If I want to sell my service to a company, I wouldn't really care about the salesman's activity; I would want to sell to the CEO or the executives instead. Of course there are two sides: one is the salesperson's activity, which you want to model; on the other side, if you look at how LinkedIn does it, they have their internal software and their social network, and they know how to direct their field salespeople on whom to call or email. The other point: you mentioned the traditional sales funnel, but with new models like the Alibaba, AWS, and Amazon enterprise marketplaces, when I see an enterprise app I immediately click it, and if I don't like it within three, four, five days I immediately stop; decisions there are very fast.

Okay, thanks for those points. For the first one, if I understand correctly, the suggestion is that because we're selling a product to enterprises, rather than focusing on the sales reps' behavior we should focus more on the executives and managers. That's a valid point: at the end of the day, for an enterprise, the decision makers are the executives, so what we're trying to do is convince them that there's a lot of value in the Clari product.
One way to show that is this: if your team of sales reps all like this product and find it valuable, that is one way to convince the manager, director, or VP to buy your product. Another thing I want to emphasize is that these particular analytics are aimed at providing insights and value to managers and executives, for example the aggregated forecast; that's why, as I mentioned at the beginning, this predictive analytics is mainly focused on sales managers. It is one way of showing the value: we can help salespeople close deals faster and bring in more revenue, and with that we try to convince the executives to buy our product. As for the second question, I think there may indeed be new ways to accelerate the sales process.

That is a new trend we are seeing, but for many companies, especially medium to large enterprises, the sales process, as I mentioned, still takes a long time, and for any salesperson working in that market it is still a pain to manage so many opportunities. So I think both points are valid; it really depends on which market you are focused on. Okay, any other questions?

Let's come back to machine learning libraries. There were two other choices available besides Spark. One is Weka, which has existed for a long while; it is probably the most comprehensive machine learning library written in Java. The issue is that it mostly runs on a single machine; there have been some efforts to port the Weka library to Hadoop MapReduce, but it is still a lot of pain to convert it into a parallel or distributed environment. Another machine learning library, which you have probably heard about if you work with MapReduce, is Mahout. In the beginning it focused solely on machine learning on top of the MapReduce environment, and there was a lot of activity going on, even though certain algorithms were not implemented correctly, so it really depends on which algorithm you are using; the documentation is also a mess, I have to say. In the past I tried to use Mahout but never succeeded; it just takes too much time to understand what is going on there. That's why Spark has come up, and in fact Mahout announced they are shifting their development efforts from Hadoop MapReduce onto the Spark platform, which is also a sign. Considering all of that, we decided to choose Spark.

Spark has several advantages. One is that it has a machine learning library. The second is that, because it is written in Scala, many Scala collection operations can be applied directly in the Spark environment. And the thing I really like is that Spark can easily run in local mode, or on a one-node cluster. For many data science efforts you don't need a huge cluster to run your algorithm and wait an hour or two for the result; most of the time you just take a reasonably sized data set and run some quick experiments to see what is going on. If you write your code in Spark, you can run all these quick experiments in local mode on a laptop, and once you need to deploy to production you just change your Spark configuration to point at the cluster instead of your local laptop. You basically don't need to worry about what the Spark cluster is; you just focus on making sure your code works in the Spark environment. I really like this feature, and that's why in the end we used Spark a lot to develop these predictive analytics features.
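As a small illustration of that local-versus-cluster point (app name, environment variable, and input path are placeholders of mine, not Clari's setup), the code stays the same and only the master URL in the configuration changes:

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Small illustration: the same job runs on a laptop or on a cluster,
// only the master URL in the configuration changes.
object LocalOrCluster {
  def main(args: Array[String]): Unit = {
    val master = sys.env.getOrElse("SPARK_MASTER", "local[*]") // laptop by default
    val conf = new SparkConf()
      .setAppName("opportunity-scoring-experiments")
      .setMaster(master)                                       // e.g. "spark://host:7077" in production
    val sc = new SparkContext(conf)

    val events = sc.textFile("events.json")                    // placeholder input path
    println(s"loaded ${events.count()} raw event lines")
    sc.stop()
  }
}
```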
Okay, let's come back to predictive analytics and look at what we are trying to do. Next I'll walk through a couple of the efforts we have made to address this predictive analytics problem in the sales domain. For the data, the first question is what kind of data we are handling. The data we have is mostly CRM histories, and these records come in the form of events. Each event has a field, basically saying which field has been changed; this field could be the stage, the salesperson's forecast, the close date, the deal size, or the competitor involved. The value is very heterogeneous: it could be a boolean value, a numeric value, a categorical value, or just some free-form text, since sometimes people just take notes. And then each event has an update time and an update owner.
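One possible way to represent such events (an assumption on my part, not Clari's actual schema) is a small Scala data type that keeps the heterogeneous values explicit:

```scala
// Assumed representation of the heterogeneous CRM field-change events just described:
// each event says which field changed, carries a value that may be boolean, numeric,
// categorical, or free text, and records the update time and the update owner.
sealed trait FieldValue
case class BoolValue(v: Boolean) extends FieldValue
case class NumValue(v: Double)   extends FieldValue
case class CatValue(v: String)   extends FieldValue   // e.g. stage name, forecast category
case class TextValue(v: String)  extends FieldValue   // free-form notes

case class CrmEvent(
  opportunityId: String,
  field: String,        // which field was changed (stage, forecast, close date, deal size, ...)
  value: FieldValue,    // heterogeneous value
  updatedAt: Long,      // update time (epoch millis)
  updatedBy: String     // update owner
)
```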

Given that, what we see is that, as in probably any domain you work in, the data is never perfect; you will always find issues with it. The first issue is the sequential events themselves: the event types are highly heterogeneous, as I mentioned, with categorical values, numeric values, and free-form text, so you have to do a lot of transformation and normalization to derive the right type of value from the raw data before you can feed it into a machine learning engine. Another thing we noticed is that events can be highly correlated: in the CRM records customers can define workflow formulas, basically saying that if this field changes to this value and that field changes to that value, then update my forecast to something else. Sometimes these are almost identical updates under different names, which causes trouble if you try something like linear regression on the data, and it makes it harder to surface the right factors to show to the user. Then there is data quality: most of these CRM records require manual entry by human beings, and everybody is reluctant to type in all this information, so it's no wonder that across almost all organizations a significant percentage of opportunities are created and closed within one day; they are just entered retroactively. So if you want to derive useful information from this data, you need to do some data cleaning as well. And one more thing: because people are reluctant to update, if a field is missing, should we consider that the event really didn't happen, or that it's just not recorded? How do we differentiate, and how do we feed that into our models? All of these pose challenges.

When we started to address the problem, the first model we thought about was a Markov chain. In a Markov chain you assume that the current event depends only on the most recent event, so if you have an event sequence e1 through eT, you can factorize its probability as the probability of the first event times the probability of each following event given the previous one. In this way you can compute the probability of any event sequence, and computing the transition probabilities is straightforward: you just scan the data, do the counting, and compute the conditional probabilities, so one scan of the data gives you the model. As one example of these transition probabilities: for any opportunity that is created, there might be a 20 percent chance that it moves to a particular stage, say a qualification stage; once it is in that stage, with 28 percent probability it moves directly to won, basically winning the deal, and with 30 percent it moves to lost. Then, to predict how likely an active deal is to be closed as won, we essentially consider a random walk starting from its current state and look at the probability of reaching the won node versus the lost node.
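Here is a minimal Scala sketch of that first model as I understand it (my own illustration, not Clari's code): transition probabilities estimated by counting consecutive values in historical sequences, and the win probability of an active deal estimated by Monte Carlo random walks to the absorbing "won"/"lost" states. State names, the walk budget, and the step cap are illustrative assumptions.

```scala
import scala.util.Random

object MarkovSketch {
  type State = String

  // Estimate transition probabilities by counting consecutive (from, to) pairs.
  def transitionProbs(sequences: Seq[Seq[State]]): Map[State, Seq[(State, Double)]] =
    sequences
      .flatMap(seq => seq.zip(seq.tail))                 // consecutive (from, to) pairs
      .groupBy(_._1)
      .map { case (from, pairs) =>
        val total = pairs.size.toDouble
        from -> pairs.groupBy(_._2).map { case (to, ps) => (to, ps.size / total) }.toSeq
      }

  // Estimate P(win) by running many random walks until "won" or "lost" is reached.
  def winProbability(trans: Map[State, Seq[(State, Double)]],
                     start: State,
                     walks: Int = 1000,
                     maxSteps: Int = 50,
                     rng: Random = new Random(42)): Double = {
    // sample the next state from the transition distribution of the current one
    def next(candidates: Seq[(State, Double)]): State = {
      val r = rng.nextDouble()
      var acc = 0.0
      var i = 0
      while (i < candidates.length) {
        acc += candidates(i)._2
        if (r <= acc) return candidates(i)._1
        i += 1
      }
      candidates.last._1                                 // guard against rounding error
    }
    val wins = (1 to walks).count { _ =>
      var state = start
      var steps = 0
      while (state != "won" && state != "lost" && steps < maxSteps) {
        state = trans.get(state).map(next).getOrElse("lost") // dead end: treat as lost
        steps += 1
      }
      state == "won"
    }
    wins.toDouble / walks
  }
}
```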
There is another thing we needed to consider: events happening at the same time. We noticed that many reps update four or five different fields at once, so you have to decide whether one event should be treated as happening before the other or whether they should be considered simultaneous. In the end we model the event sequence as a kind of flow: you have the first event at one timestamp, maybe three events happening at the next timestamp, two more events later, and at the end you close the deal as won while some other events happen at the same time. So there is a flow from the beginning that moves forward, and you check whether or not it can reach the distinct "won" node.

To compute the probability, you treat won and lost each as a single absorbing node, and from the current status you run random walks to see how likely you are to reach each node. Say you run the random walk a thousand times, and 800 times you reach the won node and 200 times you reach the lost node; then you know you have roughly an 80 percent chance of winning the deal.

There was a question: is it realistic to assume a Markov model works as a mathematical model for the sales process, given that in a Markov model the current state depends only on the previous state and ignores what happened before? That's a very good point, and it's exactly what I'm going to talk about next. This Markov chain model is very simple because it is memoryless, it only depends on the current status, but for these event sequences history actually matters: if the close date has been pushed out, that tends to be a bad sign, and if there is a reverse change, from a later stage back to an earlier stage, that is also a bad sign. So we have to consider the historical information, not just the current value. What we do is bring in a posterior probability, computed with Bayes' theorem: you have some prior probability of winning or losing, and from the historical data you compute the probability of seeing this event sequence given that the outcome is won. Essentially you separate the opportunities that closed as won from those that closed as lost, build a different sequence model for each, and then apply the formula to get the posterior probability that the deal will close as won. You are essentially computing: given that the deal outcome is won, how likely am I to see such an event sequence? That tells you how likely the deal is to be closed as won.

There are still some problems with this Markov model. One thing we noticed is that the same-field dependency is quite strong; for example, when the stage moves you want to model the dependency within that field, so you can change the Markov chain into a model where each field is its own simple Markov chain, and you end up with K Markov chains. But then you lose the cross-field dependency: if you change your stage, your forecast tends to change as well, and you would not capture that. That's why we moved on to a different model, a Markov process with states. Here is a toy example: s1 through sT are the states and the y's are the observations, where the observations depend only on the state. You can then factorize the event sequence using the transition probabilities between states and the conditional probabilities of the observations given the state. Even though the formula looks a little complicated, the training process essentially just requires computing these transition probabilities and conditional probabilities. If you are familiar with speech recognition or natural language processing, you probably know that the classical example of such a Markov process with states is the hidden Markov model.
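Written out (my reconstruction of the quantities just described, with notation of my own choosing), the posterior and the two factorizations are:

```latex
% Posterior win probability from two sequence models, one trained on won deals, one on lost:
P(\mathrm{won} \mid e_{1:T}) =
  \frac{P(e_{1:T} \mid \mathrm{won})\,P(\mathrm{won})}
       {P(e_{1:T} \mid \mathrm{won})\,P(\mathrm{won}) + P(e_{1:T} \mid \mathrm{lost})\,P(\mathrm{lost})}

% Markov-chain factorization of an event sequence (each event depends only on the previous one):
P(e_{1:T}) = P(e_1) \prod_{t=2}^{T} P(e_t \mid e_{t-1})

% Markov process with states s_t and observations y_t (stage as state, other fields as observations):
P(s_{1:T}, y_{1:T}) = P(s_1) \prod_{t=2}^{T} P(s_t \mid s_{t-1}) \prod_{t=1}^{T} P(y_t \mid s_t)
```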
In a hidden Markov model, based on the observed data you try to infer what the hidden states are, and that approach has been used very successfully on sequence data such as speech recognition and natural language understanding. But in our case we have an advantage, so we switch to what you could call a visible Markov model: we simply assume the states are visible and use the opportunity stage as the state. That way we don't need an expensive EM algorithm to figure out hidden states as you would in a hidden Markov model; we use the opportunity stage as the state, and all other fields are just treated as observations that depend only on that stage.
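A hedged sketch of how such a "visible" model could be trained by nothing more than counting and normalizing (types and names are my own, not Clari's code):

```scala
// The opportunity stage is the observed state; every other field update is an observation
// conditioned on that stage. Training is just counting and normalizing frequencies.
object VisibleMarkovModel {
  type Stage = String
  case class Obs(field: String, value: String)

  case class Model(transition: Map[(Stage, Stage), Double],
                   emission: Map[(Stage, Obs), Double])

  // deals: for each historical opportunity, its time-ordered (stage, observation) pairs
  def train(deals: Seq[Seq[(Stage, Obs)]]): Model = {
    val stageSeqs = deals.map(_.map(_._1))

    val transCounts = stageSeqs.flatMap(s => s.zip(s.tail))
      .groupBy(identity).map { case (pair, xs) => pair -> xs.size.toDouble }
    val fromTotals = transCounts.groupBy { case ((from, _), _) => from }
      .map { case (from, xs) => from -> xs.values.sum }
    val transition = transCounts.map { case ((from, to), c) => (from, to) -> c / fromTotals(from) }

    val emitCounts = deals.flatten
      .groupBy(identity).map { case (pair, xs) => pair -> xs.size.toDouble }
    val stageTotals = emitCounts.groupBy { case ((stage, _), _) => stage }
      .map { case (stage, xs) => stage -> xs.values.sum }
    val emission = emitCounts.map { case ((stage, obs), c) => (stage, obs) -> c / stageTotals(stage) }

    Model(transition, emission)
  }
}
```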

This model is actually very efficient because training is basically just counting and computing the conditional probabilities: since you know the states and the observations, you essentially just count and compute these probabilities. Okay, so far so good.

Beyond this, to figure out the risk associated with one opportunity there are many other factors you can consider as well: the time spent in a particular stage; past wins (if the sales rep has tended to win in the past, this particular opportunity is more likely to be won as well); whether the account is an existing customer (if you are doing repeat business with them, you are very likely to be in a strong position); the number of CRM updates and how recent they are; and some periodic patterns, for example at the end of the quarter people tend to close deals and update their records, and at the end of the year there are also a lot of updates going on. We also want to bring in the email and calendar activities and the contact information. With so many different factors to consider, we cannot hard-code them one by one; we definitely need a more scalable approach, which is why we switched to a method that does this feature generation and learning automatically.

So this is our architecture. Our data is stored in a NoSQL database, and a data loader loads the opportunity and event information. The first thing we do is join in the label information; as you will see, the label information can be customized depending on your task. Combining the labels and the data, you can define all kinds of features in the feature generators, and the output of the feature generators is sent to a feature indexer, which converts all these features into index numbers, because any machine learning library assumes your input is properly formatted. So in practice you take the raw data, feed it through all these transformations, and it becomes your training data; once you have the training data you feed it into the Spark machine learning library, build the model, and save the model into the model store. That's basically the overall architecture. What I'm going to talk about in more detail is not the model itself but the feature generation process, because that is the most time-consuming part of almost any data science effort.

There was a question about how we store the model: we basically combine the feature index information, do some transformation, and save it as a JSON object in the database. For prediction we are not using Spark, which is why, as I'll mention later, we need to do some conversion of the model. For the database we use Mongo and Cassandra. Any other questions?
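Before moving on, here is an illustrative wiring of that architecture (loader, label join, feature generation and indexing, MLlib training, persistence). Every trait, type, and parameter here is an assumption of mine, not Clari's actual interface; the MLlib call uses the 1.x-era API the talk refers to.

```scala
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD
import org.apache.spark.mllib.classification.LogisticRegressionWithSGD
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.regression.LabeledPoint

case class RawEvent(oppId: String, field: String, value: String, ts: Long)

trait DataLoader     { def load(sc: SparkContext): RDD[RawEvent] }                      // from the NoSQL store
trait LabelGenerator { def labels(events: RDD[RawEvent]): RDD[(String, Double)] }       // oppId -> 1.0 won / 0.0 lost
trait FeatureBuilder { def features(events: RDD[RawEvent]): RDD[(String, Seq[(Int, Double)])] } // indexed features
trait ModelStore     { def save(weights: Array[Double]): Unit }                         // e.g. as JSON in Mongo/Cassandra

class TrainingPipeline(loader: DataLoader, labeler: LabelGenerator,
                       builder: FeatureBuilder, store: ModelStore, numFeatures: Int) {
  def run(sc: SparkContext): Unit = {
    val events  = loader.load(sc).cache()
    val labeled = builder.features(events).join(labeler.labels(events)).map {
      case (_, (feats, label)) => LabeledPoint(label, Vectors.sparse(numFeatures, feats))
    }
    val model = LogisticRegressionWithSGD.train(labeled, 100)  // MLlib 1.x API, numIterations = 100
    store.save(model.weights.toArray)                          // persist so the prediction service can load it
  }
}
```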
Okay, so, feature engineering. I know many data scientists like to talk about models, but I want to ask people to pay more attention to feature engineering, because most of the time, if you look at these machine learning libraries, the models are already there; you don't need much effort to rewrite a model or implement an algorithm unless you have some very specific task. Thanks to Spark's machine learning library there are standard machine learning models available, so we don't even worry about that. In practice, the features are the really time-consuming effort: most of the time you are coming up with new, reasonable features and trying to determine whether these features are relevant to the model. That's why we have the feature generator, which is essentially defined as an abstract class.

Once you feed the data into this class, it yields a collection of (instance ID, feature, value) tuples. We convert all the feature values into numeric values so they are easy to process, and whenever you have a new feature you basically just define whatever transformation you need by extending this abstract class. Each feature generator is based on a map over an RDD in Spark, and since RDDs are lazily evaluated you can apply a whole sequence of transformations, which is pretty convenient. Because most of the time your feature generators are defined over the same data set, this can be highly parallelized: you apply all kinds of feature generators to the same data set, using Spark's capabilities and keeping the data in memory, which is really nice. The next step is assembling and selecting features: you group by the instance ID to get all the features belonging to one instance. The features can be of different types, for example the duration at a certain stage, or the number of emails sent out, things like that.

There was a question about what an instance is: an instance is basically one record, one opportunity. Because the generator emits these triples, you can do all kinds of aggregations, like a group-by on the instance ID.

Another question was how we make the system scalable, both on the algorithm and software side and on the hardware side. At the algorithm level, that is what data scientists are trained to do: you have to pick the right algorithm; you don't want to do some exhaustive search, and most data scientists can tell what the time complexity of running an algorithm will be. The next thing is keeping the software scalable, which is why we base the architecture on generic abstract classes for all the different components: the feature indexer can be standardized, and for the label generator and feature generator you just define the different abstract classes and extend them, so you can focus on which features you want to add and which labels you want to predict, and you don't need to worry about the other components once this is set up. For hardware scalability, one thing I like about Spark is that you keep the Spark configuration separate from the whole process, so you can run Spark locally, for example on a laptop for debugging, but if your data is really huge you just change the configuration to point at a Spark cluster. By separating these components, if you have very large data and want to move to the cluster, you can easily do that. Right now it really depends on the organization: a large enterprise might have billions of events, while a medium-sized one probably has just millions of events.
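A hedged sketch of what such a feature-generator abstraction could look like (names and the example feature are my own illustration, not Clari's classes): each generator maps a shared, cached event RDD to (instance ID, feature name, numeric value) triples, several generators run over the same in-memory dataset, and the results are grouped by instance ID to assemble one feature map per opportunity.

```scala
import org.apache.spark.rdd.RDD

case class ActivityEvent(oppId: String, field: String, value: String, ts: Long)

abstract class FeatureGenerator extends Serializable {
  // yields (instance id, feature name, feature value)
  def generate(events: RDD[ActivityEvent]): RDD[(String, String, Double)]
}

// Example generator: total time an opportunity spent with a given field at a given value.
class DurationAtValue(field: String, value: String) extends FeatureGenerator {
  def generate(events: RDD[ActivityEvent]): RDD[(String, String, Double)] =
    events
      .filter(_.field == field)
      .groupBy(_.oppId)
      .flatMap { case (oppId, evts) =>
        val sorted = evts.toSeq.sortBy(_.ts)
        val spent = sorted.zip(sorted.tail).collect {
          case (a, b) if a.value == value => (b.ts - a.ts).toDouble
        }.sum
        if (spent > 0) Some((oppId, s"duration_${field}_$value", spent)) else None
      }
}

object FeaturePipeline {
  // Apply many generators to the same cached dataset and group the features per instance.
  def buildFeatures(events: RDD[ActivityEvent],
                    generators: Seq[FeatureGenerator]): RDD[(String, Map[String, Double])] = {
    val cached = events.cache()                             // keep the shared input in memory
    generators
      .map(_.generate(cached))
      .reduce(_ union _)
      .map { case (id, name, v) => (id, (name, v)) }
      .groupByKey()                                         // all features belonging to one instance
      .mapValues(_.toMap)
  }
}
```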
Actually, Spark is not necessary just for model training; there are many tasks you can parallelize, as I said, such as feature generation and also parameter tuning, where you want to try different models, and all of this can be parallelized. So far we stay fairly flat; we haven't considered a hierarchical approach. Then there was a question, just to make sure the iteration process was clear: the iteration is that when you modify a feature

generator class, you run the whole pipeline again, and a human, the data scientist, looks at the result and decides whether to proceed. Right, and basically we save the model into the database according to some criteria. Another question was whether this iteration is done for all customers or for one particular customer with customized feature classes. To repeat the question: when we add features in this iterative process, do we apply them to one particular customer or to all customers? What we try to do is define a family of feature generators, for example the duration for which a certain field holds a certain value, the duration at a certain stage, or the number of emails sent out, and we apply essentially the same feature generators to all the different organizations. We try to avoid a specific configuration for one particular customer, because that would not be scalable. Inside the feature generators we also include other transformations, like normalization and scaling, because you can define a sequence of operations on the same RDD. There was a related question: different customers have very different sets of fields, so how can we come up with reasonable features in advance? So far we haven't had to do anything special, because the factors we look at are quite generic, for example time spent at a particular status: you can look at all the fields and see how long the opportunity stayed at a particular field value, and that is pretty straightforward, as is the recency of updates. Some features do require quarter information, and different organizations define quarters differently, so you have to specify that. The email and calendar activities are associated with the stage of the opportunity, so they can also be standardized across different organizations. There was also a question about results; later I will talk about how we evaluate the models.

Okay, let's move forward. That was feature generation; the next thing we want to do is feature selection. Once you have generated all these features, you want to do some selection for the prediction phase, because certain features actually take a long time to generate, so you really want to identify the relevant ones. What we do is that when we train the model, we typically use Spark models with L1 regularization.
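A sketch of that L1-based selection with the MLlib 1.x-era API the talk refers to (regularization strength and threshold are illustrative choices of mine, not Clari's settings): train a logistic regression with an L1 penalty, then keep only the features whose weights are non-negligible, so the prediction path only needs to generate those features.

```scala
import org.apache.spark.mllib.classification.LogisticRegressionWithSGD
import org.apache.spark.mllib.optimization.L1Updater
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.rdd.RDD

object L1FeatureSelection {
  def selectFeatures(training: RDD[LabeledPoint],
                     featureNames: Array[String],
                     regParam: Double = 0.1,
                     threshold: Double = 1e-6): Seq[String] = {
    val lr = new LogisticRegressionWithSGD()
    lr.optimizer
      .setUpdater(new L1Updater())   // L1 regularization drives irrelevant weights to zero
      .setRegParam(regParam)
      .setNumIterations(200)
    val model = lr.run(training)

    model.weights.toArray.zipWithIndex.collect {
      case (w, i) if math.abs(w) > threshold => featureNames(i)   // keep only relevant features
    }.toSeq
  }
}
```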

So you basically identify which features are relevant: in the training phase you try all possible features, but in prediction you only generate the features that are necessary, which saves time at prediction. When we serialize the Spark models into the database, we have to do some transformation, because keeping the raw model does not satisfy our requirements; we identify which features are relevant, combine that with the corresponding feature generators, and save it all into the model, so that in the prediction phase you just pick out the relevant features to use. That's the general process for feature generation and selection.

The next thing is model comparison, to figure out which model and which feature sets are more reasonable to use. This is how we do the evaluation: say today is in April; we go back in time, for example one month back, and at that point we use only the data available up to that point to train the model. Then we predict the outcome for all the deals that were active at that time, and since a month has passed we can compare the actual outcomes with our predictions. Essentially you go back one month, predict, and see whether your predictions match the actual results; in this way we are able to evaluate different models and different feature sets. To compare different models, since this is essentially a classification problem, there are different metrics like accuracy or F-measure, but those depend heavily on a threshold. What matters more to us, because we are trying to predict a probability, is the ranking of these opportunities; that's why those threshold metrics are not so useful for model comparison and why we use the ROC curve, or AUC, or some probability-based metric. I won't go into the details, but as one example, based on the model the AUC is around 0.88, and you can use this number to do model selection.
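A small sketch of that backtest, with MLlib's `BinaryClassificationMetrics` doing the AUC computation. All of the function parameters stand in for the real pipeline and are assumptions of mine.

```scala
import org.apache.spark.SparkContext
import org.apache.spark.mllib.evaluation.BinaryClassificationMetrics

// Pretend "today" is the chosen cutoff in the past: train only on data visible at that
// point, score the deals that were still open then, and compare with the outcomes
// observed afterwards. The ranking matters more than any single threshold, so AUC is
// the comparison metric.
object Backtest {
  def auc(sc: SparkContext,
          cutoff: Long,
          trainAt: Long => (String => Double),        // trains on data up to the cutoff, returns a scorer
          openDealsAt: Long => Seq[String],           // deals still open at the cutoff
          outcomeAfter: String => Option[Double]): Double = {  // 1.0 won / 0.0 lost, known later
    val score = trainAt(cutoff)
    val scoreAndLabel = openDealsAt(cutoff).flatMap { id =>
      outcomeAfter(id).map(label => (score(id), label))       // skip deals that are still open
    }
    val metrics = new BinaryClassificationMetrics(sc.parallelize(scoreAndLabel))
    metrics.areaUnderROC()                                    // e.g. around 0.88 in the example above
  }
}
```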
label so the label can be 0 1 depending on the outcome so if the outcome is 1 then your label is one if your outcome is 0 if outcome is like lost then this label is just 0 so essentially you keep the relative order of whether all of all these opportunities but the one you want to fund out figure out what’s the true probability which are closest to the closest to a true label true outcome so that’s its aim is a basic idea so you can use an isotonic real question to do that and you can even in force for the smoothing basically trying to consider like the state so at this particular stage is the average one probability say is qualify then you want to change the label to consider both the outcomes and

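To make the calibration step concrete, here is a minimal sketch using scikit-learn's IsotonicRegression; the stage-average smoothing shown in the comments is one possible reading of the idea described above, not their exact formulation:

```python
# Sketch: calibrate raw classifier scores into probabilities with isotonic
# regression, preserving the ranking of opportunities (illustrative data only).
import numpy as np
from sklearn.isotonic import IsotonicRegression

raw_score = np.array([0.05, 0.10, 0.40, 0.55, 0.80, 0.95])  # raw model scores
outcome   = np.array([0,    0,    0,    1,    1,    1])     # lost=0, won=1

# Optional smoothing (one reading of the idea in the talk): blend the hard 0/1
# outcome with the average win rate of the deal's stage, so a won deal at a
# 50%-win-rate stage becomes 0.75 and a lost one becomes 0.25.
stage_avg = np.full(6, 0.5)
target = 0.5 * outcome + 0.5 * stage_avg

iso = IsotonicRegression(y_min=0.0, y_max=1.0, out_of_bounds="clip")
iso.fit(raw_score, target)

calibrated = iso.predict(raw_score)     # monotone in raw_score, values in [0, 1]
print(np.round(calibrated * 100))       # shown to users on a 0-100 scale
```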
By using this calibration, the distribution actually gets smoother: as you can see, the scores are now concentrated more toward the uncertain area, close to 50. And if you look at the score distributions by stage — here we have nine stages, showing progress from the beginning of the sales process to the end — the score range gets higher and higher, and at the final stages the scores are quite high. Once we added this calibration the customers were much happier with the score, because it reflects stage progress. The next question is what the stages are. Typically, when an organization sets up its CRM database it defines its own stages; if it does not, Salesforce will just give you some default stages. The question of how many labels we have: because we only consider the outcome as either won or lost, it is strictly a binary classification problem. The next questions are what the intuition behind the model is, and whether we have any rule for splitting the data into training and test sets. On the first question: right now we generate all kinds of features and let the machine learning decide which are more important. As I emphasized at the beginning, the motivation is that one sales rep may manage hundreds of opportunities and wants to know which ones are at risk. This score is just one number, so you can sort all your deals by it, see which ones have a low score, and decide which to pay more attention to or which contacts to follow up with. You don't need to check records one by one — what stage is this at, when was my last update on this opportunity, when did my contact last send me a message. It is a summarized view of the status of all your opportunities, and you can rank them to decide where to focus. On the second question, about setting up training and test data: the training process is that we choose one time point and go back up to two years, collecting all the available historical events and activities to train the model. For the production model we simply go back up to two years from today. We don't have a more specific rule; we just want to keep the model up to date. The next question is whether we predict both win or loss and a score, and how granular the score is. We are predicting the probability of closing as won, so the granularity is essentially a continuous probability, but what we show to the customer is just a score from 0 to 100.

The next question, if I understand it correctly, is why we would have to wait a month to get the outcome. That is essentially what we did: say this is today; we go back one month, collect all the information available before that point to train the model, predict at that particular time, and then check today's actual outcomes against our predictions. So we don't need to wait — it's essentially a time machine: travel back one month, build the model there, make the predictions, and compare them with today. There was a question about the false positive rate, and whether Salesforce or the other big players are working on this kind of sales prediction. I don't have a good idea about that; they might have data science work going on, but I don't have good visibility into it. The next question is whether we use principal component analysis to identify important or independent features. We thought about it but didn't try it. The reason is that PCA- or SVD-like approaches identify latent dimensions — latent features that are linear combinations of the original features — but, as I showed at the beginning, we want to show customers the ranked factors behind a score. If we only computed latent factors, it would be difficult to explain to customers why we scored a deal the way we did, so we didn't push in that direction; we just stick with the original features. On whether we have tried neural networks: I know deep learning is very hot right now and could possibly be used here as well, but we haven't paid much attention to it yet. On the dimensionality of the feature vector: it really depends on your feature generators — you can keep adding features over time — and you can also apply transformations. For example, there are numerical attributes such as deal size, but you often don't want to use the raw value directly; you want to discretize it into small, medium, or large deals. In that way you convert it into a categorical attribute, and one attribute can then be converted into several binary features if you want. So the number of features can range from hundreds to thousands. The next question is about the fact that we look back two years at historical records: the typical sales process has seasonal patterns, so does going back two years for training cause any issue? That's a very reasonable, valid point. We are thinking about how to handle the seasonal pattern — for example by assigning larger instance weights to recent records — but so far we haven't put much effort into it and it's still ongoing. As someone pointed out, you could also include the month of the year as a feature; yes, that's another way to do it. Typically right now we just retrain the model periodically, because we want to keep it up to date.

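A minimal sketch of the "time machine" back-test described above, with made-up column names and a plain scikit-learn classifier standing in for the real pipeline:

```python
# Sketch of the back-test: train only on deals resolved before a cutoff, then
# score the deals still open at the cutoff and compare with what happened since.
# Column names (feature columns f*, opened_date, closed_date, won) are made up.
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

def backtest(deals: pd.DataFrame, cutoff: pd.Timestamp) -> float:
    feature_cols = [c for c in deals.columns if c.startswith("f")]

    resolved_before = deals[deals.closed_date < cutoff]
    model = LogisticRegression(max_iter=1000).fit(
        resolved_before[feature_cols], resolved_before.won)

    # Open at the cutoff, but resolved by now, so the true outcome is known.
    open_at_cutoff = deals[(deals.opened_date < cutoff) & (deals.closed_date >= cutoff)]
    scores = model.predict_proba(open_at_cutoff[feature_cols])[:, 1]

    # Ranking matters more than thresholded accuracy, hence AUC.
    return roc_auc_score(open_at_cutoff.won, scores)

# e.g. backtest(deals, pd.Timestamp("2015-03-01"))
```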
The next question is whether we apply global, general features across different organizations, or features specific to each organization. One thing I perhaps didn't mention is that we build a separate model for each customer. So the features identified as relevant to the outcome are specific to that organization, but the feature generation process is general across organizations. The next questions are how we manage our models — do we keep refining them, or do we keep some kind of "golden" reference model in a model library — and whether we have any integration with existing marketing automation back ends. The first question really comes down to figuring out what the best model is, which is hard to define: what does it mean for a model to be the best for sales results? That's actually a pretty tough question, because in many business applications you don't have a clear objective to optimize. Here we know we want to reduce the sales cycle and improve the close rate for salespeople, but that can't be turned directly into a machine learning objective. That's why right now we essentially use a proxy metric — the accuracy of the predicted outcomes in the back-test — to do model comparison. It's not perfect, but it's what we have to live with. To convince buyers, we run trial studies: we show that after their salespeople start using our product, here is the close rate before and after, and here is the average time to close a deal before and after. Most sales executives don't care about a number like AUC — that's more for data scientists trying to figure out which model to use — what matters to them is seeing the value. On the second question, whether we are interested in extending this to marketing: I'll leave that to our CEO; it's probably worth considering sometime in the future. There is one last piece I want to talk about: an advantage of our current architecture is that it can do segmentation. Right now we build one model for each organization, but medium to large enterprises may have different sales teams showing different patterns — in one region deals might sell faster, while in a different region it takes much longer to sell. That requires us to segment all the opportunities within one organization into different types and build a model for each segment. So rather than providing one model, we are trying to provide a mixture of models.

The first thing we need to do is figure out the segments, because you don't know in advance what the right segmentation is — it might be different sales teams. One option is to put the sales team in as an indicator feature; I'll come back to that shortly. The question is whether each segment would be a set of companies with some common features. It could be, but the real point is to make your models more precise. You can think of it like this: if you treat the whole organization as one piece, you build one model for it; you can then zoom in on a certain cluster or segment and build a more precise model for that particular segment; in the extreme case you zoom in until a single opportunity is its own segment. You have to find a good balance on segment size, because if you zoom in to just a few data points your model won't generalize, and if you take all opportunities as one piece you limit the model's capability to capture what is going on. So what we try to do is a kind of adaptive segmentation, and there are two approaches — I'll go through them briefly since we're running short of time. One way is clustering. If you know clustering, you probably know k-means, which is the most standard method. You can modify k-means so that the distance metric is essentially the model fit: for each segment you build a model, and for each data instance you apply all the models and figure out which segment it belongs to. In this process you get automatic segmentation and model building together, but it is very time-consuming because it is an iterative process. That's why we resorted to a different approach: supervised learning for segmentation. You come up with some meta-features, then apply a decision tree classifier to figure out how to separate the opportunities into different segments. You can apply essentially the same architecture to do the segmentation, but it is just two passes: in the first pass you use the outcome as the label and the meta-features to construct the decision tree, and the tree assigns each opportunity to a segment; in the second pass you retrain a model for each segment. So you don't need a long iterative process — just two passes. That's pretty much all of the work we have done on predictive analytics. We talked about the technical details of a problem that has huge potential and, at the same time, a lot of challenges with the data: the Markov models, the classification, how we scale up feature engineering, how we do calibration, and finally the segmentation. Right now in production we can enable this for a particular organization — model training and prediction — with just one click: you enable one feature flag and it starts the whole process of building the model, figures out the relevant features for that particular organization, and typically completes within just a couple of hours. It also delivers an insight report — for example, how long it typically takes to close a deal, and other metrics people like — so the customer gets something like a data science report without needing their own data science team.

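As a rough sketch of the two-pass, tree-based segmentation described above (hypothetical meta-features and data; scikit-learn stands in for the actual system):

```python
# Sketch: (1) fit a shallow decision tree on meta-features vs. outcome,
# (2) use its leaves as segments, (3) train one model per segment.
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
meta = rng.normal(size=(2000, 3))    # hypothetical meta-features (e.g. team, region, deal-size band)
X = rng.normal(size=(2000, 20))      # regular model features
y = (X[:, 0] + meta[:, 1] + rng.normal(size=2000) > 0).astype(int)   # won / lost

# Pass 1: a small tree on the meta-features defines the segments (its leaves).
seg_tree = DecisionTreeClassifier(max_depth=2, min_samples_leaf=200).fit(meta, y)
segments = seg_tree.apply(meta)      # leaf id per opportunity

# Pass 2: one model per segment; the talk suggests keeping the segment count
# small (around four or five), since every model is held in memory when scoring.
models = {
    seg: LogisticRegression(max_iter=1000).fit(X[segments == seg], y[segments == seg])
    for seg in np.unique(segments)
}
print({int(s): int((segments == s).sum()) for s in np.unique(segments)})
```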
Here I just want to share some lessons I learned through this whole process and from my past experience in data science. One thing I want to raise is that when people talk about scalable machine learning, most people mean efficient training, and some care more about prediction time, but one important piece that is usually missing is development time. If you think of data scientists as developers and engineers, then if it takes a lot of time just to come up with one model, that does not scale; you need a stable, repeatable way to develop models fast, and I think machine learning and data science should pay more attention to this. Also, from experience, feature engineering is very important: if you apply the standard classifiers — SVM, logistic regression, linear regression — with the same features, there might be some differences, but mostly they won't differ much; adding new features, on the other hand, can make a big change. The last lesson is to pick the right tools and frameworks to build your product; that's why I really appreciate Scala and Spark here, because they have a lot of convenient features that enabled fast deployment and prototyping at the same time. The question is whether I found Scala with Spark to be better or worse than Python with Spark. I don't have much experience with PySpark, but because Spark's native language is Scala, I would assume it has better support for Scala, and most machine learning algorithms become available in the Scala version first and are then ported to Java or Python. The next question is whether, working with Spark, we ran into any shortcomings compared with Hadoop. I think the first one is that for many people Scala can be a challenge, especially if they have no experience with it — that can be a little scary. Another thing is memory management, which can sometimes be tricky: if you have a large cluster, keeping your memory usage efficient can be tricky as well, because there is no free lunch — it's not as if you just load the data and everything runs smoothly. The next questions are what the maximum size of our Spark cluster is, and whether we keep it running all the time so that it also stores the data, like a data lake, or shut the instances down. The Spark cluster is mainly used for training, so you can just launch a cluster when necessary; the prediction phase doesn't use the Spark cluster. We have a certain number of organizations, but we don't need to process data sets at the scale of Yahoo or Google that require thousands of nodes, so it really depends on how many customers you have. On whether we store all the data from day one and never drop anything, keeping a commodity cluster as long-term storage: I think that is one way to do it, but I don't want to comment further. The next question is whether we use the different algorithms provided in MLlib or have developed our own. We did implement some algorithms ourselves — for example the Markov model, because it is fairly straightforward — and if you check the machine learning library you'll see there is no algorithm available for modeling sequential data.

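Since the speaker notes that MLlib had nothing for sequential data and they implemented the Markov model themselves, here is a minimal illustration of estimating stage-transition probabilities from opportunity histories — made-up sequences, not their code:

```python
# Sketch: estimate a first-order Markov transition matrix over deal stages
# from historical stage sequences (illustrative data).
from collections import Counter, defaultdict

histories = [
    ["prospect", "qualify", "proposal", "won"],
    ["prospect", "qualify", "lost"],
    ["prospect", "proposal", "negotiation", "won"],
]

counts = defaultdict(Counter)
for seq in histories:
    for current, nxt in zip(seq, seq[1:]):
        counts[current][nxt] += 1

# Normalize counts into transition probabilities P(next stage | current stage).
transitions = {}
for stage, nxts in counts.items():
    total = sum(nxts.values())
    transitions[stage] = {nxt: c / total for nxt, c in nxts.items()}

for stage, probs in transitions.items():
    print(stage, probs)
```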
On segmentation: if we're using clustering, we have to provide the number of segments — how many segments you want for a company. You can specify that depending on the company size; it's just one parameter. We didn't pay particular attention to tuning it, but you want to keep a reasonable number of segments: you don't want to split one company's data into a hundred segments and end up with a hundred different models, because there is another thing to consider — in the prediction phase you need to keep all the models in memory, so the more models you have, the more resources you need. You have to strike a balance; we just use a reasonable size, something like four or five segments, and most of the time that is enough. The question is whether I ran canopy clustering before the k-means clustering, to see which numbers cannot be clustered — we didn't try that. The next question is whether we serve the predictions in real time: of course, we serve in real time. The follow-up is: assuming a model has been trained, how long does it take to propagate to the module that does the serving — is it within an hour, half a day? As I mentioned, our architecture is based on a REST API, so once a model is built you can force the prediction server to update its model instantly — it basically just loads the model into memory. If there are millions of records, the prediction run itself can take a while, but the model update is instant: once the model is built, the prediction server can immediately use the updated version, and how long scoring takes just depends on how much data you have — that's a separate matter. There was also a question about real-time training versus real-time prediction, and whether real-time prediction shouldn't be streaming rather than going through a REST API. Just to clarify: the training — building the model — is done offline, and the earlier question was about how soon a newly built model propagates to the serving module. As for whether the update is truly real time — a user hits the website, an activity or attribute changes, and that immediately shows up in the prediction — no, we don't update the model with every new data point. When you look at the long-term picture we are using a couple of years' worth of data, so in that context it really depends on the domain. For example, at Yahoo, working on front-page optimization, you want to keep updating the model continuously, but in B2B you typically won't see tons of activities within a few minutes. You really have to make a decision depending on the application — you don't want to build a real-time trainer and then find out it was just a waste of time for your application.

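As a generic illustration of the "reload the model instantly" behaviour described above, here is a hypothetical prediction service that hot-swaps a serialized model on demand; the path, pickle format, and class are all assumptions, not their actual REST stack:

```python
# Sketch: a prediction service that can hot-swap a serialized model.
# The file path and pickle format are assumptions for illustration only.
import pickle
import threading

class PredictionService:
    def __init__(self, model_path: str):
        self.model_path = model_path
        self._lock = threading.Lock()
        self.reload()                      # load the current model into memory

    def reload(self):
        """Called (e.g. by a REST endpoint) right after training finishes."""
        with open(self.model_path, "rb") as f:
            new_model = pickle.load(f)     # assumes a pickled sklearn-style model
        with self._lock:
            self.model = new_model         # the swap itself is effectively instant

    def score(self, features):
        with self._lock:
            return self.model.predict_proba([features])[0, 1]
```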
The next question is whether every new data point in the training set has to come from a manual update by the customer — whether someone has to make an entry for us to get an additional record in the training set. It can be separate. For features derived from CRM records — and Salesforce is the dominant CRM — most of the time it does require a human being to manually update or change the field values. That's why we developed a mobile app to simplify this process, basically to make updates easy. If you have tried Salesforce you know what I'm talking about: it can take ten minutes just to figure out where the field you need to update is located. That's why we think the mobile app is really powerful: it keeps the sales reps engaged and improves data hygiene, so that we can build more accurate models. Other activities, like email and calendar activities, are separate: when someone sets up a calendar event invitation with an account, we get that data automatically from the logged information, and we do some cleaning on it as well. The next question is whether we keep the data in memory, and how much. Right now we have something like eight gigabytes, and it depends on the data size — one million entries you can easily load. On how big the data we process is: we are processing billions of events; I don't want to be more specific than that. The last question is: if you have the weights in the model, could you use them to figure out what actions might be taken to improve the score — could the application make those suggestions? That's a fair point. The issue we've noticed is that many features are not actionable. Say one feature is whether you sent a price quote to your account: if they didn't send one, we cannot simply tell them to send it. In order to provide prescriptive analytics, we need to figure out which actions can actually be taken, and then decide which action to suggest — that is something we have to take care of separately. Any more questions? Okay, thanks a lot.

Uncategorized

Lesson 2: Practical Deep Learning for Coders

So one of the things I wanted to talk about (this really came up when I was looking at the survey responses) what is different about how we’re trying to teach this course and how will it impact you as participants in this course Really we’re trying to teach this course in a very different way from the way most teaching is done, at least most teaching in the United States Rachel and I are both very keen fans of David Perkins, who has this wonderful book called “Making Learning Whole, How Seven Principles of Teaching can Transform Education” We are trying to put these principles in practice in this course I’ll give you a little anecdote from the book to give you a sense of how this works If you were to learn baseball the way that math is taught, you would first learn about the shape of a parabola Then you would learn about the materials science design behind stitching baseballs, and so forth And 20 years later, after you completed your PhD and post-doc, you’d be taken to your first baseball game and you’d be introduced to the rules of baseball And then 10 years later, you might get to hit The way that in practice baseball is taught is we take a kid down to the baseball diamond and we say, “These people are playing baseball, would you like to play?” And they say, “Yea, sure I would.” “Okay, stand here, I’m going to throw this Hit it, now run.” Good, you’re playing baseball So that’s why we started our first class with, “Here are 7 lines of code you can run to do deep-learning.” Not just to do deep-learning but to do image classification on any dataset as long as you structure it in the right way So this means you will very often be in the situation (we’ve heard a lot of your questions about this during the week); Gosh, there’s a whole lot of details I don’t understand Like this fine-tuning thing What is fine-tuning? 
And the answer is we haven’t told you yet It’s a thing you do in order to do effective image classification in deep-learning We’re going to start from the top and gradually work our way down, and down, and down The reason that you are going to want to learn the additional level of detail is so that when you get to the point where you want to do something that no one’s done before, you’ll know how to go into that detail and create something that does what you want We’re going to keep going down a level and down a level and down a level through the hierarchy of software libraries, through the hierarchy of the way computers work, through the hierarchy of the algorithms and the math, but only at the speed that’s necessary to let’s make a better model, or let’s make a model that can do something we couldn’t do before Those will always be our goals So it’s very different I don’t know if anybody’s been reading the Yoshua Bengio, Ian Goodfellow and Aaron Courville book (www.deeplearningbook.org), which is a great mathematical deep learning book It literally starts with 5 chapters of everything you need to know about probability, everything you need to know about calculus, everything you need to know about linear algebra, everything you need to know about optimizations, and so forth I don’t know that in the whole book, there’s ever actually a point where it says, here is how you do deep-learning Even if you read the whole thing I’ve read 2/3 of it It’s a really good math book Anybody who’s interested in understanding the math of deep-learning, I would strongly recommend But it’s kind of the opposite of how we’re teaching this course So if you often find yourself thinking, I don’t really know what’s going on, that’s fine But I also want you to always be thinking about, okay how can I figure out a bit more about what’s going on So we’re trying to let you experiment Generally speaking, the assignments during the week are trying to give you enough room to find a way to dig in into what you’ve learnt and do a little bit more Make sure you can do what you’ve seen and also that you can learn a little bit more about it So you are all coders So you are all expected to look at that first notebook, and look at what are the inputs to every one of those cells, what are the outputs from every one of those cells How is it that the output can be used as the input to that cell, why is this transformation going on This is why we did not tell you how do you use Kaggle CLI, how do you prepare a submission in the correct format, because we wanted you to see if you could figure it out and also to leverage the community that we have to ask questions when you’re stuck [Time: 5 minute mark] Being stuck and failing is terrific because it means you have found some limit of your

knowledge or your current expertise You can then think really hard, read lots of documentation and ask the rest of the community until you are no longer stuck At which point you now know something you didn’t know before So that’s the goal Asking for help is a key part of this, so there is a whole wiki page called “How to ask for Help” It’s really important, and so far about half the times I have seen people ask for help, there is not enough information for your colleagues to really help you effectively So when people point you at this page, it’s not because they’re trying to be a pain, it’s because they’re saying I want to help you but you haven’t given me enough information So, in particular, what have you tried so far? What did you expect to happen? What actually happened? What do you think might be going on? What have you tried to test this out? And tell us everything you can about your computer and your software Show us screenshots, error messages, your code The better you get at asking for help, the more enjoyable experience you’re going to have because continually you’ll find your problems can be solved very quickly and you can move on There was a terrific recommendation from the head of Google Brain, Vincent Vanhoucke, on a Reddit AMA a few weeks ago where he said he tells everybody in his team, if you’re stuck, work at it yourself for half an hour (you have to work at it yourself for half an hour) If you’re still stuck, you have to ask for help from somebody else The idea being that you are always making sure that you try everything you can, but you’re also never wasting your time when somebody else can help you I think that’s a really good suggestion, so maybe you can think about this half an hour rule yourself I wanted to highlight a great example of a really successful “how to ask for help” Who asked this particular question? That was really well done, really nice What’s your background before taking this class, maybe you can introduce yourself quickly I graduated from USF two years ago with a bachelor’s in Data Science You can see here that he explained what he’s going to do, what happened last time, what error message he got He’s got a screenshot to show you what he typed and what came back He shows what resources he’s going to use, what these resources say and so forth Did you get your question answered?
Kicho says, yes, Rachel emailed me Okay, great Rachel says the question was so clear, it was easy to answer As you might have noticed, the wiki is rapidly coming out with some great information, so please start exploring it You’ll see on the left-hand side, there is a recent changes section and you can see every day there’s lots of people who’ve been contributing lots of things, so it’s continually improving There’s some great kind of diagnostic sections If you are trying to diagnose something which is not covered and you solve it, please add your solution to this diagnostic section One of the things I loved seeing today was that Tom was asking a question about how fine-tuning works, we talked about the answers and then he went ahead and created a very small little wiki page There’s not much information there, but there’s more than there used to be This is exactly what we want You can even see in the places where he wasn’t quite sure, he put some question marks So now somebody else can go back and edit his wiki page, and Tom’s going to come back tomorrow and say now I’ve got even more questions answered This is the kind of approach where you’re going to learn a lot [Time: 10 minute mark] This is another great example of something that I think is very helpful

Melissa, who we heard from earlier, went ahead and told us all, Here is my understanding of the 17 steps necessary to complete the things we were asked to do this week So this is great for Melissa to make sure she understands it correctly, but everybody else can say that’s a really handy resource that we can draw on as well There are 718 messages in Slack in a single channel That’s way too much to expect to use this as a learning resource, so this is my suggestion as to how you might want to be careful of how you use Slack I wanted to spend maybe quite a lot of time talking about the resources that are available, because I feel like if we get that out now then we’re all going to speed along a lot more quickly Thanks for your patience as we talk about some non-deep-learning stuff We expect the vast majority of learning to happen outside of class In fact, if we go back and finish off our survey, I know that one of the questions asked about that How much time are you willing to commit most weeks to this class? And the majority was 8-15 hours Some are 15-30, and a small number are Less than 8 If you’re in the Less than 8 group, I understand that’s not something you can probably change If you had more time, you’d put in more time So if you’re in the Less than 8 group, think about how you want to prioritize what you want to get out of this course Be aware that it’s not really designed that you’re going to be able to do everything in less than 8 hours a week, so maybe make more use of the forums and the wiki and focus your assignments during the week on the stuff that you’re most interested in Don’t worry too much if you don’t feel like you’re getting everything because you have less time available For those of you in the 15-30 group, I really hope that you’ll find that you’re getting a huge amount out of the time that you’re putting in Something I’m glad I asked (because I found it really helpful) was “How much of the material from Lesson 1 was new to you?” And for half of you, the answer is Most of it And for well over half of you, Most of it or Nearly all of it If you’re one of the people that I’ve spoken to during the week who’s said, “Holy shit, that was a firehose of information I feel kind of overwhelmed, kind of excited,” you are amongst friends Remember, during the week there are about 100 of you going through this same journey So if you want to catch up with some people during the week and have a coffee and talk more about the class, or join a study group here at USF, or if you’re from the South Bay, find some people from the South Bay, I would strongly suggest doing that So for example, if you’re in Menlo Park, you could create a Menlo Park Slack Channel and put out a message, “Hey, anybody else in Menlo Park availabe on Wednesday night, I’d like to get together and maybe do some peer programming or whatever.” For some of you, not very much of it was new and for those of you, I do want to make sure that you feel comfortable pushing ahead, trying out your own projects and so forth Basically in the last lesson, what we learned was a pretty standard data science computing stack – AWS, JuPyteR notebook, a bit of Numpy, Bash This is all stuff that regardless of what kind of data science you do, you’re going to be seeing a lot more of (if you stick in this area) These are all very, very useful things Those of that who have spent some time in this field will have seen most of it before Hopefully that is useful background So last week we were really looking at the basic foundations, computing 
foundations necessary for data science more generally, and for deep learning more particularly This week, we’re going to do something very similar, but we’re going to be looking at the key algorithm pieces [Time: 15 minute mark] So in particular, we’re going to go back and say, “Hey, what did we actually do last week,

and why did that work, and how did that work.” For those of you who don’t have much algorithmic background around machine learning, this is going to be the same firehose of information as last week was for those of you who don’t kind of have a software, Bash, AWS background So again, if there’s a lot of information, don’t worry This is being recorded There are all the resources during the week So the key thing is to come away with an understanding of what are the key pieces being discussed Why are those pieces important? What are they doing (even if you don’t understand the detail) So if at any point you’re thinking, okay Jeremy’s talking about activation functions I have no idea what he just said about what an activation function is, or why I should care Please go on to the InClass Slack Channel, and probably @Rachel I don’t know what Jeremy’s talking about at all; Rachel’s got a microphone and she can let me know Or else put up your hand and ask So I do want to make sure you guys feel very comfortable asking questions I have done this class now once before because I did it for the Skype students last night I’ve heard a few of the questions already, so hopefully I can cover some things that are likely to come up Before we look at digging in to what’s going on, the first thing we’re going to do is see how do we do the basic homework assignment from last week So the basic homework assignment from last week was can you enter the Kaggle Dogs and Cats Redux competition So how many of you managed to submit something to that competition and get some kind of result? Okay, that’s not bad So for those of you who haven’t yet, keep trying during this week and use all of those resources I showed you because quite a few of your colleagues have done it successfully and therefore we can all help you And I can show you how I did it The basic idea here is we have to download the data to a directory, so to do that I just use “kg download” after using the “kg config” command kg is part of the Kaggle CLI and Kaggle CLI can be installed by typing “pip install kaggle-cli” This works fine without any changes if you’re using our AWS instances and setup scripts In fact, it’s fine if you’re using Anaconda pretty much anywhere If you’re not doing either of those two things, you may have found this step more challenging But once it’s installed it’s as simple as saying “kg config”, with your username, password and competition name (kg config -g -u userName -p password -c competitionName) When you put in the competition name, you can find that out by just going to the Kaggle website and you’ll see that when you go to the competition in the URL, it has here a name So just copy and paste that, that’s the competition name kaggle-cli is a script that somebody created in their spare time and didn’t spend a lot of time on it There’s no error-handling, there’s no checking, or anything So for example, if you haven’t gone to Kaggle and accepted the competition rules, then attempting to run “kg download” will not give you an error, it will create a zip file that actually contains the contents of the Kaggle webpage saying, Please accept the competition rules So for those of you that had to unzip that, and it said it’s not a zipfile If you go ahead and cat that, you’ll see it’s not a zipfile it’s an HTML file So this is pretty common with recent-ish data science tools, and particularly with deep-learning stuff A lot of it’s pretty new, it’s pretty rough and you really have to expect to do a lot of debugging It’s very different than using
Excel or PhotoShop or something So when I said “kg download”, it created a test.zip and a train.zip, so I went ahead and I unzipped both of those things, that created a test and a train, and they contain a whole bunch of files, cat.1.jpg, and so forth [Time: 20 minute mark] So the next thing I did to make my life easier was I made a list of what I believed I had

to do (I find life much easier with a to-do list) I need to create a validation set, I need to create a sample, I need to move my cats into a cats directory, my dogs into a dogs directory, I’m going to have to fine-tune and train, I then need to submit I went ahead then and created Markdown headings for these things and started filling them out So Create Validation Set and Sample A very handy thing in JuPyteR notebook is you can create a cell that starts with a “%” (percent sign), and that allows you to type magic commands There are lots of magic commands, all kinds of useful things They do include things like “%cd”, and “%mkdir” and so forth Another cool thing you can do is you can use the “!” (exclamation mark) and then type any Bash command The nice thing about doing this stuff in a notebook rather than Bash is that you’ve got a record of everything you did So if you want to go back and do it again, you can If you make a mistake, you can go back and figure it out This kind of reproducible research is very highly recommended I try to do everything in a single notebook so I can go back and fix the problems that I always make Here you can see I’ve gone into the directory, I’ve created my validation set I then used 3 lines of Python to go ahead and grab all of the jpg file names [g=glob(‘*.jpg’)], create a random permutation of them [np.random.permutation(g)] And so then the first 2000 of that random permutation are 2000 random files, and then I move them into my validation directory, valid I did the exact same thing for my sample, but rather than moving them, I copied them And then I did that for my sample training and my sample validation, and that was enough to create my validation set and sample The next thing I had to do was move all my cats into my cats directory and my dogs into my dogs directory Which was as complex as typing “%mv cat.*.jpg cats/” and ” mv dog.*.jpg dogs/” And the cool thing is that now that I’ve done that I can just copy and paste the 7 lines of code from our previous lesson, these lines of code are totally unchanged I added one more line of code which was save_weights Once you’ve trained something, it’s a great idea to save the weights so that you don’t have to train it again You can always go back later and say load_weights So I now had a model which predicted cats and dogs through my Redux competition My final step was to submit it to Kaggle Kaggle tells us exactly what they expect The way they do that is by showing us a sample of the submission file And the sample shows us that they expect an id column and a label column The id is the file number So if you have a look at the test set, you’ll see everyone’s got a number So it’s expecting to get the number of the file along with your probability So you have to figure out how to take your model and create something of that form This is clearly something that you’re going to be doing a lot, so once I figured out how to do it, I actually created a method to do it in one step So I’m going to go and show you the method I wrote So I’ve got this utils module that I usually kind of tuck everything in, but I’m going to put this in my vgg module (vgg16.py) There’s a few ways you can possibly do this Basically you know you have a way of grabbing a mini-batch of data at a time, or a mini-batch of predictions at a time So one thing you could do would be to grab your mini-batch of size 64, you grab your 64 predictions and you just keep appending them, 64 at a time to an array until eventually you have your 12500 test images, 
all with a prediction, in an array That is actually a perfectly valid way to do it How many people solved it using that kind of approach? Not many of you That’s interesting, but works perfectly well [Time: 25 minute mark]
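Going back to the validation-set setup for a moment, the steps described above boil down to a few lines like these — a sketch of the same glob/permutation idea; the data/redux path is an assumption, and the directory names follow the lesson:

```python
# Sketch: carve a validation set out of train/ by moving 2000 random files,
# mirroring the glob + np.random.permutation approach described above.
import os
from glob import glob
import numpy as np

os.chdir("data/redux/train")            # assumed layout: data/redux/{train, valid, sample, ...}
os.makedirs("../valid", exist_ok=True)

g = glob("*.jpg")
shuf = np.random.permutation(g)
for fname in shuf[:2000]:
    os.rename(fname, "../valid/" + fname)

# Then move each class into its own sub-directory so Keras can infer labels,
# i.e. the equivalent of:  mv cat.*.jpg cats/ ; mv dog.*.jpg dogs/
```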

Those of you who didn’t (I guess) either asked on the forum or read the documentation and discovered there’s a very handy thing in Keras called predict_generator And what predict_generator does is it lets you send in a bunch of batches (something we created with get_batches) and it will run the predictions on every one of those batches and return them all in a single array So that’s what we want to do If you read the Keras documentation (which you should do very often), you will find out that predict_generator generally will give you the labels but not the probabilities (cat 1, dog 0) In this case, for this competition, they told us that they want probabilities, not labels So instead of calling the get_batches which we wrote; here is the get_batches we wrote; you can see all it’s doing is it’s calling something else which is flow_from_directory To get predict_generator to give you probabilities instead of classes, you have to pass in an extra argument, which is class_mode= and rather than “categorical”, you have to say “none” So in my case, I actually modified get_batches to take an extra argument, which was “class_mode” And then in my test method that I created, I then added class_mode=”none” And so then I could call model.predict_generator passing in my batches, and that is going to give me everything I need So once I do, I can say vgg.test, pass in my test directory, pass in my batch size That returns two things, it returns the predictions and it returns the batches I can then use batches.filenames to grab the filenames because I need the filenames in order to grab the IDs So that looks like this There’s a few predictions, there’s a few filenames Now one thing interesting, at least for the first 5 is the probabilities are all 1’s and 0’s, rather than .6, .8, and so forth We’re going to talk about why that is in just a moment For now, it is what it is It’s not doing anything wrong, it really thinks that that’s the answer Because Kaggle wants something which is isDog, we grab the second column from this, and the numbers from this, paste them together as columns and send that across So here is grabbing the first column from the predictions, call it “isdog” Here is grabbing from the 8th character until the dot in filenames, turning it into an integer to get my IDs Numpy has something called stack which lets you put 2 columns next to each other So here is my IDs and my probabilities And then Numpy lets you save that as a CSV file using savetxt You can now either ssh to your AWS instance and use “kg submit”, or my preferred technique is to use a handy little IPython thing called FileLink If you type FileLink and pass in a file that is on your server, it gives you a little URL like this, which I can click on and it downloads it to my computer And so now on my computer I can go to Kaggle and I can submit it in the usual way I prefer that because it lets me find out if there’s any error messages, or if there’s anything going wrong on Kaggle, I can see what’s happening So as you can see, rerunning what we learnt last time to submit something to Kaggle really requires just a little bit of coding to just create the submission file, a little bit of Bash scripting to move things to the right place, and rerunning the 7 lines of code, the actual deep-learning itself is incredibly straightforward Here’s where it gets interesting When I submitted (my 1’s and 0’s) to Kaggle, the first thing I submitted was isCat rather than isDog, so that put me in last place
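Put together, the submission step looks roughly like this. This is a sketch following the description above — vgg is assumed to be the fine-tuned Vgg16 object with the test method added in the lesson, the paths are illustrative, and exact Keras 1.x signatures may differ in your version:

```python
# Sketch of building the Kaggle submission from the steps described above.
import numpy as np

# Assumes `vgg` is the fine-tuned model object from earlier in the notebook.
preds, batches = vgg.test("data/redux/test", batch_size=64)

filenames = batches.filenames                    # e.g. "unknown/1234.jpg"
# "from the 8th character until the dot": the digits after the sub-directory prefix.
ids = np.array([int(f[8:f.find(".")]) for f in filenames])

isdog = preds[:, 1]                              # column for the dog class (check class ordering)

subm = np.stack([ids, isdog], axis=1)            # two columns side by side
np.savetxt("submission.csv", subm, fmt="%d,%.5f", header="id,label", comments="")
```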

[Time: 30 minute mark] Then when I was putting in 1’s and 0’s, I was in 110th place, which is still not that great Now the funny thing was I was pretty confident that my model was doing well because the validation set for my model told me that my accuracy was 97.5% I’m pretty confident that not all of these people are doing better than that So I thought something weird’s going on So that’s a good time to figure out what does this number mean? What is 12, what is 17? So let’s go and find out It says here that it is a LogLoss, so if we go to Evaluation, we can find out what LogLoss is, and here is the definition of LogLoss LogLoss is known in Keras as binary cross-entropy or categorical cross-entropy You’ll actually find it very familiar because every single time we’ve been creating the model, when we compile the model we have been using categorical_crossentropy So it’s probably a good time for us to find out what this is The short answer is that it’s this mathematical function But let’s dig into this a little bit more and find out what’s going on I would strongly recommend that when you want to figure out how something works, you whip out a spreadsheet Spreadsheets are my favorite tool for doing small scale data analysis They are among the least well-utilized tools among professional data scientists Which I find really surprising because back when I was in consulting, everybody used them for everything They were the most over-used tool So what I’ve done here is I’ve gone ahead and created a little column of isCats and isDogs, and a column of possible predictions And then I’ve gone in and I’ve typed in the formula from that Kaggle page And so here it is It’s minus the truth label times the log of the prediction, minus (1 − truthLabel) times log(1 − prediction) Now if you think about it, the truth label is always 1 or 0, so this is probably more easily understood using an if function It’s exactly the same thing Rather than multiplying by 1 and 0, let’s use an if function If it’s a cat, then take the log of the prediction Otherwise, take log of (1-prediction) Now this is hopefully pretty intuitive If it’s a cat and your prediction is really high, we’re taking the log of that and getting a small number If it’s not a cat, and our prediction is really low, then we will take the log of 1 minus that So you can kind of get a sense of it by looking here Here’s a non-cat, which we thought was a non-cat, and therefore we end up with 1 minus that, it’s a low number Here’s a cat which we’re pretty confident isn’t a cat, so here is log of that Notice this is all with a negative sign in the front just to make it so that smaller numbers are better So this is LogLoss, or binary or categorical crossentropy And this is where we find out what’s going on because I’m now going to go and say, well, what did I submit? I submitted predictions that were all 1’s and 0’s What if I submit 1’s and 0’s? Ouch We’re taking logs of 1’s and 0’s That’s no good So Kaggle’s been pretty nice not to return just an error I actually know why this happens because I wrote this functionality on Kaggle Kaggle modifies it by a tiny bit, like .0001, just to make sure it doesn’t die So if you say 1, it actually treats it as .9999 If you say 0, it treats it as .0001 So our incredibly over-confident model is getting massively penalized for that over-confidence So what would be better to do would be, instead of sending off 1’s and 0’s, why not send across actual probabilities you think are reasonable [Time: 35 minute mark] So in my case, what I did was I added a line which was np.clip (Numpy clip) to clip the predictions to .05 and .95 So anything less than .05 becomes .05 and anything greater than .95 becomes .95 And then I tried submitting that, and that moved me from 110th place to 40th place Suddenly I was in the top half So the goal of this week was to try and get in the top half of the competition and that’s all you had to do, run a single epoch and realize with this evaluation function you need to be submitting things that aren’t 1’s and 0’s So probably I should have used (and I’d be interested to try this tomorrow) for resubmission, I probably should have done .025 and .975 because I know that my accuracy on the validation set was .975 So that’s probably the probability that I should have used I would need to think about it more though Because it’s a non-linear loss function, is it better to under-estimate how confident you are, or over-estimate how confident you are? In the end, I said, it’s about 97.5 and I have a feeling that being over-confident might be a bad thing because of the shape of the function, so I’ll just be a little bit on the tame side I then tried .02 and .98 and I did actually get a slightly better answer I actually got a little bit better than that in the end This afternoon, I ran a couple more epochs just to see what would happen and that got me to 24th So I’ll show you how you can get to 24th position, and it’s incredibly simple You take these two lines here, vgg.fit and vgg.model.save_weights, and copy and paste them a bunch of times And you can see I saved the weights under a different filename each time, just so that I could always go back and use a model that I created earlier Something that we’ll talk about more in the class later is that after 2 epochs, I changed my learning rate from .1 to .01, just because I happen to know that this is often a good idea I haven’t tried it without doing that I suspect it might be just as good or even better, but that was just something I tried So interestingly by the time I run 4 epochs, my accuracy is 98.3% That would have been 2nd place in the original Cats and Dogs competition So you can see it doesn’t take much to get really good results And each one of these took 10 minutes to run on my AWS P2 instance The original Cats and Dogs used a different evaluation function which was just accuracy They changed it for the Redux one to use LogLoss, which makes it a bit more interesting The reason I just didn’t say nb_epoch=4 is that I really wanted to save the results after each epoch under a different weights filename Just in case at some point it overfit a bit, I could always go back and use one that I got in the middle In this case, we have added a single linear layer to the end (we’re going to learn a lot about this) And so we actually are not training very many parameters My guess would be that in this case we could run as many epochs as we like and it would probably keep getting better and better until it eventually levels off

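Here is the spreadsheet LogLoss arithmetic from above as a small Python sketch, together with the clipping fix; the .0001/.9999 clamp is how the transcript describes Kaggle's behaviour, so treat the exact values as illustrative:

```python
# Sketch: LogLoss for a single prediction, and why clipping helps.
import numpy as np

def log_loss(label: int, pred: float) -> float:
    # -[ y*log(p) + (1-y)*log(1-p) ]: small when a confident prediction is right,
    # huge when a confident prediction is wrong.
    return -(label * np.log(pred) + (1 - label) * np.log(1 - pred))

print(log_loss(1, 0.95))        # right and confident  -> about 0.05
print(log_loss(1, 0.0001))      # wrong and confident  -> about 9.2 (a 0 gets clamped to ~.0001)

preds = np.array([1.0, 0.0, 1.0])          # over-confident 1's and 0's
print(np.clip(preds, 0.05, 0.95))          # the submission fix described above
```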
That would be my guess [Time: 40 minute mark] So I wanted to talk about what are these probabilities One way to do that, and also to talk about how can you make this model better, is anytime I build a model and I think about how to make it better, my first step is to draw a picture Data scientists don’t draw enough pictures Everything from printing out the first 5 lines of your array to see what it looks like to drawing complex plots For computer vision, you can draw lots of pictures because we’re classifying pictures I’ve given you some tips here about what I think are super-useful things to visualize So when I want to find out why my Kaggle submission is in 110th place — every time I build a model and think about how to make it better — I run my standard 5 steps The steps are let’s look at a few examples of images we got right; let’s look at a few examples of images we got wrong; let’s look at some of the cats we felt were the most cat-like, some of the dogs we felt were the most dog-like, and vice-versa; some of the cats we were most wrong about, some of the dogs we were most wrong about; and then finally the cats and dogs that our model is most unsure about This little bit of code I suggest you keep around somewhere because this is a super useful thing to do any time you do image recognition So the first thing I did was I loaded my weights back up (just to make sure they were there) from my very first epoch And I used my vgg.test method I showed you that I created This time I passed in my validation set, not the test set, because for the validation set I know the correct answers So then from the batches I can get the correct labels and I can get the filenames I then grab the probabilities and the class predictions, and that then allowed me to do the 5 things I just mentioned So here’s #1 – a few correct labels at random np.where(preds==labels)[0], numpy where the predictions are equal to the labels permutation(correct)[:n_view], a random permutation and grab the first 4, and then plot them by index So here are 4 examples of things that we got right Not surprisingly this cat looks like a cat and this dog looks like a dog Here are four things we got wrong It’s kind of interesting You can see here a very black underexposed thing on a bright background Here is something that is on a totally unusual angle Here is something that is so curled up you can’t see its face Here is something you can’t see its face either So this gives me a sense of the things it’s getting wrong, it’s reasonable to get those things wrong If you looked at this and they were really obvious cats and dogs, you would think there’s something wrong with your model But in this case, the things it’s finding hard are genuinely hard Here are some cats that we felt very sure are cats Here are some dogs we felt very sure were dogs Question: The weights you’re saving, are those ImageNet weights or the ones we trained on Cats and Dogs? Answer: So these weights, “results/ft1.h5”, the “ft” stands for fine-tune As you can see here, I saved my weights after I did my fine-tuning, so these are for Cats and Dogs Question: In the fine-tuning are you just training the last layer?
Answer: Yes, I'm just training the last layer. We're not talking about that yet; we just used the finetune command, and later today we're going to learn about what that does. These, I think, are the most interesting: here are the images we were most confident were cats, but they're actually dogs. And you can see, well, here's one that is only 50×60 pixels, which is very difficult. Here's one that is almost totally in front of a person and is also standing upright; that's difficult because it's unusual. This one is very white and is totally from the front, so that's quite difficult. This one, I'm guessing the color of the floor and the color of the fur are nearly identical. So again, this makes sense: these do look genuinely difficult. And so if we want to do really well in this competition, we want to think about whether we should start building some models on very, very small images, because we now know that sometimes Kaggle gives us 50×50 images, which are going to be very difficult for us to deal with [Time: 45 minute mark]

Answer: Yes, I'm just training the last layer. We're not talking about that yet; we just used the finetune command, and later today we're going to learn about what that does. These, I think, are the most interesting: here are the images we were most confident were cats, but they're actually dogs. Here are some pictures that we were actually very confident were dogs, but they're actually cats. Again, not being able to see the face seems like a common problem. And then finally, here are some examples that we were most uncertain about. Notice that the most uncertain are still not very uncertain; they're still nearly 1 or nearly 0. So why is that? We will learn in a moment about exactly what is going on from a mathematical point of view when we calculate these things, but the short answer is that the probabilities that come out of a deep learning network are not probabilities in any statistical sense of the term. So this is not actually saying that there is 1 chance out of 100,000 that this is a dog. It's only a probability in the mathematical sense: in math, a probability means something between 0 and 1, where all of the possibilities add up to 1. It's not a probability in the sense that it actually tells you how often this is going to be right or wrong. So for now, just be aware of that: when we talk about these probabilities that come out of neural network training, you can't interpret them in any kind of intuitive way. We will learn about how to create better probabilities down the track. Every time you do another epoch, your network is going to get more and more confident. This is why, when I loaded the weights, I loaded the weights from the very first epoch; if I'd loaded the weights from the last epoch, they all would have been 1's and 0's. So this is just something to be aware of. So hopefully you can all go back and get great results on the Kaggle competition. Even though I'm going to share all this, you will learn a lot more by trying to do it yourself, and only referring to this when and if you're stuck. If you do get stuck, rather than copying and pasting my code, find out what I used and then go to the Keras documentation and read about it, and then try and write that line of code without looking at mine. The more you can do that, the more you'll think: I can do this, I understand how to do this myself. Just some suggestions; it's entirely up to you. Okay, let's move on. I wanted to show you one other thing, which is that the last part of the homework was to redo this on a different dataset. And so I decided to grab the State Farm Distracted Driver competition. The Kaggle State Farm Distracted Driver competition has pictures of people engaged in 10 different types of distracted driving, ranging from drinking coffee to changing the radio station. I wanted to show you how I entered this competition; it took me about a quarter of an hour to enter. All I did was duplicate my Dogs and Cats Redux notebook and then start re-running everything. But in this case, it was even easier, because when you download the State Farm competition data, they have already put it into directories, one for each type of distracted driving, which I was delighted to discover. If I type "tree -d", that shows you my directory structure. You can see that in the directory train, it already had the 10 directories, so I could skip that whole section. So I only had to create the validation and sample sets. If all I wanted to do was enter the competition, I wouldn't even have had to do that. I won't go through it; it's basically exactly the same code I had before to create my validation set and sample. I deleted all of the bits which moved things into separate sub-folders. I then used exactly the same 7 lines of code as before, and that was basically done. I'm not getting good accuracy yet, and I don't know why. So I'm going to have to actually
figure out what's going on with this. But as you can see, this general approach works for any kind of image classification; there's nothing specific about Cats and Dogs. So you now have a very general tool in your toolbox. And all of the stuff I showed you about visualizing the errors, you can use all of that as well. So maybe when you're done you can try this too
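For reference, the handful of reusable lines being described here look roughly like this. This is a sketch based on the course's Vgg16 wrapper; the path and batch size are placeholder assumptions:

```python
from vgg16 import Vgg16

path = "data/statefarm/sample/"   # placeholder; point this at your dataset
batch_size = 64

vgg = Vgg16()
# One sub-directory per class, so the batches come with labels attached
batches = vgg.get_batches(path + 'train', batch_size=batch_size)
val_batches = vgg.get_batches(path + 'valid', batch_size=batch_size * 2)
# Swap the 1000-way ImageNet output layer for one sized to these classes
vgg.finetune(batches)
vgg.fit(batches, val_batches, nb_epoch=1)
```

And the five inspection steps boil down to something like the following, assuming `preds` holds the predicted class per image, `probs` the predicted probability of the first class, `labels` the true classes, and `plots_idx` a small helper that plots images by index; these names follow the lesson notebook loosely and may differ from the exact code:

```python
import numpy as np
from numpy.random import permutation

n_view = 4

# 1. A few examples we got right, at random
correct = np.where(preds == labels)[0]
plots_idx(permutation(correct)[:n_view])

# 2. A few examples we got wrong, at random
incorrect = np.where(preds != labels)[0]
plots_idx(permutation(incorrect)[:n_view])

# 3. The most confident correct predictions for class 0
correct_c0 = np.where((preds == 0) & (preds == labels))[0]
plots_idx(correct_c0[np.argsort(probs[correct_c0])[::-1][:n_view]])

# 4. The most confident predictions for class 0 that were actually class 1
incorrect_c0 = np.where((preds == 0) & (preds != labels))[0]
plots_idx(incorrect_c0[np.argsort(probs[incorrect_c0])[::-1][:n_view]])

# 5. The images the model is most unsure about (probability closest to 0.5)
most_uncertain = np.argsort(np.abs(probs - 0.5))[:n_view]
plots_idx(most_uncertain)
```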

[Time: 50 minute mark] Question: Would this work for CT scans and cancer?
Answer: I can tell you that the answer is yes, because I've done it. The previous company I created was something called Enlitic, which was the first deep-learning-for-medical-diagnostics company. And the first thing I did with 4 of my staff was to download the National Lung Screening Trial data, which is 1000 examples of people with cancer (CT scans of their lungs) and 5000 people without cancer (CT scans of their lungs). We did the same thing: we took ImageNet and fine-tuned it, but in this case instead of Cats and Dogs we had malignant tumor vs non-malignant tumor. We then took the result of that and saw how accurate it was, and we discovered it was more accurate than a panel of 4 of the world's best radiologists. That ended up getting covered on TV, on CNN. So making major breakthroughs in domains is not necessarily technically that challenging. The technical challenges in this case were really about dealing with the fact that CT scans are pretty big, so we had some resource issues. Also, they're black and white, so we had to think about how we would change the ImageNet pre-training to black and white, and stuff like that. But the basic example was not much more or different code than you see here.
Question: Earlier you had a learning rate of .01, is that the right rate?
Answer: The State Farm data is 4 gigabytes and I only downloaded it about half an hour before class started, so I only ran a small fraction of an epoch just to make sure it works. If I ran a whole epoch, it probably would have taken overnight. So let's go back to Lesson 1; there was a little bit at the end that we didn't look at. How many of you have watched this video? Some of you haven't; you need to. Because, as I mentioned a couple of times in emails, the last 2/3 of it is a surprise Lesson 0 of this class, and it's where I teach what convolutions are. So if you haven't watched it, please do. Rachel will add it to the in-class Slack channel and also to the Lesson 2 resources wiki page. It is really, really important that you watch this video. The first 20 minutes or so is more of a general background; the rest is a discussion of exactly what convolutions are. For now, I'll try not to assume too much that you know what they are; the rest of it hopefully will be reasonably stand-alone anyway. But I want to talk about fine-tuning, and I want to talk about why we do fine-tuning. Why do we start with an ImageNet network and then fine-tune it, rather than just train our own network? And the reason is that an ImageNet network has learned a whole lot of stuff about what the world looks like. A guy called Matt Zeiler wrote a fantastic paper a few years ago in which he showed us what these networks learn. And in fact, the year after he wrote this paper, he went on to win ImageNet, so this is a powerful example of why spending time thinking about visualizations is helpful

By spending time thinking about visualizing networks, he realized what was wrong with the networks at the time, made them better, and won the next year's ImageNet [Time: 55 minute mark] We're not going to talk about that; we're going to talk about some of these pictures that he drew. Here are 9 examples of what the very first layer of an ImageNet convolutional neural network looks like, that is, what the filters look like. And you can see here that here is a filter that learns to find a diagonal edge, a diagonal line. So you can see, it's saying: look for something where there are no pixels, then bright pixels, and then no pixels again, so that's finding a diagonal line. Here's something that finds a diagonal line in the other direction. Here's something that finds a gradient, horizontal from orange to blue; here's one diagonal from orange to blue. As I said, these are just 9; there are 60 or so of these filters in Layer 1 of this ImageNet-trained network. So what happens (those of you who have watched the video I just mentioned will be aware of this) is that each of these filters gets placed pixel by pixel, or group of pixels by group of pixels, over a photo, over an image, to find which parts of the image match it, so which parts have a diagonal line. And over here, it shows 9 examples of little bits of actual ImageNet images which match this first filter; as you can see, they are all little diagonal lines. So here are 9 examples which match the next filter, with diagonal lines in the opposite direction, and so forth. The filters in the very first layer of a deep-learning network are very easy to display. This has been done for a long time, and we've also known for a long time that this is what they look like. We also know that the human vision system is very similar; the human vision system has filters that look much the same. To really answer the question of what we are doing back here, I would say watch the video. The short answer is that this is a 7×7 pixel patch which is slid over the image, one group of 7 pixels at a time, to find which 7×7 patches look like that. Here is one example of a 7×7 patch which looks like that. So for example, this gradient: here are some examples of 7×7 patches that look like that. So we know that the human vision system actually looks for very similar kinds of things. The kinds of things it looks for are called Gabor filters; if you want to google for Gabor filters, you can see some examples. It's a little bit harder to visualize what the second layer of a neural net looks like, but Zeiler figured out a way to do it, and in his paper he shows us a number of examples of the second layer of his ImageNet-trained neural network. Because we can't directly visualize them, instead we have to show examples of what the filter can look like. So here is an example of a filter which clearly tends to pick up corners. So in other words, it's taking the straight lines from the previous layer and combining them to find corners. There's another one which is learning to find circles, and another one learning to find curves. And so you can see, here are 9 examples from actual pictures on ImageNet which actually did get heavily activated by this corner filter, and here are some which got activated by this circle filter. The third layer then can take these filters and combine them (remember this is 16 out of 100) in the ImageNet architecture. In Layer 3, we can combine all of those to create even more sophisticated filters. And so in Layer 3, there's a filter which can find repeating geometrical patterns
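To make the "slide a filter over the image" idea concrete before moving further up the layers, here is a tiny illustration of my own (not from the lesson): applying a small diagonal-edge filter to a grayscale image with scipy, which produces a map of where the image matches the filter.

```python
import numpy as np
from scipy.ndimage import correlate

# A small "diagonal line" filter: bright along the diagonal, darker elsewhere
diag_filter = np.array([[ 2., -1., -1.],
                        [-1.,  2., -1.],
                        [-1., -1.,  2.]])

# Any 2-D grayscale image will do; a random one stands in for a real photo here
img = np.random.rand(224, 224)

# The output is large wherever a patch of the image looks like the filter,
# i.e. wherever there is a diagonal line
activation_map = correlate(img, diag_filter)
```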

[Time: 1 hour mark] Here’s a filter, what is it finding? Let’s go look at the examples Well, that’s interesting It’s finding pieces of text And here’s something which is finding edges of natural things, like fur and plants Layer 4 is finding certain kind of dog face, Layer 5 is finding the eyeballs of birds and reptiles, and so forth So there are 16 layers in our VGG network What we do when we fine-tune is we say, let’s keep all of these learnt filters and use them and just learn how to combine the most complex, subtle nuanced filters to find Cats vs Dogs rather than combine them to learn 1000 categories of ImageNet This is why we do fine-tuning So when I answered Annette’s earlier question does this work for CT scans and lung cancer and the answer was yes, these kinds of filters that find dog faces are not very helpful for looking at a CT scan and looking for cancer But these earlier ones that can recognize repeating images, or corners, or curves, certainly are So really regardless of what computer vision work you’re doing, starting with some kind of pre-trained network is almost certainly a good idea Because at some level that pre-trained network has learned to find some kinds of features that are going to be useful for you And so if you start from scratch, you have to learn them from scratch In Cats vs Dogs, we only had 25,000 pictures And so from 25,000 pictures to learn this whole hierarchy of geometric and semantic structures would have been very difficult So let’s not learn it, let’s use one that’s already been learnt on ImageNet, which is 1.5 million images So that’s the short answer to the question why do fine-tuning The longer answer really requires answering the question, what exactly is fine-tuning? And to answer the question what exactly is fine-tuning, we have to answer the question what exactly is a neural network Question: Which layer should you fine-tune from? Answer: We’ll learn more about this shortly, but the short answer is if you’re not sure, try all of them Generally speaking, if you’re doing something with natural images, the second-to-the-last layer is very likely to be the best But I just tend to try a few things And we’re going to see, today or next week, ways that we can actually experiment with it So as per usual, in order to learn about something we will use Excel And here is a deep neural network in Excel Rather than having a picture with a bunch of pixels, I just have 3 inputs A single row with 3 inputs, x1, x2, x3, and the numbers are 2, 3, 1 Rather than trying to pick out whether it’s a dog or a cat, we’re going to assume there are 2 outputs, 5 and 6 So here’s a single row that we’re feeding into a deep neural network So what is a deep neural network? A deep neural network basically is a bunch of matrix products So what I’ve done here is I’ve created a bunch of random numbers They are normally distributed random numbers, and this is the standard deviation that I’m using for my normal distribution, and I’m using 0 as the mean So here’s a bunch of random numbers What if I then take my input vector and matrix multiply them by my random weights, and here it is Matrix multiply that by that, and here is the answer I get So for example, 24.03=2*11.07 + 3*(-2.81) + 1*10.31, and so forth
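Here is the same calculation sketched in numpy. The shapes mirror the spreadsheet example just described, though the random values (and the standard deviation, which is arbitrary at this point) will of course differ:

```python
import numpy as np

x = np.array([2., 3., 1.])       # the single input row
target = np.array([5., 6.])      # the two outputs we'd like to produce

# Three layers of normally-distributed random weights (mean 0)
w1 = np.random.normal(scale=1.0, size=(3, 4))
w2 = np.random.normal(scale=1.0, size=(4, 3))
w3 = np.random.normal(scale=1.0, size=(3, 2))

# The "deep neural network" here is just repeated matrix products
a1 = x @ w1      # first set of activations (4 values)
a2 = a1 @ w2     # second set of activations (3 values)
out = a2 @ w3    # final activations (2 values); compare with target
```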

[Time: 1.05 hour mark] Any of you who are either not familiar with, or a little shaky on, your matrix-vector products: tomorrow, please go to the Khan Academy website, look for linear algebra, and watch the videos about matrix-vector products. They are very, very simple, but you also need to understand them very, very intuitively and comfortably. Just like you understand plus and times in regular algebra, I really want you to get to that level of comfort with linear algebra, because this is the basic operation we're doing again and again.
Question: Can you increase the font size of the formula?
Answer: I don't think I can, but instead what I will do is select it; it's a matrix multiply, and the blue thing is where we find derivatives. So that is a single layer. How do we turn that into multiple layers? Not surprisingly, we create another bunch of weights. And now we take the new bunch of weights times the previous activations with our matrix multiply, and get a new set of activations. And then we do it again: we create another bunch of weights and multiply them by our previous activations to get another set of activations. Note that with the number of columns in your weight matrix, you can make it as big or as small as you like, as long as the last one has the same number of columns as your output. We had two outputs, 5 and 6, so our final weight matrix had to have two columns so that our final activations have two things. With our random numbers, our activations are not very close to what we hoped they would be, not surprisingly. So the basic idea here is that we now have to use some kind of optimization algorithm to make the weights a little bit better, and a little bit better. We'll see how to do that in a moment, but for now, hopefully you are all familiar with the fact that there is such a thing as an optimization algorithm. An optimization algorithm is something that takes some kind of mathematical function and finds the inputs to that function that make the output as low as possible. In this case, the thing we would like to make as small as possible would be something like the sum-of-squares error between the activations and the outputs. I want to point out something here, which is that when we stuck in these random numbers, the activations that came out were not only wrong, they weren't even on the same general scale as the activations that we want. So that's a bad problem. The reason it's a bad problem is that because they're so much bigger than the scale we were looking for, as we change these weights just a little bit, it's going to change the activations by a lot, and this makes it very hard to train. In general, you want your neural network (even with random weights) to start off with activations which are all of similar scale to each other, and the output activations should be of similar scale to the output. For a very long time, nobody really knew how to do this, so for a very long time people could not really train deep neural networks. It turns out that it is incredibly easy to do, and there is a whole body of work talking about neural network initializations. It turns out that a really simple and really effective neural network initialization is called Xavier initialization (named after its creator, Xavier Glorot), and it is 2 divided by (in + out). Like many things in deep learning, you will find this complex-looking thing, the Xavier weight initialization scheme, and when you look into it you will find it is something about this easy. This is about as complex as deep learning gets. So I am now going to go ahead and
implement Xavier Deep Learning Weight Initialization Schemes in Excel So I’m going to go up here and in this cell put in equals 2 divided by (3 in + 4 out), and press enter
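The same scheme in numpy looks something like this. Note that the spreadsheet uses 2/(n_in + n_out) directly as the standard deviation, whereas Glorot's paper takes the square root of that quantity for the standard deviation, and Keras's built-in glorot initializers handle this for you, so treat the helper below as a sketch of the idea rather than the exact formula:

```python
import numpy as np

def xavier_weights(n_in, n_out):
    # Spreadsheet-style simplification: scale the random weights by
    # 2 / (n_in + n_out) so the activations stay at a sensible scale
    scale = 2.0 / (n_in + n_out)
    return np.random.normal(scale=scale, size=(n_in, n_out))

w1 = xavier_weights(3, 4)
w2 = xavier_weights(4, 3)
w3 = xavier_weights(3, 2)
```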

So now my first set of weights has that as its standard deviation [Time: 1.10 hour mark] My second set of weights I also have pointing to the same place, because they also have 4 in and 3 out. And then my third: equals 2 divided by (3 in + 2 out). Done; I have now implemented it in Excel. You can see that my activations are indeed of the right general scale. Generally speaking, you would normalize your inputs and outputs to be mean 0 and standard deviation 1; we want them to be of the same kind of scale. Obviously they're not going to be exactly 5 and 6 because we haven't done any optimization yet, but we don't want them to be 100,000; we want them to be somewhere around 5 and 6. If we start off with them really, really high or really, really low, the optimization is going to be really hard to do. For decades, when people tried to train deep-learning networks, the training took forever or was so incredibly unreliable that it was useless. This one thing, a better weight initialization, was a huge step. It was only about 3 years ago that this was invented; it's not like we're going back a long time, this is relatively recent. Now, the good news is that Keras (and pretty much any decent neural network library) will handle your initialization for you. Until recently, they pretty much all used this; there are some even more recent, slightly better approaches, but they'll give you a set of weights where your outputs will generally have a reasonable scale.
Question: Is it arbitrary what dimensions we're using here?
Answer: So what's not arbitrary is that you are given your input dimensionality; in our case, for example, it would be 224×224 pixels by 3 channels. You are given your output dimensionality; in our case, for example, for Cats vs Dogs it's 2. The thing in the middle, how many columns each of your weight matrices has, is entirely up to you. The more columns you add, the more complex your model, and we're going to learn a lot about that. As Rachel said, this is all about your choice of architecture. So in my first one here I had 4 columns, and therefore I had 4 outputs; in my next one, I had 3 columns, and therefore I had 3 outputs; in my final one I had 2 columns, and therefore I had 2 outputs, and that is the number of outputs that I want. This question of how many columns you have in your weight matrix is where you get to decide how complex your model is, so we're going to see that. So let's go ahead and create a linear model. So we're going to learn how to create a linear model; let's first of all learn how to create a linear model from scratch (and this is something that we did in that original USF Data Institute launch video). Without using Keras at all, I can define a line as being a*x+b; I can then create some synthetic data. I can assume a=3 and b=8, create some random x's, and my y will be a*x+b. So here are some x's and some y's, and not surprisingly the scatterplot looks like so [Time: 1.15 hour mark] The job of somebody creating a linear model is to say: I don't know what a and b are; how

can we calculate them? Forget that we know they're 3 and 8; let's guess that they are -1 and 1. How can we make our guess better? To make our guess better, we need a loss function. A loss function is a mathematical function that will be high if your guess is bad, and low if it's good. The loss function I'm using here is just the sum-of-squares error, which is my actual minus my prediction, squared, added up. So if I define my loss function like that and I say my guess is -1 and 1, I can calculate my average loss and it is 9. So my average loss with my random guesses is not very good. In order to create an optimizer, I need something that can make my weights a little bit better. If I have something that can make my weights better, I can just call it again and again and again. That's actually very easy to do. If you know the derivative of your loss function with respect to your weights, then all you need to do is update your weights by the opposite of that. So remember, the derivative is the thing that says: as your weight changes, your output changes by this amount. That's what the derivative is.
Question: So you start with something random and by iterating you're going to get something that works?
Answer: Yes, let's try it. In this case we have y = a*x + b, and our loss function is actual minus predicted, squared, added up. We're now going to create a function update (upd), which is going to take our "a" guess and our "b" guess and make them a little bit better. And to make them a little bit better, we calculate the derivative of our loss function with respect to "b" and the derivative of our loss function with respect to "a". How do we calculate those? We go to Wolfram Alpha and we enter "d" along with our formula and the thing we want to take the derivative with respect to, and it tells us the answer. So that's all I did: I went to Wolfram Alpha, found the correct derivatives, and pasted them in here. What this means is that this formula here tells me that as I increase "b" by 1, my sum-of-squares error will change by this amount (dydb). And this says that as I change "a" by 1, my sum-of-squares error will change by this amount (dyda). So if I know that dyda=3, if I know that my loss function gets higher by 3 if I increase "a" by 1, then clearly I need to make "a" a little bit smaller; if I make it a little bit smaller, my loss function will go down. So that's why our final step is to say: take our guess and subtract from it our derivative times a little bit. "lr" stands for learning rate, and as you can see, I'm setting it to .01. How much is "a little bit" is something that people spend a lot of time thinking about and studying, and we will spend time talking about it. But you can always use trial-and-error to find a good learning rate. When you use Keras, you will always need to tell it what learning rate to use. It's something where you want the highest number you can get away with; you'll see more of this next week. The important thing to realize here is that if we update our guess with minus-equals our derivative, our guess is going to be a little bit better, because we know going in the opposite direction makes the loss function a little bit lower. So let's run those two things. We've now got a function called update (upd) which, every time you run it, makes our predictions a little bit better.
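Here is roughly what that looks like in code, following the approach just described; treat the exact names, the number of iterations, and the use of the mean rather than the sum in the update as details of this sketch:

```python
import numpy as np

# Synthetic data: the "true" line is y = 3x + 8
x = np.random.uniform(0, 1, 30)
y = 3 * x + 8

# Wild guesses for a and b, and a learning rate
a_guess, b_guess = -1.0, 1.0
lr = 0.01

def sse(y, y_pred):
    # Sum-of-squares error: high for a bad guess, low for a good one
    return ((y - y_pred) ** 2).sum()

def upd():
    global a_guess, b_guess
    y_pred = a_guess * x + b_guess
    # Derivatives of the squared error with respect to b and a
    dydb = 2 * (y_pred - y)
    dyda = x * dydb
    # Step each guess a little bit in the direction that lowers the loss
    a_guess -= lr * dyda.mean()
    b_guess -= lr * dydb.mean()

for _ in range(1000):
    upd()
# a_guess and b_guess should now be close to 3 and 8
```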
Finally, I'm basically doing a little animation here: each frame calls my animate function, which in turn calls my update function (10 times over the course of the animation). So let's see what happens when I animate that. There it is. It starts with a really bad line (my -1, 1) and it gets better and better. So this is how stochastic gradient descent works. Stochastic gradient descent is the most important algorithm in deep learning. Stochastic gradient descent is the thing that starts with random weights (like this) and ends with weights that do what you want them to do

[Time: 1.20 hour mark] As you can see, stochastic gradient descent is incredibly simple and yet incredibly powerful, because it can take any function and find the set of parameters that does exactly what we want with that function. And when that function is a deep learning neural network, that becomes particularly powerful. Just to remind ourselves about the setup for this: we started out by saying this spreadsheet is showing a deep neural network with a bunch of random parameters. Can we come up with a way to replace the random parameters with parameters that actually give us the right answer? So we need to come up with a way to do mathematical optimization. So rather than showing how to do that with a deep neural network, let's see how to do it with a line. So we start out by saying: let's have a line, a*x+b, where a=3 and b=8. Pretend we didn't know that "a" is 3 and "b" is 8. Make a wild guess as to what "a" and "b" might be. Come up with an update function where every time we call it, it makes "a" and "b" a little bit better. Call that update function lots of times and confirm that eventually our line fits our data. Conceptually, take that exact same idea and apply it to these weight matrices.
Question: Shouldn't there be a random element to avoid local minimums?
Answer: The question is whether there is a problem here: as we run this update function, might we get to a point where our function looks like this? Currently we're trying to optimize the sum-of-squares error, and the sum-of-squares error looks like this. A more complex function would kind of look like this. So if we started here and gradually tried to make it better and better, we might get to a point where the derivative is 0 and we then can't get any better; this would be called a local minimum. The question was suggesting a particular approach to avoiding that. Here's the good news: in deep learning you don't have local minima. Why not? The reason is that in an actual deep-learning network, you don't have one or two parameters, you have hundreds of millions of parameters. So rather than looking like this, or even like a 3D version of this, it's a 600-million-dimensional space. And for something to be a local minimum, it means that the stochastic gradient descent has wandered around and got to a point where, in every one of those 600 million directions, it can't get any better, and the probability of that happening is something like 1 in 2 to the power of 600 million. So for actual deep learning in practice, there are always enough parameters that it's basically unheard of to get to a point where there's no direction to go to get better. So the answer is no: for deep learning, stochastic gradient descent is exactly as simple as this. We will learn some tweaks to allow us to go faster, but this basic approach works just fine.
Question: So what if you don't know the derivative? [Time: 1.25 hour mark]
Answer: That's a great question. For a long time, this was a royal GD pain. Anybody who wanted to create stochastic gradient descent in a neural network had to go through

and calculate all the derivatives. And if you've got 600 million parameters, that's a lot of trips to Wolfram Alpha. So nowadays we don't have to worry about that, because all the modern neural network libraries do symbolic differentiation. In other words, it's like they have their own little copy of Wolfram Alpha inside them and they calculate the derivatives for you. So you're never going to be in the situation where you don't know the derivatives; you just tell it your architecture and it will automatically calculate the derivatives. So let's take a look. Let's take this linear example and see what it looks like in Keras. In Keras, we can do exactly the same thing. So let's start by creating some random numbers, but this time let's make it a bit more complex: we're going to have a random matrix with 2 columns. And so to construct our y values, we'll do a little matrix multiply of our x with a vector of [2, 3], and then we'll add in a constant of 1. Here are our x's, and here are the first few y's. So here, 3.2 = (.56 * 2) + (.37 * 3) + 1. Hopefully this looks very familiar, because it's exactly what we did in Excel. How do we create a linear model in Keras? Keras calls a linear model Dense; it's also known in other libraries as fully connected. So when we go Dense with an input of 2 columns and an output of 1 column, we have defined a linear model that can go from this 2-column array to this 1-column output. The second thing we have in Keras is a way to build multiple-layer networks, and Keras calls this Sequential. Sequential takes an array that contains all of the layers that you have in your neural network. So for example, in Excel here, I would have had 3 layers; in a linear model, we have just 1 layer. So to create a linear model in Keras, you say Sequential and pass in an array with a single layer, that is, a Dense layer. A dense layer is just a simple linear layer. We tell it that there are two inputs and one output. This will automatically initialize the weights in a sensible way and automatically calculate the derivatives, so all we have to tell it is how we want to optimize the weights. And we will say: please use stochastic gradient descent with a learning rate of .1, and we're attempting to minimize our loss of mean-squared-error. So if I do that, that does everything except the last solving step that we saw in the previous notebook. To do the solving, we just type fit (lm.fit). Before we start, we can say evaluate (lm.evaluate) to find out our loss function with random weights, which is pretty crappy. And then we run 5 epochs, and the loss function gets better and better and better using the stochastic gradient descent update rule we just learned. So at the end, we can evaluate it and it's better. And then let's take a look at the weights: they should be (2, 3, 1), and they're actually (1.8, 2.7, 1.2). That's not bad. So why don't we run another 5 epochs. The loss function keeps getting better; we evaluate it now and it's better, and the weights are now closer again to (2, 3, 1). So we now know everything that Keras is doing behind the scenes, exactly. Not hand-waving over details; that is it, we now know what it's doing. So if we now say to Keras, don't just create a single layer, but create multiple layers by passing multiple layers to the Sequential, we can start to build and optimize deep neural networks. But before we do that, we can actually use this to create a pretty decent entry to our Cats vs Dogs competition. So forget all the fine-tuning stuff, because I haven't told you how fine-tuning works yet
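As a sketch, the little Keras linear model just described looks something like this (Keras 1.x style, matching the course's conventions such as nb_epoch; the number of rows and the batch size are illustrative):

```python
import numpy as np
from keras.models import Sequential
from keras.layers import Dense
from keras.optimizers import SGD

# Synthetic data: y = 2*x1 + 3*x2 + 1
x = np.random.random((30, 2))
y = np.dot(x, [2., 3.]) + 1.

# A linear model is a single Dense (fully connected) layer: 2 inputs, 1 output
lm = Sequential([Dense(1, input_dim=2)])
lm.compile(optimizer=SGD(lr=0.1), loss='mse')

print(lm.evaluate(x, y, verbose=0))   # loss with random weights: poor
lm.fit(x, y, nb_epoch=5, batch_size=1)
print(lm.get_weights())               # should head towards [2, 3] and 1
```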
How do we take the output of an ImageNet network and as simply as possible create an entry

to our Cats vs Dogs competition? [Time: 1.30 hour mark] So the basic problem here is that our current ImageNet network returns 1000 probabilities, and in a lot of detail. It returns not just cat vs dog; the hierarchy goes Animals -> Domestic Animals, and so on. Ideally, it would just be Cat and Dog here, but it's not; it keeps going: Egyptian cat, Persian cat, and so forth. So one thing we could do is write code to take this hierarchy and roll it up into cats vs dogs. I've got a couple of ideas here as to how we could do that. For instance, we could find the largest probability that's either a cat or a dog (from the 1000) and use that, or we could average all of the cat categories and all of the dog categories and use that. But the downsides are that this would require manual coding for something we should be learning from data and, more importantly, it's ignoring information. So let's say that, out of those 1000 categories, the category for a bone was very high. It's more likely a dog is with a bone than a cat is with a bone, so it ought to take advantage of that: it should learn to recognize environments that cats are in versus environments that dogs are in, or even recognize things that look like cats versus things that look like dogs. So what we could do is learn a linear model that takes the output of the ImageNet model, the 1000 predictions, as its input, and uses the dog/cat label as the target; that linear model would solve our problem. We have everything we need to know to create this model now, so let me show you how that works. Let's again import our VGG model, and we're going to try to do 3 things: 1) for every image, get the true labels, isCat or isDog; 2) get the 1000 ImageNet category predictions, so that will be 1000 floats for every image. Then we're going to use the output of 2) as the input of our linear model, use the output of 1) as the target for our linear model, create this linear model, and build some predictions. So as per usual, we start by creating our validation batches and our batches, just like before. I'll show you a trick. Because one of the steps here, getting the 1000 ImageNet category predictions for every image, takes a few minutes, there's no need to do that again and again. Once we've done it once, let's save the result. Let me show you how you can save Numpy arrays. Unfortunately, most of the stuff you'll find online about saving Numpy arrays takes a very, very long time to run and takes a shitload of space. There's a really cool library called bcolz that almost nobody knows about that can save Numpy arrays very, very quickly and in very little space. So I've created these two little functions here called save_array and load_array, which you should definitely add to your toolbox. They're actually in utils.py, so you can use them in the future. And once you grab the predictions, you can use these to just save the predictions and load them back later, rather than recalculating them each time. I'll show you something else we've got. Before we even worry about calculating the predictions, we just need to load up the images. When we load the images, there are a few things we have to do: we have to decode the jpeg images and we have to convert them to 224×224 pixel images (because that's what VGG expects). That's kind of slow too, so let's also save the result of that. So I've created this little function called get_data, which basically grabs all of the validation images and all of the training images and sticks them in a Numpy array. Here's a cool trick: in
iPython notebook, if you put "??" before something, it shows you its source. So if you want to know what get_data is doing, go "??get_data" and you can see exactly what it's doing: it's concatenating all of the different batches together. Any time you're using one of my little convenience functions, I strongly suggest you look at the source code to see what it's doing. They are all super, super small
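For reference, the two saving helpers are tiny, roughly along these lines (based on the course's utils.py; use "??save_array" to see the exact version):

```python
import bcolz

def save_array(fname, arr):
    # Write the array to disk as a compressed bcolz carray
    c = bcolz.carray(arr, rootdir=fname, mode='w')
    c.flush()

def load_array(fname):
    # Read the whole carray back into an ordinary in-memory numpy array
    return bcolz.open(fname)[:]
```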

[Time: 1.35 hour mark] So I can grab the data (the validation data, the training data) and then I can just save it, so that in the future, rather than having to watch and wait for that to pre-process, I can just go load_array and it loads it off disk. It still takes a few seconds, but this is way faster than having to calculate it directly. So what that does is create a numpy array with my 23,000 images, each of which has 3 colors and is 224×224 in size. If you remember (from Lesson 1), the labels that Keras expects are in a very particular format. Let's look at the format and see what it looks like. Here they are: the format of the labels is that each one has two things, the probability that it's a cat and the probability that it's a dog, and these are always just 0's and 1's. So here, [0. 1.] is a dog, [1. 0.] is a cat, [1. 0.] is a cat, [0. 1.] is a dog. This approach, where you have a vector in which every element is a 0 except for a single 1 for the class you want, is called one-hot encoding, and it is used for nearly all deep learning. So that's why I created a little function called onehot that makes it very easy for you to one-hot encode your data. So for example, if your data was just 0, 1, 2, 1, 0, one-hot encoding that would look like (1,0,0), (0,1,0), (0,0,1), (0,1,0), (1,0,0); the first is the raw form, and the second is the one-hot encoded form. The reason that we use one-hot encoding a lot is that if you take this and do a matrix multiply by a bunch of weights, [w1, w2, w3], you can calculate a matrix multiply; these two are compatible. This is what lets you do deep learning easily with categorical variables. So the next thing I want to do is grab my labels and one-hot encode them by using this onehot function. And so you can take a look at that: you can see that the first few classes are just plain integer class indices, but the first few labels are one-hot encoded, like so: array([[1., 0.], [1., 0.], [1., 0.], [1., 0.]]). So we're now at a point where we can finally do Step #2. To remind you, Step #2 was: get the 1000 ImageNet category predictions for every image. Keras makes that really easy for us; we can just say model.predict and pass in our data. So model.predict with the training data is going to give us the 1000 predictions from ImageNet for our training data, and this one will give it for our validation data. And again, running this takes a few minutes, so I save it, and then I will load it. So you can see that where we had 23000×3×224×224, it is now 23000×1000; for every image we have 1000 probabilities. So let's look at one of them (trn_features[0]). Not surprisingly, nearly all of them are 0. For the 1000 categories, only one of these numbers should be big; it can't be lots of different things, it can't be a cat and a dog and a jet airplane. Not surprisingly, nearly all of these things are very close to 0, and hopefully just one of them is very close to 1
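A sketch of those two steps, the one-hot encoding and the 1000-probability features. Here I use Keras's to_categorical, which does the same job as the course's onehot helper; `model` is assumed to be the underlying vgg.model, and the .bc filenames are just examples:

```python
from keras.utils.np_utils import to_categorical

# One-hot encode the integer class labels (0 -> [1, 0], 1 -> [0, 1])
trn_labels = to_categorical(trn_classes)
val_labels = to_categorical(val_classes)

# Step 2: the 1000 ImageNet probabilities for every image
trn_features = model.predict(trn_data, batch_size=100)
val_features = model.predict(val_data, batch_size=100)

# Save them so we never have to recompute them
save_array(model_path + 'train_lastlayer_features.bc', trn_features)
save_array(model_path + 'valid_lastlayer_features.bc', val_features)
```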

So that's exactly what we expect from those ImageNet probabilities [Time: 1.40 hour mark] So now that we've got our 1000 features for each of our training images and for each of our validation images, we can go ahead and create a linear model. So here it is, here's our linear model. The input is 1000 columns, every one of those ImageNet predictions. The output is two columns: it's a dog or it's a cat. We will optimize it with (I'm actually not going to use SGD) a slightly better thing called RMSprop, which I will teach you about next week. It's a very minor tweak on SGD that tends to be a lot faster, so I suggest in practice we use RMSprop, not SGD; it's almost the same thing. Now that we know how to fit a model once it's defined, we can just go model.fit, and it runs basically instantly, because of how little it has to do. Let's take a look at our model [lm.summary()]: it's just one layer with 2000 weights. So running 3 epochs took 0 seconds and we got an accuracy of .9735. After another 3 epochs, an accuracy of .9770, even better. So you can see this is like the simplest possible model. I haven't done any fine-tuning; all I've done is take the ImageNet predictions for every image and build a linear model that maps from those predictions to Cat or Dog. A lot of the amateur deep-learning papers that you see (one was classifying leaves by whether they're sick, one was classifying skin lesions by type of lesion), often this is all people do: they take a pre-trained model, they grab the outputs, they stick them into a linear model and then they use it. As you can see, it actually works pretty well, often. So I just wanted to point out here that in getting this .9770 result, we have not used any magic libraries at all. It's more code than it looks like because we're saving stuff as we go: we grabbed our batches, we grabbed the data, we turned our images into a Numpy array, we took the Numpy array and ran model.predict on it, we grabbed our labels and we one-hot encoded them, and then finally, we took the one-hot encoded labels and the 1000 probabilities and fed them to a linear model with 1000 inputs and 2 outputs. And then we trained it and we ended up with a validation accuracy of .9770. So what we're doing is digging right deep into the details. We know exactly how SGD works, we know exactly how the layers are being calculated, and therefore we know exactly what Keras is doing behind the scenes. So we started way up high with something that was totally obscure as to what was going on (we were just using it like you might use Excel), and we've gone all the way down to see exactly what's going on, and we've got a pretty good result. So the last thing we're going to do is take this and turn it into a fine-tuning model to get a slightly better result. What is fine-tuning? In order to understand fine-tuning, we're going to have to understand one more piece of a deep-learning model, and this is activation functions; this is our last major piece. I want to point something out to you. In this view of the deep-learning model, we went matrix multiply, matrix multiply, matrix multiply. Who wants to tell me how you can simplify a matrix multiply on top of a matrix multiply on top of a matrix multiply? What's that actually doing? A linear model on a linear model on a linear model is itself a linear model. So in fact, this whole thing could be turned into a single matrix multiply, because it's just doing linear on top of linear on top of linear
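As a sketch, the linear model on top of the 1000 ImageNet probabilities looks roughly like this (Keras 1.x style; the learning rate and batch size are illustrative choices rather than necessarily the notebook's exact values):

```python
from keras.models import Sequential
from keras.layers import Dense
from keras.optimizers import RMSprop

# Map the 1000 ImageNet probabilities to 2 outputs: cat and dog
lm = Sequential([Dense(2, activation='softmax', input_dim=1000)])
lm.compile(optimizer=RMSprop(lr=0.1), loss='categorical_crossentropy',
           metrics=['accuracy'])

lm.fit(trn_features, trn_labels, nb_epoch=3, batch_size=64,
       validation_data=(val_features, val_labels))
```

And the point about stacked linear layers collapsing into one can be checked in a couple of lines of numpy:

```python
import numpy as np

x = np.random.rand(5, 3)
w1, w2 = np.random.rand(3, 4), np.random.rand(4, 2)

# Two linear layers in a row are exactly one linear layer with weights w1 @ w2
assert np.allclose((x @ w1) @ w2, x @ (w1 @ w2))
```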

[Time: 1.45 hour mark] So this clearly cannot be what deep learning is really doing, because deep learning is doing a lot more than a linear model. So what is deep learning actually doing? What deep learning is actually doing is this: at every one of these points where it says activations, we do one more thing, which is to put each of these activations through a non-linearity of some sort. There are various things we can use. Sometimes we'll use tanh, sometimes people use sigmoid, but most commonly these days people use max(0,x), which is called ReLU, or rectified linear. So when you see "rectified linear activation function", people actually mean max(0,x). So if we took this Excel spreadsheet and added max(0,x), replacing the activation with this for each layer, we would now have a genuine, modern deep-learning network. Interestingly, it turns out that this kind of neural network is capable of approximating any given function. In the lesson, you'll see that there is a link to a fantastic tutorial by Michael Nielsen on this topic. What he does is show you how, with exactly this kind of approach where you put functions on top of functions, you can change the parameters (he actually lets you drag them up and down to see what they do), and he gradually builds up so that once you have a function of a function of a function of this type, he shows you how you can create arbitrarily complex shapes. So using this incredibly simple approach, where you have a matrix multiplication followed by a rectified linear (like max(0,x)) and you stack that on top of each other, on top of each other, that's actually what's going on in a deep learning neural network. And you will see that in all of the deep neural networks we have created so far, we have always had this extra parameter "activation=", and generally you'll see "activation='relu'". That's what it's doing: it's saying, after you do the matrix product, do a max(0,x). So what we need to do is take our final layer, which has both a matrix multiplication and an activation function, and remove it. And I'll show you why. If we look at our model, our VGG model, let's take a look at it (vgg.model.summary()). What does the end look like? The very last layer is a dense layer; the very last layer is a linear layer. It seems weird, therefore, that in that previous section we added an extra dense layer on top. Why would we add a dense layer on top of a dense layer, given that this dense layer has been tuned to find the 1000 ImageNet categories? Why would you want to take that and add on top of it something that has been fine-tuned to find cats and dogs? How about we remove this, and instead use the previous dense layer, with its 4096 activations, to find our cats and dogs? So to do that, it's as simple as saying model.pop (that will remove the very last layer), and then we can go model.add and add in our new linear layer with 2 outputs, cat and dog [Time 1.50 minute mark] So when we said "??vgg.finetune" earlier, here is the source code – model.pop, model.add

a dense layer with the correct number of classes. So it's basically doing a model.pop and a model.add(Dense, ...). So once we've done that, we will have a new model which is designed to calculate Cats vs Dogs directly, rather than designed to calculate ImageNet categories and then calculate Cats vs Dogs from those. So when we use that approach, everything else is exactly the same. We then compile it, giving it an optimizer, and then we can call model.fit. Anything where we want to use batches, by the way, we have to use the something_generator version, so model.fit_generator, and we pass in our batches. And if we run it for 2 epochs, you can see we get 97.35. If we run it for a little bit longer, eventually we will get something quite a bit better than our previous linear-model-on-top-of-ImageNet approach; in fact, we know we can, because we got 98.3 with fine-tuning earlier. So that's the only difference between fine-tuning and adding an additional linear layer: we just do a pop first before we add. Once we've calculated it, I would then go ahead and save the weights, and then we can use them again in the future. So from here on in, you'll often find that after I get my fine-tuned model, I will go model.load_weights with 'finetune1.h5', because this is now something that we can use as a pretty good starting point for all of our future Dogs and Cats models. Okay, I think that's about everything I wanted to show you for now. For anybody who is interested in going further during the week, there is one more section here in this lesson showing you how you can train more than just the last layer, but we'll look at that next week as well. So during this week, the assignment is very similar to last week's assignment, but just take it further. Now that you actually know what's going on with fine-tuning and with linear layers, there are a couple of things you can do. One is, for those of you who have not yet entered the Cats vs Dogs competition, get your entry in. And then have a think about everything you know about the evaluation function, the categorical cross-entropy loss function, and fine-tuning, and see if you can find ways to make your model better; see how high up the leaderboard you can get using this information. Maybe you can push yourself a little further: read some of the other forum threads on Kaggle and on our forums, and see if you can get the best result you can. If you really want to push yourself, see if you can do the same thing by writing all of the code yourself; don't use our fine-tune at all, don't use our notebooks at all. See if you can build it from scratch so that you really understand how it works. And then, of course, if you want to go further, see if you can enter not just the Dogs vs Cats competition but one of the other competitions we talk about on our website, such as Galaxy Zoo, or the Plankton competition, or the State Farm Distracted Driver competition
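For reference, the fine-tuning step discussed above boils down to something like the following sketch. It is modeled loosely on the Vgg16.finetune source (check "??vgg.finetune" for the real thing), and the optimizer, learning rate, and file path are illustrative choices rather than the notebook's exact values:

```python
from keras.layers import Dense
from keras.optimizers import RMSprop

model = vgg.model
model.pop()                        # drop the 1000-way ImageNet output layer
for layer in model.layers:
    layer.trainable = False        # keep all the pre-trained layers fixed
model.add(Dense(2, activation='softmax'))   # new cat-vs-dog output layer

model.compile(optimizer=RMSprop(lr=0.01),
              loss='categorical_crossentropy', metrics=['accuracy'])
model.fit_generator(batches, samples_per_epoch=batches.nb_sample, nb_epoch=2,
                    validation_data=val_batches,
                    nb_val_samples=val_batches.nb_sample)
model.save_weights(model_path + 'finetune1.h5')
```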
