
Ross Wightman on The Robot Brains Season 2 Episode 7

Transcript edited for clarity to the best of our ability; however, errors may remain due to the nature of the audio recording.


Pieter Abbeel: One thing every researcher knows, but few people outside of the community might know, is that most foundational research is actually happening in open source. Indeed, most researchers publish their research findings openly on arXiv, a freely accessible, open repository originally used mostly by physics researchers, while the code underlying many AI breakthroughs is often published on GitHub, where it's readily accessible to anyone. Usually, these papers and code bases originate from the leading universities and tech companies. But the open culture in the community means that, in principle, aspiring researchers can independently access and study the latest AI breakthroughs and code bases and, from there, start making their own contributions. I say "in principle" because that's how I used to think of it: possible in principle, but likely too tall an order for anyone to actually pull off. Today's guest, Ross Wightman, has shown that it is actually possible. As an independent researcher, Ross has grown into one of the most prominent contributors to AI research and code bases. In a previous life he was an engineer at a Canadian unicorn startup; his current full-time occupation is building new models that are freely available on GitHub for use by anyone and everyone. Ross, so great to have you here with us. Welcome to the show. 


Ross Wightman: It's amazing to be here. 


Pieter: Well, before we dive into what you're doing today, let's maybe talk a bit about the journey, how you got here. As I understand it, you actually used to be an engineer at a Canadian startup. How did you land there and how did you become an independent AI researcher and contributor? 


Ross: Well, OK, that story goes back to 2004. I was working with a company doing scientific imaging, so cameras that go on microscopes: low light, low noise. A group got together and decided they wanted to make great cameras for other applications. So we started a company that made surveillance cameras, essentially IP video cameras. We built the camera from the ground up, plus all of the back-end software that recorded the video and transmitted it over networks for viewing, accessing recorded events, and so on. I was with that startup for nine years, from pretty much day one. I was the original firmware developer, and then I ended up taking over the software team as a sort of system architect and tech lead there. By the end, I was the director of software and firmware, and we had built a pretty amazing system. And at some point it was time to move on to other things. 


Pieter: So, well, first of all, I think it's really interesting because in today's world, cameras seem just so readily accessible. And you know, building your own camera system seems almost crazy. But I remember from my own Ph.D. student days working at the Stanford helicopter project. We had to put cameras on the ground to track the helicopters. And it was a big endeavor to figure out what are the right cameras, which ones have enough bandwidth, enough resolution, and low enough latency. And it was just a whole research project in itself to even buy the correct cameras for the problem. It sounds like you actually went ahead and completely built your own cameras. 


Ross: Yeah, it was pretty crazy. In the initial version of one of the cameras, everything went through an FPGA. So we were writing the HDL code, controlling the sensors with logic ourselves, and streaming the data through the FPGA over the network. Then some pretty good ASICs became available that had all the compression built in, so we started using those, and it saved a lot of time and effort. And now you can pretty much just buy camera kits that have most of the code in the chipset and put them in your own housing, or you can buy an OEM camera and tweak some of the APIs inside to match your needs.


Pieter: Back then, right when you started doing this, computer vision wasn't really working yet. So maybe the cameras were just streaming, but not really processing the data intelligently yet. 


Ross: Yeah, that was a huge change, and actually kind of a scary one. Back then, it didn't seem like such a big deal making cameras for surveillance applications. There were always humans monitoring them, or after the fact you'd go back and look at the recording. The analytics at that stage were all pretty much hand-tuned computer vision, and they didn't work very well at all. Then, a few years after I left that company and started tinkering again, I was like, “Wow, this is incredible what you can do now. It's also very scary, going back to that application specifically, what's possible.” So at this point, I very much try to avoid overlapping those two worlds. I know there are still lots of people working on it, but it needs a lot of sensitivity in how you tackle the problems, how you deal with your data, and what the end use cases are. 


Pieter: Why did you feel ready to leave the company and what was next for you at that moment? 


Ross: I feel I'm an early-stage company type of person. I like to have my hands in everything. Things were going very well with that company, but it was starting to grow into a larger company. The pace was starting to slow a little, things were becoming a little more formal, and the environment was changing to the point where I decided I'd be better off on my own, or maybe starting something new. 


Pieter: As I was reading up on you over the last week, it seemed like one of the things you started doing at that point was angel investing. The other was hacking on your own, plugging away at building your own AI systems, which will be the main part of our conversation, of course. But I am curious to touch on the angel investing part. How did you get into that at the time, and how is it going? 


Ross: It's going well. It comes in fits and spurts. After I left the company, I wanted to support other startups, especially in the Vancouver ecosystem. So I joined some angel groups, started going to pitch events, and talked to founders of interesting companies. When I found a fit or an interesting idea, I'd make an angel investment, and I still do that. It's just that I don't often find companies that are in an area I'm interested in or doing something exciting. Vancouver is definitely a much smaller startup ecosystem than Silicon Valley, San Francisco, many other American cities, or even other cities worldwide, but it's been improving steadily over the years. 


Pieter: Well, strong universities are often the feeding ground, or one of the big feeding grounds, along with big tech companies where people get tired of working at a big company and want to do something new. So it seems like, in principle, Vancouver should have those ingredients.


Ross: Yes, it's definitely growing as a startup ecosystem. I'd definitely like to see more hardware startups here, like the ambitious ones in the States. Canadian startups tend to set somewhat more modest targets when it comes to tackling the extra costs and complexities of hardware products or hardware and software hybrids, especially robotics. I don't see too many companies like Covariant in Vancouver. There is one in particular that I was quite excited about recently, but not too many of those. 


Pieter: Yeah, I guess so. Of course, robotics is always, in some sense, one or two notches harder than something purely software-based because once you're interacting with the physical world, mistakes become more costly. You can break physical things, which tends to be harder to repair and so forth. I can see why there is a natural trend towards pure software-based products whenever possible. Now, a lot of people, when they want to get into AI, they tend to join either a university or a big company that has an AI research lab as part of their efforts, right? But in your case, you just started tinkering with it on your own. 


Can you say a bit more about that? How did you even start? What was your initial thing you tried to do and grow from there? 


Ross: Well, after leaving the company, I started looking for startup ideas: what would be next? I spent quite a bit of time brainstorming and tinkering, and a lot of the ideas and interesting things I was seeing out there revolved around AI, so that became a focal point. I didn't actually know that much about it at the time, so I figured, well, I've got to learn, and I learn by doing. So I found Kaggle, and that's really how I started my journey in deep learning. Through Kaggle, I started entering some challenges, and that was highly addictive, so at some point I had to stop, but I learned a lot doing it. And that's actually how TIMM, the PyTorch Image Models library, started: I was collecting models for different vision-based challenges on Kaggle. Eventually, that became the thing I worked on more and more, as opposed to entering the challenges. 


Pieter: Now, I'm not sure everyone is going to be familiar with Kaggle. What is it?


Ross: It's a data science competition platform where data scientists, researchers, engineers, anybody really from around the world, can compete in challenges on different data science topics like vision and NLP. Companies and organizations come to Kaggle with a problem, often including a dataset and a metric they'd like the challenge to be evaluated on. People then enter and come up with solutions. There's a leaderboard throughout the challenge, and at the end there's a winner, or a number of winners, ranked on the final test metric. 
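As a rough illustration of the leaderboard mechanics described above, the sketch below scores submissions against labels that only the organizers hold and ranks them best-first. The team names, the toy labels, and the choice of accuracy as the metric are all hypothetical; each real challenge defines its own metric.

```python
from typing import Dict, List, Tuple


def accuracy(predictions: List[int], labels: List[int]) -> float:
    """Fraction of predictions that match the hidden labels."""
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


def leaderboard(submissions: Dict[str, List[int]], labels: List[int]) -> List[Tuple[str, float]]:
    """Score every submission and sort best-first (higher accuracy is better)."""
    scored = [(team, accuracy(preds, labels)) for team, preds in submissions.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical test labels, held by the organizers and never shown to competitors.
hidden_labels = [0, 1, 1, 0, 1]
submissions = {
    "team_a": [0, 1, 0, 0, 1],
    "team_b": [0, 1, 1, 0, 1],
    "team_c": [1, 0, 1, 0, 0],
}

for rank, (team, score) in enumerate(leaderboard(submissions, hidden_labels), start=1):
    print(f"{rank}. {team}: {score:.2f}")
```

Because competitors only submit predictions and never see `hidden_labels`, a strong score on the final ranking is a fairly trustworthy signal that a solution generalizes.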


Pieter: We've often talked about the ImageNet competition, an image recognition competition that really led to the breakthroughs in AI we're seeing today and to the realization that deep neural networks are best at the problems we're trying to solve. Effectively, Kaggle is like multiple ImageNets running in parallel, with datasets being posted and competitions being run on all kinds of problems, right? And the beauty is that there are two sides to it. One side is what you were doing: solving the problems. But if somebody wants a problem solved, they can also post it there, see how people do on it, and find out whether the problem is solvable when competitors try hard. 


Ross: Yeah, it’s a pretty exciting platform and a great way to learn and get involved, even if you don't rank highly. It's pretty competitive, especially now; there are so many people on it. But you can still learn a lot just by going through the forums and reading the solutions to the different challenges. It's pretty educational and eye-opening how creative some of the solutions can be. 


Pieter: I didn't know that. Are the solutions typically published? 


Ross: Not always fully open source; it depends on the terms of the specific challenge. But most of the winners are more than eager to share their solutions, at least at a very high level, on the forums. So if you look at a challenge after it has ended, you'll typically see posts like the solution for second place, third place, and you can read through them and see some of the block diagrams of their solutions and how they tackled the problem, even if they don't release the whole code. But sometimes the code is released as well, on GitHub or wherever. 

Pieter: This is so interesting for me to learn about. The way I would typically start learning would be to go to classes online, listen to what's being said there, try to do some homework exercises, and work my way through. But in your case, you were like, OK, Kaggle is where people compete, where the best people go to test their capabilities on new datasets, and you just dove right into that, which is really intriguing to me. 


Ross: Yeah, that's definitely my learning style. I try to watch the videos and do as many online courses as I can, but I usually don't make it very far before I get the urge to start hacking and tinkering. I've gone through some of your lectures, and there are so many different resources out there, but I always come back to just getting into the code and trying to make things work. 


Pieter: Now, of course, the beauty of code is that, in some sense, it's concrete, right? A lecture has abstractions and higher-level explanations, but it doesn't always cover how you get something to work, because it's often stated at a more mathematical, symbolic level. Whereas once you're working with code and you have it up and running, you know this is real. Especially since, in machine learning, there's training data and test data, and on Kaggle you don't actually have the test data available to yourself; it's run by the organizers. So you actually know for real whether your system is doing well or not. Now, the other thing you mentioned, which we haven't really talked about much on the podcast, if at all, is that as you started participating in Kaggle competitions, you started using PyTorch. Can you say a bit about what PyTorch is and how come it plays such a big role in AI and deep learning these days? 


Ross: Well, PyTorch is a Python-based machine learning framework. It's fairly specifically focused on deep learning, but you can use it for pretty much anything. I think it's one of the more popular frameworks at this point in time, and increasingly so, especially among researchers who want to build things quickly, experiment, and iterate fast. It's got a great community, which was what drew me to it in the first place. And from the ground up, I think it was designed to be easy to use and easy to experiment with, with quite a bit less boilerplate than some of the other options out there, although many of them are now converging to a very similar interface and user experience. So either PyTorch is starting to look like many of the other options, or they are starting to look more like PyTorch. But yeah, I still stick with PyTorch. I've actually been playing with JAX a bit recently too, and I've been enjoying that. It's pretty powerful and pretty fun to play with, and I think it will be pretty useful going forward, especially if you're doing larger-scale machine learning and deep learning on TPUs and other distributed systems. 


Pieter: Going one step deeper: when you're working with PyTorch or other deep learning frameworks, what is it that you're actually doing as you try to compete in a Kaggle competition? 


Ross: Well, I guess if you're using deep learning as your approach, you're looking for network architectures that can perform well on your problem and have enough capacity, but that at the same time will work within the constraints of the hardware you have. Many of the Python-based libraries, especially PyTorch, make it really easy to switch out different networks, change the layers, and change the size of your models by scaling the number of layers that are stacked together, or the image sizes you're feeding to the network if you're doing an image challenge. You can easily iterate through different choices of parameters that control the model architecture and its size, or the optimization parameters: what kind of optimizer you might be using, what the learning rate is. And then there's the data that's being fed into the network, which is often the most important part: making sure you're handling the data well and inspecting it. Python notebooks are especially useful for analyzing your data, and with Python-based frameworks plus notebooks you can iterate through training your model, looking at your data, analyzing the results with your data in mind, and just keep quickly iterating and improving the results. 
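The iteration Ross describes, sweeping over model depth, input resolution, optimizer, and learning rate while staying within hardware limits, can be sketched in plain Python. The search-space values and the cost proxy (depth times image area) below are hypothetical placeholders, not TIMM settings or a real memory model; in practice each surviving configuration would be handed to a training run.

```python
import itertools
from typing import Any, Dict, Iterator, List

# Hypothetical search space; names and values are illustrative only.
SEARCH_SPACE: Dict[str, List[Any]] = {
    "depth": [18, 34, 50],        # number of layers stacked in the network
    "image_size": [224, 320],     # input resolution fed to the network
    "optimizer": ["sgd", "adamw"],
    "learning_rate": [1e-3, 3e-4],
}


def configurations(space: Dict[str, List[Any]]) -> Iterator[Dict[str, Any]]:
    """Yield every combination in the search space as a dict."""
    keys = list(space)
    for values in itertools.product(*(space[k] for k in keys)):
        yield dict(zip(keys, values))


def rough_cost(config: Dict[str, Any]) -> int:
    """Crude, assumed proxy: deeper nets and bigger images cost more."""
    return config["depth"] * config["image_size"] ** 2


def within_budget(space: Dict[str, List[Any]], budget: int) -> List[Dict[str, Any]]:
    """Keep only configurations whose proxy cost fits the (hypothetical) budget."""
    return [c for c in configurations(space) if rough_cost(c) <= budget]


candidates = within_budget(SEARCH_SPACE, budget=3_000_000)
print(len(candidates), "configurations to try")
print(candidates[0])
```

Filtering before training reflects the point about working "within the constraints of the hardware": the expensive deep, high-resolution combinations are dropped up front rather than discovered by running out of memory.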


Pieter: Now there you are, working, slash playing, with PyTorch and competing in these Kaggle competitions. Was there some kind of ramp-up? How did you see this evolve for yourself, from maybe initially not being so competitive to actually doing well in these competitions? What were some of the things in your learning process that were really helpful? 


Ross: Yeah, I dove in. When I got into Kaggle, I didn't know what a ResNet was. It wasn't that long after I started with Torch7, the older predecessor of PyTorch. Participating in the forums was huge: seeing what other people were doing and getting feedback from the community was pretty important in those early days. And it always goes back to the data. I love the modeling aspect, and that's what TIMM has been focused on, but really, when you're doing the challenges, having a good model is important, but handling your data appropriately is the key. For Kaggle especially, understanding the metric your challenge is evaluated on is also really important. Some challenges are won or lost based on almost hacking the metric, which may not be the thing you want to do in industry, but on Kaggle it can get you a gold medal. 
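To make the point about understanding the metric concrete, here is a small sketch that assumes a binary log-loss metric, chosen only as an example since the transcript does not name one. Log loss punishes confident mistakes very heavily, so clipping predicted probabilities away from 0 and 1 is a common metric-aware adjustment. The clipping bounds and the toy data are hypothetical.

```python
import math
from typing import List


def log_loss(labels: List[int], probs: List[float], eps: float = 1e-15) -> float:
    """Mean binary cross-entropy; eps keeps log() away from zero."""
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(labels)


def clip_predictions(probs: List[float], low: float = 0.02, high: float = 0.98) -> List[float]:
    """Pull every probability into [low, high] so no single mistake is catastrophic."""
    return [min(max(p, low), high) for p in probs]


labels = [1, 1, 1, 0]
probs = [0.99, 0.97, 0.0, 0.01]  # the third prediction is confidently wrong

print("raw:    ", log_loss(labels, probs))
print("clipped:", log_loss(labels, clip_predictions(probs)))
```

Here the clipped predictions score far better because the one confident mistake no longer dominates the average; the cost is a slightly worse score on predictions that were confidently right. The bounds would need tuning against a validation set, and the trick helps the leaderboard number more than the underlying model.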


Pieter: Now it's interesting, because you start out thinking you're getting ready to start your next company, right? And you realize, hey, a lot of the innovation is happening in AI, and you want to become familiar with that. You start diving into Kaggle, competing, and then at some point something happened, right? Because right now, the deep learning models you release on GitHub are often the standard reference, especially TIMM, the PyTorch Image Models repo; it's become a common reference for anybody out there. And there you are, an independent researcher doing things on your own, and all of a sudden people start using your models. Was there a moment where you saw this transition, where you went from just playing in competitions to people suddenly using your work? 


Ross: It's amazing what has happened. I've spent time going back, trying to figure out if there were any aha moments, and I think it was just a long, continual evolution and grind. I'm a grinder. I plug away at things until I solve the problem or get the result I'm looking for. So, yeah, after I decided to focus more on TIMM, it's just been a slow but steady uptick. I was looking at the chart of my GitHub stars recently for a presentation I was preparing, and it's been a pretty steady ramp, with a bit of an acceleration in the past year, especially as more and more people find out about it. Everybody on Kaggle these days who's doing image challenges seems to be aware of it; it's commonly used there. People have captured some of the weights and models in standalone notebooks that can be used in the offline challenges there. And now researchers, companies, and organizations are definitely using it, based on the messages I'm getting in discussion forums, on the GitHub repo, and whatnot. There have been a couple of key model architectures or papers that I've reproduced that caused a bit of a bump here and there. The Vision Transformer was a really big one: the Google Brain Zurich group released that paper, and I had some code and the trained weights up before they managed to get their version out. I think I had to change like two or three lines in my paper-based reproduction for it to match the official code, which was pretty neat. From that point, many of the vision transformer variations have been based on the TIMM code, warts and all. I had a couple of little mistakes in the first version that I smoothed over with a Boolean flag here or there, to make it work with either my original version or the official one. And that somehow managed to propagate into pretty much every vision transformer implementation that I've seen so far. 


Pieter: Hmm. And that's a really big deal, right? If we think about AI in the last ten years, 2012 was the ImageNet moment, with convolutional networks, a specific type of neural network architecture, trained to get the best image recognition performance. Since then, convolutional networks, and recurrent neural networks for sequence modeling, were for many years the main architectures. But then a few years ago, the transformer architecture was introduced, mostly in natural language processing at first, and the Vision Transformer was, in some sense, the first big breakthrough of that new architecture in computer vision. When those things happen, my impression is always that the devil's in the details; I'm sure you feel the same. The first time something like that happens, it's very hard to reproduce. So when you come out with a piece of code that can actually do it, just from the paper, that's pretty unique in the early days of such a new neural network architecture: being able to train it properly. And I noticed that you've been hacking away at Kaggle competitions, then your code became more widely used, and then you actually started writing papers. You put a paper on arXiv, "How to train your ViT?" How did that come about?


Ross: That actually came about because of my reproduction of the Vision Transformer in TIMM. Lucas, from the Google group that made that paper, reached out to me afterwards, and we started chatting a little bit. Some transformer models from a Facebook group had come out around that time, and we were discussing their merits and differences. The Google group really wanted to do some more work on training with smaller or openly accessible datasets like ImageNet, and show that it was easier, or better, to do transfer learning from more data than to try to crank up the augmentation and regularization and train just on ImageNet-1k. So they involved me in that paper because it was very much focused on the practical application of vision transformers and how to train them well. We kind of did a hybrid: they did some of their research on their implementation with their TPUs, I was doing some of the experiments on GPUs in the TIMM code base, and we pulled it all together to make some observations about augmentation and regularization in the context of training vision transformers. Those models are very data hungry, and they benefit from either significantly larger datasets or really, really cranking up the augmentation of your smaller dataset to essentially make it appear larger, if your augmentations are convincing and fit the natural images you're using. 
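A minimal sketch of the idea that augmentation can make a small dataset "appear larger": random horizontal flips and random crops applied to a toy image stored as nested lists. The flip and crop choices are simple examples picked for illustration, not the augmentation and regularization setup studied in the paper, and the toy image is hypothetical.

```python
import random
from typing import List

Image = List[List[int]]  # toy grayscale image: a list of pixel rows


def horizontal_flip(image: Image) -> Image:
    """Mirror each row left to right."""
    return [list(reversed(row)) for row in image]


def random_crop(image: Image, size: int, rng: random.Random) -> Image:
    """Cut a size x size window at a random position."""
    height, width = len(image), len(image[0])
    top = rng.randint(0, height - size)
    left = rng.randint(0, width - size)
    return [row[left:left + size] for row in image[top:top + size]]


def augment(image: Image, crop_size: int, rng: random.Random) -> Image:
    """Randomly flip (50% chance), then randomly crop."""
    if rng.random() < 0.5:
        image = horizontal_flip(image)
    return random_crop(image, crop_size, rng)


rng = random.Random(0)  # fixed seed so runs are repeatable
toy = [[row * 4 + col for col in range(4)] for row in range(4)]
views = [augment(toy, 3, rng) for _ in range(8)]
distinct = {tuple(map(tuple, view)) for view in views}
print(len(distinct), "distinct 3x3 views sampled from one 4x4 image")
```

With 2 flip states and 4 crop positions, this toy image has at most 8 distinct views. Real pipelines stack many more random transformations, which multiplies the variety the model sees from each original example.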


Pieter: Now I'm curious, because you were connecting with the Google team at the time. I can imagine that at this point, people were also trying to recruit you, not just collaborate with you. They must at least have hinted at the notion of, well, maybe you could just join Google, or any of the other companies. But you've remained an independent researcher, right? I'm curious, what's your thinking around that, and why is it so exciting for you to remain an independent researcher?


Ross: Well, I guess I'm a bit of a lone wolf. I like to wake up in the morning and be the one to decide what I'm working on; that keeps me engaged. I move between so many different projects and ideas, and I really enjoy that. And I guess I'm in a position where I don't have to rely on the paycheck. I just prefer it this way. There have definitely been conversations with different companies, and some of them have been intriguing. But at the end of the day, signing up just hasn't ranked very high on my priority list compared to what I'm doing right now: building what I want to build, exploring the ideas I want to explore, and contributing to open source. 


Pieter: Well, it's hard to imagine having a bigger impact than the way you're open sourcing your models and they're being used. Pretty much everyone in vision looks at your models and builds on top of them. So it's hard to imagine having a bigger impact by putting yourself inside one specific company. But at the same time, when you're inside a company, there are often other benefits, like bigger compute resources, which do play a role in AI and deep learning. So I'm curious, what are your thoughts on that, and how do you ensure that you can always run the experiments you want to run?

Ross: Yeah. Well, that's definitely a challenge I run into quite regularly. As I get further into it, the experiments I want to run keep getting bigger and bigger and require more and more data and more compute, especially in the past year. Some of the side projects I've been working on relate to video and other multimodal models, and I'd like to do more experiments on CLIP and DALL-E style models. Those require huge datasets and lots of compute. So there's the TensorFlow Research Cloud, or I guess it's the TPU Research Cloud now; I'm a part of that, and it's been super helpful recently: access to TPUs in the cloud for research purposes from Google. I actually spent some time adding support for PyTorch XLA to TIMM, I guess almost a year ago now, and it's been working pretty well for me, so I've been training many recent models on TPUs. I've also had some conversations with Graphcore, so I might be able to get some experiments running on their hardware once I add some support, but that's still pretty early days. And some collaborators I know who work at NVIDIA really pushed my case, and NVIDIA sent me a refurbished demo unit of a DGX Station, the V100 version. So that was quite helpful, as well as heating my garage right now. 


Pieter: Yeah, winter time for you. You got plenty of heating going on then once you have some of those GPUs running full time. 


Ross: Yeah, I'm in Whistler, and it's already below zero and the garage is like the tropics, it's crazy. 


Pieter: Yeah, maybe you need to distribute your compute cluster over other houses near you, so nobody needs regular heating anymore. 


Ross: I should start that. That should be my business: heat your house with training models. I think I've actually seen a company that's proposing some sort of furnace heater based on some sort of accelerator.


Pieter: See? Might as well; otherwise it seems like a waste of energy in data centers. Now, I think that's really intriguing. Companies are just starting to realize, in many ways, the importance of these open source contributions and how they're helping them in so many ways. And that's why they're helping you, right? Because they're seeing that it helps what they're doing. It helps other people get up to speed. Even if they cannot hire you, they can hire other people who know how to use your models and have learned from you. And something interesting that I hadn't seen until it was recently pointed out to me is that GitHub explicitly allows people to effectively sponsor other people. I was browsing that yesterday, and when I browse your profile, it turns out there are a few sponsorship options, which I imagine you came up with; I thought they were really cool, funny, and interesting. It's possible to buy you a beer for those times when three thousand plus hours are tossed out the window due to bad hyperparameter settings. So next time that happens, let me know; I'd love to buy you a beer. 


Ross: Yeah, GitHub Sponsors is a pretty interesting program, and I eventually signed up for it as my costs, especially cloud costs, started rising. It's been helpful. Several organizations have contributed, like Hugging Face.


Pieter: Yeah, I noticed. Also Andrej Karpathy himself, director of AI at Tesla, is one of the sponsors. That's pretty awesome. And I noticed one of my students is actually one of your sponsors: Aravind Srinivas. I'm like, wow, my students are sponsoring Ross. This is so awesome. 


Ross: Yeah, yeah, no. We had some interesting conversations about some of his hybrid transformer models, like HaloNet and BoTNet, and he really liked the repository, so he decided to sponsor me and contribute. 


Pieter: Yeah, it's pretty amazing, and I recommend everybody check it out, because you put some creativity into the different kinds of sponsorships you can take on. I also really like the burger that is fuel for late-night debugging. I'm just imagining working late at night and ordering a burger to keep going. Now, on this topic, you mentioned it's nice that you have no boss. You can focus on the things you want to learn and the things you want to contribute to, and you can do it all in open source because it's your work; you can put it out there for anybody to use. What does that mean in practice? What does a day in the life of Ross Wightman look like? Today, I guess, you woke up and you're here on the podcast, but let's pick a different day. What does it tend to look like? 


Ross: Yeah. What is it? That's a good question. I get up and get through the normal start-of-the-day breakfast type stuff, and then I go upstairs and sit in front of my computer. Basically, there are so many things written down on some list of things I could do. Sometimes it can actually be overwhelming just to sit there and be like, wow, that's a lot of things to tackle and check off. So I try to cycle between longer-term vision projects, taking little chunks of those, and the more immediate, smaller tasks, like fixing some bugs in this model. One of my main goals as a self-directed researcher and developer is to always get something done, and not get bogged down in all of the possibilities or all of the shiny things that can distract you. Every day, I pick at least a couple of things I can make some forward progress on, and ideally by the end of a week I show some movement on several different tasks. So if I have a really challenging project or idea that I'm working on and I'm roadblocked, say I just can't make it through an abstraction or a detail in a model, or it's not training properly and is blowing up, I'll put it down, go back to the bug list or the other tasks, and pick something that's maybe a little simpler. Switch gears, get that done, check it off, make some progress, and then tackle the harder problems another day or another week. Also, part of promoting TIMM and making people aware of it is having something every week or two to tweet about, something interesting or a new development, to keep engagement up. So I have a goal of making that progress so I'm able to share it with people. 


Pieter: I definitely know the feeling of having a list of things that's way too long to make progress on everything in one day, or even one week, and the importance of just being happy to take one thing at a time. But I'm curious, in your case, when you take that one thing and go about your day, are you just sitting there on your own? Is there a lot of online interaction with other people? What is it like when you're actually working on something? 


Ross: It's usually just me hacking away. I get into the zone and I'm kind of gone to the world, working on whatever I'm working on. The interactions tend to come when there are issues or bugs or feature requests that people are bringing me, emailing me or through the GitHub discussions or issue tracker, asking questions about what I've built, or saying I can't reproduce this, or this is not working, that kind of thing. That's where most of my interactions come from. There are also conversations with some organizations that are using the models. But on a day-to-day basis when I'm developing, it's me, the code, and a bunch of papers all over the screen, just chipping away at it until I get hungry and it's like, OK, time to go eat something. 


Pieter: And are there any things that you do to kind of mix things up? I mean, like, you know, go for runs or play a musical instrument? Or are there other things you do to, you know, maybe keep sane? Because I imagine you cannot code, you know, 16 hours a day. 


Ross: Yeah. Well, I mean, I've got a 19-month-old toddler, so that was a big, big change in life, and that's where most of my time outside of this is spent these days. Also, the angel investing that we discussed earlier, I definitely allocate some time for that, for discussions with different companies and due diligence and whatnot. Before the toddler, being in Vancouver and Whistler, I was a very avid hiker and skier, so lots of off-trail scrambling, hiking, backcountry skiing. With a toddler, we go walking on the trails nearby, just not quite as adventurous, no more of those big vertical-meter days anymore. 


Pieter: Yeah, but I hear that children can learn to ski quite quickly at a young age, so you might not have to wait that long before that's a thing again. 


Ross: Yeah, we just had some snow in the driveway yesterday, actually, and we were trying to convince him to put on these little plastic skis with a little clip on the front to see if he wanted to go down the driveway. But he was like, No, I just want to ride my bicycle. 


Pieter: Oh yeah, if he likes riding a bicycle, I think skiing is not too far off. It might just be another half year or a year. Now, you put out another paper recently. I love the title, “ResNet strikes back.” Can you say a little bit about how that came about? In my mind, in deep learning the devil is really in the details, and it's people like you who spend so much time on the details who come up with new insights. But how did it come about for you? And what are some of the new insights in this work? 


Ross: Yeah, that paper, I think, was a long time coming, actually. People who are familiar with TIMM will know that I've trained a lot of my own models on ImageNet over the past year or two, often to better accuracy and better performance than many of the original weights that were trained when the models were first introduced in their original papers. In doing that, I was often mixing up and recombining different training ingredients, especially on the augmentation side, and getting some really good results. And then I would see papers on new architectures coming out, and they were often comparing their new architectures with ResNet, but going back to the original paper and the original accuracy numbers posted there, which were trained without many of the techniques that are now common. So I often felt that the comparisons were not fair. I'd say, Yeah, well, your architecture is interesting for sure, but claiming that you're three percent or whatever better than ResNet is not actually true, because you focused on your architecture and set up the training recipe so that it performs very well, but the same care and attention wasn't spent on the baselines, which doesn't make sense. It's hard when you're a researcher with limited time and limited resources to focus on your new idea and also spend the same amount of time on all the other existing architectures. But there should be at least some amount of effort, or some awareness of whether there are better baselines, so the comparison can be made a little bit more fair. And I often find that for some of the techniques out there, whether architectures, optimizers, or other augmentations, the new addition can be in the noise when you compare to better baselines, and the results might not actually be that significant if you spend a little more time plugging away at the alternatives. 


Pieter: I know this might be a little too detailed for some of our audience, but I'm really curious, Ross, as you did this work. The devil's in the details, and you uncovered the details that matter. What are the details of the training setup and of the augmentation that matter the most for it to do really well?


Ross: Well, the key augmentations in the “ResNet strikes back” paper that I think really make a difference are used extensively or applied quite heavily…[unintelligible]... So the augmentations are things like rotations, translations, and skew, and also some color augmentations like inverting the image completely, and solarize and posterize, which shift the bits in the image around almost to the point where you're like, that doesn't look anything like the original bird that was in the picture. So it applies very heavy augmentations and you train for more epochs. You run through your dataset many more times with much, much harder to recognize images, but the same optimization problem, and the model doesn't overfit as quickly. It actually learns better; it learns more robustly, having seen all these really, really challenging versions of the picture. CutMix and Mixup are related in that. CutMix takes pieces of one image in your batch and pastes them into other images in the batch, and then modifies the actual target labels to say, Well, you've got a little bit of a bird and a little bit of a dog, so we're going to have both of those activated instead of just one hard label. Mixup is similar, except it's the whole image. It's overlaying one on top of the other, so it's like blending two images and the corresponding labels. The theoretical details on those and how they work are in the papers, but in theory, they change the optimization landscape and somehow make it easier to learn. Again, with deep learning, some of the theory is like, Well, is it correct, or does it just work? It's hard to say sometimes. And then also, random erasing is another one that's kind of deployed with those, and it basically takes parts of the image and just blocks them out with noise. Those were all combined in TIMM in a way that, to my knowledge, hadn't really been done before. 
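For readers who want to see the mechanics described above, here is a minimal pure-Python sketch of three of the pixel operations (invert, solarize, posterize) and of Mixup and CutMix label mixing. It is not TIMM's implementation: images are assumed to be plain lists of 8-bit values (flat lists for the pixel ops and Mixup, lists of rows for CutMix), labels are one-hot lists, and every function name here is hypothetical. The Beta-distributed mixing weight follows the usual Mixup formulation; the default alpha is an arbitrary choice.

```python
import random

# Hypothetical helpers: images are plain lists of 8-bit values (0..255),
# labels are one-hot lists. Real pipelines use PIL images or tensors.

def invert(pixels):
    """Flip every value: 0 becomes 255 and 255 becomes 0."""
    return [255 - p for p in pixels]

def solarize(pixels, threshold=128):
    """Invert only the values at or above `threshold`."""
    return [255 - p if p >= threshold else p for p in pixels]

def posterize(pixels, bits=4):
    """Keep the top `bits` bits of each value and zero the rest."""
    mask = (0xFF << (8 - bits)) & 0xFF
    return [p & mask for p in pixels]

def sample_lam(alpha=1.0, rng=random):
    """Draw a mixing weight from Beta(alpha, alpha), as in Mixup."""
    return rng.betavariate(alpha, alpha)

def mix_labels(label_a, label_b, lam):
    """Weight label_a by lam and label_b by (1 - lam)."""
    return [lam * a + (1 - lam) * b for a, b in zip(label_a, label_b)]

def mixup(img_a, img_b, label_a, label_b, lam):
    """Blend two whole (flat) images and their labels with the same weight."""
    mixed = [lam * a + (1 - lam) * b for a, b in zip(img_a, img_b)]
    return mixed, mix_labels(label_a, label_b, lam)

def cutmix(img_a, img_b, label_a, label_b, box):
    """Paste the rectangle box=(top, left, height, width) of img_b into img_a.

    Images are lists of rows. The label weight follows the pasted area, so a
    patch covering a quarter of the image gives label_b a weight of 0.25.
    """
    top, left, h, w = box
    out = [row[:] for row in img_a]  # copy so img_a is left untouched
    for r in range(top, top + h):
        for c in range(left, left + w):
            out[r][c] = img_b[r][c]
    lam = 1 - (h * w) / (len(img_a) * len(img_a[0]))
    return out, mix_labels(label_a, label_b, lam)

bird, dog = [1.0, 0.0], [0.0, 1.0]
print(solarize([10, 128, 200]))                    # [10, 127, 55]
print(mixup([0, 100], [100, 0], bird, dog, 0.25))  # ([75.0, 25.0], [0.25, 0.75])
```

The "little bit of a bird and a little bit of a dog" target shows up directly in the returned label lists: Mixup weights both labels by the blend factor, while CutMix weights them by the fraction of the image each source occupies.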
So those are all separate papers and separate implementations. Different researchers and different code bases had different versions of them. In TIMM, I replicated all of them, spent a lot of time making sure they integrated and worked well together, and tweaked a couple of details here and there that seemed to improve the training, especially trying not to disturb the image statistics, the mean and standard deviation of the input images. For instance, some implementations of random erasing, or Cutout, which is a similar augmentation, will just use black, so they'll erase part of the image with a black box. But if you're doing that for every image, it will potentially change the mean if your black is not lined up with the mean of your dataset, or it will change the standard deviation of the input images, which could impact the batch norm running mean and variance stats and prevent the model from converging well, especially later in training, I found. 
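The effect on input statistics described above can be illustrated with a small simulation. This sketch assumes an already-normalized single-channel image (values drawn from a standard normal), takes the commonly used ImageNet red-channel mean and std (0.485, 0.229) as assumed normalization constants, and erases the same quarter of the values once with black and once with per-pixel Gaussian noise. The helper names are hypothetical, and a real augmentation would erase a random 2D rectangle rather than a flat slice.

```python
import random
import statistics

# Assumed normalization constants: the widely used ImageNet red-channel
# mean and std. After normalization a raw black pixel (0.0) sits far below 0.
MEAN, STD = 0.485, 0.229
BLACK_NORMALIZED = (0.0 - MEAN) / STD  # roughly -2.12

def erase(values, start, length, fill):
    """Return a copy with values[start:start + length] replaced by fill()."""
    out = list(values)
    for i in range(start, start + length):
        out[i] = fill()
    return out

def stats(values):
    """Population mean and standard deviation."""
    return statistics.mean(values), statistics.pstdev(values)

rng = random.Random(0)
# A flattened, already-normalized "image": about zero mean and unit std.
image = [rng.gauss(0.0, 1.0) for _ in range(10_000)]
# Erase the same quarter of it, once with black, once with per-pixel noise.
black = erase(image, 0, 2_500, lambda: BLACK_NORMALIZED)
noisy = erase(image, 0, 2_500, lambda: rng.gauss(0.0, 1.0))

for name, vals in (("original", image), ("black", black), ("noise", noisy)):
    mean, std = stats(vals)
    print(f"{name:8s} mean={mean:+.3f} std={std:.3f}")
```

Running it should show the black-filled copy's mean pulled well below zero and its standard deviation pushed above one, while the noise-filled copy stays close to zero mean and unit standard deviation. That drift is the kind of shift that batch norm running statistics would otherwise have to absorb.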


Pieter: So it's very interesting: you effectively make the input almost, or often entirely, unrecognizable to humans, yet the network trains better by being forced to also make sense of those new image inputs that are wild variations on the original data. And was there anything on the training side of things that was really important to get the maximum performance? 


Ross: There's the overall training: training longer. The original ResNet used a very common learning rate schedule: drop your learning rate to one tenth every 30 passes through your dataset, for a total of 90 passes. For “ResNet strikes back,” we had several recipes, and the longest was 600 epochs. I've done some even longer runs, 800 epochs and beyond, where you can still squeak out a few more fractions of a percent, but it's definitely diminishing returns at that point. And the schedule was just cosine annealing, so a cosine learning rate schedule. It's a nice, kind of hands-off schedule that tends to perform quite well, although I'm sure there are other options that could be deployed there as well. But I definitely think it's an improvement on the step schedule. 
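The two schedules contrasted above can be written as small functions. This is a sketch: the base learning rate of 0.1 and the zero floor are assumed defaults, the function names are hypothetical, and the warmup epochs that many modern recipes add are omitted.

```python
import math

def step_lr(epoch, base_lr=0.1, step_size=30, gamma=0.1):
    """Step schedule: multiply base_lr by gamma every step_size epochs."""
    return base_lr * gamma ** (epoch // step_size)

def cosine_lr(epoch, total_epochs, base_lr=0.1, min_lr=0.0):
    """Cosine annealing from base_lr at epoch 0 to min_lr at total_epochs."""
    progress = epoch / total_epochs
    return min_lr + 0.5 * (base_lr - min_lr) * (1 + math.cos(math.pi * progress))

# The classic 90-epoch step recipe next to a 600-epoch cosine run.
for epoch in (0, 29, 30, 60, 89):
    print(f"epoch {epoch:3d}: step={step_lr(epoch):.4f}")
for epoch in (0, 150, 300, 450, 600):
    print(f"epoch {epoch:3d}: cosine={cosine_lr(epoch, 600):.4f}")
```

The step schedule holds the rate flat and then cuts it abruptly, while the cosine curve decays smoothly with only the total epoch count to choose, which is part of why it is described as hands-off. In PyTorch, `torch.optim.lr_scheduler.StepLR` and `torch.optim.lr_scheduler.CosineAnnealingLR` provide comparable built-in behavior; the hand-written versions just make the shape of each curve explicit.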


Pieter: Now, bigger picture, I think it's really interesting what you did there, and I think it's part of a bigger trend in our research community, though it's still a smaller trend than many people wish it was. Traditionally, people love to focus on one specific detail, effectively, of a neural net architecture or of a data augmentation, something very specific, because in many ways that's how research tends to move the fastest: you think about one specific new idea and you test out that specific idea. But what you did here, you essentially brought together many of the ideas that were put forward independently in the past and showed how they can work together, and together can actually do much better than anything that had been done before with ResNet architectures, pushing them all the way to the state of the art. Even though people thought that ResNets were not state of the art anymore, now you show that actually they're still state of the art. You just need to bring together all these ideas from individual papers. And now, through your open source code base, everybody can just use it going forward, which is really phenomenal for progress as a community, I think. 


Ross: Yeah, definitely. I mean, I had the luxury of time, to a degree, to be able to explore that combination without a conference deadline or paper quotas or targets to meet. I could just build, explore, and test on my own timeline, and I guess that allowed me to do that, whereas many researchers are definitely focused on, OK, we've got to come up with something new, get a paper out, meet the next conference deadline, meet the ICLR deadline. You don't have as much time or freedom to explore combining past ideas and results. 


Pieter: Ross, I'm curious about your path from the first Kaggle competitions that you participated in to today. When did you start? How many years have you been doing this at this point? 


Ross: That's a good question. It kind of gradually phased in to this current mode where I'm quite focused on TIMM. The Kaggle challenges were a little bit on and off, mixed with some angel investing activity and traveling back then. TIMM as a library on GitHub, I think 2019 is roughly when it started for real, and that's been sort of the basis for growing interest in the formal version of TIMM. Before that point, it was a very loose collection of code and models scattered across multiple computers. So, yeah, I would say 2019 is definitely the starting point for TIMM proper. And over the past couple of years, I've definitely been focused on TIMM for most of my coding time. 


Pieter: And I'm curious, if somebody wanted to follow your path today and get into A.I. on their own, you know, in an independent way, not necessarily joining a research organization, would you still recommend starting with Kaggle? And more generally, what would you recommend in terms of how to get started? 


Ross: I would definitely still recommend Kaggle as a way to get started, and also to get involved in the community, meet people, and learn. It's become much bigger; there are way more people competing there now. Having not competed recently, I don't know what the sense of community is like these days, but it's definitely worth looking at. There are other machine learning challenges and other platforms out there. I think AIcrowd had some interesting ones, especially reinforcement learning based ones; those looked pretty exciting. With GitHub and Twitter especially, I think you just have to get out there, start following people, and interact in the different forums and social media where other researchers are discussing ideas, and just get involved. I still think there's definitely a lot of room to participate, build something new, and go the path that I did. In terms of natural language, I guess now the EleutherAI group is an impressive, pretty open, open source collective of researchers that's doing some really exciting things with their own reproductions of GPT. And in terms of open source organizations, Hugging Face has been impressive in terms of staying quite open and visible. But if you are doing it yourself, I still think that's definitely possible. You just have to make people aware of what you're building and keep plugging away at it. Eventually, I think you'll get some followers if you're building something that's useful. 


Pieter: Now, I imagine one of the biggest challenges when doing this is that so much is happening in AI, especially today, but even when you started, so much was happening. Wouldn't it be easy to feel like, you know, the field is far ahead, it's hard to catch up, it's hard to become a meaningful contributor? I'm curious, how did you handle that when you started out? How did you, you know, essentially not worry about it, just keep going, and know that you were making progress and were going to get there? 


Ross: Well, that goes back to the way that I measured my progress or set my goals: one bit at a time. Just get something useful out there: find a new result, build a new model, make some progress, satisfy myself. I'm a very harsh critic of myself, so satisfying myself is hard, and that's always been the thing that drives me forward. I want to be proud of what I've built. Maybe I'm somewhat unique in that, but I can iterate like that and keep myself motivated by plugging away at hard problems until I achieve something, and that then drives me forward to do more and build on that. 


Pieter: As you have TIMM, the PyTorch Image Models library, on GitHub and more and more people are using it, I can imagine that you're starting to see not just researchers use it, but maybe startups and bigger companies using it, not just for research but for new applications. So I'm curious, are you seeing some new computer vision applications emerge that are exciting to you? 


Ross: I've definitely seen TIMM references popping up in different research papers, which is always exciting to see. In terms of new and novel applications, I'm drawing a blank. I'm sure there are some; I just don't have them in my head at the moment. 


Pieter: Because it seems like it could actually feed your angel investing, in principle. You could see people who do interesting things with your library. 


Ross: Yeah, there were some random emails that I got from some startups that were like, Hey, we saw TIMM. Or there was one where I had a reimplementation of PoseNet. Google released a model that tracks human pose, and they made it available through the web browser version of TensorFlow. This company contacted me; they're doing a yoga exercise tracking startup, and they're like, Are you interested? I was like, That's a pretty cool idea, actually, but I'm not sure if that's up my alley. But yeah, it's really neat to see someone using something I built for an idea like that. 


Pieter: Now, when you zoom out and think about the future of AI, what do you think are some of the exciting things coming our way in the next few years, maybe even longer, five to 10 years? 


Ross: Well, recently, I guess the really large language models have been pretty exciting, and from there, the combination of that with vision. So DALL-E and CLIP: being able to feed a model a description of something and then have it render an image that is sometimes very striking and realistic, or a very interesting interpretation of the text, is, I think, incredibly cool. So in the nearest term, in that vein, I think there are going to be some really neat generative art applications for images, and potentially, with a little more compute, we'll see some cool generated video clips from text descriptions down the road. I mean, everyone's vision of what will be possible in five to 10 years is quite different, and I'm not sure I have any vision myself of what will be realized. I just feel there are going to be some big inflection points where we make a rapid leap at some point. Right now, it feels like we're making progress, but maybe we're a little bit stuck. We need some new ideas to really push things forward to the point where we have models that generalize, or do better than just modeling probabilities and making predictions. The word understand is often associated with these large models because it looks like they understand. But do they really? Is our understanding really just more data and more neurons, or is there something more to it? It's really hard to say. Maybe by just scaling up the current approaches we'll hit something exciting, and when we see it we'll realize, oh yeah, that's it. Or we could be missing some key ingredients. It's hard to say which it is, but there are definitely those two camps, and I often see debates and back and forth on Twitter and whatnot. I haven't set myself firmly in either camp. I'm more open to seeing what evolves, and in the meantime, I'm going to keep working away and trying to keep on top of everything. 


Pieter: Well, Ross, I think we covered everything that we had laid out here. Is there something you had in mind that would be fun to cover, but that we didn't get to? 


Ross: I mean, there are other open source organizations; I think I already mentioned EleutherAI and Hugging Face briefly. I think it's maybe important to highlight how important releasing code is. Even Google and Facebook are remarkable in that right now. So much is being put out there that people like me can work with and build on, and organizations like Hugging Face are also very much working in the open for most of their efforts. And maybe something else, like the ImageNet dataset, that has been hugely influential. It's amazing how much it's still used a decade on from its original creation. All of TIMM, PyTorch Image Models, would not have been possible without that dataset existing. Going forward, there are much larger datasets that are often in the hands of private enterprises, like the Googles and Facebooks of the world, that people like myself don't have access to. So I guess I am concerned about where the next ImageNet is going to come from. Who's going to make it? Will it be open and accessible for everyone? I think it could be harmful if we don't see more developments that are done in the open and available to all, like ImageNet. I don't know if those were interesting talking points. 


Pieter: They are. I'm kind of wondering if maybe we should tee them up, or if maybe Henry is able to just splice what you already said into the conversation as is. Henry, do you want to opine on that for a moment? Well, Ross, this was absolutely amazing. Thanks so much for being on the show.

Ross: Yeah, thanks so much for having me. It was a really fun time. 
