Wednesday, June 10, 2015

It has been some time... Or what I have been up to...

Since my last post, I think I have given up on posting blog entries for a while. It was a pretty busy time frame though: Moving to Seattle for Microsoft, finishing up a couple of final papers and writing for 2 book projects. Plus I think I did more paper reviews than I had done in my entire grad school time, pretty time consuming stuff those reviews... And of course these are somewhat side projects, for the past 2 years I have been working as a software engineer under Bing Ads providing metrics and data insight for core Bing Ads teams. It has been an awesome experience on its own right so far, but it deserves another post to share some ideas about academia and industry.

Outside of work, most of my time was dedicated to book projects that I was lucky enough to be involved in. The first book that I had the chance to co-author, with amazing collaborators Tim Menzies, Leandro Minku, Fayola Peters and Burak Turhan, is called "Sharing Data and Models in Software Engineering" (here is the Amazon link: http://bit.ly/shrngDtMdls). I would like to think that different parts of this book speak to different levels of comfort with various data mining and machine learning techniques. The book is mainly divided into 4 parts. The first two parts start with high level discussions of how to approach a data science project, what factors to consider in terms of user engagement and data collection, then it continues with a brief tutorial on common concepts such as instance and feature selection, algorithms for prediction and clustering and so on. The other two parts deal with more advanced material such as transfer learning, active learning and data privacy. The whole book is a nice overhaul of the practical research we have been doing in the past couple of years. Particularly for data scientists and researchers with a focus on software engineering data, the book can provide a nice revisit of fundamental concepts as well as some more advanced ideas to try out on their data sets.

Machine Learning and Data Science Conference
at Microsoft, explaining the steps, ideas and pitfalls
 that led to our chapter  in
"The Art and Science of Analyzing Software Data"
Another project that I was lucky to take part is somewhat a sequel to the first book, which is called "The Art and Science of Analyzing Software Data: Analysis Patterns" (the Amazon link: http://bit.ly/analysisPatterns). This book is the result of a joint effort from various prominent research groups in the junction of data science and software engineering and it was edited by Christian Bird, Tim Menzies and Thomas Zimmermann. Besides awesome folks involved in this project, what makes this book really cool is that it is a distillation of various research groups' approaches to data science. In other words, it was a reflection time for the researchers to step back and ask: "Hey, what were we doing all along in all these projects? What are the patterns that we instinctively follow, can we distill this information into a formal structure and share?" My contribution to this book is being a co-author of a chapter with the amazing co-authors: Ayse Basar Bener, Ayse Tosun Misirli, Bora Caglayan and Gul Calikli. We have worked together in multiple data science projects in a number of different companies and successfully delivered tangible business outcomes for these companies. In our chapter entitled "Lessons Learned from Software Analytics in Practice", we share the patterns that worked for us in multiple data science projects. Of course there were a ton of pitfalls that we learned along the way, which are probably as important as the patterns to be aware of. In this chapter we also share these pitfalls and our recommendations as to how we have avoided them in the past.

In case you are embarking on a career in data science or you want to see how other research groups in software engineering have approached data science projects, you may also want to take a look at these books.



10 comments:

  1. Excellent Sharing. You have done great job. I gathered lots of new information. . Devops Online Training | Data Science Online Training

    ReplyDelete
  2. Hi, I am really happy to found such a helpful and fascinating post that is written in well manner. Thanks for sharing such an informative post.R Programming Online Training | Hadoop Online Training

    ReplyDelete
  3. Hi, I am really like this blog post. We write this blog very good manner.Thank you so much such a fantastic post.
    Selenium Training in Chennai
    Selenium Course in Chennai
    Selenium Training in Velachery

    ReplyDelete
  4. hai i am isabella,this post very helpful for me.Thank you so much for sharing...... Dot Net Training in Anna nagar

    Dot Net Training in Chennai

    ReplyDelete
  5. hi welcome to this blog. really you have post an informative blog. it will be really helpful to many peoples. thank you for sharing this blog.
    selenium training in chennai

    ReplyDelete
  6. hi welcome to this blog. really you have posted an informative blog. it will be really helpful to many peoples.
    android training in chennai

    ReplyDelete
  7. hi welcome to this blog. really you have post an informative blog. it will be really helpful to many peoples. thank you for sharing this blog.
    java training in chennai

    ReplyDelete
  8. Well Said, you have furnished the right information that will be useful to anyone at all time. Thanks for sharing your Ideas. Data Science Training in Chennai

    ReplyDelete