Publications

2018

Cross-Project Code Clones in GitHub Empirical Software Engineering, 2018

Within-Ecosystem Issue Linking: A Large-Scale Study of Rails Proceedings of the 7th ACM/IEEE International Workshop on Software Mining

On the Naturalness of Proofs Proceedings ESEC/FSE (NIER Track) 2018 

Whom Are You Going to Call?: Determinants of @-Mentions in GitHub Discussions arXiv:1806.08457, 2018

One Size Does Not Fit All: An Empirical Study of Containerized Continuous Deployment Workflows, FSE 2018

A Survey of Machine Learning for Big Code and Naturalness ACM Computing Surveys, 51(4) 2018 pdf

Mining Semantic Loop Idioms IEEE Transactions on Software Engineering 2018 44(7) pdf

Modern Food Foraging Patterns: Geography and Cuisine Choices of Restaurant Patrons on Yelp IEEE Transactions on Computational Social Systems, vol. 5, no. 2, pp. 508-517, June 2018. doi: 10.1109/TCSS.2018.281965

A Clustering-based Approach for Mining Dockerfile Evolutionary Trajectories SCIS

Determinants of quality, latency, and amount of Stack Overflow answers about recent Android APIs PLOS One, March 2018, https://doi.org/10.1371/journal.pone.0194139

2017

Are Deep Neural Networks the Best Choice for Modeling Source Code?  ESEC/FSE 2017 pdf

Recovering Clear, Natural Identifiers from Obfuscated JavaScript Names.  ESEC/FSE 2017 pdf

Some From Here, Some From There: Cross-Project Code Reuse in GitHub MSR 2017 pdf ACM Distinguished Paper Award

A Large Scale Study of Programming Languages and Code Quality in Github CACM, 60 (10), p.91-100, 2017
CACM also ran a Technical Perspective article about our article. It is available here.

How Do Software Engineering Practices Change Following Adoption of Continuous Integration?  32nd IEEE/ACM International Conference on Automated Software Engineering, 2017

Perceived Language Complexity in GitHub Issue Discussions and Their Effect on Issue Resolution 32nd IEEE/ACM International Conference on Automated Software Engineering, 2017

Social synchrony on complex networks IEEE Transactions on Cybernetics, PP(99), p.1-12, 2017

2016

Tracing distributed collaborative development in apache software foundation projects Empirical Software Engineering, 2016, doi:10.1007/s10664-016-9463-3

Initial and Eventual Software Quality Relating to Continuous Integration in GitHub arXiv:1606.00521, 2016

Converging Work-Talk Patterns in Online Task-Oriented Communities PLOS One 11(5): e0154324. doi:10.1371/journal.pone.0154324, 2016

Stochastic Actor-Oriented Modeling for Studying Homophily and Social Influence in OSS Projects, (accepted), ESE 2016  pdf

Belief and Evidence in Empirical Software Engineering, (accepted), ICSE 2016 pdf

On the “Naturalness” of buggy code (accepted), ICSE 2016 pdf

The Sky is Not the Limit: Multitasking on GitHub Projects (accepted), ICSE 2016

2015

On the “Naturalness” of buggy code
Unpublished, on arXiv

Developer Migration in the GitHub Ecosystem ESEC/FSE 2015

Quality and Productivity Outcomes Relating to Continuous Integration in GitHub  ESEC/FSE 2015

CACHECA: A Cache Language Model Based Code Suggestion Tool. ICSE 2015 Demonstration Track pdf

Gender and Tenure Diversity in Github Teams. CHI 2015  pdf

Assert Use in GitHub Projects. ICSE 2015  pdf

Wait For It: Determinants of Pull Request Evaluation Latency on GitHub MSR 2015 pdf

Will they like this? Evaluating Code Contributions With Language Models  MSR 2015 pdf

New Initiative: Naturalness of Software. ICSE 2015 NIER Track pdf  Winner, Best Paper Award

A Large Scale Study of Programming Languages and Code Quality in Github. The version currently in the ACM DL has been updated; see this pdf version for errata/details. We have requested an update to the ACM DL version.

2014

On the Localness of Software FSE 2014  pdf

The Plastic Surgery Hypothesis FSE 2014 pdf

Panning Requirement Nuggets in Stream of Software Maintenance Tickets FSE 2014

Focus-Shifting Patterns of OSS Developers and Their Congruence with Call Graphs FSE 2014 pdf

A Large Scale Study of Programming Languages and Code Quality in Github FSE 2014 pdf

Comparing Static bug finders and Statistical defect prediction ICSE 2014  pdf  DATA

How Social Q&A Sites are Changing Knowledge Sharing in Open Source Software Communities CSCW 2014 

2013

Using and Asking: APIs Used in the Android Market and Asked About in StackOverflow SocInfo2013 pdf

Sample Size vs. Bias in Defect Prediction ESEC/FSE 2013  pdf

Asking for (and about) Permissions Used by Android Apps  MSR 2013

Dual Ecological Measures of Focus in Software Development. ICSE 2013  pdf  Winner, ACM SIGSOFT Distinguished Paper Award

How, and Why Process Metrics are Better ICSE 2013  pdf

2012

To what extent could we detect field defects? ASE 2012  pdf 

When Would This Bug Get Reported? ICSM 2012  pdf 

MIC Check: A Correlation Tactic for ESE Data MSR 2012  pdf

On the “Naturalness” of software, Appeared in ICSE 2012  pdf (Expanded Version!)

Recalling the Imprecision of Cross-Project Defect Prediction, Appeared in FSE 2012 pdf

Cohesive and Isolated Development with Branches, Appeared in FASE 2012

Clones: what is that smell? Accepted to Springer-Verlag International Journal on Empirical Software Engineering. pdf

2011

Got Issues? Do New Features and Code Improvements Affect Defects? WCRE 11  pdf

Ecological Inference in Empirical Software Engineering. ASE 2011  pdf  Winner, ACM SIGSOFT Distinguished Paper and ASE 2011 Best Paper Awards.

BugCache for Inspections: Hit or Miss? SIGSOFT FSE 2011  pdf

Don’t Touch My Code! Examining the Effects of Ownership on Software Quality. SIGSOFT FSE 2011   pdf

A Simpler model of software readability. MSR 2011  pdf

Operating System Compatibility Analysis of Eclipse and Netbeans Based on Bug Data. MSR 2011 Mining Challenge

Ownership, Experience and Defects: a fine-grained study of Authorship. ICSE 2011  pdf

An Empirical Study on the Influence of Pattern Roles on Change-Proneness accepted to Empirical Software Engineering Journal Springer-Verlag, 2011. pdf

2010

The missing links: bugs and bug-fix commits. SIGSOFT FSE 2010  pdf

Validity of Network Analyses in Open Source Projects. MSR 2010  pdf

Clones: What is that Smell? MSR 2010  pdf  Winner, Best Paper Award, MSR 2010

Thex: Mining Metapatterns in Java. MSR 2010  pdf

2009

Putting it All Together: Using Socio-Technical Networks to Predict Failures ISSRE 2009.

Fair and Balanced? Bias in bug-fix Datasets SIGSOFT FSE 2009  pdf

Promises and Perils of Mining Git MSR 2009 pdf

Modeling and verifying a broad array of network properties Europhysics Letters (EPL), 2009  pdf

Does Distributed Development Affect Software Quality? An Empirical Case Study of Windows Vista,  ICSE 2009  pdf   Winner, ACM SIGSOFT Distinguished paper award