0 Comments Posted in:

On a project I am working on there is a growing number of files that are in Source Control but are not actually referenced by any .csproj files. I decided to write a quick and dirty command line program to find these files, and at the same time learn a bit of LINQ to XML.

During the course of my development, I ran into a couple of tricky issues. First was how to combine some foreach loops into a LINQ statement, and second was to construct the regex for source file matching. Both I guess I could have solved myself with a bit of time reading books, but I decided to throw them out onto Stack Overflow. Both were answered within a couple of minutes of asking. I have to say this site is incredible, and rather than treating it as a last resort for questions I have reached the end of my resources on, I am now thinking of it more like a super-knowledgeable co-worker who you can just ask a quick question and get a pointer in the right direction.

Here's the final code. I'm sure it could easily be turned into one nested LINQ query and improved on a little, but it does what I need. Feel free to suggest refactorings and enhancements in the comments.

using System.Text;
using System.IO;
using System.Xml.Linq;
using System.Text.RegularExpressions;

namespace SolutionChecker
{
    public class Program
    {
        public const string SourceFilePattern = @"(?<!\.g)\.cs$";

        static void Main(string[] args)
        {
            string path = (args.Length > 0) ? args[0] : GetWorkingFolder();
            Regex regex = new Regex(SourceFilePattern);
            var allSourceFiles = from file in Directory.GetFiles(path, "*.cs", SearchOption.AllDirectories)
                                 where regex.IsMatch(file)
                                 select file;
            var projects = Directory.GetFiles(path, "*.csproj", SearchOption.AllDirectories);
            var activeSourceFiles = FindCSharpFiles(projects);
            var orphans = from sourceFile in allSourceFiles                          
                          where !activeSourceFiles.Contains(sourceFile)
                          select sourceFile;
            int count = 0;
            foreach (var orphan in orphans)
            {
                Console.WriteLine(orphan);
                count++;
            }
            Console.WriteLine("Found {0} orphans",count);
        }

        static string GetWorkingFolder()
        {
            return Path.GetDirectoryName(typeof(Program).Assembly.CodeBase.Replace("file:///", String.Empty));
        }

        static IEnumerable<string> FindCSharpFiles(IEnumerable<string> projectPaths)
        {
            string xmlNamespace = "{http://schemas.microsoft.com/developer/msbuild/2003}";
            
            return from projectPath in projectPaths
                   let xml = XDocument.Load(projectPath)
                   let dir = Path.GetDirectoryName(projectPath)
                   from c in xml.Descendants(xmlNamespace + "Compile")
                   let inc = c.Attribute("Include").Value
                   where inc.EndsWith(".cs")
                   select Path.Combine(dir, c.Attribute("Include").Value);
        }
    }
}
Want to learn more about LINQ? Be sure to check out my Pluralsight course More Effective LINQ. I'm also speaking on LINQ at Techorama Netherlands 2018 on 3rd October, so I'd love to see you there if you can make it.
Vote on HN

Comments

Comment by Joel

Using a Hashset as the result of FindCSharpFiles() provides a significant speed improvement. Our 1000+ file project is analyzed much much faster.

Comment by Mark H

thanks for the tip Joel. I haven't really got into the hashset class as our project is on .NET 2.0 still. Looks like its a useful class.