Check out documentation for git-import:
./clickhouse git-import --help
Then the tool can be run directly inside the git repository.
It will collect data like commits, file changes and changes of every
line in every file for further analysis.
It works well even on the largest repositories, like Linux or Chromium.
Example of a trivial query:
SELECT author AS k, count() AS c FROM line_changes WHERE
file_extension IN ('h', 'cpp') GROUP BY k ORDER BY c DESC LIMIT 20
Example of a non-trivial query - for each author, how much of their
code is removed by other authors:
SELECT k, written_code.c, removed_code.c,
    round(removed_code.c * 100 / written_code.c) AS remove_ratio
FROM (
    SELECT author AS k, count() AS c
    FROM line_changes
    WHERE sign = 1 AND file_extension IN ('h', 'cpp')
        AND line_type NOT IN ('Punct', 'Empty')
    GROUP BY k
) AS written_code
INNER JOIN (
    SELECT prev_author AS k, count() AS c
    FROM line_changes
    WHERE sign = -1 AND file_extension IN ('h', 'cpp')
        AND line_type NOT IN ('Punct', 'Empty')
        AND author != prev_author
    GROUP BY k
) AS removed_code USING (k)
WHERE written_code.c > 1000
ORDER BY c DESC LIMIT 500
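To make the logic of the second query easier to follow, here is a minimal Python sketch of the same computation on a handful of made-up rows. The sample rows, the author names and the helper name remove_ratios are hypothetical; only the column names come from the query above. It also assumes that ORDER BY c refers to the written line count, and it lowers the 1000-line threshold so the toy data produces output.

```python
from collections import Counter

# Hypothetical sample of line_changes rows; only the column names come from the query.
# Each row is (author, prev_author, sign, file_extension, line_type).
ROWS = [
    ("alice", "", 1, "cpp", "Code"),
    ("alice", "", 1, "cpp", "Code"),
    ("alice", "", 1, "h", "Code"),
    ("alice", "", 1, "cpp", "Empty"),       # skipped by the line_type filter
    ("bob", "", 1, "cpp", "Code"),
    ("bob", "", 1, "md", "Code"),           # skipped by the file_extension filter
    ("bob", "alice", -1, "cpp", "Code"),    # bob removed a line written by alice
    ("alice", "alice", -1, "cpp", "Code"),  # skipped: author removed their own line
    ("alice", "bob", -1, "cpp", "Code"),    # alice removed a line written by bob
]


def remove_ratios(rows, min_written=1):
    """Mirror the two subqueries and the join; min_written stands in for 1000."""
    written = Counter()  # lines added, keyed by author (written_code)
    removed = Counter()  # lines removed by someone else, keyed by prev_author (removed_code)
    for author, prev_author, sign, ext, line_type in rows:
        if ext not in ("h", "cpp") or line_type in ("Punct", "Empty"):
            continue
        if sign == 1:
            written[author] += 1
        elif sign == -1 and author != prev_author:
            removed[prev_author] += 1

    result = []
    # INNER JOIN ... USING (k): keep only authors present on both sides.
    for k in written.keys() & removed.keys():
        if written[k] > min_written:
            ratio = round(removed[k] * 100 / written[k])
            result.append((k, written[k], removed[k], ratio))
    # Assumption: ORDER BY c DESC sorts by the written line count.
    result.sort(key=lambda row: row[1], reverse=True)
    return result


if __name__ == "__main__":
    for row in remove_ratios(ROWS):
        print(row)
```

Each output tuple is (author, written lines, lines removed by others, remove ratio in percent), which matches the column order of the query.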
Changing the content from an HTML page to a shell script based on user-agent is a pretty bad abuse of HTTP. Why not at least require `-H 'Accept: text/x-shellscript'`? Or be more basic and give the script its own URL?
These are totally legit concerns. However, the site has behaved this way for quite some time, and many ClickHouse installation scripts may rely on it, so we will keep it for backward compatibility. We will add the usual install.sh URL later and start sharing it more often.
(Pull request is in ... it should be deployed on Monday and you can use https://clickhouse.com/install.sh ). Love the feedback, please keep it coming!
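For anyone curious what the difference between the two approaches looks like on the server side, here is a rough Python sketch. It is not how clickhouse.com is implemented: both function names are hypothetical, the "curl/" prefix check is only an assumption about what user-agent sniffing might look like, and the headers are taken as a plain dict with exact-case keys to keep it short.

```python
def pick_body_by_accept(headers):
    """Serve the script only when the client explicitly asks for it."""
    accept = headers.get("Accept", "")
    # Drop parameters such as ";q=0.8" and compare bare media types.
    media_types = [part.split(";")[0].strip().lower() for part in accept.split(",")]
    return "script" if "text/x-shellscript" in media_types else "html"


def pick_body_by_user_agent(headers):
    """Guess from the client name (assumed behaviour, for illustration only)."""
    user_agent = headers.get("User-Agent", "").lower()
    return "script" if user_agent.startswith("curl/") else "html"


if __name__ == "__main__":
    curl_default = {"User-Agent": "curl/8.0.1", "Accept": "*/*"}
    print(pick_body_by_user_agent(curl_default))  # sniffing: script
    print(pick_body_by_accept(curl_default))      # explicit Accept: html
```

With the Accept-based variant a plain curl request still gets the HTML page, and the script is only returned when the caller adds -H 'Accept: text/x-shellscript'.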
Download clickhouse: curl https://clickhouse.com/ | sh