Since the news articles themselves are copyrighted by their original publishers, access to this corpus cannot be made public. If you would like access, send a short email to griesshaber@hdm-stuttgart.de.
Download the newest release here
Label | Documents (EN) | Documents (DE) |
---|---|---|
Ausland | 18460 | 28033 |
Uncategorized | 330991 | 577217 |
Ignore | 511 | 3684 |
Politik | 10133 | 72905 |
Wirtschaft | 3355 | 56768 |
Technologie | 3675 | 30890 |
Aktuell | 2566 | 2383 |
Finanzen | 192 | 13340 |
Sport | 4076 | 24932 |
Lokal | 3039 | 19251 |
Lifestyle | 15364 | 24655 |
Kultur | 1957 | 12079 |
Sonstiges | 27030 | 56224 |
Total | 421349 | 922361 |
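
The label distribution is heavily skewed, which matters when the corpus is used for classification. As a rough illustration, the following sketch computes each label's share of the documents per language from the counts in the table above. The names `COUNTS` and `label_shares` are made up for this example and are not part of the corpus or any tooling shipped with it.

```python
# Document counts per label, copied from the table above as (EN, DE).
COUNTS = {
    "Ausland": (18460, 28033),
    "Uncategorized": (330991, 577217),
    "Ignore": (511, 3684),
    "Politik": (10133, 72905),
    "Wirtschaft": (3355, 56768),
    "Technologie": (3675, 30890),
    "Aktuell": (2566, 2383),
    "Finanzen": (192, 13340),
    "Sport": (4076, 24932),
    "Lokal": (3039, 19251),
    "Lifestyle": (15364, 24655),
    "Kultur": (1957, 12079),
    "Sonstiges": (27030, 56224),
}


def label_shares(counts, column):
    """Return each label's fraction of all documents in one column (0 = EN, 1 = DE)."""
    total = sum(row[column] for row in counts.values())
    return {label: row[column] / total for label, row in counts.items()}


if __name__ == "__main__":
    for lang, column in (("EN", 0), ("DE", 1)):
        shares = label_shares(COUNTS, column)
        total = sum(row[column] for row in COUNTS.values())
        print(f"{lang}: {total} documents")
        # Largest labels first, so the imbalance is visible at a glance.
        for label, share in sorted(shares.items(), key=lambda kv: -kv[1]):
            print(f"  {label:<14}{share:6.1%}")
```

In both languages, `Uncategorized` accounts for more than half of the documents, so a classifier trained on the raw labels would likely need that class filtered out or down-weighted.
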
String
String
String
String
String
Date
String
[String]
String
String
String
String
String
String
[String]
String
[[URL]]
[[URL]]
String
@misc{Griesshaber2017,
author = {Grie{\ss}haber, Daniel},
title = {Multilingual News Corpus},
year = {2017},
publisher = {GitLab HdM Stuttgart},
journal = {Git repository},
howpublished = {\url{https://gitlab.mi.hdm-stuttgart.de/griesshaber/german-news-corpus}}
}
Daniel Grießhaber (Twitter/GitHub).