Persian Dependency Treebank (PerDT)


The Persian Dependency Treebank is a collection of approximately 30,000 Persian sentences annotated with syntactic and morphological information, intended for use in natural language processing and computational linguistics.





- Mohammad Sadegh Rasooli, Manouchehr Kouhestani, and Amirsaeid Moloodi. (2013). Development of a Persian Syntactic Dependency Treebank. In The 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL HLT), Atlanta, USA.


- Dadegan Research Group. (2012). Persian Dependency Treebank, Version 1.0, Annotation Manual and User Guide. Tehran, I.R. Iran: Supreme Council of Information and Communication Technology.


Related Tools:


Dadegan Search - An online tool for exploring the Persian Dependency Treebank and the Valency Lexicon for Persian Verbs


MST parser implementation in C#


Persian Dependency Treebank Normalization Script - A simple Python program that rewrites multi-word verbs such as «گفته می‌شود» as «گفته_می‌شود» in the Persian Dependency Treebank. (This converts the treebank data to the standard CoNLL format, in which whitespace is not allowed inside a field.)
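
The idea behind the normalization can be illustrated with a minimal sketch. This is not the actual script: it assumes the treebank file is tab-separated with one token per line and blank lines between sentences, and the function names `normalize_conll_line` and `normalize_conll` are hypothetical.

```python
def normalize_conll_line(line: str) -> str:
    """Replace spaces inside tab-separated fields with underscores.

    Assumption: columns are separated by tabs, so any space found
    inside a field belongs to a multi-word token such as a compound verb.
    """
    stripped = line.rstrip("\n")
    if not stripped.strip():
        return ""  # a blank line marks a sentence boundary; keep it empty
    fields = stripped.split("\t")
    # Only spaces *inside* a field are replaced; the tab delimiters are kept,
    # and other characters (including the zero-width non-joiner) are untouched.
    fields = [field.strip().replace(" ", "_") for field in fields]
    return "\t".join(fields)


def normalize_conll(text: str) -> str:
    """Apply the per-line normalization to a whole file's contents."""
    return "\n".join(normalize_conll_line(line) for line in text.split("\n"))
```

Applied to a token line whose form column is «گفته می‌شود», the sketch would emit the same line with «گفته_می‌شود» in that column, leaving the column count unchanged.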



(Contact us for more information about the treebank and for inquiries about receiving the data.)