Doctran
Doctran is a Python package that uses LLMs and open-source NLP libraries to transform raw text into clean, structured, information-dense documents that are optimized for vector space retrieval. You can think of Doctran as a black box where messy strings go in and nice, clean, labelled strings come out.
Installation and Setup
pip install doctran
Document Transformers
Document Interrogator
See a usage example for DoctranQATransformer.
from langchain_community.document_transformers import DoctranQATransformer
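A minimal sketch of how the interrogator might be called. It assumes a recent `langchain_community` release in which the synchronous `transform_documents` method is implemented (older releases exposed only the async `atransform_documents`), an OpenAI key in the `OPENAI_API_KEY` environment variable, and the class being importable from `langchain_community.document_transformers`. The helper name `interrogate` and the default model name are illustrative and not part of the library.

```python
import os


def interrogate(text, model="gpt-4"):
    """Run DoctranQATransformer over one string and return the Documents.

    `interrogate` is a hypothetical helper; the default model name is an
    assumption, so substitute one your OpenAI account can access.
    """
    # Imported lazily so this module loads even where doctran is not installed.
    from langchain_core.documents import Document
    from langchain_community.document_transformers import DoctranQATransformer

    transformer = DoctranQATransformer(
        openai_api_key=os.environ["OPENAI_API_KEY"],
        openai_api_model=model,
    )
    # The generated question/answer pairs are assumed to land in each
    # Document's metadata (under a "questions_and_answers" key).
    return transformer.transform_documents([Document(page_content=text)])
```

Calling `interrogate("...")[0].metadata` should then show the generated pairs, which can be indexed alongside the original text so that user questions match stored questions more closely.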
Property Extractor
See a usage example for DoctranPropertyExtractor.
from langchain_community.document_transformers import DoctranPropertyExtractor
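The extractor is driven by a list of property definitions. The sketch below shows one plausible schema and a thin wrapper around the transformer. The property names, the helper `extract_properties`, and the default model are assumptions made for illustration. It also assumes a release in which the synchronous `transform_documents` is implemented; if it is not, the async `atransform_documents` would be the method to try.

```python
import os

# Hypothetical schema: each dict describes one field the LLM should extract.
PROPERTIES = [
    {
        "name": "category",
        "description": "What type of document this is.",
        "type": "string",
        "enum": ["update", "action_item", "announcement", "other"],
        "required": True,
    },
    {
        "name": "mentions",
        "description": "A list of all people mentioned in the text.",
        "type": "array",
        "items": {
            "name": "full_name",
            "description": "The full name of the person mentioned.",
            "type": "string",
        },
        "required": True,
    },
]


def extract_properties(text, properties=PROPERTIES, model="gpt-4"):
    """Run DoctranPropertyExtractor over one string (hypothetical helper)."""
    # Imported lazily so this module loads even where doctran is not installed.
    from langchain_core.documents import Document
    from langchain_community.document_transformers import DoctranPropertyExtractor

    extractor = DoctranPropertyExtractor(
        properties=properties,
        openai_api_key=os.environ["OPENAI_API_KEY"],
        openai_api_model=model,  # model name is an assumption
    )
    # Extracted values are assumed to appear in each Document's metadata
    # (under an "extracted_properties" key).
    return extractor.transform_documents([Document(page_content=text)])
```

Keeping the schema in a module-level list makes it easy to reuse the same field definitions when filtering on that metadata at retrieval time.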
Document Translator
See a usage example for DoctranTextTranslator.
from langchain_community.document_transformers import DoctranTextTranslator
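As a rough illustration, translation could be wrapped as follows. The helper `translate`, the target language, and the default model are assumptions, as is the availability of the synchronous `transform_documents` method in the installed release.

```python
import os


def translate(text, language="spanish", model="gpt-4"):
    """Translate one string with DoctranTextTranslator (hypothetical helper)."""
    # Imported lazily so this module loads even where doctran is not installed.
    from langchain_core.documents import Document
    from langchain_community.document_transformers import DoctranTextTranslator

    translator = DoctranTextTranslator(
        openai_api_key=os.environ["OPENAI_API_KEY"],
        language=language,
        openai_api_model=model,  # model name is an assumption
    )
    # The returned Documents are expected to carry the translated text
    # in page_content.
    return translator.transform_documents([Document(page_content=text)])
```

Translating documents into a single language before embedding can help when queries are expected in that language.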