SQL Database
Aboutβ
The SQLDatabaseLoader
loads records from any database supported by
SQLAlchemy, see SQLAlchemy dialects for the whole list of supported
SQL databases and dialects.
For talking to the database, the document loader uses the SQLDatabase utility from the LangChain integration toolkit.
You can either use plain SQL for querying, or use an SQLAlchemy Select
statement object, if you are using SQLAlchemy-Core or -ORM.
You can select which columns to place into the document, which columns
to place into its metadata, which columns to use as a source
attribute
in metadata, and whether to include the result row number and/or the SQL
query expression into the metadata.
What's insideβ
This notebook covers how to load documents from an SQLite database, using the SQLAlchemy document loader.
It loads the result of a database query with one document per row.
Prerequisitesβ
#!pip install langchain langchain-community sqlalchemy termsql
Populate SQLite database with example input data.
!termsql --infile=./example_data/mlb_teams_2012.csv --head --csv --outfile=example.sqlite --table=payroll
Nationals|81.34|98
Reds|82.2|97
Yankees|197.96|95
Giants|117.62|94
Braves|83.31|94
Athletics|55.37|94
Rangers|120.51|93
Orioles|81.43|93
Rays|64.17|90
Angels|154.49|89
Tigers|132.3|88
Cardinals|110.3|88
Dodgers|95.14|86
White Sox|96.92|85
Brewers|97.65|83
Phillies|174.54|81
Diamondbacks|74.28|81
Pirates|63.43|79
Padres|55.24|76
Mariners|81.97|75
Mets|93.35|74
Blue Jays|75.48|73
Royals|60.91|72
Marlins|118.07|69
Red Sox|173.18|69
Indians|78.43|68
Twins|94.08|66
Rockies|78.06|64
Cubs|88.19|61
Astros|60.65|55
||
Basic usageβ
from pprint import pprint
from langchain_community.document_loaders import SQLDatabaseLoader
loader = SQLDatabaseLoader(
"SELECT * FROM payroll LIMIT 2",
url="sqlite:///example.sqlite",
)
documents = loader.load()
pprint(documents)
[Document(page_content='Team: Nationals\nPayroll (millions): 81.34\nWins: 98'),
Document(page_content='Team: Reds\nPayroll (millions): 82.2\nWins: 97')]
Specify which columns are content vs. metadataβ
Use the page_content_mapper
keyword argument to optionally customize how to derive
a page content string from an input database record / row. By default, all columns
will be used.
Use the metadata_mapper
keyword argument to optionally customize how to derive
a document metadata dictionary from an input database record / row. By default,
document metadata will be empty.
import functools
# Configure built-in page content mapper to include only specified columns.
row_to_content = functools.partial(
SQLDatabaseLoader.page_content_default_mapper, column_names=["Team", "Wins"]
)
# Configure built-in metadata dictionary mapper to include specified columns.
row_to_metadata = functools.partial(
SQLDatabaseLoader.metadata_default_mapper, column_names=["Payroll (millions)"]
)
loader = SQLDatabaseLoader(
"SELECT * FROM payroll LIMIT 2",
url="sqlite:///example.sqlite",
page_content_mapper=row_to_content,
metadata_mapper=row_to_metadata,
)
documents = loader.load()
pprint(documents)
[Document(page_content='Team: Nationals\nWins: 98', metadata={'Payroll (millions)': 81.34}),
Document(page_content='Team: Reds\nWins: 97', metadata={'Payroll (millions)': 82.2})]
Those examples demonstrate how to use custom functions to define arbitrary mapping rules by using Python code.
def page_content_mapper(row: sa.RowMapping, column_names: Optional[List[str]] = None) -> str:
return f"Team: {row['Team']}"
def metadata_default_mapper(row: sa.RowMapping, column_names: Optional[List[str]] = None) -> Dict[str, Any]:
return {"team": row['Team']}
Specify column(s) to identify the document sourceβ
Use the source_columns
option to specify the columns to use as a "source" for the
document created from each row. This is useful for identifying documents through
their metadata. Typically, you may use the primary key column(s) for that purpose.
loader = SQLDatabaseLoader(
"SELECT * FROM payroll LIMIT 2",
url="sqlite:///example.sqlite",
source_columns=["Team"],
)
documents = loader.load()
pprint(documents)
[Document(page_content='Team: Nationals\nPayroll (millions): 81.34\nWins: 98', metadata={'source': 'Nationals'}),
Document(page_content='Team: Reds\nPayroll (millions): 82.2\nWins: 97', metadata={'source': 'Reds'})]
Enrich metadata with row number and/or original SQL queryβ
Use the include_rownum_into_metadata
and include_query_into_metadata
options to
optionally populate the metadata
dictionary with corresponding information.
Having the query
within metadata is useful when using documents loaded from
database tables for chains that answer questions using their origin queries.
loader = SQLDatabaseLoader(
"SELECT * FROM payroll LIMIT 2",
url="sqlite:///example.sqlite",
include_rownum_into_metadata=True,
include_query_into_metadata=True,
)
documents = loader.load()
pprint(documents)
[Document(page_content='Team: Nationals\nPayroll (millions): 81.34\nWins: 98', metadata={'row': 0, 'query': 'SELECT * FROM payroll LIMIT 2'}),
Document(page_content='Team: Reds\nPayroll (millions): 82.2\nWins: 97', metadata={'row': 1, 'query': 'SELECT * FROM payroll LIMIT 2'})]