SQL Database

About

The SQLDatabaseLoader loads records from any database supported by SQLAlchemy, see SQLAlchemy dialects for the whole list of supported SQL databases and dialects.

For talking to the database, the document loader uses the SQLDatabase utility from the LangChain integration toolkit.

You can either use plain SQL for querying, or use an SQLAlchemy Select statement object, if you are using SQLAlchemy-Core or -ORM.

You can select which columns to place into the document, which columns to place into its metadata, which columns to use as a source attribute in metadata, and whether to include the result row number and/or the SQL query expression into the metadata.

What's inside

This notebook covers how to load documents from an SQLite database, using the SQLAlchemy document loader.

It loads the result of a database query with one document per row.

Prerequisites

#!pip install langchain langchain-community sqlalchemy termsql

Populate SQLite database with example input data.

!termsql --infile=./example_data/mlb_teams_2012.csv --head --csv --outfile=example.sqlite --table=payroll

Nationals|81.34|98
Reds|82.2|97
Yankees|197.96|95
Giants|117.62|94
Braves|83.31|94
Athletics|55.37|94
Rangers|120.51|93
Orioles|81.43|93
Rays|64.17|90
Angels|154.49|89
Tigers|132.3|88
Cardinals|110.3|88
Dodgers|95.14|86
White Sox|96.92|85
Brewers|97.65|83
Phillies|174.54|81
Diamondbacks|74.28|81
Pirates|63.43|79
Padres|55.24|76
Mariners|81.97|75
Mets|93.35|74
Blue Jays|75.48|73
Royals|60.91|72
Marlins|118.07|69
Red Sox|173.18|69
Indians|78.43|68
Twins|94.08|66
Rockies|78.06|64
Cubs|88.19|61
Astros|60.65|55
||

Basic usage

from pprint import pprint

from langchain_community.document_loaders import SQLDatabaseLoader

loader = SQLDatabaseLoader(
    "SELECT * FROM payroll LIMIT 2",
    url="sqlite:///example.sqlite",
)
documents = loader.load()

API Reference:SQLDatabaseLoader

pprint(documents)

[Document(page_content='Team: Nationals\nPayroll (millions): 81.34\nWins: 98'),
 Document(page_content='Team: Reds\nPayroll (millions): 82.2\nWins: 97')]

Specify which columns are content vs. metadata

Use the page_content_mapper keyword argument to optionally customize how to derive a page content string from an input database record / row. By default, all columns will be used.

Use the metadata_mapper keyword argument to optionally customize how to derive a document metadata dictionary from an input database record / row. By default, document metadata will be empty.

import functools

# Configure built-in page content mapper to include only specified columns.
row_to_content = functools.partial(
    SQLDatabaseLoader.page_content_default_mapper, column_names=["Team", "Wins"]
)

# Configure built-in metadata dictionary mapper to include specified columns.
row_to_metadata = functools.partial(
    SQLDatabaseLoader.metadata_default_mapper, column_names=["Payroll (millions)"]
)

loader = SQLDatabaseLoader(
    "SELECT * FROM payroll LIMIT 2",
    url="sqlite:///example.sqlite",
    page_content_mapper=row_to_content,
    metadata_mapper=row_to_metadata,
)
documents = loader.load()

pprint(documents)

[Document(page_content='Team: Nationals\nWins: 98', metadata={'Payroll (millions)': 81.34}),
 Document(page_content='Team: Reds\nWins: 97', metadata={'Payroll (millions)': 82.2})]

Those examples demonstrate how to use custom functions to define arbitrary mapping rules by using Python code.

def page_content_mapper(row: sa.RowMapping, column_names: Optional[List[str]] = None) -> str:
    return f"Team: {row['Team']}"

def metadata_default_mapper(row: sa.RowMapping, column_names: Optional[List[str]] = None) -> Dict[str, Any]:
    return {"team": row['Team']}

Specify column(s) to identify the document source

Use the source_columns option to specify the columns to use as a "source" for the document created from each row. This is useful for identifying documents through their metadata. Typically, you may use the primary key column(s) for that purpose.

loader = SQLDatabaseLoader(
    "SELECT * FROM payroll LIMIT 2",
    url="sqlite:///example.sqlite",
    source_columns=["Team"],
)
documents = loader.load()

pprint(documents)

[Document(page_content='Team: Nationals\nPayroll (millions): 81.34\nWins: 98', metadata={'source': 'Nationals'}),
 Document(page_content='Team: Reds\nPayroll (millions): 82.2\nWins: 97', metadata={'source': 'Reds'})]

Enrich metadata with row number and/or original SQL query

Use the include_rownum_into_metadata and include_query_into_metadata options to optionally populate the metadata dictionary with corresponding information.

Having the query within metadata is useful when using documents loaded from database tables for chains that answer questions using their origin queries.

loader = SQLDatabaseLoader(
    "SELECT * FROM payroll LIMIT 2",
    url="sqlite:///example.sqlite",
    include_rownum_into_metadata=True,
    include_query_into_metadata=True,
)
documents = loader.load()

pprint(documents)

[Document(page_content='Team: Nationals\nPayroll (millions): 81.34\nWins: 98', metadata={'row': 0, 'query': 'SELECT * FROM payroll LIMIT 2'}),
 Document(page_content='Team: Reds\nPayroll (millions): 82.2\nWins: 97', metadata={'row': 1, 'query': 'SELECT * FROM payroll LIMIT 2'})]

About​

What's inside​

Prerequisites​

Basic usage​

Specify which columns are content vs. metadata​

Specify column(s) to identify the document source​

Enrich metadata with row number and/or original SQL query​

Was this page helpful?