feat: Add SemanticScholarToolkits to integrate Semantic Scholar to camel #1493

Merged
merged 22 commits into from
Feb 6, 2025
Changes from 18 commits
d5ee024
Add SemanticScholarToolkits to integrate Semantic Scholar to camel
Jan 23, 2025
ed6571c
Import the class SemanticScholarToolkit to _init_.py of toolkits
Jan 23, 2025
96fe5b4
Try to properly format the signature and docstring
Jan 24, 2025
f760e39
Try to properly format the signature and docstring
Jan 24, 2025
55c67cf
Merge branch 'master' into SemanticScholarToolkit
harryeqs Jan 24, 2025
e5ad6f0
Re-formatted the toolkit file, add example and test file
Jan 25, 2025
b475269
Re-formatted the toolkit file
Jan 25, 2025
9ac017c
Re-foramtted the toolkits file
Jan 25, 2025
27afcef
Re-foramtted the toolkits file
Jan 25, 2025
0d52bf1
Re-foramtted the toolkits file
Jan 25, 2025
cfd7450
Re-foramtted the toolkits file
Jan 25, 2025
121b2e7
Re-foramtted the toolkits file
Jan 25, 2025
da00e4a
Re-foramtted the toolkits file
Jan 25, 2025
d8e7094
Re-foramtted the toolkits file
Jan 25, 2025
e7c65b0
An example of semanticscholar_toolkit
Jan 25, 2025
d81207d
Re-formatted semanticscholar_toolkit
Jan 25, 2025
5570dc9
feat: Integrate Semantic Scholar into Camel
Jan 25, 2025
c901858
Merge branch 'master' into SemanticScholarToolkit
harryeqs Jan 31, 2025
ab9f157
Merge branch 'master' into SemanticScholarToolkit
harryeqs Feb 3, 2025
f655e24
Improve error handling, adding support for request failures and inval…
renxinxing123 Feb 5, 2025
fb96134
The corresponding test file has been updated to correctly mock and va…
renxinxing123 Feb 5, 2025
55e4413
Merge branch 'master' into SemanticScholarToolkit
Wendong-Fan Feb 6, 2025
2 changes: 2 additions & 0 deletions camel/toolkits/__init__.py
@@ -45,6 +45,7 @@
from .stripe_toolkit import StripeToolkit
from .video_toolkit import VideoDownloaderToolkit
from .dappier_toolkit import DappierToolkit
from .semanticscholar_toolkit import SemanticScholarToolkit

__all__ = [
    'BaseToolkit',
@@ -77,4 +78,5 @@
    'MeshyToolkit',
    'OpenBBToolkit',
    'DappierToolkit',
    'SemanticScholarToolkit',
]
257 changes: 257 additions & 0 deletions camel/toolkits/semanticscholar_toolkit.py
@@ -0,0 +1,257 @@
# ========= Copyright 2023-2024 @ CAMEL-AI.org. All Rights Reserved. =========
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ========= Copyright 2023-2024 @ CAMEL-AI.org. All Rights Reserved. =========

import json
from typing import List

import requests

from camel.toolkits import FunctionTool
from camel.toolkits.base import BaseToolkit


class SemanticScholarToolkit(BaseToolkit):
    """A toolkit for interacting with the Semantic Scholar
    API to fetch paper and author data."""
Comment on lines +25 to +26
Member

docstring format

Suggested change
-    """A toolkit for interacting with the Semantic Scholar
-    API to fetch paper and author data."""
+    r"""A toolkit for interacting with the Semantic Scholar
+    API to fetch paper and author data.
+    """


    def __init__(self):
        """Initializes the SemanticScholarToolkit."""
        self.base_url = "https://api.semanticscholar.org/graph/v1"

    def fetch_paper_data_title(
        self,
        paperTitle: str,
        fields: str = """title,abstract,authors,year,citationCount,
        publicationTypes,publicationDate,openAccessPdf""",
Comment on lines +35 to +36
Member

We could have a better way to handle the fields input.

    ) -> dict:
        r"""Fetches a SINGLE paper from the Semantic Scholar
        API based on a paper title.

        Args:
            paperTitle (str): The title of the paper to fetch.
            fields (str): A comma-separated list of fields to include
                in the response (default includes title, abstract, authors, year,
                citation count, publicationTypes, publicationDate, openAccessPdf).

        Returns:
            dict: The response data from the API or error
                information if the request fails.
        """
        url = f"{self.base_url}/paper/search"
        query_params = {"query": paperTitle, "fields": fields}
        response = requests.get(url, params=query_params)
Collaborator

It might be better if we implement error handling here.

Collaborator Author

Hi @AveryYay! Thanks for your suggestion! I noticed that the code already includes error handling for the case where the response status code is not 200, and it returns an error message accordingly.

Could you clarify if you're suggesting a different type of error handling?

Collaborator (@AveryYay, Feb 3, 2025)

Checking the status code works if requests.get() returns successfully. Adding a try-except could prevent crashes if the request fails due to connectivity problems. There could also be cases where the response isn't valid JSON.

Collaborator Author

Thank you for your explanation, @AveryYay! I've updated the SemanticScholarToolkit to improve error handling, adding support for request failures and invalid JSON responses. The corresponding test file has also been updated to correctly mock and validate error responses.

        if response.status_code == 200:
            return response.json()
        else:
            return {
                "error": (
                    f"Request failed with status code {response.status_code}"
                ),
                "message": response.text,
            }
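
Note on the error-handling thread above: a minimal sketch of the kind of handling the reviewer describes, not the code that was eventually merged. The helper name `get_json` and the `timeout` default are hypothetical; only `requests.get`, `raise_for_status`, `requests.exceptions.RequestException`, and `response.json()` come from the `requests` library itself. The request and the JSON decoding are guarded separately so that each failure gets its own message.

```python
from typing import Any, Dict, Optional

import requests


def get_json(
    url: str,
    params: Optional[Dict[str, str]] = None,
    timeout: float = 10.0,  # hypothetical default, not from the PR
) -> Dict[str, Any]:
    """GET `url` and return the parsed JSON, or an error dict on failure."""
    try:
        response = requests.get(url, params=params, timeout=timeout)
        # Turns 4xx/5xx responses into requests.exceptions.HTTPError.
        response.raise_for_status()
    except requests.exceptions.RequestException as exc:
        # Connection errors, timeouts, and HTTP error statuses land here.
        return {"error": f"Request failed: {exc}"}

    try:
        return response.json()
    except ValueError as exc:
        # response.json() raises a ValueError subclass on an invalid body.
        return {"error": f"Response is not valid JSON: {exc}"}
```

With a helper like this, each `fetch_*` method would reduce to building the URL and parameters and returning `get_json(url, params)`.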

    def fetch_paper_data_id(
        self,
        paperID: str,
        fields: str = """title,abstract,authors,year,citationCount,
        publicationTypes,publicationDate,openAccessPdf""",
    ) -> dict:
        r"""Fetches a SINGLE paper from the Semantic Scholar
        API based on a paper ID.

        Args:
            paperID (str): The ID of the paper to fetch.
            fields (str): A comma-separated list of fields to
                include in the response (default includes title, abstract,
                authors, year, citation count, publicationTypes,
                publicationDate, openAccessPdf).

        Returns:
            dict: The response data from the API or error information
                if the request fails.
Comment on lines +85 to +86
Member

docstring format

Suggested change
dict: The response data from the API or error information
if the request fails.
dict: The response data from the API or error information
if the request fails.

        """
        url = f"{self.base_url}/paper/{paperID}"
        query_params = {"fields": fields}
        response = requests.get(url, params=query_params)
        if response.status_code == 200:
            return response.json()
        else:
            return {
                "error": (
                    f"Request failed with status code {response.status_code}"
                ),
                "message": response.text,
            }
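
Note on the `fields` default used by the two methods above: because the default is a triple-quoted string that spans two lines, its value contains a newline and the continuation line's indentation, and both are sent to the API as part of the `fields` parameter. One possible shape for the "better way" the reviewer asks for is sketched below. `DEFAULT_PAPER_FIELDS` and `build_fields_param` are hypothetical names, not part of the PR, and whether the API tolerates the stray whitespace is not established here.

```python
from typing import List, Optional

# Hypothetical constant holding the same default fields as the diff.
DEFAULT_PAPER_FIELDS: List[str] = [
    "title", "abstract", "authors", "year", "citationCount",
    "publicationTypes", "publicationDate", "openAccessPdf",
]


def build_fields_param(fields: Optional[List[str]] = None) -> str:
    """Join field names into a clean comma-separated string."""
    chosen = DEFAULT_PAPER_FIELDS if fields is None else fields
    # Strip whitespace and drop empty entries so that no newline or
    # indentation leaks into the query string.
    return ",".join(name.strip() for name in chosen if name.strip())


# The multi-line default from the diff, for comparison.
multiline_default = """title,abstract,authors,year,citationCount,
        publicationTypes,publicationDate,openAccessPdf"""

print("\n" in multiline_default)  # True: the newline is part of the value
print(build_fields_param(["title", " year "]))  # title,year
```

Accepting a list also lets a caller (or an agent issuing the tool call) pass field names without having to build the comma-separated string by hand.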

    def fetch_bulk_paper_data(
        self,
        query: str,
        year: str = "2023-",
        fields: str = """title,url,publicationTypes,
        publicationDate,openAccessPdf""",
    ) -> dict:
        r"""Fetches MULTIPLE papers at once from the Semantic Scholar
        API based on a related topic.
        Args:
            query (str):
                The text query to match against the paper's title
                and abstract.
                For example, you can use the following operators
                and techniques to construct your query:
                Example 1:
                    ((cloud computing) | virtualization)
                    +security -privacy This will match papers
                    whose title or abstract contains "cloud"
                    and "computing", or contains the word
                    "virtualization". The papers must also
                    include the term "security" but exclude
                    papers that contain the word "privacy".
            year (str): The year filter for papers (default is "2023-").
            fields (str): The fields to include in the response
                (e.g., 'title,url,publicationTypes,publicationDate,
                openAccessPdf').
        Returns:
            dict: The response data from the API or
                error information if the request fails.
        """
        url = f"{self.base_url}/paper/search/bulk"
        query_params = {"query": query, "fields": fields, "year": year}
        response = requests.get(url, params=query_params)
        if response.status_code == 200:
            return response.json()
        else:
            return {
                "error": (
                    f"Request failed with status code {response.status_code}"
                ),
                "message": response.text,
            }
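
Note on the bulk query syntax described in the docstring above: the sketch below assembles the docstring's own example from its parts, using only the operators shown there (`|` for alternatives, `+` for required terms, `-` for excluded terms, parentheses for grouping). `build_bulk_query` is a hypothetical helper, not part of the toolkit, and it assumes required and excluded terms are single words.

```python
from typing import Dict, List


def build_bulk_query(
    any_of: List[str], required: List[str], excluded: List[str]
) -> str:
    """Build a query string using the |, + and - operators."""
    # Multi-word alternatives are wrapped in parentheses so that they
    # stay grouped on their side of the | operator.
    alternatives = " | ".join(
        f"({term})" if " " in term else term for term in any_of
    )
    parts = [f"({alternatives})"] if alternatives else []
    parts += [f"+{term}" for term in required]   # must appear
    parts += [f"-{term}" for term in excluded]   # must not appear
    return " ".join(parts)


query = build_bulk_query(
    any_of=["cloud computing", "virtualization"],
    required=["security"],
    excluded=["privacy"],
)
print(query)  # ((cloud computing) | virtualization) +security -privacy

# Parameters in the shape fetch_bulk_paper_data passes to requests.get.
query_params: Dict[str, str] = {
    "query": query,
    "fields": "title,url",
    "year": "2023-",
}
```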

    def fetch_recommended_papers(
        self,
        positive_paper_ids: List[str],
        negative_paper_ids: List[str],
        fields: str = """title,url,citationCount,authors,
        publicationTypes,publicationDate,openAccessPdf""",
        limit: int = 500,
        save_to_file: bool = False,
    ) -> dict:
        r"""Fetches recommended papers from the Semantic Scholar
        API based on the positive and negative paper IDs.

        Args:
            positive_paper_ids (list): A list of paper IDs (as strings)
                that are positively correlated to the recommendation.

            negative_paper_ids (list): A list of paper IDs (as strings)
                that are negatively correlated to the recommendation.

            fields (str): The fields to include in the response
                (e.g., 'title,url,citationCount,authors,publicationTypes,
                publicationDate,openAccessPdf').

            limit (int): The maximum number of recommended papers to return.
                Default is 500.

            save_to_file (bool): If True, saves the response data to a file
                (default is False).

        Returns:
            dict: A dictionary containing recommended papers sorted by
                citation count.
        """
        url = "https://api.semanticscholar.org/recommendations/v1/papers"
        query_params = {"fields": fields, "limit": str(limit)}
        data = {
            "positivePaperIds": positive_paper_ids,
            "negativePaperIds": negative_paper_ids,
        }

        try:
            response = requests.post(url, params=query_params, json=data)
            response.raise_for_status()

            if response.status_code == 200:
                papers = response.json()

                # Optionally save the data to a file
                if save_to_file:
                    with open('recommended_papers.json', 'w') as output:
                        json.dump(papers, output)
                return papers

            else:
                return {
                    "error": "Request failed with status code "
                    f"{response.status_code}"
                }
        except requests.exceptions.RequestException as e:
            return {"error": str(e)}
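
Note on the `Returns` section of `fetch_recommended_papers`: the docstring describes the result as sorted by citation count, but the method body returns the API response as received and contains no sorting step. If a sorted list is wanted, the caller could sort the result, as in the sketch below. It assumes the response wraps the list under a `recommendedPapers` key and that `citationCount` was included in `fields`; `sort_by_citations` is a hypothetical helper, not part of the PR.

```python
from typing import Any, Dict, List


def sort_by_citations(response: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return the recommended papers ordered by citation count, descending."""
    # Assumed response shape: {"recommendedPapers": [{...}, ...]}
    papers = response.get("recommendedPapers", [])
    # A missing or null citationCount is treated as zero.
    return sorted(
        papers,
        key=lambda paper: paper.get("citationCount") or 0,
        reverse=True,
    )


sample = {
    "recommendedPapers": [
        {"title": "A", "citationCount": 3},
        {"title": "B", "citationCount": None},
        {"title": "C", "citationCount": 12},
    ]
}
print([paper["title"] for paper in sort_by_citations(sample)])  # ['C', 'A', 'B']
```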

    def fetch_author_data(
        self,
        ids: List[str],
        fields: str = "name,url,paperCount,hIndex,papers",
        save_to_file: bool = False,
    ) -> dict:
        r"""Fetches author information from the Semantic Scholar
        API based on author IDs.

        Args:
            ids (list): A list of author IDs (as strings) to fetch
                data for.

            fields (str): A comma-separated list of fields to include
                in the response (default includes name, URL, paper count,
                hIndex, and papers).

            save_to_file (bool): If True, saves the response data to
                a file (default is False).

        Returns:
            dict: The response data from the API or error information if
                the request fails.
        """
        url = f"{self.base_url}/author/batch"
        query_params = {"fields": fields}
        data = {"ids": ids}
        try:
            response = requests.post(url, params=query_params, json=data)
            response.raise_for_status()
            response_data = response.json()

            # Optionally save the data to a file
            if save_to_file:
                with open('author_information.json', 'w') as output:
                    json.dump(response_data, output)

            return response_data
        except requests.exceptions.RequestException as e:
            return {"error": str(e)}

    def get_tools(self) -> List[FunctionTool]:
        r"""Returns a list of FunctionTool objects representing the
        functions in the toolkit.

        Returns:
            List[FunctionTool]: A list of FunctionTool objects
                representing the functions in the toolkit.
        """
        return [
            FunctionTool(self.fetch_paper_data_title),
            FunctionTool(self.fetch_paper_data_id),
            FunctionTool(self.fetch_bulk_paper_data),
            FunctionTool(self.fetch_recommended_papers),
            FunctionTool(self.fetch_author_data),
        ]