Conversation

@jbrockmendel (Member):

  • closes #xxxx (Replace xxxx with the GitHub issue number)
  • Tests added and passed if fixing a bug or adding a new feature
  • All code checks passed.
  • Added type annotations to new arguments/methods/functions.
  • Added an entry in the latest doc/source/whatsnew/vX.X.X.rst file if fixing a bug or adding a new feature.
  • If I used AI to develop this pull request, I prompted it to follow AGENTS.md.

Ideally we'd do the check for needs_nano_unit at a lower level and avoid effectively re-parsing the string, but the existing parse_timedelta_string function is a tough nut to crack.
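For readers without the diff open, a rough standalone sketch of the check being discussed, in the 7-digit form settled on during review below (the function name, signature, and call site here are assumptions; `ival` is taken to be the parsed value in nanoseconds):

    import re

    def needs_nano_unit(item: str, ival: int) -> bool:
        # A sub-microsecond remainder means the value cannot round-trip
        # through a coarser unit, so nanoseconds are required.
        if ival % 1000 != 0:
            return True
        # Even when the value fits in microseconds, a string that spells out
        # nanosecond precision (7+ fractional digits) or names the unit
        # explicitly is still inferred as nanoseconds.
        return bool(re.search(r"\.\d{7}", item)) or "ns" in item or "nano" in item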

@rhshadrach (Member) left a comment:

lgtm

@jorisvandenbossche jorisvandenbossche added this to the 3.0 milestone Nov 26, 2025
@jorisvandenbossche jorisvandenbossche added Timedelta Timedelta data type Non-Nano datetime64/timedelta64 with non-nanosecond resolution labels Nov 26, 2025
# TODO: more performant way of doing this check?
if ival % 1000 != 0:
    return True
return re.search(r"\.\d{9}", item) or "ns" in item or "nano" in item
Member:

Is the 9 correct here? Shouldn't 7 be enough to have something below microseconds?

(Although it is not entirely clear what exactly this is used for; I will first check the rest of the PR, but an additional docstring might be useful.)

Member:

Ah, is this meant so that something ending in 0s still infers as nanoseconds?

i.e.

>>> pd.Timedelta("1.123456000 seconds").unit
'ns'

That value could be microseconds (no loss of data), but because the string included the nanos, we infer the nanosecond unit?
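For contrast, a string that stops at six fractional digits would presumably infer microseconds under this change (output hypothetical):

>>> pd.Timedelta("1.123456 seconds").unit
'us'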

Member:

I suppose 7 digits should then also still count as nanoseconds?

Right now we get:

>>> pd.Timedelta("1.1234561 seconds").unit
'ns'
>>> pd.Timedelta("1.1234560 seconds").unit
'us'

Member Author:

Good point. 7 digits should be enough. Will update.
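To spell out the threshold: six fractional digits land exactly on microseconds, so a seventh digit generally leaves a sub-microsecond remainder, e.g.

>>> 1_123_456_100 % 1000   # "1.1234561 seconds" expressed in nanoseconds
100

while the trailing-zero case ("1.1234560") is caught by the \.\d{7} regex instead.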

@jorisvandenbossche (Member) left a comment:

Looks good!

 def test_groupby_timedelta_median():
     # issue 57926
-    expected = Series(data=Timedelta("1D"), index=["foo"])
+    expected = Series(data=Timedelta("1D"), index=["foo"], dtype="m8[ns]")
Member:

I was wondering why ns, but it is the other PR that will preserve the unit of the Timedelta object when converting to an array?

Member Author:

Adding the dtype here preserves the current dtype for expected. The other PR will change the dtype for df["timedelta"] below, so an update will be needed after one of them gets merged.

"ufunc 'divide' cannot use operands",
"Invalid dtype object for __floordiv__",
r"unsupported operand type\(s\) for /: 'int' and 'str'",
r"unsupported operand type\(s\) for /: 'datetime.timedelta' and 'str'",
Member:

Just curious, how is this caused by the changes here?

Member Author:

It was only on the dev builds and when box=True, so we are dividing by np.array(["1"], dtype=object); I suspect that when td.to_timedelta64() has a "us" unit it tries casting to a pytimedelta to operate.
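For reference, to_timedelta64() carries the Timedelta's unit into the numpy scalar, which is why the "us" path can exercise different numpy division code:

>>> pd.Timedelta("1 hour").as_unit("us").to_timedelta64()
numpy.timedelta64(3600000000,'us')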

@mroeschke mroeschke mentioned this pull request Nov 26, 2025
@jbrockmendel (Member Author):

Updating this to also handle the array case (now that #63018 is merged) breaks a bunch of tests. I spent some time today updating those expecteds, will hopefully finish that process tomorrow.

# TODO: more performant way of doing this check?
if ival % 1000 != 0:
    return True
return re.search(r"\.\d{7}", item) or "ns" in item or "nano" in item
Member:

Suggested change:
-return re.search(r"\.\d{7}", item) or "ns" in item or "nano" in item
+return re.search(r"\.\d{7}", item) or "ns" in item or "nano" in item.lower()

(or add or "Nano" in item)

?

I was checking the Timedelta tests for the other PR, and seeing that we allow title case
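A quick sanity check that the lowered comparison also covers the title-case spellings:

>>> "nano" in "3 Nanoseconds".lower()
True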

Member Author:

Sure. I'm still chasing down the failures that are caused when we add this to the array inference. There are some actual bugs that this uncovered.

Comment on lines +620 to +624
    isinstance(two_hours, Timedelta)
    and two_hours.unit == "ns"
    and box_with_array is not pd.array
):
    expected = expected.as_unit("ns")
Member:

This is because with a TimedeltaArray, the in-place += is actually in-place, preserving the dtype, while for the other objects it is just doing rng = rng + two_hours under the hood?

Is that an intentional difference?

Member Author:

yes and yes
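Concretely, the distinction being confirmed, as a minimal sketch (dtypes assumed per the discussion above):

    import pandas as pd

    two_hours = pd.Timedelta("2h")  # currently parses with "ns" unit

    arr = pd.array(pd.timedelta_range("1h", periods=3, unit="us"))
    arr += two_hours  # TimedeltaArray: genuine in-place op, m8[us] dtype preserved

    idx = pd.timedelta_range("1h", periods=3, unit="us")
    idx += two_hours  # Index: effectively idx = idx + two_hours, so the result
                      # dtype follows the usual binary-op rules instead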

Member:

Could you add a comment about that to the test (it is not obvious to a reader that this is the reason for the check)?

@jorisvandenbossche (Member):

The docstrings seem to have uncovered a bug:

>>> pd.to_timedelta(["1 days 06:05:01.00003"])
TimedeltaIndex(['1 days 06:05:01.000030'], dtype='timedelta64[us]', freq=None)
>>> pd.to_timedelta(["1 days 06:05:01.00003", "15.5us"])
TimedeltaIndex(['0 days 00:01:48.301000030', '0 days 00:00:00.000015500'], dtype='timedelta64[ns]', freq=None)

The parsed integer for the first element is not cast from micro to nano when the second element infers it needs nanos (or it should just reparse the full array?)

@jbrockmendel (Member Author):

> (or it should just reparse the full array?)

Looks like it is reparsing the full array, but not respecting the "creso" arg when we do. Should now be fixed.
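With the creso fix, the earlier example should presumably produce a consistent nanosecond result (output hypothetical):

>>> pd.to_timedelta(["1 days 06:05:01.00003", "15.5us"])
TimedeltaIndex(['1 days 06:05:01.000030', '0 days 00:00:00.000015500'], dtype='timedelta64[ns]', freq=None)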

@jorisvandenbossche (Member):

Hmm, some conda issue. Restarted; hopefully it will pass now.

That last commit before your latest update also has one remaining mypy failure; not sure if that is fixed now:

   /home/runner/work/pandas/pandas/pandas/core/indexes/timedeltas.py:239:56 - error: Cannot access attribute "unit" for class "NaTType"
    Attribute "unit" is unknown (reportAttributeAccessIssue)

 >>> pd.to_timedelta(np.arange(5), unit="D")
 TimedeltaIndex(['0 days', '1 days', '2 days', '3 days', '4 days'],
-               dtype='timedelta64[ns]', freq=None)
+               dtype='timedelta64[us]', freq=None)
Member:

Suggested change:
-               dtype='timedelta64[us]', freq=None)
+               dtype='timedelta64[ns]', freq=None)
